Summary

In this chapter, you studied various Big Data validation and cleansing techniques that detect and correct incorrect or inaccurate records in the data. These techniques identify inconsistencies by validating the data against a set of rules before it is used in the analytics process; the inconsistent data is then replaced, modified, or deleted according to the business rules to make the dataset more consistent. This chapter built upon what we learned in the previous chapter on data profiling.
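As a compact reminder of the validate-then-cleanse flow described above, here is a minimal sketch in Python rather than Pig Latin. The field names (`name`, `age`, `country`), the age range, and the `UNKNOWN` default are hypothetical assumptions chosen only for illustration; they are not rules taken from this chapter.

```python
def is_valid_age(record):
    """Hypothetical validation rule: age must be an integer from 0 to 120."""
    age = record.get("age")
    return isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120


def cleanse(records, default_country="UNKNOWN"):
    """Validate each record against the rules, then modify, replace, or delete."""
    cleaned = []
    for original in records:
        record = dict(original)  # work on a copy so the input stays untouched

        # Modify: trim stray whitespace from the name field.
        if isinstance(record.get("name"), str):
            record["name"] = record["name"].strip()

        # Replace: substitute an assumed default for a missing or empty country.
        if not record.get("country"):
            record["country"] = default_country

        # Delete: drop records that fail the age rule.
        if not is_valid_age(record):
            continue

        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"name": " Ann ", "age": 34, "country": "IN"},
        {"name": "Bob", "age": -5, "country": "US"},
        {"name": "Cy", "age": 51, "country": ""},
    ]
    for row in cleanse(sample):
        print(row)
```

Which of the three actions applies to a given violation is a business decision; the sketch hard-codes one choice per rule only to keep the example short.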

In the next chapter, we will focus on data transformation patterns that can be applied to a variety of data formats. After reading that chapter, readers will be able to choose the right ...
