Rationale for using Pig in data profiling

Implementing the profiling code within the Hadoop environment reduces the dependency on external systems for quality checks. A high-level overview of the implementation is depicted in the following diagram:

Implementing profiling in Pig

The following are the advantages of performing data profiling within the Hadoop environment using Pig:

  • Implementing the design patterns in Pig moves the profiling code to the data instead of moving the data to the code. This reduces data movement, which improves performance and speeds up the analytics development process.
  • By implementing the pattern in Pig, the data quality effort is performed alongside ...
