Chapter 6. Understanding Data Reduction Patterns

In the previous chapter, we learned about Big Data transformation techniques that convert the structure of data into a hierarchical representation, which takes advantage of Hadoop's capability to process semistructured data. We saw the importance of normalizing data before analyzing it, and then discussed using joins to denormalize it. CUBE and ROLLUP perform multiple aggregations on the data, and these aggregations provide a snapshot of it. In the data generalization section, we discussed various generalization techniques for numerical and categorical data.

In this chapter, we will discuss design patterns ...
