CHAPTER 11


Data Processing Using Pig

So far in this book, we have explored how to develop MapReduce programs using Java. Chapter 10 introduced Hive, the SQL engine on top of the HDFS. You learned how the Hive compiler converts high-level SQL commands into MapReduce programs, which spares you from writing low-level Java programs and lets you focus instead on high-level business requirements. Hive is suitable for BI developers who want to treat the HDFS as a data warehouse.

This chapter focuses on another type of developer: the ETL developer who sees data as flowing through a complex data pipeline. SQL is a declarative language, and an ETL developer ...
