Chapter 7

Big Data Tools and Techniques

This chapter gives a high-level overview of the big data tool ecosystem, using Hadoop as the detailed example. It first outlines high-performance architecture, then examines how the components of Hadoop and its associated tools address application development and deployment needs. The chapter discusses the Hadoop Distributed File System (HDFS), introduces the programming model provided by MapReduce and YARN, and then walks through associated tools such as ZooKeeper (used for synchronization and control), the table-based data management scheme defined using HBase, an alternate data management scheme called Hive that can be used for ...
