Chapter 12. Analytics Using Hadoop

Hadoop has come to the fore because of its capability to aid in data analytics. As data grows in the dimensions of volume, velocity, and variety, there needs to be systems that are capable of analyzing this data efficiently and effectively. Vertically scaling hardware to handle this data is not viable because it is expensive and difficult to manage. Distributed computing and horizontal scaling are good options, and frameworks such as Hadoop automatically cater to the fault tolerance, scaling, and distribution needs of such a system.

Analytics is all about data. A question that frequently arises is when does Hadoop become overkill? Typically, it is recommended that you use Hadoop for datasets of 1 TB and upwards. ...

Get Mastering Hadoop now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.