Summary

Hadoop is a very useful tool for big data transformation and processing, and it can come in handy at almost every stage of the data analytics workflow. Data analytics is less about the algorithms than about the data: larger datasets can yield almost two-fold improvements in prediction. A data scientist should therefore worry more about data cleansing, transformation, feature engineering, and validation of results than about the particular algorithm used for the analysis. This does not mean that the choice of analysis algorithm is unimportant. Rather, other factors are equally important and vital for sound decision making.

The key takeaways from this chapter are as follows:

  • Hadoop is generally used for analytics ...
