Part 4. Data science

The ultimate challenge in working with Hadoop and big data is mining useful information from your data. This part of the book presents techniques for answering nontrivial questions about your data and for drawing new insights from it.

Data modeling and algorithms are the pillars on which data science is built. Chapter 7 examines how graphs can be represented and used in MapReduce to implement algorithms such as Friends-of-Friends and PageRank.

R is a tool that has become popular among data scientists thanks to its large array of statistical and data-mining packages. Chapter 8 explores how R and MapReduce can work in concert to quickly bring data scientists to the Hadoop table. ...

