Cover by Joseph Adler

O'Reilly logo

R and Hadoop

Over the past few years, Hadoop has become the de facto standard for processing big data. For many people, Hadoop is Big Data. You may have heard of Hadoop, but you may not know what it is, what it's good for, or how you can use it with R. That's what this section is all about.

Overview of Hadoop

Hadoop is a system for working with huge data sets. Facebook uses it to store photos, LinkedIn uses it to generate recommendations, and Amazon uses it to generate search indexes. It's most useful when you have a very large amount of data, more than a single computer can comfortably handle.

Hadoop lets you store very large amounts of data and solve very big problems. It works by connecting many computers together, but it lets you work with them as if they were one giant computer. Working with parallel and distributed systems is tricky and complicated; Hadoop hides much of that complexity so that you can focus on solving your problem.
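The idea of treating many machines as one can be illustrated with a toy sketch. The Python code below is not Hadoop and uses no Hadoop API; it simulates a small cluster with threads, and the names `split_into_blocks`, `node_sum`, and `cluster_sum` are hypothetical. The data is cut into fixed-size blocks, each simulated node works on its own share of blocks, and the caller sees a single function that returns one answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_blocks(data: Sequence[int], block_size: int) -> List[Sequence[int]]:
    """Cut the input into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def node_sum(blocks: List[Sequence[int]]) -> int:
    """Work done on one simulated node: sum every block assigned to it."""
    return sum(sum(block) for block in blocks)


def cluster_sum(data: Sequence[int], num_nodes: int = 4, block_size: int = 3) -> int:
    """Sum `data` as if it were spread across `num_nodes` machines.

    The caller never sees the blocks or the nodes; hiding that bookkeeping
    is the kind of work a system like Hadoop does for you.
    """
    blocks = split_into_blocks(data, block_size)
    # Assign blocks to the simulated nodes round-robin.
    assignments = [blocks[n::num_nodes] for n in range(num_nodes)]
    # Each simulated node (a thread here) processes its share concurrently.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_sums = list(pool.map(node_sum, assignments))
    # Combine the partial results into a single answer.
    return sum(partial_sums)


print(cluster_sum(list(range(1, 101))))  # 5050
```

Threads stand in for machines only to keep the sketch self-contained; on a real cluster the blocks would live on different computers and the work would move to where the data is stored.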

In terms of the laundry analogy above, Hadoop is like a commercial laundry service. You give the service many loads of dirty laundry, and it sends you back bags of clean laundry the next day.

Map/Reduce

To make it easier to write efficient parallel programs, Hadoop uses a model called Map/Reduce to process large amounts of data. Many common data processing tasks (including filtering data, merging data, and aggregating data) fit easily into Map/Reduce. Many (but not all) mathematical and machine learning algorithms can also be expressed in the Map/Reduce framework. ...
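The phases of Map/Reduce can be sketched in plain Python on one machine. This illustrates the model only, not Hadoop's API: the functions `mapper`, `shuffle`, `reducer`, and `map_reduce` and the "region,amount" record format are hypothetical. The map step filters each record and tags it with a key, the shuffle groups values by key, and the reduce step aggregates each group. In Hadoop, the map and reduce steps would run in parallel on many machines, and the framework would handle the shuffle between them.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, float]


def mapper(line: str) -> Iterator[Pair]:
    """Map step: parse one "region,amount" record and emit (key, value).

    Records with a non-positive amount are filtered out by emitting nothing.
    """
    region, amount = line.split(",")
    value = float(amount)
    if value > 0:
        yield region.strip(), value


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[float]]:
    """Shuffle step: group every emitted value by its key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[float]) -> Pair:
    """Reduce step: aggregate all the values that share one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str],
               map_fn: Callable[[str], Iterable[Pair]],
               reduce_fn: Callable[[str, List[float]], Pair]) -> Dict[str, float]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


records = ["east,10", "west,5", "east,-3", "north,7", "west,2.5"]
print(map_reduce(records, mapper, reducer))
# {'east': 10.0, 'north': 7.0, 'west': 7.5}
```

Because each call to `mapper` looks at only one record and each call to `reducer` looks at only one key, both steps can be spread across machines without changing their code; that independence is what makes a task fit the Map/Reduce model.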