Loading data into Hadoop

Hadoop is at the heart of the Big Data movement. Derived from the designs described in Google's white papers on MapReduce and the Google File System, Hadoop can scale beyond petabytes of data and provides the backbone for fast and effective data analysis.

Pentaho was one of the first companies to provide support for Hadoop and has open-sourced those capabilities, along with steps for other Big Data sources.

Note

Many useful tutorials and videos are available on Pentaho's Big Data wiki at http://wiki.pentaho.com/display/BAD/Pentaho+Big+Data+Community+Home.

Getting ready

Before we actually try to connect to Hadoop, we have to set up an appropriate environment. Companies like Hortonworks and Cloudera have been at the forefront of providing ...
