Pentaho Data Integration Cookbook Second Edition by María Carina Roldán, Adrián Sergio Pulvirenti, Alex Meadows

Loading data into Hadoop

Hadoop is at the heart of the Big Data movement. Derived from Google's white papers on MapReduce and the Google File System, it can scale beyond petabytes of data and provides the backbone for fast and effective data analysis.

Pentaho was one of the first companies to provide support for Hadoop and has open sourced those capabilities, along with steps for other Big Data sources.

Note

Pentaho's Big Data wiki offers many useful tutorials and videos at http://wiki.pentaho.com/display/BAD/Pentaho+Big+Data+Community+Home.

Getting ready

Before we actually try to connect to Hadoop, we have to set up an appropriate environment. Companies like Hortonworks and Cloudera have been at the forefront of providing ...
