Pentaho Data Integration Cookbook Second Edition by María Carina Roldán, Adrián Sergio Pulvirenti, Alex Meadows


Loading data into HBase

HBase is another component in the Hadoop ecosystem. It is a columnar database: it stores a dataset by column rather than by the rows that make it up. This allows for higher compression and faster searching, which makes columnar databases well suited to the kinds of analytical queries that can cause significant performance issues in traditional relational databases.
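As a rough illustration of the row-versus-column distinction, the following Python sketch keeps the same three records in both layouts. It is a toy under stated assumptions, not a model of HBase's actual storage format: the records, the field names, and the helper functions (`count_by_state_row_store`, `count_by_state_column_store`, `run_length_encode`) are all made up for this example.

```python
# Toy illustration only: the records and field names are invented,
# and this does not reproduce how HBase lays out data on disk.
rows = [
    {"id": 1, "name": "Alpha College", "state": "OH"},
    {"id": 2, "name": "Beta University", "state": "TX"},
    {"id": 3, "name": "Gamma Institute", "state": "OH"},
]

# Row-oriented layout: each record is kept together (the list above).
# Column-oriented layout: all values of one column are kept together.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def count_by_state_row_store(records, state):
    """Visit every whole record, even though only one field is needed."""
    return sum(1 for record in records if record["state"] == state)


def count_by_state_column_store(cols, state):
    """Read only the 'state' column; the other columns are never touched."""
    return cols["state"].count(state)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


if __name__ == "__main__":
    print(count_by_state_row_store(rows, "OH"))
    print(count_by_state_column_store(columns, "OH"))
    print(run_length_encode(sorted(columns["state"])))
```

Both counting functions return the same answer, but the column version reads a single list instead of every record. The run-length step hints at why compression tends to be better: values within one column are of the same type and often repeat, so they collapse well once grouped together.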

Note

For this recipe, we will be using the Baseball Dataset loaded into Hadoop in the recipe Loading data into Hadoop (also in this chapter). We recommend completing that recipe before continuing.

Getting ready

In this recipe, we will be loading the Schools.csv, Master.csv, and SchoolsPlayers.csv files. The data relates (via the ...
