HBase is another component in the Hadoop ecosystem. It is a columnar database, which stores datasets based on the columns, instead of the rows that make it up. This allows for higher compression and faster searching, making columnar databases ideal for the kinds of analytical queries that can cause significant performance issues in traditional relational databases.
For this recipe we will be using the Baseball Dataset loaded into Hadoop in the recipe Loading data into Hadoop, (also in this chapter). It is recommended that the recipe Loading data into Hadoop is performed before continuing.
In this recipe, we will be loading the
SchoolsPlayers.csv files. The data relates (via the ...