Batch data analytics

Now let's start looking at the implementation of batch data analytics. Batch data analytics consists of two important elements:

  1. Loading streams of sensor data from Kafka topics to HDFS.
  2. Using Hive to perform analytics on inserted data.

Loading streams of sensor data from Kafka topics to HDFS

Let's assume that the sensors are enabled to write data to Kafka topics. Microcomputers such as the Raspberry Pi can be used to develop the interface between sensors and Kafka. In this section, we are going to see how we get the data from Kafka topics and write it to the HDFS folder.

To import the data from Kafka, first you need to have Kafka running on your machine. The following command starts Kafka and Zookeeper:

bin/zookeeper-server-start.sh ...

Get Hadoop Blueprints now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.