Get started with Hadoop 2 and importing data with Sqoop and Flume
Apache Hadoop is an open source framework for the distributed storage and processing of Big Data. Apache Sqoop and Apache Flume, two of the most widely used components of the Hadoop ecosystem, are used to import data into Hadoop from external sources.
We begin by setting up Hadoop: downloading, installing, and configuring it, and exploring Hue, a very useful web interface for analyzing Hadoop data. We then move on to importing data into Hadoop, starting with manual imports. Next, we learn how to import relational databases using Apache Sqoop. Finally, we import real-time and streaming data into Hadoop using Apache Flume.
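The three import approaches above can be sketched with their typical CLI invocations. This is a minimal, hedged sketch: the paths, database name, table name, and agent name (`a1`) are illustrative assumptions, not values from this Learning Path, and the commands assume a running Hadoop cluster with Sqoop and Flume installed.

```shell
# 1. Manual import: copy a local file into HDFS with the hdfs CLI.
#    (/data/raw and sales.csv are hypothetical examples.)
hdfs dfs -mkdir -p /data/raw
hdfs dfs -put sales.csv /data/raw/

# 2. Database import with Sqoop: pull a table from a relational
#    database into HDFS over JDBC. Connection string, credentials,
#    and table are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/shopdb \
  --username dbuser -P \
  --table orders \
  --target-dir /data/orders \
  -m 1

# 3. Streaming import with Flume: start an agent whose sources,
#    channels, and sinks are defined in a properties file
#    (flume-agent.conf and agent name "a1" are assumptions).
flume-ng agent \
  --conf conf \
  --conf-file conf/flume-agent.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```

Sqoop handles bulk, batch-oriented transfers from structured stores, while Flume is designed for continuously collecting event data such as logs, which is why the two tools are typically used side by side rather than interchangeably.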
By the end of this Learning Path, you will know how to configure and set up the Hadoop framework from scratch, and how to import data into Hadoop's distributed storage from a variety of sources, so that you can get started processing that data.
Prerequisites: Basic knowledge of the Elastic stack and Elasticsearch.
Resources: Code Downloads: