Data ingestion in Hadoop

In Hadoop, storage is never an issue, but managing the data is the driven force around which different solutions can be designed differently with different systems, hence managing data becomes extremely critical. A better manageable system can help a lot in terms of scalability, reusability, and even performance. In a Hadoop ecosystem, we have two widely used tools: Sqoop and Flume, both can help manage the data and can import and export data efficiently, with a good performance. Sqoop is usually used for data integration with RDBMS systems, and Flume usually performs better with streaming log data.

We will cover the concept of Sqoop and Flume in greater detail in Chapter 6, Data Ingestion in Hadoop—Sqoop and Flume.

Get Hadoop Essentials now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.