Who this book is for

This book is for people responsible for automating the movement of data from various systems into a Hadoop cluster. If it is your job to load data into Hadoop on a regular basis, this book should help you code yourself out of manual monkey-work and spare you from writing a custom tool that you’ll be supporting for as long as you work at your company.

Only a basic knowledge of Hadoop’s HDFS is required. Some custom implementations are covered should your needs necessitate them. For this level of implementation, you will need to know how to program in Java.

Finally, you’ll need your favorite text editor since most of this book covers how to configure various Flume components via the agent’s text configuration file.
