Who this book is for

This book is for people responsible for implementing the automatic movement of data from various systems to a Hadoop cluster. If it is your job to load data into Hadoop on a regular basis, this book should help you to code yourself out of manual monkey work or from writing a custom tool you'll be supporting for as long as you work at your company.

Only basic knowledge of Hadoop and HDFS is required. Some custom implementations are covered, should your needs necessitate them. For this level of implementation, you will need to know how to program in Java.

Finally, you'll need your favorite text editor, since most of this book covers how to configure various Flume components via an agent's text configuration file.

Get Apache Flume: Distributed Log Collection for Hadoop - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.