Book description
Design and implement a series of Flume agents to send streamed data into Hadoop
In Detail
Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.
This book starts with an architectural overview of Flume and its logical components. It explores channels, sinks, and sink processors, followed by sources and channel selectors. By the end of this book, you will be fully equipped to construct a series of Flume agents to dynamically transport your streaming data and logs from your systems into Hadoop.
This step-by-step book guides you through the architecture and components of Flume, covering different approaches that are then pulled together in a real-world, end-to-end use case, progressing gradually from the simplest to the most advanced features.
What You Will Learn
- Understand the Flume architecture, and learn how to download and install open source Flume from Apache
- Follow along with a detailed example of transporting weblogs in Near Real Time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
- Learn tips and tricks for transporting logs and data in your production environment
- Understand and configure the Hadoop Distributed File System (HDFS) Sink
- Use a morphline-backed Sink to feed data into Solr
- Create redundant data flows using sink groups
- Configure and use various sources to ingest data
- Inspect data records and move them between multiple destinations based on payload content
- Transform data en route to Hadoop and monitor your data flows
Table of contents
- Apache Flume: Distributed Log Collection for Hadoop Second Edition
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Overview and Architecture
- 2. A Quick Start Guide to Flume
- 3. Channels
- 4. Sinks and Sink Processors
- 5. Sources and Channel Selectors
- 6. Interceptors, ETL, and Routing
- 7. Putting It All Together
- 8. Monitoring Flume
- 9. There Is No Spoon – the Realities of Real-time Distributed Data Collection
- Index
Product information
- Title: Apache Flume: Distributed Log Collection for Hadoop - Second Edition
- Author(s):
- Release date: February 2015
- Publisher(s): Packt Publishing
- ISBN: 9781784392178
You might also like
book
Apache Flume: Distributed Log Collection for Hadoop
Stream data to Hadoop using Apache Flume. Integrate Flume with your data sources. Transcode your data …
book
Linux® Bible 2011 Edition: Boot up to Ubuntu®, Fedora®, KNOPPIX, Debian®, openSUSE®, and 13 Other Distributions
The most up-to-date guide on the latest version of Linux Linux is an excellent, low-cost alternative …
book
Expert Hadoop® Administration
The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference “Sam Alapati has worked with production Hadoop …
video
Building Apache HBase Applications
In this Building Apache HBase Applications training course, expert author Jonathan Hsieh will teach you how …