Designing Real-Time Streaming Data Pipelines

The first three chapters of this book dealt with batch data. Having covered the installation of Hadoop, data ingestion tools and techniques, and data stores, we now turn to data streaming. We will look not only at how to handle real-time data streams, but also at how to design pipelines around them.

In this chapter, we will cover the following topics:

  • Real-time streaming concepts
  • Real-time streaming components
  • Apache Flink versus Spark
  • Apache Spark versus Storm
