The first three chapters of this book all dealt with batch data. Having covered the installation of Hadoop, data ingestion tools and techniques, and data stores, we now turn to data streaming. We will look not only at how to handle real-time data streams, but also at how to design pipelines around them.
In this chapter, we will cover the following topics:
- Real-time streaming concepts
- Real-time streaming components
- Apache Flink versus Spark
- Apache Spark versus Storm