Summary

In this chapter, we started with a detailed understanding of real-time stream processing concepts, including data stream, batch vs. real-time processing, CEP, low latency, continuous availability, horizontal scalability, storage, and so on. Later, we learned about Apache Kafka, which is a very important component of modern real-time stream data pipelines. The main features of Kafka are scalability, durability, reliability, and high throughput.

We also learned about Kafka Connect; its architecture, data flow, sources, and connectors. We studied case studies to design a data pipeline with Kafka Connect using file source, file Sink, JDBC source, and file Sink Connectors.

In the later sections, we learned about various open source real-time ...

Get Modern Big Data Processing with Hadoop now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.