So what is Kafka?

Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. Apache Kafka is an open-source stream-processing project that provides a unified, high-throughput, low-latency platform for handling real-time data streams. Its distributed storage layer supports massively scalable pub/sub message queues. Kafka Connect supports data import and export by connecting to external systems, and Kafka Streams provides Java APIs for stream processing. Kafka works in combination with Apache Spark, Apache Cassandra, Apache HBase, and more for real-time stream processing.
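To make the publish-subscribe idea concrete, here is a minimal sketch of the underlying model: records are appended to a log, and each group of subscribers keeps its own read position. This is a toy, in-memory illustration only. The names `ToyTopic`, `publish`, and `poll` are hypothetical and are not part of Kafka's client API, and the sketch assumes a single partition with no persistence, replication, or networking.

```python
from collections import defaultdict


class ToyTopic:
    """Toy in-memory stand-in for a single-partition topic (not Kafka's API)."""

    def __init__(self):
        self._log = []                    # append-only record log
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._log.append(record)
        return len(self._log) - 1

    def poll(self, group, max_records=10):
        """Return unread records for a group and advance that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["page_view", "click", "purchase"]:
    topic.publish(event)

# Two independent groups read the same log at their own pace.
analytics_batch = topic.poll("analytics")
billing_first = topic.poll("billing", max_records=1)
billing_rest = topic.poll("billing")
print(analytics_batch, billing_first, billing_rest)
```

Because records stay in the log after being read, every group sees the full stream, which is the property that separates this model from a queue that deletes a message once it has been consumed.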

Apache Kafka was originally developed at LinkedIn and was subsequently open sourced in early 2011. In November ...

