Currently one of the hottest projects across the Hadoop ecosystem, Apache Kafka is a distributed, real-time data system that works much like a pub/sub messaging service, but with higher throughput, built-in partitioning, replication, and fault tolerance. In this video course, host Gwen Shapira from Cloudera shows developers and administrators how to integrate Kafka into a data processing pipeline.
You’ll start with Kafka basics, walk through code examples of Kafka producers and consumers, and then learn how to integrate Kafka with Hadoop. By the end of this course, you’ll be ready to use this service for large-scale log collection and stream processing.
Gwen Shapira is a software engineer at Cloudera with 15 years of experience working with customers to design scalable data architectures. Having worked as a data warehouse DBA, ETL developer, and senior consultant, Gwen specializes in building scalable data processing pipelines and integrating existing data systems with Hadoop. She’s a committer to Apache Sqoop and an active contributor to Apache Kafka.