
Kafka integration with Hadoop

Resource sharing, stability, availability, and scalability are a few of the many challenges of distributed computing. Nowadays, an additional challenge is processing extremely large volumes of data, on the order of terabytes or petabytes.

Introduction to Hadoop

Hadoop is a large-scale distributed batch-processing framework that parallelizes data processing across many nodes and addresses the challenges of distributed computing, including big data.

Hadoop works on the principle of the MapReduce framework (introduced by Google), which provides a simple interface for the parallelization and distribution of large-scale computations. Hadoop has its own distributed filesystem, HDFS (Hadoop Distributed File System). In any ...
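To make the MapReduce principle concrete, the following is a minimal word-count job, the canonical MapReduce example. It is an illustrative sketch rather than code from the book; the class name and the use of command-line arguments for the HDFS input and output paths are assumptions.

// WordCount.java: a minimal Hadoop MapReduce job (illustrative sketch).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combine locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory (assumed argument)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (assumed argument)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, such a job would typically be submitted with something like hadoop jar wordcount.jar WordCount <input path> <output path>, where both paths reside on HDFS.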
