The Spark ecosystem

Apache Spark powers a number of tools, both as a library and as an execution engine.

Spark Streaming

Spark Streaming (found at http://spark.apache.org/docs/latest/streaming-programming-guide.html) is an extension of the core Spark API that allows data to be ingested from sources such as Kafka, Flume, Twitter, ZeroMQ, and TCP sockets.

Spark Streaming receives live input data streams and divides the data into batches (micro-batches defined by a configurable batch interval), which are then processed by the Spark core engine to generate the final stream of results, also in batches. This high-level abstraction is called a DStream (org.apache.spark.streaming.dstream.DStream) and is implemented as a sequence of RDDs. A DStream supports two kinds of operations: transformations, which produce new DStreams, and output operations, which write results to external systems.
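The following is a minimal sketch of how these pieces fit together, assuming a local master and a plain TCP socket as the input source (the hostname, port, and batch interval are placeholder choices, not values from the text). It counts words in each batch using DStream transformations and a single output operation:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Local master with two threads: one to receive data, one to process it
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")

    // Batch interval of five seconds: received data is grouped into
    // five-second batches before being handed to the Spark engine
    val ssc = new StreamingContext(conf, Seconds(5))

    // Ingest a text stream from a TCP socket; each batch becomes one RDD in the DStream
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations: each produces a new DStream, applied batch by batch
    val words = lines.flatMap(_.split(" "))
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Output operation: materializes the per-batch counts (here, to stdout)
    counts.print()

    ssc.start()             // start receiving and processing data
    ssc.awaitTermination()  // block until the streaming job is stopped
  }
}

Because a DStream is just a sequence of RDDs, the functional operations familiar from the core RDD API apply here batch by batch; nothing is computed until the output operation triggers execution for each interval.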
