In this section, we will implement the examples from Chapter 3, Processing – MapReduce and Beyond, using Spark's Scala API. We will consider both batch and real-time processing scenarios, and show how Spark Streaming can be used to compute statistics on the live Twitter stream.
The Scala source code for the examples can be found at https://github.com/learninghadoop2/book-examples/tree/master/ch5. We will be using sbt to build, manage, and execute the code.
The build.sbt file controls the codebase metadata and software dependencies; these include the version of the Scala interpreter that Spark links to, a link to the Akka package repository used to resolve implicit dependencies, as ...
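To make the role of build.sbt concrete, the following is a minimal sketch of what such a file might look like. It is not the file from the book's repository: the project name, the version numbers, the choice of Spark modules, and the resolver URL are all assumptions made for illustration, and should be replaced with the values that match your own Spark installation.

```scala
// build.sbt (illustrative sketch only; every value below is an assumption)

// Codebase metadata (hypothetical project name and version)
name := "spark-examples"

version := "0.1.0"

// Scala version the code is compiled against. Assumed value: it must match
// the Scala version your Spark distribution was built with.
scalaVersion := "2.10.4"

// Assumed Spark version; use the one deployed on your cluster.
val sparkVersion = "1.0.0"

// Software dependencies: Spark core for batch jobs, plus the streaming
// modules for the Twitter example. The %% operator appends the Scala
// binary version to the artifact name (for example, spark-core_2.10).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)

// Additional repository used to resolve transitive Akka dependencies
// (assumed URL; check it against your environment).
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

With a file along these lines in the project root, running sbt package compiles the sources and builds a JAR, and sbt run executes a main class from the project.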