Chapter 7. Going Further

In this chapter, we will cover the following recipes:

  • Using Spark Streaming to subscribe to a Twitter stream
  • Using Spark as an ETL tool (pulling data from ElasticSearch and publishing it to Kafka)
  • Using StreamingLogisticRegression to classify a Twitter stream using Kafka as a training stream
  • Using GraphX to analyze Twitter data
  • Watching other Scala libraries of interest

Introduction

So far, this book has focused a little on Breeze and a lot on Spark, specifically DataFrames and machine learning. However, there are many other libraries, in both Java and Scala, that can be leveraged when analyzing data from Scala. This chapter goes a little further into Spark's other components, namely Spark Streaming and GraphX. ...

Get Scala: Guide for Data Science Professionals now with the O’Reilly learning platform.
