Chapter 9. Integration with Apache Spark

In this chapter, we'll take a look at the following recipes:

  • Running Spark standalone
  • Running Spark on YARN
  • Performing Olympics Athletes analytics using the Spark Shell
  • Creating Twitter trending topics using Spark Streaming
  • Analyzing Parquet files using Spark
  • Analyzing JSON data using Spark
  • Processing graphs using GraphX
  • Conducting predictive analytics using Spark MLlib

Introduction

In the previous chapter, we talked about how to use Mahout and R to solve machine learning problems. In this chapter, we turn to Apache Spark, the latest sensation in the Big Data industry. By now, everyone is aware of, and has acknowledged, the power of Apache Spark. It is a fast, general-purpose engine that processes ...
