Chapter 3. Writing Spark applications

This chapter covers

  • Generating a new Spark project in Eclipse
  • Loading a sample dataset from the GitHub archive
  • Writing an application that analyzes GitHub logs
  • Working with DataFrames in Spark
  • Submitting your application to be executed

In this chapter, you’ll learn to write Spark applications. Most Spark programmers use an integrated development environment (IDE), such as IntelliJ IDEA or Eclipse. Resources that describe how to use IntelliJ IDEA with Spark are readily available online, whereas Eclipse resources are still hard to come by. That is why this chapter shows you how to use Eclipse for writing Spark programs. Nevertheless, if you choose to stick to IntelliJ, you’ll still be able ...
