Building your Spark job with Maven

Apache Maven is an open source build tool that can build Spark jobs written in Java or Scala. As of version 2.0.0, the Building Spark documentation page states that Maven is the officially recommended tool for packaging Spark and is the "build of reference". As with sbt, you can pull in the Spark dependency from Maven Central, which simplifies the build process. Also as with sbt, you can either use a plugin to bundle your job and all of its dependencies into a single JAR file, or build Spark as a monolithic JAR file with sbt/sbt assembly and include that.
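As a rough illustration of the two ideas above (the Spark dependency coming from Maven Central, and a plugin producing a single JAR), a minimal pom.xml might look like the following sketch. The project coordinates (com.example, spark-job-example) are hypothetical, and the choices of the Scala 2.11 artifact, the provided scope, the Maven Shade plugin, and the plugin version are assumptions made for this example rather than settings taken from the book.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates for the example job -->
  <groupId>com.example</groupId>
  <artifactId>spark-job-example</artifactId>
  <version>0.1.0</version>
  <packaging>jar</packaging>

  <properties>
    <!-- Assumption: the job is compiled for Java 8 -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- Spark core, resolved from Maven Central -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.0.0</version>
      <!-- Assumption: the cluster supplies Spark at run time,
           so it is left out of the bundled JAR -->
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Bundles the job and its non-provided dependencies into one JAR -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With a file like this in the project root, running mvn package should produce the bundled JAR under the target directory. If Spark itself needs to be inside the JAR, dropping the provided scope is one way to do that, at the cost of a much larger artifact.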

To illustrate the build process for Spark jobs with Maven, this section uses Java as the example language, because Maven is more commonly used to build Java projects. As a first ...
