Apache Spark programming

Apache Spark provides first-class support for the Java, Scala, Python, and R programming languages. The data structures and operators it exposes are similar across these languages, but each program still has to use the constructs of its own language to express the desired logic. Throughout this chapter, we will use Python as the programming language of choice. Spark itself is agnostic to that choice: the same logic produces the same results regardless of the programming language used.

Apache Spark with Python can be used in two different ways. The first way is to launch the pyspark interactive shell, which lets us run Python statements interactively. ...
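As a rough illustration of the shell-based approach, the sketch below defines a small word count that could be pasted into a pyspark session. It assumes the shell has already created a SparkContext and bound it to the name sc, which pyspark does on startup. The helper names to_pairs and count_words and the sample input are hypothetical, chosen only for this example.

```python
# A minimal sketch, intended to be pasted into the pyspark shell.
# Assumption: `sc` is the SparkContext that the shell creates on startup.
# `to_pairs` and `count_words` are illustrative names, not part of Spark.

def to_pairs(line):
    """Split a line on whitespace and emit a (word, 1) pair per word."""
    return [(word.lower(), 1) for word in line.split()]


def count_words(sc, lines):
    """Count word occurrences in a list of strings using RDD operators."""
    rdd = sc.parallelize(lines)                       # distribute the input
    pairs = rdd.flatMap(to_pairs)                     # one (word, 1) per word
    counts = pairs.reduceByKey(lambda a, b: a + b)    # sum the 1s per word
    return dict(counts.collect())                     # bring results to the driver
```

Inside the shell, calling count_words(sc, ["spark with python", "spark with scala"]) would run the job on whatever master the shell was started with. The same flatMap and reduceByKey operators are available in the other supported languages, which is the sense in which the operators are similar across them.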
