Operating on DataFrames programmatically without SQL

In this recipe, we explore how to manipulate DataFrames using only code and method calls, without SQL. DataFrames have their own methods that let you perform SQL-like operations programmatically. We demonstrate a few of these methods, such as select(), show(), and explain(), to show that the DataFrame itself can wrangle and manipulate data without any SQL.
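
To make the idea concrete, here is a minimal sketch in Scala of the three methods named above. It is not the recipe's own listing: the object name DataFrameWithoutSql, the helper selectNameAndAge, the column names, and the sample rows are all assumptions made up for illustration, and the session is assumed to run in local mode. The filter() call is an extra that the text does not mention, included only to show the method-call counterpart of a SQL WHERE clause.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameWithoutSql {

  // Hypothetical helper: project two columns with the DataFrame API,
  // the method-call counterpart of "SELECT name, age FROM ...".
  def selectNameAndAge(df: DataFrame): DataFrame =
    df.select("name", "age")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DataFrameWithoutSql")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; the recipe's real dataset is not shown here.
    val people = Seq(
      ("Ann", 34, "NY"),
      ("Bob", 45, "CA"),
      ("Cy", 29, "TX")
    ).toDF("name", "age", "state")

    val projected = selectNameAndAge(people)

    // show() prints the first rows of the DataFrame as a text table.
    projected.show()

    // explain() prints the physical plan Spark would run for this DataFrame.
    projected.explain()

    // Not named in the text: filter() plays the role of a SQL WHERE clause.
    people.filter($"age" > 30).select("name").show()

    spark.stop()
  }
}
```

Note that select() and filter() are lazy transformations that return a new DataFrame, whereas show() is an action that triggers execution. No temporary view is registered and no SQL string is passed at any point.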
