Simple DataFrame queries

Now that you have created the swimmersJSON DataFrame, we will be able to run the DataFrame API, as well as SQL queries against it. Let's start with a simple query showing all the rows within the DataFrame.

DataFrame API query

To do this using the DataFrame API, you can use the show(<n>) method, which prints the first n rows to the console:

Tip

Running the.show() method will default to present the first 10 rows.

# DataFrame API
swimmersJSON.show()

This gives the following output:

DataFrame API query

SQL query

If you prefer writing SQL statements, you can write the following query:

spark.sql("select * from swimmersJSON").collect()

This will give the following ...

Get Learning PySpark now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.