DataFrame operations

In the previous section of this chapter, we learnt many different ways of creating DataFrames. In this section, we focus on the operations that can be performed on them. Developers chain multiple operations to filter, transform, aggregate, and sort the data in a DataFrame, and the underlying Catalyst optimizer ensures that these operations execute efficiently. The functions you find here are similar to those you commonly use in SQL operations on tables:

Python:

# Create a local collection of colors first
>>> colors = ['white','green','yellow','red','brown','pink']
# Distribute the local collection to form an RDD
# Apply map function on that RDD to get another RDD containing colour, length tuples and convert that ...
