Beyond the filtering, conversions, and transformations that we saw with DataFrames in Chapter 2, Getting Started with Apache Spark DataFrames, let's look at a few more data preparation tricks in this recipe. We'll also look at more specific data preparation in Chapter 5, Learning from Data, where we will focus on using various machine learning algorithms.
While preprocessing data, we may be required to:
We'll use the StudentPrep2.csv datasets for the first four tasks, and ...