Scala Data Analysis Cookbook by Arun Manivannan

Preparing data in DataFrames

Beyond the filtering, conversions, and transformations that we saw with DataFrames in Chapter 2, Getting Started with Apache Spark DataFrames, this recipe covers a few more data preparation tricks. We'll look at more specific data preparation in Chapter 5, Learning from Data, where the focus is on using various machine learning algorithms.

How to do it...

While preprocessing data, we may be required to:

  • Merge two different datasets
  • Perform set operations on two datasets
  • Sort the DataFrame by casting an attribute value
  • Choose a member from one dataset over another based on a predicate
  • Parse arbitrary date/time inputs
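
As a rough illustration of the first four tasks, here is a minimal sketch using the Spark DataFrame API. It assumes Spark 2.x or later (SparkSession, union) and is not necessarily the approach the recipe takes. The two small in-memory DataFrames, their id and name columns, and the PrepSketch object are hypothetical stand-ins for the student datasets, whose real schema is not shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}

object PrepSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PrepSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: ids are kept as strings, as they would be after a plain CSV read
    val students1 = Seq(("1", "Burke"), ("2", "Jamal"), ("10", "Steel")).toDF("id", "name")
    val students2 = Seq(("2", "Jamal"), ("3", "Kaseem"), ("10", "Steele")).toDF("id", "name")

    // 1. Merge two datasets: append the rows of one to the other (duplicates are kept)
    val merged = students1.union(students2)

    // 2. Set operations: rows present in both, and rows present only in the first
    val common = students1.intersect(students2)
    val onlyInFirst = students1.except(students2)

    // 3. Sort by casting: as strings "10" sorts before "2", so cast id to int first
    val sorted = merged.orderBy(col("id").cast("int"))

    // 4. Choose one dataset's member over the other's based on a predicate:
    //    where the ids match, take the name from students2, otherwise keep students1's
    val preferred = students1.as("a")
      .join(students2.as("b"), col("a.id") === col("b.id"), "left_outer")
      .select(col("a.id"), coalesce(col("b.name"), col("a.name")).as("name"))

    merged.show()
    common.show()
    onlyInFirst.show()
    sorted.show()
    preferred.show()

    spark.stop()
  }
}
```

On Spark 1.x, the row-appending merge is spelled unionAll and the entry point is an SQLContext rather than a SparkSession; the other calls shown are assumed to carry over, but check them against the version you run.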

We'll use the StudentPrep1.csv and StudentPrep2.csv datasets for the first four tasks, and ...