How it works...

The recipe demonstrates the filter() API with three examples. In the first, we passed over an RDD and kept only the odd numbers using the lambda expression .filter(i => (i % 2) == 1), which relies on the modulus (%) operator.
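As a minimal sketch of this first case, the snippet below applies the same predicate to a small RDD. The local SparkSession, the application name, and the sample range 1 to 10 are assumptions made for illustration and are not taken from the recipe.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; the recipe's own setup may differ.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("filterOddSketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: the integers 1 through 10.
val num = sc.parallelize(1 to 10)

// Keep only the elements whose remainder after division by 2 is 1.
val odd = num.filter(i => (i % 2) == 1)

// Bring the results back to the driver before stopping the session.
val oddValues = odd.collect()
println(oddValues.mkString(","))   // prints 1,3,5,7,9

spark.stop()
```

Note that in Scala the remainder of a negative odd number is -1, so this predicate would drop negative odd values; if the data can contain negatives, i % 2 != 0 is the safer test.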

In the second example, we made things a bit more interesting by first squaring each number with map() and then keeping only the odd results, using the expression num.map(pow(_, 2)).filter(_ % 2 == 1).
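The chained call can be sketched as follows. It assumes that pow refers to scala.math.pow (which returns a Double) and again uses a hypothetical local session and the sample range 1 to 10.

```scala
import org.apache.spark.sql.SparkSession
import scala.math.pow

// Assumption: a throwaway local session; the recipe's own setup may differ.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("mapThenFilterSketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: the integers 1 through 10.
val num = sc.parallelize(1 to 10)

// map() squares each element first (yielding Doubles),
// then filter() keeps only the squares that are odd.
val oddSquares = num.map(pow(_, 2)).filter(_ % 2 == 1).collect()
println(oddSquares.mkString(","))   // prints 1.0,9.0,25.0,49.0,81.0

spark.stop()
```

Because the square of an odd number is odd and the square of an even number is even, the result holds the squares of the odd inputs; the order of map() and filter() matters in general, though, since filter() sees whatever map() produced.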

In the third example, we went through the text and kept only the short, non-empty lines (here, lines under 30 characters) using .filter(_.length < 30).filter(_.length > 0), then used .count() to print the number of short lines against the total number of lines.
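A rough sketch of the line-length case is shown below. Rather than guessing at the recipe's input file, it builds an RDD from a handful of made-up strings; the sample lines, the local session, and the variable names are all illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; the recipe's own setup may differ.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("shortLinesSketch")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical stand-in for the lines of a text file.
val lines = sc.parallelize(Seq(
  "x" * 40,             // 40 characters: too long
  "",                   // empty: removed by the second filter
  "short line",         // 10 characters: kept
  "another short one",  // 17 characters: kept
  "y" * 55              // 55 characters: too long
))

// First keep lines under 30 characters, then drop the empty ones.
val shortLines = lines.filter(_.length < 30).filter(_.length > 0)

val totalCount = lines.count()
val shortCount = shortLines.count()
println(s"short lines = $shortCount, total lines = $totalCount")
// prints: short lines = 2, total lines = 5

spark.stop()
```

The two chained filters could also be written as a single predicate, for example .filter(l => l.length > 0 && l.length < 30); keeping them separate mirrors the expression described above.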
