Join transformation with paired key-value RDDs

In this recipe, we introduce the key-value pair RDD and its supporting join operations, join(), leftOuterJoin(), rightOuterJoin(), and fullOuterJoin(), as an alternative to the more traditional and often more expensive set operations available on ordinary RDDs, such as intersection(), union(), subtract(), distinct(), and cartesian().
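For comparison, the set-style operations mentioned above can be sketched as follows. This is a minimal illustration, assuming a local SparkSession; the object name, the `left` and `right` RDDs, and the sample integers are hypothetical and do not come from the recipe. Note that the RDD method for set difference is named subtract().

```scala
import org.apache.spark.sql.SparkSession

object SetOperationsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SetOperationsSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample data, not taken from the recipe.
    val left = sc.parallelize(Seq(1, 2, 3, 4))
    val right = sc.parallelize(Seq(3, 4, 5))

    val unioned = left.union(right)        // all elements of both RDDs, duplicates kept
    val common = left.intersection(right)  // elements present in both RDDs
    val onlyLeft = left.subtract(right)    // elements of left that are absent from right
    val deduped = unioned.distinct()       // union with duplicates removed
    val pairs = left.cartesian(right)      // every (left, right) combination

    // collect() is fine here only because the sample data is tiny.
    println("union        = " + unioned.collect().sorted.mkString(", "))
    println("intersection = " + common.collect().sorted.mkString(", "))
    println("subtract     = " + onlyLeft.collect().sorted.mkString(", "))
    println("distinct     = " + deduped.collect().sorted.mkString(", "))
    println("cartesian count = " + pairs.count())

    spark.stop()
  }
}
```

These operations compare whole elements, whereas the join family matches records on the key alone, which is what makes pair RDDs convenient for combining related datasets.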

We'll demonstrate join(), leftOuterJoin(), rightOuterJoin(), and fullOuterJoin() to show the power and flexibility of key-value pair RDDs.

// Full outer join: keeps every key found in either RDD; a side with no match is None
println("Full Joined RDD = ")
val fullJoinedRDD = keyValueRDD.fullOuterJoin(keyValueCity2RDD)
fullJoinedRDD.collect().foreach(println(_))
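The fragment above relies on two pair RDDs, keyValueRDD and keyValueCity2RDD, whose definitions are not shown in this excerpt. The following is a rough, self-contained sketch of how all four joins might be exercised. It assumes a local SparkSession, and the region keys and city values are invented sample data rather than the recipe's own.

```scala
import org.apache.spark.sql.SparkSession

object PairRddJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PairRddJoinSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (region, city) pairs standing in for the recipe's data.
    val keyValueRDD = sc.parallelize(Seq(
      ("north", "Boston"), ("south", "Miami"), ("west", "Seattle")))
    val keyValueCity2RDD = sc.parallelize(Seq(
      ("north", "Chicago"), ("south", "Austin"), ("east", "Newark")))

    // Inner join: only keys present in both RDDs.
    // Result type: RDD[(String, (String, String))]
    val joinedRDD = keyValueRDD.join(keyValueCity2RDD)

    // Left outer join: every key from the left RDD; the right value is an Option.
    // Result type: RDD[(String, (String, Option[String]))]
    val leftJoinedRDD = keyValueRDD.leftOuterJoin(keyValueCity2RDD)

    // Right outer join: every key from the right RDD; the left value is an Option.
    // Result type: RDD[(String, (Option[String], String))]
    val rightJoinedRDD = keyValueRDD.rightOuterJoin(keyValueCity2RDD)

    // Full outer join: every key from either RDD; both values are Options.
    // Result type: RDD[(String, (Option[String], Option[String]))]
    val fullJoinedRDD = keyValueRDD.fullOuterJoin(keyValueCity2RDD)

    // sortByKey() gives a stable print order; collect() is safe only for tiny data.
    println("Joined RDD = ")
    joinedRDD.sortByKey().collect().foreach(println(_))
    println("Left Joined RDD = ")
    leftJoinedRDD.sortByKey().collect().foreach(println(_))
    println("Right Joined RDD = ")
    rightJoinedRDD.sortByKey().collect().foreach(println(_))
    println("Full Joined RDD = ")
    fullJoinedRDD.sortByKey().collect().foreach(println(_))

    spark.stop()
  }
}
```

With this sample data, the inner join yields only the north and south keys, the left outer join adds west with None on the right side, the right outer join adds east with None on the left side, and the full outer join covers all four keys.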
