Step 4 - Prepare training and test rating data and check the counts

The following code splits the ratings DataFrame into a training DataFrame (75%) and a test DataFrame (25%). The seed is optional, but setting it makes the split reproducible:

// Split the ratings DataFrame into training (75%) and test (25%) DataFrames
val splits = ratingsDF.randomSplit(Array(0.75, 0.25), seed = 12345L) 
val (trainingData, testData) = (splits(0), splits(1)) 
val numTraining = trainingData.count() 
val numTest = testData.count() 
println("Training: " + numTraining + " test: " + numTest)

You should see 78,792 ratings in the training DataFrame and 26,547 ratings in the test DataFrame.
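Note that 78,792 out of 105,339 ratings is roughly 74.8%, not exactly 75%. This is expected: the weights passed to randomSplit are target proportions for random sampling, not exact quotas. The following plain-Scala sketch illustrates the idea without Spark. It is not Spark's actual implementation; `SplitSketch` and `seededSplit` are hypothetical names introduced here, and the integer rows are only a stand-in for the ratings, so the counts it prints will differ from those above.

```scala
import scala.util.Random

object SplitSketch {
  // Hypothetical helper (not a Spark API): assigns each row to the training
  // or test set by drawing one seeded random number per row.
  def seededSplit[A](rows: Vector[A], trainFraction: Double, seed: Long): (Vector[A], Vector[A]) = {
    val rng = new Random(seed)
    // Draw once per row, so each row's assignment is fixed before filtering.
    val tagged = rows.map(row => (row, rng.nextDouble() < trainFraction))
    val training = tagged.collect { case (row, true) => row }
    val test = tagged.collect { case (row, false) => row }
    (training, test)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in data: one integer per rating (assumption for illustration).
    val rows = (1 to 105339).toVector
    val (training, test) = seededSplit(rows, 0.75, 12345L)
    println("Training: " + training.size + " test: " + test.size)

    // Re-running with the same seed reproduces the same split.
    val (again, _) = seededSplit(rows, 0.75, 12345L)
    println("Same split with the same seed: " + (training == again))
  }
}
```

Because each row is assigned independently, the training share lands close to 75% but rarely hits it exactly, and changing the seed changes which rows end up on each side.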

