Sorted word count

With a slight modification to the same script, we can add one more call and get sorted results. The script now looks like this:

import pyspark
# Reuse the SparkContext if the notebook has already created one
if 'sc' not in globals():
    sc = pyspark.SparkContext()
text_file = sc.textFile("Spark File Words.ipynb")
# Split lines into words, pair each word with 1, sum per word, then sort by word
sorted_counts = text_file.flatMap(lambda line: line.split(" ")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b) \
            .sortByKey()
for x in sorted_counts.collect():
    print(x)

Here, we have added one more call to the RDD chain: sortByKey(). Once the map/reduce steps have produced a list of (word, count) pairs, sortByKey() orders those pairs by their key, which is the word, in ascending order by default.
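To make each stage concrete, here is a minimal pure-Python sketch of what the chain computes, assuming a small in-memory list of lines in place of the notebook file. The function name sorted_word_count and the sample lines are hypothetical, and this is not how Spark executes the job: a list comprehension stands in for flatMap and map, a plain dictionary stands in for reduceByKey, and Python's sorted() stands in for sortByKey(), all on a single machine.

```python
from collections import defaultdict


def sorted_word_count(lines):
    """Hypothetical single-machine helper (not part of PySpark)."""
    # flatMap: split every line on a single space and flatten into one list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the 1s for identical words
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    # sortByKey: order the (word, count) pairs by word, ascending
    return sorted(totals.items())


lines = ["the cat sat", "the dog sat down"]  # hypothetical sample input
for pair in sorted_word_count(lines):
    print(pair)
```

Because the split is on a single space, just as in the Spark script, consecutive spaces and empty lines yield empty-string "words"; calling line.split() with no argument would drop them.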

The output from the Spark script looks like this:
