Sorted word count

With one more call added to the same script, we can get the results back sorted. The script now looks like this:

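import pyspark

# Reuse the SparkContext if the notebook has already created one
if 'sc' not in globals():
    sc = pyspark.SparkContext()

text_file = sc.textFile("Spark File Words.ipynb")
# Split lines into words, count each word, then sort the pairs by word
sorted_counts = text_file.flatMap(lambda line: line.split(" ")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b) \
            .sortByKey()

# Bring the sorted (word, count) pairs back to the driver and print them
for x in sorted_counts.collect():
    print(x)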

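Here, we have added one more function call, sortByKey(), to the chain that builds the RDD. Once the map and reduce steps have produced the list of (word, count) pairs, sortByKey() orders those pairs by key. The key is the word, so the results come back sorted by word in ascending order, not by number of occurrences.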
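To see what each step of the chain computes without starting Spark, the same logic can be sketched in plain Python. This is only an illustration, not the Spark implementation: the `lines` list is made-up sample input standing in for the text file, and `Counter` and `sorted` stand in for the flatMap/map/reduceByKey and sortByKey steps.

```python
from collections import Counter

# Hypothetical sample input standing in for the lines of the text file.
lines = ["the quick brown fox", "jumps over the lazy dog", "the end"]

# Equivalent of flatMap + map + reduceByKey:
# split every line into words and add up one count per occurrence.
counts = Counter(word for line in lines for word in line.split(" "))

# Equivalent of sortByKey: order the (word, count) pairs by the word.
sorted_by_word = sorted(counts.items())

# A different ordering for comparison: most frequent words first,
# with ties broken alphabetically.
sorted_by_count = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

for pair in sorted_by_word:
    print(pair)
```

The comparison with `sorted_by_count` shows why the word-sorted list does not put the most common words first. In PySpark itself, ordering by count would likely use the RDD's sortBy() method with a key function that selects the count, rather than sortByKey().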

The resultant output looks like this:


Learning Jupyter by Dan Toomey
