Estimate Pi

We can use map/reduce to estimate Pi. Suppose we have code like this:

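import pyspark
import random

# Reuse the SparkContext if the notebook has already created one.
if 'sc' not in globals():
    sc = pyspark.SparkContext()

NUM_SAMPLES = 1000

def sample(p):
    # p is the element taken from the RDD; it is not used.
    x, y = random.random(), random.random()
    # 1 if the random point lies inside the unit quarter circle, else 0.
    return 1 if x*x + y*y < 1 else 0

# Distribute the sample indices, test each one, then sum the hits.
count = sc.parallelize(range(0, NUM_SAMPLES)) \
            .map(sample) \
            .reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))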

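This code has the same preamble. We are using Python's random module, and a constant sets the number of samples to attempt. Each sample picks a random point in the unit square; the fraction of points that land inside the quarter circle of radius 1 approaches Pi/4, which is why the final count is multiplied by 4.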

We are computing a value called count. We call the parallelize function to build an RDD from the sample indices and split the work over the available nodes. The map step applies the sample function to each element. Finally, we ...
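To try the same map/reduce pattern without a Spark installation, the following is a minimal sketch that uses only the Python standard library. The helper name estimate_pi and its seed parameter are assumptions made for illustration and do not appear in the original code; the built-in map and functools.reduce stand in for the RDD methods of the same names, so nothing is distributed here.

```python
import random
from functools import reduce


def estimate_pi(num_samples, seed=None):
    """Estimate Pi by sampling random points (hypothetical helper, not from the book)."""
    # A private generator keeps runs reproducible when a seed is given.
    rng = random.Random(seed)

    def sample(_):
        # The argument is only a sample index and is ignored.
        x, y = rng.random(), rng.random()
        # 1 if the point lies inside the unit quarter circle, else 0.
        return 1 if x * x + y * y < 1 else 0

    # Map each index to 0 or 1, then reduce by summing the hits.
    hits = reduce(lambda a, b: a + b, map(sample, range(num_samples)), 0)
    return 4.0 * hits / num_samples


print("Pi is roughly %f" % estimate_pi(100000, seed=42))
```

Raising the sample count tightens the estimate, at the cost of a longer run; with only 1000 samples the result can easily be off in the first decimal place.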
