  • John Jansen thinks this is interesting:

sc.parallelize(data).reduceByKey((x, y) => x + y)    // Custom parallelism

From Learning Spark

Note

Really? How? Maybe this call should pass numPartitions to actually get custom parallelism, as in the overload below:

def reduceByKey(func: (V, V) ⇒ V, numPartitions: Int): RDD[(K, V)]
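The overload above supports the note: without a numPartitions argument, reduceByKey falls back on the default parallelism. A minimal sketch of what the call computes (the partition count 10 and the sample data here are assumptions for illustration, not from the book):

```scala
// With a live SparkContext `sc`, the highlighted line would need a partition
// count to justify its "Custom parallelism" comment, e.g.:
//   sc.parallelize(data).reduceByKey((x, y) => x + y, 10)  // 10 partitions
// What reduceByKey((x, y) => x + y) computes, shown on plain Scala collections:
val data = Seq(("a", 1), ("b", 2), ("a", 3))
val reduced: Map[String, Int] =
  data.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2).reduce(_ + _)) }
// reduced == Map("a" -> 4, "b" -> 2)
```

The local groupBy-then-reduce mirrors the shuffle-and-combine Spark performs per key; only the numPartitions argument (or spark.default.parallelism) controls how many partitions hold the result.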