  • Richard Williams thinks this is interesting:

In systems like Hadoop MapReduce, developers often have to spend a lot of time considering how to group together operations to minimize the number of MapReduce passes. In Spark, there is no substantial benefit to writing a single complex map instead of chaining together many simple operations. Thus, users are free to organize their program into smaller, more manageable operations.
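The quoted point rests on lazy evaluation: transformations are recorded rather than run, and chained element-wise steps can then be pipelined in a single pass over the data. The sketch below is a hypothetical toy model (ToyRDD and CountingSource are illustrative names, not Spark's API). It shows that three chained simple maps read the input once and produce the same result as one hand-written complex map.

```python
class CountingSource:
    """Hypothetical input collection that counts how often it is iterated."""

    def __init__(self, items):
        self.items = list(items)
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        return iter(self.items)


class ToyRDD:
    """Toy model of a lazily evaluated dataset (illustration only, not Spark)."""

    def __init__(self, source, funcs=()):
        self.source = source        # underlying re-iterable input
        self.funcs = tuple(funcs)   # recorded, not-yet-applied map functions

    def map(self, f):
        # Lazy: record the function and return a new dataset; no data is read.
        return ToyRDD(self.source, self.funcs + (f,))

    def collect(self):
        # Pipelining: one pass over the input, pushing each element through
        # every recorded function before moving on to the next element.
        out = []
        for x in self.source:
            for f in self.funcs:
                x = f(x)
            out.append(x)
        return out


# Many small, readable steps...
src = CountingSource(range(5))
chained = ToyRDD(src).map(lambda x: x + 1).map(lambda x: x * 2).map(lambda x: x - 3)
passes_before_collect = src.passes   # still 0: nothing has been evaluated yet
result = chained.collect()

# ...versus one hand-fused complex map.
src2 = CountingSource(range(5))
single = ToyRDD(src2).map(lambda x: (x + 1) * 2 - 3).collect()

print(result, src.passes)   # [-1, 1, 3, 5, 7] 1
```

In this model both versions make exactly one pass over the input, which is why splitting logic into small steps costs nothing in passes. Real Spark adds stages, partitions, and shuffles that this sketch deliberately leaves out.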


Cover of Learning Spark


Once evaluation of an RDD begins, does the Spark framework optimize it? For example, if the operations were xRDD = yRDD + const and zRDD = xRDD - const, would the optimizer simply evaluate zRDD = yRDD and skip the two intermediate operations?
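One relevant detail is that RDD transformations such as map receive arbitrary user functions (closures). As far as we understand, the RDD API treats these functions as black boxes: it can pipeline them into one pass, but it cannot inspect them to see that "+ const" followed by "- const" cancels out. The toy sketch below is a hypothetical model, not Spark code (run_pipeline, add_const, sub_const, and CONST are illustrative names). It composes the two steps and counts calls: the final result equals the input, yet both functions still run once per element.

```python
CONST = 10
calls = {"add": 0, "sub": 0}


def add_const(v):
    # Hypothetical user function standing in for "xRDD = yRDD + const".
    calls["add"] += 1
    return v + CONST


def sub_const(v):
    # Hypothetical user function standing in for "zRDD = xRDD - const".
    calls["sub"] += 1
    return v - CONST


def run_pipeline(data, funcs):
    """Toy pipelined evaluation: one pass, each element goes through every
    function in order. The functions are opaque, so none can be skipped."""
    out = []
    for x in data:
        for f in funcs:
            x = f(x)
        out.append(x)
    return out


y = [1, 2, 3]
z = run_pipeline(y, [add_const, sub_const])
print(z, calls)   # [1, 2, 3] {'add': 3, 'sub': 3}
```

In this model, fusing the steps saves intermediate passes but not the per-element work. Eliminating that work would require an optimizer that understands the operations themselves rather than receiving them as opaque functions.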