  • Danyel Lawson thinks this is interesting:

Example 3-58. Round trip through RDD to cut query plan
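    // Drop down to an RDD[Row]; a DataFrame built from it starts a fresh query plan
    val rdd = df.rdd
    rdd.cache()
    // Rebuild a DataFrame over the cached rows, reusing the original schema
    sqlCtx.createDataFrame(rdd, df.schema)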
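A minimal, self-contained sketch of where this helps: an iterative job whose logical plan grows with every pass. It assumes Spark 2.x or later, so it uses `SparkSession.createDataFrame(RDD[Row], StructType)` in place of the `sqlCtx` (SQLContext) from the example. The object name `CutPlanSketch`, the helpers `cutPlan` and `run`, and the "cut every 10 passes" policy are all hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

object CutPlanSketch {
  // Hypothetical helper wrapping Example 3-58, using SparkSession instead of SQLContext.
  def cutPlan(spark: SparkSession, df: DataFrame): DataFrame = {
    val rdd = df.rdd // RDD[Row]; the new DataFrame's logical plan starts from this RDD
    rdd.cache()      // default storage level MEMORY_ONLY
    spark.createDataFrame(rdd, df.schema)
  }

  // Iterative job: each pass adds another projection to the logical plan.
  // Cutting every 10 passes (an arbitrary, assumed policy) keeps that plan short.
  def run(spark: SparkSession, iterations: Int = 50): Long = {
    var df = spark.range(0, 1000).toDF("x")
    for (i <- 1 to iterations) {
      df = df.withColumn("x", col("x") + 1)
      if (i % 10 == 0) df = cutPlan(spark, df)
    }
    df.agg(max("x")).head().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cut-plan-sketch").getOrCreate()
    try println(run(spark)) // prints 1049: max of 0..999 plus 50 increments
    finally spark.stop()
  }
}
```

The cut changes only the plan, not the data: the result is the same with or without the `cutPlan` calls.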

From High Performance Spark

Note

Use this instead of saving to Parquet and loading it back to get a Spark job to complete successfully. I think it frees up memory, but without requiring the write.
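For comparison, here is a hedged sketch of both approaches the note mentions side by side, again assuming Spark 2.x+ and `SparkSession`. The object `ParquetVsRddSketch`, its helper names and the temp-directory layout are hypothetical. Both helpers return a DataFrame whose plan no longer carries the upstream work. The difference is where the intermediate rows live: on disk for Parquet, in the RDD cache for the round trip.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParquetVsRddSketch {
  // The approach the note replaces: write to Parquet, then read it back.
  // The reloaded DataFrame's plan is just a file scan, but the data is written to disk.
  def viaParquet(spark: SparkSession, df: DataFrame): DataFrame = {
    val path = Files.createTempDirectory("cut-plan").resolve("data").toString
    df.write.parquet(path)
    spark.read.parquet(path)
  }

  // The Example 3-58 round trip: no write; rows are cached (MEMORY_ONLY) instead.
  def viaRdd(spark: SparkSession, df: DataFrame): DataFrame = {
    val rdd = df.rdd
    rdd.cache()
    spark.createDataFrame(rdd, df.schema)
  }

  // Sanity check: both approaches should yield the same rows.
  def sameRows(spark: SparkSession): Boolean = {
    val df = spark.range(0, 100).toDF("x")
    val a = viaParquet(spark, df).orderBy("x").collect().toSeq
    val b = viaRdd(spark, df).orderBy("x").collect().toSeq
    a == b
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("parquet-vs-rdd").getOrCreate()
    try println(sameRows(spark))
    finally spark.stop()
  }
}
```

One trade-off to keep in mind: a `MEMORY_ONLY` cached partition that gets evicted is recomputed from the RDD's lineage. The Parquet copy survives eviction and executor loss, but costs a full write and read.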