  • Yigang Zhang thinks this is interesting:

As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated from the size of the data block read from HDFS. Note that a decompressed block is often two or three times the size of the compressed block. So if you want three or four tasks’ worth of working space, and the HDFS block size is 128 MB, you can estimate the size of Eden to be 4 × 3 × 128 MB = 1,536 MB (the "43,128 MB" in the printed text is this product with its multiplication signs dropped).

From Spark: The Definitive Guide

Note

How is this 43,128MB calculated?
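
A likely answer, assuming the passage mirrors the Apache Spark tuning guide, where the same estimate is written as 4*3*128 MB: the figure is a product, not a single number. The sketch below works through that arithmetic; the function name `estimate_eden_mb` and its parameter names are made up for illustration and are not part of any Spark API.

```python
def estimate_eden_mb(tasks: int, decompression_factor: int, block_size_mb: int) -> int:
    """Estimate Eden size as working space for `tasks` concurrent tasks,
    each holding one HDFS block expanded by `decompression_factor`."""
    return tasks * decompression_factor * block_size_mb


# Upper end of the ranges in the passage: 4 tasks, 3x expansion, 128 MB blocks.
upper = estimate_eden_mb(tasks=4, decompression_factor=3, block_size_mb=128)

# Lower end of the same ranges: 3 tasks, 2x expansion, 128 MB blocks.
lower = estimate_eden_mb(tasks=3, decompression_factor=2, block_size_mb=128)

print(upper)  # 1536
print(lower)  # 768
```

Taking the upper end of both ranges gives 1,536 MB, about 1.5 GB, which is a far more plausible Eden size than the roughly 43 GB implied by reading "43,128" as one number.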