Not all forms of regression have a closed-form solution, and those that do can become impractically inefficient when a large number of parameters is fitted on a large dataset. This is why we use iterative optimization techniques such as SGD or L-BFGS.
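To make the contrast concrete, here is a minimal sketch in plain Python rather than Spark. It fits a one-feature linear regression twice: once with the closed-form least-squares formula, and once iteratively. The data values, the function names `closed_form` and `gradient_descent`, and the learning rate and step count are all assumptions chosen for illustration. The iterative version uses plain batch gradient descent as a simple stand-in for SGD or L-BFGS.

```python
# Toy data generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def closed_form(xs, ys):
    """Least-squares slope and intercept for one feature, computed directly."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Batch gradient descent on mean squared error (stand-in for SGD/L-BFGS)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


print(closed_form(xs, ys))       # exact solution in one pass
print(gradient_descent(xs, ys))  # approaches the same solution over many steps
```

With a single feature the closed form is trivially cheap. With many features it requires solving a linear system whose cost grows roughly with the cube of the number of parameters, which is where the iterative approach becomes attractive.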
Recall from the previous recipes that you should cache any RDD or other data structure used by a machine learning algorithm. Spark evaluates transformations lazily and tracks the lineage of each RDD, so an uncached RDD is recomputed from its lineage every time an iterative algorithm makes another pass over the data.
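The effect of caching on an iterative algorithm can be illustrated with a toy model in plain Python. This is not the Spark API: `LazyDataset`, `compute_count`, and `run_iterations` are hypothetical names invented for this sketch, and the class only mimics the idea of replaying lineage on every action unless the result has been cached.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset; NOT the Spark API."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from its lineage
        self._cached = None
        self._use_cache = False
        self.compute_count = 0    # how many times the lineage was re-executed

    def map(self, fn):
        # Build a new lazy dataset whose lineage points back at this one.
        parent = self
        return LazyDataset(lambda: [fn(v) for v in parent.collect()])

    def cache(self):
        # Mark the dataset so the first materialization is kept in memory.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data


def run_iterations(dataset, iterations=10):
    """Simulate an iterative algorithm that scans the data once per iteration."""
    total = 0.0
    for _ in range(iterations):
        total += sum(dataset.collect())
    return total


source = LazyDataset(lambda: [1.0, 2.0, 3.0])
uncached = source.map(lambda v: v * 2)
cached = source.map(lambda v: v * 2).cache()

run_iterations(uncached)
run_iterations(cached)

print(uncached.compute_count)  # 10: the lineage is replayed on every iteration
print(cached.compute_count)    # 1: computed once, then served from the cache
```

In real Spark code the same idea is expressed by calling `cache()` or `persist()` on the RDD before handing it to the training routine.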