There's more...

Not every form of regression has a closed-form solution, and where one exists it can become very inefficient (that is, impractical) with a large number of parameters on large datasets. This is the reason we use optimization techniques such as SGD or L-BFGS.
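To make the contrast concrete, here is a minimal sketch that fits the same one-variable linear model twice: once with the closed-form least-squares solution and once with stochastic gradient descent. It is a plain Python toy on in-memory lists, not Spark's implementation. The function names, step size, epoch count, and data are all assumptions chosen for illustration.

```python
import random

def fit_closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

def fit_sgd(xs, ys, step_size=0.05, epochs=200, seed=0):
    """Same model fitted iteratively, updating on one example at a time."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)           # visit the examples in random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            w -= step_size * error * xs[i]   # gradient of squared error w.r.t. w
            b -= step_size * error           # gradient of squared error w.r.t. b
    return w, b

# Hypothetical noiseless data generated from y = 2x + 1
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

w_cf, b_cf = fit_closed_form(xs, ys)
w_sgd, b_sgd = fit_sgd(xs, ys)
print("closed form:", w_cf, b_cf)
print("SGD:        ", w_sgd, b_sgd)
```

With a single feature the closed form is trivial. With many features it requires solving a linear system whose size grows with the number of parameters, whereas the iterative update touches only one example (or a small batch) at a time.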

Recall from the previous recipes that you should cache any RDD or data structure that a machine learning algorithm uses. Spark evaluates transformations lazily and maintains lineage rather than materialized data, so an uncached dataset is recomputed from its lineage on every pass of an iterative algorithm.
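The effect can be illustrated with a toy model of lazy evaluation. The sketch below is plain Python and deliberately not the Spark API: `LazyDataset`, `mean_over_passes`, and `parse` are hypothetical names, and the "lineage" is just a function that rebuilds the data. In Spark itself, the equivalent step is calling `cache()` or `persist()` on the RDD before handing it to the algorithm.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not the Spark API)."""

    def __init__(self, load_fn):
        self._load_fn = load_fn   # the "lineage": how to rebuild the data
        self._cached = None
        self._use_cache = False
        self.load_count = 0       # how many times the lineage was replayed

    def cache(self):
        """Mark the dataset so the first materialization is kept in memory."""
        self._use_cache = True
        return self

    def collect(self):
        """Materialize the data, replaying the lineage unless a copy is cached."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.load_count += 1
        data = self._load_fn()
        if self._use_cache:
            self._cached = data
        return data

def mean_over_passes(dataset, passes):
    """Stand-in for an iterative algorithm that scans the data on every pass."""
    total = 0.0
    for _ in range(passes):
        rows = dataset.collect()
        total += sum(rows) / len(rows)
    return total / passes

def parse():
    # Hypothetical "expensive" load-and-parse step
    return [float(v) for v in "1 2 3 4".split()]

uncached = LazyDataset(parse)
cached = LazyDataset(parse).cache()

mean_over_passes(uncached, 10)
mean_over_passes(cached, 10)
print(uncached.load_count, cached.load_count)  # prints "10 1"
```

The uncached dataset repeats the load on each of the ten passes, while the cached one loads once. Optimizers such as SGD and L-BFGS make many passes over the training data, which is why the missing cache is costly.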
