In this recipe, we use admission data from the UCI Machine Learning Repository to build and train a model that predicts student admission from a given set of features used during the admission process (GRE, GPA, and rank). We use the RDD-based LogisticRegressionWithSGD() API in Apache Spark.
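The model being trained is a standard binary logistic regression. As a sketch, assume each applicant is encoded as the feature vector x = (GRE, GPA, rank), with rank treated as a single numeric feature, and b is an intercept term. Both of these are assumptions: rank could also be one-hot encoded, and an intercept may or may not be fitted. Under them, the model estimates the probability of admission as:

```latex
P(\text{admit} = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(w_1\,\text{GRE} + w_2\,\text{GPA} + w_3\,\text{rank} + b)}}
```

Training then means choosing w and b to minimize the average log-loss over the labeled applicants. SGD is the procedure that searches for those values.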
This recipe demonstrates both optimization (SGD) and regularization (penalizing the model for complexity to reduce over-fitting). We emphasize that these are two different things, and beginners often confuse them. In the upcoming chapter, we demonstrate both concepts in more detail, since understanding both is fundamental to a successful study of ML.
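To make the distinction concrete, here is a minimal plain-Python sketch. It is not Spark's API. It trains logistic regression by SGD on the objective mean log-loss + (reg_param / 2) * ||w||^2. The function name train_logreg_sgd, its parameters (step_size, num_iters, reg_param), and the toy one-feature dataset are all hypothetical and chosen only for illustration.

The optimizer settings (step_size, num_iters) control how the weights are searched for. The regularizer setting (reg_param) changes what is being minimized, adding the term reg_param * w to every gradient step. This is roughly analogous to tuning the step size and the regularization parameter of an SGD-based trainer such as Spark's as separate settings.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg_sgd(X, y, step_size=0.1, num_iters=100, reg_param=0.0, seed=0):
    """Hypothetical helper: minimize mean log-loss + (reg_param/2)*||w||^2 by SGD.

    step_size and num_iters configure the OPTIMIZER (how we search);
    reg_param configures the REGULARIZER (what we penalize).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0  # intercept, left unpenalized
    indices = list(range(len(X)))
    for _ in range(num_iters):          # one pass (epoch) over the data
        rng.shuffle(indices)
        for i in indices:               # one stochastic step per example
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]              # d(log-loss)/d(score) for this example
            # The regularizer only adds reg_param * wj to the gradient;
            # SGD applies the combined gradient with the same step size.
            w = [wj - step_size * (err * xj + reg_param * wj)
                 for wj, xj in zip(w, X[i])]
            b -= step_size * err
    return w, b


# Made-up toy data: one feature, perfectly separable labels
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

# Same optimizer settings, different regularization strength
w_plain, b_plain = train_logreg_sgd(X, y, step_size=0.1, num_iters=100, reg_param=0.0)
w_reg, b_reg = train_logreg_sgd(X, y, step_size=0.1, num_iters=100, reg_param=1.0)
print("unregularized weight:", w_plain[0])
print("regularized weight:  ", w_reg[0])
```

Keeping the optimizer settings fixed and changing only reg_param isolates the effect of regularization. The penalized run ends with a noticeably smaller weight. Without regularization, the weight on separable data like this keeps growing as SGD runs longer.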