Understanding how optimization works is fundamental to a successful career in machine learning. We picked the Gradient Descent (GD) method for an end-to-end deep dive to demonstrate the inner workings of an optimization technique. We develop the concept through three recipes that take the developer from a from-scratch implementation to fully working code that solves an actual problem with real-world data. The fourth recipe explores an alternative to GD, using Spark and the normal equations (which scale poorly to big data problems) to solve a regression problem.
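Before the recipes, here is the core idea of GD in miniature: repeatedly nudge a parameter in the direction that decreases a loss function. This is a minimal sketch on a toy loss f(x) = (x - 3)^2, whose derivative is 2(x - 3); the learning rate and iteration count are illustrative choices, not values from the recipes:

```python
def gradient_descent(x0, lr=0.1, iterations=100):
    """Minimize the toy loss f(x) = (x - 3)**2 by gradient descent."""
    x = x0
    for _ in range(iterations):
        grad = 2 * (x - 3)   # derivative of (x - 3)**2
        x = x - lr * grad    # step against the gradient
    return x

x_min = gradient_descent(x0=0.0)
print(x_min)  # converges toward 3.0, the minimizer of the loss
```

Each iteration moves x opposite the gradient, so the loss shrinks until x settles at the minimum; the recipes that follow apply this same update rule to real regression problems.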
Let's get started. How does a machine learn anyway? Does it really learn from its mistakes? What does it mean when the machine finds a solution using optimization?
At a high level, machines learn ...