To get you started thinking algorithmically about the Explore-Exploit dilemma, we’re going to teach you how to code up one of the simplest possible algorithms for trading off exploration and exploitation. This algorithm is called the epsilon-Greedy algorithm. In computer science, a greedy algorithm is an algorithm that always takes whatever action seems best at the present moment, even when that decision might lead to bad long term consequences. The epsilon-Greedy algorithm is almost a greedy algorithm because it generally exploits the best available option, but every once in a while the epsilon-Greedy algorithm explores the other available options. As we’ll see, the term epsilon in the algorithm’s name refers to the odds that the algorithm explores instead of exploiting.
Let’s be more specific. The epsilon-Greedy algorithm works by randomly oscillating between Cynthia’s vision of purely randomized experimentation and Bob’s instinct to maximize profits. The epsilon-Greedy algorithm is one of the easiest bandit algorithms to understand because it tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism that even a little kid could understand: it just flips a coin. While there are a few details we’ll have to iron out to make that statement precise, the big idea behind the epsilon-Greedy algorithm really is that simple: if you flip a coin and it comes up heads, you ...