
Bandit Algorithms for Website Optimization

By John Myles White. Published by O'Reilly Media, Inc.
  1. Bandit Algorithms for Website Optimization
  2. Preface
    1. Finding the Code for This Book
    2. Dealing with Jargon: A Glossary
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
    7. Acknowledgments
  3. 1. Two Characters: Exploration and Exploitation
    1. The Scientist and the Businessman
      1. Cynthia the Scientist
      2. Bob the Businessman
      3. Oscar the Operations Researcher
    2. The Explore-Exploit Dilemma
  4. 2. Why Use Multiarmed Bandit Algorithms?
    1. What Are We Trying to Do?
    2. The Business Scientist: Web-Scale A/B Testing
  5. 3. The epsilon-Greedy Algorithm
    1. Introducing the epsilon-Greedy Algorithm
    2. Describing Our Logo-Choosing Problem Abstractly
      1. What’s an Arm?
      2. What’s a Reward?
      3. What’s a Bandit Problem?
    3. Implementing the epsilon-Greedy Algorithm
    4. Thinking Critically about the epsilon-Greedy Algorithm
  6. 4. Debugging Bandit Algorithms
    1. Monte Carlo Simulations Are Like Unit Tests for Bandit Algorithms
    2. Simulating the Arms of a Bandit Problem
    3. Analyzing Results from a Monte Carlo Study
      1. Approach 1: Track the Probability of Choosing the Best Arm
      2. Approach 2: Track the Average Reward at Each Point in Time
      3. Approach 3: Track the Cumulative Reward at Each Point in Time
    4. Exercises
  7. 5. The Softmax Algorithm
    1. Introducing the Softmax Algorithm
    2. Implementing the Softmax Algorithm
    3. Measuring the Performance of the Softmax Algorithm
    4. The Annealing Softmax Algorithm
    5. Exercises
  8. 6. UCB – The Upper Confidence Bound Algorithm
    1. Introducing the UCB Algorithm
    2. Implementing UCB
    3. Comparing Bandit Algorithms Side-by-Side
    4. Exercises
  9. 7. Bandits in the Real World: Complexity and Complications
    1. A/A Testing
    2. Running Concurrent Experiments
    3. Continuous Experimentation vs. Periodic Testing
    4. Bad Metrics of Success
    5. Scaling Problems with Good Metrics of Success
    6. Intelligent Initialization of Values
    7. Running Better Simulations
    8. Moving Worlds
    9. Correlated Bandits
    10. Contextual Bandits
    11. Implementing Bandit Algorithms at Scale
  10. 8. Conclusion
    1. Learning Life Lessons from Bandit Algorithms
    2. A Taxonomy of Bandit Algorithms
    3. Learning More and Other Topics
  11. Colophon
  12. Copyright

Chapter 6. UCB – The Upper Confidence Bound Algorithm

Introducing the UCB Algorithm

The algorithms we’ve presented so far have one systematic weakness: they don’t keep track of how much they know about any of the arms available to them. They pay attention only to how much reward they’ve gotten from the arms. This means that they’ll underexplore options whose initial experiences were not rewarding, even though they don’t have enough data to be confident about those arms. We can do better by using an algorithm that pays attention not only to what it knows, but also to how much it knows.
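
To make “how much it knows” concrete, the sketch below shows the style of selection rule this chapter builds toward. It is a minimal sketch rather than the chapter’s final implementation: it assumes the same per-arm counts and values bookkeeping used by the epsilon-Greedy and Softmax code from earlier chapters, and the function name and the exact form of the confidence bonus are illustrative.

    import math

    def ucb_style_selection(counts, values):
        # counts[arm]: how many times this arm has been pulled so far
        # values[arm]: current estimate of this arm's average reward
        n_arms = len(counts)

        # Try every arm once before trusting any of the estimates.
        for arm in range(n_arms):
            if counts[arm] == 0:
                return arm

        total_counts = sum(counts)
        ucb_values = [0.0 for _ in range(n_arms)]
        for arm in range(n_arms):
            # The bonus shrinks as counts[arm] grows, so rarely pulled
            # arms get an optimistic boost, while well-explored arms are
            # judged almost entirely on their estimated value.
            bonus = math.sqrt((2 * math.log(total_counts)) / float(counts[arm]))
            ucb_values[arm] = values[arm] + bonus
        return ucb_values.index(max(ucb_values))

Adding a bonus that decays with the number of pulls is what lets an algorithm like this keep exploring arms it is still uncertain about, even when their current estimates look poor.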

The UCB algorithm that we’ll present in this chapter does exactly this. Before we describe how UCB keeps track of how much it knows, let’s look back at the epsilon-Greedy and Softmax algorithms and take a more abstract perspective on them. Both algorithms share the following broad properties:

  • The algorithm’s default choice is to select the arm that currently has the highest estimated value.
  • The algorithm sometimes decides to explore and chooses an option that isn’t the one that currently seems best (the two exploration styles described below are contrasted in a code sketch after this list):

    • The epsilon-Greedy algorithm explores by selecting from all of the arms completely at random. It makes one of these random exploratory decisions with probability epsilon.
    • The Softmax algorithm explores by randomly selecting from all of the available arms with probabilities that are more-or-less proportional to the estimated value of each of the arms. ...
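
The following sketch contrasts those two exploration styles side by side. It is illustrative rather than a copy of the earlier chapters’ code: both functions take a list of current value estimates and return the index of the chosen arm, with epsilon and temperature playing the roles described above.

    import math
    import random

    def epsilon_greedy_select(epsilon, values):
        # With probability epsilon, explore by picking an arm uniformly
        # at random; otherwise exploit the arm with the highest estimate.
        if random.random() < epsilon:
            return random.randrange(len(values))
        return values.index(max(values))

    def softmax_select(temperature, values):
        # Always pick at random, but weight each arm's chance of being
        # chosen by the exponential of its estimated value, so better
        # arms are chosen more often without ruling the others out.
        z = sum(math.exp(v / temperature) for v in values)
        probs = [math.exp(v / temperature) / z for v in values]
        r = random.random()
        cumulative = 0.0
        for arm, prob in enumerate(probs):
            cumulative += prob
            if r < cumulative:
                return arm
        return len(probs) - 1

Neither rule consults the per-arm counts when deciding which arm to pick, which is exactly the weakness UCB is designed to fix.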
