
Bandit Algorithms for Website Optimization

By John Myles White. Published by O'Reilly Media, Inc.
  1. Bandit Algorithms for Website Optimization
  2. Preface
    1. Finding the Code for This Book
    2. Dealing with Jargon: A Glossary
    3. Conventions Used in This Book
    4. Using Code Examples
    5. Safari® Books Online
    6. How to Contact Us
    7. Acknowledgments
  3. 1. Two Characters: Exploration and Exploitation
    1. The Scientist and the Businessman
      1. Cynthia the Scientist
      2. Bob the Businessman
      3. Oscar the Operations Researcher
    2. The Explore-Exploit Dilemma
  4. 2. Why Use Multiarmed Bandit Algorithms?
    1. What Are We Trying to Do?
    2. The Business Scientist: Web-Scale A/B Testing
  5. 3. The epsilon-Greedy Algorithm
    1. Introducing the epsilon-Greedy Algorithm
    2. Describing Our Logo-Choosing Problem Abstractly
      1. What’s an Arm?
      2. What’s a Reward?
      3. What’s a Bandit Problem?
    3. Implementing the epsilon-Greedy Algorithm
    4. Thinking Critically about the epsilon-Greedy Algorithm
  6. 4. Debugging Bandit Algorithms
    1. Monte Carlo Simulations Are Like Unit Tests for Bandit Algorithms
    2. Simulating the Arms of a Bandit Problem
    3. Analyzing Results from a Monte Carlo Study
      1. Approach 1: Track the Probability of Choosing the Best Arm
      2. Approach 2: Track the Average Reward at Each Point in Time
      3. Approach 3: Track the Cumulative Reward at Each Point in Time
    4. Exercises
  7. 5. The Softmax Algorithm
    1. Introducing the Softmax Algorithm
    2. Implementing the Softmax Algorithm
    3. Measuring the Performance of the Softmax Algorithm
    4. The Annealing Softmax Algorithm
    5. Exercises
  8. 6. UCB – The Upper Confidence Bound Algorithm
    1. Introducing the UCB Algorithm
    2. Implementing UCB
    3. Comparing Bandit Algorithms Side-by-Side
    4. Exercises
  9. 7. Bandits in the Real World: Complexity and Complications
    1. A/A Testing
    2. Running Concurrent Experiments
    3. Continuous Experimentation vs. Periodic Testing
    4. Bad Metrics of Success
    5. Scaling Problems with Good Metrics of Success
    6. Intelligent Initialization of Values
    7. Running Better Simulations
    8. Moving Worlds
    9. Correlated Bandits
    10. Contextual Bandits
    11. Implementing Bandit Algorithms at Scale
  10. 8. Conclusion
    1. Learning Life Lessons from Bandit Algorithms
    2. A Taxonomy of Bandit Algorithms
    3. Learning More and Other Topics
  11. Colophon
  12. Copyright

Chapter 1. Two Characters: Exploration and Exploitation

To set the stage for this book, I’m going to tell you a short story about a web developer, Deborah Knull, who ran a small web business that provided most of her income. Deb Knull’s story will introduce the core concepts that come up when studying bandit algorithms, which are called exploration and exploitation. To make those ideas concrete, I’m going to associate them with two types of people: a scientist who explores and a businessman who exploits. My hope is that these two characters will help you understand why you need to find a way to balance the desires of both of these types of people in order to build a better website.

The Scientist and the Businessman

One Sunday morning, a young web entrepreneur, Deb Knull, came to suspect that changing the primary color of her site’s logo would make her site’s users feel more comfortable. Perhaps more importantly, she thought that making her customers feel more comfortable would make them buy more of the products her site was selling.

But Deb Knull worried that a new color could potentially disorient users and make them feel less comfortable. If that were true, her clever idea to increase sales might actually make her users buy fewer products instead. Unsure which of her instincts to trust, she asked for advice from two of her friends: Cynthia, a scientist, and Bob, a businessman.

Cynthia the Scientist

Cynthia, the scientist, loved Deb’s proposed logo change. Excited by the opportunity to try out something new, Cynthia started to lecture Deb about how to test her change carefully: "You can’t just switch your logo to a new color and then assume that the change in the logo’s color is responsible for whatever happens next. You’ll need to run a controlled experiment. If you don’t test your idea with a controlled experiment, you’ll never know whether the color change actually helped or hurt your sales. After all, it’s going to be Christmas season soon. If you change the logo now, I’m sure you’ll see a huge increase in sales relative to the last two months. But that’s not informative about the merits of the new logo: for all you know, the new color for your logo might actually be hurting sales."

"Christmas is such a lucrative time of year that you’ll see increased profits despite having made a bad decision by switching to a new color logo. If you want to know what the real merit of your idea is, you need to make a proper apples-to-apples comparison. And the only way I know how to do that is to run a traditional randomized experiment: whenever a new visitor comes to your site, you should flip a coin. If it comes up heads, you’ll put that new visitor into Group A and show them the old logo. If it comes up tails, you’ll put the visitor into Group B and show them the new logo. Because the logo you show each user is selected completely randomly, any factors that might distort the comparison between the old logo and new logo should balance out over time. If you use a coin flip to decide which logo to show each user, the effect of the logo won’t be distorted by the effects of other things like the Christmas season."

Deb agreed that she shouldn’t just switch the color of her logo over; as Cynthia the scientist was suggesting, Deb saw that she needed to run a controlled experiment to assess the business value of changing her site’s logo.

In Cynthia’s proposed A/B testing setup, Groups A and B of users would see slightly different versions of the same website. After enough users had been exposed to both designs, comparisons between the two groups would allow Deb to decide whether the proposed change would help or hurt her site.
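Cynthia’s coin-flip assignment can be sketched in a few lines of Python. This is a minimal illustration, not code from the book; the logo filenames and function names are invented for the example.

```python
import random

def assign_group():
    """Flip a fair coin to assign an incoming visitor to Group A or B."""
    return "A" if random.random() < 0.5 else "B"

# Hypothetical logo lookup: these filenames are illustrative only.
LOGOS = {"A": "old_black_logo.png", "B": "new_color_logo.png"}

def logo_for_new_visitor():
    """Randomize the logo shown to each new visitor, as in an A/B test."""
    group = assign_group()
    return group, LOGOS[group]
```

Because assignment is random and independent of everything else about the visitor, confounding factors like the Christmas season affect both groups roughly equally over time.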

Once she was convinced of the merits of A/B testing, Deb started to contemplate much larger scale experiments: instead of running an A/B test, she started to consider comparing her old black logo with six other colors, including some fairly quirky colors like purple and chartreuse. She’d gone from A/B testing to A/B/C/D/E/F/G testing in a matter of minutes.

Running careful experiments about each of these ideas excited Cynthia as a scientist, but Deb worried that some of the colors that Cynthia had proposed testing seemed likely to be much worse than her current logo. Unsure what to do, Deb raised her concerns with Bob, who worked at a large multinational bank.

Bob the Businessman

Bob heard Deb’s idea of testing out several new logo colors on her site and agreed that experimentation could be profitable. But Bob was also very skeptical about the value of trying out some of the quirkier of Cynthia’s ideas.

"Cynthia’s a scientist. Of course she thinks that you should run lots of experiments. She wants to have knowledge for knowledge’s sake and never thinks about the costs of her experiments. But you’re a businesswoman, Deb. You have a livelihood to make. You should try to maximize your site’s profits. To keep your checkbook safe, you should only run experiments that could be profitable. Knowledge is only valuable for profit’s sake in business. Unless you really believe a change has the potential to be valuable, don’t try it at all. And if you don’t have any new ideas that you have faith in, going with your traditional logo is the best strategy."

Bob’s skepticism of the value of large-scale experimentation rekindled Deb’s earlier concerns: the threat of losing customers now loomed larger than it had when she was energized by Cynthia’s passion for designing experiments. But Deb also wasn’t sure how to decide which changes would be profitable without trying them out, which seemed to lead her back to Cynthia’s original proposal and away from Bob’s preference for tradition.

After spending some time weighing Cynthia and Bob’s arguments, Deb decided that there was always going to be a fundamental trade-off between the goals that motivated Cynthia and Bob: a small business couldn’t afford to behave like a scientist and spend money gaining knowledge for knowledge’s sake, but it also couldn’t afford to focus short-sightedly on current profits and to never try out any new ideas. As far as she could see, Deb felt that there was never going to be a simple way to balance the need to (1) learn new things and (2) profit from old things that she’d already learned.

Oscar the Operations Researcher

Luckily, Deb had one more friend she knew she could turn to for advice: Oscar, a professor who worked in the local Department of Operations Research. Deb knew that Oscar was an established expert in business decision-making, so she suspected that Oscar would have something intelligent to say about her newfound questions about balancing experimentation with profit-maximization.

And Oscar was indeed interested in Deb’s idea:

"I entirely agree that you have to find a way to balance Cynthia’s interest in experimentation and Bob’s interest in profits. My colleagues and I call that the Explore-Exploit trade-off."

"Which is?"

"It’s the way Operations Researchers talk about your need to balance experimentation with profit-maximization. We call experimentation exploration and we call profit-maximization exploitation. They’re the fundamental values that any profit-seeking system, whether it’s a person, a company or a robot, has to find a way to balance. If you do too much exploration, you lose money. And if you do too much exploitation, you stagnate and miss out on new opportunities."

"So how do I balance exploration and exploitation?"

"Unfortunately, I don’t have a simple answer for you. Like you suspected, there is no universal solution to balancing your two goals: to learn which ideas are good or bad, you have to explore — at the risk of losing money and bringing in fewer profits. The right way to choose between exploring new ideas and exploiting the best of your old ideas depends on the details of your situation. What I can tell you is that your plan to run A/B testing, which both Cynthia and Bob seem to be taking for granted as the only possible way you could learn which color logo is best, is not always the best option."

"For example, a trial period of A/B testing followed by sticking strictly to the best design afterwards only makes sense if there is a definite best design that consistently works across the Christmas season and the rest of the year. But imagine that the best color scheme is black/orange near Halloween and red/green near Christmas. If you run an A/B experiment during only one of those two periods of time, you’ll come to think there’s a huge difference — and then your profits will suddenly come crashing down during the other time of year."

"And there are other potential problems as well with naive A/B testing: if you run an experiment that stretches across both times of year, you’ll see no average effect for your two color schemes — even though there’s a huge effect in each of the seasons if you had examined them separately. You need context to design meaningful experiments. And you need to experiment intelligently. Thankfully, there are lots of algorithms you can use to help you design better experiments."
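Oscar’s point about seasonal effects cancelling out can be made concrete with a toy calculation. The conversion rates below are invented purely for illustration: each color scheme dominates in one season, yet a year-long test that averages over both seasons sees no difference at all.

```python
# Toy per-season conversion rates, invented purely for illustration.
rates = {
    "black_orange": {"halloween": 0.05, "christmas": 0.01},
    "red_green":    {"halloween": 0.01, "christmas": 0.05},
}

def yearly_average(by_season):
    """Average conversion rate across seasons, which is what a
    year-long A/B test effectively measures."""
    return sum(by_season.values()) / len(by_season)

# Within each season the schemes differ by a factor of five, yet their
# year-long averages are identical (0.03), so the naive test sees nothing.
```

Conversely, a test run during only one season would show a large difference that reverses later in the year.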

The Explore-Exploit Dilemma

Hopefully the short story I’ve just told you has made it clear to you that you have two completely different goals you need to address when you try to optimize a website: you need to (A) learn about new ideas (which we’ll always call exploring from now on), while you also need to (B) take advantage of the best of your old ideas (which we’ll always call exploiting from now on). Cynthia the scientist was meant to embody exploration: she was open to every new idea, including the terrible ideas of using a purple or chartreuse logo. Bob was meant to embody exploitation, because he closes his mind to new ideas prematurely and is overly willing to stick with tradition.

To help you build better websites, we’ll do exactly what Oscar would have done to help Deborah: we’ll give you a crash course in methods for solving the Explore-Exploit dilemma. We’ll discuss two classic algorithms, one state-of-the-art algorithm and then refer you to standard textbooks with much more information about the huge field that’s arisen around the Exploration-Exploitation trade-off.

But, before we start working with algorithms for solving the Exploration-Exploitation trade-off, we’re going to focus on the differences between the bandit algorithms we’ll present in this book and the traditional A/B testing methods that most web developers would use to explore new ideas.
