A randomized probability matching algorithm

The randomized probability matching bandit algorithm is a Bayesian statistical approach to the problem of figuring out when to explore our options and when to exploit them for a nice payoff. It works by sampling a probability distribution that describes the probable mean of the payoff. As we gain more data, the variance of the possible means narrows significantly. As we go through the rest of this chapter, we'll delve deeper into how this algorithm works.

As a concrete example, we can run some simulations. The following is a histogram of repeatedly sampling means with 100 samples from a normal distribution:

plt.title('Distribution of means for N(35,5) distribution (sampling 100 vs 500 data points)') plt.xlabel('') ...

Get Test-Driven Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.