Exploration and exploitation

The exploration-exploitation trade-off is another problem that has its apparent origin in gambling, even though its real applications range from allocating funding to research projects to self-driving cars. The traditional formulation is the multi-armed bandit problem, which refers to an imaginary slot machine with one or more arms. Sequential plays of each arm generate i.i.d. returns with unknown probabilities for each arm; in the simplified models, successive plays are independent. The rewards are also assumed to be independent across the arms. The goal is to maximize the reward (for example, the amount of money won) and to minimize the learning loss, or the amount spent on the arms with less than optimal winning ...
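The text does not commit to a particular strategy for balancing exploration and exploitation. As one illustration, here is a minimal sketch of epsilon-greedy, a standard heuristic: with probability epsilon the player pulls a random arm (exploration), otherwise it pulls the arm with the best estimated payout so far (exploitation). The sketch assumes Bernoulli arms (each pays 1 with a fixed hidden probability, else 0) and measures the learning loss as expected regret, the reward given up relative to always pulling the best arm. The names `EpsilonGreedyBandit`, `run` and `Result` are hypothetical, chosen for illustration only.

```scala
import scala.util.Random

// Hypothetical epsilon-greedy sketch for a Bernoulli multi-armed bandit.
// Assumptions (not from the text): each arm pays 1 with a fixed hidden
// probability and 0 otherwise; epsilon is a tunable exploration rate.
object EpsilonGreedyBandit {

  // Total reward collected, number of pulls per arm, and expected regret
  // (expected reward lost versus always pulling the best arm).
  final case class Result(totalReward: Int, pulls: Vector[Int], regret: Double)

  def run(probs: Vector[Double], steps: Int, epsilon: Double, seed: Long): Result = {
    require(probs.nonEmpty, "need at least one arm")
    require(epsilon >= 0.0 && epsilon <= 1.0, "epsilon must be in [0, 1]")
    val rng = new Random(seed)
    val k = probs.length
    val pulls = Array.fill(k)(0)
    val means = Array.fill(k)(0.0) // running estimate of each arm's payout rate
    val best = probs.max
    var total = 0
    var expectedLoss = 0.0

    for (_ <- 0 until steps) {
      // Explore with probability epsilon, otherwise exploit the best estimate
      // (ties go to the lowest arm index).
      val arm =
        if (rng.nextDouble() < epsilon) rng.nextInt(k)
        else means.indices.maxBy(i => means(i))
      // Draw an i.i.d. Bernoulli reward from the chosen arm.
      val reward = if (rng.nextDouble() < probs(arm)) 1 else 0
      pulls(arm) += 1
      // Incremental mean update: m <- m + (r - m) / n
      means(arm) += (reward - means(arm)) / pulls(arm)
      total += reward
      expectedLoss += best - probs(arm)
    }
    Result(total, pulls.toVector, expectedLoss)
  }

  def main(args: Array[String]): Unit = {
    val r = run(Vector(0.2, 0.5, 0.7), steps = 10000, epsilon = 0.1, seed = 42L)
    println(s"total reward = ${r.totalReward}, pulls = ${r.pulls}, regret = ${r.regret}")
  }
}
```

A larger epsilon gathers better estimates of every arm but keeps paying for pulls on inferior arms; a smaller epsilon exploits sooner but risks locking onto a suboptimal arm whose early estimate happened to look good. That tension is exactly the trade-off described above.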
