Policy

A policy is formally a deterministic or stochastic law that the agent follows in order to maximize its return. Conventionally, all policies are denoted with the letter π. A deterministic policy is usually a function of the current state that outputs a precise action:

A stochastic policy, analogously to environments, outputs the probability of each action (in this case, we are assuming we work with a finite MPD):

However, contrary to the environment, an agent must always pick a specific action, transforming any stochastic policy into ...

Get Mastering Machine Learning Algorithms now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.