The likelihood ratio trick

The policy represented by is assumed to be a differentiable function whenever it is non-zero, but computing the gradient of the policy with respect to theta, , may not be straightforward. We can multiply and divide by policy on both sides to get the following:

From calculus, we know that the gradient of the log of ...

Get Hands-On Intelligent Agents with OpenAI Gym now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.