Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a policy gradient-based method and one of the algorithms that have been shown in practice to be both stable and scalable. In fact, PPO was the algorithm used to train OpenAI Five, the team of agents that played (and won) against several human Dota 2 players, which we discussed in the previous chapter.
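Much of PPO's stability comes from its clipped surrogate objective. This objective limits how far the probability ratio between the new and old policy can move the update in a single step. The following is a minimal sketch of that objective in plain Python. It assumes you already have per-step log-probabilities under the old and new policies and advantage estimates. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not an API from any particular library.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default for the clipping range
) -> float:
    """Mean clipped surrogate objective (to be maximized). Hypothetical helper."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t), computed in log space
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - eps, 1 + eps]
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Step 1: ratio 1.5 with positive advantage, clipped down to 1.2
    # Step 2: ratio 0.5 with negative advantage, clipped term 0.8 * -1 is the smaller one
    value = ppo_clip_objective(
        new_log_probs=[math.log(1.5), math.log(0.5)],
        old_log_probs=[0.0, 0.0],
        advantages=[1.0, -1.0],
    )
    print(value)  # approximately 0.2
```

Taking the minimum means that a large policy change can never make the objective look better than the clipped version would. This is what discourages destructively large updates. In a real agent, this quantity would be computed over minibatches and maximized with a gradient-based optimizer.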
