An extension - asynchronous deep n-step advantage actor-critic 

One easy extension that we can make to our agent implementation is to launch several instances of our agent, each with its own instance of the learning environment, and have them send back updates from what they have learned asynchronously, that is, whenever the updates become available, without any need for time synchronization. This algorithm is popularly known as the A3C algorithm, which is short for asynchronous advantage actor-critic.
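To make the launch pattern concrete, here is a minimal sketch of that asynchronous orchestration in PyTorch. It is not the book's implementation: the ActorCritic network, the worker function, the hyperparameters, and the CartPole-v1 environment are illustrative stand-ins, the classic Gym reset/step API is assumed, and each worker keeps its own Adam optimizer over the shared parameters instead of a single shared optimizer. The essential mechanics are there, though: a global model held in shared memory, workers that sync from it, collect up to n steps in their own environment, compute n-step advantage actor-critic losses, and push their local gradients into the global parameters whenever they are ready.

```python
import gym
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class ActorCritic(nn.Module):
    """Small shared-trunk actor-critic network (stand-in for the agent's model)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.policy = nn.Linear(128, n_actions)  # actor head (action logits)
        self.value = nn.Linear(128, 1)           # critic head (state value)

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h), self.value(h)


def worker(rank, global_model, n_steps=20, gamma=0.99, total_updates=500):
    """One asynchronous actor-learner: its own env and local model, shared global weights."""
    torch.manual_seed(rank)
    env = gym.make("CartPole-v1")  # placeholder environment for the sketch
    local_model = ActorCritic(env.observation_space.shape[0], env.action_space.n)
    optimizer = torch.optim.Adam(global_model.parameters(), lr=1e-3)
    obs = env.reset()
    for _ in range(total_updates):
        local_model.load_state_dict(global_model.state_dict())  # sync with global weights
        log_probs, values, rewards = [], [], []
        done = False
        for _ in range(n_steps):  # roll out up to n steps
            logits, value = local_model(torch.from_numpy(obs).float())
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            obs, reward, done, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            values.append(value.squeeze())
            rewards.append(reward)
            if done:
                obs = env.reset()
                break
        # Bootstrap from the critic unless the episode ended inside this rollout
        R = 0.0 if done else local_model(torch.from_numpy(obs).float())[1].item()
        policy_loss, value_loss = 0.0, 0.0
        for t in reversed(range(len(rewards))):  # n-step returns and advantages
            R = rewards[t] + gamma * R
            advantage = R - values[t]
            policy_loss = policy_loss - log_probs[t] * advantage.detach()
            value_loss = value_loss + advantage.pow(2)
        optimizer.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        # Push local gradients into the shared global parameters, then step
        for gp, lp in zip(global_model.parameters(), local_model.parameters()):
            gp._grad = lp.grad
        optimizer.step()


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    global_model = ActorCritic(env.observation_space.shape[0], env.action_space.n)
    global_model.share_memory()  # place parameters in shared memory for all workers
    processes = [mp.Process(target=worker, args=(rank, global_model)) for rank in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

The gradient hand-off (copying each worker's local gradients onto the shared parameters before stepping) is what lets every worker update the global weights without locking; the slight staleness introduced when other workers step in between is tolerated by design in A3C.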

One of the motivations behind this extension stems from what we learned in Chapter 6, Implementing an Intelligent Agent for Optimal Discrete Control Using Deep Q-Learning, with the use of the experience replay memory. Our deep Q-learning agent was able to learn in a stable manner partly because the replay memory broke the correlations between consecutive experiences used for training. Running several actor-learners in parallel, each interacting with its own copy of the environment, achieves a similar decorrelation of the training data without the memory overhead of storing past experiences.
