Random no-ops on reset

When the environment is reset, the agent usually starts from the same initial state and therefore receives the same observation on every reset. The agent may memorize or overfit to the starting state of one game level so much that it starts performing poorly when it begins in a slightly different position or game level. It is sometimes helpful to randomize the initial state, for example by sampling different initial states from which the agent starts the episode. To make that happen, we can add a Gym wrapper that performs a random number of "no-ops" before sending out the first observation after the reset. The Arcade Learning Environment for the Atari 2600, which the Gym library uses for the Atari environments, supports a no-op action (conventionally action 0), so the wrapper can simply step the environment with that action a random number of times before handing control to the agent.
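As a sketch of what such a wrapper can look like, the following follows the pattern popularized by OpenAI Baselines' NoopResetEnv. The name, the default noop_max of 30, and the assumption that action 0 is NOOP are conventions, not something fixed by the book; the code also assumes the classic Gym API, where step() returns a four-tuple:

```python
import gym
import numpy as np


class NoopResetEnv(gym.Wrapper):
    """On reset, take a random number of no-op actions so that the
    agent starts each episode from a slightly different state."""

    def __init__(self, env, noop_max=30):
        super(NoopResetEnv, self).__init__(env)
        self.noop_max = noop_max
        self.noop_action = 0
        # In the Arcade Learning Environment, action 0 is NOOP
        assert env.unwrapped.get_action_meanings()[0] == 'NOOP'

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Step the environment with between 1 and noop_max no-ops
        num_noops = np.random.randint(1, self.noop_max + 1)
        for _ in range(num_noops):
            obs, _, done, _ = self.env.step(self.noop_action)
            if done:
                # If the no-ops ended the episode, reset again
                obs = self.env.reset(**kwargs)
        return obs
```

Wrapping an environment is then a one-liner, for example env = NoopResetEnv(gym.make('BreakoutNoFrameskip-v4')), after which every reset() delivers an observation from a randomized starting point.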
