Training the reinforcement learning agent at the Gym

The procedure to train the Q-learning agent may already look familiar, because it shares many lines of code, and a similar structure, with the boilerplate code that we used before. Instead of choosing a random action from the environment's action space, we now get the action from the agent using the agent.get_action(obs) method. After sending the agent's action to the environment and receiving the feedback, we also call the agent.learn(obs, action, reward, next_obs) method. The training function is listed here:

def train(agent, env):
    best_reward = -float('inf')
    for episode in range(MAX_NUM_EPISODES):
        done = False
        obs = env.reset()
        total_reward = 0.0
        while not done:
            ...
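The listing above is cut off inside the while loop. As a rough illustration only, the sketch below shows one way the loop body could be completed from the description in the text: ask the agent for an action, step the environment, then call agent.learn(obs, action, reward, next_obs). Everything beyond those two method names is an assumption made for the example: ToyEnv and TabularAgent are hypothetical stand-ins (not the book's classes), MAX_NUM_EPISODES is set to an arbitrary small value, the environment follows the classic Gym convention where reset() returns only the observation and step() returns a 4-tuple, and returning best_reward is a choice made here rather than the original function's behavior.

```python
import random

MAX_NUM_EPISODES = 20  # assumed value, chosen small for illustration


class ToyEnv:
    """Hypothetical 1-D corridor using the classic Gym-style API:
    reset() -> obs, step(action) -> (obs, reward, done, info)."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 1 moves one cell to the right; action 0 stays in place.
        self.pos += action
        self.steps += 1
        reached_goal = self.pos >= self.length
        reward = 1.0 if reached_goal else -0.1
        # End the episode at the goal or when the step budget runs out.
        done = reached_goal or self.steps >= self.max_steps
        return self.pos, reward, done, {}


class TabularAgent:
    """Hypothetical epsilon-greedy tabular Q-learning agent exposing the
    get_action/learn interface described in the text."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def get_action(self, obs):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[obs]
        return row.index(max(row))

    def learn(self, obs, action, reward, next_obs):
        # One-step Q-learning update toward the bootstrapped target.
        td_target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (td_target - self.q[obs][action])


def train(agent, env):
    best_reward = -float('inf')
    for episode in range(MAX_NUM_EPISODES):
        done = False
        obs = env.reset()
        total_reward = 0.0
        while not done:
            action = agent.get_action(obs)
            next_obs, reward, done, info = env.step(action)
            agent.learn(obs, action, reward, next_obs)
            obs = next_obs
            total_reward += reward
        best_reward = max(best_reward, total_reward)
        print(f"Episode {episode}: reward={total_reward:.2f} "
              f"best={best_reward:.2f}")
    return best_reward  # assumed return value, for illustration only


if __name__ == "__main__":
    env = ToyEnv()
    agent = TabularAgent(n_states=env.length + 1, n_actions=2)
    train(agent, env)
```

Note that newer Gymnasium releases change the environment interface: reset() returns an (obs, info) pair and step() returns five values with separate terminated and truncated flags, so the two env calls in the loop would need adjusting there.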
