Results

After the first 250 or so episodes, we will see that the total reward per episode approaches 200 and that the number of steps per episode approaches 200 as well. This means that the agent has learned to balance the pole on the cart until the environment terminates at its maximum of 200 steps.
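If you want to watch those numbers climb during training, keras-rl prints the per-episode reward and step count when you train with verbose=2. The following is only a minimal sketch; the network architecture, memory, policy, and hyperparameters shown here are placeholder assumptions standing in for whatever agent was built earlier in the chapter:

import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

# A minimal CartPole DQN, standing in for the agent built earlier in the chapter.
env = gym.make('CartPole-v0')
nb_actions = env.action_space.n

model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation='relu'),
    Dense(16, activation='relu'),
    Dense(nb_actions, activation='linear'),
])

dqn = DQNAgent(model=model, nb_actions=nb_actions,
               memory=SequentialMemory(limit=50000, window_length=1),
               policy=EpsGreedyQPolicy(), nb_steps_warmup=100,
               target_model_update=1e-2)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# verbose=2 logs one line per episode, so you can watch the episode reward
# and step count approach the 200-step cap.
dqn.fit(env, nb_steps=50000, visualize=False, verbose=2)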

It's of course fun to watch our success, so we can use the DQNAgent's .test() method to evaluate the agent for some number of episodes. The following code is used to call this method:

dqn.test(env, nb_episodes=5, visualize=True)

Here we've set visualize=True so we can watch our agent balance the pole, as shown in the following image:

There we go, that's one balanced pole! Alright, I know, ...
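If you would rather check the numbers than watch the animation, .test() also returns a history object you can inspect. The following is a minimal sketch; the exact keys available in history.history can vary between keras-rl versions, so 'episode_reward' is an assumption here:

# Run a few evaluation episodes without rendering and inspect the rewards.
history = dqn.test(env, nb_episodes=5, visualize=False)
print(history.history['episode_reward'])  # expect values at or near 200 for a converged agent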
