SARSA in the checkerboard environment

We can now test the SARSA algorithm in the original tunnel environment (all of the elements that are not redefined are the same as in the previous chapter). The first step is defining the Q(s, a) array and the constants employed in the training process:

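import numpy as np

nb_actions = 4

# height and width are the grid dimensions defined in the previous chapter
Q = np.zeros(shape=(height, width, nb_actions))

# Fixed starting cell
x_start = 0
y_start = 0

max_steps = 2000
alpha = 0.25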
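As a reminder of what the Q array is used for, the sketch below applies the standard SARSA update, Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)], where a' is the action actually selected in the next state. It is a minimal illustration, not the book's code: the grid size, the discount factor gamma = 0.95 and the helper name sarsa_update are assumptions introduced only for this example.

```python
import numpy as np

# Hypothetical grid size; in the text, height and width come from the
# environment defined in the previous chapter
height, width, nb_actions = 5, 15, 4
Q = np.zeros(shape=(height, width, nb_actions))

alpha = 0.25
gamma = 0.95  # assumed discount factor (not given in this section)


def sarsa_update(Q, x, y, action, reward, xn, yn, next_action, final=False):
    """Apply one SARSA update to Q[x, y, action] in place.

    The target uses the action actually chosen in the next state
    (on-policy), not the greedy maximum used by Q-learning.
    """
    # Do not bootstrap from a terminal state
    q_next = 0.0 if final else Q[xn, yn, next_action]
    td_target = reward + gamma * q_next
    td_error = td_target - Q[x, y, action]
    Q[x, y, action] += alpha * td_error
    return td_error
```

Starting from an all-zero table, a transition with reward 1.0 moves Q[0, 0, 1] a quarter of the way (alpha = 0.25) toward the target, so it becomes 0.25.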

As we want to employ an ε-greedy policy, which already provides the necessary exploration, we can set the starting point to (0, 0), forcing the agent to reach the positive final state. We can now define the functions needed to perform a training step:
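The ε-greedy policy mentioned above can be sketched as follows: with probability ε the agent picks a random action, otherwise it picks the action with the highest Q-value in the current state. The function name epsilon_greedy and the fixed random seed are assumptions for illustration only.

```python
import numpy as np

nb_actions = 4
rng = np.random.default_rng(1000)  # fixed seed, assumed for reproducibility


def epsilon_greedy(Q, x, y, epsilon):
    """Pick an action for state (x, y) with an epsilon-greedy policy.

    With probability epsilon a uniformly random action is returned
    (exploration); otherwise the greedy action argmax_a Q[x, y, a]
    is returned (exploitation).
    """
    if rng.uniform() < epsilon:
        return int(rng.integers(0, nb_actions))
    return int(np.argmax(Q[x, y]))
```

With epsilon = 0 the policy is purely greedy, and with epsilon = 1 it is purely random; in practice ε is kept small (and often decayed) so that the agent mostly exploits what it has learned while still visiting new state-action pairs.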

import numpy as np

def is_final(x, y):
    if (x, y) in zip(x_wells, y_wells) or (x, y) == (x_final, y_final):
        return True
    return False

...
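To show how these pieces fit into a complete training step, here is a self-contained sketch of one SARSA episode on a tiny grid. Everything except the logic of is_final is hypothetical: the 3 × 4 layout, the single well at (1, 1), the step penalty of -0.1, the action encoding, gamma, epsilon and the helper names select_action, move and sarsa_episode are assumptions chosen only to make the example runnable, not the environment or code of the previous chapter.

```python
import numpy as np

# Hypothetical toy layout standing in for the tunnel environment
height, width, nb_actions = 3, 4, 4
x_wells, y_wells = [1], [1]
x_final, y_final = 2, 3

rewards = np.full((height, width), -0.1)  # assumed small step penalty
rewards[x_final, y_final] = 1.0           # positive final state
rewards[1, 1] = -1.0                      # well (negative final state)

alpha, gamma, epsilon = 0.25, 0.95, 0.1   # gamma and epsilon are assumed
max_steps = 2000
rng = np.random.default_rng(0)
Q = np.zeros((height, width, nb_actions))


def is_final(x, y):
    return (x, y) in zip(x_wells, y_wells) or (x, y) == (x_final, y_final)


def select_action(x, y):
    # Epsilon-greedy action selection
    if rng.uniform() < epsilon:
        return int(rng.integers(0, nb_actions))
    return int(np.argmax(Q[x, y]))


def move(x, y, a):
    # Assumed encoding: 0 up, 1 right, 2 down, 3 left; moves are clipped at the borders
    dx, dy = [(-1, 0), (0, 1), (1, 0), (0, -1)][a]
    return min(max(x + dx, 0), height - 1), min(max(y + dy, 0), width - 1)


def sarsa_episode(x_start=0, y_start=0):
    """Run one episode, updating Q with SARSA; return the last state reached."""
    x, y = x_start, y_start
    a = select_action(x, y)
    for _ in range(max_steps):
        xn, yn = move(x, y, a)
        r = rewards[xn, yn]
        if is_final(xn, yn):
            # Terminal transition: no bootstrapping term
            Q[x, y, a] += alpha * (r - Q[x, y, a])
            return xn, yn
        # Choose the next action with the same policy (on-policy target)
        an = select_action(xn, yn)
        Q[x, y, a] += alpha * (r + gamma * Q[xn, yn, an] - Q[x, y, a])
        x, y, a = xn, yn, an
    return x, y
```

Calling sarsa_episode() repeatedly (for example, a few hundred times) trains the table; because the agent never acts from a final state, the Q-values of the well and of the positive final cell stay at zero.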
