Checkerboard environment in Python

Let's consider an example based on a checkerboard environment that represents a tunnel. The agent's goal is to reach the ending state (the lower-right corner) while avoiding 10 wells, which are absorbing states with a negative reward. The rewards are:

  • Ending state: +5.0
  • Wells: -5.0
  • All other states: -0.1

Assigning a small negative reward to every non-terminal state is helpful because each extra step lowers the total reward, which pushes the agent to keep moving forward until it collects the maximum (final) reward. Let's start by modeling the environment as a 5 × 15 matrix:

import numpy as np

width = 15
height = 5

y_final = width - 1
x_final = height - 1

y_wells = [0, 1, 3, 5, 5, 7, 9, 11, 12, 14]
x_wells = [3, 1, 2, 0, 4, 1, 3, 2, 4, 1]

standard_reward = -0.1
tunnel_rewards ...
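The listing above is cut off before the reward matrix is built, so the following is only a minimal sketch of one way the reward scheme described in the text could be laid out. It is not necessarily the book's construction. The names reward_grid, well_reward, final_reward and path_return are hypothetical, and the sketch assumes that x indexes rows and y indexes columns, which is consistent with x_final = height - 1 and y_final = width - 1.

```python
import numpy as np

# Dimensions and coordinates as given in the text
width = 15
height = 5
y_final = width - 1
x_final = height - 1

y_wells = [0, 1, 3, 5, 5, 7, 9, 11, 12, 14]
x_wells = [3, 1, 2, 0, 4, 1, 3, 2, 4, 1]

standard_reward = -0.1
well_reward = -5.0   # hypothetical name; value taken from the reward list
final_reward = 5.0   # hypothetical name; value taken from the reward list

# Assumed construction: fill every cell with the step penalty,
# then overwrite the wells and the ending state.
reward_grid = np.full((height, width), standard_reward)
reward_grid[x_wells, y_wells] = well_reward
reward_grid[x_final, y_final] = final_reward


def path_return(path):
    """Undiscounted sum of the rewards of the (x, y) cells entered along a path."""
    return sum(reward_grid[x, y] for x, y in path)
```

With this layout, the effect of the step penalty is easy to check by hand: a path that enters two ordinary cells before the ending state collects 2 × (-0.1) + 5.0 = 4.8, while one that enters ten ordinary cells first collects only 4.0, so shorter safe routes are always preferable.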
