Before the Q_Learner class declaration, we will initialize a few useful hyperparameters that our Q_Learner implementation will use:
- EPSILON_MIN: The minimum value of epsilon that we want the agent to use while following an epsilon-greedy policy.
- MAX_NUM_EPISODES: The maximum number of episodes for which we want the agent to interact with the environment.
- STEPS_PER_EPISODE: The maximum number of steps the agent takes in each episode. This could be the maximum number of steps that the environment itself allows per episode, or a custom limit that we set based on some time budget. Allowing a higher number of steps per episode means each episode might take longer to complete ...
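The following is a minimal sketch of how these three hyperparameters might be declared and used together. The numeric values, the linear decay schedule, and the helper names `decay_epsilon`, `epsilon_greedy`, and `run` are hypothetical illustrations, not part of the Q_Learner implementation described here, and the loop leaves out the environment interaction entirely.

```python
import random

# Hypothetical placeholder values; tune them for your own environment.
EPSILON_MIN = 0.005        # floor for the exploration rate
MAX_NUM_EPISODES = 50000   # upper bound on the number of training episodes
STEPS_PER_EPISODE = 200    # per-episode step limit (e.g. an environment's time limit)

# Assumption: epsilon decays linearly from 1.0 to EPSILON_MIN over the
# total step budget. The text above does not specify a decay schedule.
MAX_NUM_STEPS = MAX_NUM_EPISODES * STEPS_PER_EPISODE
EPSILON_DECAY = (1.0 - EPSILON_MIN) / MAX_NUM_STEPS


def decay_epsilon(epsilon):
    """Reduce epsilon by a fixed amount, never letting it drop below EPSILON_MIN."""
    return max(epsilon - EPSILON_DECAY, EPSILON_MIN)


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run(num_episodes=MAX_NUM_EPISODES, steps_per_episode=STEPS_PER_EPISODE):
    """Skeleton training loop bounded by the episode and step limits."""
    epsilon = 1.0
    total_steps = 0
    for episode in range(num_episodes):
        for step in range(steps_per_episode):
            # A real agent would act in the environment here and break out
            # early when the episode terminates.
            epsilon = decay_epsilon(epsilon)
            total_steps += 1
    return epsilon, total_steps
```

Calling `run(2, 3)` performs six decay steps in total. Clamping with `max` keeps epsilon from falling below `EPSILON_MIN`, so the agent always retains a small amount of exploration even late in training.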