We want the agent to take an action given an observation. get_action(self, obs) is the function we define to generate an action, given an observation in obs. One of the most widely used action selection policies is the epsilon-greedy policy. It takes the best action according to the agent's current estimate with a (high) probability of 1 - ε, and takes a random action with a (small) probability given by epsilon (ε). We implement the epsilon-greedy policy using the random() method from NumPy's random module, like this:
def ...
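The decision rule described above can be sketched as a standalone function. This is a minimal illustration, not the get_action method itself. It assumes the agent's estimates are available as a plain sequence of action values (the hypothetical q_values argument), and it uses a NumPy Generator's random() method so the sketch can be seeded for reproducibility. The function name epsilon_greedy_action and its parameters are illustrative assumptions.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng=None):
    """Hypothetical helper: choose an action index with an epsilon-greedy policy.

    q_values: the agent's estimated value for each action (assumed 1-D).
    epsilon:  probability of taking a random (exploratory) action.
    rng:      optional NumPy Generator, useful for reproducible runs.
    """
    if rng is None:
        rng = np.random.default_rng()
    # With probability epsilon, explore: pick any action uniformly at random.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    # Otherwise (probability 1 - epsilon), exploit: pick the best-estimated action.
    return int(np.argmax(q_values))


# Usage: with epsilon = 0.0 the policy is purely greedy.
print(epsilon_greedy_action([0.1, 0.5, 0.2], epsilon=0.0))
```

Comparing rng.random() < epsilon (rather than >) means epsilon = 0 never explores and epsilon = 1 always explores, because random() returns values in [0, 1).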