Implementing the Q_Learner's get_action method

We want the agent to take an action given an observation. get_action(self, obs) is the function we define to generate an action, given an observation in obs. The most widely used action selection policy is the epsilon-greedy policy, which takes the best action as per the agent's estimate with a (high) probability of 1 - epsilon, and takes a random action with a (small) probability given by epsilon. We implement the epsilon-greedy policy using the random() method from NumPy's random module, like this:

 def ...
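
As a minimal, self-contained sketch of the epsilon-greedy rule described above (this is not the book's Q_Learner.get_action listing), the following standalone function shows the selection logic in isolation. The function name epsilon_greedy_action and its parameters q_values and epsilon are hypothetical, chosen for illustration; it assumes the agent's estimates for the current observation are available as a 1-D array with one value per action.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon):
    """Return an action index chosen by an epsilon-greedy rule.

    q_values: 1-D array of the agent's value estimates, one per action
              (hypothetical input; an assumption of this sketch).
    epsilon:  probability of taking a random action, in [0, 1].
    """
    # np.random.random() draws a float uniformly from [0.0, 1.0).
    if np.random.random() < epsilon:
        # Explore: with probability epsilon, pick any action uniformly.
        return int(np.random.choice(len(q_values)))
    # Exploit: with probability 1 - epsilon, pick the best-estimated action.
    return int(np.argmax(q_values))


if __name__ == "__main__":
    estimates = np.array([0.1, 0.9, 0.3])
    # With a small epsilon, most calls should return the greedy action.
    print(epsilon_greedy_action(estimates, epsilon=0.05))
```

In practice, epsilon is often started high and decayed over training so that the agent explores early and exploits later; that schedule is left out of this sketch.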
