A reward, denoted by , is usually a scalar quantity that is provided as feedback to the agent to drive its learning. The goal of the agent is to maximize the sum of the reward, and this signal indicates how well the agent is doing at time step . The following examples of reward signals for different tasks may help you get a more intuitive understanding:
- For the Atari games we discussed before, or any computer games in general, the reward signal can be +1 for every increase in score and -1 for every decrease in score.
- For stock trading, ...