State-Action-Reward-State-Action (SARSA) is an on-policy algorithm: the same policy that selects the current action also selects the action used in the update target. This differs from Q-learning, which is off-policy: its update considers the current state, the reward, and the maximum Q-value over the available next actions, without regard to which action the ongoing policy would actually take.
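The contrast between the two update targets can be made concrete with a small sketch. The Q-table, state indices, and the chosen next action below are illustrative placeholders, not values from any specific environment:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random((5, 2))   # toy Q-table: 5 states, 2 actions
gamma = 0.9              # discount factor

s, a, r, s_next = 0, 1, 1.0, 3
a_next = 0  # the action the current (e.g. epsilon-greedy) policy actually picks

# SARSA (on-policy): the target uses the action the policy will really take
sarsa_target = r + gamma * Q[s_next, a_next]

# Q-learning (off-policy): the target uses the greedy action,
# regardless of what the behavior policy does
q_learning_target = r + gamma * np.max(Q[s_next])
```

Because Q-learning maximizes over next actions, its target is always at least as large as the SARSA target for the same transition.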
At each step, SARSA evaluates and improves the agent's behavior by refining its Q-function estimates. The Q-value is adjusted in proportion to the resulting temporal-difference error, scaled by a learning rate, commonly denoted α. Here the Q-values represent the expected return from the next state-action transition.
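A minimal sketch of this update step, assuming a tabular Q-function and illustrative values for α and the discount factor γ:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One SARSA step: move Q[s, a] toward the TD target by a factor alpha."""
    td_target = r + gamma * Q[s_next, a_next]  # uses the action actually taken next
    td_error = td_target - Q[s, a]             # how far the estimate is off
    Q[s, a] += alpha * td_error                # partial correction, scaled by alpha
    return Q

Q = np.zeros((3, 2))
Q = sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
# Q[0, 1] moves from 0 toward the target 1.0 by a step of alpha: 0.1
```

A smaller α makes each correction more conservative, so estimates change slowly but are less noisy; a larger α adapts faster at the cost of stability.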