SARSA learning

State-Action-Reward-State-Action (SARSA) is an on-policy algorithm: the same policy that generated the previous actions also generates the next action, and that next action is used in the update. This is unlike Q-learning, which is off-policy and considers only the current state, the reward, and the available next actions, without regard to the policy currently being followed.
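The on-policy versus off-policy distinction shows up in the value each method bootstraps from. The following is a minimal sketch, assuming a tabular Q-function stored as a NumPy array indexed by `[state, action]`; the function names and the discount factor `gamma` are illustrative assumptions, not taken from the text.

```python
import numpy as np


def sarsa_target(Q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: uses the action the policy actually chose next."""
    return reward + gamma * Q[next_state, next_action]


def q_learning_target(Q, reward, next_state, gamma=0.9):
    """Off-policy target: uses the best available next action,
    regardless of what the policy will actually do."""
    return reward + gamma * np.max(Q[next_state])


# Hypothetical 2-state, 2-action table for illustration only.
Q = np.array([[0.0, 0.0],
              [1.0, 2.0]])

# Suppose the policy picked action 0 in next state 1.
on_policy = sarsa_target(Q, reward=1.0, next_state=1, next_action=0)
off_policy = q_learning_target(Q, reward=1.0, next_state=1)
```

With these made-up numbers the two targets differ whenever the policy's chosen next action is not the greedy one, which is the practical meaning of SARSA following the ongoing policy.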

At each step of SARSA, the agent's action is evaluated and the policy is improved by updating the Q-function estimates. The Q-value is adjusted in proportion to the error, scaled by a learning rate, conventionally denoted α. In this case, the Q-values represent potential reward from the next state transition ...
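A single update step of this kind might look as follows. This is a sketch under stated assumptions rather than the book's implementation: it uses the standard tabular form of the SARSA update, and the names `sarsa_update`, `alpha` (learning rate), and `gamma` (discount factor) are chosen here for illustration.

```python
import numpy as np


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update to Q in place and return the error.

    The error is the gap between the bootstrapped target
    (reward plus discounted Q-value of the next state-action pair)
    and the current estimate Q[state, action].
    """
    td_error = reward + gamma * Q[next_state, next_action] - Q[state, action]
    # Move the estimate toward the target by a fraction alpha of the error.
    Q[state, action] += alpha * td_error
    return td_error


# Hypothetical 2-state, 2-action table for illustration only.
Q = np.zeros((2, 2))
Q[1, 1] = 1.0

# One transition: (state=0, action=0) -> reward 1.0 -> (state=1, action=1).
err = sarsa_update(Q, state=0, action=0, reward=1.0,
                   next_state=1, next_action=1, alpha=0.5, gamma=0.9)
```

A larger `alpha` moves the estimate further toward the target on each step; a smaller one averages over more transitions before the estimate settles.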
