From the description of the n-step deep actor-critic algorithm we went over previously, you may recall that the critic, represented by a neural network, is solving a problem similar to the one we saw in Chapter 6, Implementing an Intelligent Agent for Optimal Discrete Control using Deep Q-Learning: representing the value function (similar to the action-value function we used in that chapter, but a bit simpler). For the critic's loss, we can use the standard Mean Squared Error (MSE) loss or the smoother L1 loss/Huber loss, computed between the critic's predicted values and the n-step returns (TD targets) calculated in the previous step.
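As a rough sketch of this idea, the snippet below computes both loss variants for a small, hypothetical critic network; the network architecture, batch size, and the way the n-step returns are obtained are illustrative assumptions, not the book's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical critic: maps 4-dimensional observations to scalar value estimates.
critic = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

# Dummy batch of observations and precomputed n-step returns (the TD targets).
obs = torch.randn(8, 4)
n_step_returns = torch.randn(8, 1)

values = critic(obs)  # critic's predicted state values, shape (8, 1)

# Targets should be treated as constants, so detach them from the graph.
targets = n_step_returns.detach()

# Either loss works; Smooth L1 (Huber) is less sensitive to outlier targets.
mse_loss = F.mse_loss(values, targets)
huber_loss = F.smooth_l1_loss(values, targets)

# Backpropagating one of these losses updates only the critic's parameters.
mse_loss.backward()
```

In practice, Smooth L1/Huber is often preferred when the bootstrapped targets are noisy, since its linear tails keep large TD errors from producing exploding gradients.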
For the actor, we will use the results obtained with the ...