In Chapter 8, Implementing an Intelligent Autonomous Car Driving Agent Using the Deep Actor-Critic algorithm, we implemented the n-step TD return method and discussed how forward-view multi-step targets can be used in place of a single-step TD target. The same n-step return can be used with DQN, and that is essentially the idea behind this extension. Recall that the truncated n-step return from state $s_t$ is given as follows:

$$R_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k+1}$$

Here, $\gamma$ is the discount factor and $r_{t+k+1}$ is the reward received $k+1$ steps after time $t$.
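As a minimal sketch, the truncated n-step return is the discounted sum of the next n rewards, and it can be computed in a few lines of Python. The function names, the `rewards` list layout, and the `bootstrap_q` argument below are illustrative assumptions and are not taken from the Chapter 8 code. The second function shows one common way to turn the return into a bootstrapped target by adding a discounted value estimate for the state reached after n steps.

```python
def truncated_n_step_return(rewards, gamma, n):
    """Discounted sum of up to n rewards that follow a state.

    Assumption: rewards[k] holds the reward received k+1 steps after
    the state. If fewer than n rewards are available (for example, the
    episode ended early), only the available ones are summed.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards[:n]))


def n_step_target(rewards, gamma, n, bootstrap_q, done):
    """n-step return plus a discounted bootstrap value.

    Assumption: bootstrap_q is a precomputed estimate of max_a Q(s', a)
    for the state s' reached after n steps (hypothetical input, e.g. from
    a target network). No bootstrap is added if the episode terminated.
    """
    ret = truncated_n_step_return(rewards, gamma, n)
    if done or len(rewards) < n:
        return ret
    return ret + (gamma ** n) * bootstrap_q


# Usage: three rewards of 1.0 with a discount factor of 0.5
rewards = [1.0, 1.0, 1.0]
print(truncated_n_step_return(rewards, gamma=0.5, n=3))  # 1.75
```

With `n=1` and a bootstrap value, `n_step_target` reduces to the familiar one-step TD target, which is a quick way to sanity-check the sketch.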
Using this equation, a multi-step variant of DQN can be defined to minimize the ...