Deep Q-learning still has limitations, whether you approximate the Q function from visual images or from other observations of the environment:
- The approximation can take a long time to converge, and convergence is not always smooth: the neural network's learning indicators may worsen rather than improve for many epochs.
- Because it is based on a greedy approach, Q-learning behaves much like a heuristic: it points out the best direction but cannot provide detailed planning. It performs badly on long-term goals or on goals that have to be broken down into sub-goals.
- Another ...
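
The greedy behaviour mentioned in the list above can be illustrated with a minimal sketch. The Q-values, state names, and the `greedy_action` helper below are hypothetical and chosen only for illustration; in deep Q-learning the values would come from a neural network rather than a hand-written table.

```python
# Hypothetical Q-values for a toy problem. The numbers are made up for
# illustration and do not come from a trained network.
q_values = {
    "start":  {"left": 0.2, "right": 0.5},
    "middle": {"left": 0.1, "right": 0.7},
}


def greedy_action(state):
    """Return the action with the highest estimated Q-value in `state`."""
    actions = q_values[state]
    return max(actions, key=actions.get)


# The greedy policy answers one question at a time: which action looks best
# in the current state? It never produces an explicit multi-step plan.
for state in q_values:
    print(state, "->", greedy_action(state))
```

Each call returns a single action for a single state. Any long-term structure has to be encoded implicitly in the Q-values themselves, which is one way to see why goals that decompose into sub-goals are hard for this approach.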