cs230.stanford.edu ... Deep Q-learning uses the same DQN to select and evaluate actions, which can result in overestimation of Q-values [10]. Those overestimations may lead to "overoptimism" ...
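The overestimation described in the excerpt can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the cited authors' experiment: every true Q-value is set to zero, the network's estimates are modeled as the true values plus independent Gaussian noise, and the function name `estimate_bias` and its parameters (`n_actions`, `noise_std`, `n_trials`, `seed`) are hypothetical. It compares using one set of noisy estimates both to select and to evaluate the greedy action against selecting with one set and evaluating with an independent second set, the idea commonly known as double Q-learning.

```python
import random


def estimate_bias(n_actions=5, noise_std=1.0, n_trials=20000, seed=0):
    """Average estimated value of the greedy action when all true Q-values are 0.

    Returns (single, decoupled):
      single    -- the same noisy estimates select and evaluate the action
      decoupled -- one set selects, an independent set evaluates
    Any nonzero average is pure estimation bias, since the true value is 0.
    """
    rng = random.Random(seed)
    single_total = 0.0
    decoupled_total = 0.0
    for _ in range(n_trials):
        # Two independent noisy estimates of the same (all-zero) Q-values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Greedy action according to the first set of estimates.
        best = max(range(n_actions), key=lambda a: q_a[a])
        single_total += q_a[best]     # select and evaluate with q_a
        decoupled_total += q_b[best]  # select with q_a, evaluate with q_b
    return single_total / n_trials, decoupled_total / n_trials


single_bias, decoupled_bias = estimate_bias()
print(f"same estimator selects and evaluates: {single_bias:+.3f}")
print(f"independent estimator evaluates:      {decoupled_bias:+.3f}")
```

Under these assumptions the first figure should come out clearly positive even though every action is truly worth zero, because taking the maximum over noisy estimates systematically picks up positive noise. The second figure should sit close to zero, because the evaluating estimates are independent of the ones that chose the action.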