
cs230.stanford.edu...Deep Q-learning uses the same DQN to select and evaluate actions, which can result in overestimation of Q-values [10]. Those overestimations may lead to "overoptimism"
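The overestimation described above can be illustrated with a small numerical sketch. It is not taken from the source: it assumes a toy setting where every action has a true Q-value of 0 and each estimate carries independent Gaussian noise, and the function name and parameters are hypothetical. It compares using one noisy estimator both to select and to evaluate the greedy action against using a second, independent estimator for the evaluation.

```python
import numpy as np


def estimate_bias(n_actions=10, noise_std=1.0, n_trials=100_000, seed=0):
    """Compare the value of the greedy action under two evaluation schemes.

    Assumption (illustrative only): all true Q-values are 0, so any
    nonzero average returned here is pure estimation bias.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)

    # Two independent noisy estimates of the same true Q-values.
    q_a = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q_b = true_q + rng.normal(0.0, noise_std, size=(n_trials, n_actions))

    # Same estimator selects and evaluates: max over noisy values.
    same_estimator = q_a.max(axis=1)

    # Decoupled: q_a selects the action, q_b evaluates it.
    chosen = q_a.argmax(axis=1)
    decoupled = q_b[np.arange(n_trials), chosen]

    return same_estimator.mean(), decoupled.mean()


if __name__ == "__main__":
    same_bias, decoupled_bias = estimate_bias()
    print(f"same estimator: {same_bias:.3f}, decoupled: {decoupled_bias:.3f}")
```

Under these assumptions the first average comes out clearly positive even though no action is actually better than another, while the decoupled average stays near zero. This is one way to see why selecting and evaluating with the same network can inflate Q-values.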

