Upload
vutuyen
View
227
Download
0
Embed Size (px)
Citation preview
Lab 5: Windy Frozen LakeNondeterministic world!
Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>
Deterministic
Stochastic (non-deterministic)
Q-learning algorithm for deterministic
Our previous Q-learning does not work
Score over time: 0.0165
Q-learning algorithm
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
Q-learning algorithm
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
Code: Setup
Code: Q-learning
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
Code: Report results
Score over time: 0.653
Next
(Deep) Q-Network
with TensorFlow!