11
Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

  • Upload
    vutuyen

  • View
    227

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Lab 5: Windy Frozen LakeNondeterministic world!

Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>

Page 2: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Deterministic

Page 3: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Stochastic (non-deterministic)

Page 4: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Q-learning algorithm for deterministic

Page 5: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Our previous Q-learning does not work

Score over time: 0.0165

Page 6: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Q-learning algorithm

Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max

a0Q(s0, a0)]

Page 7: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Q-learning algorithm

Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max

a0Q(s0, a0)]

Page 8: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Code: Setup

Page 9: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Code: Q-learning

Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max

a0Q(s0, a0)]

Page 10: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Code: Report results

Score over time: 0.653

Page 11: Lab 5: Windy Frozen Lake Nondeterministic world! · Lab 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym ... Score over time: 0.0165

Next

(Deep) Q-Network

with TensorFlow!