11
Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Lab 4: Q-learning (table)exploit&exploration and discounted future reward

Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>

Page 2: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Exploit VS Exploration: decaying E-greedy

for i in range (1000)

e = 0.1 / (i+1)

if random(1) < e: a = random

else: a = argmax(Q(s, a))

Page 3: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Exploit VS Exploration: add random noise

0.5 0.6 0.3 0.2 0.5

for i in range (1000) a = argmax(Q(s, a) + random_values / (i+1))

Page 4: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Discounted reward ( = 0.9)

1

Page 5: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Q-learning algorithm

Page 6: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: setup

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 7: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: Q learning

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 8: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 9: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: e-greedy

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 10: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: e-greedy results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 11: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Next

Nondeterministic/

Stochastic worlds