
Lab 4: Q-learning (table): exploit & exploration and discounted future reward
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>


Exploit VS Exploration: decaying E-greedy

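for i in range(1000):
    e = 0.1 / (i + 1)            # exploration rate decays with the episode index
    if random(1) < e:
        a = random               # explore: pick a random action
    else:
        a = argmax(Q(s, a))      # exploit: pick the best-known action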
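The pseudocode above can be sketched as runnable Python. This is a minimal illustration using only the standard library, not the lecture's own code; the helper names (`decayed_epsilon`, `argmax`, `select_action_egreedy`) and the sample Q-values are assumptions made for the example.

```python
import random

def decayed_epsilon(episode, start=0.1):
    """Exploration rate e = start / (episode + 1), as in the slide's pseudocode."""
    return start / (episode + 1)

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def select_action_egreedy(q_row, episode, rng):
    """With probability e pick a random action, otherwise the greedy one."""
    if rng.random() < decayed_epsilon(episode):
        return rng.randrange(len(q_row))  # explore
    return argmax(q_row)                  # exploit

rng = random.Random(0)                    # seeded for repeatability
q_row = [0.5, 0.6, 0.3, 0.2, 0.5]         # hypothetical Q(s, a) values for one state
action = select_action_egreedy(q_row, episode=0, rng=rng)
```

Because e shrinks as 1 / (i + 1), later episodes choose the greedy action almost every time.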


Exploit VS Exploration: add random noise

Example values: 0.5, 0.6, 0.3, 0.2, 0.5

for i in range(1000):
    a = argmax(Q(s, a) + random_values / (i + 1))
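A hedged sketch of the noise-based selection follows. It reuses the five values shown on the slide, assumed here to be the Q-values of one state. Gaussian noise and the function name `select_action_noisy` are choices made for this illustration; the slide only says "random_values".

```python
import random

def select_action_noisy(q_row, episode, rng):
    """Add noise that shrinks as 1 / (episode + 1), then take the argmax."""
    noisy = [q + rng.gauss(0.0, 1.0) / (episode + 1) for q in q_row]
    return max(range(len(noisy)), key=lambda k: noisy[k])

rng = random.Random(0)                  # seeded for repeatability
q_row = [0.5, 0.6, 0.3, 0.2, 0.5]       # values from the slide, assumed to be Q(s, a)

early = select_action_noisy(q_row, episode=0, rng=rng)      # noise can dominate
late = select_action_noisy(q_row, episode=10**6, rng=rng)   # noise is negligible
```

Unlike e-greedy, which explores uniformly at random, this scheme biases exploration toward actions whose Q-values are already close to the best one.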


Discounted reward (γ = 0.9)
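To make the effect of γ = 0.9 concrete, here is a small worked sketch. The function name and the reward sequences are illustrative assumptions: a reward of 1 received immediately is worth 1, while the same reward received two steps later is worth 0.9 × 0.9 = 0.81.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards[t] * gamma**t, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

one_step = discounted_return([1.0])               # reward arrives immediately
three_steps = discounted_return([0.0, 0.0, 1.0])  # reward arrives two steps later
```

Shorter paths to the same reward therefore get a higher value, which is what lets the agent prefer them.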


Q-learning algorithm
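The algorithm figure from this slide is not recoverable from the text. As a hedged reminder, the tabular update for a deterministic world with discount factor γ is commonly written as follows. The symbols \hat{Q}, s' and a' are notation assumed here, not taken from the slide.

```latex
\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')
```

Here r is the immediate reward for taking action a in state s, and s' is the resulting state. The table is typically initialized to zero, and the update is applied after every step.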


Code: setup

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap


Code: Q learning
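The code shown on this slide did not survive extraction. The following is a minimal stand-in sketch, not the lecture's code. It applies the discounted table update with decaying-noise exploration to a hypothetical five-state corridor, so that it runs without OpenAI Gym. The environment, `step`, `train`, and all parameter values are assumptions for illustration.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic toy transition; reward 1 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(num_episodes=200, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table initialized to zeros
    for i in range(num_episodes):
        s = 0
        for _ in range(max_steps):
            # Explore by adding noise that decays with the episode index.
            noisy = [q + rng.gauss(0.0, 1.0) / (i + 1) for q in Q[s]]
            a = noisy.index(max(noisy))
            s2, r, done = step(s, a)
            # Table update with discounted future reward.
            Q[s][a] = r + gamma * max(Q[s2])
            s = s2
            if done:
                break
    return Q

Q = train()
greedy = [row.index(max(row)) for row in Q[:-1]]  # greedy action per non-goal state
```

After training, the greedy policy moves right in every state, and the learned values fall off by a factor of γ per step of distance from the goal.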


Code: results


Code: e-greedy


Code: e-greedy results


Next

Nondeterministic/Stochastic worlds
