
Page 1

Reinforcement Learning
Michael Roberts

With material from: Reinforcement Learning: An Introduction, Sutton & Barto (1998)

Page 2

What is RL?

• Trial & error learning
  – without a model
  – with a model

• Structure

[Diagram: a chain of states s1, s2, s3, s4 whose transitions yield rewards r1, r2, r3]

Page 3

RL vs. Supervised Learning

• Evaluative vs. instructional feedback

• Role of exploration

• On-line performance

Page 4

K-armed Bandit Problem

[Diagram: an agent selects among k actions (arms); each arm's average reward (10, -5, 100, 0) is estimated from observed reward samples, e.g. 0, 0, 5, 10, 35 (mean 10) and 5, 10, -15, -15, -10 (mean -5)]

Page 5

K-armed Bandit Cont.

• Greedy exploration
• ε-greedy
• Softmax

Average reward (sample average of the $k_a$ rewards received for action $a$):

$Q_k(a) = \frac{r_1 + r_2 + \cdots + r_{k_a}}{k_a}$

Incremental formula:

$Q_{k+1} = Q_k + \alpha \left[ r_{k+1} - Q_k \right]$

where α = 1 / (k+1)

Probability of choosing action $a$ (softmax with temperature τ):

$P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b=1}^{k} e^{Q(b)/\tau}}$
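A minimal runnable sketch of these selection rules and the incremental update; the arm count, payoffs, and noise below are illustrative assumptions, not from the slides:

import math
import random

def epsilon_greedy(Q, epsilon=0.1):
    # With probability epsilon pick a random arm (explore),
    # otherwise pick the arm with the highest estimate (exploit).
    if random.random() < epsilon:
        return random.randrange(len(Q))
    return max(range(len(Q)), key=lambda a: Q[a])

def softmax_choice(Q, tau=1.0):
    # Sample an arm with probability proportional to exp(Q[a] / tau).
    prefs = [math.exp(q / tau) for q in Q]
    total = sum(prefs)
    r, cum = random.random() * total, 0.0
    for a, p in enumerate(prefs):
        cum += p
        if r < cum:
            return a
    return len(Q) - 1

# Incremental sample-average update: Q <- Q + (1/(k+1)) [r - Q]
true_means = [10, -5, 100, 0]             # hypothetical arm payoffs
Q, counts = [0.0] * 4, [0] * 4
for _ in range(1000):
    a = epsilon_greedy(Q)
    r = random.gauss(true_means[a], 5.0)  # assumed noisy rewards
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]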

Page 6

More General Problems

• More than one state
• Delayed rewards

• Markov Decision Process (MDP) (see the sketch after this list)
  – Set of states
  – Set of actions
  – Reward function
  – State transition function

• Table or Function Approximation
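As a concrete illustration of those four components, a small tabular MDP can be written down directly; every name and number below is a hypothetical assumption:

# A minimal tabular MDP: set of states, set of actions,
# reward function, and state transition function.
states = ["s1", "s2"]
actions = ["a1", "a2"]

# reward[(s, a)] -> expected immediate reward
reward = {("s1", "a1"): 1.0, ("s1", "a2"): 0.0,
          ("s2", "a1"): -1.0, ("s2", "a2"): 5.0}

# transition[(s, a)] -> {next_state: probability}
transition = {("s1", "a1"): {"s1": 0.9, "s2": 0.1},
              ("s1", "a2"): {"s2": 1.0},
              ("s2", "a1"): {"s1": 1.0},
              ("s2", "a2"): {"s1": 0.5, "s2": 0.5}}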

Page 7

Page 8

Example: Recycling Robot

Page 9

Recycling Robot: Transition Graph
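The transition graph itself was an image; in Sutton & Barto's recycling-robot example the dynamics can be tabulated as below. The structure follows the book, but the numeric values of α, β, r_search, and r_wait are assumptions for illustration:

# Recycling robot (after Sutton & Barto, 1998): battery is 'high' or 'low'.
# alpha/beta are the probabilities that searching keeps the charge level;
# a depleted battery means a rescue with reward -3.
alpha, beta = 0.9, 0.6
r_search, r_wait = 2.0, 1.0        # searching pays more than waiting

# (state, action) -> list of (probability, next_state, reward)
dynamics = {
    ("high", "search"):   [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
    ("high", "wait"):     [(1.0, "high", r_wait)],
    ("low",  "search"):   [(beta, "low", r_search), (1 - beta, "high", -3.0)],
    ("low",  "wait"):     [(1.0, "low", r_wait)],
    ("low",  "recharge"): [(1.0, "high", 0.0)],
}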

Page 10

Dynamic Programming

Page 11

Backup Diagram

[Backup diagram: from a state, actions taken with probability .25 each; transition probabilities .5/.5, .3/.7, .6/.4; leaf rewards 10, 5, 200, 200, -10, 1000]
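The target such a diagram computes is the one-step policy-evaluation backup; in Sutton & Barto's notation:

$V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s') \right]$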

Page 12

Dynamic Programming: Optimal Policy

Page 13

Backup for Optimal Policy
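The backup for the optimal policy replaces the expectation over the policy's actions with a maximization, giving the Bellman optimality equation:

$V^{*}(s) = \max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V^{*}(s') \right]$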

Page 14

Performance Metrics

• Eventual convergence to optimality

• Speed of convergence to optimality

• Regret (total expected reward lost relative to always behaving optimally)

(Kaelbling, Littman, & Moore, 1996)

Page 15

Gridworld Example

Page 16

Initialize V arbitrarily, e.g. $V(s) = 0$, for all $s \in S^{+}$

Repeat
  $\Delta \leftarrow 0$
  For each $s \in S$:
    $v \leftarrow V(s)$
    $V(s) \leftarrow \max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V(s') \right]$
    $\Delta \leftarrow \max(\Delta, |v - V(s)|)$
until $\Delta < \theta$ (a small positive number)

Output a deterministic policy $\pi$ such that:
  $\pi(s) = \arg\max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma V(s') \right]$
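A runnable sketch of this loop on a tiny gridworld; the layout, step reward, γ, and θ are assumptions for illustration:

# Value iteration on a 4x4 gridworld: -1 per step, one terminal corner.
gamma, theta, N = 1.0, 1e-6, 4
states = [(i, j) for i in range(N) for j in range(N)]
terminal = (0, 0)
moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a):
    # Deterministic move; bumping into a wall leaves the state unchanged.
    i, j = s[0] + moves[a][0], s[1] + moves[a][1]
    s2 = (i, j) if 0 <= i < N and 0 <= j < N else s
    return s2, -1.0

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        if s == terminal:
            continue
        v = V[s]
        # V(s) <- max_a [R + gamma V(s')]   (transitions here are deterministic)
        V[s] = max(rew + gamma * V[s2] for s2, rew in (step(s, a) for a in moves))
        delta = max(delta, abs(v - V[s]))
    if delta < theta:
        break

# Deterministic greedy policy extracted from the converged values.
policy = {s: max(moves, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in states if s != terminal}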

Page 17

Temporal Difference Learning

• RL without a model
• Issue of temporal credit assignment
• Bootstraps like DP

• TD(0):

$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$

Page 18

TD Learning

• Again, TD(0):

$V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t$, where $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$

• TD(λ) applies the same error to every state in proportion to its eligibility:

$V(s) \leftarrow V(s) + \alpha \, \delta_t \, e_t(s)$ for all $s$, with $e_t(s) = \gamma \lambda \, e_{t-1}(s) + \mathbf{1}[s = s_t]$

where e is called an eligibility trace: it decays by γλ each step and is incremented when its state is visited.
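A minimal sketch of the tabular TD(λ) update; the transition format, α, γ, and λ below are assumptions, and setting λ = 0 recovers TD(0):

from collections import defaultdict

def td_lambda(transitions, V, alpha=0.1, gamma=0.99, lam=0.8):
    # Tabular TD(lambda) over a stream of (s, r, s_next, done) transitions
    # collected while following some fixed policy.
    e = defaultdict(float)                 # eligibility trace per state
    for s, r, s2, done in transitions:
        target = r + (0.0 if done else gamma * V.get(s2, 0.0))
        delta = target - V.get(s, 0.0)     # one-step TD error
        e[s] += 1.0                        # bump the visited state's trace
        for state in list(e):              # credit every eligible state
            V[state] = V.get(state, 0.0) + alpha * delta * e[state]
            e[state] *= gamma * lam        # decay traces
        if done:
            e.clear()                      # traces reset between episodes
    return V

# Hypothetical usage: two transitions from a made-up two-state chain.
V = td_lambda([("A", 0.0, "B", False), ("B", 1.0, "B", True)], {})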

Page 19

Backup Diagram for TD(λ)

Page 20

TD-Gammon (Tesauro)

Page 21

Additional Work

• POMDPs

• Macros

• Multi-agent RL

• Multiple reward structures