Rationale
• Learning from experience
• Adaptive control
• Examples not explicitly labeled, delayed feedback
• Problem of credit assignment – which action(s) led to the payoff?
• Tradeoff: short-term gain (immediate reward) vs. long-term consequences
• Transition function – T : S × A → S (the environment)
• Reward function – R : S × A → ℝ (the payoff)
• Stochastic but Markov
• Policy = decision function, π : S → A
• “Rationality” – maximize long-term expected reward
– Discounted long-term reward (a convergent series; see the formulation below)
– Alternatives: finite time horizon, uniform weights
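For reference, the discounted long-term reward being maximized is the standard geometric series, convergent for bounded rewards when 0 ≤ γ < 1:

$$V^{\pi}(s) \;=\; E\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \;\middle|\; s_0 = s,\ \pi\right], \qquad 0 \le \gamma < 1$$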
Agent Model
Markov Decision Processes (MDPs)
• If R and T (= P) are known, solve for the value function V(s)
• Policy evaluation
• Bellman equations
• Dynamic programming (|S| equations in |S| unknowns); a sketch follows below
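A minimal sketch of exact policy evaluation as that linear system. The representation (transition matrix `P_pi` under a fixed policy, reward vector `r_pi`, discount `gamma`) is an illustrative assumption, not the slides' notation:

```python
import numpy as np

def policy_evaluation(P_pi, r_pi, gamma):
    """Solve the Bellman equations V = r_pi + gamma * P_pi @ V exactly.

    P_pi : (|S|, |S|) transition matrix under a fixed policy pi
    r_pi : (|S|,) expected one-step reward under pi
    """
    n = len(r_pi)
    # (I - gamma * P_pi) V = r_pi  --  |S| equations in |S| unknowns
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
```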
• Finding optimal policies
– Value iteration: update V(s) iteratively until π(s) = argmax_a Q(s,a) stops changing (see the sketch after this list)
– Policy iteration: alternate between choosing π greedily and updating V over all states
– Monte Carlo sampling: run random scenarios under π and take the average rewards as V(s)
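A minimal value-iteration sketch under an assumed tabular representation (P indexed as state × action × next state, R per state–action pair); purely illustrative:

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """P: (|S|, |A|, |S|) transition probabilities; R: (|S|, |A|) rewards."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')
        Q = R + gamma * np.einsum("san,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=1)  # value function and greedy policy pi(s)
```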
Q-learning: model-free
• Q-function: reformulate the value function over states and actions, Q : S × A → ℝ, independent of R and T (= P); see the update rule below
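The standard tabular formulation implied here: Q* satisfies a Bellman equation, and the learned table Q̂ is updated from sampled transitions ⟨s, a, r, s′⟩ alone, so neither R nor T needs to be known:

$$Q^{*}(s,a) \;=\; E\big[\, r + \gamma \max_{a'} Q^{*}(s',a') \,\big]$$

$$\hat{Q}(s,a) \;\leftarrow\; \hat{Q}(s,a) + \alpha \big( r + \gamma \max_{a'} \hat{Q}(s',a') - \hat{Q}(s,a) \big)$$

(α is a learning rate; the greedy policy is π(s) = argmax_a Q̂(s,a).)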
Convergence
• Theorem: Q̂ converges to Q* after each state–action pair is visited infinitely often (assuming bounded rewards, |r| < ∞)
• Proof: with each iteration in which all of S × A is visited, the magnitude of the largest error in the Q̂ table decreases by at least a factor of γ, as sketched below
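One standard way to see the contraction (deterministic transitions for simplicity; Δ_n denotes the current largest error in the table):

\begin{align*}
|\hat{Q}_{n+1}(s,a) - Q^{*}(s,a)|
  &= \big|\big(r + \gamma \max_{a'} \hat{Q}_{n}(s',a')\big) - \big(r + \gamma \max_{a'} Q^{*}(s',a')\big)\big| \\
  &= \gamma \,\big|\max_{a'} \hat{Q}_{n}(s',a') - \max_{a'} Q^{*}(s',a')\big| \\
  &\le \gamma \max_{a'} \big|\hat{Q}_{n}(s',a') - Q^{*}(s',a')\big| \;\le\; \gamma\, \Delta_{n}
\end{align*}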
Training
• “On-policy”
– Exploitation vs. exploration
– Will the relevant parts of the space be explored if we stick to the current (sub-optimal) policy?
– ε-greedy policies: choose the action with the max Q value most of the time, or a random action with probability ε (see the sketch below)
• “Off-policy”
– Learn from simulations or traces
– SARSA: a training-example database of tuples ⟨s, a, r, s′, a′⟩
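A minimal tabular Q-learning loop with an ε-greedy policy. The `env` interface (`reset()` returning a state index; `step(a)` returning `(next_state, reward, done)`) and all hyperparameters are illustrative assumptions, not part of the slides:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: random action with probability epsilon,
            # otherwise exploit the current Q estimates
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # off-policy TD target: max over next actions, regardless of
            # which action the behavior policy will actually take
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```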
• Actor-critic: a learned critic (value estimate) evaluates the actions chosen by the actor (the policy); see the sketch below
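The slides do not elaborate, but a minimal tabular sketch of the idea: the critic learns V by TD(0), and its TD error “criticizes” the actor, a softmax policy over learned preferences. The `env` interface and hyperparameters are hypothetical, as above:

```python
import numpy as np

def actor_critic(env, n_states, n_actions, episodes=500,
                 alpha_v=0.1, alpha_pi=0.01, gamma=0.95):
    V = np.zeros(n_states)                    # critic: state values
    theta = np.zeros((n_states, n_actions))   # actor: action preferences
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # softmax policy over the actor's preferences
            p = np.exp(theta[s] - theta[s].max())
            p /= p.sum()
            a = int(rng.choice(n_actions, p=p))
            s_next, r, done = env.step(a)
            # TD error: the critic's evaluation of the chosen action
            delta = r + gamma * V[s_next] * (not done) - V[s]
            V[s] += alpha_v * delta
            # policy-gradient step: grad log pi(a|s) = one_hot(a) - p
            grad = -p
            grad[a] += 1.0
            theta[s] += alpha_pi * delta * grad
            s = s_next
    return V, theta
```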