Reinforcement Learning Lecture 1: Introduction Vien Ngo MLR, University of Stuttgart


Reinforcement Learning

Lecture 1: Introduction

Vien Ngo
MLR, University of Stuttgart


What is Reinforcement Learning?

– Reinforcement Learning is a subfield of Machine Learning

(adapted from David Silver’s lecture)


RL: A subfield of Machine Learning
(from Machine Learning course, 2011, Marc Toussaint)

• Supervised learning: learn from “labelled” data $\{(x_i, y_i)\}_{i=1}^{N}$

• Unsupervised learning: learn from “unlabelled” data $\{x_i\}_{i=1}^{N}$ only

• Semi-supervised learning: many unlabelled data, few labelled data

• Reinforcement learning: learn from data $\{(s_t, a_t, r_t, s_{t+1})\}$
  – learn a predictive model $(s, a) \mapsto s'$
  – learn to predict reward $(s, a) \mapsto r$
  – learn a behavior $s \mapsto a$ that maximizes the expected total reward
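A minimal sketch of the first two sub-goals above, assuming a small discrete problem: from logged tuples (s_t, a_t, r_t, s_{t+1}), a predictive model and a reward predictor can be estimated by simple counting. The states, actions and data are invented for illustration and do not come from the lecture.

```python
from collections import Counter, defaultdict

# Hypothetical logged transitions (s_t, a_t, r_t, s_{t+1}) of a toy two-state problem.
transitions = [
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "A"),
    ("A", "stay", 0.0, "A"),
    ("B", "go", 1.0, "A"),
    ("B", "stay", 0.5, "B"),
]


def estimate_model(data):
    """Estimate P(s' | s, a) and the mean reward R(s, a) by counting."""
    next_counts = defaultdict(Counter)  # (s, a) -> counts of observed s'
    reward_sum = defaultdict(float)     # (s, a) -> sum of observed rewards
    visits = Counter()                  # (s, a) -> number of samples
    for s, a, r, s_next in data:
        next_counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P = {
        sa: {s2: c / visits[sa] for s2, c in counts.items()}
        for sa, counts in next_counts.items()
    }
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


P_hat, R_hat = estimate_model(transitions)
print(P_hat[("A", "go")])  # empirical next-state distribution for (A, go)
print(R_hat[("B", "go")])  # empirical mean reward for (B, go)
```

The third sub-goal, learning a behavior that maximizes the expected total reward, is not covered by this sketch.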


What is Reinforcement Learning?

– RL is learning from interaction.
– There is no supervisor, only signals of reward/evaluative feedback.
– The order of decisions matters, because each decision affects the outcome of subsequent decisions.

from Satinder Singh’s Introduction to RL


Success of Reinforcement Learning

• Games
  – Backgammon (Tesauro, 1994)
  – Solitaire (X. Yan et al., 2005)
  – Chess
  – Checkers
  – Deep RL playing Atari games (2014, Google DeepMind)

• Operations Research
  – Inventory Management (Van Roy, Bertsekas, Lee, & Tsitsiklis, 1996)
  – Dynamic Channel Allocation (e.g. Singh & Bertsekas, 1997)
  – Vehicle Routing, etc.

• Economics
  – Trading

• Robotics
  – Robocup Soccer (e.g. Stone & Veloso, 1999)
  – Helicopter Control (e.g. Ng, 2003, Abbeel & Ng, 2006)
  – Many robots (navigation, bipedal walking, grasping, switching between skills, ...)


TD-Gammon, by Gerald Tesauro
(See section 11.1 in Sutton & Barto’s book.)

• See (Tesauro, 1992, 1994, 1995)

• Reward is given only at the end of the game, for a win.

• Self-play: use the current policy to sample moves on both sides!

• After about 300,000 games against itself, near the level of the world’s strongest grandmasters.
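To see how a reward given only at the end of a game can still shape the value of earlier positions, here is a hedged toy sketch. It is not TD-Gammon, which used TD(λ) with a neural network and self-play. This is tabular TD(0) on a hypothetical random walk, and the step size, episode count and environment are assumptions for illustration.

```python
import random

N_STATES = 5       # non-terminal positions 1..5; position 0 = loss, 6 = win
ALPHA = 0.02       # step size (assumed)
EPISODES = 20000   # number of simulated games (assumed)


def run_td0(seed=0):
    rng = random.Random(seed)
    # V[0] and V[N_STATES + 1] are terminal and are never updated (value 0).
    V = [0.0] * (N_STATES + 2)
    for _ in range(EPISODES):
        s = (N_STATES + 1) // 2                  # start in the middle
        while 0 < s < N_STATES + 1:
            s_next = s + rng.choice((-1, 1))     # random move left or right
            # Reward only at the end of the episode, and only for a win.
            r = 1.0 if s_next == N_STATES + 1 else 0.0
            # TD(0): move V(s) toward the bootstrapped target r + V(s').
            V[s] += ALPHA * (r + V[s_next] - V[s])
            s = s_next
    return V[1:N_STATES + 1]


values = run_td0()
print([round(v, 2) for v in values])
```

In this toy problem the value of position i is the probability of eventually winning from it, i/6, so the learned values should rise from left to right even though no intermediate move is ever rewarded.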


Go using UCT, by Gelly
(See Gelly et al., 2012, Communications of the ACM, for a review.)


Reinforcement Learning in Robotics

Learning motor skills (2000, by Schaal, Atkeson, Vijayakumar)
Autonomous helicopter flight (2007, Andrew Ng et al.)
Playing Atari games (2014, by Google DeepMind)


Reinforcement learning in neuroscience

(Yael Niv, ICML 2009 tutorial.)

Peter Dayan and Yael Niv, Neurobiology 2008:

• The brain employs both model-free and model-based decision-making strategies in parallel, with each dominating in different circumstances.


What is Reinforcement Learning?

$s_1, a_1, r_2, s_2, a_2, r_3, \dots, s_i, a_i, r_{i+1}, s_{i+1}, \dots$

• States can be vectors or other structures, defined as sufficient statistics to predict what happens next.

• Actions/controls can be multi-dimensional.

• Rewards are scalar but can be arbitrarily uninformative, and might be delayed; e.g., $r_t$ tells how well the agent does at time $t$ (after taking action $a_t$ at $s_t$).

• Objective: described as the maximization of the expected total reward.

• States are sometimes not directly observable; the agent then receives only observations:

$o_1, a_1, r_2, o_2, a_2, r_3, \dots, o_i, a_i, r_{i+1}, o_{i+1}, \dots$

• The agent has only partial knowledge about the environment, e.g. unknown dynamics, reward, or observation functions.
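The interaction sequence above can be sketched as a loop between an agent and an environment. The corridor environment, its reward and the fixed policy below are hypothetical and chosen only to show a hidden state behind a coarse observation; nothing here is taken from the lecture.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..4, reward 1 on reaching 4."""

    GOAL = 4

    def __init__(self):
        self.state = 0  # hidden state s_t, never shown to the agent

    def observation(self):
        # The agent only sees whether it is at the goal, not its position.
        return "goal" if self.state == self.GOAL else "corridor"

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clip the move.
        self.state = min(max(self.state + action, 0), self.GOAL)
        reward = 1.0 if self.state == self.GOAL else 0.0
        done = self.state == self.GOAL
        return self.observation(), reward, done


def rollout(policy, max_steps=50):
    """Run one episode and record the tuples (o_t, a_t, r_{t+1})."""
    env = Corridor()
    obs = env.observation()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory


def always_right(obs):
    return 1


traj = rollout(always_right)
for o, a, r in traj:
    print(o, a, r)
```

Every pre-goal position produces the same observation, so the agent cannot tell from a single observation how far it is from the reward, and the reward itself arrives only on the final step.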


What is Reinforcement Learning?

– Examples of rewards:

• +1/−1 for winning/losing a game, e.g. Go, Backgammon, ...
• +/− for increasing/decreasing the score, e.g. in deep RL algorithms playing Atari games.
• +/− rewards for earning/losing money in managing an investment portfolio.
• +/− rewards for following the desired trajectory/for crashing in controlling a stunt helicopter.
• etc.


Components of an RL Agent

– Policy: defines the behaviour of the agent, e.g. a mapping $\pi : S \mapsto A$ or $\pi : S \times A \mapsto [0, 1]$

– Value function: the expected return when starting from a given state and following the policy $\pi$, where $\gamma$ is the discount factor.

$V^{\pi}(s) = E_{\pi}\left[\sum_{t} \gamma^{t} R_{t} \mid s_0 = s\right]$

– Model: the agent’s internal representation of the environment, e.g. $P(s' \mid s, a)$, $R(s, a, s')$.
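The three components can be tied together in a small sketch: given a model and a fixed policy, the value function follows by repeatedly applying the one-step expectation (iterative policy evaluation). The two-state MDP, its numbers and the discount factor are assumptions for illustration, and the reward is simplified to an expected reward R(s, a) rather than R(s, a, s').

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical two-state model.
# P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
P = {
    0: {"stay": [(1.0, 0)], "move": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
R = {
    0: {"stay": 0.0, "move": 0.0},
    1: {"stay": 1.0, "move": 0.0},
}

# Deterministic policy pi : S -> A.
policy = {0: "move", 1: "stay"}


def evaluate_policy(policy, P, R, gamma, tol=1e-10):
    """Iterate V(s) <- R(s, pi(s)) + gamma * sum_s' P(s' | s, pi(s)) V(s')."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v_new = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


V_pi = evaluate_policy(policy, P, R, GAMMA)
print(V_pi)
```

For this policy the fixed point can be checked by hand: V(1) = 1 + 0.9 V(1) gives V(1) = 10, and V(0) = 0.9 (0.8 · 10 + 0.2 V(0)) gives V(0) = 7.2 / 0.82.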


Schedule of this course

• Part 1: The Basics
  • Markov Decision Process (MDP), Partially Observable MDP (POMDP).
  • Dynamic Programming: Value Iteration, Policy Iteration

• Part 2: Reinforcement Learning Topics
  • Temporal Difference learning, Q-Learning.
  • Reinforcement learning with function approximation
  • Policy search

• Part 3: Advanced Topics
  • Inverse reinforcement learning, imitation learning.
  • Exploration vs. Exploitation: Multi-armed bandits, PAC-MDP, Bayesian reinforcement learning.
  • Hierarchical reinforcement learning: macro actions, skill acquisition.
  • Deep reinforcement learning
  • Reinforcement learning in POMDP environments.


• Missing:
  – Relational MDP
  – MDP/POMDP/RL as Inference


Literature

Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts; London, England, 1998.
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

Csaba Szepesvári: Algorithms for Reinforcement Learning. Morgan & Claypool, July 2010.
http://www.ualberta.ca/~szepesva/RLBook.html


Organisation

• Course webpage: https://ipvs.informatik.uni-stuttgart.de/mlr/reinforcement-learning-ss15/
  – Slides, exercises
  – Links to other resources

• Secretary, admin issues: Carola Stahl, [email protected], Room 2.217

• Lecture: Tue. 09:45-11:15, Room 0.124
• Tutorial: Wed. 14:00-15:30, Room 0.108

• Rules for the tutorials:
  – Doing the exercises is crucial!
  – At the beginning of each tutorial:
    – sign into a list
    – mark which exercises you have (successfully) worked on
  – Students are randomly selected to present their solutions
  – You need to have completed 50% of the exercises to be admitted to the exam

(Prof. Marc Toussaint’s rules.)