Page 1:

Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another

Lisa Torrey, Trevor Walker, Jude Shavlik

University of Wisconsin-Madison, USA

Richard Maclin
University of Minnesota-Duluth, USA

Page 2: Our Goal

Transfer knowledge…

… between reinforcement learning tasks

… employing SVM function approximators

… using advice

Page 3: Transfer

Learn the first task, then apply the knowledge acquired to learn a related task.

Exploit previously learned models

Improve learning of new tasks

[Plot: performance vs. experience, comparing learning with transfer and without transfer]

Page 4: Reinforcement Learning

State … action … reward … new state

Q-function: the value of taking an action from a state
Policy: take the action with max Qaction(state)

[Diagram: an agent acting in its environment, receiving rewards such as +2, -1, and 0]
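To make the policy concrete, here is a minimal Python sketch of greedy action selection from per-action Q-functions; the linear models and weights are hypothetical placeholders, not the models learned in the paper.

```python
# A minimal sketch of the greedy policy: take the action with max Qaction(state).
# The per-action linear Q-models here are hypothetical placeholders.
import numpy as np

def greedy_action(q_models, features):
    """Return the action whose Q-estimate is highest in this state."""
    scores = {a: float(np.dot(w, features) + b)
              for a, (w, b) in q_models.items()}
    return max(scores, key=scores.get)

# Example with made-up weights for two actions.
q_models = {"pass": (np.array([0.5, -0.2]), 0.1),
            "hold": (np.array([0.1, 0.3]), 0.0)}
print(greedy_action(q_models, np.array([1.0, 2.0])))  # -> "hold"
```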

Page 5: Advice for Transfer

Task A Solution: “Based on what worked in Task A, I suggest…”

Task B Learner: “I’ll try it, but if it doesn’t work I’ll do something else.”

Advice improves RL performance

Advice can be refined or even discarded

Page 6: Transfer Process

Task A experience + advice from user (optional) → Task A Q-functions

Mapping from user: Task A → Task B

Task A Q-functions + mapping → Transfer Advice

Task B experience + Transfer Advice + advice from user (optional) → Task B Q-functions

Page 7: RoboCup Soccer Tasks

KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]

BreakAway: score a goal [Maclin et al., AAAI 2005]

Page 8: RL in RoboCup Tasks

            KeepAway             BreakAway
Features                         (time left)
Actions     Pass, Hold           Pass, Move, Shoot
Rewards     +1 each time step    At end: +2, 0, or -1

Page 9: Transfer Process

Task A experience → Task A Q-functions

Mapping from user: Task A → Task B

Task A Q-functions + mapping → Transfer Advice

Task B experience + Transfer Advice → Task B Q-functions

Page 10: Approximating Q-Functions

Given examples: state features Si = ⟨f1, …, fn⟩ and estimated values y ≈ Qaction(Si)

Learn linear coefficients: y = w1 f1 + … + wn fn + b

Non-linearity from Boolean tile features: tilei,lower,upper = 1 if lower ≤ fi < upper
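A short sketch of this approximator, with made-up tile boundaries and weights (the real features and tilings come from the RoboCup tasks):

```python
# Boolean tile features appended to the raw features, then a linear model
# y = w1*f1 + ... + wn*fn + b. Tiles and weights below are illustrative.
import numpy as np

def tile_features(features, tiles):
    """tiles: list of (feature_index, lower, upper); each yields a 0/1 feature."""
    return np.array([1.0 if lo <= features[i] < hi else 0.0
                     for (i, lo, hi) in tiles])

def q_estimate(features, tiles, w, b):
    """Linear Q-estimate over raw features plus tile features."""
    x = np.concatenate([features, tile_features(features, tiles)])
    return float(np.dot(w, x) + b)

# Example: one raw feature, tiled into [0, 5) and [5, 10).
tiles = [(0, 0.0, 5.0), (0, 5.0, 10.0)]
w = np.array([0.2, 1.0, -1.0])  # weights for [f1, tile1, tile2]
print(q_estimate(np.array([7.0]), tiles, w, b=0.5))  # 0.2*7 - 1.0 + 0.5 = 0.9
```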

Page 11: Support Vector Regression

Linear program: given a matrix of states S and a vector of Q-estimates y,

minimize   ||w||1 + |b| + C ||k||1
such that  y − k ≤ Sw + b ≤ y + k
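This linear program can be posed for an off-the-shelf solver. The sketch below uses scipy and the standard trick of splitting each L1 term into nonnegative variables; it illustrates the formulation above and is not the authors' implementation.

```python
# Sketch: solve  min ||w||1 + |b| + C||k||1  s.t.  y - k <= Sw + b <= y + k
# by splitting w and b into nonnegative parts (w = w+ - w-, b = b+ - b-).
import numpy as np
from scipy.optimize import linprog

def svr_lp(S, y, C=1.0):
    m, n = S.shape
    # Variable vector: [w+ (n), w- (n), b+, b-, k (m)], all >= 0.
    c = np.concatenate([np.ones(2 * n + 2), C * np.ones(m)])
    ones = np.ones((m, 1))
    I = np.eye(m)
    A_ub = np.vstack([
        np.hstack([ S, -S,  ones, -ones, -I]),   #  Sw + b - k <= y
        np.hstack([-S,  S, -ones,  ones, -I]),   # -(Sw + b) - k <= -y
    ])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    w = res.x[:n] - res.x[n:2 * n]
    b = res.x[2 * n] - res.x[2 * n + 1]
    return w, b

# Example: three one-feature states with an almost-linear trend.
w, b = svr_lp(np.array([[1.0], [2.0], [3.0]]), np.array([2.1, 3.9, 6.0]))
```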

Page 12: Transfer Process

Task A experience → Task A Q-functions

Mapping from user: Task A → Task B

Task A Q-functions + mapping → Transfer Advice

Task B experience + Transfer Advice → Task B Q-functions

Page 13: Advice Example

Need only follow advice approximately

Add soft constraints to linear program

if distance_to_goal ≤ 10
and shot_angle ≥ 30
then prefer shoot over all other actions

Page 14: Incorporating Advice [Maclin et al., AAAI 2005]

if v11 f1 + … + v1n fn ≤ d1
…
and vm1 f1 + … + vmn fn ≤ dm
then Qshoot > Qother for all other actions

Advice and Q-functions have the same language: linear expressions of features
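One simplified way to picture those soft constraints (my own per-state approximation for illustration; the paper builds on the knowledge-based formulation of Maclin et al., 2005): wherever the advice preconditions hold on training data, ask the preferred action's Q-value to win, paying a penalized slack when it cannot.

```python
# A simplified per-state sketch of softening advice, not the knowledge-based
# formulation of Maclin et al. (2005).
import numpy as np

def advice_constraints(states, preconds, preferred, others):
    """Yield (features, preferred, other) triples where the advice applies.

    preconds: list of (v, d) pairs meaning the condition v . f <= d.
    Each yielded triple would become the soft LP constraint
        Q_preferred(f) - Q_other(f) >= -slack,  slack >= 0,
    with a penalty times slack added to the minimization objective.
    """
    for f in states:
        if all(np.dot(v, f) <= d for v, d in preconds):
            for other in others:
                yield f, preferred, other
```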

Page 15: Transfer Process

Task A experience → Task A Q-functions

Mapping from user: Task A → Task B

Task A Q-functions + mapping → Transfer Advice

Task B experience + Transfer Advice → Task B Q-functions

Page 16: Expressing Policy with Advice

Old Q-functions: Qhold_ball(s), Qpass_near(s), Qpass_far(s)

Advice expressing the policy:

if Qhold_ball(s) > Qpass_near(s)
and Qhold_ball(s) > Qpass_far(s)
then prefer hold_ball over all other actions
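A minimal sketch of this construction, assuming linear Q-models stored as (weights, bias) pairs of numpy arrays: since the Q-functions are linear, each comparison Qa(s) > Qother(s) is itself a linear precondition.

```python
# Express an old model's policy as advice: one rule per action, whose
# preconditions say that action's Q-value beat every other action's.
# Illustrative, not the authors' code; weights are numpy arrays.
def policy_advice(q_models):
    """q_models: dict action -> (w, b) for the linear model Q_a(s) = w.f + b.

    Returns rules as (preconditions, preferred_action), where each
    precondition (dw, db) encodes dw . f + db > 0, i.e. Q_a(s) > Q_other(s).
    """
    rules = []
    for a, (wa, ba) in q_models.items():
        preconds = [(wa - wo, ba - bo)
                    for o, (wo, bo) in q_models.items() if o != a]
        rules.append((preconds, a))
    return rules
```

Combined with an action mapping (next slide), only the rule's conclusion changes: the preferred action is replaced by its Task B counterpart.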

Page 17: Mapping Actions

Mapping from user:
hold_ball → move
pass_near → pass_near
pass_far

Old Q-functions: Qhold_ball(s), Qpass_near(s), Qpass_far(s)

Mapped policy:

if Qhold_ball(s) > Qpass_near(s)
and Qhold_ball(s) > Qpass_far(s)
then prefer move over all other actions

Page 18: Mapping Features

Q-function mapping, applying the mapping from user:

Qhold_ball(s) = w1 (dist_keeper1) + w2 (dist_taker2) + …
Q´hold_ball(s) = w1 (dist_attacker1) + w2 (MAX_DIST) + …
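A sketch of this substitution, with illustrative names: Task A features are renamed to their Task B counterparts, and a feature with no counterpart is replaced by a constant (MAX_DIST on the slide), which folds into the bias.

```python
# Rewrite an old linear Q-function over Task A features into Task B terms.
# Feature names and the MAX_DIST value below are illustrative assumptions.
def map_q_function(weights, feature_map, constants):
    """weights: dict Task A feature -> coefficient.
    feature_map: Task A feature -> Task B feature (or None if absent).
    constants: Task A feature -> constant value used when absent.
    Returns (Task B weights, extra bias folded in from constants)."""
    new_weights, extra_bias = {}, 0.0
    for feat, w in weights.items():
        target = feature_map.get(feat)
        if target is not None:
            new_weights[target] = new_weights.get(target, 0.0) + w
        else:
            extra_bias += w * constants[feat]  # e.g. w2 * MAX_DIST on the slide
    return new_weights, extra_bias

# Example mirroring the slide.
w, b = map_q_function({"dist_keeper1": 0.8, "dist_taker2": -0.3},
                      {"dist_keeper1": "dist_attacker1", "dist_taker2": None},
                      {"dist_taker2": 25.0})  # hypothetical MAX_DIST
```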

Page 19: Transfer Example

Old model:
Qx = wx1 f1 + wx2 f2 + bx
Qy = wy1 f1 + by
Qz = wz2 f2 + bz

Mapped model:
Q´x = wx1 f´1 + wx2 f´2 + bx
Q´y = wy1 f´1 + by
Q´z = wz2 f´2 + bz

Advice:
if Q´x > Q´y
and Q´x > Q´z
then prefer x´

Advice (expanded):
if wx1 f´1 + wx2 f´2 + bx > wy1 f´1 + by
and wx1 f´1 + wx2 f´2 + bx > wz2 f´2 + bz
then prefer x´ to all other actions

Page 20: Transfer Experiment

Between RoboCup subtasks: from 3-on-2 KeepAway to 2-on-1 BreakAway

Two simultaneous mappings:
Transfer passing skills
Map passing skills to shooting

Page 21: Experiment Mappings

Play a moving KeepAway game: Pass → Pass, Hold → Move

Pretend a teammate is standing in the goal: Pass → Shoot

[Diagram: an imaginary teammate positioned in the goal]

Page 22: Experimental Methodology

Averaged over 10 BreakAway runs

Transfer: advice from one KeepAway model

Control: runs without advice

Page 23: Results

[Plot: Probability(Score Goal), 0 to 0.8, vs. Games Played, 0 to 10,000]

Page 24: Analysis

Transfer advice helps BreakAway learners: 7% more likely to score a goal after learning

Improvement is delayed: the advantage begins after 2500 games

Some advice rules apply rarely: preconditions for the shoot advice are not often met

Page 25: Related Work: Transfer

Remember action subsequences [Singh, ML 1992]

Restrict action choices [Sherstov & Stone, AAAI 2005]

Transfer Q-values directly in KeepAway [Taylor & Stone, AAMAS 2005]

Page 26: Related Work: Advice

“Take action A now” [Clouse & Utgoff, ICML 1992]

“In situations S, action A has value X” [Maclin & Shavlik, ML 1996]

“In situations S, prefer action A over B” [Maclin et al., AAAI 2005]

Page 27: Future Work

Increase speed of linear-program solving

Decrease sensitivity to imperfect advice

Extract advice from kernel-based models

Help user map actions and features

Page 28: Conclusions

Transfer exploits previously learned models to improve learning of new tasks

Advice is an appealing way to transfer

Linear regression approach incorporates advice straightforwardly

Transferring a policy accommodates different reward structures

Page 29: Acknowledgements

DARPA grant HR0011-04-1-0007
United States Naval Research Laboratory grant N00173-04-1-G026
Michael Ferris, Olvi Mangasarian, Ted Wild