Machine Learning
and Robotics
ICML 2011 tutorial, Bellevue, 28th June 2011
Marc Toussaint, FU Berlin
• Please ask questions!
Machine Learning & Robotics tutorial = mission impossible!
– impossible to cover all topics → biased selection
– impossible to cover all literature → sorry if I miss your work!
• no emphasis on SLAM & vision
– more emphasis on control, articulated robots, manipulation
• Goals of this tutorial:
– Provide an overview of learning problems in Robotics
... where ML can/has contributed, and mention literature
– Encourage Machine Learners to think more about Robotics
... to understand inherent problems in robotics
... to consider formalizing the specific structure of robotic problems
2/82
First, two little comments on Roboticists vs. MLers...
3/82
“I'm tired of programming my robot! Can't you make it learn?” – “Sure! Where's the data?” – “What data do you need?” – “Shouldn't you know?”
4/82
• Robotics is about interaction with the environment
– Collected data depends on actions
– Goal of learning: enable behavior!
(sequential decision making, long horizon control)
5/82
• Implications:
– Benchmarking a method involves running the system!
– Different to ML in Computer Vision (Pascal challenge, Middlebury) or other standard benchmarking in ML
– no “data → ML expert → results → application expert” pipeline
– only a few examples where pure “learning from a data set” is useful in robotics (e.g., calibration, system identification, SLAM)
• This slows learning research in Robotics!
6/82
2011 IEEE International Conference on Robotics and Automation (ICRA 2011) Technical Program, May 10–12, 2011

[Program grid: up to 15 parallel tracks per time slot over three days. Session topics include Aerial Robotics, Autonomous Navigation, SLAM, Localization and Mapping, Grasping, Manipulation Planning, Motion and Path Planning, Humanoid Robots, Legged Locomotion, Medical Robots and Systems, Biologically-Inspired Robots, Rehabilitation Robotics, Computer Vision for Robotics and Automation, Teleoperation, Haptics, Micro-Nano Robots, Calibration and Identification, Distributed Robot Systems, ... and Learning and Adaptive Systems I–IV.]
Sessions with “learning papers” (keyword in title or abstract): Learning & Adaptive Systems, Recognition, Motion & Path Planning, Grasping, Adaptive Control
7/82
• The field of robotics is huge!
If Robotics were a huge pizza... ML = the chillies; the rest = “infrastructure”
• bits and pieces of learning here and there
ML on the system level?
• Implications:
– good: many possibilities for adaptivity and learning
– the integrated system is hard to “formalize”/make subject to ML
8/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
9/82
Part I:
Learning problems in Robotics – the RL view
10/82
• “The Reinforcement Learning view”: viewed in the framework of Markov Decision Processes / Stochastic Control
→ unifying notation, accessible to MLers

• BUT: solving the Mountain Car problem ≠ solving Robotics
(→ Discussion)
11/82
Markov Decision Process & Optimal Control
[Graphical model: states x_0, ..., x_T; controls u_0, ..., u_T; rewards/costs r_0, ..., r_T]

P(x_{0:T}, u_{0:T}, r_{0:T}; π) = P(x_0) P(u_0|x_0; π) P(r_0|u_0, x_0) ∏_{t=1}^T P(x_t|u_{t-1}, x_{t-1}) P(u_t|x_t; π) P(r_t|u_t, x_t)

State x_t
Control/Action u_t
Process P(x_0) and P(x_{t+1} | u_t, x_t)
Reward/Cost r_t(x_t, u_t) or c_t(x_t, u_t)
Control policy π(u_t | x_t) or u_t = π(x_t) (deterministic)
12/82
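The factored joint above can be made concrete by sampling. A minimal sketch on an assumed toy finite MDP (the sizes, random transition model, and reward table are illustrative, not from the tutorial):

```python
import numpy as np

# Sample a trajectory from the factored joint P(x_{0:T}, u_{0:T}, r_{0:T}; pi)
# of an assumed toy finite MDP with a uniform stochastic policy pi(u|x).
rng = np.random.default_rng(0)
n_states, n_actions, T = 3, 2, 10

P0 = np.array([1.0, 0.0, 0.0])                                    # P(x0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(x'|x,u)
R = rng.standard_normal((n_states, n_actions))                    # r(x,u), deterministic here
pi = np.full((n_states, n_actions), 1.0 / n_actions)              # pi(u|x)

def sample_trajectory():
    x = rng.choice(n_states, p=P0)
    traj = []
    for t in range(T + 1):
        u = rng.choice(n_actions, p=pi[x])   # u_t ~ pi(.|x_t)
        r = R[x, u]                          # r_t = r(x_t, u_t)
        traj.append((x, u, r))
        x = rng.choice(n_states, p=P[x, u])  # x_{t+1} ~ P(.|u_t, x_t)
    return traj

traj = sample_trajectory()
print(len(traj))  # T+1 = 11 steps
```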
Markov Decision Process & Optimal Control
• typical MDP case:
– infinite time horizon: T → ∞
– stationary world & rewards: P(x'|u,x) and r(x,u) independent of t
– discounting: total return ∑_{t=0}^T γ^t r(x_t, u_t)
→ stationary policy π(u|x) independent of t

• typical Stochastic Optimal Control case:
– finite time horizon T
– costs depend on absolute time: c_t(x_t, u_t)
– total costs non-discounted: C(x_{0:T}, u_{0:T}) = ∑_{t=0}^T c_t(x_t, u_t)
→ non-stationary control policy π_t(u_t|x_t)
13/82
One way to introduce Robotics basics is to consider three basic control problems:
• 1-step kinematic process
• 1-step dynamic process
• T -step dynamic process
14/82
Kinematics
• We consider a 1-step kinematic control problem, where we know the state at time t and optimize controls u_t to minimize costs at t+1.

State x_t ∈ R^n = joint angles
Control u_t ∈ R^n = commanded change in joint angles
Process (deterministic): x_{t+1} = x_t + u_t
Costs in the 1-step kinematic case are of the form
c(x_{t+1}, u_t) = ||u_t||²_W + ||φ(x_{t+1}) − y*||²/σ²
15/82
Kinematics
c(x_{t+1}, u) = ||u||²_W + ||φ(x_{t+1}) − y*||²/σ²

• The word kinematics refers to φ(q):
– defines to-be-controlled state features (task variables), e.g. left-hand position, right-hand position
– φ is determined by the geometry (“kinematics”) of the robot
– Roboticists know how to compute φ(q) and J = ∂φ(q)/∂q
– y* says what the targets are
16/82
Kinematics
• Using a local linearization of φ, this has a simple solution:

c(x_{t+1}, u) = ||u||²_W + ||φ(x_{t+1}) − y*||²/σ² ,  x_{t+1} = x_t + u
c(u) = ||u||²_W + ||φ(x_t + u) − y*||²/σ²
     = ||u||²_W + ||φ(x_t) − y* + Ju||²/σ² ,  J := ∂φ/∂q

argmin_u c(u) = (JᵀJ + σ²W)⁻¹ Jᵀ (y* − φ(x_t))

This choice of control is called inverse kinematics.

MLers note: the problem and solution are identical to those of Ridge regression.

17/82
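A minimal numpy sketch of the inverse-kinematics step above; the planar 2-link arm, its link lengths, and the target are assumptions for illustration, not from the slides:

```python
import numpy as np

# Regularized inverse kinematics u* = (J^T J + sigma^2 W)^{-1} J^T (y* - phi(q))
# on an assumed planar 2-link arm.
l1, l2 = 1.0, 1.0

def phi(q):
    """Task map phi: end-effector position of the 2-link arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """J = d phi / d q."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik_step(q, y_star, W=np.eye(2), sigma=0.1):
    J = jacobian(q)
    return np.linalg.solve(J.T @ J + sigma**2 * W, J.T @ (y_star - phi(q)))

q = np.array([0.3, 0.5])
y_star = np.array([1.2, 0.8])
for _ in range(50):            # iterate the 1-step control until convergence
    q = q + ik_step(q, y_star)
print(np.round(phi(q), 3))     # close to y_star
```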
Kinematics
[demo: roboticsCourse2 ./x.exe -mode 2 3 4]
What can ML contribute?
18/82
Learning the kinematics
• If the kinematics φ are unknown, learn them from data!
Literature:
Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)
19/82
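Learning φ from data is, at its simplest, regression from joint angles to observed task positions. A minimal sketch with ridge regression on RBF features; the 1-DoF toy “robot” y = sin(q), the feature centers, and the bandwidth are assumptions standing in for the richer models (GPs, locally weighted regression) of the cited papers:

```python
import numpy as np

# Fit an unknown kinematic map phi from data {(q_i, y_i)} by ridge regression
# on RBF features; the toy system y = sin(q) is assumed for illustration.
rng = np.random.default_rng(1)
Q = rng.uniform(-np.pi, np.pi, size=(200, 1))        # joint angles
Y = np.sin(Q) + 0.01 * rng.standard_normal(Q.shape)  # observed task positions

centers = np.linspace(-np.pi, np.pi, 20)
def features(q):
    return np.exp(-0.5 * (q - centers) ** 2 / 0.3**2)  # RBF features

Phi = features(Q)                                     # (200, 20) design matrix
lam = 1e-3                                            # ridge regularizer
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(20), Phi.T @ Y)

y_hat = (features(np.array([[0.5]])) @ beta)[0, 0]
print(round(y_hat, 3))   # approx sin(0.5)
```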
Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)
20/82
Dynamics
• We consider a 1-step dynamic control problem, where we know the state at time t = 0 and optimize the controls u_t to minimize costs at t = 1.

State x_t = (q_t, q̇_t) ∈ R^{2n}: joint angles & velocities
Control u_t ∈ R^n = torques (angular forces) applied in each joint
Process (deterministic): x_{t+1} = (I + A) x_t + B u_t + a
(A and B are local linearizations of the system dynamics, see appendix)
Costs in the 1-step dynamic case are of the form
c(x_{t+1}, u) = ||u||²_H + ||φ(x_{t+1}) − y*||²/s²
y* determines (e.g.) desired accelerations in state features φ
21/82
Dynamics
• Using the local linearizations of the process and the kinematics φ, this has a simple solution:

c(x_{t+1}, u) = ||u||²_H + ||φ(x_{t+1}) − y*||²/s²
             = ||u||²_H + ||φ(x_t) + J(A x_t + B u + a) − y*||²/s²

argmin_u c(u) = (BᵀJᵀJB + s²H)⁻¹ BᵀJᵀ [y* − φ(x_t) − J(A x_t + a)]

This is optimal 1-step dynamic control.

(It includes so-called optimal operational space control as a special case; see also Peters et al: A unifying framework for the control of robotics systems (IROS 2005))
22/82
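A minimal numpy sketch of the 1-step dynamic control law u* = (BᵀJᵀJB + s²H)⁻¹BᵀJᵀ[y* − φ(x) − J(Ax + a)]; the small random matrices standing in for the local linearizations A, B, J, and the linear stand-in for φ, are all assumptions:

```python
import numpy as np

# Optimal 1-step dynamic control on assumed toy linearizations.
rng = np.random.default_rng(2)
n, m = 4, 2                      # toy state and control dimensions
A = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
a = 0.1 * rng.standard_normal(n)
J = rng.standard_normal((2, n))  # local linearization of the task map
H = np.eye(m); s = 0.1
x = rng.standard_normal(n)
phi_x = J @ x                    # pretend phi is linear for the sketch
y_star = np.array([0.5, -0.2])

u = np.linalg.solve(B.T @ J.T @ J @ B + s**2 * H,
                    B.T @ J.T @ (y_star - phi_x - J @ (A @ x + a)))

# sanity check: one step reduces the task-space error
x_next = (np.eye(n) + A) @ x + B @ u + a
print(np.round(J @ x_next - y_star, 3))
```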
Dynamics
[demo: roboticsCourse3 ./x.exe -control 0 1]
What can ML contribute?
23/82
Learning the dynamics
• If the dynamics ẋ = f(x, u) are unknown, learn them from data!
Literature:
Moore: Acquisition of Dynamic Control Knowledge for a Robotic Manipulator (ICML 1990)

Atkeson, Moore & Schaal: Locally weighted learning for control (Artificial Intelligence Review, 1997)

Schaal, Atkeson & Vijayakumar: Real-Time Robot Learning with Locally Weighted Statistical Learning (ICRA 2000)

Vijayakumar et al: Statistical learning for humanoid robots (Autonomous Robots, 2002)
24/82
(Schaal, Atkeson, Vijayakumar)
• Use a simple regression method (locally weighted Linear Regression) to estimate ẋ = f(x, u)
25/82
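The locally weighted regression idea above can be sketched in a few lines; the 1-D toy dynamics ẋ = −x + u, the Gaussian kernel, and the bandwidth are assumptions for illustration:

```python
import numpy as np

# Locally weighted linear regression (Atkeson/Moore/Schaal style) for
# learning a 1-D dynamics x_dot = f(x, u); toy system x_dot = -x + u assumed.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 2))                # columns: (x, u)
y = -X[:, 0] + X[:, 1] + 0.01 * rng.standard_normal(300)

def lwr_predict(query, X, y, h=0.5):
    """Fit a weighted linear model around `query` and evaluate it there."""
    w = np.exp(-0.5 * np.sum((X - query) ** 2, axis=1) / h**2)
    Xb = np.hstack([X, np.ones((len(X), 1))])        # add bias term
    WX = Xb * w[:, None]
    beta = np.linalg.solve(Xb.T @ WX + 1e-8 * np.eye(3), WX.T @ y)
    return np.append(query, 1.0) @ beta

pred = lwr_predict(np.array([0.5, -0.3]), X, y)
print(round(pred, 3))   # approx -0.5 + (-0.3) = -0.8
```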
Beyond 1-step horizon: Motion Planning & Skills
• The 1-step processes with quadratic costs can, with local linearization, be solved analytically.
– Basic role of learning: learning the model (kinematics or dynamics)

• Planning (multi-step processes) cannot be solved in that way – we need the notion of a value function or cost-to-go function.
26/82
Stochastic Optimal Control
• In the multi-step case of horizon T, the cost function is of the form

C(x_{0:T}, u_{0:T}) = ∑_{t=0}^T c_t(x_t, u_t)

• Define the optimal value function (aka cost-to-go function)

V_t(x_t) = min_{u_{t:T}} ∑_{k=t}^T ⟨c_k(x_k, u_k)⟩_{x_k | u_{t:k}, x_t}
         = min_{u_t} [ c_t(x_t, u_t) + min_{u_{t+1:T}} ∑_{k=t+1}^T ⟨c_k(x_k, u_k)⟩_{x_k | u_{t:k}, x_t} ]
         = min_{u_t} [ c_t(x_t, u_t) + ∫ P(x_{t+1} | u_t, x_t) V_{t+1}(x_{t+1}) dx_{t+1} ]

u*_t = argmin_{u_t} [ c_t(x_t, u_t) + ∫ P(x_{t+1} | u_t, x_t) V_{t+1}(x_{t+1}) dx_{t+1} ]

Bellman optimality principle

• Dynamic Programming: compute V_t(x) backward, starting with V_{T+1}(x) = 0
27/82
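The backward recursion above can be sketched directly on an assumed toy discrete MDP (sizes, random transitions, and costs are illustrative):

```python
import numpy as np

# Finite-horizon dynamic programming: V_{T+1} = 0, then V_t backwards.
rng = np.random.default_rng(4)
n_states, n_actions, T = 5, 3, 20
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(x'|x,u)
c = rng.uniform(0, 1, size=(n_states, n_actions))                  # c(x,u)

V = np.zeros((T + 2, n_states))             # V[T+1] = 0
pi = np.zeros((T + 1, n_states), dtype=int)
for t in range(T, -1, -1):                  # backward recursion
    Q = c + P @ V[t + 1]                    # Q[x,u] = c(x,u) + E[V_{t+1}(x')]
    V[t] = Q.min(axis=1)
    pi[t] = Q.argmin(axis=1)

print(V[0])                                 # optimal cost-to-go at t = 0
```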
Stochastic Optimal Control
[demo: git/mlr/share/robot/10-optimizationBenchmarks]
(Here, we optimized the control using probabilistic inference – see later.)
What can ML contribute?
28/82
Five approaches to learning optimal control
[Diagram: five routes from data to a policy.
From experience data D = {(x_t, u_t, c_t)}_{t=0}^T:
– Model-based RL: learn a model P(x'|u,x), c(x,u); dynamic programming then yields V_t(x) and a policy π_t(x)
– Model-free RL: learn the value function V(x) directly, which yields a policy π(x)
– Policy Search: optimize the policy π(x) directly
From demonstration data D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n:
– Imitation Learning: learn a policy π(x) directly
– Inverse RL: learn the latent costs c(x); dynamic programming then yields a policy π(x)]
29/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
30/82
1. Model learning (model-based RL)
D = {(x_t, u_t, c_t)}_{t=0}^T  →(learn)→  P(x'|u,x)  →(DP)→  V_t(x)  →  π_t(x)
Literature:
Exactly as for Learning Dynamics & Kinematics in 1-step case
31/82
2. Model-free RL
D = {(x_t, u_t, r_t)}_{t=0}^T  →(learn)→  V_t(x)  →  π_t(x)
• Use ML to directly estimate the value function V (x)
Literature:
Gordon: Stable function approximation in dynamic programming. DTIC Document, 1995.
Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003).
Rasmussen & Kuss: Gaussian Processes in Reinforcement Learning (NIPS 2004)
Engel, Mannor & Meir: Reinforcement Learning with Gaussian Processes. (ICML 2005)
Mahadevan & Maggioni: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes (JMLR 2007)
32/82
LSPI: Least Squares Policy Iteration
Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003)

(I’ll explain it here in terms of the value function instead of the Q-function.)

• The value function fulfils

V(x) = r(x, π(x)) + γ ∑_{x'} P(x' | π(x), x) V(x')

• If we have T data points D = {(x_t, u_t, r_t, x_{t+1})}_{t=1}^T, we require that this equation holds (approximately) for these data points:

∀t : V(x_t) = r_t + γ V(x_{t+1})

• Written in vector notation: V = R + γV' with T-dim data vectors V, R, V', where V'_t = V(x_{t+1})

• Written as optimization: minimize the Bellman residual error

L(V) = ∑_{t=1}^T [V(x_t) − r_t − γV(x_{t+1})]² = ||R − V + γV'||²
33/82
LSPI: Least Squares Policy Iteration

• Approximate V(x) as linear in k features φ_j:

V(x) = ∑_{j=1}^k φ_j(x) β_j = φ(x)ᵀβ

Then

V = Φβ ,  Φ_tj = φ_j(x_t) ,  Φ ∈ R^{T×k}
V' = Φ'β ,  Φ'_tj = φ_j(x_{t+1}) ,  Φ' ∈ R^{T×k}

• the loss becomes

L(β) = ||R − (Φ − γΦ')β||²

→ has an analytic solution!

• Like regression, but with the squared error of supervised learning replaced by the Bellman residual error

details on simplifications made: see Appendix
34/82
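The least-squares fit of the Bellman residual can be sketched directly; the random feature matrices standing in for φ_j(x_t) are assumptions:

```python
import numpy as np

# Least-squares Bellman-residual fit: beta minimizes
# ||R - (Phi - gamma * Phi_next) beta||^2 on assumed random features.
rng = np.random.default_rng(5)
T, k, gamma = 200, 10, 0.9
Phi = rng.standard_normal((T, k))        # phi_j(x_t)
Phi_next = rng.standard_normal((T, k))   # phi_j(x_{t+1})
R = rng.standard_normal(T)               # observed rewards r_t

A = Phi - gamma * Phi_next
beta, *_ = np.linalg.lstsq(A, R, rcond=None)

V = Phi @ beta                           # fitted values V(x_t)
residual = R - A @ beta
print(residual @ A)                      # normal equations: approx zero
```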
LSPI: Riding a bike
from Lagoudakis & Parr (JMLR 2003)
35/82
LSPI: Riding a bike
from Lagoudakis & Parr (JMLR 2003)
36/82
3. Policy Search
D = {(x_t, u_t, c_t)}_{t=0}^T  →(optimize)→  π_t(x)
• Use ML to directly optimize the policy π(u|x) based on data
Literature:
Peters & Schaal: Reinforcement Learning of Motor Skills with Policy Gradients (Neural Networks 2008)

Kober & Peters: Policy Search for Motor Primitives in Robotics (NIPS 2008)

Moriarty, Schultz & Grefenstette: Evolutionary algorithms for reinforcement learning (JAIR 1999)
37/82
Policy Search using policy gradients
• In the continuous state/action case, represent the policy as linear in arbitrary state features:

π(x) = ∑_{j=1}^k φ_j(x) β_j = φ(x)ᵀβ   (deterministic)
π(u | x) = N(u | φ(x)ᵀβ, Σ)   (stochastic)

with k features φ_j.

• Given data D = {(x_t, u_t, r_t)}_{t=0}^T we want to estimate the policy gradient ∂V(β)/∂β
38/82
Policy Search using policy gradients

• One approach is called REINFORCE:

∂V(β)/∂β = ∂/∂β ∫ P(ξ|β) R(ξ) dξ = ∫ P(ξ|β) [∂/∂β log P(ξ|β)] R(ξ) dξ
         = E_{ξ|β}{ [∂/∂β log P(ξ|β)] R(ξ) }
         = E_{ξ|β}{ ∑_{t=0}^T γ^t [∂ log π(u_t|x_t)/∂β] ∑_{t'=t}^T γ^{t'−t} r_{t'} }

(the inner sum ∑_{t'=t}^T γ^{t'−t} r_{t'} plays the role of Q^π(x_t, u_t, t))

• PoWER (Kober & Peters) and Monte Carlo EM (Vlassis & Toussaint) are similar, but try to make a full “M-step” instead of only a gradient step.

See: Peters & Schaal (2008): Reinforcement learning of motor skills with policy gradients (Neural Networks)

Kober & Peters: Policy Search for Motor Primitives in Robotics (NIPS 2008)

Vlassis & Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm (Autonomous Robots 27, 123-130)
39/82
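A minimal REINFORCE sketch on an assumed 1-step toy problem (Gaussian policy, quadratic reward — not from the cited papers), showing the gradient estimate pushing the policy parameter towards the optimum:

```python
import numpy as np

# REINFORCE on a 1-step toy problem: policy u ~ N(beta, 1), reward
# r = -(u - 3)^2. The estimate mean[(d/dbeta log pi(u)) * (r - baseline)]
# drives beta towards the optimum 3.
rng = np.random.default_rng(6)
beta = 0.0
for _ in range(2000):
    u = beta + rng.standard_normal(64)       # batch of sampled actions
    r = -(u - 3.0) ** 2                      # rewards
    b = r.mean()                             # baseline (variance reduction)
    grad = np.mean((u - beta) * (r - b))     # d/dbeta log N(u|beta,1) = u - beta
    beta += 0.05 * grad
print(round(beta, 1))   # approx 3.0
```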
Policy Search using policy gradients
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
40/82
Five approaches to learning optimal control
[Diagram: five routes from data to a policy.
From experience data D = {(x_t, u_t, c_t)}_{t=0}^T:
– Model-based RL: learn a model P(x'|u,x), c(x,u); dynamic programming then yields V_t(x) and a policy π_t(x)
– Model-free RL: learn the value function V(x) directly, which yields a policy π(x)
– Policy Search: optimize the policy π(x) directly
From demonstration data D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n:
– Imitation Learning: learn a policy π(x) directly
– Inverse RL: learn the latent costs c(x); dynamic programming then yields a policy π(x)]
d}nd=1
41/82
4. Imitation Learning
D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n  →(learn/copy)→  π_t(x)
• Use ML to imitate demonstrated state trajectories x0:T
Literature:
Atkeson & Schaal: Robot learning from demonstration (ICML 1997)
Schaal, Ijspeert & Billard: Computational approaches to motor learning by imitation (Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 2003)

Grimes, Chalodhorn & Rao: Dynamic Imitation in a Humanoid Robot through Nonparametric Probabilistic Inference (RSS 2006)

Rüdiger Dillmann: Teaching and learning of robot tasks via observation of human performance (Robotics and Autonomous Systems, 2004)
42/82
Imitation Learning
• There are many ways to imitate/copy the observed policy:

Learn a density model P(u_t | x_t) P(x_t) (e.g., with a mixture of Gaussians) from the observed data and use it as the policy (Billard et al.)

Or trace observed trajectories by minimizing perturbation costs (Atkeson & Schaal 1997)
43/82
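The density-model route above can be sketched with a single joint Gaussian (a stand-in for the mixture models in the cited work); the toy demonstrations u = 2x + noise are an assumption:

```python
import numpy as np

# Imitation by density estimation: fit a joint Gaussian over (x, u) from
# demonstrations and use the conditional N(u | x) as the policy.
rng = np.random.default_rng(7)
x = rng.standard_normal(500)
u = 2.0 * x + 0.1 * rng.standard_normal(500)   # assumed demonstrations
D = np.stack([x, u], axis=1)

mu = D.mean(axis=0)
S = np.cov(D.T)                  # joint covariance [[Sxx, Sxu], [Sux, Suu]]

def policy(x_query):
    """Conditional mean E[u | x] = mu_u + Sux / Sxx * (x - mu_x)."""
    return mu[1] + S[1, 0] / S[0, 0] * (x_query - mu[0])

print(round(policy(1.0), 1))   # approx 2.0
```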
Imitation Learning
Atkeson & Schaal
44/82
5. Inverse RL
D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n  →(learn)→  r(x, u)  →(DP)→  V_t(x)  →  π_t(x)
• Use ML to “uncover” the latent reward function in observed behavior
Literature:
Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML 2004)

Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)

Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML 2011)
45/82
Inverse RL (Apprenticeship Learning)
• Given: demonstrations D = {x^d_{0:T}}_{d=1}^n

• Try to find a reward function that discriminates demonstrations from other policies
– Assume the reward function is linear in some features: R(x) = wᵀφ(x)
– Iterate:

1. Given a set of candidate policies {π_0, π_1, ...}
2. Find weights w that maximize the value margin ξ between teacher and all other candidates:

max_{w,ξ}  ξ
s.t.  ∀π_i :  wᵀ⟨φ⟩_D  ≥  wᵀ⟨φ⟩_{π_i} + ξ   (value of demonstrations ≥ value of π_i plus margin)
      ||w||² ≤ 1

3. Compute a new candidate policy π_i that optimizes R(x) = wᵀφ(x) and add it to the candidate list.

(Abbeel & Ng, ICML 2004)
46/82
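Step 2 above can be sketched with projected subgradient ascent on the margin (a simple stand-in for the QP in Abbeel & Ng); the feature expectations of the teacher and the candidate policies are assumed values:

```python
import numpy as np

# Find unit-norm w maximizing the value margin min_i w^T (mu_E - mu_i)
# between the teacher's feature expectations and those of candidates.
mu_E = np.array([1.0, 0.5])                  # teacher <phi>_D (assumed)
mu_pis = [np.array([0.2, 0.4]),              # <phi>_{pi_i} (assumed)
          np.array([0.8, -0.1])]

w = np.ones(2) / np.sqrt(2)
for _ in range(500):
    margins = [w @ (mu_E - mu) for mu in mu_pis]
    i = int(np.argmin(margins))              # tightest constraint
    w = w + 0.01 * (mu_E - mu_pis[i])        # subgradient step on the margin
    w = w / np.linalg.norm(w)                # project back onto ||w|| = 1

margin = min(w @ (mu_E - mu) for mu in mu_pis)
print(margin > 0)   # positive margin: the teacher looks optimal under w
```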
47/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
48/82
6. Exploration
• Active Learning is a form of ML where the algorithm can query the next data point.

→ explore where the “smoothed” empirical distribution is low
49/82
Exploration
• Exploration in robotics is tricky: can’t just pick the next point, but needto control system into state of interest.
Literature (very diverse selection):
Schaal & Atkeson: Robot juggling: An implementation of memory-based learning [Shifting Setpoint Algorithm] (Control Systems Magazine 1994)

Nouri & Littman: Dimension reduction and its application to model-based exploration in continuous spaces (Machine Learning 2010)

Jong & Stone: Model-Based Exploration in Continuous State Spaces (SARA 2007)

Katz, Pyuro & Brock: Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation (RSS 2008)

Hsiao, Kaelbling & Lozano-Perez: Grasping POMDPs (ICRA 2007)

Saxena et al: Learning to grasp novel objects using vision (ISER 2006)

Oudeyer et al: The playground experiment: Task-independent development of a curious robot (Symposium on Developmental Robotics 2005)
50/82
Exploration
• R-max is a simple exploration strategy that assigns high value to state-action pairs not often visited (optimism)

• We can use ML to approximate this optimistic value function.

Fitted R-max on the Mountain Car problem:
Jong & Stone: Model-Based Exploration in Continuous State Spaces (SARA 2007)
51/82
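The R-max idea above amounts to replacing the reward estimate of under-visited state-action pairs with the maximal reward; a minimal sketch on an assumed toy MDP (all counts and rewards are illustrative):

```python
import numpy as np

# R-max style optimism: state-action pairs visited fewer than m times are
# treated as maximally rewarding, driving the greedy policy to explore them.
n_states, n_actions, m, r_max = 4, 2, 5, 1.0
counts = np.zeros((n_states, n_actions))
counts[0, 0] = 10; counts[0, 1] = 10       # pretend (0, .) is well explored
reward_sums = np.zeros((n_states, n_actions))
reward_sums[0, :] = 2.0                    # observed reward 0.2 per visit

known = counts >= m
R_opt = np.where(known, reward_sums / np.maximum(counts, 1), r_max)

print(R_opt[0])   # empirical estimates for the explored state
print(R_opt[1])   # optimistic r_max for the unexplored states
```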
Physical Exploration
Dov Katz and Oliver Brock 2010: Manipulating Articulated Objects With Interactive Perception
7. Probabilistic Inference for Control & Planning
• Bellman’s optimality principle is one approach to optimal control:
V_t(x) = min_{u_t} [ c_t(x_t, u_t) + E_{x'|u_t,x}{ V(x') } ]
• Reductions to Probabilistic Inference:
Toussaint & Goerick: Probabilistic inference for structured planning in robotics (IROS 2007)

Todorov: General duality between optimal control and estimation (Decision and Control 2008)

Toussaint: Robot Trajectory Optimization using Approximate Inference (ICML 2009)

Kappen, Gomez & Opper: Optimal control as a graphical model inference problem (arXiv:0901.0633, 2009)

Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)
53/82
Approximate Inference Control
[Graphical model: states x_0, ..., x_T; controls u_0, ..., u_T; per-time-step binary variables z_0, ..., z_T in place of the rewards r_0, ..., r_T]

• Introduce a binary auxiliary variable z_t with
P(z_t = 1 | u_t, x_t) = exp{−c_t(x_t, u_t)}

• For a given trajectory x_{0:T}, u_{0:T}:
log P(z_{0:T} = 1 | x_{0:T}, u_{0:T}) = −C(x_{0:T}, u_{0:T})

• W.r.t. a distribution q(x_{0:T}, u_{0:T}):
expected log-likelihood = expected neg-costs
⟨log P(z_{0:T} = 1 | x_{0:T}, u_{0:T})⟩_{q(x_{0:T}, u_{0:T})} = −⟨C(x_{0:T}, u_{0:T})⟩_{q(x_{0:T}, u_{0:T})}

Expectation Maximization ↔ Stochastic Optimal Control
54/82
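The EM ↔ SOC correspondence above can be sketched on an assumed 1-step toy problem: sample controls from a Gaussian, weight each sample by the “likelihood” exp(−cost), and refit the Gaussian (one M-step). The quadratic cost and all parameters are illustrative assumptions:

```python
import numpy as np

# Reward-weighted EM on a 1-step toy problem: P(z=1 | u) = exp(-c(u)).
# The weighted refit moves the control distribution into the low-cost region.
rng = np.random.default_rng(8)

def cost(u):
    return (u - 2.0) ** 2          # toy cost, minimum at u = 2

mean, std = 0.0, 1.0
for _ in range(50):                # EM iterations
    u = mean + std * rng.standard_normal(500)
    w = np.exp(-cost(u))           # "likelihood" weights exp(-c(u))
    w = w / w.sum()
    mean = np.sum(w * u)           # M-step: reward-weighted refit
    std = max(np.sqrt(np.sum(w * (u - mean) ** 2)), 0.1)

print(round(mean, 1))   # approx 2.0
```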
Approximate Inference Control
[graphical model: state variables x_0 … x_T, control variables u_0 … u_T, auxiliary variables z_0 … z_T]

• Distinguish 3 different processes:

  prior        P(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^{T} P(u_t | x_t) ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1})

  controlled   q_π(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^{T} δ_{u_t = π_t(x_t)} ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1})

  posterior    p(x_{0:T}, u_{0:T}) := (P(x_0) / P(z_{0:T}=1)) ∏_{t=0}^{T} P(u_t | x_t) ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1}) ∏_{t=0}^{T} exp{−c_t(x_t, u_t)}

• For uniform P(u_t | x_t):

  D(q_π || p) = log P(z_{0:T}=1) + E_{q_π(x_{0:T})}{ C(x_{0:T}, π(x_{0:T})) }
Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)
55/82
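The z-variable construction can be illustrated numerically: sample trajectories from a prior, weight each by exp(−C), and the weighted distribution is exactly the posterior above. The 1D random-walk prior and quadratic cost below are invented for illustration:

```python
import numpy as np

# Control-as-inference in miniature: with P(z_t=1 | x_t, u_t) = exp(-c_t),
# the trajectory posterior is the prior reweighted by exp(-C(trajectory)).
rng = np.random.default_rng(1)
T, n_samples = 5, 2000

# Prior rollouts: 1D random walks starting at 0 (toy "uncontrolled" process).
traj = np.cumsum(rng.normal(0.0, 0.5, size=(n_samples, T)), axis=1)
C = ((traj - 1.0) ** 2).sum(axis=1)        # quadratic cost pulling toward x = 1
w = np.exp(-C)
w /= w.sum()                               # normalized posterior weights

prior_mean = traj[:, -1].mean()            # final state under the prior (~0)
post_mean = (w * traj[:, -1]).sum()        # posterior shifts it toward x = 1
```

This naive importance-sampling view degrades quickly in high dimensions; the papers above instead exploit the graphical-model structure with message passing.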
Approximate Inference Control
Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)
56/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
  – Markov Decision Processes and Stochastic Optimal Control
  – Kinematics and Dynamics

• Five Approaches to Learning in Robotics
  1. Model learning (model-based RL)
  2. Value learning (model-free RL)
  3. Policy search
  4. Imitation learning
  5. Inverse RL

• ...plus two more:
  6. Exploration
  7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
  – Statistical Relational Learning for Robots
  – Discussion
57/82
• Most (if not all) examples concerned control of the robot’s own or attached DoFs
What about really controlling a natural environment?
58/82
• That might be a natural environment.
  How can we control (e.g., clean) this?
59/82
Learning a model from object interactions

D = { grab(c):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ...
    }

• How can we learn a predictive model P(x' | u, x) for this data?
60/82
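One hedged sketch of how such data D might be represented in code: each experience as an action with pre- and post-state literal sets (type predicates omitted for brevity), from which the action's add/delete effects follow directly. The encoding is illustrative, not the tutorial's implementation:

```python
# Two experiences mirroring the grab(c)/puton(a) transitions above,
# each as (action, pre-state literals, post-state literals).
D = [
    ("grab(c)",
     {"on(a,b)", "on(b,d)", "on(c,d)", "inhand(nil)"},
     {"on(a,b)", "on(b,d)", "inhand(c)"}),
    ("puton(a)",
     {"on(a,b)", "on(b,d)", "inhand(c)"},
     {"on(a,b)", "on(b,d)", "on(c,a)", "inhand(nil)"}),
]

def effects(pre, post):
    """Literals made true (added) and made false (deleted) by the action."""
    return post - pre, pre - post

for action, pre, post in D:
    added, deleted = effects(pre, post)
    print(action, "adds", sorted(added), "deletes", sorted(deleted))
```

Computing such add/delete effects per experience is a natural first step before compressing them into rules, as the following slides do.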
Learning a model from object interactions
• Differences to model learning in motion control:
  – symbolic representation of state
  – exponential state space
Object Abstraction Assumption: The world is made up of objects, and the effects of actions on these objects generally depend on their attributes rather than their identities.
Pasula, Zettlemoyer & Kaelbling (ICAPS 2004)
• Wanted: generalization across objects
→ Statistical Relational Learning
61/82
Statistical Relational Learning (SRL)
• See ECML/PKDD 2007 tutorial by Lise Getoorhttp://www.ecmlpkdd2007.org/CD/tutorials/T3_Getoor/Getoor_CD.pdf
• “Probabilistic learning & inference on 1st order logic representations”
– very strong generalization across objects
– in my view: currently the only way to express & learn uncertain knowledge about environments with objects & properties/relations
SRL + Robotics = perfect match!
62/82
Learning a model from interaction data

D = { grab(c):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ...
    }

• How can we learn a predictive model P(x' | u, x) for this data?
63/82
A form of Statistical Relational Learning

Pasula, Zettlemoyer & Kaelbling: Learning probabilistic relational planning rules (ICAPS 2004)

• Compress this data into probabilistic relational rules (Pasula et al.):

  pickup(X,Y) : on(X,Y), clear(X), inhand(NIL), block(Y)
    →  .7 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,Y), clear(Y)
       .2 : on(X,TABLE), ¬on(X,Y), clear(Y)
       .1 : no change

  pickup(X,TABLE) : on(X,TABLE), clear(X), inhand(NIL)
    →  .66 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,TABLE)
       .34 : no change

  puton(X,Y) : clear(Y), inhand(X), block(Y)
    →  .7 : inhand(NIL), ¬clear(Y), ¬inhand(X), on(X,Y), clear(X)
       .2 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
       .1 : no change

  puton(X,TABLE) : inhand(X)
    →  .8 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
       .2 : no change

• Find a rule set that maximizes (likelihood − description length)

• Opportunities for reducing description length:
  – Frame Assumption: actions only influence few predicates; most predicates remain unchanged
  – Abstraction: introducing novel predicates
  – Uncertainty!
64/82
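The mechanics of using such a rule can be sketched as follows: ground the rule with a substitution and sample one of its probabilistic outcomes. The dictionary encoding and helper names are illustrative, not Pasula et al.'s implementation:

```python
import random

# The pickup(X,Y) rule from above, encoded as context + weighted outcomes.
# "-" marks a negated literal; the "no change" outcome has an empty change set.
rule = {
    "action": "pickup(X,Y)",
    "context": ["on(X,Y)", "clear(X)", "inhand(NIL)", "block(Y)"],
    "outcomes": [
        (0.7, ["inhand(X)", "-clear(X)", "-inhand(NIL)", "-on(X,Y)", "clear(Y)"]),
        (0.2, ["on(X,TABLE)", "-on(X,Y)", "clear(Y)"]),
        (0.1, []),  # no change
    ],
}

def ground(literals, sub):
    """Apply a substitution like {'X': 'a', 'Y': 'b'} to each literal."""
    out = []
    for lit in literals:
        for var, obj in sub.items():
            lit = lit.replace(var, obj)
        out.append(lit)
    return out

def sample_outcome(rule, sub, rng):
    """Draw one outcome by its probability; return the grounded change set."""
    probs, changes = zip(*rule["outcomes"])
    idx = rng.choices(range(len(probs)), weights=probs)[0]
    return ground(changes[idx], sub)

rng = random.Random(0)
change = sample_outcome(rule, {"X": "a", "Y": "b"}, rng)
```

The abstract variables X, Y are what buys generalization: one rule covers every pair of objects whose attributes satisfy the context.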
Role of uncertainty in these rules
⇒ uncertainty ↔ regularization ↔ compression & abstraction

• Introducing uncertainty in the rules not only allows us to model stochastic worlds; it enables us to compress/regularize and thereby learn strongly generalizing models!
65/82
Planning by inference in relational domains
• Once the model is learnt, using it (planning) is hard
• SST & UCT do not scale with # objects
→ Use Planning-by-Inference:

[diagram: the learned rule-based model is grounded – depending on the situation and on relevance – into a rule-based DBN; one representation is good for learning, the other good for planning]

(Lang & Toussaint, JAIR 2010)
66/82
Planning by inference in relational domains
(we’re using factored frontier for approx. inference)
→ Advances in Lifted Inference could translate to better robot manipulation planning.
67/82
Application

Random exploration:
Planning:
Real-world:
Lang & Toussaint: Planning with Noisy Probabilistic Relational Rules (JAIR 2010)
Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)
68/82
Relational Exploration
• The state space is inherently exponential in the # objects. How could we realize strategies like E3 or R-max in relational domains?
• Key insight:
  strong generalization of the model
  ↔
  strong implications for what is considered novel / is explored
For instance, if you’ve seen a red, green and yellow ball rolling, will you explore whether the blue ball also rolls? Or rather explore something totally different, like dropping a blue box?
69/82
Relational Exploration
• Transfer Explicit Explore or Exploit (E3) to Relational Domains
• Representations to formulate an “empirical distribution” (non-novelty):

  propositional     P(s) ∝ c_D(s)
  distance-based    P_d(s) ∝ exp{ − min_{(s_e, a_e, s'_e) ∈ D} d(s, s_e)² }
  predicate-based   P_p(s) ∝ c_p(s) I(s ⊨ p) + c_¬p(s) I(s ⊨ ¬p)
  context-based     P_Φ(s) ∝ ∑_{φ ∈ Φ} c_D(φ) I(∃σ : s ⊨ σ(φ))

  (contexts ↔ sets of LHSs of rules)
70/82
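The difference between the propositional and context-based densities can be sketched on toy data: a state containing a never-seen object has propositional count zero, but a nonzero context-based count because the abstract pattern was seen before. The matching logic below is deliberately crude (single-character object names) and purely illustrative:

```python
# Two visited states: different objects (c and e), same abstract pattern.
visited = [
    frozenset({"ball(c)", "on(c,d)"}),
    frozenset({"ball(e)", "on(e,d)"}),
]

def propositional_count(s, data):
    """c_D(s): count exact ground-state matches."""
    return sum(s == se for se in data)

def matches_context(s, context_pattern):
    """A context like {"ball(X)", "on(X,d)"} matches if some object grounds it.
    Crude extraction: the object is the first character after '('."""
    objects = {lit[lit.find("(") + 1] for lit in s}
    return any(all(p.replace("X", o) in s for p in context_pattern)
               for o in objects)

def context_count(s, data, context_pattern):
    """Count past experiences matching the same abstract context as s."""
    if not matches_context(s, context_pattern):
        return 0
    return sum(matches_context(se, context_pattern) for se in data)

new_state = frozenset({"ball(f)", "on(f,d)"})   # object f never seen before
ctx = {"ball(X)", "on(X,d)"}
```

Under propositional counting the new state is maximally novel; under context-based counting it is already familiar, so an E3-style agent would explore elsewhere.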
Application

Online relational explore-exploit:
Lang, Toussaint & Kersting: Exploration in Relational Worlds (ECML 2010)
71/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
  – Markov Decision Processes and Stochastic Optimal Control
  – Kinematics and Dynamics

• Five Approaches to Learning in Robotics
  1. Model learning (model-based RL)
  2. Value learning (model-free RL)
  3. Policy search
  4. Imitation learning
  5. Inverse RL

• ...plus two more:
  6. Exploration
  7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
  – Statistical Relational Learning for Robots
  – Discussion
72/82
Three concluding comments:
– Scaling RL?– Relational Learning in Robotics– Whole Pizza vs. Chillies
73/82
Scaling RL?
• Recommended: Satinder Singh’s “Myths of RL”:
  http://umichrl.pbwiki.com/Myths-of-Reinforcement-Learning

  1. Large state spaces are hard for RL
  2. RL is slow
  3. RL does not have (m)any success stories since TD-Gammon
  4. RL does not work well with function approximation
  5. Value function approximation does not work
  6. Non-Markovianness invalidates standard RL methods
  7. POMDPs are hard for RL to deal with
  8. RL is about learning optimal policies
• The first half of this tutorial discussed many success stories of the RL approach to Robotics.
74/82
Wolfgang Köhler (1917)
Intelligenzprüfungen an Menschenaffen
(The Mentality of Apes)
[movie]
75/82
Scaling RL?
The real world is not a scaled-up version of the Mountain Car Problem.

• In other terms: what should we scale with?
  – The size of the state space?
    It’ll always be exponential – is this the right view?
  – The number of objects?
    How many objects can humans mentally manipulate (= plan with)?
  – The planning horizon?
    But on the right level of abstraction, horizons are short.

• What we need in Robotics is exploration, learning and goal-directed behavior that exploit the structure of natural environments.
76/82
Relational Learning in Robotics
↔
• Many of the successful learning methods concern motion control. There are (in my impression) fewer methods for learning to control/manipulate worlds of many objects – worlds as they are classically described in AI.
• What Robotics needs is a fusion of “this type of AI”, Machine Learning and Control methods
77/82
Relational Learning in Robotics
• A popular science article:
  “I, algorithm: A new dawn for artificial intelligence”
(Anil Ananthaswamy, NewScientist, January 2011)
The article talks of “probabilistic programming, which combines the logical underpinnings of the old AI with the power of statistics and probability.” It quotes Stuart Russell – “It’s a natural unification of two of the most powerful theories that have been developed to understand the world and reason about it.” – and Josh Tenenbaum – “It’s definitely spring”.
• My impression: Exactly these kinds of developments give new hope for Robots to explore, learn and plan in our natural world, composed of objects.
78/82
Whole Pizza vs. Chillies

[table: the full ICRA 2011 technical program, Tuesday May 10 – Thursday May 12, 2011 – three days of up to 15 parallel tracks spanning aerial robotics, SLAM, localization and mapping, grasping, humanoids, medical and rehabilitation robots, learning and adaptive systems, computer vision, and many more]

ICRA 2011 schedule
79/82
Whole Pizza vs. Chillies

• Integrated robotic systems are huge in terms of
– lines of code– methods/disciplines/formalizations involved
Beetz et al: The Assistive Kitchen – A Demonstration Scenario for Cognitive Technical Systems
• Existing learning methods tend to address only isolated aspects – identified and formalized by the engineer.
• Machine Learning is used to working with a “full formalization” of the domain.
  Will this ever be possible for Robotics?
  Can we apply ML on the full system level?
80/82
Robotics as big graphical model?
(where “graphical model” would include relational structures...)
• Many aspects (Computer Vision, Perception, Feedback Control,Stochastic Optimal Control, MDPs) can already be formalized in termsof probabilistic models and inference.
Would it help to view it all as one big graphical model?
81/82
Thanks for your attention!
82/82
Appendix

• Robot dynamics can be described by the differential equation

  M(q) q̈ + C(q, q̇) q̇ + F(q) = u

  with mass matrix M(q), Coriolis forces C(q, q̇) and gravity forces F(q). The Newton-Euler algorithm can efficiently (numerically) compute M, C and F for any specific robot configuration (q, q̇). Using (e.g.) leap-frog integration with time step size τ, the process becomes

  q̇_{t+1} = q̇_t + τ M⁻¹ (u_t − C q̇_t − F)
  q_{t+1} = q_t + τ (q̇_{t+1} + q̇_t)/2

  or, in matrix form,

  (q_{t+1}, q̇_{t+1})ᵀ = (I + A)(q_t, q̇_t)ᵀ + B u_t + a

  with

  A = [ 0 , τ − τ²M⁻¹C/2 ;  0 , −τM⁻¹C ],   B = [ τ²M⁻¹/2 ;  τM⁻¹ ],   a = [ −τ²M⁻¹F/2 ;  −τM⁻¹F ]
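A minimal numeric sketch of one such integration step, solving M q̈ = u − C q̇ − F as in the dynamics equation above. The toy M, C, F values are fixed and purely illustrative; a real implementation would recompute them via Newton-Euler at every step:

```python
import numpy as np

def dynamics_step(q, qdot, u, M, C, F, tau):
    """One semi-implicit integration step of M q̈ + C q̇ + F = u."""
    qddot = np.linalg.solve(M, u - C @ qdot - F)     # q̈ = M⁻¹(u − C q̇ − F)
    qdot_next = qdot + tau * qddot                   # velocity update
    q_next = q + tau * (qdot_next + qdot) / 2.0      # midpoint position update
    return q_next, qdot_next

# Toy 2-DoF example with invented numbers.
M = np.array([[2.0, 0.1], [0.1, 1.0]])
C = np.zeros((2, 2))
F = np.array([0.0, -9.81])                           # e.g. a gravity term
q, qdot = np.zeros(2), np.zeros(2)

# Applying u = F exactly compensates gravity, so the state stays at rest.
q, qdot = dynamics_step(q, qdot, u=F.copy(), M=M, C=C, F=F, tau=0.01)
```

Iterating this step gives the linearized process (q_{t+1}, q̇_{t+1})ᵀ = (I + A)(q_t, q̇_t)ᵀ + B u_t + a used in the control derivations.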
• LSPI simplifications made: actually, LSPI estimates the Q(s, a)-function instead of the V(s)-function; Q represents the expected return given the current state and the selected action. Second, I skipped explaining policy iteration: once you have estimated the Q-function, you need to update the policy (perhaps collecting new data) and iterate, re-estimating the Q-function.
83/82