Machine Learning
and Robotics
ICML 2011 tutorial, Bellevue, 28th June 2011
Marc Toussaint, FU Berlin
• Please ask questions!
Machine Learning & Robotics tutorial = mission impossible!
– impossible to cover all topics → biased selection
– impossible to cover all literature → sorry if I miss your work!
• no emphasis on SLAM & vision
– more emphasis on control, articulated robots, manipulation
• Goals of this tutorial:
– Provide an overview of learning problems in Robotics
... where ML can/has contributed, and mention literature
– Encourage Machine Learners to think more about Robotics
... to understand inherent problems in robotics
... to consider formalizing the specific structure of robotic problems
2/82
First, two little comments on Roboticists vs. MLers...
3/82
“I'm tired of programming my robot! Can't you make it learn?” – “Sure! Where's the data?” – “What data do you need?” – “Shouldn't you know?”
4/82
• Robotics is about interaction with the environment
– Collected data depends on actions
– Goal of learning: enable behavior!
(sequential decision making, long horizon control)
5/82
• Implications:
– Benchmarking a method involves running the system!
– Different to ML in Computer Vision (Pascal challenge, Middlebury) or other standard benchmarking in ML
– no “data → ML expert → results → application expert” pipeline
– only a few examples where pure “learning from a data set” is useful in robotics (e.g., calibration, system identification, SLAM)
• This slows learning research in Robotics!
6/82
2011 IEEE International Conference on Robotics and Automation (ICRA 2011) Technical Program, May 10–12, 2011

[Program grid: up to 15 parallel tracks per time slot over three days. Session topics include Aerial Robotics, Autonomous Navigation, SLAM, Localization and Mapping, Grasping, Manipulation Planning, Motion and Path Planning, Humanoid Robots, Legged Locomotion, Medical Robots and Systems, Biologically-Inspired Robots, Rehabilitation Robotics, Computer Vision for Robotics and Automation, Teleoperation, Haptics, Micro-Nano Robots, Calibration and Identification, Distributed Robot Systems, ... and Learning and Adaptive Systems I–IV.]
Sessions with “learning papers” (keyword in title or abstract): Learning & Adaptive Systems, Recognition, Motion & Path Planning, Grasping, Adaptive Control
7/82
• The field of robotics is huge!
If Robotics were a huge pizza... ML = the chillies; the rest = “infrastructure”
• bits and pieces of learning here and there
ML on the system level?
• Implications:
– good: many possibilities for adaptivity and learning
– the integrated system is hard to “formalize”/make subject to ML
8/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
9/82
Part I:
Learning problems in Robotics – the RL view
10/82
• “The Reinforcement Learning view”: viewed in the framework of Markov Decision Processes / Stochastic Control
→ unifying notation, accessible to MLers

• BUT: solving the Mountain Car problem ≠ solving Robotics
(→ Discussion)
11/82
Markov Decision Process & Optimal Control
[Graphical model: states x_0, ..., x_T; controls u_0, ..., u_T; rewards/costs r_0, ..., r_T]

P(x_{0:T}, u_{0:T}, r_{0:T}; π) = P(x_0) P(u_0|x_0; π) P(r_0|u_0, x_0) ∏_{t=1}^T P(x_t|u_{t-1}, x_{t-1}) P(u_t|x_t; π) P(r_t|u_t, x_t)

State x_t
Control/Action u_t
Process P(x_0) and P(x_{t+1} | u_t, x_t)
Reward/Cost r_t(x_t, u_t) or c_t(x_t, u_t)
Control policy π(u_t | x_t) or u_t = π(x_t) (deterministic)
12/82
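The factored joint above can be made concrete by sampling. A minimal sketch on an assumed toy finite MDP (the sizes, random transition model, and reward table are illustrative, not from the tutorial):

```python
import numpy as np

# Sample a trajectory from the factored joint P(x_{0:T}, u_{0:T}, r_{0:T}; pi)
# of an assumed toy finite MDP with a uniform stochastic policy pi(u|x).
rng = np.random.default_rng(0)
n_states, n_actions, T = 3, 2, 10

P0 = np.array([1.0, 0.0, 0.0])                                    # P(x0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(x'|x,u)
R = rng.standard_normal((n_states, n_actions))                    # r(x,u), deterministic here
pi = np.full((n_states, n_actions), 1.0 / n_actions)              # pi(u|x)

def sample_trajectory():
    x = rng.choice(n_states, p=P0)
    traj = []
    for t in range(T + 1):
        u = rng.choice(n_actions, p=pi[x])   # u_t ~ pi(.|x_t)
        r = R[x, u]                          # r_t = r(x_t, u_t)
        traj.append((x, u, r))
        x = rng.choice(n_states, p=P[x, u])  # x_{t+1} ~ P(.|u_t, x_t)
    return traj

traj = sample_trajectory()
print(len(traj))  # T+1 = 11 steps
```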
Markov Decision Process & Optimal Control
• typical MDP case:
– infinite time horizon: T → ∞
– stationary world & rewards: P(x'|u,x) and r(x,u) independent of t
– discounting: total return ∑_{t=0}^T γ^t r(x_t, u_t)
→ stationary policy π(u|x) independent of t

• typical Stochastic Optimal Control case:
– finite time horizon T
– costs depend on absolute time: c_t(x_t, u_t)
– total costs non-discounted: C(x_{0:T}, u_{0:T}) = ∑_{t=0}^T c_t(x_t, u_t)
→ non-stationary control policy π_t(u_t|x_t)
13/82
One way to introduce Robotics basics is to consider three basic control problems:
• 1-step kinematic process
• 1-step dynamic process
• T -step dynamic process
14/82
Kinematics
• We consider a 1-step kinematic control problem, where we know the state at time t and optimize controls u_t to minimize costs at t+1.

State x_t ∈ R^n = joint angles
Control u_t ∈ R^n = commanded change in joint angles
Process (deterministic): x_{t+1} = x_t + u_t
Costs in the 1-step kinematic case are of the form
c(x_{t+1}, u_t) = ||u_t||²_W + ||φ(x_{t+1}) − y*||²/σ²
15/82
Kinematics
c(x_{t+1}, u) = ||u||²_W + ||φ(x_{t+1}) − y*||²/σ²

• The word kinematics refers to φ(q):
– defines to-be-controlled state features (task variables), e.g. left-hand position, right-hand position
– φ is determined by the geometry (“kinematics”) of the robot
– Roboticists know how to compute φ(q) and J = ∂φ(q)/∂q
– y* says what the targets are
16/82
Kinematics
• Using a local linearization of φ, this has a simple solution:

c(x_{t+1}, u) = ||u||²_W + ||φ(x_{t+1}) − y*||²/σ² ,  x_{t+1} = x_t + u
c(u) = ||u||²_W + ||φ(x_t + u) − y*||²/σ²
     = ||u||²_W + ||φ(x_t) − y* + Ju||²/σ² ,  J := ∂φ/∂q

argmin_u c(u) = (JᵀJ + σ²W)⁻¹ Jᵀ (y* − φ(x_t))

This choice of control is called inverse kinematics.

MLers note: the problem and solution are identical to those of Ridge regression.

17/82
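A minimal numpy sketch of the inverse-kinematics step above; the planar 2-link arm, its link lengths, and the target are assumptions for illustration, not from the slides:

```python
import numpy as np

# Regularized inverse kinematics u* = (J^T J + sigma^2 W)^{-1} J^T (y* - phi(q))
# on an assumed planar 2-link arm.
l1, l2 = 1.0, 1.0

def phi(q):
    """Task map phi: end-effector position of the 2-link arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """J = d phi / d q."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik_step(q, y_star, W=np.eye(2), sigma=0.1):
    J = jacobian(q)
    return np.linalg.solve(J.T @ J + sigma**2 * W, J.T @ (y_star - phi(q)))

q = np.array([0.3, 0.5])
y_star = np.array([1.2, 0.8])
for _ in range(50):            # iterate the 1-step control until convergence
    q = q + ik_step(q, y_star)
print(np.round(phi(q), 3))     # close to y_star
```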
Kinematics
[demo: roboticsCourse2 ./x.exe -mode 2 3 4]
What can ML contribute?
18/82
Learning the kinematics
• If the kinematics φ are unknown, learn them from data!
Literature:
Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)
19/82
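Learning φ from data is, at its simplest, regression from joint angles to observed task positions. A minimal sketch with ridge regression on RBF features; the 1-DoF toy “robot” y = sin(q), the feature centers, and the bandwidth are assumptions standing in for the richer models (GPs, locally weighted regression) of the cited papers:

```python
import numpy as np

# Fit an unknown kinematic map phi from data {(q_i, y_i)} by ridge regression
# on RBF features; the toy system y = sin(q) is assumed for illustration.
rng = np.random.default_rng(1)
Q = rng.uniform(-np.pi, np.pi, size=(200, 1))        # joint angles
Y = np.sin(Q) + 0.01 * rng.standard_normal(Q.shape)  # observed task positions

centers = np.linspace(-np.pi, np.pi, 20)
def features(q):
    return np.exp(-0.5 * (q - centers) ** 2 / 0.3**2)  # RBF features

Phi = features(Q)                                     # (200, 20) design matrix
lam = 1e-3                                            # ridge regularizer
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(20), Phi.T @ Y)

y_hat = (features(np.array([[0.5]])) @ beta)[0, 0]
print(round(y_hat, 3))   # approx sin(0.5)
```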
Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data (IEEE Transactions on Biomedical Engineering 2007)

Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)
20/82
Dynamics
• We consider a 1-step dynamic control problem, where we know the state at time t = 0 and optimize the controls u_t to minimize costs at t = 1.

State x_t = (q_t, q̇_t) ∈ R^{2n}: joint angles & velocities
Control u_t ∈ R^n = torques (angular forces) applied in each joint
Process (deterministic): x_{t+1} = (I + A) x_t + B u_t + a
(A and B are local linearizations of the system dynamics, see appendix)
Costs in the 1-step dynamic case are of the form
c(x_{t+1}, u) = ||u||²_H + ||φ(x_{t+1}) − y*||²/s²
y* determines (e.g.) desired accelerations in state features φ
21/82
Dynamics
• Using the local linearizations of the process and the kinematics φ, this has a simple solution:

c(x_{t+1}, u) = ||u||²_H + ||φ(x_{t+1}) − y*||²/s²
             = ||u||²_H + ||φ(x_t) + J(A x_t + B u + a) − y*||²/s²

argmin_u c(u) = (BᵀJᵀJB + s²H)⁻¹ BᵀJᵀ [y* − φ(x_t) − J(A x_t + a)]

This is optimal 1-step dynamic control.

(It includes so-called optimal operational space control as a special case; see also Peters et al: A unifying framework for the control of robotics systems (IROS 2005))
22/82
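A minimal numpy sketch of the 1-step dynamic control law u* = (BᵀJᵀJB + s²H)⁻¹BᵀJᵀ[y* − φ(x) − J(Ax + a)]; the small random matrices standing in for the local linearizations A, B, J, and the linear stand-in for φ, are all assumptions:

```python
import numpy as np

# Optimal 1-step dynamic control on assumed toy linearizations.
rng = np.random.default_rng(2)
n, m = 4, 2                      # toy state and control dimensions
A = 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
a = 0.1 * rng.standard_normal(n)
J = rng.standard_normal((2, n))  # local linearization of the task map
H = np.eye(m); s = 0.1
x = rng.standard_normal(n)
phi_x = J @ x                    # pretend phi is linear for the sketch
y_star = np.array([0.5, -0.2])

u = np.linalg.solve(B.T @ J.T @ J @ B + s**2 * H,
                    B.T @ J.T @ (y_star - phi_x - J @ (A @ x + a)))

# sanity check: one step reduces the task-space error
x_next = (np.eye(n) + A) @ x + B @ u + a
print(np.round(J @ x_next - y_star, 3))
```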
Dynamics
[demo: roboticsCourse3 ./x.exe -control 0 1]
What can ML contribute?
23/82
Learning the dynamics
• If the dynamics ẋ = f(x, u) are unknown, learn them from data!
Literature:
Moore: Acquisition of Dynamic Control Knowledge for a Robotic Manipulator (ICML 1990)

Atkeson, Moore & Schaal: Locally weighted learning for control (Artificial Intelligence Review, 1997)

Schaal, Atkeson & Vijayakumar: Real-Time Robot Learning with Locally Weighted Statistical Learning (ICRA 2000)

Vijayakumar et al: Statistical learning for humanoid robots (Autonomous Robots, 2002)
24/82
(Schaal, Atkeson, Vijayakumar)
• Use a simple regression method (locally weighted Linear Regression) to estimate ẋ = f(x, u)
25/82
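The locally weighted regression idea above can be sketched in a few lines; the 1-D toy dynamics ẋ = −x + u, the Gaussian kernel, and the bandwidth are assumptions for illustration:

```python
import numpy as np

# Locally weighted linear regression (Atkeson/Moore/Schaal style) for
# learning a 1-D dynamics x_dot = f(x, u); toy system x_dot = -x + u assumed.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 2))                # columns: (x, u)
y = -X[:, 0] + X[:, 1] + 0.01 * rng.standard_normal(300)

def lwr_predict(query, X, y, h=0.5):
    """Fit a weighted linear model around `query` and evaluate it there."""
    w = np.exp(-0.5 * np.sum((X - query) ** 2, axis=1) / h**2)
    Xb = np.hstack([X, np.ones((len(X), 1))])        # add bias term
    WX = Xb * w[:, None]
    beta = np.linalg.solve(Xb.T @ WX + 1e-8 * np.eye(3), WX.T @ y)
    return np.append(query, 1.0) @ beta

pred = lwr_predict(np.array([0.5, -0.3]), X, y)
print(round(pred, 3))   # approx -0.5 + (-0.3) = -0.8
```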
Beyond 1-step horizon: Motion Planning & Skills
• The 1-step processes with quadratic costs can, with local linearization, be solved analytically.
– Basic role of learning: learning the model (kinematics or dynamics)

• Planning (multi-step processes) cannot be solved in that way – we need the notion of a value function or cost-to-go function.
26/82
Stochastic Optimal Control
• In the multi-step case of horizon T, the cost function is of the form

C(x_{0:T}, u_{0:T}) = ∑_{t=0}^T c_t(x_t, u_t)

• Define the optimal value function (aka cost-to-go function)

V_t(x_t) = min_{u_{t:T}} ∑_{k=t}^T ⟨c_k(x_k, u_k)⟩_{x_k | u_{t:k}, x_t}
         = min_{u_t} [ c_t(x_t, u_t) + min_{u_{t+1:T}} ∑_{k=t+1}^T ⟨c_k(x_k, u_k)⟩_{x_k | u_{t:k}, x_t} ]
         = min_{u_t} [ c_t(x_t, u_t) + ∫ P(x_{t+1} | u_t, x_t) V_{t+1}(x_{t+1}) dx_{t+1} ]

u*_t = argmin_{u_t} [ c_t(x_t, u_t) + ∫ P(x_{t+1} | u_t, x_t) V_{t+1}(x_{t+1}) dx_{t+1} ]

Bellman optimality principle

• Dynamic Programming: compute V_t(x) backward, starting with V_{T+1}(x) = 0
27/82
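The backward recursion above can be sketched directly on an assumed toy discrete MDP (sizes, random transitions, and costs are illustrative):

```python
import numpy as np

# Finite-horizon dynamic programming: V_{T+1} = 0, then V_t backwards.
rng = np.random.default_rng(4)
n_states, n_actions, T = 5, 3, 20
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(x'|x,u)
c = rng.uniform(0, 1, size=(n_states, n_actions))                  # c(x,u)

V = np.zeros((T + 2, n_states))             # V[T+1] = 0
pi = np.zeros((T + 1, n_states), dtype=int)
for t in range(T, -1, -1):                  # backward recursion
    Q = c + P @ V[t + 1]                    # Q[x,u] = c(x,u) + E[V_{t+1}(x')]
    V[t] = Q.min(axis=1)
    pi[t] = Q.argmin(axis=1)

print(V[0])                                 # optimal cost-to-go at t = 0
```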
Stochastic Optimal Control
[demo: git/mlr/share/robot/10-optimizationBenchmarks]
(Here, we optimized the control using probabilistic inference – see later.)
What can ML contribute?
28/82
Five approaches to learning optimal control
[Diagram: five routes from data to a policy.
From experience data D = {(x_t, u_t, c_t)}_{t=0}^T:
– Model-based RL: learn a model P(x'|u,x), c(x,u); dynamic programming then yields V_t(x) and a policy π_t(x)
– Model-free RL: learn the value function V(x) directly, which yields a policy π(x)
– Policy Search: optimize the policy π(x) directly
From demonstration data D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n:
– Imitation Learning: learn a policy π(x) directly
– Inverse RL: learn the latent costs c(x); dynamic programming then yields a policy π(x)]
29/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
30/82
1. Model learning (model-based RL)
D = {(x_t, u_t, c_t)}_{t=0}^T  →(learn)→  P(x'|u,x)  →(DP)→  V_t(x)  →  π_t(x)
Literature:
Exactly as for Learning Dynamics & Kinematics in 1-step case
31/82
2. Model-free RL
D = {(x_t, u_t, r_t)}_{t=0}^T  →(learn)→  V_t(x)  →  π_t(x)
• Use ML to directly estimate the value function V (x)
Literature:
Gordon: Stable function approximation in dynamic programming. DTIC Document, 1995.
Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003).
Rasmussen & Kuss: Gaussian Processes in Reinforcement Learning (NIPS 2004)
Engel, Mannor & Meir: Reinforcement Learning with Gaussian Processes. (ICML 2005)
Mahadevan & Maggioni: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes (JMLR 2007)
32/82
LSPI: Least Squares Policy Iteration
Lagoudakis & Parr: Least-Squares Policy Iteration (JMLR 2003)

(I’ll explain it here in terms of the value function instead of the Q-function.)

• The value function fulfils

V(x) = r(x, π(x)) + γ ∑_{x'} P(x' | π(x), x) V(x')

• If we have T data points D = {(x_t, u_t, r_t, x_{t+1})}_{t=1}^T, we require that this equation holds (approximately) for these data points:

∀t : V(x_t) = r_t + γ V(x_{t+1})

• Written in vector notation: V = R + γV' with T-dim data vectors V, R, V', where V'_t = V(x_{t+1})

• Written as optimization: minimize the Bellman residual error

L(V) = ∑_{t=1}^T [V(x_t) − r_t − γV(x_{t+1})]² = ||R − V + γV'||²
33/82
LSPI: Least Squares Policy Iteration

• Approximate V(x) as linear in k features φ_j:

V(x) = ∑_{j=1}^k φ_j(x) β_j = φ(x)ᵀβ

Then

V = Φβ ,  Φ_tj = φ_j(x_t) ,  Φ ∈ R^{T×k}
V' = Φ'β ,  Φ'_tj = φ_j(x_{t+1}) ,  Φ' ∈ R^{T×k}

• the loss becomes

L(β) = ||R − (Φ − γΦ')β||²

→ has an analytic solution!

• Like regression, but with the squared error of supervised learning replaced by the Bellman residual error

details on simplifications made: see Appendix
34/82
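The least-squares fit of the Bellman residual can be sketched directly; the random feature matrices standing in for φ_j(x_t) are assumptions:

```python
import numpy as np

# Least-squares Bellman-residual fit: beta minimizes
# ||R - (Phi - gamma * Phi_next) beta||^2 on assumed random features.
rng = np.random.default_rng(5)
T, k, gamma = 200, 10, 0.9
Phi = rng.standard_normal((T, k))        # phi_j(x_t)
Phi_next = rng.standard_normal((T, k))   # phi_j(x_{t+1})
R = rng.standard_normal(T)               # observed rewards r_t

A = Phi - gamma * Phi_next
beta, *_ = np.linalg.lstsq(A, R, rcond=None)

V = Phi @ beta                           # fitted values V(x_t)
residual = R - A @ beta
print(residual @ A)                      # normal equations: approx zero
```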
LSPI: Riding a bike
from Lagoudakis & Parr (JMLR 2003)
35/82
LSPI: Riding a bike
from Lagoudakis & Parr (JMLR 2003)
36/82
3. Policy Search
D = {(x_t, u_t, c_t)}_{t=0}^T  →(optimize)→  π_t(x)
• Use ML to directly optimize the policy π(u|x) based on data
Literature:
Peters & Schaal: Reinforcement Learning of Motor Skills with Policy Gradients (Neural Networks 2008)

Kober & Peters: Policy Search for Motor Primitives in Robotics (NIPS 2008)

Moriarty, Schultz & Grefenstette: Evolutionary algorithms for reinforcement learning (JAIR 1999)
37/82
Policy Search using policy gradients
• In the continuous state/action case, represent the policy as linear in arbitrary state features:

π(x) = ∑_{j=1}^k φ_j(x) β_j = φ(x)ᵀβ   (deterministic)
π(u | x) = N(u | φ(x)ᵀβ, Σ)   (stochastic)

with k features φ_j.

• Given data D = {(x_t, u_t, r_t)}_{t=0}^T we want to estimate the policy gradient ∂V(β)/∂β
38/82
Policy Search using policy gradients

• One approach is called REINFORCE:

∂V(β)/∂β = ∂/∂β ∫ P(ξ|β) R(ξ) dξ = ∫ P(ξ|β) [∂/∂β log P(ξ|β)] R(ξ) dξ
         = E_{ξ|β}{ [∂/∂β log P(ξ|β)] R(ξ) }
         = E_{ξ|β}{ ∑_{t=0}^T γ^t [∂ log π(u_t|x_t)/∂β] ∑_{t'=t}^T γ^{t'−t} r_{t'} }

(the inner sum ∑_{t'=t}^T γ^{t'−t} r_{t'} plays the role of Q^π(x_t, u_t, t))

• PoWER (Kober & Peters) and Monte Carlo EM (Vlassis & Toussaint) are similar, but try to make a full “M-step” instead of only a gradient step.

See: Peters & Schaal (2008): Reinforcement learning of motor skills with policy gradients (Neural Networks)

Kober & Peters: Policy Search for Motor Primitives in Robotics (NIPS 2008)

Vlassis & Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm (Autonomous Robots 27, 123-130)
39/82
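A minimal REINFORCE sketch on an assumed 1-step toy problem (Gaussian policy, quadratic reward — not from the cited papers), showing the gradient estimate pushing the policy parameter towards the optimum:

```python
import numpy as np

# REINFORCE on a 1-step toy problem: policy u ~ N(beta, 1), reward
# r = -(u - 3)^2. The estimate mean[(d/dbeta log pi(u)) * (r - baseline)]
# drives beta towards the optimum 3.
rng = np.random.default_rng(6)
beta = 0.0
for _ in range(2000):
    u = beta + rng.standard_normal(64)       # batch of sampled actions
    r = -(u - 3.0) ** 2                      # rewards
    b = r.mean()                             # baseline (variance reduction)
    grad = np.mean((u - beta) * (r - b))     # d/dbeta log N(u|beta,1) = u - beta
    beta += 0.05 * grad
print(round(beta, 1))   # approx 3.0
```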
Policy Search using policy gradients
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
40/82
Five approaches to learning optimal control
[Diagram: five routes from data to a policy.
From experience data D = {(x_t, u_t, c_t)}_{t=0}^T:
– Model-based RL: learn a model P(x'|u,x), c(x,u); dynamic programming then yields V_t(x) and a policy π_t(x)
– Model-free RL: learn the value function V(x) directly, which yields a policy π(x)
– Policy Search: optimize the policy π(x) directly
From demonstration data D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n:
– Imitation Learning: learn a policy π(x) directly
– Inverse RL: learn the latent costs c(x); dynamic programming then yields a policy π(x)]
d}nd=1
41/82
4. Imitation Learning
D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n  →(learn/copy)→  π_t(x)
• Use ML to imitate demonstrated state trajectories x0:T
Literature:
Atkeson & Schaal: Robot learning from demonstration (ICML 1997)
Schaal, Ijspeert & Billard: Computational approaches to motor learning by imitation (Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 2003)

Grimes, Chalodhorn & Rao: Dynamic Imitation in a Humanoid Robot through Nonparametric Probabilistic Inference (RSS 2006)

Rüdiger Dillmann: Teaching and learning of robot tasks via observation of human performance (Robotics and Autonomous Systems, 2004)
42/82
Imitation Learning
• There are many ways to imitate/copy the observed policy:

Learn a density model P(u_t | x_t) P(x_t) (e.g., with a mixture of Gaussians) from the observed data and use it as the policy (Billard et al.)

Or trace observed trajectories by minimizing perturbation costs (Atkeson & Schaal 1997)
43/82
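The density-model route above can be sketched with a single joint Gaussian (a stand-in for the mixture models in the cited work); the toy demonstrations u = 2x + noise are an assumption:

```python
import numpy as np

# Imitation by density estimation: fit a joint Gaussian over (x, u) from
# demonstrations and use the conditional N(u | x) as the policy.
rng = np.random.default_rng(7)
x = rng.standard_normal(500)
u = 2.0 * x + 0.1 * rng.standard_normal(500)   # assumed demonstrations
D = np.stack([x, u], axis=1)

mu = D.mean(axis=0)
S = np.cov(D.T)                  # joint covariance [[Sxx, Sxu], [Sux, Suu]]

def policy(x_query):
    """Conditional mean E[u | x] = mu_u + Sux / Sxx * (x - mu_x)."""
    return mu[1] + S[1, 0] / S[0, 0] * (x_query - mu[0])

print(round(policy(1.0), 1))   # approx 2.0
```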
Imitation Learning
Atkeson & Schaal
44/82
5. Inverse RL
D = {(x_{0:T}, u_{0:T})^d}_{d=1}^n  →(learn)→  r(x, u)  →(DP)→  V_t(x)  →  π_t(x)
• Use ML to “uncover” the latent reward function in observed behavior
Literature:
Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML 2004)

Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)

Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML 2011)
45/82
Inverse RL (Apprenticeship Learning)
• Given: demonstrations D = {x^d_{0:T}}_{d=1}^n

• Try to find a reward function that discriminates demonstrations from other policies
– Assume the reward function is linear in some features: R(x) = wᵀφ(x)
– Iterate:

1. Given a set of candidate policies {π_0, π_1, ...}
2. Find weights w that maximize the value margin ξ between teacher and all other candidates:

max_{w,ξ}  ξ
s.t.  ∀π_i :  wᵀ⟨φ⟩_D  ≥  wᵀ⟨φ⟩_{π_i} + ξ   (value of demonstrations ≥ value of π_i plus margin)
      ||w||² ≤ 1

3. Compute a new candidate policy π_i that optimizes R(x) = wᵀφ(x) and add it to the candidate list.

(Abbeel & Ng, ICML 2004)
46/82
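Step 2 above can be sketched with projected subgradient ascent on the margin (a simple stand-in for the QP in Abbeel & Ng); the feature expectations of the teacher and the candidate policies are assumed values:

```python
import numpy as np

# Find unit-norm w maximizing the value margin min_i w^T (mu_E - mu_i)
# between the teacher's feature expectations and those of candidates.
mu_E = np.array([1.0, 0.5])                  # teacher <phi>_D (assumed)
mu_pis = [np.array([0.2, 0.4]),              # <phi>_{pi_i} (assumed)
          np.array([0.8, -0.1])]

w = np.ones(2) / np.sqrt(2)
for _ in range(500):
    margins = [w @ (mu_E - mu) for mu in mu_pis]
    i = int(np.argmin(margins))              # tightest constraint
    w = w + 0.01 * (mu_E - mu_pis[i])        # subgradient step on the margin
    w = w / np.linalg.norm(w)                # project back onto ||w|| = 1

margin = min(w @ (mu_E - mu) for mu in mu_pis)
print(margin > 0)   # positive margin: the teacher looks optimal under w
```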
47/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
– Markov Decision Processes and Stochastic Optimal Control
– Kinematics and Dynamics

• Five Approaches to Learning in Robotics
1. Model learning (model-based RL)
2. Value learning (model-free RL)
3. Policy search
4. Imitation learning
5. Inverse RL

• ...plus two more:
6. Exploration
7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
– Statistical Relational Learning for Robots
– Discussion
48/82
6. Exploration
• Active Learning is a form of ML where the algorithm can query the next data point.

→ explore where the “smoothed” empirical distribution is low
49/82
Exploration
• Exploration in robotics is tricky: can’t just pick the next point, but needto control system into state of interest.
Literature (very diverse selection):
Schaal & Atkeson: Robot juggling: An implementation of memory-based learning [Shifting Setpoint Algorithm] (Control Systems Magazine 1994)

Nouri & Littman: Dimension reduction and its application to model-based exploration in continuous spaces (Machine Learning 2010)

Jong & Stone: Model-Based Exploration in Continuous State Spaces (SARA 2007)

Katz, Pyuro & Brock: Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation (RSS 2008)

Hsiao, Kaelbling & Lozano-Perez: Grasping POMDPs (ICRA 2007)

Saxena et al: Learning to grasp novel objects using vision (ISER 2006)

Oudeyer et al: The playground experiment: Task-independent development of a curious robot (Symposium on Developmental Robotics 2005)
50/82
Exploration
• R-max is a simple exploration strategy that assigns high value to state-action pairs not often visited (optimism)

• We can use ML to approximate this optimistic value function.

Fitted R-max on the Mountain Car problem:
Jong & Stone: Model-Based Exploration in Continuous State Spaces (SARA 2007)
51/82
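The R-max idea above amounts to replacing the reward estimate of under-visited state-action pairs with the maximal reward; a minimal sketch on an assumed toy MDP (all counts and rewards are illustrative):

```python
import numpy as np

# R-max style optimism: state-action pairs visited fewer than m times are
# treated as maximally rewarding, driving the greedy policy to explore them.
n_states, n_actions, m, r_max = 4, 2, 5, 1.0
counts = np.zeros((n_states, n_actions))
counts[0, 0] = 10; counts[0, 1] = 10       # pretend (0, .) is well explored
reward_sums = np.zeros((n_states, n_actions))
reward_sums[0, :] = 2.0                    # observed reward 0.2 per visit

known = counts >= m
R_opt = np.where(known, reward_sums / np.maximum(counts, 1), r_max)

print(R_opt[0])   # empirical estimates for the explored state
print(R_opt[1])   # optimistic r_max for the unexplored states
```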
Physical Exploration
Dov Katz and Oliver Brock 2010: Manipulating Articulated Objects With Interactive Perception
7. Probabilistic Inference for Control & Planning
• Bellman’s optimality principle is one approach to optimal control:
V_t(x) = min_{u_t} [ c_t(x_t, u_t) + E_{x'|u_t,x}{ V(x') } ]
• Reductions to Probabilistic Inference:
Toussaint & Goerick: Probabilistic inference for structured planning in robotics (IROS 2007)

Todorov: General duality between optimal control and estimation (Decision and Control 2008)

Toussaint: Robot Trajectory Optimization using Approximate Inference (ICML 2009)

Kappen, Gomez & Opper: Optimal control as a graphical model inference problem (arXiv:0901.0633, 2009)

Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)
53/82
Approximate Inference Control
[Graphical model: states x_0, ..., x_T; controls u_0, ..., u_T; per-time-step binary variables z_0, ..., z_T in place of the rewards r_0, ..., r_T]

• Introduce a binary auxiliary variable z_t with
P(z_t = 1 | u_t, x_t) = exp{−c_t(x_t, u_t)}

• For a given trajectory x_{0:T}, u_{0:T}:
log P(z_{0:T} = 1 | x_{0:T}, u_{0:T}) = −C(x_{0:T}, u_{0:T})

• W.r.t. a distribution q(x_{0:T}, u_{0:T}):
expected log-likelihood = expected neg-costs
⟨log P(z_{0:T} = 1 | x_{0:T}, u_{0:T})⟩_{q(x_{0:T}, u_{0:T})} = −⟨C(x_{0:T}, u_{0:T})⟩_{q(x_{0:T}, u_{0:T})}

Expectation Maximization ↔ Stochastic Optimal Control
54/82
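The EM ↔ SOC correspondence above can be sketched on an assumed 1-step toy problem: sample controls from a Gaussian, weight each sample by the “likelihood” exp(−cost), and refit the Gaussian (one M-step). The quadratic cost and all parameters are illustrative assumptions:

```python
import numpy as np

# Reward-weighted EM on a 1-step toy problem: P(z=1 | u) = exp(-c(u)).
# The weighted refit moves the control distribution into the low-cost region.
rng = np.random.default_rng(8)

def cost(u):
    return (u - 2.0) ** 2          # toy cost, minimum at u = 2

mean, std = 0.0, 1.0
for _ in range(50):                # EM iterations
    u = mean + std * rng.standard_normal(500)
    w = np.exp(-cost(u))           # "likelihood" weights exp(-c(u))
    w = w / w.sum()
    mean = np.sum(w * u)           # M-step: reward-weighted refit
    std = max(np.sqrt(np.sum(w * (u - mean) ** 2)), 0.1)

print(round(mean, 1))   # approx 2.0
```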
Approximate Inference Control
[graphical model: state variables x_0 … x_T, control variables u_0 … u_T, auxiliary variables z_0 … z_T]

• Distinguish 3 different processes:

  prior        P(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^{T} P(u_t | x_t) ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1})

  controlled   q_π(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^{T} δ_{u_t = π_t(x_t)} ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1})

  posterior    p(x_{0:T}, u_{0:T}) := (P(x_0) / P(z_{0:T}=1)) ∏_{t=0}^{T} P(u_t | x_t) ∏_{t=1}^{T} P(x_t | u_{t-1}, x_{t-1}) ∏_{t=0}^{T} exp{−c_t(x_t, u_t)}

• For uniform P(u_t | x_t):

  D(q_π || p) = log P(z_{0:T}=1) + E_{q_π(x_{0:T})}{ C(x_{0:T}, π(x_{0:T})) }
Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv:1009.3958, 2010)
55/82
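The z-variable construction can be illustrated numerically: sample trajectories from a prior, weight each by exp(−C), and the weighted distribution is exactly the posterior above. The 1D random-walk prior and quadratic cost below are invented for illustration:

```python
import numpy as np

# Control-as-inference in miniature: with P(z_t=1 | x_t, u_t) = exp(-c_t),
# the trajectory posterior is the prior reweighted by exp(-C(trajectory)).
rng = np.random.default_rng(1)
T, n_samples = 5, 2000

# Prior rollouts: 1D random walks starting at 0 (toy "uncontrolled" process).
traj = np.cumsum(rng.normal(0.0, 0.5, size=(n_samples, T)), axis=1)
C = ((traj - 1.0) ** 2).sum(axis=1)        # quadratic cost pulling toward x = 1
w = np.exp(-C)
w /= w.sum()                               # normalized posterior weights

prior_mean = traj[:, -1].mean()            # final state under the prior (~0)
post_mean = (w * traj[:, -1]).sum()        # posterior shifts it toward x = 1
```

This naive importance-sampling view degrades quickly in high dimensions; the papers above instead exploit the graphical-model structure with message passing.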
Approximate Inference Control
Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)
56/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
  – Markov Decision Processes and Stochastic Optimal Control
  – Kinematics and Dynamics

• Five Approaches to Learning in Robotics
  1. Model learning (model-based RL)
  2. Value learning (model-free RL)
  3. Policy search
  4. Imitation learning
  5. Inverse RL

• ...plus two more:
  6. Exploration
  7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
  – Statistical Relational Learning for Robots
  – Discussion
57/82
• Most (if not all) examples concerned control of the robot’s own or attached DoFs
What about really controlling a natural environment?
58/82
• That might be a natural environment.
  How can we control (e.g., clean) this?
59/82
Learning a model from object interactions

D = { grab(c):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ...
    }

• How can we learn a predictive model P(x' | u, x) for this data?
60/82
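One hedged sketch of how such data D might be represented in code: each experience as an action with pre- and post-state literal sets (type predicates omitted for brevity), from which the action's add/delete effects follow directly. The encoding is illustrative, not the tutorial's implementation:

```python
# Two experiences mirroring the grab(c)/puton(a) transitions above,
# each as (action, pre-state literals, post-state literals).
D = [
    ("grab(c)",
     {"on(a,b)", "on(b,d)", "on(c,d)", "inhand(nil)"},
     {"on(a,b)", "on(b,d)", "inhand(c)"}),
    ("puton(a)",
     {"on(a,b)", "on(b,d)", "inhand(c)"},
     {"on(a,b)", "on(b,d)", "on(c,a)", "inhand(nil)"}),
]

def effects(pre, post):
    """Literals made true (added) and made false (deleted) by the action."""
    return post - pre, pre - post

for action, pre, post in D:
    added, deleted = effects(pre, post)
    print(action, "adds", sorted(added), "deletes", sorted(deleted))
```

Computing such add/delete effects per experience is a natural first step before compressing them into rules, as the following slides do.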
Learning a model from object interactions
• Differences to model learning in motion control:
  – symbolic representation of state
  – exponential state space
Object Abstraction Assumption: The world is made up of objects, and the effects of actions on these objects generally depend on their attributes rather than their identities.
Pasula, Zettlemoyer & Kaelbling (ICAPS 2004)
• Wanted: generalization across objects
→ Statistical Relational Learning
61/82
Statistical Relational Learning (SRL)
• See ECML/PKDD 2007 tutorial by Lise Getoorhttp://www.ecmlpkdd2007.org/CD/tutorials/T3_Getoor/Getoor_CD.pdf
• “Probabilistic learning & inference on 1st order logic representations”
– very strong generalization across objects
– in my view: currently the only way to express & learn uncertain knowledge about environments with objects & properties/relations
SRL + Robotics = perfect match!
62/82
Learning a model from interaction data

D = { grab(c):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,d) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
      puton(a): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) ¬on(c,d) inhand(c) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      puton(b): box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
      grab(b):  box(a) box(b) ball(c) table(d) on(a,b) on(b,d) on(c,a) inhand(nil) ...
            →   box(a) box(b) ball(c) table(d) on(a,d) ¬on(b,d) on(c,d) inhand(b) ...
      ...
    }

• How can we learn a predictive model P(x' | u, x) for this data?
63/82
A form of Statistical Relational Learning

Pasula, Zettlemoyer & Kaelbling: Learning probabilistic relational planning rules (ICAPS 2004)

• Compress this data into probabilistic relational rules (Pasula et al.):

  pickup(X,Y) : on(X,Y), clear(X), inhand(NIL), block(Y)
    →  .7 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,Y), clear(Y)
       .2 : on(X,TABLE), ¬on(X,Y), clear(Y)
       .1 : no change

  pickup(X,TABLE) : on(X,TABLE), clear(X), inhand(NIL)
    →  .66 : inhand(X), ¬clear(X), ¬inhand(NIL), ¬on(X,TABLE)
       .34 : no change

  puton(X,Y) : clear(Y), inhand(X), block(Y)
    →  .7 : inhand(NIL), ¬clear(Y), ¬inhand(X), on(X,Y), clear(X)
       .2 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
       .1 : no change

  puton(X,TABLE) : inhand(X)
    →  .8 : on(X,TABLE), clear(X), inhand(NIL), ¬inhand(X)
       .2 : no change

• Find a rule set that maximizes (likelihood − description length)

• Opportunities for reducing description length:
  – Frame Assumption: actions only influence few predicates; most predicates remain unchanged
  – Abstraction: introducing novel predicates
  – Uncertainty!
64/82
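The mechanics of using such a rule can be sketched as follows: ground the rule with a substitution and sample one of its probabilistic outcomes. The dictionary encoding and helper names are illustrative, not Pasula et al.'s implementation:

```python
import random

# The pickup(X,Y) rule from above, encoded as context + weighted outcomes.
# "-" marks a negated literal; the "no change" outcome has an empty change set.
rule = {
    "action": "pickup(X,Y)",
    "context": ["on(X,Y)", "clear(X)", "inhand(NIL)", "block(Y)"],
    "outcomes": [
        (0.7, ["inhand(X)", "-clear(X)", "-inhand(NIL)", "-on(X,Y)", "clear(Y)"]),
        (0.2, ["on(X,TABLE)", "-on(X,Y)", "clear(Y)"]),
        (0.1, []),  # no change
    ],
}

def ground(literals, sub):
    """Apply a substitution like {'X': 'a', 'Y': 'b'} to each literal."""
    out = []
    for lit in literals:
        for var, obj in sub.items():
            lit = lit.replace(var, obj)
        out.append(lit)
    return out

def sample_outcome(rule, sub, rng):
    """Draw one outcome by its probability; return the grounded change set."""
    probs, changes = zip(*rule["outcomes"])
    idx = rng.choices(range(len(probs)), weights=probs)[0]
    return ground(changes[idx], sub)

rng = random.Random(0)
change = sample_outcome(rule, {"X": "a", "Y": "b"}, rng)
```

The abstract variables X, Y are what buys generalization: one rule covers every pair of objects whose attributes satisfy the context.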
Role of uncertainty in these rules
⇒ uncertainty ↔ regularization ↔ compression & abstraction

• Introducing uncertainty in the rules not only allows us to model stochastic worlds; it enables us to compress/regularize and thereby learn strongly generalizing models!
65/82
Planning by inference in relational domains
• Once the model is learnt, using it (planning) is hard
• SST & UCT do not scale with # objects
→ Use Planning-by-Inference:

[diagram: the learned rule-based model is grounded – depending on the situation and on relevance – into a rule-based DBN; one representation is good for learning, the other good for planning]

(Lang & Toussaint, JAIR 2010)
66/82
Planning by inference in relational domains
(we’re using factored frontier for approx. inference)
→ Advances in Lifted Inference could translate to better robot manipulation planning.
67/82
Application

Random exploration:
Planning:
Real-world:
Lang & Toussaint: Planning with Noisy Probabilistic Relational Rules (JAIR 2010)
Toussaint et al: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference (ICRA 2010)
68/82
Relational Exploration
• The state space is inherently exponential in the # objects. How could we realize strategies like E3 or R-max in relational domains?
• Key insight:
  strong generalization of the model
  ↔
  strong implications for what is considered novel / is explored
For instance, if you’ve seen a red, green and yellow ball rolling, will you explore whether the blue ball also rolls? Or rather explore something totally different, like dropping a blue box?
69/82
Relational Exploration
• Transfer Explicit Explore or Exploit (E3) to Relational Domains
• Representations to formulate an “empirical distribution” (non-novelty):

  propositional     P(s) ∝ c_D(s)
  distance-based    P_d(s) ∝ exp{ − min_{(s_e, a_e, s'_e) ∈ D} d(s, s_e)² }
  predicate-based   P_p(s) ∝ c_p(s) I(s ⊨ p) + c_¬p(s) I(s ⊨ ¬p)
  context-based     P_Φ(s) ∝ ∑_{φ ∈ Φ} c_D(φ) I(∃σ : s ⊨ σ(φ))

  (contexts ↔ sets of LHSs of rules)
70/82
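The difference between the propositional and context-based densities can be sketched on toy data: a state containing a never-seen object has propositional count zero, but a nonzero context-based count because the abstract pattern was seen before. The matching logic below is deliberately crude (single-character object names) and purely illustrative:

```python
# Two visited states: different objects (c and e), same abstract pattern.
visited = [
    frozenset({"ball(c)", "on(c,d)"}),
    frozenset({"ball(e)", "on(e,d)"}),
]

def propositional_count(s, data):
    """c_D(s): count exact ground-state matches."""
    return sum(s == se for se in data)

def matches_context(s, context_pattern):
    """A context like {"ball(X)", "on(X,d)"} matches if some object grounds it.
    Crude extraction: the object is the first character after '('."""
    objects = {lit[lit.find("(") + 1] for lit in s}
    return any(all(p.replace("X", o) in s for p in context_pattern)
               for o in objects)

def context_count(s, data, context_pattern):
    """Count past experiences matching the same abstract context as s."""
    if not matches_context(s, context_pattern):
        return 0
    return sum(matches_context(se, context_pattern) for se in data)

new_state = frozenset({"ball(f)", "on(f,d)"})   # object f never seen before
ctx = {"ball(X)", "on(X,d)"}
```

Under propositional counting the new state is maximally novel; under context-based counting it is already familiar, so an E3-style agent would explore elsewhere.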
Application

Online relational explore-exploit:
Lang, Toussaint & Kersting: Exploration in Relational Worlds (ECML 2010)
71/82
Outline

Part I: Learning problems in Robotics – the RL view

• Introduction to some basics
  – Markov Decision Processes and Stochastic Optimal Control
  – Kinematics and Dynamics

• Five Approaches to Learning in Robotics
  1. Model learning (model-based RL)
  2. Value learning (model-free RL)
  3. Policy search
  4. Imitation learning
  5. Inverse RL

• ...plus two more:
  6. Exploration
  7. Probabilistic Inference for Control & Planning

Part II: Interacting with a world of objects & Discussion
  – Statistical Relational Learning for Robots
  – Discussion
72/82
Three concluding comments:
– Scaling RL?– Relational Learning in Robotics– Whole Pizza vs. Chillies
73/82
Scaling RL?
• Recommended: Satinder Singh’s “Myths of RL”:
  http://umichrl.pbwiki.com/Myths-of-Reinforcement-Learning

  1. Large state spaces are hard for RL
  2. RL is slow
  3. RL does not have (m)any success stories since TD-Gammon
  4. RL does not work well with function approximation
  5. Value function approximation does not work
  6. Non-Markovianness invalidates standard RL methods
  7. POMDPs are hard for RL to deal with
  8. RL is about learning optimal policies
• The first half of this tutorial discussed many success stories of the RL approach to Robotics.
74/82
Wolfgang Köhler (1917)
Intelligenzprüfungen an Menschenaffen
(The Mentality of Apes)
[movie]
75/82
Scaling RL?
The real world is not a scaled-up version of the Mountain Car Problem.

• In other terms: what should we scale with?
  – The size of the state space?
    It’ll always be exponential – is this the right view?
  – The number of objects?
    How many objects can humans mentally manipulate (= plan with)?
  – The planning horizon?
    But on the right level of abstraction, horizons are short.

• What we need in Robotics is exploration, learning and goal-directed behavior that exploit the structure of natural environments.
76/82
Relational Learning in Robotics
↔
• Many of the successful learning methods concern motion control. There are (in my impression) fewer methods for learning to control/manipulate worlds of many objects – worlds as they are classically described in AI.
• What Robotics needs is a fusion of “this type of AI”, Machine Learning and Control methods
77/82
Relational Learning in Robotics
• A popular science article:
  “I, algorithm: A new dawn for artificial intelligence”
(Anil Ananthaswamy, NewScientist, January 2011)
The article talks of “probabilistic programming, which combines the logical underpinnings of the old AI with the power of statistics and probability.” It quotes Stuart Russell – “It’s a natural unification of two of the most powerful theories that have been developed to understand the world and reason about it.” – and Josh Tenenbaum – “It’s definitely spring”.
• My impression: Exactly these kinds of developments give new hope for Robots to explore, learn and plan in our natural world, composed of objects.
78/82
Whole Pizza vs. Chillies

[table: the full ICRA 2011 technical program, Tuesday May 10 – Thursday May 12, 2011 – three days of up to 15 parallel tracks spanning aerial robotics, SLAM, localization and mapping, grasping, humanoids, medical and rehabilitation robots, learning and adaptive systems, computer vision, and many more]

ICRA 2011 schedule
79/82
Whole Pizza vs. Chillies

• Integrated robotic systems are huge in terms of
– lines of code– methods/disciplines/formalizations involved
Beetz et al: The Assistive Kitchen – A Demonstration Scenario for Cognitive Technical Systems
• Existing learning methods tend to address only isolated aspects – identified and formalized by the engineer.
• Machine Learning is used to working with a “full formalization” of the domain.
  Will this ever be possible for Robotics?
  Can we apply ML on the full system level?
80/82
Robotics as big graphical model?
(where “graphical model” would include relational structures...)
• Many aspects (Computer Vision, Perception, Feedback Control,Stochastic Optimal Control, MDPs) can already be formalized in termsof probabilistic models and inference.
Would it help to view it all as one big graphical model?
81/82
Thanks for your attention!
82/82
Appendix

• Robot dynamics can be described by the differential equation

  M(q) q̈ + C(q, q̇) q̇ + F(q) = u

  with mass matrix M(q), Coriolis forces C(q, q̇) and gravity forces F(q). The Newton-Euler algorithm can efficiently (numerically) compute M, C and F for any specific robot configuration (q, q̇). Using (e.g.) leap-frog integration with time step size τ, the process becomes

  q̇_{t+1} = q̇_t + τ M⁻¹ (u_t − C q̇_t − F)
  q_{t+1} = q_t + τ (q̇_{t+1} + q̇_t)/2

  or, in matrix form,

  (q_{t+1}, q̇_{t+1})ᵀ = (I + A)(q_t, q̇_t)ᵀ + B u_t + a

  with

  A = [ 0 , τ − τ²M⁻¹C/2 ;  0 , −τM⁻¹C ],   B = [ τ²M⁻¹/2 ;  τM⁻¹ ],   a = [ −τ²M⁻¹F/2 ;  −τM⁻¹F ]
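A minimal numeric sketch of one such integration step, solving M q̈ = u − C q̇ − F as in the dynamics equation above. The toy M, C, F values are fixed and purely illustrative; a real implementation would recompute them via Newton-Euler at every step:

```python
import numpy as np

def dynamics_step(q, qdot, u, M, C, F, tau):
    """One semi-implicit integration step of M q̈ + C q̇ + F = u."""
    qddot = np.linalg.solve(M, u - C @ qdot - F)     # q̈ = M⁻¹(u − C q̇ − F)
    qdot_next = qdot + tau * qddot                   # velocity update
    q_next = q + tau * (qdot_next + qdot) / 2.0      # midpoint position update
    return q_next, qdot_next

# Toy 2-DoF example with invented numbers.
M = np.array([[2.0, 0.1], [0.1, 1.0]])
C = np.zeros((2, 2))
F = np.array([0.0, -9.81])                           # e.g. a gravity term
q, qdot = np.zeros(2), np.zeros(2)

# Applying u = F exactly compensates gravity, so the state stays at rest.
q, qdot = dynamics_step(q, qdot, u=F.copy(), M=M, C=C, F=F, tau=0.01)
```

Iterating this step gives the linearized process (q_{t+1}, q̇_{t+1})ᵀ = (I + A)(q_t, q̇_t)ᵀ + B u_t + a used in the control derivations.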
• LSPI simplifications made: actually, LSPI estimates the Q(s, a)-function instead of the V(s)-function; Q represents the expected return given the current state and the selected action. Second, I skipped explaining policy iteration: once you have estimated the Q-function, you need to update the policy (perhaps collecting new data) and iterate, re-estimating the Q-function.
83/82