Reinforcement Learning and Motion Planning

Mrinal Kalakrishnan
University of Southern California

August 25, 2010

Reinforcement Learning

- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State-of-the-art: Policy Improvement with Path Integrals (Theodorou et al., 2010)

Motion Planning

- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves "easy" problems
  - Local minima

Our New Motion Planner

Apply PI² to motion planning

- Create a new policy: $\dot{x} = K(u - x)$
- Control command $u(t)$ = state $x(t+1)$
- Quadratic control cost: $u^T R u$
- $R = A^T A$:
  - $A$ is an acceleration differentiation matrix
  - $R$ measures squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
  - Need not be differentiable!
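
The relation between R and squared accelerations can be checked numerically. The following is a minimal sketch, not the planner's actual code: it assumes a unit time step, a central second-difference stencil (1, -2, 1), and zero positions outside the trajectory. The function names are hypothetical.

```python
def accel_matrix(n):
    """(n+2) x n second-difference matrix A (unit time step, zero boundaries)."""
    A = [[0.0] * n for _ in range(n + 2)]
    for i in range(n + 2):
        # Row i applies the stencil (1, -2, 1) to columns i-2, i-1, i.
        for j, c in ((i - 2, 1.0), (i - 1, -2.0), (i, 1.0)):
            if 0 <= j < n:
                A[i][j] = c
    return A


def control_cost_matrix(n):
    """R = A^T A."""
    A = accel_matrix(n)
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


def quadratic_cost(u, R):
    """Quadratic control cost u^T R u."""
    n = len(u)
    return sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))


def sum_squared_accelerations(u):
    """Direct evaluation of ||A u||^2, for comparison with u^T R u."""
    A = accel_matrix(len(u))
    return sum(sum(a * x for a, x in zip(row, u)) ** 2 for row in A)


u = [1.0, 2.0, 3.0]
R = control_cost_matrix(len(u))
print(R)  # [[6.0, -4.0, 1.0], [-4.0, 6.0, -4.0], [1.0, -4.0, 6.0]]
print(quadratic_cost(u, R), sum_squared_accelerations(u))  # 26.0 26.0
```

Both evaluations agree because $u^T A^T A u = \|Au\|^2$, which is the sum of squared finite-difference accelerations along the trajectory.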

Collision cost

- Distance field / distance transform
- Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel
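
As an illustration of the idea, here is a small sketch of a signed distance field on a 2-D grid and a collision cost summed over the cells covered by a body. It is an assumption-laden toy: the brute-force distance computation, the hinge penalty, and the `margin` parameter are all hypothetical choices, and a real planner would use a 3-D field and a fast distance transform.

```python
import math


def signed_distance_field(occupied, width, height):
    """Brute-force signed distance on a 2-D grid, in cell units.

    Free cells get a positive clearance (distance to the nearest obstacle cell).
    Obstacle cells get a negative value (penetration depth, measured as the
    distance to the nearest free cell).
    """
    cells = [(x, y) for x in range(width) for y in range(height)]
    free = [c for c in cells if c not in occupied]
    field = {}
    for c in cells:
        if c in occupied:
            field[c] = -min(math.dist(c, f) for f in free)
        else:
            field[c] = min(math.dist(c, o) for o in occupied)
    return field


def collision_cost(field, body_cells, margin=2.0):
    """Sum a hinge penalty over every cell covered by the robot body."""
    return sum(max(0.0, margin - field[c]) for c in body_cells)


occupied = {(2, 2)}
field = signed_distance_field(occupied, 5, 5)
print(field[(0, 2)])                               # clearance: 2.0
print(field[(2, 2)])                               # penetration: -1.0
print(collision_cost(field, [(0, 2), (1, 2)]))     # 1.0
print(collision_cost(field, [(2, 2)]))             # 3.0
```

The field is computed once per environment, so each cost query during planning reduces to a table lookup per body cell.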

The algorithm

- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory
    (Noise does not modify start or goal due to $\Sigma = R^{-1}$!)
  - Compute costs for each rollout
  - Apply PI² update: reward-weighted average
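
The loop above can be sketched for a one-dimensional toy problem. This is a simplified illustration under stated assumptions, not the presented planner: start and goal are both 0, the state cost is a hypothetical non-differentiable hinge that pushes the middle of the trajectory above 0.5, each rollout is weighted by its whole-trajectory cost, and a fixed iteration count stands in for a convergence test. All function names and parameter values are made up for the example.

```python
import math
import random


def accel_matrix(n):
    """(n+2) x n second-difference matrix; positions outside the trajectory
    are treated as zero, which pins the start and goal."""
    A = [[0.0] * n for _ in range(n + 2)]
    for i in range(n + 2):
        for j, c in ((i - 2, 1.0), (i - 1, -2.0), (i, 1.0)):
            if 0 <= j < n:
                A[i][j] = c
    return A


def gram(A):
    """R = A^T A."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


def cholesky(R):
    """Lower-triangular L with R = L L^T (R must be positive definite)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_noise(L, rng, scale):
    """Draw eps ~ N(0, scale^2 * R^-1) by back-substituting L^T eps = z."""
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(L[j][i] * eps[j] for j in range(i + 1, n))
        eps[i] = s / L[i][i]
    return [scale * e for e in eps]


def total_cost(theta, R):
    """Control cost 0.5 * theta^T R theta plus a hypothetical hinge state cost."""
    n = len(theta)
    control = 0.5 * sum(theta[i] * R[i][j] * theta[j]
                        for i in range(n) for j in range(n))
    middle = range(n // 3, 2 * n // 3)
    state = 10.0 * sum(max(0.0, 0.5 - theta[t]) for t in middle)
    return control + state


def weighted_update(theta, noise, costs, h=10.0):
    """Add the cost-weighted average of the noise; low cost means high weight."""
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    return [th + sum(w * eps[i] for w, eps in zip(weights, noise)) / total
            for i, th in enumerate(theta)]


def plan(n=20, iterations=50, rollouts=10, scale=0.1, seed=0):
    rng = random.Random(seed)
    R = gram(accel_matrix(n))
    L = cholesky(R)
    theta = [0.0] * n  # interior points of the straight line from 0 to 0
    for _ in range(iterations):
        noise = [sample_noise(L, rng, scale) for _ in range(rollouts)]
        costs = [total_cost([t + e for t, e in zip(theta, eps)], R)
                 for eps in noise]
        theta = weighted_update(theta, noise, costs)
    return theta, R


theta, R = plan()
print(total_cost([0.0] * len(theta), R), total_cost(theta, R))
```

Only the interior points are parameters, so the endpoints are never perturbed, and noise drawn with covariance proportional to $R^{-1}$ is smooth and has low control cost. No gradient of the state cost is evaluated anywhere in the loop.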

[Figure sequence: initial trajectory, noisy rollouts, updated trajectory]

Video: Pole

Video: Test Setup

Test Results

Condition        Success rate
Unconstrained    39 / 42
Constrained      38 / 42

Video: Real-world

Conclusion

- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...

Future Work

- Torque optimality
- Trajectory libraries, cached plans

Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage
