Reinforcement Learning and Motion Planning - ros.org

Reinforcement Learning and Motion Planning

Mrinal Kalakrishnan
University of Southern California

August 25, 2010


Reinforcement Learning

- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State-of-the-art: Policy Improvement with Path Integrals (Theodorou et al., 2010)




Motion Planning

- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves “easy” problems
  - Local minima




Our New Motion Planner

Apply PI^2 (Policy Improvement with Path Integrals) to motion planning

- Create a new policy: x = K (u − x)
- Control command u(t) = state x(t + 1)
- Quadratic control cost: u^T R u
- R = A^T A:
  - A is a finite-difference matrix that computes accelerations
  - u^T R u is the sum of squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
- State costs need not be differentiable!
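As a rough illustration of the control cost above, the sketch below builds one possible acceleration matrix A from second-order finite differences and checks that u^T R u equals the sum of squared accelerations. The unit time step, the 1-D trajectory, and the names `acceleration_matrix` and `control_cost` are assumptions made for this example, not details from the slides.

```python
import numpy as np

def acceleration_matrix(n, dt=1.0):
    """Second-order finite differences: row i computes the acceleration
    at waypoint i + 1 from waypoints i, i + 1, i + 2 (assumed time step dt)."""
    A = np.zeros((n - 2, n))
    for i in range(n - 2):
        A[i, i:i + 3] = [1.0, -2.0, 1.0]
    return A / dt ** 2

def control_cost(u, R):
    """Quadratic control cost u^T R u."""
    return float(u @ R @ u)

n = 10
A = acceleration_matrix(n)
R = A.T @ A  # R = A^T A

# A straight line has zero acceleration; a wiggly trajectory does not.
straight = np.linspace(0.0, 1.0, n)
bumpy = straight + 0.1 * np.sin(np.linspace(0.0, 4.0 * np.pi, n))

cost_straight = control_cost(straight, R)
cost_bumpy = control_cost(bumpy, R)
squared_accelerations = float(np.sum((A @ bumpy) ** 2))
```

Here `cost_bumpy` matches `squared_accelerations` up to rounding, and `cost_straight` is numerically zero, which is why a straight line is a natural starting trajectory under this cost.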























Collision cost

- Distance field / distance transform
- Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel
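A minimal sketch of this idea on a small 2-D grid follows. It assumes a signed distance convention (positive = clearance, negative = penetration depth, both in cells), a brute-force distance computation that is only practical for tiny grids, and a hinge-shaped per-voxel cost with a hypothetical `margin` parameter. None of these choices come from the slides; a real planner would use a proper Euclidean distance transform over a 3-D voxel grid.

```python
import numpy as np

def signed_distance_field(occupied):
    """Brute-force signed distance field for a small boolean grid.

    Free cells hold their clearance to the nearest occupied cell;
    occupied cells hold minus their distance to the nearest free cell.
    """
    occupied = np.asarray(occupied, dtype=bool)
    cells = np.argwhere(np.ones_like(occupied))  # every cell index, row-major

    def nearest(targets):
        diff = cells[:, None, :] - targets[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        return dist.reshape(occupied.shape)

    clearance = nearest(np.argwhere(occupied))
    penetration = nearest(np.argwhere(~occupied))
    return np.where(occupied, -penetration, clearance)

def collision_cost(field, body_voxels, margin=2.0):
    """Sum a hinge cost over the voxels covered by the robot body."""
    rows, cols = np.asarray(body_voxels).T
    return float(np.sum(np.maximum(0.0, margin - field[rows, cols])))

grid = np.zeros((8, 8), dtype=bool)
grid[3:5, 3:5] = True                 # a 2 x 2 obstacle in the middle
field = signed_distance_field(grid)

far_body = [(0, 0), (0, 1)]           # well clear of the obstacle
near_body = [(2, 3), (2, 4)]          # touching the obstacle boundary
inside_body = [(3, 3)]                # penetrating the obstacle

far_cost = collision_cost(field, far_body)
near_cost = collision_cost(field, near_body)
inside_cost = collision_cost(field, inside_body)
```

The field is computed once per environment, so each voxel query during planning is a single array lookup.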


Collision cost

I Distance field / distance transformI Answers clearance and penetration

depth queriesI Voxelize robot body and add up

costs for each voxel


The algorithm

- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory
    (noise does not modify start or goal because Σ = R^-1!)
  - Compute costs for each rollout
  - Apply PI^2 update: reward-weighted average
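The loop above can be sketched roughly as follows for a 2-D point robot. This is a simplified illustration, not the planner's implementation: it weights whole rollouts by their total cost rather than using per-time-step cost-to-go as full PI^2 does, and the disc obstacle, the noise scaling, the sensitivity `h`, and all function names are assumptions made for the example. Offsets are measured from the straight line and pinned to zero at start and goal, so sampling them with covariance R^-1 gives smooth perturbations that shrink toward both ends.

```python
import numpy as np

def smoothness_matrices(n):
    """R = A^T A for n free waypoints, plus the noise covariance R^-1."""
    # Second differences of the offsets; start/goal offsets are fixed at 0.
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    R = A.T @ A
    cov = np.linalg.inv(R)
    return R, 0.5 * (cov + cov.T)   # symmetrize against rounding error

def disc_cost(points, center=(0.5, 0.05), radius=0.2):
    """Hypothetical non-differentiable state cost: penetration into a disc."""
    dist = np.linalg.norm(points - np.asarray(center), axis=1)
    return float(np.sum(np.maximum(0.0, radius - dist)))

def total_cost(offsets, line, R, state_cost):
    control = float(np.sum(offsets * (R @ offsets)))  # o^T R o per dimension
    return state_cost(line + offsets) + control

def plan(start, goal, state_cost=disc_cost, n=30, rollouts=20,
         iterations=50, noise_std=0.1, h=10.0, seed=0):
    rng = np.random.default_rng(seed)
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    R, cov = smoothness_matrices(n)
    cov = cov * (noise_std ** 2 / cov.max())   # illustrative noise scale
    s = np.linspace(0.0, 1.0, n + 2)[1:-1, None]
    line = start + s * (goal - start)          # initial straight-line trajectory
    offsets = np.zeros_like(line)
    history = [total_cost(offsets, line, R, state_cost)]
    for _ in range(iterations):
        # Noisy rollouts: smooth perturbations drawn with covariance ~ R^-1.
        eps = rng.multivariate_normal(np.zeros(n), cov,
                                      size=(rollouts, line.shape[1]))
        eps = np.transpose(eps, (0, 2, 1))     # -> (rollouts, n, dims)
        costs = np.array([total_cost(offsets + e, line, R, state_cost)
                          for e in eps])
        # Low-cost rollouts get exponentially larger weights.
        spread = costs.max() - costs.min()
        weights = np.exp(-h * (costs - costs.min()) / (spread + 1e-12))
        weights /= weights.sum()
        # Update: weighted average of the sampled perturbations.
        offsets = offsets + np.tensordot(weights, eps, axes=1)
        history.append(total_cost(offsets, line, R, state_cost))
    path = np.vstack([start, line + offsets, goal])
    return path, history

path, history = plan((0.0, 0.0), (1.0, 0.0))
```

Only cost evaluations are needed in the loop, so `disc_cost` can be swapped for any state cost, differentiable or not.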


[Figure sequence: initial trajectory; noisy rollouts; updated trajectory]


Video: Pole



Video: Test Setup


Test Results

Condition        Success rate
Unconstrained    39 / 42
Constrained      38 / 42


Video: Real-world


Conclusion

- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...


Future Work

- Torque optimality
- Trajectory libraries, cached plans

Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage



