Reinforcement Learning and Motion Planning
Mrinal Kalakrishnan
University of Southern California
August 25, 2010
Reinforcement Learning
- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State of the art: Policy Improvement with Path Integrals (Theodorou et al., 2010)
Motion Planning
- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves "easy" problems
  - Local minima
Our New Motion Planner
Apply PI² to motion planning
- Create a new policy: $x = K(u - x)$
- Control command $u(t)$ = state $x(t+1)$
- Quadratic control cost: $u^T R u$
- $R = A^T A$:
  - $A$ is an acceleration differentiation matrix
  - $R$ measures squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
  - Need not be differentiable!
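The quadratic control cost above can be sketched in plain Python. The slide does not spell out the form of A, so the (n+2)×n second-difference matrix used here (stencil [1, -2, 1], unit time step, zero padding beyond the endpoints) is an assumption, and the function names are illustrative only.

```python
def accel_matrix(n):
    """(n+2) x n finite-difference matrix; each column holds the stencil [1, -2, 1].

    Assumption (not from the slides): unit time step, zero padding outside the trajectory.
    """
    A = [[0.0] * n for _ in range(n + 2)]
    for j in range(n):
        A[j][j] = 1.0
        A[j + 1][j] = -2.0
        A[j + 2][j] = 1.0
    return A


def gram(A):
    """R = A^T A, the quadratic control-cost matrix."""
    rows, cols = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


def control_cost(u, R):
    """u^T R u, which equals the sum of squared finite-difference accelerations."""
    n = len(u)
    return sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))


R = gram(accel_matrix(5))
smooth = [0.0, 0.5, 1.0, 0.5, 0.0]   # gentle bump
jerky = [0.0, 1.0, 0.0, 1.0, 0.0]    # zig-zag with the same endpoints
print(control_cost(smooth, R), control_cost(jerky, R))
```

Under these assumptions the zig-zag trajectory is penalized far more heavily than the gentle bump, which is the sense in which R measures squared accelerations.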
Collision cost
- Distance field / distance transform
- Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel
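A toy illustration of the distance-field idea, in plain Python. It uses a small 2D grid and a brute-force search instead of a real 3D distance transform, and the hinge penalty and its `clearance` threshold are assumptions made for this sketch, not the cost function from the talk.

```python
import math


def signed_distance_field(occupied, width, height):
    """Brute-force signed distance field on a small 2D grid.

    Free cells store the distance to the nearest occupied cell (clearance);
    occupied cells store minus the distance to the nearest free cell
    (penetration depth). Assumes at least one occupied and one free cell.
    """
    cells = [(x, y) for x in range(width) for y in range(height)]
    free = [c for c in cells if c not in occupied]
    field = {}
    for c in cells:
        if c in occupied:
            field[c] = -min(math.dist(c, f) for f in free)
        else:
            field[c] = min(math.dist(c, o) for o in occupied)
    return field


def collision_cost(body_cells, field, clearance=2.0):
    """Add up a hinge penalty over every cell covered by the robot body.

    The penalty max(0, clearance - distance) is a hypothetical choice.
    """
    return sum(max(0.0, clearance - field[c]) for c in body_cells)


# A wall two cells thick in a 10 x 6 grid.
obstacle = {(4, y) for y in range(6)} | {(5, y) for y in range(6)}
field = signed_distance_field(obstacle, width=10, height=6)

far_body = [(0, 2), (0, 3)]    # well clear of the wall
near_body = [(3, 2), (4, 2)]   # one cell touching, one cell inside the wall
print(collision_cost(far_body, field), collision_cost(near_body, field))
```

Because the field is precomputed once, each query during planning is a lookup per body cell; the cost is a plain sum and is never differentiated.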
The algorithm
- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory
    Noise does not modify start or goal due to $\Sigma = R^{-1}$!
  - Compute costs for each rollout
  - Apply PI² update: reward-weighted average
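The update step in the loop above can be sketched as follows. This is a simplified illustration, not the full PI² rule: it assigns one weight per rollout from the total rollout cost, whereas PI² weights each time step separately. The exponential weighting, the `sensitivity` constant, and all function names are assumptions for this sketch.

```python
import math


def rollout_weights(costs, sensitivity=10.0):
    """Map rollout costs to probabilities: the lower the cost, the larger the weight."""
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0            # all costs equal -> uniform weights
    raw = [math.exp(-sensitivity * (c - lo) / span) for c in costs]
    total = sum(raw)
    return [r / total for r in raw]


def pi2_update(trajectory, noises, costs, sensitivity=10.0):
    """Add the probability-weighted average of the rollout noise to the trajectory.

    Simplification: one weight per whole rollout rather than per time step.
    """
    weights = rollout_weights(costs, sensitivity)
    return [
        x + sum(w * eps[i] for w, eps in zip(weights, noises))
        for i, x in enumerate(trajectory)
    ]


trajectory = [0.0, 0.0, 0.0]
noises = [[0.1, 0.2, 0.1], [-0.1, -0.2, -0.1]]   # two rollouts' perturbations
costs = [1.0, 5.0]                               # the first rollout is much cheaper
updated = pi2_update(trajectory, noises, costs)
print(updated)
```

The update only needs cost values for each rollout, so the state costs can be arbitrary, including non-differentiable ones.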
[Figure sequence: initial trajectory, six noisy rollouts, updated trajectory]
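One way to generate noisy rollouts with covariance R⁻¹ is sketched below in plain Python. It assumes R is the pentadiagonal matrix (6, -4, 1) that results from a second-difference stencil with zero padding, and that the start and goal are held fixed outside the sampled waypoints; both are assumptions of this sketch, as are the function names.

```python
import math
import random


def control_cost_matrix(n):
    """R = A^T A for the stencil [1, -2, 1] with zero padding (assumed form)."""
    band = {0: 6.0, 1: -4.0, 2: 1.0}
    return [[band.get(abs(i - j), 0.0) for j in range(n)] for i in range(n)]


def cholesky(R):
    """Lower-triangular L with L L^T = R (R must be positive definite)."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lt(L, z):
    """Solve L^T eps = z by back substitution."""
    n = len(z)
    eps = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(L[k][i] * eps[k] for k in range(i + 1, n))
        eps[i] = s / L[i][i]
    return eps


def sample_noise(L, rng):
    """Draw eps ~ N(0, R^{-1}) as eps = L^{-T} z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return solve_lt(L, z)


def noise_variances(L):
    """Diagonal of R^{-1}: the noise variance at each free waypoint."""
    n = len(L)
    cols = [solve_lt(L, [1.0 if i == k else 0.0 for i in range(n)])
            for k in range(n)]
    return [sum(col[i] ** 2 for col in cols) for i in range(n)]


rng = random.Random(0)
L = cholesky(control_cost_matrix(20))
rollout_noise = sample_noise(L, rng)         # one smooth perturbation
variances = noise_variances(cholesky(control_cost_matrix(3)))
print(variances)
```

With three free waypoints the variances come out as 0.4, 0.7, 0.4: the perturbation is largest mid-trajectory and shrinks toward the fixed endpoints, so the sampled rollouts stay smooth and leave the start and goal in place.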
Video: Pole
Video: Test Setup
Test Results
Condition        Success rate
Unconstrained    39 / 42
Constrained      38 / 42
Video: Real-world
Conclusion
- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...
Future Work
- Torque optimality
- Trajectory libraries, cached plans
Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage