
Underactuated Robotics
Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

© Russ Tedrake, 2014
How to cite these notes

Note: These are working notes that will be updated throughout the Fall 2014 semester.

Chapter 12: Trajectory Optimization

I've argued that optimal control is a powerful framework for specifying complex behaviors with simple objective functions, letting the dynamics and constraints on the system shape the resulting feedback controller (and vice versa!). But the computational tools that we've provided so far have been limited in some important ways. The numerical approaches to dynamic programming which involve putting a mesh over the state space do not scale well to systems with state dimension more than four or five. Linearization around a nominal operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g. using LQR) for even very high-dimensional systems, but the effectiveness of the resulting controllers is limited to the region of state space where the linearization is a good approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis from the last chapter can provide, among other things, an effective way to compute estimates of those regions. But we have not yet provided any real computational tools for approximate optimal control that work for high-dimensional systems beyond the linearization around a goal. That is precisely the goal for this chapter.

The big change that is going to allow us to scale to high-dimensional systems is that we are going to give up the goal of solving for the optimal feedback controller for the entire state space, and instead attempt to find an optimal control solution that is valid from only a single initial condition. Instead of representing this as a feedback control function, we can represent this solution as a trajectory, $x(t), u(t)$, typically defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics of the system on a mesh spread across the state space. This does not scale to high-dimensional systems, and it was difficult to bound the errors due to the discretization. If we instead restrict ourselves to optimizing from only a single initial condition, then a different discretization scheme emerges: we can discretize the state and input trajectories over time.

12.1 Problem Formulation

Given an initial condition, $x_0$, and an input trajectory $u(t)$ defined over a finite interval, $t \in [t_0, t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective,

$$J_{u(\cdot)}(x_0) = \int_{t_0}^{t_f} \ell(x(t), u(t))\, dt.$$


We will write the trajectory optimization problem as

$$\min_{u(\cdot)} \int_{t_0}^{t_f} \ell(x(t), u(t))\, dt, \quad \text{subject to} \quad \dot{x}(t) = f(x(t), u(t)), \quad x(t_0) = x_0.$$

Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($x(t)$ cannot cause the robot to be inside an obstacle) or input limits (e.g. $|u(t)| \le u_{max}$), which can be defined for all time or some subset of the trajectory.

As written, the optimization above is an optimization over continuous trajectories. In order to formulate this as a numerical optimization, we must parameterize it with a finite set of numbers. Perhaps not surprisingly, there are many different ways to write down this parameterization, with a variety of different properties in terms of speed, robustness, and accuracy of the results. We will outline just a few of the most popular below. I would recommend [78] for additional details.

12.2 Computational Tools for Nonlinear Optimization

Before we dive in, we need to take a moment to understand the optimization tools that we will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function. With LQR we were able to reduce the problem to a matrix Riccati equation, for which we have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to formulate a very specific kind of optimization problem--a semi-definite program (or SDP)--which is a subset of convex optimization, and relied on custom solvers like SeDuMi or Mosek to solve the problems for us. Convex optimization is a hugely important subset of nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky: the optimizations that we formulate may have local minima, and the solution techniques will at best only guarantee that they give a locally optimal solution.

The generic formulation of a nonlinear optimization problem is

$$\min_z c(z) \quad \text{subject to} \quad \phi(z) \le 0,$$

where $z$ is a vector of decision variables, $c(z)$ is a scalar objective function, and $\phi(z)$ is a vector of constraints. Note that, although we write $\phi(z) \le 0$, this formulation captures positivity constraints on the decision variables (simply multiply the constraint by $-1$) and equality constraints (simply list both $\phi(z) \le 0$ and $-\phi(z) \le 0$) as well.
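For instance, a problem of exactly this form can be handed to fmincon from MATLAB's Optimization Toolbox (one of the solvers discussed below). The particular objective and constraint in this minimal sketch are made up purely for illustration:

% Generic nonlinear program:  min_z c(z)  subject to  phi(z) <= 0.
% The objective and constraint here are made up for illustration.
c   = @(z) (z(1)-2)^2 + z(1)*z(2) + z(2)^2;        % smooth scalar objective
phi = @(z) deal(z(1)^2 + z(2)^2 - 4, []);          % nonlinear inequality, no equalities
z0  = [0; 0];                                      % initial guess
opts  = optimoptions('fmincon','Algorithm','sqp'); % an SQP solver
zstar = fmincon(c, z0, [], [], [], [], [], [], phi, opts);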

The picture that you should have in your head is a nonlinear, potentially non-convex objective function defined over (multi-dimensional) $z$, with a subset of possible $z$ values satisfying the constraints.


Figure 12.1 - One-dimensional cartoon of a nonlinear optimization problem. The red dots represent local minima. The blue dot represents the optimal solution.

Note that minima can be the result of the objective function having zero derivative, or of a sloped objective running up against a constraint.

Numerical methods for solving these optimization problems require an initial guess, $z_0$, and proceed by trying to move down the objective function toward a minimum. Common approaches include gradient descent, in which the gradient of the objective function is computed or estimated, and second-order methods such as sequential quadratic programming (SQP), which makes a local quadratic approximation of the objective function and local linear approximations of the constraints, and solves a quadratic program on each iteration to jump directly to the minimum of the local approximation.

While not strictly required, these algorithms can often benefit a great deal from having the gradients of the objective and constraints computed explicitly; the alternative is to obtain them from numerical differentiation. Beyond pure speed considerations, I strongly prefer to compute the gradients explicitly because it can help avoid numerical accuracy issues that can creep in with finite difference methods. The desire to calculate these gradients will be a major theme in the discussion below, and we have gone to great lengths to provide explicit gradients of our provided functions and automatic differentiation of user-provided functions in Drake.

When I started out, I was of the opinion that there is nothing difficult about implementing gradient descent or even a second-order method, and I wrote all of the solvers myself. I now realize that I was wrong. The commercial solvers available for nonlinear programming are substantially higher performance than anything I wrote myself, with a number of tricks, subtleties, and parameter choices that can make a huge difference in practice. Some of these solvers can exploit sparsity in the problem (e.g., if the constraints depend in a sparse way on the decision variables). Nowadays, we make heaviest use of SNOPT [79], which now comes bundled with the precompiled distributions of Drake, but we also support fmincon from the Optimization Toolbox in MATLAB. Note that while I do advocate using these tools, you do not need to use them as a black box. In many cases you can improve the optimization performance by understanding and selecting non-default configuration parameters.

12.3 Trajectory Optimization as a Nonlinear Program

12.3.1 Direct Transcription

Let us start by representing the finite-time trajectories, $x(t)$ and $u(t)$, by their values at a series of break points, $t_0, t_1, \ldots, t_N$, and denoting the values at those points (aka the knot points) $x[n] = x(t_n)$ and $u[n] = u(t_n)$, respectively.


Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $dt$, and use Euler integration to write

$$\min_{x[\cdot], u[\cdot]} \sum_{n=0}^{N-1} \ell(x[n], u[n])\, dt, \quad \text{subject to} \quad x[n+1] = x[n] + f(x[n], u[n])\, dt, \;\; \forall n \in [0, N-1], \quad x[0] = x_0.$$

Note that the decision variables here are $x[1], \ldots, x[N]$ and $u[0], \ldots, u[N-1]$, because $x[0]$ is given, and $u[N]$ does not appear in the cost nor any of the constraints. It is easy to generalize this approach to add additional costs or constraints on $x[\cdot]$ and/or $u[\cdot]$. (Note that this formulation does not actually benefit from the additive cost structure, so more general cost formulations are also possible.) Computing the gradients of the objective and constraints is essentially as simple as computing the gradients of $\ell$ and $f$.

Example 12.1 (Direct Transcription for the Double Integrator)

We have implemented an optimization class hierarchy in Drake which makes it easy to try out these algorithms. Watching the way that they perform on our simple problems is a very nice way to gain intuition. Here is some simple code to solve the (time-discretized) minimum-time problem for the double integrator.

% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDirtran;

% make sure you take a look at the code!
edit('DoubleIntegrator.runDirtran')

Nothing compares to running it yourself, and poking around in the code. But you can also click here to watch the result. I hope that you recognize the parabolic trajectory from the initial condition up to the switching surface, and then the second parabolic trajectory down to the origin. You should also notice that the transition between $u = -1$ and $u = 1$ is imperfect, due to the discretization error. As an exercise, try increasing the number of knot points (the variable N in the code) to see if you can get a sharper response, like this.
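For readers without Drake at hand, here is a minimal self-contained sketch of the same transcription using fmincon. To keep it simple, it uses a quadratic running cost rather than the minimum-time objective; all of the function and variable names here are my own illustrations, not part of Drake:

function dirtran_double_integrator
% Direct transcription for the double integrator qddot = u, with Euler
% integration, input limits |u| <= 1, and a quadratic running cost.
N = 40; dt = 0.1; nx = 2;
x0 = [-2; 0];                                   % given initial state [q; qdot]
% decision vector z = [x[1]; ...; x[N]; u[0]; ...; u[N-1]]
z0 = zeros(nx*N + N, 1);                        % initial guess
lb = [-inf(nx*N,1); -ones(N,1)];
ub = [ inf(nx*N,1);  ones(N,1)];
opts  = optimoptions('fmincon','Algorithm','sqp');
zstar = fmincon(@cost, z0, [],[],[],[], lb, ub, @dynamics, opts);
X = unpack(zstar);  plot(X(1,:), X(2,:));       % phase plot of the result

  function J = cost(z)
    [X,U] = unpack(z);
    J = (sum(sum(X.^2,1)) + sum(U.^2))*dt;      % quadratic running cost
  end
  function [cin,ceq] = dynamics(z)
    [X,U] = unpack(z);
    Xall = [x0, X];                             % prepend the given x[0]
    ceq = zeros(nx, N);
    for n = 1:N                                 % x[n] = x[n-1] + f(x[n-1],u[n-1])*dt
      xn = Xall(:,n);
      ceq(:,n) = Xall(:,n+1) - (xn + [xn(2); U(n)]*dt);
    end
    ceq = ceq(:);  cin = [];
  end
  function [X,U] = unpack(z)
    X = reshape(z(1:nx*N), nx, N);              % knot values x[1..N]
    U = z(nx*N+1:end)';                         % inputs u[0..N-1]
  end
end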

If you take a moment to think about what the direct transcription algorithm is doing, you will see that by satisfying the dynamic constraints, the optimization is effectively solving the (Euler approximation of the) differential equation. But instead of marching forward through time, it is minimizing the inconsistency at each of the knot points simultaneously. It is easy enough to generalize the constraints to use higher-order integration schemes, paying attention to the trade-off between the number of times the constraint must be evaluated vs. the density of the knot points in time. But if the goal is to obtain smooth trajectory solutions, then another approach quickly emerges: the approach taken by the so-called collocation methods.

12.3.2 Direct Collocation

In direct collocation (c.f., [80]), both the input trajectory and the state trajectory are represented explicitly as piecewise polynomial functions. In particular, the sweet spot for this algorithm is taking $u(t)$ to be a first-order polynomial -- allowing it to be completely defined


by the values at the knot points -- and $x(t)$ to be a Hermite cubic polynomial -- completely defined by the values and derivatives at the knot points. The state derivatives at the knot points are given as a function of $x[n]$ and $u[n]$ by the plant dynamics, so the entire spline is completely defined by the values of $x[n]$ and $u[n]$ at the knot points. To add the additional constraint that $x(t)$ is dynamically consistent with $u(t)$ and the plant dynamics, we add an additional set of constraints requiring that the derivative of the spline at a set of collocation points also matches the plant dynamics. For the special case of cubic polynomial state trajectories and piecewise-linear input trajectories, the derivative at the midpoint of each segment is particularly easy to compute, making it the natural choice.

Figure 12.2 - Cubic spline parameters used in the direct collocation method.

Once again, direct collocation effectively integrates the equations of motion by satisfying the constraints of the optimization -- this time producing a third-order approximation of the dynamics with effectively two evaluations of the plant dynamics per segment. [81] claims, without proof, that as the break points are brought closer together, the trajectory will converge to a true solution of the differential equation. Once again it is very natural to add additional terms to the cost function or additional input/state constraints, and very easy to calculate the gradients of the objective and constraints. I personally find it very nice to explicitly account for the parametric encoding of the trajectory in the solution technique.
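As a sketch of the key computation, here is the defect constraint for a single segment, using the standard cubic-spline interpolation formulas consistent with the construction above (the function and variable names are my own):

function ceq = collocation_defect(xk, uk, xkp1, ukp1, h, f)
% Direct collocation defect for one segment of duration h.  f(x,u) is a
% handle to the plant dynamics.  The cubic is defined by the knot values
% and their dynamics-determined derivatives; we require its derivative at
% the segment midpoint (the collocation point) to match the dynamics.
fk    = f(xk,   uk);
fkp1  = f(xkp1, ukp1);
xc    = (xk + xkp1)/2 + h*(fk - fkp1)/8;        % cubic evaluated at midpoint
xdotc = -3*(xk - xkp1)/(2*h) - (fk + fkp1)/4;   % its time derivative there
uc    = (uk + ukp1)/2;                          % first-order-hold input
ceq   = xdotc - f(xc, uc);                      % defect to be driven to zero
end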

Example 12.2 (Direct Collocation for the Double Integrator)

Direct collocation also easily solves the minimum-time problem for the double integrator. The performance is similar to direct transcription, but the convergence is visibly different. Try it for yourself:

% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDircol;

% make sure you take a look at the code!
edit('DoubleIntegrator.runDircol')

12.3.3 Shooting Methods

In the methods described above, by asking the optimization package to perform the numerical integration of the equations of motion, we are effectively over-parameterizing the problem. In fact, the optimization is perfectly well defined if we restrict the decision


variables to $u[\cdot]$ only -- we can compute $x[\cdot]$ ourselves by knowing $x_0$, $u[\cdot]$, and the equations of motion. This is exactly the approach taken in the shooting methods.

Computing gradients

Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the $x[\cdot]$ decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define

$$x[n+1] = f_d(x[n], u[n])$$

as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(x[n], u[n]) = x[n] + f(x[n], u[n])\, dt$.

Then, writing the total cost as $J = \sum_{n=0}^{N} \ell(x[n], u[n])$, we have

$$\frac{\partial J}{\partial u[k]} = \sum_{n=0}^{N} \frac{\partial \ell(x[n], u[n])}{\partial x[n]} \frac{\partial x[n]}{\partial u[k]} + \frac{\partial \ell(x[k], u[k])}{\partial u[k]},$$

where the gradient of the state with respect to the inputs can be computed during the "forward simulation",

$$\frac{\partial x[n+1]}{\partial u[k]} = \frac{\partial f_d(x[n], u[n])}{\partial x[n]} \frac{\partial x[n]}{\partial u[k]} \;\;\text{for } k < n, \qquad \frac{\partial x[n+1]}{\partial u[n]} = \frac{\partial f_d(x[n], u[n])}{\partial u[n]}.$$

These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance, with the Euler integration scheme above, $\frac{\partial x[n]}{\partial u[k]} = 0$ if $k \ge n$.
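For concreteness, here is one way to organize that bookkeeping. The handles dfd_dx and dfd_du for the Jacobians of $f_d$ are assumed to be provided; the names are illustrative, not part of any library:

function dxdu = simulation_gradients(X, U, dfd_dx, dfd_du)
% Forward propagation of the simulation gradients dx[n]/du[k].
% X : nx-by-(N+1), with X(:,n+1) = x[n] from the forward simulation
% U : nu-by-N,     with U(:,k+1) = u[k]
[nx, Np1] = size(X);  N = Np1 - 1;  nu = size(U,1);
dxdu = zeros(nx, nu, N+1, N);          % dxdu(:,:,n+1,k+1) = dx[n]/du[k]
for n = 0:N-1
  A = dfd_dx(X(:,n+1), U(:,n+1));      % dfd/dx at step n
  B = dfd_du(X(:,n+1), U(:,n+1));      % dfd/du at step n
  for k = 0:n-1                        % chain rule through x[n]
    dxdu(:,:,n+2,k+1) = A * dxdu(:,:,n+1,k+1);
  end
  dxdu(:,:,n+2,n+1) = B;               % direct dependence when k = n
end                                    % (dx[n]/du[k] = 0 for k >= n)
end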

The special case of optimization without state constraints

By solving for $x[\cdot]$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive them in the Pontryagin section below:

1. Simulate forward: $x[n+1] = f_d(x[n], u[n])$, from $x[0] = x_0$.

2. Calculate backwards: $\lambda[n-1] = \frac{\partial \ell(x[n], u[n])}{\partial x[n]}^T + \frac{\partial f_d(x[n], u[n])}{\partial x[n]}^T \lambda[n]$, from $\lambda[N-1] = \frac{\partial \ell(x[N], u[N])}{\partial x[N]}^T$.

3. Extract the gradients: $\frac{\partial J}{\partial u[n]} = \frac{\partial \ell(x[n], u[n])}{\partial u[n]} + \lambda[n]^T \frac{\partial f_d(x[n], u[n])}{\partial u[n]}$,


with $\frac{\partial J}{\partial u[N]} = \frac{\partial \ell(x[N], u[N])}{\partial u[N]}$.

Here $\lambda[n]$ is a vector the same size as $x[n]$ which has an interpretation as $\lambda[n] = \frac{\partial J}{\partial x[n+1]}^T$.

The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic improvement in efficiency over calculating the huge number of simulation gradients described above. In case you are interested, the adjoint equation is known as the backpropagation algorithm in the neural networks literature, and it is one of the primary reasons that training neural networks became so popular in the 1980's; super fast GPU implementations of this algorithm are also one of the reasons that deep learning is incredibly popular right now (the availability of massive training databases is perhaps the other main reason).
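As a sketch, the three steps above take only a few lines. The handles fd, dfd_dx, dfd_du, dl_dx, and dl_du for the dynamics, its Jacobians, and the cost gradients are assumed to be provided; the names are illustrative:

function dJdu = adjoint_gradients(x0, U, fd, dfd_dx, dfd_du, dl_dx, dl_du)
% Gradients of J = sum_{n=0}^{N} l(x[n],u[n]) by the adjoint recursion.
% U : nu-by-(N+1), with U(:,n+1) = u[n].
[nu, Np1] = size(U);  N = Np1 - 1;  nx = numel(x0);
X = [x0, zeros(nx,N)];
for n = 1:N                            % 1. simulate forward
  X(:,n+1) = fd(X(:,n), U(:,n));
end
lambda = zeros(nx, N);                 % lambda(:,n) stores lambda[n-1]
lambda(:,N) = dl_dx(X(:,N+1), U(:,N+1))';
for n = N-1:-1:1                       % 2. calculate backwards
  lambda(:,n) = dl_dx(X(:,n+1), U(:,n+1))' + ...
                dfd_dx(X(:,n+1), U(:,n+1))' * lambda(:,n+1);
end
dJdu = zeros(nu, N+1);                 % 3. extract the gradients
for n = 1:N
  dJdu(:,n) = dl_du(X(:,n), U(:,n))' + ...
              dfd_du(X(:,n), U(:,n))' * lambda(:,n);
end
dJdu(:,N+1) = dl_du(X(:,N+1), U(:,N+1))';
end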

To take advantage of this efficiency, advocates of the shooting methods often use penalty methods instead of enforcing hard state constraints; instead of telling the solver about the constraint explicitly, you simply add an additional term to the cost function which gives a large penalty commensurate with the amount by which the constraint is violated. These penalties are not quite as accurate and can be harder to tune (you'd like the cost to be high compared to other costs, but making it too high can lead to numerical conditioning issues), but they can work.
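For example, a state constraint $g(x) \le 0$ might be softened into a quadratic penalty on the running cost. In this sketch, l and g are placeholders for your running cost and constraint function:

% Soften a state constraint g(x) <= 0 into a penalty on the running cost.
mu = 1e3;                                          % penalty weight, to be tuned
l_penalized = @(x,u) l(x,u) + mu*sum(max(0, g(x)).^2);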

12.3.4 Discussion

Although the decision about which algorithm is best may depend on the situation, in our work we have come to favor the direct collocation method (and occasionally direct transcription). There are a number of arguments for and against each approach; I will try to discuss a few of them here.

Solver performance

Numerical conditioning. Tail wagging the dog.

Sparse constraints. Potential for parallel evaluation of the constraints (computing the dynamics and their derivatives is often the most expensive part).

Providing an initial guess

to avoid local minima. Direct transcription and collocation can take an initial guess in $x[\cdot]$, too.

Implicit dynamics

Another potential advantage of the direct transcription and collocation methods is that thedynamics constraints can be written in implicit form.

Variations in the problem formulation

There are a number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.:

$$J = \ell_f(x[N]) + \sum_{n=0}^{N-1} \ell(x[n], u[n]).$$

These terms are easily added to the cost function in any of the methods, and the adjoint equations of the shooting method simply require the modified terminal condition $\lambda[N-1] = \frac{\partial \ell_f(x[N])}{\partial x[N]}^T$


when computing the gradients.

Another common modification is including the spacing of the break points as additional decision variables. This is particularly easy in the direct transcription and collocation methods, and can also be worked into the shooting methods. Typically we add a lower bound on the time-steps so that they don't all vanish to zero.

Accuracy of numerical integration

One potential complaint about the direct transcription and collocation algorithms is that we tend to use simplistic numerical integration methods and often fix the integration timestep (e.g. by choosing Euler integration and selecting a $dt$); it is difficult to bound the resulting integration errors in the solution. One tantalizing possibility in the shooting methods is that the forward integration could be accomplished by more sophisticated methods, like variable-step integration. But I must say that I have not had much success with this approach, because while the numerical accuracy of any one function evaluation might be improved, these integration schemes do not necessarily give smooth outputs as you make incremental changes to the initial conditions and control (changing the input by $\epsilon$ could result in taking a different number of steps in the integration scheme). This lack of smoothness can wreak havoc on the convergence of the optimization. If numerical accuracy is at a premium, then I think you will have more success by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation with 5th order error checking method) as additional constraints on the time-steps; shooting methods do not have any particular advantage here.

12.4 Pontryagin's Minimum Principle

The tools that we've been developing for numerical trajectory optimization are closely tied to theorems from (analytical) optimal control. Let us take one section to appreciate those connections.

What precisely does it mean for a trajectory, $x(t), u(t)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $u(t)$ by $\epsilon$), then I would either incur higher cost in my objective function or violate a constraint. For an unconstrained optimization, a necessary condition for local optimality is that the gradient of the objective at the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle points, but it certainly must vanish at local minima. We can generalize this argument to constrained optimization using Lagrange multipliers.

12.4.1 Constrained optimization with Lagrange multipliers

Given the equality-constrained optimization problem

$$\min_z c(z) \quad \text{subject to} \quad \phi(z) = 0,$$

where $\phi(z)$ is a vector, define a vector $\lambda$ of Lagrange multipliers, the same size as $\phi$, and the scalar Lagrangian function,

$$L(z, \lambda) = c(z) + \lambda^T \phi(z).$$


A necessary condition for $z^*$ to be an optimal value of the constrained optimization is that the gradients of $L$ vanish with respect to both $z$ and $\lambda$:

$$\frac{\partial L}{\partial z} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0.$$

Note that $\frac{\partial L}{\partial \lambda} = \phi(z)^T$, so requiring this to be zero is equivalent to requiring the constraints to be satisfied.

Example 12.3 (Optimization on the unit circle)

Consider the following optimization:

$$\min_{x,y} x + y, \quad \text{subject to} \quad x^2 + y^2 = 1.$$

The level sets of $x + y$ are straight lines with slope $-1$, and the constraint requires that the solution lives on the unit circle.

Simply by inspection, we can determine that the optimal solution should be $x = y = -\frac{1}{\sqrt{2}}$. Let's make sure we can obtain the same result using Lagrange multipliers.

Formulating

$$L = x + y + \lambda (x^2 + y^2 - 1),$$

we can take the derivatives and solve:

$$\frac{\partial L}{\partial x} = 1 + 2\lambda x = 0 \quad \Rightarrow \quad x = -\frac{1}{2\lambda},$$
$$\frac{\partial L}{\partial y} = 1 + 2\lambda y = 0 \quad \Rightarrow \quad y = x,$$
$$\frac{\partial L}{\partial \lambda} = x^2 + y^2 - 1 = 2x^2 - 1 = 0 \quad \Rightarrow \quad x = y = \pm\frac{1}{\sqrt{2}}.$$

Given the two solutions which satisfy the necessary conditions, the negative solution is clearly the minimizer of the objective.
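As a quick numerical sanity check, the same answer falls out of fmincon; this setup mirrors the generic form from Section 12.2:

% Numerical check of Example 12.3:  min x+y  subject to  x^2 + y^2 = 1.
c    = @(z) z(1) + z(2);
circ = @(z) deal([], z(1)^2 + z(2)^2 - 1);   % no inequalities, one equality
zstar = fmincon(c, [1; 0], [],[],[],[],[],[], circ);
% zstar -> [-0.7071; -0.7071], i.e. x = y = -1/sqrt(2) as derived above.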

12.4.2 Lagrange multiplier derivation of the adjoint equations

Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time,

$$\min_{u[\cdot]} \sum_{n=0}^{N} \ell(x[n], u[n]), \quad \text{subject to} \quad x[n+1] = f_d(x[n], u[n]), \quad x[0] = x_0.$$


Formulate the Lagrangian,

$$L(x[\cdot], u[\cdot], \lambda[\cdot]) = \sum_{n=0}^{N} \ell(x[n], u[n]) + \sum_{n=0}^{N-1} \lambda[n]^T \left( f_d(x[n], u[n]) - x[n+1] \right),$$

and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above:

$$\frac{\partial L}{\partial \lambda[n]} = 0 \quad \Rightarrow \quad x[n+1] = f_d(x[n], u[n]),$$
$$\frac{\partial L}{\partial x[n]} = 0 \quad \Rightarrow \quad \lambda[n-1] = \frac{\partial \ell(x[n], u[n])}{\partial x[n]}^T + \frac{\partial f_d(x[n], u[n])}{\partial x[n]}^T \lambda[n], \quad \lambda[N-1] = \frac{\partial \ell(x[N], u[N])}{\partial x[N]}^T,$$
$$\frac{\partial L}{\partial u[n]} = \frac{\partial \ell(x[n], u[n])}{\partial u[n]} + \lambda[n]^T \frac{\partial f_d(x[n], u[n])}{\partial u[n]}.$$

Therefore, if we are given an initial condition $x_0$ and an input trajectory $u[\cdot]$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $x[\cdot]$, solving the adjoint equation backwards in time to solve for $\lambda[\cdot]$, and verifying that $\frac{\partial L}{\partial u[n]} = 0$ for all $n$. The fact that $\frac{\partial J}{\partial u} = \frac{\partial L}{\partial u}$ when $\frac{\partial L}{\partial x} = 0$ and $\frac{\partial L}{\partial \lambda} = 0$ follows from some basic results in the calculus of variations.

12.4.3 Necessary conditions for optimality in continuous time

You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $x_0$, a continuous dynamics, $\dot{x} = f(x, u)$, and the instantaneous cost $\ell(x, u)$, for a trajectory $x(t), u(t)$ defined over $t \in [t_0, t_f]$ to be optimal it must satisfy the conditions that

$$\dot{x} = f(x, u), \quad x(t_0) = x_0,$$
$$-\dot{\lambda} = \frac{\partial \ell}{\partial x}^T + \frac{\partial f}{\partial x}^T \lambda, \quad \lambda(t_f) = 0,$$
$$0 = \frac{\partial \ell}{\partial u} + \lambda^T \frac{\partial f}{\partial u}.$$

In fact the statement can be generalized even beyond this to the case where $u$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.

Theorem 12.1 - Pontryagin's Minimum Principle

Adapted from [82]. Given the initial conditions, $x_0$, a continuous dynamics,


$\dot{x} = f(x, u)$, and the instantaneous cost $\ell(x, u)$, for a trajectory $x^*(t), u^*(t)$ defined over $t \in [t_0, t_f]$ to be optimal it must satisfy the conditions that

$$\dot{x}^* = f(x^*, u^*), \quad x^*(t_0) = x_0,$$
$$-\dot{\lambda}^* = \frac{\partial \ell(x^*, u^*)}{\partial x}^T + \frac{\partial f(x^*, u^*)}{\partial x}^T \lambda^*, \quad \lambda^*(t_f) = 0,$$
$$u^* = \arg\min_{u \in U} \left[ \ell(x^*, u) + (\lambda^*)^T f(x^*, u) \right].$$

Note that the quantity which is minimized in the final line of the theorem is commonly referred to as the Hamiltonian of the optimal control problem,

$$H(x, u, \lambda, t) = \ell(x, u) + \lambda^T f(x, u).$$

It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\frac{\partial J}{\partial x}^T$, you should also recognize it from the HJB.

12.5 Trajectory Optimization as a Convex Optimization

12.5.1 Linear systems with convex linear constraints

An important special case. Linear/quadratic objectives result in an LP/QP -- a convex optimization.

12.5.2 Differential Flatness

12.5.3 Mixed-integer convex optimization for non-convex constraints

12.6 Local Trajectory Feedback Design

Once we have obtained a locally optimal trajectory from trajectory optimization, there is stillwork to do...

12.6.1 Model-predictive control

12.6.2 Time-varying LQR

Take $\bar{x}(t) = x(t) - x_0(t)$ and $\bar{u}(t) = u(t) - u_0(t)$, where $x_0(t), u_0(t)$ is the nominal trajectory, then apply finite-horizon LQR (see the LQR chapter).
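Spelled out (a standard construction, consistent with the LQR chapter): linearizing the dynamics along the nominal trajectory $(x_0(t), u_0(t))$ gives the time-varying linear system

$$\dot{\bar{x}}(t) \approx A(t)\, \bar{x}(t) + B(t)\, \bar{u}(t), \qquad A(t) = \frac{\partial f}{\partial x}\bigg|_{x_0(t), u_0(t)}, \quad B(t) = \frac{\partial f}{\partial u}\bigg|_{x_0(t), u_0(t)},$$

and finite-horizon LQR with a cost of the form $\bar{x}^T(t_f) Q_f \bar{x}(t_f) + \int_{t_0}^{t_f} \left[ \bar{x}^T Q \bar{x} + \bar{u}^T R \bar{u} \right] dt$ returns a time-varying gain $K(t)$, yielding the trajectory-stabilizing controller $u(t) = u_0(t) - K(t)\left( x(t) - x_0(t) \right)$.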

12.7 Iterative LQR and Differential Dynamic Programming

12.8 Case Study: A glider that can land on a perch like a bird
