Multilevel Optimization Methods for Engineering Design and PDE-Constrained Optimization
Stephen G. Nash
George Mason University
Joint with R. Michael Lewis
College of William & Mary


• Optimize a high-fidelity model:
  minimize $f_h(a)$ subject to <constraints>

• Also available: an easier-to-solve low-fidelity model:
  minimize $f_H(a)$ subject to <constraints>

• How can you exploit the low-fidelity model?

Some Applications

• PDE-constrained optimization
• Aeronautical design
• Nano-porous materials
• Image processing
• VLSI design

In many cases, there may be a hierarchy of lower-fidelity models

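Example: Minimal Surface

[Figure: minimal-surface solutions on four grids: $N = 3^2$ ($8\times10^4$ flops), $N = 9^2$ ($2\times10^6$ flops), $N = 18^2$ ($2\times10^7$ flops), $N = 27^2$ ($1\times10^8$ flops)]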

An Example Model Framework

• An optimization model governed by a system of differential equations:
  $\min_a F_h(a, u(a)) \quad \text{subject to} \quad S(a, u(a)) = 0$

• $S(a,u) = 0$: system of PDEs
  – Design variables: $a$
  – State variables: $u$
  – Vary the discretization (level $h$)
User-supplied Information

• Procedure to solve $S(a,u) = 0$ for $u$ given $a$

• Procedure to evaluate $F_h(a,u)$ and $\nabla_a F_h(a,u)$ for any level $h$

• Procedures to implement downdate $I_h^H$ and update $I_H^h$ operators, related by
  $I_H^h = \text{<constant>} \cdot (I_h^H)^T$
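As a concrete illustration of the required operator pair, here is a minimal sketch on a 1-D grid. It assumes (our choice, not the source's) full-weighting restriction for the downdate $I_h^H$ and linear interpolation for the update $I_H^h$. For this pair, the constant in $I_H^h = \text{<constant>}\,(I_h^H)^T$ is 2.

```python
import numpy as np


def full_weighting(n_fine):
    """Downdate I_h^H on a 1-D grid of n_fine = 2*n_coarse + 1 interior points."""
    n_coarse = (n_fine - 1) // 2
    R = np.zeros((n_coarse, n_fine))
    for j in range(n_coarse):
        R[j, 2 * j:2 * j + 3] = [0.25, 0.5, 0.25]   # coarse point j sits on fine point 2j+1
    return R


def linear_interpolation(n_fine):
    """Update I_H^h: coarse values copied to coincident fine points, averaged in between."""
    n_coarse = (n_fine - 1) // 2
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P


R = full_weighting(7)
P = linear_interpolation(7)
print(np.allclose(P, 2 * R.T))   # True: here <constant> = 2
```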


Some Simplifications (for this talk)

• Either: no constraints in the optimization models, or constraint equations solved exactly

• But the computational approaches are designed to extend to the constrained case:
  – avoid explicit use of the (reduced) Hessian
  – only need Hessian-vector products
  – do not assume sparsity or a known sparsity pattern


Model Management: Algorithmic Template

• Given some initial guess $a_k$ of the solution: set $a^{(1)} \leftarrow a_k$

(pre-smoothing: omitted in model management)

(recursion) Compute $v = \nabla f_H(a^{(1)}) - \nabla f_h(a^{(1)})$. Obtain $a^{(2)}$ by solving
  $\min_a f_s(a) \equiv f_H(a) - v^T a$
subject to bounds on $a$. Define the search direction $e = a^{(2)} - a^{(1)}$; line search: $a^{(3)} \leftarrow a^{(1)} + \alpha e$

(post-smoothing: omitted in model management)

• Set $a_{k+1} \leftarrow a^{(3)}$
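The template can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. The following choices are illustrative assumptions:

* The high-fidelity $f_h$ is a quadratic plus a quartic term.
* The low-fidelity $f_H$ keeps only the diagonal of the quadratic.
* Bounds are omitted.
* The line search is Armijo backtracking.

Because $f_H$ is quadratic, the corrected subproblem $\min f_s(a) = f_H(a) - v^T a$ is solved exactly by one linear solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A_h = M @ M.T / n + np.eye(n)        # hypothetical high-fidelity quadratic part (SPD)
b = rng.standard_normal(n)
A_H = np.diag(np.diag(A_h))           # hypothetical low-fidelity model: diagonal only


def f_h(a):
    return 0.5 * a @ A_h @ a - b @ a + 0.25 * np.sum(a**4)


def grad_f_h(a):
    return A_h @ a - b + a**3


def grad_f_H(a):
    return A_H @ a                     # f_H(a) = 0.5 a^T A_H a


def model_management(a_k, max_iter=1000, tol=1e-7):
    for _ in range(max_iter):
        a1 = a_k                                  # a(1) <- a_k (no smoothing)
        g = grad_f_h(a1)
        if np.linalg.norm(g) < tol:
            break
        v = grad_f_H(a1) - g                      # first-order correction
        a2 = np.linalg.solve(A_H, v)              # argmin f_H(a) - v^T a
        e = a2 - a1                               # search direction
        alpha, f1, slope = 1.0, f_h(a1), g @ e    # slope < 0: e is a descent direction
        while f_h(a1 + alpha * e) > f1 + 1e-4 * alpha * slope:
            alpha *= 0.5                          # Armijo backtracking on f_h
        a_k = a1 + alpha * e                      # a_{k+1} <- a(3)
    return a_k


a_star = model_management(np.zeros(n))
print(np.linalg.norm(grad_f_h(a_star)))
```

With this correction, $e = -A_H^{-1}\nabla f_h(a^{(1)})$. In other words, a crude model used this way acts as a preconditioner for the high-fidelity gradient.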

Multilevel (no coarsening): Algorithmic Template

• Given some initial guess $a_k$ of the solution: set $a^{(0)} \leftarrow a_k$

• (pre-smoothing) partially minimize $f_h$ to get $a^{(1)}$

• (recursion) Compute $v = \nabla f_H(a^{(1)}) - \nabla f_h(a^{(1)})$. Obtain $a^{(2)}$ by solving
  $\min_a f_s(a) \equiv f_H(a) - v^T a$
  subject to bounds on $a$. Define the search direction $e = a^{(2)} - a^{(1)}$; line search: $a^{(3)} \leftarrow a^{(1)} + \alpha e$

• (post-smoothing) partially minimize $f_h$, starting from $a^{(3)}$, to get $a^{(4)}$

• Set $a_{k+1} \leftarrow a^{(4)}$

Multilevel: MG/Opt Algorithmic Template

• Given some initial guess $a_k$ of the solution: set $a^{(0)} \leftarrow a_k$

• (pre-smoothing) partially minimize $f_h$ to get $a^{(1)}$

• (recursion) Compute $v = \nabla f_H(I_h^H a^{(1)}) - I_h^H \nabla f_h(a^{(1)})$. Obtain $a^{(2)}$ by solving
  $\min_a f_s(a) \equiv f_H(a) - v^T a$
  subject to bounds on $a$. Define the search direction $e = I_H^h\,(a^{(2)} - I_h^H a^{(1)})$; line search: $a^{(3)} \leftarrow a^{(1)} + \alpha e$

• (post-smoothing) partially minimize $f_h$, starting from $a^{(3)}$, to get $a^{(4)}$

• Set $a_{k+1} \leftarrow a^{(4)}$
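A two-level instance of this cycle can be sketched on a quadratic model problem. Everything specific here is an assumption chosen for illustration:

* $f_h$ and $f_H$ are 1-D Laplacian quadratics on nested grids.
* The downdate is full weighting, with $I_H^h = 2\,(I_h^H)^T$.
* "Partially minimize $f_h$" is replaced by a few damped-Jacobi sweeps, each of which decreases $f_h$ for this model.
* The coarse subproblem is solved exactly rather than recursively.
* Bounds are omitted.

```python
import numpy as np


def laplacian_1d(n):
    """3-point Dirichlet Laplacian on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2


def full_weighting(n_fine):
    """Downdate I_h^H: coarse point j sits on fine point 2j+1 (n_fine odd)."""
    n_coarse = (n_fine - 1) // 2
    R = np.zeros((n_coarse, n_fine))
    for j in range(n_coarse):
        R[j, 2 * j:2 * j + 3] = [0.25, 0.5, 0.25]
    return R


n_h = 63
n_H = (n_h - 1) // 2
A_h, A_H = laplacian_1d(n_h), laplacian_1d(n_H)   # f_h, f_H: rediscretized quadratics
R = full_weighting(n_h)                           # I_h^H
P = 2.0 * R.T                                     # I_H^h = 2 (I_h^H)^T
b = np.ones(n_h)


def grad_f_h(a):
    return A_h @ a - b          # f_h(a) = 0.5 a^T A_h a - b^T a


def grad_f_H(a):
    return A_H @ a              # f_H(a) = 0.5 a^T A_H a


def smooth(a, sweeps, omega=2.0 / 3.0):
    """Damped Jacobi; for this model each sweep decreases f_h (stand-in for partial minimization)."""
    d = np.diag(A_h)
    for _ in range(sweeps):
        a = a - omega * grad_f_h(a) / d
    return a


def mg_opt_cycle(a0, nu=2):
    a1 = smooth(a0, nu)                              # pre-smoothing
    a1_H = R @ a1
    g1 = grad_f_h(a1)
    v = grad_f_H(a1_H) - R @ g1                      # v = grad f_H(I a1) - I grad f_h(a1)
    a2 = np.linalg.solve(A_H, v)                     # argmin f_H(a) - v^T a (exact)
    e = P @ (a2 - a1_H)                              # e = I_H^h (a2 - I_h^H a1)
    curv = e @ A_h @ e
    alpha = -(g1 @ e) / curv if curv > 0 else 0.0    # exact line search on quadratic f_h
    a3 = a1 + alpha * e
    return smooth(a3, nu)                            # post-smoothing gives a4


a = np.zeros(n_h)
r0 = np.linalg.norm(grad_f_h(a))
for _ in range(20):
    a = mg_opt_cycle(a)
rel_grad = np.linalg.norm(grad_f_h(a)) / r0
print(rel_grad)
```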

The Reduced Hessian

• Properties of the reduced Hessian govern the behavior of MG/Opt

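• Its properties need not match those of the PDE $S(a,u) = 0$: e.g., a hyperbolic PDE can have an elliptic reduced Hessian

• If $L$ = Lagrangian and $S_a$, $S_u$ = Jacobians of $S$ with respect to $a$ and $u$, then (variables ordered $(a, u)$)
  $\nabla^2 f(a) = Z^T \nabla^2 L\, Z, \qquad Z = \begin{pmatrix} I \\ -S_u^{-1} S_a \end{pmatrix}$

• We don’t know its properties or sparsity pattern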
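Because only Hessian-vector products are needed, $\nabla^2 f\,p$ can be formed from one linearized state solve and one adjoint solve. This is a minimal sketch under the following assumptions:

* A hypothetical linear state equation $S(a,u) = A u - B a = 0$.
* An objective $F(a,u) = \tfrac12\|u - u_*\|^2 + \tfrac{\beta}{2}\|a\|^2$.

Under these assumptions, $S_u = A$ and $S_a = -B$. The Hessian blocks of the Lagrangian are $\nabla^2_{uu}L = I$, $\nabla^2_{aa}L = \beta I$ and $\nabla^2_{au}L = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_a, beta = 30, 10, 1e-2
A = 3.0 * np.eye(n_u) + 0.1 * rng.standard_normal((n_u, n_u))  # S_u (invertible)
B = rng.standard_normal((n_u, n_a))                            # S_a = -B


def reduced_hessian_vec(p):
    """Return Z^T (Hessian of Lagrangian) Z p without forming any Hessian."""
    # Z p = [p; -S_u^{-1} S_a p] = [p; A^{-1} B p]   (linearized state solve)
    w = np.linalg.solve(A, B @ p)
    # Hessian of the Lagrangian times [p; w]: L_aa = beta*I, L_uu = I, L_au = 0
    top, bottom = beta * p, w
    # Z^T [top; bottom] = top - S_a^T S_u^{-T} bottom = top + B^T A^{-T} bottom  (adjoint solve)
    return top + B.T @ np.linalg.solve(A.T, bottom)


# Check against the explicitly formed reduced Hessian (feasible only for small problems)
A_inv = np.linalg.inv(A)
G_explicit = beta * np.eye(n_a) + B.T @ A_inv.T @ A_inv @ B
G_matvec = np.column_stack([reduced_hessian_vec(col) for col in np.eye(n_a)])
print(np.allclose(G_matvec, G_explicit))  # True
```

Each product costs one solve with $S_u$ and one with $S_u^T$. That is the kind of access a truncated-Newton method needs.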






Some of the Justifications

• Richer class of models
• Guarantees of convergence
• Better operator properties than for PDEs alone
• Good performance (even far from the solution)
• Connection to other optimization methods


Justification #1:

Richer Class of Models

Optimization Models are More Flexible

• Applies to a large variety of optimization models and constraints
  – Not just for solving PDEs

• Can add additional constraints:
  – Bounds
  – Inequalities

• True generalization of multigrid

Justification #2:

Guarantees of Convergence
Analogy: Nonlinear equations vs. Optimization

• If we solve the optimality conditions $\nabla f(x) = 0$: $\lim_{k\to\infty} \nabla f(x_k) = 0$

• If we minimize $f(x)$: $\lim_{k\to\infty} \nabla f(x_k) = 0$

• If the underlying optimization algorithm is guaranteed to converge (to a stationary point) without the multilevel strategy

• Then MG/Opt is guaranteed to converge (to a stationary point)

Justification #3:

Ellipticity & Convexity

When will MG/Opt work well?

• convex ≈ elliptic ≈ positive definite ≈ “nice”

• The reduced Hessian will be positive semidefinite at a local solution (positive definite if the second-order sufficient conditions hold)

• Multigrid works well for elliptic PDEs

• Optimization methods work well on convex problems

A Sample Model Problem

• Match a target function $u_*$:
  $\min_a f(a, u(a)) = \|u - u_*\|^2 + \|u_x - u_{*,x}\|^2$

• Where $u(a)$ solves the 1-way wave equation:
  $u_t = c\,u_x, \qquad u(x,0) = a(x)$

• With $0 \le x \le 1$, $0 \le t \le 1$

• Computations use $c = \text{constant} = 1$

Model Problem: Wave Eqn.

• Hyperbolic equation

• Initial value moves without dissipation or dispersion

• Multigrid methods (applied to the constraint alone) are not ideal: the usual approach is to march forward in time

[Figure: initial solution (left) and solution at t = 1 (right)]

Model Problem: Analysis of Continuous Problem

• The reduced Hessian is

• This is like the 1-dimensional Laplacian
• Ideal for multigrid
• Likely to cause difficulties for general-purpose large-scale optimization methods
• Analogous results hold for the discretized model
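The difficulty for general-purpose methods can be quantified. This is a minimal sketch, assuming the standard 3-point discretization of $-d^2/dx^2$ on $(0,1)$ with zero boundary values. Its condition number is exactly $\cot^2(\pi h/2) \approx 4/(\pi h)^2$, so it roughly quadruples each time the grid is refined. Unpreconditioned CG iteration counts, which scale like $\sqrt{\kappa}$, therefore tend to grow in proportion to $n$.

```python
import numpy as np


def laplacian_1d(n):
    """3-point Dirichlet Laplacian on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2


def condition_number(n):
    lam = np.linalg.eigvalsh(laplacian_1d(n))  # ascending eigenvalues
    return lam[-1] / lam[0]


for n in (31, 63, 127, 255):
    h = 1.0 / (n + 1)
    # exact value: cot^2(pi h / 2), approximately 4 / (pi h)^2
    print(n, round(condition_number(n)), round(1 / np.tan(np.pi * h / 2) ** 2))
```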






Justification #4:

Computational Performance

Model Problem: Computations

Model Problem: Computations (cont.)

• # of design variables: n = 1025 [1,051,650 total variables: n(n+1)]

                          n=1025   n=513   n=257   n=129   n=65   n=33
Optimization         it       99
                     ls      100
                     cg      967
Successive           it       23      25      25      25     25     19
refinement           ls      100      26      26      26     26     20
                     cg      956     216     225     214    220    145
MG/Opt               it       10      12      14      16     18    232
                     ls       20      24      28      32     36    242
                     cg       57      65      79      89    112   1974

Choice of Comparative Algorithms

• Why compare MG/Opt only with traditional optimization algorithms (and not with MG for systems of equations)?
  – Inequality constraints may be present
  – Optimality conditions are not elliptic in the constrained case
  – Hard to derive the reduced Hessian/system (thus hard to identify a good preconditioner)
  – No obvious relationship between the original optimization model and the reduced system

Justification #5:

Relation to other Optimization Methods

MG/Opt & Steepest Descent

• Coarse-level problem is a first-order approximation to the fine-level problem:
  gradient of coarse-level problem at $a_H$ = $I_h^H$ [gradient of fine-level problem at $a_h$]

• Analogous to the first-order approximation used to derive the steepest-descent method
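The first-order property follows in one line from the MG/Opt definitions $v = \nabla f_H(I_h^H a_h) - I_h^H \nabla f_h(a_h)$ and $f_s(a) = f_H(a) - v^T a$. In the worked derivation below, $a_H = I_h^H a_h$:

```latex
\nabla f_s(a_H) = \nabla f_H(a_H) - v
  = \nabla f_H(a_H) - \bigl[\nabla f_H(a_H) - I_h^H \nabla f_h(a_h)\bigr]
  = I_h^H \nabla f_h(a_h)
```

The correction $v$ is exactly what makes the coarse model reproduce the downdated fine gradient. In the same way, the linear model behind steepest descent reproduces $\nabla f$ at the current point.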

MG/Opt & Newton’s Method

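• Multilevel line search: let $\phi(\alpha) = f_h(a_h + \alpha e_h)$
  – Well-scaled search direction: $\phi'(1) \approx 0$
  – Search direction of the form $e_h = I_H^h e_H$

• If subproblems are solved accurately, then (with $I_H^h = c\,(I_h^H)^T$):
  $\phi'(1) = c\, e_H^T \bigl[ I_h^H \nabla^2 f_h(a_h)\, I_H^h - \nabla^2 f_H(a_H) \bigr] e_H + O(\|e_H\|^3)$

• Search direction is "Newton-like"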
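The expression for $\phi'(1)$ can be checked numerically. This is a minimal sketch under three assumptions:

* The models are quadratic, so the $O(\|e_H\|^3)$ term vanishes.
* The coarse subproblem is solved exactly.
* The downdate matrix is random and hypothetical.

With the Galerkin choice $\nabla^2 f_H = I_h^H \nabla^2 f_h I_H^h$, the multilevel step gives $\phi'(1) = 0$. The unit step then needs no rescaling, as with a Newton step. Any other coarse Hessian leaves $\phi'(1) = c\,e_H^T[I_h^H \nabla^2 f_h I_H^h - \nabla^2 f_H]e_H$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_h, n_H, c = 15, 7, 2.0
M = rng.standard_normal((n_h, n_h))
G_h = M @ M.T + n_h * np.eye(n_h)      # fine Hessian (SPD), hypothetical
g_h = rng.standard_normal(n_h)         # fine gradient at a_h
R = rng.standard_normal((n_H, n_h))    # hypothetical downdate I_h^H
P = c * R.T                            # update I_H^h = c (I_h^H)^T


def phi_prime_at_1(G_H):
    """phi'(1) for phi(alpha) = f_h(a_h + alpha * e_h), quadratic f_h and f_H."""
    # Coarse subproblem solved exactly: grad f_H(a_H + e_H) = v  =>  G_H e_H = -I_h^H g_h
    e_H = -np.linalg.solve(G_H, R @ g_h)
    e_h = P @ e_H
    return (g_h + G_h @ e_h) @ e_h, e_H   # grad f_h(a_h + e_h)^T e_h


G_galerkin = R @ G_h @ P                     # consistent coarse Hessian
phi_gal, _ = phi_prime_at_1(G_galerkin)
G_other = G_galerkin + 5.0 * np.eye(n_H)     # an inconsistent coarse Hessian
phi_other, e_H = phi_prime_at_1(G_other)
predicted = c * e_H @ (G_galerkin - G_other) @ e_H
print(phi_gal, phi_other, predicted)
```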

• Apply an algorithm (e.g., model management)
  – Suppose that it does not work well

• Why not? Examine the results of diagnostic tests
  – Performed as part of the optimization algorithm
  – Diagnostic tests have low overhead
  – Analogous to condition-number estimators

• Now what options do you have?
  – Manual versus automatic

Critical Condition

• Multilevel: $\nabla f_s(I_h^H a^{(1)}) = I_h^H \nabla f_h(a^{(1)})$

• Can be guaranteed automatically through additive (as here) or multiplicative corrections

• Convergence is then guaranteed regardless of the quality of the approximate models, provided the underlying optimization algorithm is itself guaranteed to converge


Sufficient to Consider Four Properties

• Nonlinearity
• Model Consistency
• Level Complementarity
• Separability across Levels

Some assessment tests assume the use of a truncated-Newton (TN) method based on the conjugate-gradient (CG) method

Some tests assume coarsening: $a_h \to a_H$

Diagnostic Test #1: Nonlinearity

Is the optimization search direction well-scaled?

TN Search Directions

• Let $v(\alpha) = f(a + \alpha p)$

• Line search: approximates $\min_\alpha v(\alpha)$

• For the search directions from TN: $v'(1) = O(\|p\|^3)$

• Test: Is $v'(1) \approx 0$?
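The test needs only one extra gradient, at $a + p$. This is a minimal sketch with these assumptions:

* A hand-written truncated CG serves as the TN inner solver.
* Two illustrative objectives are used: a quadratic, and the separable function $\sum_i (e^{a_i} - a_i)$.
* The ratio $|v'(1)|/|v'(0)|$ is our own normalization, not a threshold from the source.

For the quadratic, CG orthogonality makes $v'(1)$ vanish up to rounding. For the nonlinear function, $v'(1)$ is small near the minimizer and not small far from it.

```python
import numpy as np


def truncated_cg(hess_vec, g, max_iter):
    """A few CG iterations on H p = -g (the TN inner loop)."""
    p = np.zeros_like(g)
    r = -g.copy()                # residual -g - H p at p = 0
    d = r.copy()
    for _ in range(max_iter):
        Hd = hess_vec(d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        if np.linalg.norm(r_new) < 1e-12:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p


def nonlinearity_ratio(grad, hess_vec, a, cg_iters=3):
    """|v'(1)| / |v'(0)| for v(alpha) = f(a + alpha p) and a TN direction p."""
    g = grad(a)
    p = truncated_cg(lambda x: hess_vec(a, x), g, cg_iters)
    return abs(grad(a + p) @ p) / abs(g @ p)


# Quadratic f(a) = 0.5 a^T Q a - b^T a: v'(1) = 0 up to rounding, by CG orthogonality
rng = np.random.default_rng(4)
M = rng.standard_normal((10, 10))
Q, b = M @ M.T + 10 * np.eye(10), rng.standard_normal(10)
quad = nonlinearity_ratio(lambda a: Q @ a - b, lambda a, x: Q @ x, np.zeros(10))


# Nonlinear f(a) = sum(exp(a) - a), minimized at a = 0
def grad_exp(a):
    return np.exp(a) - 1.0


def hv_exp(a, x):
    return np.exp(a) * x


near = nonlinearity_ratio(grad_exp, hv_exp, 0.01 * np.ones(5))
far = nonlinearity_ratio(grad_exp, hv_exp, 1.0 * np.ones(5))
print(quad, near, far)
```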

Diagnostic Test #2: Model Consistency

Compare predicted and actual reductions in the multilevel line search

Predicted & Actual Reduction

• Predicted reduction: reduction in coarse-level objective (via standard …)

• Actual reduction: reduction in fine-level objective (via multilevel line search)

• Difference between (scaled) actual & predicted reductions:
  $\tfrac12\, e_H^T \bigl[\nabla^2 f_H(a_H) - I_h^H \nabla^2 f_h(a_h)\, I_H^h\bigr] e_H + O(\|e_H\|^3)$
  – first term: consistency of the problems
  – $O(\|e_H\|^3)$ term: nonlinearity
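For quadratic models the nonlinearity term disappears. The scaled actual reduction minus the predicted reduction should then equal $\tfrac12 e_H^T[\nabla^2 f_H - I_h^H \nabla^2 f_h I_H^h]e_H$ exactly. This is a minimal sketch of that check, assuming:

* A random (hypothetical) fine Hessian and downdate.
* An exactly solved coarse subproblem.
* Scaling of the actual reduction by $1/c$, where $I_H^h = c\,(I_h^H)^T$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_h, n_H, c = 12, 5, 2.0
M = rng.standard_normal((n_h, n_h))
G_h = M @ M.T + n_h * np.eye(n_h)      # hypothetical fine Hessian
b_h = rng.standard_normal(n_h)
R = rng.standard_normal((n_H, n_h))    # hypothetical downdate I_h^H
P = c * R.T                            # update I_H^h = c (I_h^H)^T


def f_h(a):
    return 0.5 * a @ G_h @ a - b_h @ a


def reductions(G_H, a_h):
    """Predicted (coarse) and scaled actual (fine) reductions for one MG/Opt step."""
    a_H = R @ a_h
    g_h = G_h @ a_h - b_h
    v = G_H @ a_H - R @ g_h            # f_H(a) = 0.5 a^T G_H a

    def f_s(a):
        return 0.5 * a @ G_H @ a - v @ a

    a2 = np.linalg.solve(G_H, v)       # exact coarse minimizer
    e_H = a2 - a_H
    predicted = f_s(a_H) - f_s(a2)
    actual = (f_h(a_h) - f_h(a_h + P @ e_H)) / c
    return predicted, actual, e_H


a_h = rng.standard_normal(n_h)
G_gal = R @ G_h @ P                               # consistent (Galerkin) coarse Hessian
pred_g, act_g, _ = reductions(G_gal, a_h)
G_bad = G_gal + 3.0 * np.eye(n_H)                 # inconsistent coarse Hessian
pred_b, act_b, e_b = reductions(G_bad, a_h)
print(pred_g, act_g)                              # equal up to rounding
print(act_b - pred_b, 0.5 * e_b @ (G_bad - G_gal) @ e_b)
```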

Diagnostic Test #3: Level Complementarity

Does the coarse level correspond to the near null space of the fine-level Hessian?

Algebraic Smoothness

• Optimizer: TN based on conjugate-gradient
  – CG reduces error corresponding to large eigenvalues on the fine level
  – Complementary components correspond to small eigenvalues ("near null space")

• Does the coarse level correspond to the near null space of the reduced Hessian?
  – Extend ideas from adaptive algebraic multigrid for linear problems …

Near Null-Space

• The error in the design variables should lie in the near null-space of the reduced Hessian

• Generalized Rayleigh quotient should be small:
  $\dfrac{e_h^T G_h e_h}{\|G_h\|\,\|e_h\|^2}$
  – $G_h$: reduced Hessian (not known)
  – $e_h$: error in design variables (not known)

Practical Test

• We must estimate:
  – Norm of reduced Hessian (estimate via CG method)
  – Error in the design variables (use the multilevel search direction)

• Test: Is $R(e_h) = \dfrac{e_h^T G_h e_h}{\|G_h\|\,\|e_h\|^2}$ small?
  – $e_h$: multilevel search direction
  – $\|G_h\|$: norm estimate from CG method
  – $G_h e_h$: matrix-vector product (as in TN)
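This is a minimal sketch of the practical test, assuming an SPD reduced Hessian that is available only through Hessian-vector products. The source estimates the norm from the CG method; here a few power iterations stand in for that estimate. The 1-D Laplacian and the two trial vectors are illustrative. A smooth error (in the near null space) gives a small quotient, and an oscillatory one gives a quotient near 1.

```python
import numpy as np


def norm_estimate(hess_vec, n, iters=50, seed=0):
    """Estimate ||G|| for SPD G by power iteration (stand-in for a CG-based estimate)."""
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(iters):
        y = hess_vec(x)
        lam = np.linalg.norm(y)
        x = y / lam
    return lam


def complementarity_quotient(hess_vec, e_h, norm_G):
    """R(e_h) = e_h^T G e_h / (||G|| ||e_h||^2)."""
    return (e_h @ hess_vec(e_h)) / (norm_G * (e_h @ e_h))


n = 63
h = 1.0 / (n + 1)
G = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2   # model reduced Hessian


def hv(x):
    return G @ x


x = np.arange(1, n + 1) * h
norm_G = norm_estimate(hv, n)
smooth = complementarity_quotient(hv, np.sin(np.pi * x), norm_G)      # lowest mode
rough = complementarity_quotient(hv, np.sin(n * np.pi * x), norm_G)   # highest mode
print(smooth, rough)
```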

Diagnostic Test #4: Separability across Levels

Compare corresponding fine-level and coarse-level Hessian-vector products

• Can the fine-level and coarse-level components of the solution be computed separately?

• How much do they interact?

• Is the reduced Hessian (nearly) block diagonal in terms of fine-level and coarse-level components?


Is it Possible to Test for Separability?

• How do you test for separability of the reduced Hessian when:
  – You don't compute the Hessian
  – You can't construct/analyze the Hessian
  – You only have function & gradient values and update & downdate operators

• Our test is based on Hessian-vector products: already estimated by TN

Rough Idea

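• Write reduced Hessian in block form, based on low (1) and high (2) frequencies:
  $G_h = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$

• Use "perfect" update/downdate operators:
  $I_H^h p = \begin{pmatrix} p \\ 0 \end{pmatrix}, \qquad I_h^H \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = p_1, \qquad G_H = G_{11}$

• Compare coarse/fine Hessian-vector products:
  $G_h (I_H^h p) - I_H^h (G_H p) = \begin{pmatrix} 0 \\ G_{21} p \end{pmatrix} = 0$ if separable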
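With the "perfect" operators of the block form, the comparison reduces to the size of the coupling block $G_{21}$. This is a minimal sketch under two assumptions. First, an explicit SPD matrix is given in a low/high-frequency basis, whereas in practice only Hessian-vector products would be available. Second, the relative-norm normalization is our own choice.

```python
import numpy as np


def separability_defect(hess_vec_h, hess_vec_H, P, p):
    """||G_h (I_H^h p) - I_H^h (G_H p)|| / ||G_h (I_H^h p)||; zero when separable."""
    fine = hess_vec_h(P @ p)
    return np.linalg.norm(fine - P @ hess_vec_H(p)) / np.linalg.norm(fine)


rng = np.random.default_rng(5)
n_lo, n_hi = 4, 6
n = n_lo + n_hi
P = np.vstack([np.eye(n_lo), np.zeros((n_hi, n_lo))])   # "perfect" update: p -> (p, 0)
R = P.T                                                  # "perfect" downdate: (p1, p2) -> p1
p = rng.standard_normal(n_lo)

defects = {}
for coupling in (0.0, 1.0):
    M = rng.standard_normal((n, n))
    G_h = M @ M.T + n * np.eye(n)
    G_h[n_lo:, :n_lo] *= coupling      # scale G_21 ...
    G_h[:n_lo, n_lo:] *= coupling      # ... and G_12, keeping G_h symmetric
    G_H = R @ G_h @ P                  # coarse Hessian = G_11
    defects[coupling] = separability_defect(
        lambda x: G_h @ x, lambda y: G_H @ y, P, p)
print(defects)   # ~0 for the block-diagonal case, positive otherwise
```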

Sufficiency of the Diagnostic Tests

Perturbation Analysis

• Apply MG/Opt to
  $\min_a F(a) = \tfrac12 a^T G_h a - b_h^T a + \text{higher-order terms}$

• Assume user-supplied procedures are correct

• Assume nonlinearity test satisfied

• Then MG/Opt solves a perturbed problem
  $\min_a \tfrac12 a^T (G_h + \delta G)\, a - (b_h + \delta b)^T a$

• How large are the perturbations?



Perturbation Analysis (cont.)


• $\delta G$ is a sum of terms, each of which is small when one diagnostic test is satisfied ($\delta b$ contains similar terms):
  – small: separability
  – small: model consistency
  – small: level complementarity
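One way to connect the sizes of $\delta G$ and $\delta b$ to the accuracy of the computed design is the standard perturbation bound for linear systems. This is a textbook result, stated here for the quadratic model without its higher-order terms. Let $a_* = G_h^{-1} b_h$, $\hat a = (G_h + \delta G)^{-1}(b_h + \delta b)$ and $\kappa = \|G_h\|\,\|G_h^{-1}\|$, and assume $\kappa\,\|\delta G\|/\|G_h\| < 1$. Then:

```latex
\frac{\|\hat a - a_*\|}{\|a_*\|}
  \le \frac{\kappa}{1 - \kappa\,\|\delta G\| / \|G_h\|}
      \left( \frac{\|\delta G\|}{\|G_h\|} + \frac{\|\delta b\|}{\|b_h\|} \right)
```

Roughly, small values in all three tests keep the computed design close to the fine-level solution. The factor $\kappa$ limits how close when $G_h$ is ill-conditioned.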

What if the diagnostic tests are not satisfied?

• Further analysis based on problem-specific techniques

• Nonlinearity
  – Is it worthwhile to use a sophisticated optimization method far from the solution?

• Model Consistency
  – Over-coarsening?
  – Programming errors?

• Level Complementarity
  – Add or improve preconditioner

• Separability
  – Use a different optimization method?
  – Delay using multilevel until closer to the solution of the optimization problem?

Precision of the Diagnostic Tests

Computational Tests

• Tests based on specified choices for the reduced Hessian

• Test problems chosen to isolate a particular property and measure the sensitivity of the diagnostic tests
  – Multilevel already known to work well

• Ideal case: reduced Hessian is a discretized Laplacian

• Assume nonlinearity test satisfied: use quadratic optimization problems
  – Nonlinearity test has been studied elsewhere


Level Complementarity

• Laplacian versus Laplacian with permuted eigenvalues

• Satisfies separability and model consistency

   n_H     n_h    Laplacian   Permuted   Ratio
     7      15      0.03        0.34      11.7
    15      31      0.04        0.21       4.8
    31      63      0.04        0.57      13.6
    63     127      0.07        0.30       4.5
   127     255      0.13        0.25       1.9
   255     511      0.20        0.34       1.7
   511    1023      0.26        0.77       3.0

Separability

• Diagonalize Laplacian: $G_h = V D_h V^T$

• Test problems: $G_h = V (D_h + \epsilon R) V^T$

• $R$ is random, with norm 1

• Satisfies model consistency and, for small values of $\epsilon$, level complementarity
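Test problems of this kind are easy to approximate. This is a minimal sketch under these assumptions:

* $G_h$ is the 1-D Dirichlet Laplacian.
* $V$ holds its orthonormal sine eigenvectors.
* $R$ is a hypothetical symmetric random matrix scaled to spectral norm 1.
* The seed and sizes are arbitrary.

Setting $\epsilon = 0$ recovers the Laplacian, and $\epsilon > 0$ couples low- and high-frequency modes.

```python
import numpy as np


def laplacian_eig(n):
    """Eigenpairs of the 3-point Dirichlet Laplacian on n interior points of (0, 1)."""
    h = 1.0 / (n + 1)
    k = np.arange(1, n + 1)
    d = (4.0 / h**2) * np.sin(k * np.pi * h / 2) ** 2      # ascending: low frequencies first
    x = np.arange(1, n + 1) * h
    V = np.sqrt(2 * h) * np.sin(np.pi * np.outer(x, k))    # orthonormal sine modes (columns)
    return V, d


def separability_test_problem(n, eps, seed=0):
    """Hypothetical construction G = V (D + eps * R) V^T with symmetric R, ||R||_2 = 1."""
    V, d = laplacian_eig(n)
    A = np.random.default_rng(seed).standard_normal((n, n))
    R = (A + A.T) / 2
    R /= np.linalg.norm(R, 2)
    return V @ (np.diag(d) + eps * R) @ V.T, V


n, n_lo = 15, 7
h = 1.0 / (n + 1)
lap = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
G0, V = separability_test_problem(n, 0.0)
G1, _ = separability_test_problem(n, 1.0)
coupling = np.linalg.norm((V.T @ G1 @ V)[n_lo:, :n_lo])   # high/low block in the eigenbasis
print(np.allclose(G0, lap), coupling)
```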



Separability (cont.)

Model Consistency

• Test problems derived from discretized Laplacian

• Q is orthogonal

• R is random, with norm 1

• Satisfies level complementarity and separability



Model Consistency (cont.)

• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods

Related Research

• Vast literature on multigrid methods for PDEs

• Optimization-based multigrid methods
  – Based on the full approximation scheme (Brandt, 1977) applied to the optimality conditions for the optimization model
  – Lewis & Nash (2005), SIAM J. Sci. Comp., v. 26, pp. 1811-1837

• Model management
  – Alexandrov & Lewis (2001), Optimization and Engineering, v. 2, pp. 413-430

Related Research (cont.)

• Diagnostic tests and related ideas for optimization-based multilevel methods
  – Nash & Lewis (2008), www.math.wm.edu/~buckaroo/pubs/LeNa08a.pdf

• Adaptive algebraic multigrid
  – Brandt (1977), Math. Comp., v. 31, pp. 333-390
  – Brannick & Zikatanov (2006), Tech. Report, Penn. State University
  – Brezina, et al. (2006), SIAM J. Sci. Comp., v. 27, pp. 1261-1286

• Stopping rules for inexact Newton methods
  – Eisenstat & Walker (1996), SIAM J. Sci. Comp., v. 17, pp. 16-32


