Multilevel Optimization Methods for Engineering Design and PDE-Constrained Optimization
Stephen G. Nash, George Mason University
Joint with R. Michael Lewis, College of William & Mary
[email protected]
Outline
• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods
Setting
• Optimize a high-fidelity model:
    minimize fh(a) subject to <constraints>
• Also available: an easier-to-solve low-fidelity model:
    minimize fH(a) subject to <constraints>
• How can you exploit the low-fidelity model?
Some Applications
• PDE-constrained optimization
• Aeronautical design
• Nano-porous materials
• Image processing
• VLSI design
In many cases, there may be a hierarchy of lower-fidelity models
Example: Minimal Surface
[Figure: minimal surface computed on successively finer grids; N=3², 8×10⁴ flops; N=9², 2×10⁶ flops; N=18², 2×10⁷ flops; N=27², 1×10⁸ flops]
An Example: Model Framework
• An optimization model governed by a system of differential equations
• S(a, u) = 0: system of PDEs
    Design variables: a
    State variables: u
    Vary the discretization

    minimize f(a) = F(a, u(a))
    subject to S(a, u(a)) = 0
User-supplied Information
• Procedure to solve S(a, u) = 0 for u given a
• Procedure to evaluate Fh(a, u) and ∇a Fh(a, u) for any level h
• Procedures to implement the downdate I_h^H and update I_H^h operators:
    I_H^h = <constant> × (I_h^H)ᵀ
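As an illustration, the update and downdate operators might look like this for 1-D grids with nH interior coarse points and nh = 2nH + 1 interior fine points. The linear-interpolation update, the constant c = 1/2, and the zero-Dirichlet boundary convention are assumptions of this sketch, not necessarily the talk's actual operators.

```python
import numpy as np

def update(aH):
    """Update (prolongation) I_H^h: linear interpolation from a coarse
    1-D grid with nH interior points to a fine grid with 2*nH + 1 points."""
    nH = len(aH)
    ah = np.zeros(2 * nH + 1)
    ah[1::2] = aH                              # coarse node j sits at fine node 2j+1
    ah[2:-1:2] = 0.5 * (aH[:-1] + aH[1:])      # interior even nodes: average neighbors
    ah[0], ah[-1] = 0.5 * aH[0], 0.5 * aH[-1]  # zero Dirichlet boundary assumed
    return ah

def downdate(ah):
    """Downdate (restriction) I_h^H = (1/2) * (I_H^h)^T,
    i.e. full weighting with stencil [1/4, 1/2, 1/4]."""
    return 0.25 * ah[0:-2:2] + 0.5 * ah[1::2] + 0.25 * ah[2::2]
```

Building both operators as explicit matrices confirms the scaled-transpose relation I_h^H = ½ (I_H^h)ᵀ numerically.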
Outline
• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods
Some Simplifications (for this talk)
• Either:
    No constraints in optimization models, or
    Constraint equations solved exactly
• But computational approaches are designed to extend to the constrained case:
    avoid explicit use of the (reduced) Hessian
    only need Hessian-vector products
    do not assume sparsity or a known sparsity pattern
Model Management: Algorithmic Template
• Given some initial guess ak of the solution: set a(0) ← ak
• (pre-smoothing) partially minimize fh to get a(1)
• (recursion) Compute
    v = ∇fH(a(1)) − ∇fh(a(1))
  Obtain a(2) by solving
    minimize fs(a) = fH(a) − vᵀa
  subject to bounds on a. Define search direction e = a(2) − a(1); line search: a(3) ← a(1) + αe
• (post-smoothing) partially minimize fh to get a(4)
• Set ak+1 ← a(4)
Multilevel (no coarsening): Algorithmic Template
• Given some initial guess ak of the solution: set a(0) ← ak
• (pre-smoothing) partially minimize fh to get a(1)
• (recursion) Compute
    v = ∇fH(a(1)) − ∇fh(a(1))
  Obtain a(2) by solving
    minimize fs(a) = fH(a) − vᵀa
  subject to bounds on a. Define search direction e = a(2) − a(1); line search: a(3) ← a(1) + αe
• (post-smoothing) partially minimize fh to get a(4)
• Set ak+1 ← a(4)
Multilevel: MG/Opt Algorithmic Template
• Given some initial guess ak of the solution: set a(0) ← ak
• (pre-smoothing) partially minimize fh to get a(1)
• (recursion) Compute
    v = ∇fH(I_h^H a(1)) − I_h^H ∇fh(a(1))
  Obtain a(2) by solving
    minimize fs(a) = fH(a) − vᵀa
  subject to bounds on a. Define search direction e = I_H^h (a(2) − I_h^H a(1)); line search: a(3) ← a(1) + αe
• (post-smoothing) partially minimize fh to get a(4)
• Set ak+1 ← a(4)
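To make the template concrete, here is a minimal two-level MG/Opt cycle on an unconstrained quadratic. The fixed-step gradient smoother, the exact coarse solve, the Galerkin coarse model, and the grid sizes are all assumptions of this sketch; in MG/Opt proper, the coarse subproblem is itself minimized recursively, and any convergent optimizer (e.g. truncated Newton) can serve as the smoother.

```python
import numpy as np

def smooth(grad, a, iters, lr=0.4):
    # "partial minimization": a few fixed-step gradient-descent steps
    for _ in range(iters):
        a = a - lr * grad(a)
    return a

def mgopt_cycle(ah, fh, gh, gH, AH, bH, R, P, nu=2):
    """One two-level MG/Opt cycle (sketch).  R is the downdate I_h^H,
    P is the update I_H^h, both given as matrices."""
    a1 = smooth(gh, ah, nu)              # pre-smoothing on the fine level
    aH0 = R @ a1
    v = gH(aH0) - R @ gh(a1)             # shift making f_s first-order consistent
    # coarse subproblem min f_H(a) - v^T a, solved exactly here
    a2 = np.linalg.solve(AH, bH + v)
    e = P @ (a2 - aH0)                   # prolonged coarse correction
    t = 1.0                              # backtracking line search on f_h
    while fh(a1 + t * e) > fh(a1) and t > 1e-8:
        t *= 0.5
    return smooth(gh, a1 + t * e, nu)    # post-smoothing

# demo: linear-interpolation update and its scaled transpose as downdate
nH, nh = 7, 15
P = np.zeros((nh, nH))
for j in range(nH):
    P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
R = 0.5 * P.T

A = 2 * np.eye(nh) - np.eye(nh, k=1) - np.eye(nh, k=-1)  # 1-D Laplacian
b = np.ones(nh)
AH, bH = R @ A @ P, R @ b                                # Galerkin coarse model
fh = lambda a: 0.5 * a @ A @ a - b @ a
gh = lambda a: A @ a - b
gH = lambda a: AH @ a - bH

a = np.zeros(nh)
for _ in range(5):
    a = mgopt_cycle(a, fh, gh, gH, AH, bH, R, P)
```

For a quadratic fine objective this cycle reduces to classical two-grid iteration, so a few cycles drive the fine-level gradient down sharply.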
The Reduced Hessian
• Properties of the reduced Hessian govern the behavior of MG/Opt
• Not the same as the PDE S(a, u): e.g., a hyperbolic PDE can have an elliptic reduced Hessian
• If L = Lagrangian and Sa, Su = Jacobians:
    ∇²f = L_aa − S_a^* S_u^{−*} L_ua − L_au S_u^{−1} S_a + S_a^* S_u^{−*} L_uu S_u^{−1} S_a
• We don’t know its properties or sparsity pattern
Outline
• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods
Some of the Justifications
• Richer class of models
• Guarantees of convergence
• Better operator properties than for PDEs alone
• Good performance (even far from the solution)
• Connection to other optimization methods
Optimization Models are More Flexible
• Applies to a large variety of optimization models and constraints (not just for solving PDEs)
• Can add additional constraints: bounds, inequalities
• A true generalization of multigrid
Analogy: Nonlinear equations vs. Optimization
• If we solve the optimality conditions ∇f(x) = 0: lim_k ∇f(x_k) = 0
• If we minimize f(x): lim_k ∇f(x_k) = 0
Convergence
• If the underlying optimization algorithm is guaranteed to converge (to a stationary point) without the multilevel strategy,
• then MG/Opt is guaranteed to converge (to a stationary point)
When will MG/Opt work well?
• convex ≈ elliptic ≈ positive definite ≈ “nice”
• The reduced Hessian will be positive (semi) definite at the solution
• Multigrid works well for elliptic PDEs
• Optimization methods work well on convex problems
A Sample Model Problem
• Match a target function u*:
    minimize f(a) = F(a, u(a)) = ‖u − u*‖² + ‖u_x − u_x*‖²
• where u(a) solves the 1-way wave eqn.:
    u_t + c u_x = 0,  u(x, 0) = a(x)
• with 0 ≤ x ≤ 1, 0 ≤ t ≤ 1
• Computations use c = constant = 1
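A sketch of how f might be evaluated: march the state equation forward in time and measure the misfit to the target at the final time. The first-order upwind scheme, the CFL number, and the fixed inflow value are assumptions of this sketch; the talk does not specify the discretization.

```python
import numpy as np

def solve_state(a, c=1.0, T=1.0):
    """Advance u_t + c*u_x = 0, u(x,0) = a(x), to time T with a
    first-order upwind scheme (assumes c > 0; inflow value held fixed)."""
    n = len(a)
    dx = 1.0 / (n - 1)
    steps = int(np.ceil(T * c / (0.8 * dx)))   # CFL number 0.8
    lam = c * (T / steps) / dx
    u = a.astype(float).copy()
    for _ in range(steps):
        u[1:] -= lam * (u[1:] - u[:-1])        # upwind difference
    return u

def misfit(a, u_star, T=1.0):
    """f(a) = ||u - u*||^2 + ||u_x - u_x*||^2 at the final time."""
    u = solve_state(a, T=T)
    dx = 1.0 / (len(u) - 1)
    du, du_star = np.gradient(u, dx), np.gradient(u_star, dx)
    return np.sum((u - u_star) ** 2) + np.sum((du - du_star) ** 2)
```

By construction, an initial condition whose advected state matches the target gives zero misfit, and a shifted initial profile gives a clearly positive value.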
Model Problem: Wave Eqn.
• Hyperbolic equation
• Initial value moves without dissipation or dispersion
• Multigrid methods (applied to the constraint alone) are not ideal: the usual approach is to march forward in time
[Figure: initial solution and the solution at t = 1; the profile is advected unchanged]
Model Problem: Analysis of Continuous Problem
• The reduced Hessian is
    I − d²/dx²
• This is like the 1-dimensional Laplacian
• Ideal for multigrid
• Likely to cause difficulties for general-purpose large-scale optimization methods
• Analogous results for the discretized model problem
Model Problem: Computations (cont.)
• Number of design variables: n = 1025 [1,051,650 total variables: n(n+1)]

                       n=1025  n=513  n=257  n=129  n=65   n=33
  Optimization    it       99
                  ls      100
                  cg      967
  Successive      it       23     25     25     25     25     19
  refinement      ls      100     26     26     26     26     20
                  cg      956    216    225    214    220    145
  MG/Opt          it       10     12     14     16     18    232
                  ls       20     24     28     32     36    242
                  cg       57     65     79     89    112   1974

  (it = outer iterations; ls = line searches; cg = inner CG iterations)
Choice of Comparative Algorithms
• Why only compare MG/Opt with traditional optimization algorithms (and not MG for systems of equations)?
    Inequality constraints may be present
    Optimality conditions are not elliptic in the constrained case
    Hard to derive the reduced Hessian/system (thus hard to identify a good preconditioner)
    No obvious relationship between the original optimization model and the reduced system
MG/Opt & Steepest Descent
• The coarse-level problem is a first-order approximation to the fine-level problem:
    gradient of coarse-level problem at aH = I_h^H [gradient of fine-level problem at ah]
• Analogous to the first-order approximation used to derive the steepest-descent method
MG/Opt & Newton’s Method
• Multilevel line search: let φs(α) = f(a + α eh)
    Well-scaled search direction: φs′(1) ≈ 0
    Search direction of the form eh = I_H^h eH
• If the subproblems are solved accurately, then:
    φs′(1) = eHᵀ [I_h^H ∇²fh I_H^h − ∇²fH] eH + O(‖eH‖³)
• The search direction is “Newton-like”
Outline
• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods
Scenario
• Apply an algorithm (e.g., model management); suppose that it does not work well
• Why not? Examine the results of diagnostic tests, performed as part of the optimization algorithm
    – Diagnostic tests have low overhead
    – Analogous to condition-number estimators
• Now what options do you have? Manual versus automatic
Critical Condition
• Multilevel:
    ∇fs(I_h^H a(1)) = I_h^H ∇fh(a(1))
• Can be automatically guaranteed through additive (as here) or multiplicative corrections
• Convergence is guaranteed regardless of the quality of the approximate models
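The additive correction can be checked in a few lines. The objectives and the downdate operator below are hypothetical stand-ins; the point is only that, by construction of the shift v, the coarse-model gradient at I_h^H a(1) reproduces the downdated fine gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
nH, nh = 7, 15
gh = lambda a: np.tanh(a) + 0.1 * a   # gradient of a hypothetical fine objective
gH = lambda a: np.sin(a) + 0.2 * a    # gradient of a hypothetical coarse objective
R = rng.random((nH, nh)) / nh         # any downdate operator I_h^H

a1 = rng.standard_normal(nh)          # current fine-level iterate a(1)
aH0 = R @ a1
v = gH(aH0) - R @ gh(a1)              # additive correction
gs = lambda a: gH(a) - v              # gradient of f_s(a) = f_H(a) - v^T a

# the critical condition holds by construction:
assert np.allclose(gs(aH0), R @ gh(a1))
```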
Sufficient to Consider Four Properties
• Nonlinearity
• Model Consistency
• Level Complementarity
• Separability across Levels
Some assessment tests assume use of a truncated-Newton method (TN) based on the conjugate-gradient method (CG)
Some tests assume coarsening: ah → aH
Diagnostic Test #1: Nonlinearity

TN Search Directions
• Let φ(α) = f(a + αp)
• Line search: approximates min_α φ(α)
• For the search directions from TN: φ′(1) = O(‖p‖³)
• Test: Is φ′(1) ≈ 0?
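A sketch of the test: estimate φ′(1) by central differences along a Newton direction. The quadratic and the sine perturbation below are illustrative assumptions; for the quadratic the Newton step gives φ′(1) = 0 exactly, while the added nonlinearity makes φ′(1) visibly nonzero.

```python
import numpy as np

def dphi_at_1(f, a, p, h=1e-5):
    """Central-difference estimate of phi'(1), phi(alpha) = f(a + alpha*p)."""
    return (f(a + (1 + h) * p) - f(a + (1 - h) * p)) / (2 * h)

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)                         # SPD model Hessian (assumed)
b = rng.standard_normal(n)

f_quad = lambda a: 0.5 * a @ H @ a - b @ a          # quadratic: test passes
f_nl = lambda a: f_quad(a) + 5 * np.sum(np.sin(a))  # strong nonlinearity added

a = rng.standard_normal(n)
p = np.linalg.solve(H, b - H @ a)                   # Newton direction for f_quad

print(abs(dphi_at_1(f_quad, a, p)))                 # essentially zero
print(abs(dphi_at_1(f_nl, a, p)))                   # nonzero: nonlinearity flagged
```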
Diagnostic Test #2: Model Consistency
Compare predicted and actual reductions in the multilevel line search
Predicted & Actual Reduction
• Predicted reduction: reduction in the coarse-level objective (via standard optimization)
• Actual reduction: reduction in the fine-level objective (via the multilevel line search)
• Difference between the (scaled) actual & predicted reductions:
    ½ eHᵀ [I_h^H ∇²fh(a0) I_H^h − ∇²fH(a0)] eH + O(‖eH‖³)
  (the bracketed term measures consistency of the problems; the O(‖eH‖³) term measures nonlinearity)
Diagnostic Test #3: Level Complementarity
Does the coarse level correspond to the near null space of the fine-level Hessian?
Algebraic Smoothness
• Optimizer: TN based on conjugate gradients
    CG reduces error corresponding to large eigenvalues on the fine level
    Complementary components correspond to small eigenvalues (“near null space”)
• Does the coarse level correspond to the near null space of the reduced Hessian?
    Extend ideas from adaptive algebraic multigrid for linear problems …
Near Null-Space
• The error in the design variables should lie in the near null-space of the reduced Hessian
• The generalized Rayleigh quotient should be small:
    RQ(eh) = (Gh eh)ᵀ(Gh eh) / (ehᵀ Gh eh)
  (Gh: reduced Hessian, not known; eh: error in the design variables, not known)
Practical Test
• We must estimate:
    Norm of the reduced Hessian (estimate via the CG method)
    Error in the design variables (use the multilevel search direction)
• Test: Is
    R(eh) = (Gh eh)ᵀ(Gh eh) / (T(Gh) · ehᵀ eh)
  small?
  (eh: multilevel search direction; T(Gh): norm estimate from the CG method; Gh eh: matrix-vector product, as in TN)
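A sketch with a 1-D Laplacian standing in for the (unknown) reduced Hessian. Power iteration substitutes here for the CG/Lanczos norm estimate, and smooth/oscillatory test vectors take the place of the multilevel search direction; all of these are assumptions of the sketch. Only Hessian-vector products are used, as in TN.

```python
import numpy as np

n = 63
G = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # stand-in reduced Hessian
Gv = lambda v: G @ v                                  # Hessian-vector product only

def norm_estimate(Gv, n, iters=20, seed=0):
    # power iteration, standing in for the CG/Lanczos estimate of ||G_h||
    v = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        v = Gv(v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(Gv(v))

def rayleigh_ratio(e, Gv, Gnorm):
    # R(e) = (Ge)^T (Ge) / ( ||G|| e^T e ): small iff e lies near the null space
    Ge = Gv(e)
    return (Ge @ Ge) / (Gnorm * (e @ e))

Gnorm = norm_estimate(Gv, n)
x = np.arange(1, n + 1) / (n + 1)
smooth_e = np.sin(np.pi * x)      # smoothest eigenvector: near null space
rough_e = np.sin(n * np.pi * x)   # most oscillatory eigenvector

print(rayleigh_ratio(smooth_e, Gv, Gnorm))   # tiny: test satisfied
print(rayleigh_ratio(rough_e, Gv, Gnorm))    # large: not near the null space
```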
Diagnostic Test #4: Separability across Levels
Compare corresponding fine-level and coarse-level Hessian-vector products
Separability?
• Can the fine-level and coarse-level components of the solution be computed separately?
• How much do they interact?
• Is the reduced Hessian (nearly) block diagonal in terms of fine-level and coarse-level components?
Is it Possible to Test for Separability?
• How do you test for separability of the reduced Hessian when:
    You don’t compute the Hessian
    You can’t construct/analyze the Hessian
    You only have function & gradient values and update & downdate operators
• Our test is based on Hessian-vector products: already estimated by TN
Rough Idea
• Write the reduced Hessian in block form, based on high/low frequencies:
    Gh = [ Ghh   GhH ]
         [ GhHᵀ  GHH ]
• Use “perfect” update/downdate operators:
    I_h^H ph = 0,   I_h^H (I_H^h pH) = pH
• Compare coarse/fine Hessian-vector products:
    I_h^H [Gh (I_H^h pH)] − GH pH = 0 if separable
Perturbation Analysis
• Apply MG/Opt to
    minimize Fh(a) = ½ aᵀ Gh a + bhᵀ a + higher-order terms
• Assume the user-supplied procedures are correct
• Assume the nonlinearity test is satisfied
• Then MG/Opt solves a perturbed problem:
    minimize ½ aᵀ (Gh + δGh) a + (bh + δbh)ᵀ a
• How large are the perturbations?
Perturbation Analysis (cont.)
• The perturbations δGh and δbh split into terms, each small when the corresponding property holds:
    small if the problem is separable across levels
    small if the models are consistent
    small if level complementarity holds
What if the diagnostic tests are not satisfied?
• Further analysis based on problem-specific techniques
• Nonlinearity: Is it worthwhile to use a sophisticated optimization method far from the solution?
• Model Consistency: Over-coarsening? Programming errors?
• Level Complementarity: Add or improve a preconditioner
• Separability: Use a different optimization method? Delay using multilevel until closer to the solution of the optimization problem?
Computational Tests
• Tests based on specified choices for the reduced Hessian
• Test problems chosen to isolate a particular property and measure the sensitivity of the diagnostic tests (multilevel already known to work well)
• Ideal case: the reduced Hessian is a discretized Laplacian
• Assume the nonlinearity test is satisfied: use quadratic optimization problems (the nonlinearity test has been studied in other contexts)
Level Complementarity
• Laplacian versus Laplacian with permuted eigenvalues
• Satisfies separability and problem consistency

  nH    nh     Laplacian  Permuted  Ratio
  7     15     0.03       0.34      11.7
  15    31     0.04       0.21       4.8
  31    63     0.04       0.57      13.6
  63    127    0.07       0.30       4.5
  127   255    0.13       0.25       1.9
  255   511    0.20       0.34       1.7
  511   1023   0.26       0.77       3.0
Separability
• Diagonalize the Laplacian:
    Gh = V [ Dh  0 ; 0  DH ] Vᵀ
• Test problems:
    Gh(ε) = V [ Dh  εR ; εRᵀ  DH ] Vᵀ
• R is random, with norm 1
• Satisfies problem consistency and, for small values of ε, level complementarity
Model Consistency
• Test problems derived from the discretized Laplacian:
    G̃H = Qᵀ GH Q, where Q is an orthogonal matrix constructed from I + εR
• Q is orthogonal
• R is random, with norm 1
• Satisfies level complementarity and separability
Outline
• Introduction
• Model management and multilevel methods
• Justification for optimization-based multilevel methods
• Diagnostic tests for multilevel methods
Related Research
• Vast literature on multigrid methods for PDEs
• Optimization-based multigrid methods
    Based on the full approximation scheme (Brandt, 1977) applied to optimality conditions for the optimization model
    Lewis & Nash (2005), SIAM J. Sci. Comput., v. 26, pp. 1811-1837
• Model management
    Alexandrov & Lewis (2001), Optimization and Engineering, v. 2, pp. 413-430
Related Research (cont.)
• Diagnostic tests and related ideas for optimization-based multilevel methods
    Nash & Lewis (2008), www.math.wm.edu/~buckaroo/pubs/LeNa08a.pdf
• Adaptive algebraic multigrid
    Brandt (1977), Math. Comp., v. 31, pp. 333-390
    Brannick & Zikatanov (2006), Tech. Report, Penn. State University
    Brezina et al. (2006), SIAM J. Sci. Comput., v. 27, pp. 1261-1286
• Stopping rules for inexact Newton methods
    Eisenstat & Walker (1996), SIAM J. Sci. Comput., v. 17, pp. 16-32