Intelligent Nonlinear Solvers for Computational Fluid Dynamics

Intelligent Nonlinear Solvers for Computational Fluid

Dynamics

Thomas M. Smith, ∗ Russell W. Hooper, † Curtis C. Ober ‡

and

Alfred A. Lorber §

Sandia National Laboratories¶, Albuquerque, New Mexico, 87185-0316, USA

Implicit nonlinear solvers for solving systems of nonlinear PDEs are very powerful.Many compressible flow codes utilize Newton-Krylov (NK) methods and matrix-free Newton-Krylov (MFNK) methods for a range of flow regimes and different flow models such asinviscid, laminar, turbulent and reacting flows. One drawback is that these solvers arecomplex requiring the specification of many settings. Expertise is necessary to achievehigh performance. There is a need to develop ”intelligent nonlinear solvers” that are ca-pable of changing settings dynamically and adapting to evolving solutions and changingsolver performance, in order to reduce the burden on the user, and improve overall effi-ciency and reliability. In this paper we take the first steps in achieving automatic controlof nonlinear solvers for compressible flows by combining semi- and fully- implicit solverstrategies in ways that utilizes them more efficiently than simply applying one method oranother during the entire solution procedure. The understanding gained from this workwill lay the groundwork for future development of more autonomous ”intelligent solvers”.

I. Introduction

Implicit solvers are widely used to in compuational fluid dynamic applications to obtain steady-statesolutions to the equations governing fluid flow. Semi-implicit (point-implicit) methods are one of the mostcommon. Semi-implicit methods are relatively easy to implement, have low memory requirements and canmarch at large time step sizes compared to explicit methods. Semi-implicit iterations are only modestlymore expensive than explicit iterations and tend to converge linearly. They are also robust in the sense thatthey are relatively easy to use. However, convergence ”stalling” can be a problem in certain circumstances.

In recent years, Newton-Krylov (NK) methods are becoming more popular. NK methods are less straightforward to use and more expensive per iteration than semi-implicit methods. However, NK methods are veryefficient for as the solution is approached in an iterative sense, quadratic convergence rates can be achieved.Very large time steps can be used to advance the solution to steady-state. They are also robust due to theeffectiveness of Krylov subspace iterative linear solvers.

In both methods, solutions are achieved iteratively by solving a series of nonlinear problems where thesystem equations are linearized and then solved with an iterative linear solver. Semi-implicit solvers combinethe nonlinear and linear loops together, solving a modified linear system less accurately but more cheaply.Typically, semi-implicit solvers are cheaper than NK methods in the beginning when the CFL is small andthe linear systems are dominated by a large diagonal inertia term. Later, as the inertia term becomes smaller,the linear problem becomes more difficult to solve and NK methods become more efficient. Therefore, itwould make sense to combine these approaches in a single solver stategy.

∗Computational Sciences, Senior Member of the Technical Staff, Exploratory Simulation Technologies, AIAA Member†Applied Computational Methods‡Exploratory Simulation Technologies§Thermal/Fluid Computational Engineering Sciences¶Sandia is a multiprogram laboratory operated by Sandia Corporation, A Lockheed Martin Company, for the United States

Department of Energy under Contract DE-AC04-94AL85000. This material is declared a work of the U.S. Government and isnot subject to copyright protection in the United States.

1 of 14

American Institute of Aeronautics and Astronautics

NK methods have many solver settings that must be ”tuned” in order for the solver to perform optimally.These settings are problem dependent and sometimes non-intuitive. Prescribing these settings requiresexpert knowledge and can be intimidating to new application code users. In addition, static settings maynot be the most efficient or even adequate to ensure convergence to a solution. One reason is that as thesolution evolves iteratively, the nonlinear behavior changes. A good example of this can be found in shockcapturing problems. The steady-state solution is obtained by time marching from initial to final steady-state.Typically, simulations are initialized with uniform flow conditions. The shock wave initially forms on thebody leading edge and propagates upstream until an equilibrium position is reached. This transient behaviorprevents rapid convergence until the final shock location is reached, at which time the solution convergencecan progress rapidly. The most effective solver setting values may change during the solution procedure.Therefore, it is not possible to prescribe static settings from an input file and expect to achieve the highestperformance from the solver.

Based on this discussion, there is a need to develop ”intelligent nonlinear solvers” that can reduce theburden on users of prescribing solver settings, and can automatically modify these settings in order to improveoverall robustness and performance. It is envisioned that a controller would monitor solver diagnostics suchas nonlinear convergence rate, number of linear iterations per nonlinear iteration step and time step sizeand based upon expertise encoded in the controller, modify the appropriate solver settings and continue theprocess. A controller using expertise encoded into the software would rapidly process the diagnostic data toproduce an appropriate response.

The approach we are taking does not rely on traditional control theory where a mathematical model isderived that can predict output response from given input. Instead we propose to use control theory whichcan process vague or imprecise input data and produce an effective output response. Unlike traditionallinear control theory, no mathematical model exists for the nonlinear solver which is viewed as a dynamicalsystem. Instead expert knowledge is encoded into the solver controller in the form of rules that take theform of IF-THEN-ELSE constructs.1 Fuzzy logic is a good example of this type of control theory.1

The approach that we will take contains a four step iterative process. First we identify a solver setting ormultiple settings that we want to dynamically control and write the software infrastructure that will allowthose particular parameters to be dynamically modified. In other words, most solver settings are staticallyset once in the input file, not expected to change. Several examples are worth noting.

1) Adaptive removal/addition of the mass terms in the Jacobian and preconditioner matrix. When thesolution is close to steady-state, removal of the mass term can greatly speed convergence. Removing themass term too soon can be catastrophic. Manzano et al.2 used this strategy as a start-up procedure.

2) Adaptive solver algorithms. Using explicit, and semi-implicit solver algorithms to get close to thefinal solution followed by NK in the final stages can be much cheaper than using NK for the entire solutionprocedure.

3) Adaptive Jacobian recompute frequency. Computing and factorizing the Jacobian matrix each non-linear iteration is not necessary in the initial stages of the solution procedure when the time step size issmall and the linear solves are relatively easy. Recomputing the preconditioner every nonlinear iteration issometimes necessary in the final stages. A single static frequency used throughout is sub-optimal.

4) Solver startup strategies. In some situations, a lower fidelity discretization can be used to obtaina solution and then be used as the initial conditions for a higher-order solution. This is accomplished bychanging the order of spatial accuracy of the discretization.

The second step involves an audit of operator costs. This simply means computing the cost of evaluatingoperators such as; residual function, preconditioning matrix, factorizing the matrix, a matrix-free Kryloviteration or a matrix-vector product. This data will be used to do a cost analysis and guide the developmentof rules used by the controller to make decisions on how best to change settings.

The third step is to quantify setting sensitivities to the solution convergence. This is where expert knowl-edge is introduced into the process. By choosing a representative set of problems and quantifying the changein convergence with change in solver setting, ”knowledge” is gained that will be useful in developing rulesused by the controller for making decisions. These rules may take the form of IF-THEN-ELSE constructs.

The fourth step is to put it all together. Using the cost analysis data and sensitivity data as input, asingle setting controller component will be designed and implemented that outputs a new solver setting.The controller will be based upon the rules generated in step three. Once successful control has beendemonstrated, the process can be repeated for the next setting of interest.

The paper is organized as follows; first the governing equations for compressible fluid flow are presented

2 of 14


followed by a brief description of the finite-volume spatial discretization. Next, time marching strategies arepresented with an emphasis on NK and semi-implicit methods. Candidate solver algorithms and settingsthat arise in the discussion will be identified. Next, results that illustrate the benefit from using adaptivesolver strategies will be presented. Finally, conclusions will be made.

II. Flow Model

A. Governing Equations

We seek solutions to the Euler and Navier-Stokes equations that govern fluid flow. In integral form thegoverning equations can be written as

∫

V

∂W

∂tdV +

∮

Γ

(Fc − Fv) · n dΓ = 0 (1)

where W = {ρ, ρu, ρv, ρw, ρE}T is the vector of conserved variables, V represents the fluid domain and Γits boundary. The inviscid flux vector is;

Fc · n =

(ρui)ni

(ρuiuj + pδij)nj

((ρE + p)ui)ni

the viscous flux vector is;

Fv · n =

0

τijnj

(−qi + ujτij)ni

.

In the above equations, ρ is the density; ui (u = u1, v = u2, w = u3) which can be written u are velocitycomponents in each Cartesian direction xi (i = 1, 2, 3 and j = 1, 2, 3 is implied); p is the pressure defined bythe ideal gas equation of state, p = ρRT ; R is the gas constant; E is the total energy per unit mass, definedas E = e + (u2 + v2 + w2)/2; e is the specific internal energy, defined as e = p

(γ−1)ρ ; γ is the ratio of specific

heats; T is the temperature; ni = n, is the outward pointing surface normal vector, and δij is the Kroneckerdelta. We also define a the primitive state vector as U = {ρ, u, v, w, p}T .

The viscous stress tensor is given by,

τij = µ

(

∂ui

∂xj+

∂uj

∂xi

)

−2

3µ

∂uk

∂xkδij (2)

where µ is the dynamic coefficient of viscosity. The heat flux vector is assumed to follow Fourier’s Law,

qi = −κ∂T

∂xi(3)

where κ is the thermal conductivity.On a solid impermeable surface, in the case of inviscid flow, the velocity vector must be tangent to the

surface, u · n = 0 and there is no mass or heat flux crossing the boundary and so the flux function becomes,

Fb = F · nb =

0

pnb

0

.

On solid no-slip boundaries, u = 0 is enforced strongly. In addition, if the surface temperature is specified, aboundary temperature is enforced strongly. The adiabatic condition is implemented as a zero flux condition.Far-field boundary conditions are defined by specifying an infinity state U∞ that is applied as the right statein the Riemann solver resulting in a boundary flux.

3 of 14


B. Edge-Based Finite Volume Spatial Discretization

The compressible flow code, Premo, uses the SIERRA framework,3 which provides common services toapplication codes such as user-input parsing, bulk mesh-data input/output, interfaces to linear and nonlinearsolvers, inter-mesh transfers, parallel communications, dynamic load balancing, and mesh adaptivity.

Premo is a node-centered, edge-based, finite-volume, algorithm similar to the formulations of Haselbacherand Blazek4 and Luo et al.5

Control volumes and surfaces are defined by the median dual of the primitive finite-element mesh.This construction results in non-overlapping, space-filling, control volumes that are associated with nodes.Control-surface area vectors are area-weighted averages and are stored on the edges.

Equation 1 must hold for each control volume. If we define volume-averaged values of the conserved statevector (W), and apply the midpoint rule for integration on surfaces, we can write the semi-discrete form ofeq. (1) as,

∂WI

∂t+

1

δVI

∑

J∈NE(I)

(FcIJ − Fv

IJ )|δΓIJ | = 0 (4)

∂WbI

∂t+

1

δVI

∑

J∈NE(I)

(FcIJ − Fv

IJ)|δΓIJ | + Fb|δΓbI | = 0 (5)

where I is the current node (i.e., control volume), δVI is the control volume, NE(I) is the set of nodesconnected to node I by edge IJ , J is the second node on the edge, |δΓIJ | is the magnitude of the control-surface area, Wb

Iare boundary nodes and Fb|δΓb

I | represents boundary fluxes. The convective and viscousfluxes are evaluated at the edge midpoint and depend on the reconstructed data, normal vector and nodalgradients; Fc

IJ = F(U+I ,U−

J ,nIJ ), FvIJ = F(UI ,∇UI ,UJ ,∇UJ ,nIJ). Details of the reconstruction of

interface data is given the Appendix.

III. Nonlinear Solution Strategies

We seek a steady-state solution for eq. (1) for which the time derivative term is zero. Spatial discretizationof eq. (1) leads to a system of nonlinear algebraic equations represented by

F(W) =∂W

∂t+ R(W) = 0 (6)

where W is the vector of unknowns, and R(W) is the spatial residual equations which have no explicit timederivative dependencies. At steady-state R(W) = 0. This is achieved by solving a sequence of nonlinearproblems of the form of eq. (6) in which the stabilizing term represented by ∂W

∂t is gradually removed.

A. Pseudo-Transient Newton-Krylov Based Solver

This technique is a variation of the standard Newton’s method. Consider an incremental update to the statevector,

∆Wn = Wn+1 − Wn . (7)

where n is the nonlinear iteration/time step. Linearizing R about the current state and rearranging yieldsa pseudo-transient variation to Newton’s method,

(

1

∆t+

∂Rn

∂W

)

∆Wn = −Rn (8)

which can be written asJ∆Wn = −Rn . (9)

This form is used extensively in CFD codes where the governing equations are advection dominated andtime marching must be used to solve them.

We choose to increase the pseudo-time step at each step as follows:

∆tn = fR∆tn−1 or ∆tn = fA||R||n−2

||R||n−1∆tn−1 (10)

4 of 14


where fR are fA are ramp and adaptive CFL factors. For some problems with complex physics, e.g.,shocks, turbulence, etc., use of a monotonically increasing ∆t can lead to catastrophic failures such as NaN,resulting from the nonlinear solver attempting to sample inadmissible solution states (i.e. negative density).The nonlinear solver library NOX provides checks for such failures and allows the application code, Premo,opportunity to recover from these without terminating the simulation. In this work, we choose to simplyreduce the pseudo-time step by half and resume the solution process with the last ”good” solution vector.Hence, for catastrophic failures,

∆tn = 0.5∆tn−1 (11)

Wn = Wn−1 . (12)

Both of these techniques, acceleration of the time step size and fault tolerance are widely used in the CFDcommunity. These are examples of adaptive methods and have a great impact on solver performance,robustness and usability.

Iterations are continued until a user-specified convergence is achieved,

||Rn|| ≤ εa, ||Rn||/||R0|| ≤ εb, ||∆Wn|| ≤ εc . (13)

Each nonlinear iteration of eq. (8) requires a linear system of equations to be solved. Inexact Newton’smethods are commonly used with iterative solvers. In this case the each linear sub-problem is solved suchthat the inequality

||J∆W + R(W)||2 ≤ η||R(W)||2 (14)

is satisfied. We note that research6 has been conducted in the area of adaptively changing η with successivenonlinear iterations. These techniques are available in NOX, however, limited experimentation with thesetechniques for the class of problems of interest has not shown any advantage in using them.

The work presented here is based on solving eq. (8) using the iterative method GMRES7 and variousmeans of preconditioning. We select GMRES in order to accommodate non-symmetric Jacobian operatorsarising from up-winding methods.

The matrix-free operator is used exclusively as a Jacobian operator and is based on the matrix-freeNewton-Krylov (MFNK) ideas described by Brown and Saad8 and Knoll and Keyes.9 Linear solves involverepeated action of the Jacobian on a vector, e.g., p, (used to generate each Krylov vector), which can beapproximated as follows:

Jp ≈F(W + εp) − F(W)

ε. (15)

Right-preconditioning7 is defined as follows,

JM−1y = −R (16)

M∆W = y (17)

where M is the preconditioning operator. When preconditioning is applied to solving eq. (8) using thematrix-free Jacobian operator, eq. (15) takes the modified form

JM−1p ≈F(W + εM−1p) − F(W)

ε(18)

and each finite-difference based matrix-vector operation now requires application of the preconditioning op-erator M. For this work, we use incomplete LU factorization (ILU(#)) of the matrix M as the preconditionerfor the iterative linear solver. The preconditioner matrix is assembled based on the edge-graph taking thepartial derivatives of the first-order advection operator, (eq. 35) and viscous fluxes based on edge differ-ences (eq. 40). A summary of how the Jacobian is formed can be found in the Appendix and in Smith et

al.10 where performance and robustness comparisons of several different approximations to the Jacobianare presented. Matrix assembly and LU factorization account for a significant fraction of the cpu time ineach nonlinear iteration. However early in most simulations, the matrix is dominated by a large mass termwhich is changing relatively slowly - late in the simulation, the solution is changing gradually and againthe matrix may be changing slowly. Therefore, it is reasonable to expect that the preconditioner could bereused in successive nonlinear iterations to reduce cpu time. This is indeed the case. The goal in adaptive

5 of 14


solver development would be to try and predict the ”best” recompute frequency based upon available solverperformance data.

Both GMRES and application of the preconditioner are performed using the AztecOO linear solverlibrary11 via an interface provided by NOX. Efficient testing of the many Jacobian/preconditioner operatorchoices (along with associated options) is enabled by interfacing the compressible flow code, Premo, to alibrary of nonlinear solver algorithms, NOX,12 currently being developed at Sandia National Laboratories.

B. Semi-Implicit Solvers

Semi-implicit nonlinear solvers designed to achieve steady-state solutions are widely used by the CFD com-munity. Memory requirements are less than NK methods because only the diagonal block matrix is storedand there are no Krylov sub-space vectors to store. Large CFL numbers can be used. The linear system issolved by successive ”sweeps” over the control volumes. The cost of each sweep is comparable to an explicittime step using a Runge-Kutta integration method.

Chen and Wang13 and Sharov et al.14 present a symmetric Gauss-Seidel solver for steady-state solutionsfor unstructured-mesh solvers. With reference to eqs. (8-9), the linearized system is factored into the strictlyupper, strictly lower and block diagonal matrices

J = L + U + D (19)

where J is the Jacobian. Factorization of J and rearrangement of eq. 9 results in,

(D + L)D−1(D + U)∆W = −R + (LD−1U)∆W . (20)

Neglecting the second term on the right-hand-side, this modified system is solved by symmetric forward andbackward sweeps.15 Starting with the guess ∆W = 0 for the k = 0 iteration, ∆W is updated by at eachnode by a (forward sweep):

∆Wk+1/2 = D−1[−R − L∆Wk+1/2 − U∆Wk] (21)

followed by a (backward sweep):

∆Wk+1 = D−1[−R − U∆Wk+1 − L∆Wk+1/2] . (22)

Forward sweeps loop over all nodes starting from node 1 to node N - backward sweeps loop node N to1, where N is the number of nodes. Iterations are terminated when a specified number of iterations (kmax)have been completed or when the norm tolerance

||∆Wk − ∆Wk−1||

||∆W0||≤ ε (23)

is satisfied. If a single iteration is requested, the resulting scheme is referred to as LU-SGS by Sharov et

al.14 and in this case, forward and backward sweeps are simplified.The block diagonal matrix operator is formed by assembly of the edge-based and node-based Jacobian of

the advection scheme or the combination of the flux Jacobian and the spectral radius of the the flux Jacobianas advocated by Sharov et al.14 In the case of the latter, the edge contributions result in a pure diagonal,however, the inclusion of boundary conditions results in a block diagonal. A block diagonal is thereforealways assumed in the current implementation. The off-diagonal matrix-vector products are formed byeither using the Jacobian of the advection scheme or the combination of flux Jacobian and spectral radiusand is implemented either analytically or in a matrix-free form13, 14 . The LU-SGS solver is also useful,applied as a preconditioner in GMRES in the MFNK method15 replacing ILU.

IV. Results

In this section we present results of several solver strategies for solving inviscid and viscous flows. Theseresults can be considered as steps 1-3 discussed earlier. Eight different nonlinear solver strategies have beentested and will be illustrated by solving three supersonic flow problem; Mach 2 flow over a NACA0012 airfoil,Mach 3 inviscid flow over a blunt-wedge configuration, and finally Mach 3 laminar flow over a blunt-wedge.The solver strategies can be summarized as follows;

6 of 14


• MFNK/ILU(0) - This is the baseline solver. It uses the matrix-free Jacobian operator, and ILU(0)incomplete factorization of the edge-based global matrix.

• SI (10/.01) - This is the baseline semi-implicit solver, where ten inner iterations or a inner normreduction of 0.01 is specified.

• S1 - Initially, the SI(10/.01) solver is used until a specified norm reduction ||R||/||R||0 is reached, thenthe solver switches to MFNK/ILU(0) with η = 0.01 and runs to completion.

• S2 - Initially, the MFNK/LUSGS(2/.01), η = 0.1 (where 2 sweeps are requested) is run, the at aspecified norm reduction, the solver switches to MFNK/ILU(0), η = 0.01 for the completion of therun.

• S3 - The same as S2 except that η = 0.1 is used throughout, and switching is based on the number oflinear iterations. If more than the specified number of iterations is taken, the solver switches to ILU(0)and if fewer than the specified number are taken, the solver switches back to LUSGS(2/.01). A buffervalue defines a small overlap so the solver does not switch too often. The number of iterations is basedon the cost of the LUSGS operator vs. the cost of assembling the matrix and factorizing it. At presentnumber is determined prior to the simulation by running both preconditioners separately.

• S4 - The solver uses MFNK/ILU(0) , η = 0.1 but recomputes the preconditioner based on a specifiediteration frequency (MPF), typically 5-10 iterations.

• S5 - The solver begins by using the edge matrix as the Jacobian operator with ILU(0) (referred to asdeferred correction, NK/ILU(0)) and then switches to MFNK/ILU(0)η = 0.01 when a specified normreduction is achieved.

• S6 - The solver begins with first-order spatial discretization, using the edge matrix which is consistentwith first-order discretization, for the Jacobian operator and ILU(0) preconditioning, and then switchesto higher-order discretization and MFNK/ILU(0) η = 0.01 preconditioning.

Strategies S1, S2, S5 and S6 are triggered with a specified binary switch. S3 is triggered by the numberof linear iterations required for the particular nonlinear step, S4 is a static setting. There are many morestrategies that could be realized. These were chosen based on experience running the MFNK solver on avariety of different problems.

A. Supersonic Inviscid Flow

The first problem deals with two-dimensional Mach 2 inviscid flow over a NACA0012 airfoil at zero degreesangle of attack. An O-grid (129 × 33 × 2) is used for these simulations. In each simulation a steady-statesolution is sought and the solution is assumed to be converged when the L2 norm of the entire residual vectoris reduced by 10−8 relative to the initial norm. Local time stepping is used and the CFL varies from 1 to5000 using the adaptive formula eq. 10, with the parameter given in tables 1 and 2. Mach number contoursfor a typical steady-state solution are shown in figure 1. Nonlinear convergence behavior for the nonlinearsolvers including the two baseline solvers and the six other strategies are shown in figure 2. Convergencetimes show a slight sensitivity to the adaptive CFL parameter. Convergence times are also slightly sensitiveto η which is not presented. It is not the intention to find an optimal set of parameters for each solverstrategy, instead efforts were made to keep the solver parameters as consistent as possible so that differencesin run times would reflect on the behavior of the strategies themselves. The S4 summaries are presentedalong with the MFNK baseline summaries in table 1 because the S4 solver is very similar to the baselinesolver. Though run times vary with MPF and fA, clearly this strategy represents a boost in performance.

The most dramatic boost in performance is realized in strategy S1. This strategy was used by Nielsenet al.16 in a similar context, however, their formulatoin of the semi-implicit solver was different. In thiscase, running the semi-implicit solver for just 10−2 reduction in the residual norm and then switching tothe baseline solver, produces nearly a factor of 3 speedup! It would be reasonable to expect the run timesfor strategy S1 to lie somewhere in between the two baseline solver run times since the initial solver in S1is exactly the same as SI base1 and the final solver in S1 is exactly the same as MFNK/ILU(0) base1, butinstead, the run time for S1 is 36% of MFNK/ILU base1 and 29% of SI base1.

7 of 14


Figure 1. Mach contours for inviscid flow around a NACA0012 airfoil.

Table 1. Inviscid flow over NACA0012 airfoil at Ma=2.0.

Strategy fA MPF CPU

sec.

MFNK/ILU(0) base1 1.025 1 305

MFNK/ILU(0) base2 1.05 1 292

SI base1 1.025 378

SI base2 1.05 389

S4 5a 1.025 5 259

S4 5b 1.05 5 218

S4 5c 1.1 5 253

S4 10 1.2 10 249

Table 2. Inviscid flow over a NACA0012 airfoil, Ma=2.0.

Strategy comment switch CPU

sec.

S1 2 SI(10/.01)-MFNK/ILU(0) 10−2 111

S1 4 SI(10/.01)-MFNK/ILU(0) 10−4 137

S1 6 SI(10/.01)-MFNK/ILU(0) 10−6 248

S2 MFNK/LUSGS(2)-MFNK/ILU(0) 10−4 308

S3 MFNK/LUSGS(2)-MFNK/ILU(0) 3 its. 403

S5 NK/ILU(0)-MFNK/ILU(0) 10−4 211

S6 1st-2nd, NK/ILU(0)-MFNK/ILU(0) 10−4 279

NK/ILU(0)-MFNK/ILU(0)

8 of 14


0 100 200 300 400

Nonlinear Iterations

10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

||R

||/|

|R||

0

MFNK/ILU(0), FA=1.025


SI(10/.01), FA=1.025

SI(10/.01), FA=1.05

S4_5b, MPF=5, FA=1.05

(a) Nonlinear convergence behavior for; MFNK base, SIbase, and S4 strategies.

0 100 200 300 400


10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

||R

||/|

|R||

0


SI(10/.01), FA=1.025

S1_2, FA=1.025

S1_4, FA=1.025

S1_6, FA=1.025

(b) Nonlinear convergence behavior for S1 strategies.

0 100 200 300 400


10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

||R

||/|

|R||

0


S1_4

S2, LUSGS(2)/ILU(0)

S3, LUSGS(2)/ILU(0), 3 its.

S4_5b, MPF=5, FA=1.05

S5, NK/MFNK, ILU(0)

S6, 1st/2nd, NK/MFNK

SI(10/.01), FA=1.025

(c) Nonlinear convergence behavior for S1-S6 strategies.

Figure 2. Inviscid flow over a NACA0012 airfoil at Ma=2.0.

9 of 14


B. Supersonic Blunt Wedge

The second problem deals with supersonic flow over a blunt wedge shape with nose radius 0.0088 m andwedge angle 5.25 degrees. The inviscid case corresponds to Mach 3 conditions at an altitude of 8 km. Thetemperature was 236.215 K and the pressure was 3.56516× 104 Pa. The laminar flow problem correspondsto Mach 3 conditions at an altitude of 40 km and a Reynolds number, Rex = 2.4 × 105. The free streamtemperature and pressure for the laminar case was 250.35 K, and 2.87144 × 102 Pa respectively, and anisothermal boundary condition was applied to the blunt-wedge and the surface temperature was 300 K.Sutherland’s law was assumed for viscosity and a constant Prandtl number Pr=.72 was used. The mesh forboth the inviscid and laminar simulations was the same, (81 × 75 × 3). Local time stepping was used andthe CFL varies from 1 to 105 using the adaptive formula eq. 10, with fA = 1.01.

For these simulations eigenvalue smoothing was added to the Roe flux function to reduce pressure stairstepping in the vicinity of the shock. The formulation follows Peery and one to 105. Imlay.17

Pressure contours from the solution of the inviscid and laminar cases appear in figure 3. The pressurecontours show very similar patterns dominated by the presence of the shock wave. The flow is initializedwith uniform flow. A detached bow shock forms at the leading edge of the blunt wedge and propagates upstream until the equilibrium location is reached. This highly transient nonlinear behavior prevents rapidconvergence of the solution until the shock settles down to its final location. This behavior explains why theMFNK method performs poorly by itself. Only small incremental steps may be taken while the shock is intransit. However, each of these small steps are quite expensive compared to the relatively cheap semi-implicititerations. Once the shock wave settles down though, MFNK with its superior convergence behavior quicklyreaches a solution.

Run times for the base line MFNK and semi-implicit solver are compared to the S1 strategy, with differentswitch switch values, for both the inviscid and laminar problem in tables 3 and 4. Nonlinear convergencebehaviors are presented in figure 4. Just as in the previous case, a dramatic boost in solver performance isrealized by using strategy S1 for both the inviscid and laminar viscous cases.

Table 3. Inviscid flow over blunt wedge with Ma=3.0.

Strategy CPU

sec.

MFNK/ILU(0) 3,140

SI 2,776

S1 2 678

S1 3 553

S1 4 583

Table 4. Laminar flow over blunt wedge with Ma=3.0 and Rex=2.4 × 105.

Strategy CPU

sec.

MFNK/ILU(0) 2,837

SI 7,835

S1 2 1,042

S1 3 1,085

S1 4 1,292

10 of 14


(a) Inviscid flow values range from40,000 to 420,000.

(b) Laminar flow values rangefrom 400 to 3,400.

Figure 3. Pressure contours for blunt wedge flow.

0 100 200 300 400 500 600 700 800 900 1000


10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

||R

||/|

|R||

0

MFNK/ILU(0)

SI(10/.01)

S1_2

S1_3

S1_4

(a) Inviscid flow: MFNK base, SI base, and S1 strategies.

0 250 500 750 1000 1250 1500


10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

||R

||/|

|R||

0

MFNK/ILU(0)

SI(10/.01)

S1_2

S1_3

S1_4

(b) Laminar flow: MFNK base, SI base, and S1 strate-gies.

Figure 4. Nonlinear convergence behavior for supersonic blunt wedge flow.

11 of 14


V. Conclusions

In this work, we have set out to design and develop intelligent nonlinear solvers with the goal in mind ofusing established algorithms and libraries in new ways and combinations to improve upon the performanceand reliability of the solution methods. We have identified several solver strategies that hold promise anddemonstrated their potential on several shock capturing problems. While no claim of optimality can be made,and minimal ”tuning” was involved, dramatic performance increases have been realized when compared tothe two baseline solvers. The next step toward reaching our goal is to start making these solvers moreautonomous.

VI. Appendix

First-order accuracy is achieved by assuming a piecewise constant state vector within control volumes.Interface state values are taken as node values themselves,

U+I ≡ UI (24)

U−J ≡ UJ . (25)

For higher-order spatial accuracy, a piecewise linear variation within cells is assumed. Reconstruction ofthe interface states (+,-) use a multi-dimensional MUSCL extrapolation1819 along with the average nodalgradient. The Green-Gauss volume-averaged gradient is obtained by solving the surface integral

∫

V

∇φdV = δV∇φ =

∮

Γ

φn dΓ .

The discrete form is given by the edge-based formulation, including boundary face contributions,

∇φI ≈1

δVI

∑

J∈NE(I)

(

φI + φJ

2

)

nIJ |δΓIJ | (26)

∇φbI ≈

1

δVI

∑

J∈NE(I)

(

φbI + φJ

2

)

nIJ |δΓIJ | + φbIn

bI

∣

∣δΓbI

∣

∣ . (27)

where ∇φbI is on a boundary node. An alternative to Green-Gauss construction is weighted least-squares.

Haselbacher4 has summarized the least-squares formulation for arbitrary meshes. The three components ofthe gradient at each node I, can be written as a weighted sum of differences between edge IJ state data;

(φx)I =∑

J∈N(I)

ωxIJ (φJ − φI) (28)

(φy)I =∑

J∈N(I)

ωyIJ (φJ − φI) (29)

(φz)I =∑

J∈N(I)

ωzIJ (φJ − φI) (30)

where ∇φI = {φx, φy, φz}TI denote Cartesian derivative components and (ωx

IJ , ωyIJ , ωz

IJ) denote Cartesiancomponent weights. The gradients require that six weights be stored on each node and are assembled bylooping over all edges including virtual edges as defined by Haselbacher.4

Left (+) and right(-) interface states are computed using MUSCL extrapolation,

U+I = UI + Φ(∆+

I , ∆−I )

(

∇UI ·∆r

2

)

(31)

U−J = UJ − Φ(∆+

J , ∆−J )

(

∇UJ ·∆r

2

)

.

12 of 14


where ∆r = xJ − xI is a displacement vector, and Φ = [0, 1] is a gradient limiter. Gradients are limited bythe van Albada limiter, similar to that used by Luo et al.5 which is computed locally on each edge. The vanAlbada limiter is written as,

Φ(a, b) =ab + |ab| + ε

a2 + b2 + ε(32)

and the arguments are defined by,

∆+I = UJ − UI (33)

∆−I = 2∇UI · ∆r − (UJ − UI)

∆+J = 2∇UJ · ∆r − (UJ − UI)

∆−J = UJ − UI .

The advection scheme that Premo currently uses is the approximate Riemann solver of Roe20 with theentropy fix of Liou and van Leer21 and the implementation closely follows that of Whitaker.22

The Roe flux function can be written as,

FcIJ =

1

2

(

F(W+I ,nIJ) + F(W−

J ,nIJ) − |A|(W−J − W+

I ))

|δΓIJ | (34)

where A = ∂F∂W is the flux Jacobian matrix and |A| = |A(W+

I , W−J ,nIJ)| is evaluated using average states

(·) defined by Roe.20

In the next section, steady-state solution strategies will be developed which require the constructionof Jacobian matrix. Linearization of the advective flux function with respect to the conservative variablesincluding only edge node data (first-order) is evaluated using automatic differentiation (AD)23, 24 or a localedge-based finite difference, and can be represented as,

∂FI

∂WI= I

δV

∆t+

1

2

(

A(WI ,nIJ ) + |A| −∂|A|

∂WI(WJ − WI)

)

|δΓIJ |

∂FI

∂WJ=

1

2

(

A(WJ ,nIJ) − |A| −∂|A|

∂WJ(WJ − WI)

)

|δΓIJ | . (35)

The Jacobian operators in the semi-implicit solver are based upon a modified flux function, here referred toas ”symmetric dissipation” (SD), that was suggested by Luo et al.15 Here, artificial viscosity is based onthe magnitude of the spectral radius of the flux Jacobian,

FIJ =1

2(F (WI ,nIJ ) + F (WJ ,nIJ ) − |λIJ |(WJ − WI)) |δΓIJ | . (36)

The Jacobian terms are,

∂FSD

I

∂WI= I

δV

∆t+

1

2(A(WI ,nIJ) + |λIJ |I) |δΓIJ |

∂FSD

I

∂WJ=

1

2(A(WJ ,nIJ) − |λIJ |I) |δΓIJ | (37)

where,

|λIJ | = |uIJ · nIJ | + cIJ +2µ

ρ|nIJ · ∆rIJ |(38)

is the spectral radius of the flux Jacobian matrix and I is the identity matrix.Evaluating the viscous flux vector requires reconstructing primitive variable gradients at edge midpoints.

Gradients are reconstructed by blending edge node differences with nodal gradients,

∇UIJ =(∇UI + ∇UJ)

2+

(

UJ − UI −(∇UI + ∇UJ )

2· ∆rIJ

)

∆rIJ

|∆rIJ |2. (39)

Viscosity, conductivity and velocity components appearing in the energy dissipation terms are evaluatedat the I and J nodes and then averaged. The support for these gradients include distance-two nodes.

13 of 14


Therefore, exact discrete Jacobians cannot be contained on an edge graph. Instead, only the node differencesare included in the analytic representation of the Jacobian terms,

∇UIJ ≈ (UJ − UI)∆rIJ

|∆rIJ |2. (40)

This results in a very efficient algorithm for evaluating the terms but is consistent with eq. 39 only onCartesian grids.

In all cases, whether it be the evaluation the spatial residual equations eq. 4, or the nodal gradients oran approximation to the Jacobian operator, assembly only looping over edges and nodes. Boundary fluxesare assembled by looping over exterior faces of the primitive finite element mesh. Additional details can befound in Smith et al.10

References

1Kosko, B., Fuzzy Engineering , Prentice-Hall, Engelwood Cliffs, NJ, 1997.2Manzano, L., Lassaline, J., Wong, P., and Zingg, D., “A Newton-Krylov Algoritm for the Euler Equations Using Un-

structured Grids,” AIAA Paper 2003-0274 , January 2003.3Edwards, H. C. and Stewart, J. R., “SIERRA : A Software Environment for Developing Complex Multiphysics Appli-

cations,” Proceedings of First MIT Conference on Computational Fluid and Solid Mechanics, edited by K. J. Bathe, ElsevierScientific, 2001.

4Haselbacher, A. and Blazek, J., “Accurate and Efficient discretization of Navier-Stokes Equations on Mixed Grids,” AIAA

Journal , Vol. 38, No. 11, November 2000, pp. 2094–2102.5Luo, H., Baum, J. D., and Lohner, R., “Edge-Based Finite Element scheme for the Euler Equations,” AIAA Journal ,

Vol. 32, No. 6, June 1994, pp. 1180–1190.6Eisenstat, S. and Walker, J., “Choosing the forcing terms in an inexact Newton Method,” SIAM Journal of Scientific

Computing , Vol. 17, 1996, pp. 16–32.7Saad, Y. and Schultz, M., “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,”

SIAM Journal of Scientific and Statistical Computing , Vol. 7, 1986, pp. 856–869.8Brown, P. and Saad, Y., “Hybrid Krylov methods for nonlinear systems of equations,” SIAM Journal of Statistical

Computing , Vol. 11, 1990, pp. 450–481.9Knoll, D. and Keyes, D., “Jacobian-Free Newton-Krylov methods: a survey of approaches and applications,” Journal of

Computational Physics, Vol. 193, 2004, pp. 357–397.10Smith, T. M., Hooper, R. W., Ober, C. C., Lorber, A. A., and Shadid, J. N., “Comparison of Operators for Newton-Krylov

Method for Solving Compressible Flows on Unstructured Meshes,” AIAA Paper 2004-0743 , January 2004.11“Trilinos/AztecOO: Object-Oriented Aztec Linear Solver Package,” http://software.sandia.gov/Trilinos/doc

/aztecoo/doc/html/index.html.12“NOX & LOCA: Object-Oriented Nonlinear Solver And Continuation Packages,” http://software.sandia.gov/nox/.13Chen, R. and Wang, Z., “Fast, Block Lower-Upper Symmetric Gauss-Seidel Scheme for Arbitrary Grids,” AIAA Journal ,

Vol. 38, No. 12, 2000, pp. 2238–2244.14Sharov, D., Luo, H., and Baum, J. D., “Implementation of Unstructured Grid GMRES+LU-SGS Method on Shared-

Memory, Cache-Based Parallel Computers,” AIAA Paper 2000-0927 , January 2000.15Luo, H., Baum, J. D., and Lohner, R., “A fast, matrix-free implicit method for compressible flows on unstructured grids,”

Journal of Computational Science, Vol. 146, 1998, pp. 664–690.16Nielsen, E. J., Anderson, W. K., Walters, Robert, W., and Keyes, D. E., “Application of Newton-Krylov Methodology

to a Three-Dimensional Unstructured Euler Code,” AIAA Paper 1995-1733 , 1995.17Peery, K. and Imlay, S., “Blunt-Body Flow Simulations,” AIAA Paper 88-2904 , January 1988.18van Leer, B., “Towards the Ultimate Conservation Difference Scheme III. Upstream-Centered Finite-Difference Schemes

for Ideal Compressible Flow,” Journal of Computational Physics, Vol. 23, 1977, pp. 263–275.19Barth, T., “The Design and Application of Upwind Schemes on Unstructured Meshes,” AIAA Paper 89-0366 , January

1989.20Roe, P. L., “Approximate Riemann Solvers, Parameters Vectors and Difference Schemes,” Journal of Computational

Physics, Vol. 43, 1981, pp. 357–372.21Liou, M.-S. and van Leer, B., “Choice of Implicit and Explicit operators for the upwind differencing method,” AIAA

Paper 1988-0624 , January 1988.22Whitaker, D. L., “Three-Dimensional Unstructured Grid Euler Computations Using a Fully-Implicit, Upwind Method,”

AIAA Paper 93-3337 , January 1993.23Berggren, M., “Discrete adjoint equations and gradients for edge-based discretizations of the Euler equations,” Tech.

rep., Sandia National Laboratories, 2003, Unpublished.24Griewank, A., Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, SIAM, 2000.

14 of 14


Documents

Intelligent Nonlinear Solvers for Computational Fluid Dynamics