
INSTITUTE for MATHEMATICS (Graz University of Technology)

&

INSTITUTE for MATHEMATICS and SCIENTIFIC COMPUTING (University of Graz)

A. Bonfiglioli, B. Carpentieri and M. Sosonkina

A parallel solver for Euler and Navier-Stokes problems on unstructured grids

Report No. 18/2006 September, 2006

Institute for Mathematics D,

Graz University of Technology

Steyrergasse 30

A-8010 Graz, Austria

Institute for Mathematics and

Scientific Computing,

University of Graz

Heinrichstrasse 36

A-8010 Graz, Austria


EulFS: a Parallel CFD Code for the Simulation of Euler and Navier-Stokes Problems on Unstructured Grids

Aldo Bonfiglioli¹, Bruno Carpentieri², and Masha Sosonkina³

¹ Dip.to di Ingegneria e Fisica dell’Ambiente, University of Basilicata, Potenza, Italy, [email protected]

² Karl-Franzens University, Institute of Mathematics and Scientific Computing, Graz, Austria, [email protected]

³ Ames Laboratory/DOE, Iowa State University, Ames, USA, [email protected]

Abstract. We present results with a parallel CFD code that computes steady-state solutions of the Reynolds-Favre averaged Navier-Stokes equations for the simulation of the turbulent motion of compressible and incompressible Newtonian fluids. We describe the solution techniques used for the discretization and the algorithmic details of the implementation, and report on preliminary experiments on 2D and 3D problems, for both internal and external flow configurations.

1 The Physical Problem

The simulation of fluid dynamic problems relies on the solution of the Euler and Navier-Stokes equations, a set of partial differential equations that express the fundamental conservation laws of mass, momentum and energy for continuous flows. Approximate time-dependent values of the conserved variables may be computed by decomposing the computational domain $\Omega \subseteq \mathbb{R}^d$ ($d = 2, 3$) into a finite set of nonoverlapping control volumes $C_i$ and discretizing the conservation laws over each control volume. The discretization results in a system of nonlinear equations that can be solved using Newton’s method. Given a control volume $C_i$, fixed in space and bounded by the control surface $\partial C_i$ with inward normal $\mathbf{n}$, the integral, conservation-law form of the mass, momentum, energy and turbulence transport equations can be concisely written as:

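$$
\int_{C_i} \frac{\partial \mathbf{U}_i}{\partial t}\, dV = \oint_{\partial C_i} \mathbf{n}\cdot\mathbf{F}\, dS - \oint_{\partial C_i} \mathbf{n}\cdot\mathbf{G}\, dS + \int_{C_i} \mathbf{S}\, dV \tag{1}
$$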

where we denote by $\mathbf{U}$ the vector of conserved variables. For compressible flows, we have $\mathbf{U} = (\rho, \rho e, \rho\mathbf{u}, \tilde{\nu})^T$, and for incompressible, constant density flows, $\mathbf{U} = (p, \mathbf{u}, \tilde{\nu})^T$. Throughout this paper, the standard notation is adopted for the kinematic and thermodynamic variables: $\mathbf{u}$ is the flow velocity, $\rho$ is the density, $p$ is the pressure (divided by the constant density in incompressible, homogeneous


flows), $T$ is the temperature, $e$ and $h$ are the specific total energy and enthalpy, respectively, and $\tilde{\nu}$ is a scalar variable related to the turbulent eddy viscosity $\nu_t$ via a damping function. The operators $\mathbf{F}$ and $\mathbf{G}$ represent the inviscid and viscous fluxes, respectively. For compressible flows,

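$$
\mathbf{F} = \begin{pmatrix} \rho\mathbf{u} \\ \rho\mathbf{u}h \\ \rho\mathbf{u}\mathbf{u} + p\mathbf{I} \\ \tilde{\nu}\mathbf{u} \end{pmatrix}, \qquad
\mathbf{G} = \frac{1}{Re_\infty} \begin{pmatrix} 0 \\ \mathbf{u}\cdot\boldsymbol{\tau} + \nabla q \\ \boldsymbol{\tau} \\ \frac{1}{\sigma}\left[(\nu + \tilde{\nu})\nabla\tilde{\nu}\right] \end{pmatrix},
$$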

and for incompressible, constant density flows,

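$$
\mathbf{F} = \begin{pmatrix} a^2\mathbf{u} \\ \mathbf{u}\mathbf{u} + p\mathbf{I} \\ \tilde{\nu}\mathbf{u} \end{pmatrix}, \qquad
\mathbf{G} = \frac{1}{Re_\infty} \begin{pmatrix} 0 \\ \boldsymbol{\tau} \\ \frac{1}{\sigma}\left[(\nu + \tilde{\nu})\nabla\tilde{\nu}\right] \end{pmatrix}.
$$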

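Finally, $\mathbf{S}$ is the source term, which has a non-zero entry only in the row corresponding to the turbulence transport equation:

$$
\mathbf{S} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ c_{b1}\,[1 - f_{t2}]\,\tilde{S}\,\tilde{\nu} + \dfrac{1}{\sigma Re}\, c_{b2}\,(\nabla\tilde{\nu})^2 - \dfrac{1}{Re}\left[c_{w1} f_w - \dfrac{c_{b1}}{\kappa^2} f_{t2}\right]\left(\dfrac{\tilde{\nu}}{d}\right)^2 + Re\, f_{t1}\,\Delta U^2 \end{pmatrix}.
$$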

In the case of high Reynolds number flows, turbulence effects are accounted for by the Reynolds-Favre averaged Navier-Stokes (RANS) equations, which are obtained from the Navier-Stokes (NS) equations by means of a time averaging procedure. The RANS equations have the same structure as the NS equations with an additional term, the Reynolds’ stress tensor, that accounts for the effects of the turbulent scales on the mean field. A closure problem arises, since the Reynolds’ stresses require modeling. Using Boussinesq’s hypothesis as the constitutive law for the Reynolds’ stresses amounts to linking the Reynolds’ stress tensor to the mean velocity gradient through a scalar quantity called the turbulent (or eddy) viscosity. With Boussinesq’s approximation, the RANS equations become formally identical to the NS equations, except for an “effective” viscosity (and an effective thermal conductivity), the sum of the laminar and eddy contributions, which appears in the viscous terms of the equations. In the present work, the turbulent viscosity is modeled using the Spalart-Allmaras one-equation model [7]. Despite the non-negligible degree of empiricism introduced by turbulence modeling, it is recognized that the solution of the RANS equations remains the only feasible approach to perform computationally affordable simulations of problems of engineering interest on a routine basis.
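As a small illustration of this closure, the sketch below evaluates the effective viscosity from the scalar turbulence variable transported by the one-equation model. It assumes the standard damping function published with the Spalart-Allmaras model [7], $f_{v1} = \chi^3 / (\chi^3 + c_{v1}^3)$ with $\chi$ the ratio of the turbulence variable to the laminar viscosity and $c_{v1} = 7.1$. The function names are illustrative and are not taken from EulFS.

```python
C_V1 = 7.1  # Spalart-Allmaras model constant c_v1


def eddy_viscosity(nu_tilde, nu):
    """Turbulent viscosity nu_t = nu_tilde * f_v1(chi), with chi = nu_tilde / nu."""
    if nu_tilde <= 0.0:
        return 0.0  # no turbulent contribution for a non-positive working variable
    chi3 = (nu_tilde / nu) ** 3
    f_v1 = chi3 / (chi3 + C_V1 ** 3)  # damping function, tends to 1 for large chi
    return nu_tilde * f_v1


def effective_viscosity(nu_tilde, nu):
    """Sum of laminar and eddy viscosities, as used in the viscous terms."""
    return nu + eddy_viscosity(nu_tilde, nu)
```

Far from walls, where the turbulence variable is much larger than the laminar viscosity, the damping function approaches one and the eddy viscosity equals the transported variable.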

2 Solution Techniques

The compressible RANS equations are discretized in space using Fluctuation Splitting (or residual distribution) schemes [9]. Introduced in the early eighties by


P.L. Roe, and subsequently developed further by a number of groups worldwide, this class of schemes shares common features with both Finite Element (FE) and Finite Volume (FV) methods. Just as with iso-P1 FE, the dependent variables are stored at the vertices of the mesh, which is made of triangles in two space dimensions (2D) and tetrahedra in three (3D), and are assumed to vary linearly in space. Control volumes ($C_i$) are drawn around each gridpoint by joining (in 2D) the centroids of the surrounding cells with the midpoints of all the edges that connect that gridpoint with its nearest neighbors. An example of these polygonal control volumes (so-called median dual cells) is shown by dashed lines in Fig. 1. Using a FV-type approach, the integral form of the governing equations (1) is discretized over each control volume $C_i$; however, rather than calculating fluxes by numerical quadrature along the boundary $\partial C_i$ of the median dual cell, as would be done with conventional FV schemes, the flux integral is first evaluated over each triangle (tetrahedron) in the mesh and then split among its vertices, see Fig. 1(a). Gridpoint $i$ will then collect fractions $\Phi_i^T$ of the flux balances of all the elements by which it is surrounded, as schematically shown in Fig. 1(b).
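The median dual construction described above has a convenient consequence in 2D: the centroid and edge midpoints cut each triangle into three parts of equal area, so every triangle contributes one third of its area to the dual cell of each of its vertices. A minimal sketch, assuming a mesh given as a list of vertex coordinates and a list of vertex-index triples (both data layouts are illustrative, not the EulFS data structures):

```python
def triangle_area(p0, p1, p2):
    """Area of a triangle from its vertex coordinates (2D cross product)."""
    return 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def median_dual_areas(points, triangles):
    """Area of the median dual cell around every vertex of a triangular mesh."""
    areas = [0.0] * len(points)
    for tri in triangles:
        share = triangle_area(*(points[v] for v in tri)) / 3.0
        for v in tri:
            areas[v] += share  # each vertex receives one third of the cell
    return areas


# Unit square split into two triangles along the diagonal 0-2.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
dual = median_dual_areas(points, triangles)
```

The dual areas always sum to the area of the domain, which is a quick sanity check on a mesh reader.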

Fig. 1. Residual distribution concept. (a) The flux balance of cell $T$ is scattered among its vertices (fractions $\Phi_1$, $\Phi_2$, $\Phi_3$). (b) Gridpoint $i$ gathers the fractions of cell residuals from the surrounding cells.

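This approach leads to a space-discretized form of Eq. (1) that reads:

$$
\int_{C_i} \frac{\partial \mathbf{U}_i}{\partial t}\, dV = \sum_{T \ni i} \Phi_i^T
$$

where

$$
\Phi^T = \oint_{\partial T} \mathbf{n}\cdot\mathbf{F}\, dS - \oint_{\partial T} \mathbf{n}\cdot\mathbf{G}\, dS + \int_{T} \mathbf{S}\, dV
$$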


is the flux balance evaluated over cell $T$ and $\Phi_j^T$ is the fraction of cell residual scattered to its $j$-th vertex. Conservation requires that the sum over the vertices of a given cell $T$ of the split residuals $\Phi_j^T$ equals the total flux balance, i.e., $\sum_{j \in T} \Phi_j^T = \Phi^T$. The properties of the scheme depend upon the criteria used to distribute the cell residual: distributing the convective flux balance along the characteristic directions gives the discretization an upwind flavour, while the distribution of the viscous flux balance can be shown to be equivalent to a standard Galerkin FE discretization. It should also be stressed that, since the dependent variables are continuous across the element interfaces, the Riemann problem model, so commonly used in FV discretizations, is not adopted in the present framework. As a consequence, this rather unusual approach to a FV-type discretization leads to a set of discrete equations that more closely resembles [3] FE Petrov-Galerkin schemes than FV ones. Full details concerning the spatial discretization can be found in [2].
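The scatter step and the conservation constraint can be made concrete on a scalar model problem. The sketch below is not the system scheme used in the code; it assumes linear advection of a scalar $u$ with constant velocity $\mathbf{a}$ on a single triangle and uses the well-known LDA distribution weights $\beta_j = k_j^+ / \sum_l k_l^+$, where $k_j = \tfrac{1}{2}\,\mathbf{a}\cdot\mathbf{n}_j$ and $\mathbf{n}_j$ is the inward normal of the edge opposite vertex $j$, scaled by the edge length. All names are illustrative.

```python
def inward_normals(xy):
    """Inward normal of the edge opposite each vertex, scaled by the edge length."""
    normals = []
    for j in range(3):
        (xa, ya), (xb, yb) = xy[(j + 1) % 3], xy[(j + 2) % 3]
        nx, ny = -(yb - ya), (xb - xa)  # edge vector rotated by 90 degrees
        # Flip the normal if it points away from vertex j.
        if nx * (xy[j][0] - xa) + ny * (xy[j][1] - ya) < 0.0:
            nx, ny = -nx, -ny
        normals.append((nx, ny))
    return normals


def lda_split(xy, u, a):
    """Return the cell flux balance and its three vertex fractions (LDA weights)."""
    k = [0.5 * (a[0] * nx + a[1] * ny) for nx, ny in inward_normals(xy)]
    # Flux balance with inward normals: minus the integral of a . grad(u) over the cell.
    phi = -sum(kj * uj for kj, uj in zip(k, u))
    k_plus = [max(kj, 0.0) for kj in k]  # only downstream vertices receive a share
    total = sum(k_plus)
    if total == 0.0:  # zero advection speed: nothing to distribute
        return phi, [0.0, 0.0, 0.0]
    return phi, [kp / total * phi for kp in k_plus]


# Right triangle, u = x at the vertices, flow in the +x direction.
phi, parts = lda_split([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [0.0, 1.0, 0.0], (1.0, 0.0))
```

Because the weights are non-negative and sum to one, the fractions add up to the cell flux balance, and in this example the whole balance goes to the single downstream vertex.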

The discretization of the governing equations in space leads to the following system of ordinary differential equations:

$$
\mathbf{M}\,\frac{d\mathbf{U}}{dt} = \mathbf{R}(\mathbf{U}) \tag{2}
$$

where $t$ is the physical time variable. In Eq. (2), $\mathbf{M}$ is the mass matrix and $\mathbf{R}(\mathbf{U})$ represents the spatial discretization operator, or nodal residual, which vanishes at steady state. We solve Eq. (2) in pseudo-time until a stationary solution is reached. Since we are interested in steady state solutions, the mass matrix can be replaced by a diagonal matrix $\mathbf{V}$, whose entries are the volumes (areas in 2D) of the median dual cells. The residual vector $\mathbf{R}(\mathbf{U})$ is a (block) array of dimension equal to the number of meshpoints times the number of dependent variables, $m$; for compressible flows and a one-equation turbulence model $m = d + 3$, where $d$ is the spatial dimension. The $i$-th entry of $\mathbf{R}(\mathbf{U})$ represents the discretized equation of the conservation laws for meshpoint $i$:

$$
\mathbf{R}_i = \sum_{T \ni i} \Phi_i^T = \sum_{j \in N_i} (\mathbf{C}_{ij} - \mathbf{D}_{ij})\, \mathbf{U}_j . \tag{3}
$$

In Eq. (3) the second summation ranges over the set $N_i$ of nodes surrounding (and including) meshpoint $i$, and the matrices $\mathbf{C}_{ij}$ and $\mathbf{D}_{ij}$, respectively, account for the contribution of the inviscid and viscous terms in the governing equations; their detailed form is given in [2]. If the time derivative in Eq. (2) is approximated using a two-point, one-sided finite difference formula:

$$
\frac{d\mathbf{U}}{dt} = \frac{\mathbf{U}^{n+1} - \mathbf{U}^{n}}{\Delta t},
$$

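then an explicit scheme is obtained by evaluating $\mathbf{R}(\mathbf{U})$ at time level $n$ and an implicit scheme if $\mathbf{R}(\mathbf{U})$ is evaluated at time level $n + 1$. In this latter case, linearizing $\mathbf{R}(\mathbf{U})$ about time level $n$, i.e.

$$
\mathbf{R}(\mathbf{U}^{n+1}) = \mathbf{R}(\mathbf{U}^{n}) + \left(\frac{\partial \mathbf{R}}{\partial \mathbf{U}}\right)^{n} \left(\mathbf{U}^{n+1} - \mathbf{U}^{n}\right) + \text{H.O.T.}
$$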


we obtain the following implicit scheme:

$$
\left(\frac{1}{\Delta t^{n}}\,\mathbf{V} - \mathbf{J}\right) \left(\mathbf{U}^{n+1} - \mathbf{U}^{n}\right) = \mathbf{R}(\mathbf{U}^{n}), \tag{4}
$$

where we denote by $\mathbf{J}$ the Jacobian of the residual, $\partial \mathbf{R} / \partial \mathbf{U}$. Eq. (4) represents a large nonsymmetric (though structurally symmetric) sparse linear system of equations to be solved at each pseudo-time step for the update of the vector of the conserved variables. Due to the compact stencil $N_i$ of the schemes, the sparsity pattern of the Jacobian matrix coincides with the graph of the underlying unstructured mesh, i.e. it involves only one-level neighbours. On average, the number of non-zero (block) entries per row equals 7 in 2D and 14 in 3D.
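For readers who want the intermediate step, the implicit scheme follows by inserting the linearized residual into the backward-difference form of Eq. (2), with the mass matrix replaced by the diagonal matrix of dual-cell volumes and the higher-order terms dropped. The last line shows the limit of an infinite time step, in which one step of the scheme coincides with one Newton iteration for the steady problem:

```latex
\begin{aligned}
\mathbf{V}\,\frac{\mathbf{U}^{n+1}-\mathbf{U}^{n}}{\Delta t^{n}}
  &= \mathbf{R}(\mathbf{U}^{n+1})
  \approx \mathbf{R}(\mathbf{U}^{n}) + \mathbf{J}\,\bigl(\mathbf{U}^{n+1}-\mathbf{U}^{n}\bigr) \\
\Longrightarrow\quad
\Bigl(\frac{1}{\Delta t^{n}}\,\mathbf{V}-\mathbf{J}\Bigr)\bigl(\mathbf{U}^{n+1}-\mathbf{U}^{n}\bigr)
  &= \mathbf{R}(\mathbf{U}^{n}) \\
\Delta t^{n}\to\infty:\qquad
-\mathbf{J}\,\bigl(\mathbf{U}^{n+1}-\mathbf{U}^{n}\bigr)
  &= \mathbf{R}(\mathbf{U}^{n})
\end{aligned}
```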

3 EulFS

We present experiments with an academic code [2] that simulates the turbulent motion of compressible and incompressible Newtonian fluids on 2D and 3D computational domains. The code computes the steady-state solution of the RANS equations for both inviscid (Euler’s equations) and viscous (Navier-Stokes equations) fluids, for internal and external flow configurations using unstructured grids. The numerical schemes adopted in the package are fluctuation splitting for the spatial discretization of the conservation equations and pseudo-time stepping for the time discretization. The analytical evaluation of the Jacobian matrix, though not impossible, is rather cumbersome [4], so two alternatives have been adopted in the present implementation: one is based on an analytically calculated, but approximate Jacobian; the other uses a numerical approximation of the “true” Jacobian, obtained using one-sided finite difference formulae. In both cases, the individual entries of the Jacobian matrix are computed and stored in memory.

The approximate (or Picard) linearization amounts to computing a given Jacobian entry as follows:

$$
\mathbf{J}_{ij} \approx \mathbf{C}_{ij} - \mathbf{D}_{ij},
$$

i.e. it neglects the dependence of the convective and diffusive matrices upon the solution, see Eq. (3).

In the finite difference (FD) approximation, the individual entries of the vector of nodal unknowns are perturbed by a small amount $\epsilon$ and the nodal residual is then recomputed for the perturbed state $\hat{\mathbf{U}}_j$. The Jacobian entries are then approximated using one-sided FD formulae:

$$
\mathbf{J}_{ij} \approx \frac{1}{\epsilon}\left[\mathbf{R}_i(\hat{\mathbf{U}}_j) - \mathbf{R}_i(\mathbf{U}_j)\right].
$$
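The one-sided formula above can be sketched on a small dense problem. This toy version perturbs one unknown at a time and recomputes the whole residual, which is affordable only for a handful of unknowns; it is not the cell-by-cell assembly of [4]. The residual function, the perturbation scaling and all names are assumptions made for the example.

```python
def fd_jacobian(residual, u, eps=1.0e-7):
    """Approximate J[i][j] = dR_i/dU_j with one-sided finite differences."""
    r0 = residual(u)
    n = len(u)
    jac = [[0.0] * n for _ in range(len(r0))]
    for j in range(n):
        step = eps * max(1.0, abs(u[j]))  # perturbation scaled with the unknown
        perturbed = list(u)
        perturbed[j] += step
        r1 = residual(perturbed)
        for i in range(len(r0)):
            jac[i][j] = (r1[i] - r0[i]) / step
    return jac


def toy_residual(u):
    """Hypothetical two-unknown nonlinear residual, used only for illustration."""
    return [2.0 - u[0] - u[0] ** 3, u[0] - u[1] - u[1] ** 3]


J = fd_jacobian(toy_residual, [1.0, 0.5])
```

For this residual the exact Jacobian at (1, 0.5) is [[-4, 0], [1, -1.75]], and the one-sided approximation agrees with it to roughly the size of the perturbation.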

Although the Jacobian matrix can be assembled using a single loop over all cells [4], its evaluation is computationally expensive, as it requires $(d+1) \times m$ residual evaluations. Therefore, the use of the FD approximation to the true Jacobian matrix pays off only if a considerable reduction in the number of iterations can


be achieved over simpler iterative schemes. This can be obtained by exploiting the quadratic convergence of Newton’s rootfinding method, which is recovered from Eq. (4) as the timestep approaches infinity. However, the error reduction in Newton’s method is quadratic only if the initial guess is within a sufficiently small neighborhood of the steady state. This condition is certainly not met if the numerical simulation has to be started from “scratch”, so in practice the following two-step approach is adopted. In the early stages of the calculation, the turbulent transport equation is solved in tandem with the mean flow equations: the mean flow solution is advanced over a single time step using an approximate (Picard) Jacobian while keeping the turbulent viscosity frozen, then the turbulent variable is advanced over one or more pseudo-time steps using a FD Jacobian with frozen mean flow variables. Due to the uncoupling between the mean flow and turbulent transport equations, this procedure will eventually converge to steady state, but never yields quadratic convergence. Therefore, once the solution has come close to steady state, a true Newton strategy is adopted: the mean flow and turbulence transport equations are solved in fully coupled form and the Jacobian is computed by FD. Also, the size of the time-step needs to be rapidly increased to recover Newton’s algorithm and, according to the Switched Evolution Relaxation (SER) strategy proposed in [5], this is accomplished by letting $\Delta t^n$ in Eq. (4) vary with the $L_2$ norm of the residual at the initial and current time-step as:

$$
\Delta t^{n} = \Delta t\, \frac{\|\mathbf{R}(\mathbf{U}^{0})\|_2}{\|\mathbf{R}(\mathbf{U}^{n})\|_2},
$$

where $\Delta t$ is the time-step computed using the stability criterion of the explicit time integration scheme.
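How the SER rule turns the implicit scheme of Eq. (4) into Newton's method near convergence can be seen on a toy problem. The sketch below assumes a hypothetical two-unknown residual with an analytic Jacobian, unit dual volumes, and a small dense solver in place of the preconditioned Krylov solver; none of these choices come from the code described here.

```python
import math


def solve_dense(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


def residual(u):  # hypothetical R(U); vanishes at steady state
    return [2.0 - u[0] - u[0] ** 3, u[0] - u[1] - u[1] ** 3]


def jacobian(u):  # analytic dR/dU of the toy residual
    return [[-1.0 - 3.0 * u[0] ** 2, 0.0], [1.0, -1.0 - 3.0 * u[1] ** 2]]


def ser_pseudo_time(u, volumes, dt0, tol=1.0e-12, max_steps=50):
    """March (V/dt - J) dU = R(U) in pseudo-time with an SER time step."""
    r = residual(u)
    r0_norm = norm2(r)
    history = [r0_norm]
    for _ in range(max_steps):
        if history[-1] <= tol:
            break
        dt = dt0 * r0_norm / history[-1]  # SER: dt grows as the residual drops
        jac = jacobian(u)
        n = len(u)
        lhs = [[(volumes[i] / dt if i == j else 0.0) - jac[i][j]
                for j in range(n)] for i in range(n)]
        delta = solve_dense(lhs, r)
        u = [ui + di for ui, di in zip(u, delta)]
        r = residual(u)
        history.append(norm2(r))
    return u, history


u_steady, history = ser_pseudo_time([0.0, 0.0], volumes=[1.0, 1.0], dt0=1.0)
```

Printing `history` shows the residual falling slowly while the time step is small and much faster once the time step has grown, which is the behaviour the two-stage strategy relies on.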

The code is implemented using the PETSc library [1] and has been ported to different parallel computer architectures (SGI/Cray T3E-900, SUN E3500, DEC Alpha, SUN Solaris and IBM RS6000, Linux Beowulf cluster) using the MPI standard for the message-passing communications. The simulations presented in this study are run on a Linux Beowulf cluster.

Experiments on the RAE Problem

The first test case that we consider is the compressible, subsonic flow past the two-dimensional RAE2822 profile. Free-stream conditions are as follows: Mach number $M_\infty = 0.676$, Reynolds’ number based on chord $Re_C = 5.7 \cdot 10^6$, angle of attack $\alpha_\infty = 2.40^\circ$. The computational mesh, which is shown in Fig. 2(a), is made of 10599 meshpoints and 20926 triangles. The simulation is started from uniform flow and the solution is advanced in pseudo-time using the approximate linearization. Once the $L_2$ norm of the residual has been reduced below a pre-set threshold, the fully coupled approach is put in place. The convergence history towards steady state is shown in Fig. 2(b): only seven Newton iterations are required to reduce the $L_2$ norm of the residuals (mass, energy, x and y momentum) to machine zero. For the inner linear solver, we use GMRES(30) [6] preconditioned by an incomplete LU factorization with pattern selection strategy


Fig. 2. Experiments on the RAE2822 airfoil. (a) Computational mesh for the RAE2822 aerofoil. (b) $L_2$ residual norms versus solution time.

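based on the level of fill. The set $F$ of fill-in entries to be kept for the approximate lower triangular factor $L$ is given by

$$
F = \{ (k, i) \mid \mathrm{lev}(l_{k,i}) \le \ell \},
$$

where the integer $\ell$ denotes a user-specified maximal fill-in level. The level $\mathrm{lev}(l_{k,i})$ of the coefficient $l_{k,i}$ of $L$ is defined as follows:

Initialization:

$$
\mathrm{lev}(l_{k,i}) = \begin{cases} 0 & \text{if } l_{k,i} \neq 0 \text{ or } k = i, \\ \infty & \text{otherwise.} \end{cases}
$$

Factorization:

$$
\mathrm{lev}(l_{k,i}) = \min \{ \mathrm{lev}(l_{k,i}),\ \mathrm{lev}(l_{i,j}) + \mathrm{lev}(l_{k,j}) + 1 \}.
$$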

The resulting preconditioner is denoted by $ILU(\ell)$. A similar strategy is adopted for preserving sparsity in the upper triangular factor $U$. We take $x_0 = 0$ as the initial guess for GMRES and stop the iteration once the initial residual has been reduced by a factor of $10^{-5}$, so that the criterion can be related to a norm-wise backward error. In Tables 1 and 2, we report the number of iterations and the solution time (in seconds for a single-processor execution), respectively, obtained with $ILU(\ell)$ for levels of fill-in from zero to four. Note that the results in Fig. 2(b) are for $ILU(2)$. It may be observed from Table 1 that the $ILU(\ell)$ variants with a non-zero fill level lead to more uniform iteration counts for the linear systems solved in each Newton step.
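The level-of-fill rule defined above can be sketched as a purely symbolic pass that only decides which positions are kept. The version below uses a standard row-wise ordering of the same level update and dense level arrays, which is adequate for a small illustration but not for a real sparse matrix; the function name and the set-of-pairs input format are assumptions made for the example.

```python
import math


def ilu_level_pattern(pattern, n, max_level):
    """Return the set of (row, col) positions kept by a level-of-fill ILU.

    pattern   : set of (row, col) positions of the nonzeros of the matrix
    n         : matrix dimension
    max_level : maximal fill-in level allowed
    """
    inf = math.inf
    # Initialization: level 0 for original nonzeros and the diagonal.
    lev = [[0 if (i == j or (i, j) in pattern) else inf for j in range(n)]
           for i in range(n)]
    for i in range(1, n):
        for k in range(i):  # eliminate with the previous rows, in order
            if lev[i][k] > max_level:
                continue  # this entry is dropped, so it creates no fill
            for j in range(k + 1, n):
                candidate = lev[i][k] + lev[k][j] + 1
                if candidate < lev[i][j]:
                    lev[i][j] = candidate
        for j in range(n):  # drop everything above the allowed level
            if lev[i][j] > max_level:
                lev[i][j] = inf
    return {(i, j) for i in range(n) for j in range(n) if lev[i][j] <= max_level}


# "Arrow" pattern: full first row and column plus the diagonal.
arrow = {(0, j) for j in range(1, 4)} | {(j, 0) for j in range(1, 4)}
kept_level0 = ilu_level_pattern(arrow, 4, 0)
kept_level1 = ilu_level_pattern(arrow, 4, 1)
```

For the arrow pattern, level 0 keeps only the original 10 positions, while level 1 already admits every position of the 4 by 4 matrix, since eliminating the first unknown couples all the others.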

A blocking strategy is incorporated in the construction of the preconditioner to exploit the inherent block structure of the coefficient matrix of the linear system. Block row operations replace standard row operations and diagonal


blocks are inverted instead of the diagonal entries. The block size is equal to five, that is, the number of fluid dynamic variables at each computational node.

We observe that the use of block storage schemes, combined with the optimized routines available in PETSc for inverting the small diagonal blocks, saves both memory and setup costs. For this small problem, blocking of the physical variables reduces the solution time of each linear system by a factor of three. Finally, the curves of the pressure distribution along the airfoil, reported in Fig. 3(a), show very good agreement between the simulation and the experimental data.

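Table 1. Iterations for solving the RAE problem using GMRES(30) preconditioned by block $ILU(\ell)$

| Newton iter | ILU(0) | ILU(1) | ILU(2) | ILU(3) | ILU(4) |
|---|---|---|---|---|---|
| 1 | 56 | 35 | 22 | 19 | 17 |
| 2 | 115 | 55 | 29 | 23 | 21 |
| 3 | 111 | 60 | 44 | 27 | 23 |
| 4 | 85 | 53 | 32 | 22 | 20 |
| 5 | 56 | 43 | 23 | 19 | 16 |
| 6 | 116 | 55 | 36 | 24 | 19 |
| 7 | 82 | 35 | 25 | 20 | 17 |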

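Table 2. CPU time (in seconds) for solving the RAE problem using GMRES(30) preconditioned by block $ILU(\ell)$

| Newton iter | ILU(0) | ILU(1) | ILU(2) | ILU(3) | ILU(4) |
|---|---|---|---|---|---|
| 1 | 1.7 | 1.2 | 0.9 | 0.9 | 1.0 |
| 2 | 3.3 | 1.7 | 1.1 | 1.1 | 1.1 |
| 3 | 3.2 | 1.9 | 1.6 | 1.2 | 1.2 |
| 4 | 2.5 | 1.7 | 1.2 | 1.0 | 1.1 |
| 5 | 1.6 | 1.4 | 0.9 | 0.9 | 0.9 |
| 6 | 3.4 | 1.7 | 1.4 | 1.1 | 1.0 |
| 7 | 2.4 | 1.2 | 1.0 | 0.9 | 0.9 |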
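To compare the fill levels at a glance, the per-step figures of Tables 1 and 2 can be summed over the seven Newton steps. The short script below does only that arithmetic; the numbers are copied from the two tables and the variable names are illustrative.

```python
# Rows: Newton steps 1..7; columns: fill levels 0..4 (values from Table 1).
iterations = [
    [56, 35, 22, 19, 17],
    [115, 55, 29, 23, 21],
    [111, 60, 44, 27, 23],
    [85, 53, 32, 22, 20],
    [56, 43, 23, 19, 16],
    [116, 55, 36, 24, 19],
    [82, 35, 25, 20, 17],
]
# Same layout, solution time in seconds (values from Table 2).
seconds = [
    [1.7, 1.2, 0.9, 0.9, 1.0],
    [3.3, 1.7, 1.1, 1.1, 1.1],
    [3.2, 1.9, 1.6, 1.2, 1.2],
    [2.5, 1.7, 1.2, 1.0, 1.1],
    [1.6, 1.4, 0.9, 0.9, 0.9],
    [3.4, 1.7, 1.4, 1.1, 1.0],
    [2.4, 1.2, 1.0, 0.9, 0.9],
]


def column_totals(table):
    """Sum each column of a row-major table, rounded to one decimal."""
    return [round(sum(col), 1) for col in zip(*table)]


total_iterations = column_totals(iterations)
total_seconds = column_totals(seconds)
for level, (its, secs) in enumerate(zip(total_iterations, total_seconds)):
    print(f"ILU({level}): {its} iterations, {secs:.1f} s")
```

The totals are 621, 336, 211, 154 and 133 iterations and 18.1, 10.8, 8.1, 7.1 and 7.2 seconds for fill levels 0 to 4: the iteration count keeps dropping with the fill level, whereas the time gain flattens beyond level 2.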

Experiments on the Stanitz Elbow

The three-dimensional test case that we have examined deals with the internal compressible flow through the so-called Stanitz elbow. The simulation reproduces experiments [8] conducted in the early 1950s at the National Advisory Committee for Aeronautics (NACA), presently NASA, to study secondary flows in an accelerating, rectangular elbow with 90° of turning. The chosen flow conditions correspond to a Mach number in the outlet section of 0.68 and a Reynolds’ number of $4.3 \cdot 10^5$. Figure 3(b) shows the geometry along with the computed static pressure contours. The computational mesh consists of 156065 meshpoints and 884736 tetrahedral cells. The simulation has been run on 16 processors of a Linux Beowulf cluster. Figures 4(a)–4(b) show the convergence history of the iterative solution; we report on experiments with an Additive Schwarz preconditioner (see, e.g. [6]) with overlap=2 for GMRES, where the diagonal blocks are approximately inverted using $ILU(1)$. It can be seen that the decoupled strategy, which integrates the mean flow variables using Picard linearization and the turbulent transport equation using Newton linearization, is not robust enough to obtain residual convergence to machine epsilon. Thus it needs to be accelerated


Fig. 3. The RAE and “Stanitz” elbow problems. (a) RAE problem: comparison with experimental data (RAE 2822 case 1, pressure coefficient $-C_p$ versus $x/C$ along the airfoil; experimental data and LDA2 scheme). (b) Stanitz elbow geometry and computed static pressure contours.

using a true Newton algorithm close to the steady state. The remaining difficulty is to determine an effective strategy for switching from one integration scheme to the other. In the present implementation, we adopt a criterion based on a user-defined tolerance for the residual reduction and a maximum number of (Picard) iterations. Although possibly not optimal, this strategy proves to be fairly robust on the reported experiments. In Table 3, we compare the number of iterations and the solution time (in seconds) with those obtained with a block Jacobi preconditioner. For the Additive Schwarz method, we performed tests with different values of overlap, observing very similar results.

Fig. 4. Experiments on the Stanitz elbow (3D flow through the Stanitz elbow; $L_2$ norm of the residuals for mass, energy, x-, y- and z-momentum). (a) Convergence history for Picard iterations using ASM(2)+ILU(1), versus CPU minutes, with the Picard and Newton stages marked. (b) Convergence history for Newton iterations using ASM(2)+ILU(1), versus CPU seconds, including the turbulence variable.


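Table 3. Solution cost for solving the Stanitz problem using GMRES(30)

| Newton iter | BJ+ILU(1): Iter | BJ+ILU(1): CPU time (s) | ASM(2)+ILU(1): Iter | ASM(2)+ILU(1): CPU time (s) |
|---|---|---|---|---|
| 1 | 72 | 10.1 | 49 | 8.8 |
| 2 | 121 | 16.7 | 76 | 12.3 |
| 3 | 147 | 20.0 | 98 | 15.8 |
| 4 | 175 | 23.5 | 112 | 17.8 |
| 5 | 200 | 26.9 | 127 | 20.1 |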

Although the solution of the RANS equations may require much less computational effort than other simulation techniques like LES (Large Eddy Simulation) and DNS (Direct Numerical Simulation), severe numerical difficulties may arise when the mean flow and turbulence transport equations are solved in fully coupled form, the full Jacobian is computed by means of FD, and the size of the time-step is rapidly increased to recover Newton’s algorithm. Indeed, successful experiments on 3D unstructured problems are not numerous in the literature. The code is still in a development stage, but the numerical results are encouraging. Directions for future research include enhancing both the performance and the robustness of the code on more difficult configurations, and designing a multilevel incomplete LU factorization preconditioner for the inner linear system that preserves the inherent block structure of the coefficient matrix and scales well with the number of unknowns.

References

1. Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry F. Smith. PETSc home page. http://www.mcs.anl.gov/petsc, 1998.

2. A. Bonfiglioli. Fluctuation splitting schemes for the compressible and incompressible Euler and Navier-Stokes equations. IJCFD, 14:21–39, 2000.

3. J.-C. Carette, H. Deconinck, H. Paillere, and P.L. Roe. Multidimensional upwinding: its relation to finite elements. International Journal for Numerical Methods in Fluids, 20:935–955, 1995.

4. E. Issman. Implicit Solution Strategies for Compressible Flow Equations on Unstructured Meshes. PhD thesis, Universite Libre de Bruxelles, 1997.

5. W. Mulder and B. van Leer. Experiments with an Implicit Upwind Method for the Euler Equations. Journal of Computational Physics, 59:232–246, 1985.

6. Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.

7. P.R. Spalart and S.R. Allmaras. A one-equation turbulence model for aerodynamic flows. La Recherche-Aerospatiale, 1:5–21, 1994.

8. John D. Stanitz, Walter M. Osborn, and John Mizisin. An experimental investigation of secondary flow in an accelerating, rectangular elbow with 90° of turning. Technical Note 3015, NACA, 1953.

9. E. van der Weide, H. Deconinck, E. Issman, and G. Degrez. A parallel, implicit, multi-dimensional upwind, residual distribution method for the Navier-Stokes equations on unstructured grids. Computational Mechanics, 23:199–208, 1999.