COSA
Compressible Finite Volume
Parallel Multiblock Multigrid
Navier-Stokes Solver
Dr M. Sergio Campobasso, Jernej Drofelnik
University of Lancaster, Lancaster LA1 4YR, UK
University of Glasgow, Glasgow G12 8QQ, UK
ASEArch study group meeting, 25th-26th April 2012
Outline
• Steady and Time-Domain (TD) Navier-Stokes (NS) equations
– Space discretization
– Numerical integration
• Harmonic Balance (HB) NS equations and their numerical integration
• Code architecture
• Parallelization
• Sample results
Time-Domain Navier-Stokes equations
• Arbitrary Lagrangian-Eulerian (ALE) form of NS equations:
$$\frac{\partial}{\partial t}\left(\int_{C(t)} U\,dV\right) + \oint_{\partial C(t)} \left(\Phi_c - \Phi_d\right)\cdot dS = 0$$

$$U = [\rho\;\;\rho u\;\;\rho v\;\;\rho\varepsilon]^T,\qquad \varepsilon = e + \frac{u^2+v^2}{2},\qquad H = \varepsilon + \frac{p}{\rho}$$

$$\Phi_c = E_c\,i + F_c\,j - v_b\,U,\qquad \Phi_d = E_d\,i + F_d\,j$$

$$E_c = [\rho u\;\;\rho u^2+p\;\;\rho uv\;\;\rho uH]^T,\qquad E_d = [0\;\;\tau_{xx}\;\;\tau_{xy}\;\;u\tau_{xx}+v\tau_{xy}-q_x]^T$$

$$F_c = [\rho v\;\;\rho uv\;\;\rho v^2+p\;\;\rho vH]^T,\qquad F_d = [0\;\;\tau_{xy}\;\;\tau_{yy}\;\;u\tau_{xy}+v\tau_{yy}-q_y]^T$$

$$\tau = 2\mu\left[s - \tfrac{1}{3}(\nabla\cdot v)\,I\right],\qquad s = \tfrac{1}{2}\left(\nabla v + \nabla^T v\right),\qquad q = -k\nabla T$$
• k − ω Shear Stress Transport (SST) for turbulence closure
Space discretization of steady equations
• Convective fluxes discretized with Roe’s flux difference splitting (Roe 1981) and Van Leer’s second/third order MUSCL extrapolations (Van Leer 1974); see the MUSCL sketch below.
$$\Phi^{*}_{i,f} = \frac{1}{2}\left[\Phi_{i,f}(U_L) + \Phi_{i,f}(U_R) - \left|\frac{\partial\Phi_{i,f}}{\partial U}\right|\delta U\right]$$

with Φ*_i,f being the numerical approximation to the continuous flux component Φ_i,f = Φ_i · n at a volume face along the face normal n.
• Diffusive fluxes discretized with second order central differences.
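A minimal sketch (not COSA source) of one common κ-scheme form of the MUSCL extrapolation named above, for a single variable in one grid direction and without a flux limiter; the routine name and the storage of the face i+1/2 states at index i are assumptions, and kappa = 1/3 gives the third-order variant.

c     kappa-scheme MUSCL extrapolation of left/right states at the
c     cell faces of one grid line (illustrative sketch, no limiter)
      subroutine muscl(imax, kappa, u, ul, ur)
      implicit none
      integer imax, i
      double precision kappa, u(imax), ul(imax), ur(imax)
      double precision dum, dup, duq
      do i = 2, imax-2
c       solution differences around face i+1/2
        dum = u(i)   - u(i-1)
        dup = u(i+1) - u(i)
        duq = u(i+2) - u(i+1)
c       left/right states at face i+1/2, stored at index i
        ul(i) = u(i)   + 0.25d0*((1.d0-kappa)*dum + (1.d0+kappa)*dup)
        ur(i) = u(i+1) - 0.25d0*((1.d0-kappa)*duq + (1.d0+kappa)*dup)
      end do
      end

The extrapolated states UL and UR are the inputs of the Roe flux formula above.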
Space discretization and integration of steady equations
• Scheme stencil of NS equations
– 2D: 13 points
– 3D: 25 points
• Explicit numerical integration based on a 4-stage Runge-Kutta (RK) scheme, with convergence acceleration by means of local time-stepping (LTS), centered variable-coefficient implicit residual smoothing (IRS), and multigrid (MG).
Time integration of space discretized TD equations
• After space discretization, one has to solve the system of ODEs:
$$V\frac{dQ}{dt} + R_\Phi(Q) = 0$$
• Dual-time stepping: implicit second-order discretization of dQ/dt to march in physical time t:
$$R_g\!\left(Q^{n+1}\right) = \frac{3Q^{n+1} - 4Q^{n} + Q^{n-1}}{2\Delta t}\,V + R_\Phi\!\left(Q^{n+1}\right) = 0$$
and RK pseudo-time marching with LTS/IRS/MG to obtain the solution at each physical time:
$$V\left(\frac{dQ}{d\tau}\right)^{n+1} + R_g\!\left(Q^{n+1}\right) = 0$$
MG integration of TD equations (for a given physical time)
• Using Jameson’s RK scheme (Jameson 1981); a code sketch of the resulting update loop is given below:
$$W^0 = Q^{n+1,l}$$

$$W^k = W^0 - \frac{\alpha_k\,\Delta\tau}{V}\,L_{irs}\left[R_g - f_{MG}\right]\!\left(W^{k-1}\right),\qquad k = 1,\ldots,N_S$$

$$Q^{n+1,l+1} = W^{N_S}$$
• For each block, W, Rg and fMG have length (imax × jmax) with structure
$$W = [W_1\;\;W_2\;\;\cdots\;\;W_{imax\times jmax}]^T$$
and each subarray has length npde.
• Application of low-speed preconditioning and RK stabilization reported in Campobasso and Baba-Ahmadi, GT2011-45303
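A minimal sketch, under assumed names (a resid routine returning Rg − fMG, stage arrays w0 and w) and omitting the IRS operator and low-speed preconditioning, of the pseudo-time update loop sketched above for one block:

c     Jameson RK pseudo-time update for one block of ncell = imax*jmax
c     cells and npde unknowns (illustrative sketch, not COSA source);
c     resid is assumed to return Rg - fMG evaluated at the current w
      subroutine rkstep(ncell, npde, ns, alpha, dtau, vol, w0, w, rg)
      implicit none
      integer ncell, npde, ns, k, ic, ip
      double precision alpha(ns), dtau(ncell), vol(ncell)
      double precision w0(npde,ncell), w(npde,ncell), rg(npde,ncell)
c     W^0 = Q(n+1,l)
      do ic = 1, ncell
        do ip = 1, npde
          w(ip,ic) = w0(ip,ic)
        end do
      end do
c     stages: W^k = W^0 - alpha_k dtau/V [Rg - fMG](W^(k-1))
      do k = 1, ns
        call resid(ncell, npde, w, rg)
        do ic = 1, ncell
          do ip = 1, npde
            w(ip,ic) = w0(ip,ic) - alpha(k)*dtau(ic)/vol(ic)*rg(ip,ic)
          end do
        end do
      end do
      end

After the last stage w holds Q(n+1,l+1), i.e. the next iterate of the solution at the current physical time.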
Harmonic Balance Navier-Stokes equations

• Arbitrary Lagrangian-Eulerian (ALE) form of the Navier-Stokes equations:
$$\omega D\int_{C(t)} U_H\,dV_H + \oint_{\partial C(t)}\left(\Phi_{c,H} - \Phi_{d,H}\right)\cdot dS_H = 0$$
where UH, Φc,H and Φd,H ∈ R^(npde×(2NH+1)):
$$U_H = \left[U(t_0)\;\;U(t_1)\;\;U(t_2)\;\;\cdots\;\;U(t_{2N_H})\right]^T$$
$$\Phi_{c,H} = \left[\Phi_c(t_0)\;\;\Phi_c(t_1)\;\;\Phi_c(t_2)\;\;\cdots\;\;\Phi_c(t_{2N_H})\right]^T$$
$$\Phi_{d,H} = \left[\Phi_d(t_0)\;\;\Phi_d(t_1)\;\;\Phi_d(t_2)\;\;\cdots\;\;\Phi_d(t_{2N_H})\right]^T$$
NH is the user-specified number of complex harmonics and D is a block antisymmetric matrix of size (2NH + 1) × (2NH + 1).
• k − ω Shear Stress Transport (SST) for turbulence closure
MG integration of HB equations
• Integration based on the same RK/LTS/IRS/MG approach used for the steady problem:

$$\left(\frac{dQ_H}{d\tau}\right)V + R_{g,H}(Q_H) = 0\qquad\text{where}\qquad R_{g,H}(Q_H) = \omega\,Q_H V D + R_{\Phi,H}$$
• Update step reads:
$$W_H^{k} = W_H^{0} - \alpha_k\,\Delta\tau\,V^{-1}L_{irs}\left[R_{g,H}\!\left(W_H^{k-1}\right) + f_{MG,H}\right]$$
• For each block, WH, Rg,H and fMG,H have length (imax × jmax) with structure
$$W_H = [W_{H,1}\;\;W_{H,2}\;\;\cdots\;\;W_{H,imax\times jmax}]^T$$
and each subarray has length npde × (2NH + 1).
• Low-speed preconditioning (LSP) implementation and RK stabilization reported in GT2011-45303
Code architecture

• The majority of the code is written in FORTRAN 77
• Finite Volume cell-centered structured multi-block grids
– Adjacent block connectivity via two rows of halo cells
– No hanging nodes
• Steady and TD:
– W̃ is defined for each multigrid level and has dimension W̃(N), where
$$N = npde\times\sum_{iblock=1}^{nblock} imax(iblock)\times jmax(iblock)\times kmax(iblock)$$
$$\tilde{W} = [W_1, W_2, \ldots, W_{iblock}, \ldots, W_{nblock}]$$
$$W_{iblock}\big(imax(iblock),\,jmax(iblock),\,kmax(iblock),\,npde\big)$$
Code architecture (cont’d)
• HB:
– W̃N is defined for each multigrid level and has dimension W̃N(N), where
$$N = npde\times nharms\times\sum_{iblock=1}^{nblock} imax(iblock)\times jmax(iblock)\times kmax(iblock)$$
$$\tilde{W}_N = [W_1, W_2, \ldots, W_{iblock}, \ldots, W_{nblock}]$$
$$W_{iblock}\big(imax(iblock),\,jmax(iblock),\,kmax(iblock),\,npde,\,nharms\big)$$
• Memory of W̃N allocated dynamically for the entire grid level
• Integer offsets used to move from block to block on each grid level (see the sketch below)
• Pointers used to move from one grid level to another
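A minimal sketch, with assumed (and purely illustrative) names and block sizes, of the offset bookkeeping described above: the length of the flat grid-level array and the starting index of each block are computed once, and block routines then work on the slice beginning at off(iblock).

c     block-offset bookkeeping for one grid level (illustrative sketch,
c     not COSA source); off(ib) is the starting index of block ib in
c     the flat grid-level array
      program offsets
      implicit none
      integer nblock, npde, nharms
      parameter (nblock = 3, npde = 6, nharms = 3)
      integer imax(nblock), jmax(nblock), kmax(nblock)
      integer off(nblock), ntot, ib
      data imax /48, 48, 24/, jmax /48, 48, 60/, kmax /1, 1, 1/
      ntot = 0
      do ib = 1, nblock
        off(ib) = ntot + 1
        ntot = ntot + npde*nharms*imax(ib)*jmax(ib)*kmax(ib)
      end do
c     ntot is the total length N of the flat array, allocated once per
c     grid level; a single-block routine receives the slice starting
c     at off(ib) and re-declares it with the block multi-dimensional
c     shape (imax(ib), jmax(ib), kmax(ib), npde, nharms)
      write(*,*) 'flat array length for this level:', ntot
      end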
Code architecture (cont’d)

• Parent/child structure. For each grid level nl:
c     grid-level ("parent") routine: retrieves the pointer to the flow
c     data of grid level nl, then calls the multi-block routine
      subroutine smooth(nl)
      ...
      pq=p_q(nl)
      ...
      call roflux(q, ...)
      ...
      end

c     multi-block routine: loops over the blocks of the level; the
c     integer offset selects each block's slice of the flat array
      subroutine roflux(q, ...)
      ...
      do iblock = 1,nblock
        iq=offset(iblock)
        call roflux_b(q(iq), ...)
      end do
      end

c     single-block ("child") routine: performs the block operations
      subroutine roflux_b(q, ...)
      block operations
      end
Steady/TD versus HB code structure

• HB equations can be viewed as a system of 2NH + 1 steady problems coupled by the source term ωQHV D.
• Thus, the memory requirement of the HB solver is about 2NH + 1 times that of the steady or TD solver.
• Most routines of the HB solver have one additional loop level with respect to those of the steady and TD solvers.
Typical TD routine:

      do ib = 1,nblock
        do k = 1,kmax
          do j = 1,jmax
            do i = 1,imax
              operations on arrays(i,j,k,1:npde,ib)

Typical HB routine:

      do ib = 1,nblock
        do ih = 0,2*nh
          do k = 1,kmax
            do j = 1,jmax
              do i = 1,imax
                operations on arrays(i,j,k,1:npde,ih,ib)
Parallelization of HB solver
• The steady and TD NS solvers feature distributed-memory MPI parallelization over blocks (grid partitions). The efficiency of this approach stems from the smallness of the data transferred among blocks (halo data).
• For the HB NS solver a hybrid parallelization is used: distributed-memory MPI parallelization over blocks, and shared-memory OpenMP parallelization over harmonics (see the sketch below).
• With pure MPI parallelization, node memory forces the number of MPI processes to be smaller than the number of cores. Adding OpenMP threads recovers the computational speed.
• New multicore processors (IBM BG/Q) allow using more processes than available cores.
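A minimal sketch, with assumed names, of the hybrid layout: each MPI rank loops over the blocks it owns, while inside each block routine the harmonic loop is shared among OpenMP threads.

c     block routine called by each MPI rank for the blocks it owns;
c     the harmonic loop is distributed over OpenMP threads
c     (illustrative sketch, not COSA source)
      subroutine update_b(imax, jmax, npde, nh, q, dq)
      implicit none
      integer imax, jmax, npde, nh, i, j, ip, ih
      double precision q(imax,jmax,npde,0:2*nh)
      double precision dq(imax,jmax,npde,0:2*nh)
c$omp parallel do private(i,j,ip)
      do ih = 0, 2*nh
        do j = 1, jmax
          do i = 1, imax
            do ip = 1, npde
              q(i,j,ip,ih) = q(i,j,ip,ih) + dq(i,j,ip,ih)
            end do
          end do
        end do
      end do
c$omp end parallel do
      end

Only the harmonic loop is threaded, so no OpenMP synchronization is needed across blocks and the MPI halo exchanges remain identical to those of the pure-MPI solver.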
Parallelization options for HB solver
DISTRIBUTED (MPI)

• Handles many-partition analyses
• More efficient than shared-memory (OpenMP) parallelization
• Efficiency may (?) decrease with increasing amount of process communication

SHARED (OpenMP)

• Applicable to harmonic loops, but also (alternatively) to block or cell loops
• Maximum problem size dictated by node memory
• Usually less efficient than MPI, but efficiency increases with the work done by the loop

HYBRID (or MIXED)

• Cluster viewed as a set of shared-memory nodes interlinked in distributed fashion
• Geometric partitions handled by MPI
• Harmonic loops handled by OpenMP
Parallel communications
• MPI communications used to exchange halo data across grid cuts (see the sketch below)
• Each MPI process handles a block set (minimum set size: one block)
• Global values (forces, residual RMSs, etc.) computed with MPI reduction operations
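A minimal sketch, with assumed buffer names and a single block cut, of the two communication patterns listed above: MPI_SENDRECV swaps packed halo data with the rank owning the adjacent block, and MPI_ALLREDUCE assembles a global residual from the per-process contributions.

c     halo exchange across one cut and global residual sum
c     (illustrative sketch, not COSA source)
      subroutine exchange(nhalo, sendbuf, recvbuf, nbr, res_loc, res)
      implicit none
      include 'mpif.h'
      integer nhalo, nbr, ierr, status(MPI_STATUS_SIZE)
      double precision sendbuf(nhalo), recvbuf(nhalo)
      double precision res_loc, res
c     two rows of halo cells, packed in sendbuf, are swapped with the
c     neighbouring rank nbr owning the adjacent block
      call MPI_SENDRECV(sendbuf, nhalo, MPI_DOUBLE_PRECISION, nbr, 1,
     &     recvbuf, nhalo, MPI_DOUBLE_PRECISION, nbr, 1,
     &     MPI_COMM_WORLD, status, ierr)
c     global residual contribution: sum over all processes
      call MPI_ALLREDUCE(res_loc, res, 1, MPI_DOUBLE_PRECISION,
     &     MPI_SUM, MPI_COMM_WORLD, ierr)
      end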
Parallel I/O
• Entire multiblock grid in a single file (mesh.dat).
• Entire flow field in a single file (restart).
• Entire solution (grid and flow field) in a single TECPLOT file (flowtec.dat).
• All MPI processes read/write from/to the same global file.
• Processes use MPI_FILE_OPEN and MPI_FILE_CLOSE to open and close files.
• Each process works out the location of its data (block set) in the file; MPI_FILE_SEEK moves the file pointer to the desired location.
• Data at that location is written using MPI I/O functionality, e.g. MPI_FILE_WRITE (see the sketch below).
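A minimal sketch, with an assumed file name and byte offset argument, of the shared-file write pattern described above: every rank opens the same restart file, seeks to the location of its own block set and writes its data with MPI_FILE_WRITE.

c     parallel write of one process' block set to the shared restart
c     file (illustrative sketch, not COSA source); myoffset is the
c     byte offset of this rank's data, computed from the block sizes
      subroutine write_restart(nwords, buf, myoffset)
      implicit none
      include 'mpif.h'
      integer nwords, fh, ierr, status(MPI_STATUS_SIZE)
      integer (kind=MPI_OFFSET_KIND) myoffset
      double precision buf(nwords)
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'restart',
     &     MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
c     move the individual file pointer to this rank's location
      call MPI_FILE_SEEK(fh, myoffset, MPI_SEEK_SET, ierr)
c     write this rank's block set
      call MPI_FILE_WRITE(fh, buf, nwords, MPI_DOUBLE_PRECISION,
     &     status, ierr)
      call MPI_FILE_CLOSE(fh, ierr)
      end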
Parallel performance of MPI
Pitching NACA 0015 airfoil:
• Number of blocks: 2048
• Block size: 48×48
• Overall number of cells: 4.7 million
• HB nharms = 8
• Cluster name: FERMI
• Cluster characteristics:
– IBM BG/Q
– IBM PowerA2 processor, 1.6 GHz
– 16 cores per node
– 16 GB RAM per node
[Figure: speedup versus number of cores (128 to 2048), comparing the XLF and gfortran builds against ideal scaling.]
Parallel performance of MPI (cont’d)
Three-blade vertical-axis wind turbine:
• Number of blocks: 3072
• Average block size: 24×60
• Overall number of cells: 4.7 million
• HB nharms = 8
• Cluster name: FERMI
• Cluster characteristics:
– IBM BG/Q
– IBM PowerA2 processor, 1.6 GHz
– 16 cores per node
– 16 GB RAM per node
[Figure: speedup versus number of cores (128 to 3072), comparing the XLF and gfortran builds against ideal scaling.]
Thank you for your attention
For further enquiries, please contact
M. Sergio Campobasso
E-mail: [email protected]

Jernej Drofelnik
E-mail: [email protected]