Solution of the Implicit Formulation of High Order Diffusion for the Canadian Atmospheric GEM Model
“High Performance Computing and Simulation Symposium 2008”
Ottawa, Canada, April 14-16, 2008
Abdessamad Qaddouri & Vivian Lee
Atmospheric Science & Technology
Outline
• Introduction of GEM Model
• High order Diffusion equation and solution
• Parallelization of the solution
• Numerical performance Tests
• Conclusion
Numerical Weather Prediction (NWP)
• Physics
• Applied Mathematics
• Real-time applications
• Computers at the Canadian Meteorological Centre (CMC)
[Figure: performance of CMC computers in MFLOPS (logarithmic axis, 1 to 10,000,000) against year, 1974-2006. Machines shown: CDC 7600, CDC 176, Cray 1S, Cray XMP 28, Cray XMP 416, NEC SX-3/44, NEC SX-3/44R, NEC SX-4/16, NEC SX-4/80M3, NEC SX-5/32M2, NEC SX-6/80M10, IBM P4, IBM P5+.]
[Figure: forecast systems by forecast lead time. Axis: forecast lead time (days), with ticks at 0, 1, 2, 5, 10, 30, 90 and 365. Regions labelled deterministic forecasts, probabilistic forecasts and empirical forecasts. Systems shown: 2.5 km resolution (once per day); 15 km resolution (twice per day); 35 km resolution (once per day); 100 km resolution (once per day); 250 km resolution (twice per month); 250-400 km resolution (4 times per year); statistical (4 times per year).]
[Figure: GEM grid configurations: uniform, variable, rotated, limited area.]
Grid sizes:
15 km = 574 x 641 x 58
35 km = 800 x 600 x 58
2.5 km = 672 x 494 x 58
Hydrostatic Model
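• Horizontal motion (momentum)
\[ \frac{d\mathbf{V}_H}{dt} + R_d T_v \nabla \ln p + \nabla\phi + f\,\mathbf{k}\times\mathbf{V}_H = \mathbf{F}_H, \qquad \phi = gh \]
• Thermodynamics, hydrostatic and state
\[ \frac{d\ln T}{dt} - \kappa\,\frac{d\ln p}{dt} = F_T; \qquad \frac{\partial (gh)}{\partial p} = -\frac{1}{\rho}; \qquad \rho = \frac{p}{RT} \]
• Continuity and boundary conditions
\[ \frac{d}{dt}\ln\left|\frac{\partial p}{\partial Z}\right| + D + \frac{\partial \dot{Z}}{\partial Z} = 0; \qquad \dot{Z} = 0 \ \text{at}\ Z = Z_{bottom},\, Z_{top} \]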
Schematic of the semi-Lagrangian implicit method used for the integration of the GEM model
[Schematic, four stages: discretization of the equations for the model state X = (V_H, T, p, ...); trajectory calculation; nonlinear iterations (iteration index k); diffusion on specific fields.]
Horizontal High order Diffusion
• Horizontal prognostic field
\[ \frac{\partial\psi}{\partial t} = (-1)^{m/2+1}\,\nu\,\nabla^{m}\psi; \qquad m = 2, 4, 6, 8 \]
• Damping rate
[Figure: damping rate as a function of wavelength.]
Horizontal High order Diffusion…
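• Horizontal prognostic field
\[ \frac{\partial\psi}{\partial t} = (-1)^{m/2+1}\,\nu\,\nabla^{m}\psi; \qquad m = 2, 4, 6, 8 \]
• Implicit Discretization
\[ \frac{\psi^{n+1}-\psi^{n}}{\Delta t} = (-1)^{m/2+1}\,\nu\,\nabla^{m}\psi^{n+1} \]
\[ \nabla^{m}\psi^{n+1} + \frac{(-1)^{m/2}}{\nu\,\Delta t}\,\psi^{n+1} = \frac{(-1)^{m/2}}{\nu\,\Delta t}\,\psi^{n} \equiv R \]
with
\[ \nabla^{2} = \frac{1}{a^{2}}\left[\frac{1}{\cos^{2}\theta}\frac{\partial^{2}}{\partial\lambda^{2}} + \frac{1}{\cos\theta}\frac{\partial}{\partial\theta}\left(\cos\theta\,\frac{\partial}{\partial\theta}\right)\right] \]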
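The effect of the order m on the damping rate can be illustrated with a short Python sketch. It assumes a single Fourier mode of wavenumber k on a one-dimensional uniform grid with unit spacing, so that the diffusion reduces to d(psi)/dt = -nu k^m psi, and it compares the exact per-step factor with the backward-Euler (implicit) one. The function names and the tuning rule (remove half of the 2*dx wave per step) are illustrative assumptions, not values taken from the GEM model.

```python
import numpy as np

def exact_factor(k_dx, m, nu_dt):
    """Exact per-step factor exp(-nu*dt*k^m) for a Fourier mode (dx = 1)."""
    return np.exp(-nu_dt * np.asarray(k_dx, dtype=float) ** m)

def implicit_factor(k_dx, m, nu_dt):
    """Backward-Euler per-step factor 1 / (1 + nu*dt*k^m) for the same mode."""
    return 1.0 / (1.0 + nu_dt * np.asarray(k_dx, dtype=float) ** m)

def nu_dt_for_2dx_damping(m, removed=0.5):
    """Hypothetical tuning: choose nu*dt so that the implicit scheme removes
    the fraction `removed` of the 2*dx wave (k*dx = pi) in one step."""
    return (1.0 / (1.0 - removed) - 1.0) / np.pi ** m

if __name__ == "__main__":
    wavelengths = np.array([2.0, 4.0, 8.0, 16.0])  # in grid lengths
    k_dx = 2.0 * np.pi / wavelengths
    for m in (2, 4, 6, 8):
        nu_dt = nu_dt_for_2dx_damping(m)
        # Fraction of each wave removed in one implicit step.
        print(m, np.round(1.0 - implicit_factor(k_dx, m, nu_dt), 5))
```

With the shortest wave damped equally for every m, the higher orders leave the longer waves almost untouched. The implicit factor stays between 0 and 1 for any positive nu*dt, so the step is stable whatever the time step.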
Horizontal High order Diffusion …
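• Del 4 Horizontal Diffusion
\[ \nabla^{2}\chi + \eta\,\psi = R, \qquad \nabla^{2}\psi - \chi = 0, \qquad \text{with}\ \eta = \frac{1}{\nu\,\Delta t};\ R = \eta\,\psi^{n} \]
• Spatial Discretization
[Equations: discrete form of the coupled system above, written with the matrices A and P, the identity I and the right-hand side r.]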
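As a sanity check on the del-4 case, the sketch below assumes that the fourth-order implicit step (I + nu*dt*L^2) psi = psi_old is rewritten as two coupled second-order equations, L psi - chi = 0 and L chi + eta psi = eta psi_old with eta = 1/(nu*dt). Here L is a one-dimensional periodic second-difference matrix standing in for the spherical Laplacian, and the auxiliary name chi is an assumption made for this illustration.

```python
import numpy as np

def periodic_laplacian(n, dx):
    """Second-difference matrix on a periodic 1D grid (n >= 3)."""
    lap = np.zeros((n, n))
    idx = np.arange(n)
    lap[idx, idx] = -2.0
    lap[idx, (idx + 1) % n] = 1.0
    lap[idx, (idx - 1) % n] = 1.0
    return lap / dx**2

def del4_step_direct(psi_old, lap, nu_dt):
    """Solve (I + nu*dt * L^2) psi = psi_old as one fourth-order problem."""
    n = psi_old.size
    return np.linalg.solve(np.eye(n) + nu_dt * lap @ lap, psi_old)

def del4_step_split(psi_old, lap, nu_dt):
    """Solve the same step as two coupled second-order equations:
         eta*psi + L chi = eta*psi_old
         L psi   - chi   = 0
    """
    n = psi_old.size
    eta = 1.0 / nu_dt
    eye = np.eye(n)
    system = np.block([[eta * eye, lap], [lap, -eye]])
    rhs = np.concatenate([eta * psi_old, np.zeros(n)])
    solution = np.linalg.solve(system, rhs)
    return solution[:n]  # first half is psi, second half is chi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lap = periodic_laplacian(32, dx=1.0)
    psi_old = rng.normal(size=32)
    difference = del4_step_direct(psi_old, lap, 0.1) - del4_step_split(psi_old, lap, 0.1)
    print(np.max(np.abs(difference)))
```

The split form doubles the number of unknowns but only ever applies the second-order operator, which is what allows a second-order direct solver to be reused for the higher orders.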
Spatial discretization
[Equations: entries of the one-dimensional discrete operators and of the matrices P in longitude (i = 1, ..., Ni) and latitude (j = 1, ..., Nj), built from cos θ and sin θ at neighbouring latitude points.]
Horizontal High order Diffusion …
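• Fast Direct Solution
• Projection
\[ \psi_{ij} = \sum_{I=1}^{Ni} Z_i^{I}\,\psi_j^{I}; \qquad \chi_{ij} = \sum_{I=1}^{Ni} Z_i^{I}\,\chi_j^{I} \]
with
\[ \sum_{i=1}^{Ni} Z_i^{I}\,P_{ii}\,Z_i^{I'} = \delta_{II'} \]
[Equations: the projected system for each mode I, written with A, P, I, the eigenvectors Z and the projected right-hand side r^I.]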
Horizontal High order Diffusion …
• Direct Solution
• Matrix Form
\[ M X = B, \qquad X = (X_1, \dots, X_{Nj})^{T}, \qquad j = 1, \dots, Nj \]
[Equations: for each mode I, X_j collects the two unknowns at latitude j; row j of M couples X_{j-1}, X_j and X_{j+1} through 2 x 2 blocks built from A_{j,j-1}, A_{j,j}, A_{j,j+1} and P; B is built from r_j^I and 0.]
Horizontal High order Diffusion …
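• Block Tri-diagonal problem solution
\[ M = \begin{pmatrix} D_1 & E_1 & & & \\ F_2 & D_2 & E_2 & & \\ & F_3 & \ddots & \ddots & \\ & & \ddots & D_{Nj-1} & E_{Nj-1} \\ & & & F_{Nj} & D_{Nj} \end{pmatrix} \]
with
\[ M = L(\Delta)\,U(\Delta); \qquad \Delta_1 = D_1; \qquad \Delta_i = D_i - F_i\,\Delta_{i-1}^{-1}\,E_{i-1}, \quad i = 2, \dots, Nj \]
• Solution
\[ L(\Delta)\,Y = B; \qquad U(\Delta)\,X = Y \]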
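The block tridiagonal factorization and the two sweeps can be sketched in Python as follows. This is a generic block Thomas algorithm under the usual assumption that every pivot block is invertible (no pivoting across blocks). The array layout and the function name are choices made for this illustration, not the GEM implementation.

```python
import numpy as np

def block_thomas(D, E, F, B):
    """Solve a block tridiagonal system M X = B.

    D: (n, b, b) diagonal blocks D_1..D_n
    E: (n-1, b, b) upper blocks, E[i] sits in block row i, column i+1
    F: (n-1, b, b) lower blocks, F[i] sits in block row i+1, column i
    B: (n, b) right-hand side, one b-vector per block row
    """
    n = D.shape[0]
    delta = np.empty(D.shape, dtype=float)  # pivot blocks of the U factor
    Y = np.empty(B.shape, dtype=float)
    delta[0] = D[0]
    Y[0] = B[0]
    # Forward sweep: build the pivots and solve L Y = B.
    for i in range(1, n):
        # G = F_i * inv(delta_{i-1}), computed with a solve instead of an inverse.
        G = np.linalg.solve(delta[i - 1].T, F[i - 1].T).T
        delta[i] = D[i] - G @ E[i - 1]
        Y[i] = B[i] - G @ Y[i - 1]
    # Backward sweep: solve U X = Y.
    X = np.empty_like(Y)
    X[-1] = np.linalg.solve(delta[-1], Y[-1])
    for i in range(n - 2, -1, -1):
        X[i] = np.linalg.solve(delta[i], Y[i] - E[i] @ X[i + 1])
    return X

def assemble_dense(D, E, F):
    """Dense copy of the block tridiagonal matrix, used only for checking."""
    n, b, _ = D.shape
    M = np.zeros((n * b, n * b))
    for i in range(n):
        M[i * b:(i + 1) * b, i * b:(i + 1) * b] = D[i]
    for i in range(n - 1):
        M[i * b:(i + 1) * b, (i + 1) * b:(i + 2) * b] = E[i]
        M[(i + 1) * b:(i + 2) * b, i * b:(i + 1) * b] = F[i]
    return M
```

For a system with Nj block rows of size b the cost grows linearly in Nj, and the pivots depend only on the matrix, so in a time-stepping code they could be computed once and reused at every call.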
Summary of the algorithm
• Analysis of the right-hand side (FFT or MXM)
\[ r_j^{I} = \sum_{i=1}^{Ni} Z_i^{I}\, r_{i,j} \]
• Solution of (Nk*Ni) tri-diagonal problems
\[ M X = B \]
• Synthesis of the solution (FFT or MXM)
\[ \psi_{i,j} = \sum_{I=1}^{Ni} Z_i^{I}\, \psi_j^{I} \]
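The three steps above can be sketched end to end on a simplified problem. The sketch assumes a second-order implicit diffusion step (I - nu*dt*Laplacian) psi = r on a Cartesian grid that is periodic in the first direction and has zero boundary values in the second; this is a stand-in for the spherical operator, not the GEM discretization. Analysis and synthesis are done by matrix multiplication with the eigenvectors Z, and a dense solve stands in for the tridiagonal solver.

```python
import numpy as np

def second_difference(n, h, periodic):
    """1D second-difference matrix (n >= 3), periodic or with zero boundaries."""
    A = np.zeros((n, n))
    i = np.arange(n)
    A[i, i] = -2.0
    A[i[:-1], i[:-1] + 1] = 1.0
    A[i[1:], i[1:] - 1] = 1.0
    if periodic:
        A[0, -1] = 1.0
        A[-1, 0] = 1.0
    return A / h**2

def direct_solve(r, nu_dt, dx, dy):
    """Solve (I - nu_dt * (Lx + Ly)) psi = r for r of shape (Ni, Nj)."""
    ni, nj = r.shape
    Lx = second_difference(ni, dx, periodic=True)
    Ly = second_difference(nj, dy, periodic=False)
    lam, Z = np.linalg.eigh(Lx)      # Lx = Z diag(lam) Z^T with orthonormal Z
    r_hat = Z.T @ r                  # analysis (MXM form)
    psi_hat = np.empty_like(r_hat)
    for mode in range(ni):           # one independent 1D problem per mode
        T = (1.0 - nu_dt * lam[mode]) * np.eye(nj) - nu_dt * Ly
        psi_hat[mode] = np.linalg.solve(T, r_hat[mode])
    return Z @ psi_hat               # synthesis (MXM form)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    r = rng.normal(size=(16, 10))
    psi = direct_solve(r, nu_dt=0.3, dx=1.0, dy=1.0)
    print(psi.shape)
```

On a uniform periodic grid the eigenvectors are Fourier modes, so the two matrix products could be replaced by FFTs; on a non-uniform grid only the matrix-multiplication form remains available.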
A Parallel algorithm
• Global Transposition (Ni/P, Nj/Q, Nk) → (Nj/Q, Nk/P, Ni)
• Analysis of the right-hand side
• Global Transposition (Nj/Q, Nk/P, Ni) → (Nk/P, Ni/Q, Nj)
• Solution of the block tridiagonal problems
• Global Transposition (Nk/P, Ni/Q, Nj) → (Nj/Q, Nk/P, Ni)
• Synthesis of the solution
• Global Transposition (Nj/Q, Nk/P, Ni) → (Ni/P, Nj/Q, Nk)
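The first transposition in the list can be mimicked on a single machine to show what data each process ends up holding. The sketch assumes a P x Q process grid in which the i index is split over P and the j index over Q, and that Nk is divisible by P. The list-of-arrays representation and the function name are illustrative; in an MPI code the exchange would typically be an all-to-all among the P processes that share the same j range.

```python
import numpy as np

def transpose_i_to_k(blocks, P):
    """Simulate (Ni/P, Nj/Q, Nk) -> (Nj/Q, Nk/P, Ni) for one row of the grid.

    blocks: list of P arrays of shape (Ni/P, Nj/Q, Nk), one per process,
            all sharing the same j range.
    Returns a list of P arrays of shape (Nj/Q, Nk/P, Ni).
    """
    # Each sender cuts its block into P pieces along k (its send buffers).
    send = [np.split(block, P, axis=2) for block in blocks]  # send[src][dst]
    result = []
    for dst in range(P):
        # The receiver stacks the pieces from all senders along i ...
        received = np.concatenate([send[src][dst] for src in range(P)], axis=0)
        # ... and reorders the axes so that the full i index is contiguous last.
        result.append(np.transpose(received, (1, 2, 0)))
    return result

if __name__ == "__main__":
    Ni, Nj, Nk, P, Q = 8, 6, 4, 2, 3
    field = np.arange(Ni * Nj * Nk, dtype=float).reshape(Ni, Nj, Nk)
    j_range = slice(0, Nj // Q)  # the j range owned by the first process row
    blocks = [field[p * (Ni // P):(p + 1) * (Ni // P), j_range, :] for p in range(P)]
    out = transpose_i_to_k(blocks, P)
    print([piece.shape for piece in out])
```

After the exchange every process owns complete lines along i, so the analysis step (FFT or matrix multiplication in that direction) needs no further communication.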
35 km mesoglobal run at 72-hour forecast
[Figure, two panels: U component without diffusion; U component with del-6 diffusion.]
Table 1. Timing breakdown of the major components of the Canadian 35 km mesoglobal operational model for a 72-hour integration on 12 nodes (2 x 24 x 4).
Components Time(sec) Percentage
Rhs 14.08 1.48
Adv 247.71 26.01
Prep 14.24 1.49
Nli 33.11 3.48
Sol 71.06 7.46
Bac 13.4 1.41
Phy 435.19 45.7
Hzd 82.86 8.7
vspng 20.37 2.14
output 10.38 1.09
Others 9.91 1.04
Total 952.31 100
Table 2. MPI test runs for the 35 km mesoglobal configuration (OpenMP = 1); the diffusion is called 964 times.
Setup (P x Q)   Number of PEs   Nodes   Diffusion time (sec)   Relative ideal speedup   Relative speedup
1 x 16          16              1       596.46                 1                        1
2 x 16          32              2       320.46                 2                        1.86
2 x 24          48              3       222.34                 3                        2.68
4 x 16          64              4       170.12                 4                        3.51
Table 3. MPI test runs for the 17 km mesoglobal configuration (OpenMP = 1); the diffusion is called 964 times.
Setup (P x Q)   Number of PEs   Nodes   Diffusion time (sec)   Relative ideal speedup   Relative speedup
2 x 16          32              2       1769.48                1                        1
2 x 24          48              3       1206.01                1.5                      1.47
4 x 16          64              4       915.83                 2                        1.93
4 x 20          80              5       764.13                 2.5                      2.32
4 x 24          96              6       646.64                 3                        2.74
7 x 16          112             7       620.98                 3.5                      2.85
8 x 16          128             8       595.77                 4                        2.97
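The speedup columns follow directly from the measured times: the relative speedup is the time of the smallest run divided by the time of the current run, and the ideal value is the ratio of processor counts. A small Python sketch using the Table 3 timings reproduces them and adds the parallel efficiency (speedup divided by ideal speedup), a derived quantity that is not part of the original table.

```python
def scaling_table(pes, seconds):
    """Relative speedup and efficiency with respect to the first (smallest) run."""
    base_pes, base_time = pes[0], seconds[0]
    rows = []
    for p, t in zip(pes, seconds):
        ideal = p / base_pes
        speedup = base_time / t
        rows.append({"pes": p, "ideal": ideal, "speedup": speedup,
                     "efficiency": speedup / ideal})
    return rows

# Diffusion timings from Table 3 (17 km mesoglobal, OpenMP = 1).
TABLE3_PES = [32, 48, 64, 80, 96, 112, 128]
TABLE3_SECONDS = [1769.48, 1206.01, 915.83, 764.13, 646.64, 620.98, 595.77]

if __name__ == "__main__":
    for row in scaling_table(TABLE3_PES, TABLE3_SECONDS):
        print(f"{row['pes']:4d}  ideal {row['ideal']:.2f}  "
              f"speedup {row['speedup']:.2f}  efficiency {row['efficiency']:.2f}")
```

The efficiency decreases steadily as processors are added, and the drop is most visible in the last two rows (112 and 128 PEs).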
MPI Relative Speedup
[Figure, two panels: 35 km mesoglobal (FFT); 17 km mesoglobal (FFT).]
Table 4. OpenMP test runs for the 35 km mesoglobal configuration (1 x 16 x OpenMP) using FFT; the diffusion is called 964 times.
OpenMP   Nodes   Diffusion time (sec)   Relative ideal speedup   Relative speedup
1        1       596.46                 1                        1
4        4       186.41                 4                        3.2
8        8       132.27                 8                        4.51
Table 5. OpenMP test runs for the 35 km mesoglobal configuration (1 x 16 x OpenMP) using matrix multiplication; the diffusion is called 1084 times.
OpenMP   Nodes   Diffusion time (sec)   Relative ideal speedup   Relative speedup
1        1       2129.93                1                        1
4        4       588.08                 4                        3.62
8        8       348.44                 8                        6.11
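Because the FFT runs (Table 4) and the matrix-multiplication runs (Table 5) report different numbers of diffusion calls, a like-for-like comparison is easier per call. The Python sketch below divides each total by its call count and forms the MXM-to-FFT cost ratio at each OpenMP thread count. The per-call normalization is an assumption of this illustration; the tables themselves report only totals.

```python
# Total diffusion times (seconds) and call counts taken from Tables 4 and 5.
RUNS = {
    "FFT": {"calls": 964, "seconds": {1: 596.46, 4: 186.41, 8: 132.27}},
    "MXM": {"calls": 1084, "seconds": {1: 2129.93, 4: 588.08, 8: 348.44}},
}

def seconds_per_call(method, threads):
    """Average diffusion time per call for one method and thread count."""
    run = RUNS[method]
    return run["seconds"][threads] / run["calls"]

def mxm_to_fft_ratio(threads):
    """How many times more expensive one MXM call is than one FFT call."""
    return seconds_per_call("MXM", threads) / seconds_per_call("FFT", threads)

if __name__ == "__main__":
    for threads in (1, 4, 8):
        print(threads,
              round(seconds_per_call("FFT", threads), 4),
              round(seconds_per_call("MXM", threads), 4),
              round(mxm_to_fft_ratio(threads), 2))
```

The matrix-multiplication variant scales better with threads, so its cost disadvantage per call narrows as threads are added, although the FFT variant remains the cheaper of the two at every thread count in these runs.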
OpenMP Relative Speedup
[Figure, two panels: 35 km mesoglobal (FFT); 35 km mesoglobal (MXM).]
Conclusion
• An efficient implementation of the parallel Fast Direct Solution for the implicit formulation of horizontal diffusion problem
• Comparison with iterative methods like preconditioned Krylov methods.
Thank You!
Merci!