Peter Berczik
Astronomisches Rechen-Institut (ARI), Zentrum für Astronomie Univ. Heidelberg, Germany
SPH Astrophysics – “State of Art”.
SPHERIC 3rd, Lausanne, Switzerland, 4th – 6th June 2008
Presentation plan:
• Gas (particle) physics in astrophysics.
  - Astronomical observations
  - N-body inspiration ☺
• Astrophysics SPH equations.
  - Basic Equations
  - Cooling Function
  - Smoothing Length
  - Self Gravity
  - Time Integration
  - SPH test
• Numerical astrophysics.
  - Computers
  - Codes
  - Results
• Hardware accelerators.
  - GRAPE (only gravity)
  - MPRACE/FPGA (gravity + SPH)
  - GPU! ☺ (gravity + SPH)
• Recent multi-phase results.
  - Speedup
  - Accuracy
  - First results
Collaborators & Grants:
• Naohito Nakasato, Univ. of Aizu, Japan
• Keigo Nitadori, Tokyo Univ., Japan
• Ingo Berentzen & Rainer Spurzem, Univ. Heidelberg
• G.M. Martinez, G. Lienhart, A. Kugel, R. Maenner, Univ. Mannheim
• A. Burkert, M. Wetzstein, T. Naab, H. Vasquez, Univ. Munich
• DFG SFB No. 439/B11: 2005 - 2008
• Volkswagen/Baden-Württemberg, GRACE project: 2005 - 2008
Formation of the Universe
Observations
Isolated galaxy evolution
http://www-hpcc.astro.washington.edu/
GASOLINE: Wadsley, Stadel & Quinn, 2003
~few 10^6 SPH particles
Star Formation
Bate, Bonnell, Bromm, 2002
The calculation required ~100k CPU hours (~11.4 years) on the SGI Origin 3800 (64 CPU) of the United Kingdom Astrophysical Fluids Facility (UKAFF).
SPH: Benz, Bowers, Cameron & Press, 1990
OpenMP + Sink Particles: Bate, Bonnell & Price, 1995
~few 10^6 SPH particles
High-mass stars can form by (competitive) gas accretion!
Star Formation
Galaxy Collisions
GADGET 2.0 Springel, 2005
http://www.mpa-garching.mpg.de/gadget/
GADGET 2.0 details
BH’s in galaxies (MW - Sgr A*)
Mergers of Galaxies & MBH’s [Begelman, Blandford & Rees, 90’s]
Galaxy Collisions ≈ BH’s collisions
Multiple Massive Black Holes: NGC 6240
Strong ongoing merger… (Komossa et al. 2002)
Chandra X-ray: two AGN, one in each nucleus; separation ~1 kpc
M82: The bright spots in the center are supernova remnants and X-ray binaries. The luminosity of the X-ray binaries suggests that most contain a black hole. A close encounter with a large galaxy, M81, in the last 100 Myr is thought to be the cause of the starburst activity.
Ebisuzaki et al. 2002
Galaxy Collisions ≈ BH’s collisions
GADGET 2.0 Simulations
Future Observations
Gravitational Wave Detection - LISA
Two of the strongest potential sources in the low-frequency (LISA) regime are:
• Coalescence of binary supermassive black holes
• Extreme-mass-ratio inspirals into supermassive black holes
Proto-Planet formation
GASOLINE: Mayer, Lufkin et al., 2006
Mayer et al., 2002, 2003, 2004
Largest astrophysical N-body simulations
Erik Holmberg (1908 - 2000), father of numerical astrophysics… with 200 light bulbs.
The Astrophysical Journal, Nov. 1941
Dissertation, Univ. Lund (Sweden), 1937: "A study of double and multiple galaxies."
Galaxies are often found in groups and pairs; satellite galaxies are distributed unevenly [Holmberg effect].
Nowadays real supercomputers… (Makino, 2002)
• ASCI-Q (LANL): ~30 Tflops, ~250M USD
• Earth Simulator: ~40 Tflops, ~350M USD
• 48 x GRAPE6: ~48 Tflops, ~3M USD
GRAPE Gordon Bell prizes
GRAPE history tree
GRAPE’s all over the World
http://www.astrogrape.org
Basic idea of any N-body code:

\frac{d^2 \vec{r}_i}{dt^2} = \vec{a}_i

\vec{r}_{ij} = \vec{r}_i - \vec{r}_j

\vec{f}_{ij} = -G \cdot \frac{m_j}{(r_{ij}^2 + \varepsilon^2)^{3/2}} \cdot \vec{r}_{ij}

Cost of one full force evaluation: \propto N \cdot (N-1)
Basic idea of any N-body code (kick-drift-kick leapfrog):

t_{new} = t_{old} + \Delta t

\vec{v}_i^{\,new} = \vec{v}_i^{\,old} + \vec{a}_i \cdot \Delta t / 2

\vec{r}_i^{\,new} = \vec{r}_i^{\,old} + \vec{v}_i^{\,new} \cdot \Delta t

\vec{r}_{ij} = \vec{r}_i - \vec{r}_j; \qquad \vec{a}_i = -G \sum_{j=1;\, j \neq i}^{N} \frac{m_j}{(r_{ij}^2 + \varepsilon^2)^{3/2}} \cdot \vec{r}_{ij}

\vec{v}_i^{\,new} = \vec{v}_i^{\,new} + \vec{a}_i \cdot \Delta t / 2
Basic idea of any N-body code:
[Schematic: particle i (position \vec{r}_i, velocity \vec{v}_i, time t_i) interacts with every other particle j of mass m_j; N bodies in total.]
Basic idea of any N-body code:

C     OBTAIN THE "FULL" FORCE FOR ALL BODIES.
C     ...
      DO 10 I = 1,N
        AX(I) = 0.0
        AY(I) = 0.0
        AZ(I) = 0.0
        DO 20 J = 1,N
          IF (J.EQ.I) GO TO 20
          DX_IJ = X(I) - X(J)
          DY_IJ = Y(I) - Y(J)
          DZ_IJ = Z(I) - Z(J)
          DR2 = DX_IJ*DX_IJ + DY_IJ*DY_IJ + DZ_IJ*DZ_IJ + EPS2
          TEMP = M(J)/( DR2*SQRT(DR2) )
          AX(I) = AX(I) - TEMP*DX_IJ
          AY(I) = AY(I) - TEMP*DY_IJ
          AZ(I) = AZ(I) - TEMP*DZ_IJ
   20   CONTINUE
   10 CONTINUE
C     ...
Basic idea of any GRAPE N-body code:

Host (~N): \vec{a}_i = \sum_{j=1;\, j \neq i}^{N} \vec{f}_{ij}

GRAPE (~N^2): \vec{f}_{ij} = -G \cdot \frac{m_j}{(r_{ij}^2 + \varepsilon^2)^{3/2}} \cdot \vec{r}_{ij}
Commercial GRAPE6a boards: http://www.metrix.co.jp
ISM "Ecology"
Tumlinson, 2004: astro-ph/0411249

Our Multi-Phase GRAPE SPH code:
• WARM - HOT gas (10^4 - 10^7 K): SPH
• COLD gas clumps (10^2 - 10^4 K): N-body + viscosity (DRAG, COLL, DP, C&E)
• STARS: N-body + SSP
Phase interactions: SF; FB: SW + PN; C & E; FB: SNII + SNIa; DE + DZ (O; Fe)
Basic Equations (Monaghan, 1977; Lucy, 1977)

\rho_i \equiv \sum_{j=1}^{N} m_j \cdot W_{ij}

\frac{d\vec{v}_i}{dt} = -\sum_{j=1}^{N} m_j \left( \frac{P_i}{\rho_i^2} + \frac{P_j}{\rho_j^2} + \tilde{\Pi}_{ij} \right) \nabla_i W_{ij} - \nabla \Phi_i - \nabla \Phi_i^{ext}

P_i = (\gamma - 1) \cdot \rho_i \cdot u_i
\frac{du_i}{dt} = \frac{1}{2} \sum_{j=1}^{N} m_j \left( \frac{P_i}{\rho_i^2} + \frac{P_j}{\rho_j^2} + \tilde{\Pi}_{ij} \right) (\vec{v}_i - \vec{v}_j) \cdot \nabla_i W_{ij} + \frac{\Gamma - \Lambda}{\rho_i}
Basic Equations: artificial viscosity (Monaghan & Gingold, 1983)

\Pi_{ij} = \begin{cases} (-\alpha \cdot c_{ij} \cdot \mu_{ij} + \beta \cdot \mu_{ij}^2)/\rho_{ij} & \text{if } \vec{r}_{ij} \cdot \vec{v}_{ij} < 0 \\ 0 & \text{else} \end{cases}

\mu_{ij} = \frac{h_{ij} \, (\vec{v}_i - \vec{v}_j) \cdot (\vec{r}_i - \vec{r}_j)}{|\vec{r}_i - \vec{r}_j|^2 + \varepsilon \cdot h_{ij}^2}

\rho_{ij} = \tfrac{1}{2}(\rho_i + \rho_j); \quad h_{ij} = \tfrac{1}{2}(h_i + h_j); \quad c_{ij} = \tfrac{1}{2}(c_i + c_j)

\alpha = 1; \quad \beta = 2; \quad \varepsilon = 0.01
W(r; h) = \frac{1}{\pi h^3} \cdot \begin{cases} 1 - \frac{3}{2}\left(\frac{r}{h}\right)^2 + \frac{3}{4}\left(\frac{r}{h}\right)^3 & 0 \le \frac{r}{h} \le 1 \\[4pt] \frac{1}{4}\left(2 - \frac{r}{h}\right)^3 & 1 < \frac{r}{h} \le 2 \\[4pt] 0 & \frac{r}{h} > 2 \end{cases}

(Monaghan & Lattanzio, 1985; Hernquist & Katz, 1989)

W_{ij} = W(|\vec{r}_i - \vec{r}_j|; h_{ij})
Basic Equations: Balsara switch (Balsara, 1995; Steinmetz, 1996)

\tilde{\Pi}_{ij} = \tfrac{1}{2}(f_i + f_j) \cdot \Pi_{ij}; \qquad f_i = \frac{|\nabla \cdot \vec{v}|_i}{|\nabla \cdot \vec{v}|_i + |\nabla \times \vec{v}|_i + \varepsilon \cdot c_i / h_i}
Define smoothing length:

r_{ij} \le 2 \cdot \max(h_i; h_j)

Inside "2 h": N_B = const = 50

Neighbour search using ANN with a kd-tree: http://www.cs.umd.edu/~mount/ANN/
Cooling Function:

\Lambda \equiv \Lambda(u_i, \rho_i) \cong \Lambda^{*}(T, Z, \ldots, \mathrm{Fe}, \mathrm{H}_2) \cdot n_i^2; \qquad n_i = \rho_i / (\mu \cdot m_p)

(Dalgarno & McCray, 1972; Sutherland & Dopita, 1993)
Integrator

Predictor step:

\vec{v}_i^{\,p} = \vec{v}_i^{\,n} + \vec{a}_i^{\,n} \cdot \Delta t

\vec{r}_i^{\,p} = \vec{r}_i^{\,n} + (\vec{v}_i^{\,n} + \vec{v}_i^{\,p}) \cdot \Delta t / 2

u_i^{p} = u_i^{n} + \dot{u}_i^{n} \cdot \Delta t

Corrector step:

\vec{v}_i^{\,n+1} = \vec{v}_i^{\,n} + (\vec{a}_i^{\,n} + \vec{a}_i^{\,p}) \cdot \Delta t / 2

\vec{r}_i^{\,n+1} = \vec{r}_i^{\,n} + (\vec{v}_i^{\,n} + \vec{v}_i^{\,n+1}) \cdot \Delta t / 2

u_i^{n+1} = u_i^{n} + (\dot{u}_i^{n} + \dot{u}_i^{p}) \cdot \Delta t / 2
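The predictor/corrector pair above, sketched in C for a single 1-D particle (hypothetical names; the real code applies the same update per component and per particle):

```c
#include <assert.h>
#include <math.h>

/* State of one particle, 1-D for brevity: position, velocity,
   specific internal energy. */
typedef struct { double r, v, u; } State;

/* Predictor: extrapolate with the derivatives a_n, udot_n known
   at time level n. */
static State predict(State s, double a_n, double udot_n, double dt)
{
    State p;
    p.v = s.v + a_n * dt;
    p.r = s.r + 0.5 * (s.v + p.v) * dt;
    p.u = s.u + udot_n * dt;
    return p;
}

/* Corrector: average the old and the predicted derivatives. */
static State correct(State s, double a_n, double a_p,
                     double udot_n, double udot_p, double dt)
{
    State c;
    c.v = s.v + 0.5 * (a_n + a_p) * dt;
    c.r = s.r + 0.5 * (s.v + c.v) * dt;
    c.u = s.u + 0.5 * (udot_n + udot_p) * dt;
    return c;
}
```

For constant acceleration the corrected position reproduces the exact kinematics r0 + v0 Δt + a Δt²/2, which is the second-order accuracy the scheme is built for.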
Time step:

\Delta t_i = 0.1 \cdot \min\left[ \sqrt{h_i/|\vec{a}_i|};\;\; h_i/|\vec{v}_i|;\;\; h_i/c_i;\;\; u_i/|\dot{u}_i| \right]
RIT & ARI 32 node GRAPE6a clusters
MAO 8+1 node GRAPE6 blx64 cluster:
• 9 nodes x 2 dual-core Xeon 2.0 GHz
• 9 GRAPE6 blx64
• 5 TB RAID
• Infiniband switch (2 x 10 Gb/s)
• Speed: ~1 Tflops
• N up to 2M
• Cost: ~100k EUR
• Funding: NASU

ARI 32 node GRAPE6a cluster (GRACE = GRAPE + MPRACE):
• 32 nodes x 2 64-bit Xeon P4, 3.2 GHz (~2 Gflops)
• 32 GRAPE6a (~120 Gflops)
• 32 FPGA-MPRACE (~20 Gflops)
• 3.5 TB RAID5 disk system
• Infiniband, dual-port network (~20 Gb/s)
• Summary speed: ~4 Tflops
• N (direct summation) up to 4M
• Funding: Volkswagen/Baden-Württemberg, ~400k EUR
MPRACE FPGA board
FP arithmetic: 16- or 24-bit mantissa
Pressure force pipeline
[Pipeline schematic: per-particle inputs m_i, r_i, v_i, t_i; j-particle data m_j, r_j; outputs φ_i, a_i. Tree build on host: ~N/N_p.]
Parallel TREE gravity on the cluster
Jun Makino: TREE+GRAPE code
Makino, PASJ, 43, 621 (1991): interaction list built on host ~N; interaction lists are short…
Makino, PASJ, 56, 521 (2004); Fukushige, Makino & Kawai, PASJ, 57, 1009 (2005): one interaction list is shared among N_GR particles! Interaction list on host ~N/N_GR; lists are longer…
[Figure: wall-clock time ΔT_CPU (0.5…64 s) for one full force calculation vs. particle number N (in M), for N_CPU = 1…32; scaling ~N. Uniform & Plummer sphere, G = M = R = 1, ε = 10^-2; the 1-minute level is marked.]
SPH test: adiabatic collapse of a cold gas sphere
(Evrard, 1988; Steinmetz & Müller, 1993; Carraro et al., 1998; Springel et al., 2001)

\rho(r) = \frac{M}{2 \pi R^2 r}

E_G = -\frac{2}{3} \cdot \frac{G M^2}{R}

u = 0.05 \cdot \frac{G M}{R}; \qquad G = M = R = 1
[Figure: kinetic, thermal, potential, and total energy (EKIN, ETHE, EPOT, ETOT) vs. time t = 0…3 for N = 10k, 50k, 100k.]
SPH test
[Figure: EKIN, ETHE, EPOT, ETOT vs. time t = 0…3 for N = 1k, 2k, 4k, 8k, 16k, 32k.]
Berczik (NCPU = 1) vs. Nakasato (NCPU = 4)
[Figure: relative energy error |ΔETOT|/|ETOT| (10^-6…10^-3) vs. time t = 0…3, for N = 10k, 50k, 100k and for N = 1k…32k.]
Scaling results: GRAPE + SPH code, one-timestep integration
[Figure: ΔT_CPU per timestep vs. N = 100k…1000k for NCPU = 1, 2, 4, 8, 16; scaling ~N^1.1.]
[Figure: ΔT_CPU per timestep vs. NCPU = 1…16 for N = 100k…1000k; scaling ~NCPU^-1.15.]
SPH speedup with MPRACE
[Figure: ΔT for one shared timestep vs. N = 1k…1024k: SPH on CPU, SPH on MPRACE-1, and the CPU/MPRACE-1 ratio.]
[Figure: time fractions (NB, GRAV, SPH) vs. N = 1k…1024k, for SPH on CPU and for SPH on MPRACE-1.]
Hardware

2007: GeForce 8800 GTX, 128 Stream Proc., 768 MB; GeForce 8800 GTS, 128 Stream Proc., 512 MB; GeForce 8800 GT, 112 Stream Proc., 512 MB
2008: GeForce 9800 GTX, 128 Stream Proc., 512 MB; GeForce 9800 GX2, 256 Stream Proc., 1 GB; GeForce 9800 GT, 64 Stream Proc., 512 MB

CPU vs. GPU speedup timeline

GeForce 8800 GTX: 575 MHz * 128 processors * 2 flop/inst * 2 inst/clock = 333 Gflops
CUDA
Simple CUDA example
Basic idea of GRAPE/GPU N-body code:

[Schematic: particle i interacts with every particle j of mass m_j; N bodies in total.]

Host (~N): \vec{a}_i = \sum_{j=1;\, j \neq i}^{N} \vec{f}_{ij}

GPU (~N^2): \vec{f}_{ij} = -G \cdot \frac{m_j}{(r_{ij}^2 + \varepsilon^2)^{3/2}} \cdot \vec{r}_{ij}
Basic idea of any parallel N-body code:
• i-particle parallelization: each process owns N_loc = N / N_proc i-particles
• j-particle parallelization: each process owns N_loc = N / N_proc j-particles
• (i, j)-particle parallelization: N_loc = N / N_proc in each direction
Some communication scheme is needed in every case...
GPU N-body speedup timeline (2007/02, 2007/03, 2007/06, 2007/11), all on the same GPU: 8800 GTX (G80)
Hamada et al. 2008: direct GPU code
GPU N-body gravity: GRAPE6a vs. 8800 GTS 512 MB (G92) results
Our own GRAPE/GPU N-body code:
• Hierarchical individual block time steps
• 4th-order Hermite scheme: \frac{d^2 \vec{r}_i}{dt^2} = \vec{a}_i

ftp://ftp.ari.uni-heidelberg.de/pub/staff/berczik/phi-GRAPE/
Harfst et al., NewA, 12, 357 (2007) [astro-ph/0608125]
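The predictor of the 4th-order Hermite scheme can be sketched as follows (1-D, hypothetical names; the corrector step, which re-expands with the newly evaluated acceleration and its derivative, is omitted):

```c
#include <assert.h>
#include <math.h>

/* Hermite predictor: Taylor expansion of position and velocity
   using the acceleration a and its time derivative ("jerk") jrk
   known at the old time level. */
static void hermite_predict(double dt, double r, double v,
                            double a, double jrk,
                            double *rp, double *vp)
{
    *rp = r + v*dt + a*dt*dt/2.0 + jrk*dt*dt*dt/6.0;
    *vp = v + a*dt + jrk*dt*dt/2.0;
}
```

For a trajectory with constant jerk the prediction is exact, which is what makes block-time-step extrapolation of inactive particles cheap and accurate.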
[Figure: relative energy error ΔE/E vs. particle number N = 1k…128k; Plummer: G = M = 1, ETOT = -1/4, ε = 4/N, η = 0.01; curves: CPU, SSE, SSE-2core, SSE-4core, GPU, G6a.]
[Figure: sustained speed (Gflops) vs. N for the same runs and hardware.]
GPU results (Nitadori, Berczik et al. 2007.11)
[Figure: relative errors of potential |(φ-φ_cpu)/φ_cpu|, acceleration |(a-a_cpu)/a_cpu|, and jerk |(j-j_cpu)/j_cpu| vs. radius r = 0.01…100 [NB units]; Plummer, N = 64k, G = M = 1, Etot = -1/4; curves: SSE CPU, GRAPE6a, MPRACE1, GPU.]
GPU 4th vs. 6th order results:
Jun Makino: TREE+GRAPE/GPU code
Makino, PASJ, 43, 621 (1991): interaction list built on host ~N; interaction lists are short…
Makino, PASJ, 56, 521 (2004); Fukushige, Makino & Kawai, PASJ, 57, 1009 (2005): one interaction list is shared among N_GR particles! Interaction list on host ~N/N_GR; lists are longer…
Parallel TREE GPU gravity
Hamada et al. 2008: TREE+GRAPE/GPU code
Simple GPU SPH code
[Schematic: each particle i loops over its N_B neighbours j out of N particles in total.]
SPH speedup with GPU
[Figure: time fractions (NB, GRAV, SPH) vs. N = 1k…1024k, for SPH on CPU and for SPH on GPU.]
[Figure: ΔT for one shared timestep vs. N = 1k…1024k: SPH on CPU, MPRACE-1, and GPU, with the CPU/MPRACE-1 and CPU/GPU ratios.]
SPH astrophysical results:
TREE-GRAPE + MPRACE (on 4 nodes)
• M = 2000 M_sun; R = 3 pc; fully molecular (H_2)
• Isothermal evolution; initial density distribution ~1/r
• T = 20 K (c_sound = 0.3 km/s)
• V_merge = 5 km/s
• Calculation time: 3 t_ff = 6 Myr
• Resolution: h_min = 1e-4 pc
• SPH MPRACE/CPU speedup ~10; total GRAPE+MPRACE/CPU speedup ~15

N = 2x4k: DT_CPU = 52 min
2x8k: 1.74 hours
2x16k: 3.5 hours
2x32k: 6.9 hours
2x64k: 14 hours
* 2x128k: 28 hours
2x256k: 55 hours
2x512k: 111 hours
[Figure: projected particle distribution, X vs. Y in pc (-6…+6).]
SPH astrophysical results: