www.idris.fr Institut du développement et des ressources en informatique scientifique
IDRIS Site Update Pascal Voury, User Support Team
SPXXL / ScicomP – Lugano, May 2013
IDRIS Location
History of Supercomputing at IDRIS
• Parallel scalar systems
− Cray T3D (1995)
− Cray T3E (1996)
− IBM SP3 (2001)
− IBM P4 (2002)
− IBM P4+ (2003)
− IBM BG/P (2008)
− IBM P6 (2009)
− IBM BG/Q
− IBM x3750
• Vector systems
− Cray C98 (1993)
− Cray C94 (1994)
− Fujitsu VPP300 (1997)
− NEC SX-5 (2000)
− NEC SX-8 (2006)
RIP Feb 2012
[Figure: Performance evolution. Vertical axis: performance (Gflops), log scale from 0.1 to 100,000,000. Series: Number 1, Number 500, IDRIS (sum), Earth Simulator.]
Slide provided by M.A. Foujols, IPSL, CNRS
Turing, IBM BG/Q
• 4 racks
• 64 nodes per I/O node
• Everything is then proportional: 65 TB memory, 65536 cores, 836 Tflops, etc.
• 2.2 PB disc shared with the x3750 (bandwidth theoretically saturated by 25 I/O nodes: 50 GB/s) in 5 DDN SFA10K cabinets.
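The "everything is then proportional" remark can be checked with a few lines of arithmetic. The sketch below is only illustrative: the rack count, the 64:1 node to I/O node ratio, and the 50 GB/s over 25 I/O nodes come from the slide, while the per-rack and per-node figures (1024 nodes per rack, 16 compute cores and 16 GB per node) are assumed from the usual BG/Q packaging and are not stated in the talk. It also reads the bandwidth remark as "25 I/O nodes are enough to reach 50 GB/s", which is an interpretation.

```python
# Back-of-the-envelope sizing for Turing (IBM BG/Q).
RACKS = 4                  # from the slide
NODES_PER_IO_NODE = 64     # from the slide
DISK_BW_GB_S = 50          # from the slide (theoretical)
IO_NODES_TO_SATURATE = 25  # from the slide

NODES_PER_RACK = 1024      # ASSUMPTION: usual BG/Q packaging
CORES_PER_NODE = 16        # ASSUMPTION: compute cores per node
MEM_GB_PER_NODE = 16       # ASSUMPTION: memory per node

nodes = RACKS * NODES_PER_RACK
cores = nodes * CORES_PER_NODE
memory_tb = nodes * MEM_GB_PER_NODE / 1000   # decimal TB
io_nodes = nodes // NODES_PER_IO_NODE
bw_per_io_node = DISK_BW_GB_S / IO_NODES_TO_SATURATE  # GB/s per I/O node

print(f"compute nodes : {nodes}")
print(f"cores         : {cores}")
print(f"memory        : {memory_tb:.1f} TB")
print(f"I/O nodes     : {io_nodes}")
print(f"GB/s per I/O node at saturation: {bw_per_io_node:.1f}")
```

Under these assumptions the core count and memory match the slide's 65536 cores and 65 TB, and the machine has more I/O nodes than are needed to saturate the shared discs.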
Ada, IBM x3750-M4
• 332 nodes, each with 4 Sandy Bridge E5-4650 processors (8 cores @ 2.7 GHz)
• 28 nodes have 256 GB memory, all the others have 128 GB
• Roughly 49 TB memory and 233 Tflops
• 2 nodes for interactive login, with discs
• Plus 4 x3870-M5 Westmere nodes @ 2.67 GHz with 1 TB memory each (and discs) for pre- and post-processing
• GPFS 3.5, LoadLeveler 5.1, poe 1.2.12
• Mellanox InfiniBand FDR10 with a 648-port switch and a second level of 36-port switches (each node has 2 links to 2 switches)
• New for us: diskless nodes, optimization of the memory requirement for the OS image. Same HW, different SW stack. And another different set for post-processing.
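As a rough cross-check of the Ada figures, the sketch below totals cores, memory and peak performance for the 332 compute nodes only. Node counts, memory sizes and clock speed are taken from the slide; the 8 double-precision flops per cycle per core is an assumption (the usual AVX figure for Sandy Bridge), not something stated in the talk.

```python
# Rough totals for the Ada compute partition (IBM x3750-M4).
NODES = 332                # from the slide
SOCKETS_PER_NODE = 4       # from the slide
CORES_PER_SOCKET = 8       # from the slide
LARGE_NODES = 28           # from the slide: nodes with 256 GB
LARGE_MEM_GB = 256         # from the slide
STANDARD_MEM_GB = 128      # from the slide
CLOCK_GHZ = 2.7            # from the slide
FLOPS_PER_CYCLE = 8        # ASSUMPTION: AVX double precision per core

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
memory_gb = (LARGE_NODES * LARGE_MEM_GB
             + (NODES - LARGE_NODES) * STANDARD_MEM_GB)
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"cores  : {cores}")
print(f"memory : {memory_gb / 1000:.1f} TB")
print(f"peak   : {peak_tflops:.1f} Tflops")
```

These compute-node totals come out slightly below the slide's rounded "49 TB and 233 Tflops"; one possible explanation is that the slide also counts the login and pre/post-processing nodes, but the talk does not say so.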
Infrastructure
The good, the bad, the ugly
From a User Support point of view:
• The good: BG/Q is pretty stable, as BG/P was. Surprised to learn that IBM would not correct the bugs in the software stack provided.
• The « not so good » for BG/Q: hardware problems not detected by IBM (NaNs, QCD code); change of I/O performance strategy vs. BG/P.
• The bad for x3750:
− RDMA engine halting the whole configuration: seems to have been solved for the past 2 weeks, thanks to Mellanox expertise.
− Latency of support for Intel problems. Lack of experience on our side (Power for 12 years!). For example: how to limit the RSS memory taken by an OpenMP job?
• The ugly for x3750: poe environment mandatory for performance on our Intel platform. Could be OK, but we still have bugs with poe that we don’t have using Intel MPI (Burning Issue).
E5 4650 internal architecture :
Political Developments
• IDRIS is no longer buying its own computers for the CNRS: GENCI does.
GENCI stands for Grand Equipement National pour le Calcul Intensif. It is 49% owned by the French State, represented by the Ministry for Higher Education and Research, 20% by CEA, 20% by CNRS, 10% by the Universities and 1% by INRIA. Created in 2007, GENCI provides funding and assumes ownership. It also promotes the organization of a European HPC area and takes part in its achievements; GENCI is the French representative in PRACE.
• IBM did not promote its Power architecture
• No clear visibility yet on BG’s future
• Why would GENCI still buy from IBM for an Intel-based computer?
Future Technical Developments
• Archive: robot is fine. Do we need a new design for our system?
− HSM on one of the supercomputers
− « Classical » design, because of a limited budget, with a disc cache as big as possible; SSD ruled out because of the price
• Currently used as backup for result files in a batch; should it evolve into a pure archive system?
− 2.2 PB disc WORKDIR on the computers
− Increase capacity, even at the cost of increased latency.
Grand Challenges on Ada
• DRAKKAR, Climatology: NEMO ocean model, 5 Mh (4000 cores)
ORCA12 domain with its 3600 subdomains
• DEUS-PUR, Astrophysics : 4 Mh (64 to 9000 cores)
• SELTRAN, Molecular Dynamics: custom GROMACS, 1.5 Mh (64 cores)
• LIQSIM, ab initio Molecular Dynamics: CP2K, 0.3 Mh (512 cores)
Non-aqueous ionic solution
GROMACS patched with PLUMED plug-in: performance
Grand Challenges on Turing
• PrecLQCD, lattice QCD: 31.5 Mh (4 racks)
• StabMat, QCD & QED, proton weight: 30 Mh (2 racks to 4 racks, Juelich)
• BigDFT, ab initio, Li-ion batteries: 16.6 Mh
• MesoNH, Tiwi Island « Hector » storm : 13 Mh (1 rack, Global Array I/Os)
• ZoomBHA, astrophysics (Black Hole Accretion): 12 Mh
• GYSELA, Tokamak plasmas: 11 Mh (100 000 threads, 4 racks)
Fluctuations of electrostatic potential when turbulence starts in plasma
• MHDTURB, RAMSES magnetohydrodynamics code: 11 Mh (2 racks)
• APAFA, AVBP LES combustion : 10 Mh (2 racks, I/O perf. problems)
• ECOPREMS, LES combustion with stratification: 9.3 Mh
400 million tetrahedra
50 million tetrahedra
Our typical Workload
2012 figures for the decommissioned computers, very few changes (except political choices)
• x3750: 246 projects, 1100 individual users, 60 Mh allocated for 2013.
26 million hours on Power6 in 2012
• BG/Q : 90 projects, 400 users, 297 Mh allocated for 2013.
245 Million hours on BG/P in 2012