Scientific Supercomputing Center Karlsruhe
HP XC4000 at SSCK
xc2.rz.uni-karlsruhe.de / hwwxc2.hww.de
Organization – Infrastructure – Architecture
Nikolaus Geers, Rudolf Lohner
Rechenzentrum, SSCK, Universität Karlsruhe (TH)
[email protected], [email protected]
hkz-bw
» High Performance Computing Competence Center of the State of Baden-Württemberg
» Founded in 2003 by the Universities of Karlsruhe and Stuttgart; the University of Heidelberg joined in 2004
» Coordinate HPC competency to build a center that is competitive at an international level
» HPC system for nation-wide usage: HLRS
» HPC system for state-wide usage: SSCK
» Grid Computing across both sites
» Research activities
– Cooperate with end users in the development of new HPC applications
• Life sciences
• Environmental research
• Energy research
– Grid computing
hww: Cooperation with Industry
» Höchstleistungsrechner für Wissenschaft und Wirtschaft (hww) GmbH
– High Performance Computing for Science and Industry
» Joint operation and management of HPC systems within hww GmbH
– Universities of Stuttgart, Karlsruhe and Heidelberg
– T-Systems SfR
– Porsche
» End user support
– academic users: HLRS / SSCK
– industry and research labs: T-Systems
» Through hww the new HP XC system will be available to customers from universities as well as from industry and research labs.
High Performance Computing Competence Center (HPTC3)
» Cooperation of SSC Karlsruhe, HP and Intel
– A similar cooperation is planned with AMD
» Extending the XC system and testing new features
– Integration of XC and Lustre
– Integration of different node types into the XC system
– High availability of critical resources
– Monitoring
» Training and Education
– Usage of XC system
– Optimization and tuning of application codes
» Porting and tuning of ISV codes
» Program development tools
Development of HPC Systems xc1 and xc2 at SSCK
– Q1/04: Phase 0, Landes-HLR (xc1)
– Q4/04 and Q1/05: Phases 1 and 1a, Landes-HLR (xc1)
– Q3/06: Start of installation, Phase 2 Landes-HLR (xc2)
– Nov./Dec. 06: Start of test operation, Phase 2 Landes-HLR
– 15.1.2007: Start of production operation, Phase 2 Landes-HLR
– Q1/07: xc1 becomes the HLR of the University; shutdown of the IBM SP
HP XC – Installation Schedule (Phase 2)
Phase 0 (Q1 2004)
» 12 2-way nodes (Intel Itanium 2)
» 4 file server nodes
– 2 TB shared storage
» Single-rail Quadrics interconnect
Phase 1 (Q4 2004)
» 108 2-way nodes
– Intel Itanium 2
» 8 file server nodes
– Approx. 11 TB storage system
» Single-rail Quadrics interconnect
Phase 1a (Q1 2005)
» 6 16-way nodes
– Intel Itanium 2
– 2 partitions with 8 CPUs each
» Single-rail Quadrics interconnect
Phase 2 (Q3 2006)
» 750 4-way nodes
– two sockets with dual-core AMD Opteron, 2.6 GHz, 16 GB memory per node
» 10 server nodes
» InfiniBand DDR interconnect
» 56 TB storage system
» Total of 3000 processor cores
» Total of 15.6 TFlop/s peak performance
» Total of 12 TB of main memory
» Test system: ~300 processors | ~2 TFlop/s | ~2 TB memory
Time Schedule of xc2
» September 2006
– Delivery and assembly of racks
» October 2006
– Cabling of admin network and InfiniBand interconnect
– Software installation
– First internal testing
» November 2006
– Further internal testing
– Early 'friendly' users
– Start of acceptance test
» January 2007
– End of acceptance test
» January 15, 2007
– Start of production service
Challenges: Room Layout
[Floor plan: 20 numbered compute-node racks with MCS cooling units, plus racks for the InfiniBand edge switches IBE1–IBE5, the InfiniBand core switches IBR1–IBR3, and the SFS storage]
» 20 racks for compute nodes
» 8 racks for network switches
» Maximum cable length for IB DDR is 8 m
Challenges: Cabling
[Floor plan: same rack layout, annotated for the InfiniBand cabling]
» Cable ducts on top of racks for InfiniBand cables
Challenges: Cabling
» Cable ducts on top of racks
» Cable ducts under raised floor
[Floor plan: same rack layout, showing the routing of cable ducts on top of the racks and under the raised floor]
Challenge: Cooling
» HP Modular Cooling System added to each rack
HP XC4000 at a Glance
xc2.rz.uni-karlsruhe.de / hwwxc2.hww.de
HP XC4000@SSCK: The Key Figures
» 750 four-way compute nodes
– 3000 cores
» 2 eight-way login nodes
» 10 service nodes
» 10 file server nodes
» InfiniBand DDR interconnect
» 15.6 TFlop/s peak performance
» 12 TB main memory
» 56 TB shared storage
» 110 TB local storage
xc2.rz.uni-karlsruhe.de
Compute Nodes for MPI Applications
» 750 four-way nodes HP DL145 G2
– Two dual-core CPUs
• 2.6 GHz, 5.2 GFlop/s per core
• 1 MB L2 cache per core
» 16 GB main memory per node
– 4 GB per core
» 146 GB local disk space
» Fast InfiniBand DDR interconnect
– Latency: ~3 µsec
– Bandwidth: 1600 MB/s at application (MPI) level
» Parallel MPI applications, up to O(1000) tasks
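A minimal sketch of how such an application is typically built and started (assuming the mpicc wrapper of HP MPI and the SLURM launcher srun; the local job_submit front end is described on the software environment slide, and the source and program names are only examples):

    # build the MPI program with the compiler selected via the Modules environment
    mpicc -O2 -o flow flow.c
    # start 256 MPI tasks spread over 64 of the four-way nodes
    srun -N 64 -n 256 ./flow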
HP DL145 G2 Block Diagram
[Block diagram: two Opteron CPUs, each with PC3200 DDR1 memory at 400 MHz (6.4 GB/s), connected by a HyperTransport link; further HT links attach PCI Express (InfiniBand, 2.0 GB/s) and the peripherals]
AMD Dual-Core Opteron Processor
[Block diagram: two cores, each with 64 KB I-cache, 64 KB D-cache and 1 MB L2 cache, connected via the system request queue and crossbar to the integrated memory controller (two 64-bit channels) and three HyperTransport links]
InfiniBand DDR Network
» InfiniBand DDR Network
» Full fat tree structure
» 2 GB/s peak bandwidth (bidirectional)
» 3 µsec latency
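As a rough model, the time for a point-to-point message is t ≈ latency + size / bandwidth; with ~3 µsec latency and 1600 MB/s at MPI level (see the compute node slide), a 1 MB message takes about 3 µsec + 1 MB / (1600 MB/s) ≈ 0.63 ms, so bandwidth dominates for large messages while latency dominates below a few kilobytes.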
InfiniBand DDR Network
[Fat-tree topology diagram]
» 750 compute nodes, 10 service nodes, 10 file server nodes
» 65 leaf switches, 3 core switches
InfiniBand DDR Network – SKaMPI: non-duplex ping-pong
[Plot: bandwidth (MB/s, up to ~1800) vs. message length (0–100,000,000 B) for one, two and four MPI processes per node]
InfiniBand DDR Network – SKaMPI: non-duplex ping-pong
[Plot: bandwidth (MB/s, up to ~1800) vs. message length (0–100,000 B) for one, two and four MPI processes per node]
InfiniBand DDR Network – SKaMPI: duplex ping-pong
[Plot: total bandwidth (MB/s, up to ~3000) vs. message length (0–10,000,000 B) for one, two and four MPI processes per node]
InfiniBand DDR Network – SKaMPI: duplex ping-pong
[Plot: total bandwidth (MB/s, up to ~3000) vs. message length (0–100,000 B) for one, two and four MPI processes per node]
Login Nodes
» 2 eight-way nodes HP DL585 G2
– Four dual-core CPUs
• 2.6 GHz, 5.2 GFlop/s per core
• 1 MB L2 cache per core
» 32 GB main memory per node
– 4 GB per core
» 292 GB local disk space
» Interactive access
– File management, job submission
– Program development (compilation, short test runs, etc.)
– Debugging
– Pre- and postprocessing
Parallel File System HP SFS
» Shared storage for all nodes of the XC system
» 10 file server nodes
– 2 MDS / admin nodes
– 2 OSS for $HOME
– 6 OSS for $WORK
» 56 TB file space
– 8 TB $HOME
– 48 TB $WORK
» Expected bandwidth
– Read / write from one node: 340 MB/s / 340 MB/s
– Total read / write bandwidth of $HOME: 600 MB/s / 360 MB/s
– Total read / write bandwidth of $WORK: 3600 MB/s / 2200 MB/s
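As a rough cross-check of the single-node figures, a large sequential write to $WORK can be timed from one node; a simple sketch (not the acceptance benchmark, the file name is arbitrary):

    # write 4 GB sequentially to the Lustre-based $WORK and let dd report the rate
    dd if=/dev/zero of=$WORK/bwtest.$$ bs=1M count=4096 conv=fdatasync
    rm -f $WORK/bwtest.$$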
Characteristics of File Systems
» $HOME
– shared, i.e. identical view on all nodes
– permanent files
– regular backup
– file space limited by quotas
– used for many small files
» $WORK
– shared, i.e. identical view on all nodes
– semi-permanent files, one-week lifetime
– best suited for large files and sequential file access
» $TMP
– local, i.e. different nodes see different $TMP
– temporary files, discarded at job end
– best suited for temporary scratch files
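A sketch of how the three file systems are typically combined in a batch job (paths and program names are illustrative only):

    #!/bin/bash
    # stage input from the shared $WORK area to the fast node-local $TMP
    cp $WORK/myproject/input.dat $TMP/
    cd $TMP
    # run the application; scratch files stay on the local disk
    ./mysolver input.dat > output.log
    # save results to $WORK before the job ends ($TMP is discarded afterwards);
    # only small, permanent files belong in the quota-limited $HOME
    cp output.log $WORK/myproject/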
Software Environment
» HP XC version 3.0 software stack
– HP XC Linux for HPC (based on Red Hat Enterprise Linux Advanced Server Version 3.0)
– nagios, syslog-ng, …
– SLURM, local add-on JMS (job_submit …)
– HP MPI
– Modules package (module add …)
» HP SFS file system (based on Lustre)
» Compilers
– GNU, Intel, PGI, PathScale
» Debuggers
– gdb, ddt
» Profilers
» Applications
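A few commands that exercise the main parts of this stack from a login node (a hedged sketch; the options of the local job_submit wrapper are not shown here):

    module list          # Modules package: show the currently loaded environment
    sinfo                # SLURM: state of the partitions and nodes
    squeue -u $USER      # SLURM: own pending and running jobs
    srun -n 4 hostname   # SLURM: short interactive test run with 4 tasks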
XC2 in Comparison with XC1
» Identical software environment
» Different processor architecture
» 4-way nodes instead of 2-way nodes
» Similar ratio of
– Memory size : floating point performance
– Communication bandwidth : floating point performance
» Number of CPUs (cores) increased by factor of 10
– Much larger jobs, O(1000) MPI processes
• Fine-grain parallelization
• Finer resolution
– More jobs in parallel
XC2 in Comparison with XC1
[Bar chart comparing xc2 with xc1 for application codes; values: SPARC 1.21, PLESOCC 1.47, IMD 1.07, METRAS 3.44, FDEM/LINSOL 1.10]
Early ‘friendly’ users
» You will help us to stabilize and improve the system.
» You may get a lot of CPU cycles for your research work.
» But
– We cannot guarantee the high stability of a production system.
– We may have to shut down the system without warning.
– Not all software components may work as desired.
– Scalability of some tools may be a problem.
» If you can work with these restrictions and want to become an early user of the xc2, please send an email to
[email protected]karlsruhe.de
Compilers on XC2
» GNU compilers and third-party compilers
Compilers and module command
» module add compiler
– where compiler stands for: gnu/3, gnu/4, intel, pgi or pathscale
» Environment variables modified by this command:
– PATH
– LD_LIBRARY_PATH
– MANPATH
– FC, F77, F90, CC, CXX
– MPI_F77, MPI_F90, MPI_CC, MPI_CXX
– CFLAGS, FFLAGS
– ACMLPATH
– some compiler-specific variables, e.g. LM_LICENSE_FILE
» By default, the command module add intel is executed during login.
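A minimal interactive sketch of changing the compiler environment (assuming the standard module list/rm subcommands are available in addition to module add):

    module list        # intel is loaded by default at login
    module rm intel    # unload the default compiler environment
    module add pgi     # load the PGI compilers instead
    echo $FC $CC       # now points to pgf77 and pgcc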
module add intel
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = ifort
– F77 = ifort, MPI_F77 = ifort
– F90 = ifort, MPI_F90 = ifort
– CC = icc, MPI_CC = icc
– CXX = icpc, MPI_CXX = icpc
– CFLAGS, FFLAGS and ACMLPATH are set accordingly
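Because the module exports these variables, build commands can stay compiler-independent; a small sketch (source file names are only examples):

    # uses whatever compiler module is currently loaded
    # (ifort/icc after 'module add intel', pgf90/pgcc after 'module add pgi', ...)
    $F90 -O2 -c solver.f90
    $CC  -O2 -c io.c
    $F90 -o solver solver.o io.o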
module add pgi
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = pgf77
– F77 = pgf77, MPI_F77 = pgf77
– F90 = pgf90, MPI_F90 = pgf90
– CC = pgcc, MPI_CC = pgcc
– CXX = pgCC, MPI_CXX = pgCC
– CFLAGS, FFLAGS and ACMLPATH are set accordingly
module add pathscale
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = pathf77
– F77 = pathf77, MPI_F77 = pathf77
– F90 = pathf90, MPI_F90 = pathf90
– CC = pathcc, MPI_CC = pathcc
– CXX = pathCC, MPI_CXX = pathCC
– CFLAGS, FFLAGS and ACMLPATH are set accordingly
module add gnu/3
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = g77
– F77 = g77, MPI_F77 = g77
– F90 and MPI_F90 are left empty (GCC 3 provides no Fortran 90 compiler)
– CC = gcc, MPI_CC = gcc
– CXX = g++, MPI_CXX = g++
– CFLAGS, FFLAGS and ACMLPATH are set accordingly
module add gnu/4 or module add gnu
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = gfortran
– F77 = gfortran, MPI_F77 = gfortran
– F90 = gfortran, MPI_F90 = gfortran
– CC = gcc, MPI_CC = gcc
– CXX = g++, MPI_CXX = g++
– CFLAGS, FFLAGS and ACMLPATH are set accordingly