Lawrence Livermore National Laboratory
Thomas Spelce
Development Environment Group
LLNL-PRES-411030
Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
Sequoia and the Petascale Era
SCICOMP 15
May 20, 2009
Lawrence Livermore National Laboratory, 10th International LCI Conference, LLNL-PRES-411030
The Advanced Simulation and Computing (ASC) Program delivers high confidence prediction of weapons behavior
Integrated Codes
Physics and Engineering Models
Verification and Validation
Codes to predict safety and reliability
Models and understanding
NNSA Science Campaigns
Experiments
Legacy UGTs
Experiments provide critical validation data
ASC integrates all of the science and engineering that makes stewardship successful
ASC pursued three classes of systems to cost effectively meet current (and anticipate future) compute requirements
Capability systems ==> the most challenging integrated design calculations
• More costly but proven
• Production workload
Capacity systems ==> day to day work
• Less costly, somewhat less reliable
• Throughput for less demanding problems
Advanced Architectures ==> performance, power consumption, etc.
• Targeted but demanding workload
• Tomorrow’s mainstream solutions?
The “three curves” (Capability, Capacity and Advanced Architectures) approach has been successful in delivering good cost performance across the spectrum of need…
[Figure: Performance versus Time (FY01 and FY05 marked) for the three curves. Systems plotted: Red, Blue, White, Q, Purple, MCR, Thunder, Peloton, TLCC (Juno), BlueGene/L, Roadrunner, Sequoia. Annotations: “Mainframes (RIP)”; “Original concept: develop capability”; “Low-cost capacity”; “Higher performance and lower power consumption”.]
Sequoia represents the largest increase in computational power ever delivered for NNSA Stockpile Stewardship
[Timeline figure spanning 1/06 through 12/12, in half-year ticks. Milestones named, in extraction order rather than confirmed chronological order:
• Market Survey
• CD0 Approved
• CD1 Approved, Selection
• Contract Package
• Sequoia Plan Review
• Dawn Early Science, Transition to Classified, Dawn GA
• Write RFP
• Sequoia Build Decision
• Sequoia Parts Commit & Option, Sequoia Parts Build
• Sequoia Early Science, Transition to Classified, Sequoia Operational Readiness
• CD4 Approved
• Sequoia Five Years Planned Lifetime Through CY17
• Sequoia contract award
• Phased System Deliveries
• Sequoia final system acceptance
• Sequoia Demo
• Dawn Phase 1, Dawn Phase 2
• Dawn system acceptance
• Vendor Response
• CD2/3 Approved
• Dawn LA]
“Dawn speeds a man on his journey, and speeds him too in his work” ...Hesiod (~700 B.C.E)
Dawn Specifications
• IBM BG/P architecture
• 36,864 compute nodes (500TF)
• 147,456 PPC 450 cores
• 4GB memory per node (147.5TB)
• 128-to-1 compute to I/O node ratio
• 288 10GE links to file system
Dawn Installation
• Feb 27th - final rack delivery
• March 5th - 36 Rack integration complete
• March 15-24th - Synthetic WorkLoad start
• End of March - Acceptance (planned)
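The Dawn specification figures are mutually consistent, and a few lines of arithmetic show how. The sketch below uses only numbers quoted on the slide; the variable names are illustrative, and the cores-per-node value is derived from the quoted totals rather than stated directly.

```python
# Figures quoted on the Dawn specifications slide.
COMPUTE_NODES = 36_864
TOTAL_CORES_QUOTED = 147_456
MEM_PER_NODE_GB = 4
COMPUTE_TO_IO_RATIO = 128      # compute nodes served by one I/O node

# Derived quantities (not stated on the slide, computed from the above).
cores_per_node = TOTAL_CORES_QUOTED // COMPUTE_NODES
total_mem_gb = COMPUTE_NODES * MEM_PER_NODE_GB
io_nodes = COMPUTE_NODES // COMPUTE_TO_IO_RATIO

print("cores per node:", cores_per_node)
print("total memory, decimal TB:", total_mem_gb / 1000)
print("total memory, binary TB:", total_mem_gb / 1024)
print("I/O nodes:", io_nodes)
```

The I/O node count that falls out of the 128-to-1 ratio matches the 288 10GE links, which suggests one link per I/O node. The decimal memory total rounds to the 147.5TB on the slide, while the binary total is 144 TB.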
ibm.com/systems/deepcomputing/bluegene/
DAWN: SEQUOIA Initial Delivery
Chip
• 850 MHz PPC 450
• 4 cores/4 threads
• 13.6 GF/s Peak
• 8 MB EDRAM
Compute Card
• 13.6 GF/s
• 4.0 GB DDR2
• 13.6 GB/s Memory BW
• 0.75 GB/s 3D Torus BW
Node Card
• 435 GF/s
• 128 GB
Rack
• 14 TF/s
• 4 TB
• 36 KW
System
• 36 racks
• 0.5 PF/s
• 144 TB
• 1.3 MW
• >8 Day MTBF
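The packaging levels (chip, compute card, node card, rack, system) roll up by simple multiplication. The sketch below reproduces the slide's figures under two labeled assumptions: each core retires 4 floating point operations per cycle (the value that makes 850 MHz x 4 cores equal the quoted 13.6 GF/s), and the fan-out at each level is inferred from the quoted memory sizes rather than stated on the slide.

```python
CLOCK_GHZ = 0.85
CORES_PER_CHIP = 4
FLOPS_PER_CYCLE_PER_CORE = 4    # assumption: reproduces the quoted 13.6 GF/s peak
MEM_PER_CARD_GB = 4
RACKS = 36
RACK_POWER_KW = 36

# Fan-out inferred from the quoted memory sizes (assumption, not on the slide).
CARDS_PER_NODE_CARD = 128 // MEM_PER_CARD_GB        # 128 GB node card / 4 GB card
NODE_CARDS_PER_RACK = (4 * 1024) // 128             # 4 TB rack / 128 GB node card

chip_gflops = CLOCK_GHZ * CORES_PER_CHIP * FLOPS_PER_CYCLE_PER_CORE
node_card_gflops = chip_gflops * CARDS_PER_NODE_CARD
rack_tflops = node_card_gflops * NODE_CARDS_PER_RACK / 1000
system_pflops = rack_tflops * RACKS / 1000

nodes_per_rack = CARDS_PER_NODE_CARD * NODE_CARDS_PER_RACK
system_nodes = nodes_per_rack * RACKS
system_mem_tb = system_nodes * MEM_PER_CARD_GB / 1024
system_power_mw = RACKS * RACK_POWER_KW / 1000

print(f"chip {chip_gflops:.1f} GF/s, node card {node_card_gflops:.1f} GF/s")
print(f"rack {rack_tflops:.1f} TF/s, system {system_pflops:.3f} PF/s")
print(f"{system_nodes} nodes, {system_mem_tb:.0f} TB, {system_power_mw:.2f} MW")
```

Under these assumptions the roll-up lands on the slide's rounded values: about 435 GF/s per node card, about 14 TF/s per rack, and about 0.5 PF/s, 144 TB and 1.3 MW for the 36-rack system.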
DAWN Initial Delivery Infrastructure
[Network diagram. Components named: Dawn Core (9 x 4 BG/P Racks); three SERVICE nodes (Primary and Backup labels shown); HMC; LOGIN; HTC; Local Disk; E-net Core; LLNL. Link labels as extracted: 288 – 10GbE; 144 – 1GbE; 14 – 1GbE (twice); 12 – 10GbE; 8 x 4 – 10GbE; 4 x 4 – 1GbE; 3 – 1GbE; 2 – 1GbE (three times); 2 – 10GbE (three times); 1 – 10GbE; 2 – FC4 (four times).]
Sequoia Target Architecture and Infrastructure
Production Operation FY12-FY17
• 20PF/s, 1.6 PB Memory
• 96 racks, 98,304 nodes
• 1.6 M cores (1 GB/core)
• 50 PB Lustre file system
• 6.0 MW power (160 times more efficient than Purple)
Will be used as a 2D ultra-res and 3D high-res Uncertainty Quantification (UQ) engine
Will be used for 3D science capability runs exploring key materials science problems
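The Sequoia targets imply several per-node and per-watt ratios that the slide does not spell out. The sketch below derives them from the quoted figures only. The Purple peak of 100 TF/s comes from a later slide in this deck, and the Purple power figure it yields is an implied value from the "160 times" claim, not a measured one.

```python
PEAK_PFLOPS = 20.0
MEMORY_PB = 1.6
RACKS = 96
NODES = 98_304
CORES_MILLIONS = 1.6                 # rounded figure from the slide
POWER_MW = 6.0
EFFICIENCY_GAIN_VS_PURPLE = 160
PURPLE_PEAK_TFLOPS = 100.0           # quoted later in the deck

nodes_per_rack = NODES // RACKS
cores_per_node = CORES_MILLIONS * 1e6 / NODES
gb_per_core = (MEMORY_PB * 1e6) / (CORES_MILLIONS * 1e6)

# 1 PF/s = 1e6 GF/s and 1 MW = 1e6 W, so the unit factors cancel.
gflops_per_watt = PEAK_PFLOPS / POWER_MW

# Implied by the "160 times more efficient" claim (derived, not measured).
implied_purple_gflops_per_watt = gflops_per_watt / EFFICIENCY_GAIN_VS_PURPLE
implied_purple_mw = (PURPLE_PEAK_TFLOPS * 1e3) / implied_purple_gflops_per_watt / 1e6

print("nodes per rack:", nodes_per_rack)
print(f"cores per node: {cores_per_node:.2f}")
print(f"GB per core: {gb_per_core:.2f}")
print(f"Sequoia GF/s per watt: {gflops_per_watt:.2f}")
print(f"implied Purple power: {implied_purple_mw:.1f} MW")
```

The cores-per-node ratio comes out slightly above 16 because "1.6 M" is rounded; 16 cores on each of 98,304 nodes would give 1,572,864 cores.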
High performance material science simulations will contribute directly to ASC programmatic success
Six physics/materials science applications targeted for early implementation on Sequoia infrastructure
• Qbox – Quantum molecular dynamics for determination of material equation of state
• DDCMD – Molecular dynamics for material dynamics
• Miranda – 3D Continuum fluid dynamics for interfacial mixing
• ALE3D – 3D Continuum mechanics for ignition and detonation propagation of explosives
• LAMMPS – Molecular dynamics for shock initiation in high explosives
• ParaDiS – Dislocation dynamics for high pressure strength in materials
Single Sequoia Platform
Mandatory Requirement is P ≥ 20
• P is “peak” of the machine measured in petaFLOP/s
Target requirement is P + S ≥ 40
• S is weighted average of five “marquee” benchmark codes
• Four code package benchmarks
  − UMT, IRS, AMG, and SPhot
  − Program goal is 24x the Purple capability throughput
• One “science workload” benchmark from SNL
  − LAMMPS (molecular dynamics)
  − Program goal is 20x-50x BGL for science capability
Reference systems: Purple - 100 TF/s; BlueGene/L - 367 TF/s
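The two requirements can be expressed as a small check. The sketch below is illustrative only: the slide does not give the benchmark weights or the units of the individual scores, so the equal weights and the sample scores here are hypothetical placeholders, and `weighted_average` and `check_requirements` are names invented for this example.

```python
MANDATORY_PEAK = 20.0     # P >= 20 (petaFLOP/s)
TARGET_SUM = 40.0         # P + S >= 40

def weighted_average(scores, weights):
    """Weighted mean of benchmark scores; both dicts share the same keys."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

def check_requirements(peak, scores, weights):
    """Return S and whether the mandatory and target requirements are met."""
    s = weighted_average(scores, weights)
    return {
        "S": s,
        "mandatory_met": peak >= MANDATORY_PEAK,
        "target_met": peak + s >= TARGET_SUM,
    }

# Hypothetical inputs: these scores and equal weights are NOT from the slide.
sample_scores = {"UMT": 22.0, "IRS": 18.0, "AMG": 21.0, "SPhot": 19.0, "LAMMPS": 20.0}
equal_weights = {name: 1.0 for name in sample_scores}

result = check_requirements(20.0, sample_scores, equal_weights)
print(result)
```

Framing the target as P + S rather than P alone means a proposal cannot satisfy it on peak alone unless sustained benchmark performance is comparably high.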
Sequoia Operating System Perspective
Light weight kernel on compute nodes
• Optimized for scalability and reliability
• As simple as possible
• Extremely low OS noise
• Direct access to interconnect hardware
OS features
• Linux/Unix syscall compatible w/ I/O syscalls
• Support for dynamic lib runtime loading
• Shared memory regions
• Open source
Linux/Unix OS on I/O nodes
• Leverage large Linux/Unix base & community
• Enhance TCP offload, PCIe, I/O
• Standard File Systems - Lustre, NFSv4, etc.
• Aggregates N CN for I/O & admin
• Open source
[Diagram: software stacks for compute nodes 1-N and for the I/O node.
Compute node (CN) stack, drawn once per node: Application; MPI; GLIBC; NPTL Posix threads (Posix threads, OpenMP, SE/TM on the final panel); glibc dynamic loading; ADI; hardware transport; RAS; Futex; syscalls; Shared Memory; Function Shipped syscalls; SMP; Sequoia CN and Interconnect.
I/O node (ION) stack on Linux/Unix: FSD; Perf tools; totalview; Lustre Client; NFSv4; SLURMD; LNet; UDP; TCP/IP; Function Shipped syscalls; Sequoia ION and Interconnect.]
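The "function shipped syscalls" idea, where the lightweight kernel forwards I/O system calls to a Linux/Unix I/O node that serves N compute nodes, can be illustrated with a toy model. This is a conceptual sketch only, not the Sequoia or Blue Gene implementation: the class names, the tuple request format, and the in-memory file store are all hypothetical, and a real system would ship requests over the interconnect rather than through a method call.

```python
class IONode:
    """Toy stand-in for the Linux/Unix I/O node that owns the file system."""

    def __init__(self):
        self.files = {}      # path -> bytearray (in-memory toy file system)
        self.served = 0      # number of shipped requests handled

    def handle(self, request):
        op, path, payload = request
        self.served += 1
        if op == "write":
            self.files.setdefault(path, bytearray()).extend(payload)
            return len(payload)
        if op == "read":
            return bytes(self.files[path])
        raise ValueError(f"unsupported shipped call: {op}")


class ComputeNodeKernel:
    """Toy lightweight kernel: no local file system, I/O calls are shipped."""

    def __init__(self, rank, io_node):
        self.rank = rank
        self.io_node = io_node

    def write(self, path, data):
        # Package the call and forward it instead of executing it locally.
        return self.io_node.handle(("write", path, data))

    def read(self, path):
        return self.io_node.handle(("read", path, None))


# One I/O node aggregating four compute nodes (the slide's ratio is 128 to 1).
ion = IONode()
kernels = [ComputeNodeKernel(rank, ion) for rank in range(4)]
for k in kernels:
    k.write("/scratch/out.log", f"rank {k.rank} done\n".encode())

contents = kernels[0].read("/scratch/out.log").decode()
print(contents, end="")
print("requests served by I/O node:", ion.served)
```

Keeping file system clients off the compute nodes is one way to keep the compute kernel simple and low in OS noise, at the cost of a hop through the I/O node for every I/O call.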
Sequoia Software Stack – Applications Perspective
[Layer diagram divided into User Space and Kernel Space. Labels: APPLICATION; Code Development Tools; C/C++/Fortran Compilers, Python; Optimized Math Libs; Parallel Math Libs; Clib/F03 runtime; OpenMP, Threads, SE/TM; MPI2; ADI; SOCKETS; TCP; UDP; IP; Lustre Client; LNet; Function Shipped syscalls; LWK, Linux/Unix; Interconnect Interface; External Network. Side bars: SLURM/Moab; RAS, Control System; Code Dev Tools Infrastructure.]
The tools that users know and love will be available on Sequoia with improvements and additions as needed
[Chart of tools against Operational Scale (axis ticks 1, 10^4, 10^5, 10^6, 10^7), with a legend distinguishing Existing from New. Column headings: Infrastructure, Debugging, Performance, Features. Tools named: Dyninst; PAPI; Stack Walker; OpenMP Profiling Interface; MRNet; PMPI; APAI; DPCL; Valgrind; OTF; SE/TM Monitor; LaunchMON; STAT; TV memlight; memP; TotalView; ThreadCheck; MemCheck; SE/TM Debugger; New Lightweight Focus Tools; mpiP; TAU; O|SS; OpenMP Analyzer; gprof; SE/TM Analyzer.]
Application programming requirements and challenges
MPI Scaling: Availability of 1.6M cores pushes all-MPI codes to extreme concurrency
SMP Threads: Availability of many threads on many SMP cores encourages low-level parallelism for higher performance
Hybrid Models: Mixed MPI/SMP programming environment and possibility of heterogeneous compute distribution brings load imbalance to the fore
I/O & Visualization: I/O and visualization requirements encourage innovative strategies to minimize memory and bandwidth bottlenecks
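Load imbalance becomes costly at this scale because every rank waits for the slowest one at each synchronization point. The sketch below uses one common definition of the metric (maximum time over mean time, minus one); the slides do not define a metric, and the per-rank timings are hypothetical.

```python
def load_imbalance(times):
    """Fraction by which the slowest rank exceeds the mean (0.0 is balanced)."""
    mean = sum(times) / len(times)
    return max(times) / mean - 1.0

def parallel_efficiency(times):
    """Share of the aggregate wall time spent working rather than waiting."""
    return (sum(times) / len(times)) / max(times)

# Hypothetical per-rank compute times in seconds (not from the slides).
balanced = [5.0, 5.0, 5.0, 5.0]
skewed = [10.0, 10.0, 10.0, 14.0]

print(f"balanced: imbalance {load_imbalance(balanced):.3f}, "
      f"efficiency {parallel_efficiency(balanced):.3f}")
print(f"skewed:   imbalance {load_imbalance(skewed):.3f}, "
      f"efficiency {parallel_efficiency(skewed):.3f}")
```

In the skewed case a single slow rank out of four pulls efficiency below 80 percent; with millions of cores, the chance that at least one rank is slow in a given step is much higher.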
The RFP asked interested vendors to address a “Unified Nested Node Concurrency” model
MPI Tasks on a node are processes (one is shown) with multiple OS threads (Thread0-3 shown)
Thread0 is “Main thread” & Thread1-3 are helper threads that morph from Pthread to OpenMP worker to TM/SE compiler generated threads via runtime support
Hardware support to significantly reduce overheads for thread repurposing and OpenMP loops and locks
1) Pthreads born with MAIN
2) Only Thread0 calls functions to nest parallelism
3) Pthreads based MAIN calls OpenMP based Funct1
4) OpenMP Funct1 calls TM/SE based Funct2
5) Funct2 returns to OpenMP based Funct1
6) Funct1 returns to Pthreads based MAIN
[Timeline diagram for Thread0 through Thread3 of one MPI task. Labels along the timeline: MAIN; MPI_INIT; MPI Call; Funct1 (OpenMP); Funct2 (TM/SE); Exit; MPI_FINALIZE. Helper threads 1-3 carry “W” markers between parallel regions.]
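The six steps above can be mimicked with a persistent pool of helper threads that Thread0 repurposes from one parallel region to the next. This is an analogy only: Python has no OpenMP or TM/SE runtime, so both "Funct1" and "Funct2" below are ordinary functions, and `HelperPool` and its methods are hypothetical names. The point is the control flow, where helpers are created once with MAIN and only Thread0 opens parallel regions.

```python
import queue
import threading

class HelperPool:
    """Helper threads (Thread1-3) created once and reused by Thread0."""

    def __init__(self, n_helpers=3):
        self.tasks = queue.Queue()
        self.threads = [threading.Thread(target=self._loop) for _ in range(n_helpers)]
        for t in self.threads:
            t.start()                      # step 1: helpers are born with MAIN

    def _loop(self):
        while True:
            item = self.tasks.get()        # helpers wait here between regions
            if item is None:
                break
            fn, arg, out, idx = item
            out[idx] = fn(arg)
            self.tasks.task_done()

    def parallel_map(self, fn, chunks):
        """Thread0 hands chunks 1..n to the helpers and runs chunk 0 itself."""
        out = [None] * len(chunks)
        for i in range(1, len(chunks)):
            self.tasks.put((fn, chunks[i], out, i))
        out[0] = fn(chunks[0])
        self.tasks.join()                  # implicit barrier at region end
        return out

    def shutdown(self):
        for _ in self.threads:
            self.tasks.put(None)
        for t in self.threads:
            t.join()

def funct2(pool, chunks):
    # Stand-in for the TM/SE based Funct2 (steps 4 and 5).
    return sum(pool.parallel_map(sum, chunks))

def funct1(pool, chunks):
    # Stand-in for the OpenMP based Funct1 (steps 3 and 6).
    squared = pool.parallel_map(lambda c: [x * x for x in c], chunks)
    return funct2(pool, squared)           # step 2: only Thread0 nests

pool = HelperPool(n_helpers=3)
data = list(range(16))
chunks = [data[i::4] for i in range(4)]    # one chunk per thread
result = funct1(pool, chunks)
pool.shutdown()
print(result)
```

Because the same three helpers serve both regions, no threads are created or destroyed at the region boundaries, which is the overhead the slide's hardware support for thread repurposing aims to reduce.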
Previous systems have prepared the way for Sequoia
BG/L experience informs Dawn/Sequoia scalability
OpenMP & Posix threads experience on Linux/AIX
Integrated codes regularly run at Purple capability
Dawn will be used for code development
• SMP parallelism
• Python
• Larger memory per core than BG/L
• Some critical UQ analysis as well
Sequoia will be a Tri-Lab ASC resource
• Video conferences for coordination
DAWN Initial Delivery
A diverse team and a new Scalable Application Preparation Project ensure success on Sequoia
LC Hotline, User Training and Documentation address routine issues
ADEPT team provides expertise in compilers, debuggers, performance tools
Access to IBM experts, including an on-site IBM applications analyst
Staff to work closely with the application teams
Ongoing ANL/IBM/LLNL BlueGene collaboration
Engaging third-party vendors, university research partners, and the open source community
New Petascale Computing Enabling Technologies (PCET) LDRD is addressing key barriers to predictive simulation
[Figure: barriers plotted against core count, from 10^3 to 10^8 cores, with Purple, BG/L, Petascale and Exascale marked along the scale. Barriers named: Debugging; Load Balance; Fault Tolerance; Multicore; Vector FP Units/Accelerators?; Power?]
PCET creates essential capabilities for exascale core counts
PCET strategy mitigates risk to assure immediate impact on application drivers and longer term success
Current capabilities
• MPI large grain parallelism
• Basic checkpoint/restart
• Ill-defined load imbalances
• Debugging < 4096 cores
Terascale capabilities
• Multicore-adapted algorithms
• Faster checkpoint/restart
• Understood load imbalances
• Targeted debugging
Petascale capable & Exascale prepared
• Multicore-aware algorithms
• Application-level fault tolerance
• Well-balanced application load
• Automated error analysis
Research activities shown along a “Shorter Term Payoff” axis
• Load balance analysis
• Cache oblivious data layouts
• Checkpoint compression
• Behavioral and performance equivalence classes
Take-away: Computational science on Sequoia at full-scale will be the culmination of many years of hard work
[Process diagram. Stages named: Innovative or evolutionary architecture ideas; R&D contracts; Flexible contracts with targets as requirements; Milestone progress; Periodic reviews; Rigorous review; Initial delivery & integration; Computational science R&D. Marker: “We’re here with Dawn ID”.]