ORNL is managed by UT-Battelle
for the US Department of Energy
Accelerating
Research and
Development
Using the Titan
Supercomputer
Fernanda Foertter
HPC User Support, ORNL
2
What is the Leadership Computing
Facility (LCF)?
• Collaborative DOE Office of Science program at ORNL and ANL
• Mission: Provide the computational and data resources required to solve the most challenging problems.
• Two centers and two architectures address the diverse and growing computational needs of the scientific community
• Highly competitive user allocation programs (INCITE, ALCC).
• Projects receive 10x to 100x more resources than at other generally available centers.
• LCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).
3
The OLCF has delivered five systems and
six upgrades to our users since 2004
• Increased our system capability by 10,000x
• Strong partnerships with computer designers and architects
• Worked with users to scale codes by 10,000x
• Science delivered through strong user partnerships to scale codes and algorithms
Phoenix X1 (2004) • Doubled size • X1e
Jaguar XT3 (2005) • Dual core upgrade
Jaguar XT4 (2007) • Quad core upgrade
Jaguar XT5 (2008) • 6 core upgrade
Titan XK7 (2012) • GPU upgrade
4
Science breakthroughs at the OLCF:
Biomass as a viable, sustainable feedstock for hydrogen
production for fuel cells, Nano Letters (2011)
J. Phys. Chem. Lett. (2010)
71 & 74 citations, respectively
World’s first continuous simulation of
21,000 years of Earth’s climate
history, Science (2009)
116 citations
Largest simulation of a galaxy’s worth of
dark matter, showed for the first time the
fractal-like appearance of dark matter
substructures, Nature (2008)
326 citations, 3/2014
MD simulations show
selectivity filter of a trans-
membrane ion channel is
sterically locked open by
hidden water molecules,
Nature (2013)
Global Warming preceded by increasing
carbon dioxide concentrations during the
last deglaciation, Nature (2012).
64 citations, 3/2014
Researchers solved the 2D Hubbard model
and presented evidence that it predicts
HTSC behavior, Phys. Rev. Lett. (2005)
105 citations, 3/2014
First-Principles Flame Simulation Provides Crucial
Information to Guide Design of Fuel-Efficient Clean
Engines, Proc. Combust. Inst. (2007)
78 citations, 3/2014
Calculation of the number of bound
nuclei in nature, Nature (2012)
36 citations, 3/2014
SELECTED science and engineering advances over the period 2003 - 2013
Astrophysicists discover
supernova shock-wave instability,
Astrophys. J. (2003)
254 citations, 3/2014
Demonstrated that three-body forces
are necessary to describe the
long lifetime of 14C
Phys. Rev. Lett. (2011)
28 citations, 3/2014
5
No more free lunch
Herb Sutter: Dr. Dobb’s Journal:
http://www.gotw.ca/publications/concurrency-ddj.htm
6
Power consumption of 2.3 PF (Peak) Jaguar: 7 megawatts, equivalent to that of a small city (5,000 homes)
Power is THE problem
7
Using traditional CPUs
is not economically feasible
20 PF+ system: 30 megawatts (30,000 homes)
8
Why GPUs? Hierarchical Parallelism
High performance and power efficiency
on path to exascale
• Hierarchical parallelism improves scalability of applications
• Expose more parallelism through code refactoring and source code directives
– Doubles performance of many codes
• Heterogeneous multicore processor architecture: Using right type of processor for each task
• Data locality: Keep data near processing
– GPU has high bandwidth to local memory for rapid access
– GPU has large internal cache
• Explicit data management: Explicitly manage data movement between CPU and GPU memories
CPU
• Optimized for sequential multitasking
GPU Accelerator
• Optimized for many simultaneous tasks
• 10x performance per socket
• 5x more energy-efficient systems
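The idea of hierarchical parallelism can be sketched in a few lines. This is a conceptual illustration in plain Python, not GPU code and not taken from any OLCF application: a coarse level splits the data into independent blocks, a fine level does per-element work inside each block, and a final reduction combines the partial results. The function names, block size, and worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def block_sum_of_squares(block):
    # Fine-grained level: independent per-element work inside one block.
    return sum(x * x for x in block)


def hierarchical_sum_of_squares(data, block_size=4, workers=2):
    # Coarse-grained level: split the data into blocks that can be
    # processed independently of one another.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(block_sum_of_squares, blocks))
    # Final reduction over the per-block partial results.
    return sum(partial_sums)


if __name__ == "__main__":
    print(hierarchical_sum_of_squares(list(range(10))))
```

On a real accelerator the coarse level would map to nodes or thread blocks and the fine level to GPU threads, with data copied explicitly between CPU and GPU memory. The sketch only shows the shape of the decomposition.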
9
#2
8.2 Megawatts
27 Pflops (Peak)
17.59 PFlops
(Linpack)
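To put the power figures in perspective, here is a small sketch that compares peak flops per watt using only the numbers quoted in these slides: Jaguar at 2.3 PF peak and 7 MW, Titan at 27 PF peak and 8.2 MW. The helper name is hypothetical. Peak figures overstate what applications achieve, so treat the result as a rough indicator only.

```python
def gflops_per_watt(peak_petaflops, power_megawatts):
    # 1 PF = 1e6 GF and 1 MW = 1e6 W, so the conversion factors cancel;
    # they are kept explicit to make the units visible.
    return (peak_petaflops * 1e6) / (power_megawatts * 1e6)


jaguar = gflops_per_watt(2.3, 7.0)   # figures from the Jaguar power slide
titan = gflops_per_watt(27.0, 8.2)   # figures from the Titan slide

if __name__ == "__main__":
    print(f"Jaguar: {jaguar:.2f} GF/W (peak)")
    print(f"Titan:  {titan:.2f} GF/W (peak)")
    print(f"Ratio:  {titan / jaguar:.1f}x")
```

On these peak numbers, Titan delivers roughly ten times the flops per watt of Jaguar.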
10
11
Roadmap to Exascale
2012: Titan: 27 PF, 600 TB DRAM, hybrid GPU/CPU
2017: OLCF-4: 100-250 PF, 4000 TB memory, > 20MW, 6 day resilience
2022: OLCF-5: 1 EF, 20 MW
Our Science requires that we advance computational capability 1000x
over the next decade.
What are the Challenges?
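As a back-of-the-envelope check on what "1000x over the next decade" implies, the sketch below converts a total growth factor into an annual factor and a doubling time. It assumes smooth compound growth, which real system deliveries do not follow. The function names are hypothetical.

```python
import math


def annual_growth_factor(total_factor, years):
    # Compound growth: annual_factor ** years == total_factor.
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor, years):
    # Time needed to double at the same compound rate.
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    print(f"Annual factor: {annual_growth_factor(1000, 10):.3f}")
    print(f"Doubling time: {doubling_time_years(1000, 10):.3f} years")
```

Under that assumption, a 1000x increase in ten years means capability must roughly double every year.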
12
Requirements gathering
DOE/SC and LCFs support a
diverse user community
• Science benefits and impact of future
systems are examined on an ongoing
basis
• LCF staff have been actively engaged in community assessments of future computational needs and solutions
• Computational science roadmaps are developed in collaboration with leading domain scientists
• Detailed performance analyses are conducted for applications to understand future architectural bottlenecks
• Analysis of INCITE, ALCC, Early Science, and Center for Accelerated Application Readiness (CAAR) projects history and trends
Science Category: Represented Research Areas
Biology: Bioinformatics, Biophysics, Life Sciences, Medical Science, Neuroscience, Proteomics, Systems Biology
Chemistry: Chemistry, Physical Chemistry
Computer Science: Computer Science
Earth Science: Climate, Geosciences
Engineering: Aerodynamics, Bioenergy, Combustion, Turbulence
Fusion: Fusion Energy, Plasma Physics
Materials: Materials Science, Nanoelectronics, Nanomechanics, Nanophotonics, Nanoscience
Nuclear Energy: Nuclear Fission, Nuclear Fuel Cycle
Physics: Accelerator Physics, Astrophysics, Atomic/Molecular Physics, Condensed Matter Physics, High Energy Physics, Lattice Gauge Theory, Nuclear Physics, Solar/Space Physics
13
Requirements Process
• https://www.olcf.ornl.gov/wp-content/uploads/2013/01/OLCF_Requirements_TM_2013_Final.pdf
• Surveys are a “lagging indicator” that tends to tell us what problems the users are seeing now, not what they expect to see in the future
14
OLCF User Requirements Survey – Key Findings
• Memory bandwidth was reported as the greatest need
• Local memory capacity was not a driver for most users, perhaps in recognition of cost trends
• 76% of users said there is still a moderate to large amount of parallelism to extract in their code, but…
• 85% of users rated the difficulty of extracting that parallelism as moderate to difficult; it often requires application refactoring
– Highlights training needs and community based efforts for application readiness
Hardware feature: Ranking
Memory Bandwidth: 4.4
Flops: 4.0
Interconnect Bandwidth: 3.9
Archival Storage Capacity: 3.8
Interconnect Latency: 3.7
Disk Bandwidth: 3.7
WAN Network Bandwidth: 3.7
Memory Latency: 3.5
Local Storage Capacity: 3.5
Memory Capacity: 3.2
Mean Time to Interrupt: 3.0
Disk Latency: 2.9
Rankings from OLCF users (1 = not important, 5 = very important)
15
Center for Accelerated Application
Readiness (CAAR)
• We created CAAR as part of the Titan project to help prepare applications for accelerated architectures
• Goals:
– Work with code teams to develop and implement strategies for exposing hierarchical parallelism for our users' applications
– Maintain code portability across modern architectures
– Learn from and share our results
• We selected six applications from across different science domains and algorithmic motifs
16
CAAR Plan
• Comprehensive team assigned to each app
– OLCF application lead
– Cray engineer
– NVIDIA developer
– Other: other application developers, local tool/library developers, computational scientists
• Single early-science problem targeted for each app
– Success on this problem is ultimate metric for success
• Particular plan-of-attack different for each app
– WL-LSMS: dependent on accelerated ZGEMM
– CAM-SE: pervasive and widespread custom acceleration required
• Multiple acceleration methods explored
– WL-LSMS: CULA, MAGMA, custom ZGEMM
– CAM-SE: CUDA, directives
• Two-fold aim
– Maximum acceleration for model problem
– Determination of optimal, reproducible acceleration path for other applications
17
Early Science Challenges for Titan
• WL-LSMS: Illuminating the role of material disorder, statistics, and fluctuations in nanoscale materials and systems.
• S3D: Understanding turbulent combustion through direct numerical simulation with complex chemistry.
• NRDF: Radiation transport (important in astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging) computed on AMR grids.
• CAM-SE: Answering questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns/statistics and tropical storms.
• Denovo: Discrete ordinates radiation transport calculations that can be used in a variety of nuclear energy and technology applications.
• LAMMPS: A molecular dynamics simulation of organic polymers for applications in organic photovoltaic heterojunctions, de-wetting phenomena, and biosensor applications.
18
Effectiveness of GPU Acceleration?
OLCF-3 Early Science Codes -- Performance on Titan XK7
Application: Cray XK7 vs. Cray XE6 Performance Ratio*
LAMMPS* (Molecular dynamics): 7.4
S3D (Turbulent combustion): 2.2
Denovo (3D neutron transport for nuclear reactors): 3.8
WL-LSMS (Statistical mechanics of magnetic materials): 3.8
Titan: Cray XK7 (Kepler GPU plus AMD 16-core Opteron CPU)
Cray XE6: (2x AMD 16-core Opteron CPUs)
*Performance depends strongly on specific problem size chosen
19
Additional Applications from Community Efforts
Current Performance Measurements on Titan
Application: Cray XK7 vs. Cray XE6 Performance Ratio*
AWP-ODC (Seismology): 2.1
DCA++ (Condensed Matter Physics): 4.4
QMCPACK (Electronic structure): 2.0
RMG (DFT, real-space, multigrid; Electronic Structure): 2.0
XGC1 (Plasma Physics for Fusion Energy R&D): 1.8
Titan: Cray XK7 (Kepler GPU plus AMD 16-core Opteron CPU)
Cray XE6: (2x AMD 16-core Opteron CPUs)
*Performance depends strongly on specific problem size chosen
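One conventional way to summarize a set of speedup ratios is the geometric mean. The sketch below applies it to the XK7 vs. XE6 ratios quoted for the early science codes and for the community codes. The summary is an illustration added here, not a figure from the slides, and it inherits their caveat that each ratio depends strongly on the problem size chosen. Application names are shortened to their code names.

```python
import math

# Cray XK7 vs. Cray XE6 performance ratios as quoted in the slides.
caar_ratios = {"LAMMPS": 7.4, "S3D": 2.2, "Denovo": 3.8, "WL-LSMS": 3.8}
community_ratios = {
    "AWP-ODC": 2.1,
    "DCA++": 4.4,
    "QMCPACK": 2.0,
    "RMG": 2.0,
    "XGC1": 1.8,
}


def geometric_mean(values):
    # exp(mean(log(v))) is the appropriate average for ratios.
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"Early science codes: {geometric_mean(caar_ratios.values()):.2f}x")
    print(f"Community codes:     {geometric_mean(community_ratios.values()):.2f}x")
```

The geometric mean is used rather than the arithmetic mean so that a single large ratio, such as the LAMMPS result, does not dominate the summary.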
20
All Codes Will Need Rework To Scale!
• Up to 1-2 person-years required to port each code from Jaguar to Titan
– This takes work, but it is an unavoidable step for exascale regardless of the type of processors, because it comes from the level of parallelism required on the node
– It also pays off on other systems: the ported codes often run significantly faster CPU-only (Denovo 2X, CAM-SE >1.7X)
• We estimate that possibly 70-80% of developer time is spent in code restructuring, regardless of whether using OpenMP / CUDA / OpenCL / OpenACC / …
• Each code team must make its own choice among OpenMP, CUDA, OpenCL, and OpenACC based on the specific case; the conclusion may differ for each code
• Our users and their sponsors must plan for this expense.
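For planning purposes, the two estimates above can be combined into a rough range. The sketch assumes, for illustration only, that the 70-80% restructuring share applies directly to the 1-2 person-year porting estimate. The slides do not state that combination, and the function name is hypothetical.

```python
def restructuring_effort(port_person_years, restructuring_fraction):
    # Portion of the porting effort spent on code restructuring.
    return port_person_years * restructuring_fraction


low = restructuring_effort(1.0, 0.70)   # 1 person-year, 70% restructuring
high = restructuring_effort(2.0, 0.80)  # 2 person-years, 80% restructuring

if __name__ == "__main__":
    print(f"Restructuring effort: {low:.1f} to {high:.1f} person-years per code")
```

Under that assumption, most of the porting budget goes to restructuring that is independent of the programming model chosen.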
21
High-impact science across a broad range
of disciplines (2013)
Superconductivity
“Doping dependence of spin
excitations and correlations
with high-temperature
superconductivity in iron pnictides,”
Meng Wang (IOP CAS Beijing),
Nature Communications,
December (2013)
Paleoclimate Science
“Northern Hemisphere forcing
of Southern Hemisphere
climate during the last
deglaciation,”
Feng He (UW Madison), et
al., Nature, February (2013)
Molecular Biology
MD simulations show selectivity filter
of a trans-membrane ion channel is
sterically locked open by hidden water
Jared Ostmeyer, et al. (U.
Chicago) Nature, Sept. (2013)
Polymer Science
“Self-Organized and Cu-
Coordinated Surface Linear
Polymerization”
Qing Li, B. Sumpter (ORNL),
Nature Scientific Reports.
July (2013)
Molecular Biology
“A phenylalanine rotameric
switch for signal-state control in
bacterial chemoreceptors”
D. Ortega (UTK),
Nature Communications
December (2013)
Complex Oxide Materials
“Atomically resolved
spectroscopic study of Sr2IrO4:
Experiment and theory,” Qing
Li (ORNL), E.G. Eguiluz (UTK)
Nature Scientific Reports.
October (2013)
22
Increasing Usage of GPUs
As measured by ALTD against linked libraries
INCITE 2013
INCITE 2014
23
Advancing Department of Energy’s
Strategic Plan
Goal 2: Maintain a vibrant U.S. effort
in science and engineering as a
cornerstone of our economic
prosperity, with clear leadership in
strategic areas.
Priority:
• Lead Computational Sciences and
High-Performance Computing
Targeted Outcome:
• Continue to develop and deploy
high-performance computing
hardware and software systems
through exascale platforms.
24
Advancing Department of Energy’s
Strategic Plan
Goal 1: Catalyze the timely,
material, and efficient
transformation of the nation’s
energy system and secure U.S.
leadership in clean energy
technologies.
Priority:
“We will facilitate the transfer of
our computer simulation
capability to industry with the goal
of accelerating energy technology
innovation by improving designs,
compressing the design cycle and
easing the transitions to scale,
thereby enhancing US economic
competitiveness.”
25
New Industry Projects
[Bar chart: Number of New Industry Projects Launched at OLCF, by year, 2005-2014. Bar values as extracted: 3, 5, 5, 6, 8, 14, 12, 15, 7. Final year marked * (year in progress). Current as of March 2014.]
26
Growth of Industry Projects
[Bar chart: Number of Industry Projects Underway at OLCF, by year, 2005-2014. Bar values as extracted: 3, 5, 5, 7, 12, 19, 21, 29, 23. Final year marked * (year in progress). Current as of March 2014.]
27
Innovation through Industrial Partnerships
• Human skin barrier: Demonstrated small molecules can have large and varying impact on skin permeability depending on their molecular characteristics; important for product efficacy and safety
• Global flood maps: Developed fluvial and pluvial high resolution global flood maps to enable insurance firms to better price risk and reduce loss of life and property
• Engine cycle-to-cycle variation: Developing novel approach to using massively parallel, multiple simultaneous combustion cycle simulations to address cycle-to-cycle variations in spark ignition engine
• Fuel efficient jet engines: Conducting first-of-a-kind high-fidelity LES computations of flow in turbomachinery components for more fuel efficient, next-generation jet engines
• Wind turbine resilience: First time simulation of ice formation within million-molecule water droplets is expanding understanding of freezing at the molecular level to enhance wind turbine resilience in cold climates
• Welding Software: Evaluating large-scale HPC and GPU capability of critical welding simulation software and further developing & testing weld optimization algorithm
28
Innovation through Industrial Partnerships
• Aircraft design: Unexpected discovery of multiple solutions for steady RANS equations with separated flow helps explain why numerical modeling sometimes fails to capture maximum lift
• Consumer product stability: Developed method to measure impact of additives, such as dyes and perfumes, on properties of lipid systems such as fabric enhancer and other formulated products
• Gasoline engine injector: Optimizing multihole gasoline spray injector nozzle designs for better in-cylinder fuel-air mixture distributions, greater fuel efficiency and reduced physical prototypes
• Jet engine efficiency: Accurate predictions of atomization of liquid fuel by aerodynamic forces enhance combustion stability, improve efficiency, and reduce emissions
• Li-ion batteries: New classes of solid inorganic Li-ion electrolytes could deliver high ionic and low electronic conductivity and good electrochemical stability
• Underhood cooling: Developed a new, efficient and automatic analytical cooling package optimization process leading to one of a kind design optimization of cooling systems
29
Innovation through Industrial Partnerships
• Catalysis: Demonstrated biomass as a viable, sustainable feedstock for hydrogen production for fuel cells; showed nickel is a feasible catalytic alternative to platinum
• Design innovation: Accelerating design of shock wave turbo compressors for carbon capture and sequestration
• Aircraft design: Simulated takeoff and landing scenarios improved a critical code for estimating characteristics of commercial aircraft, including lift, drag, and controllability
• Industrial fire suppression: Developing high-fidelity modeling capability for fire growth and suppression; fire losses account for 30% of U.S. property loss costs
• Turbo machinery efficiency: Simulated unsteady flow in turbo machinery, opening new opportunities for design innovation and efficiency improvements.
• Long-haul truck fuel efficiency: Simulations reduced by 50% the time to develop a unique system of add-on parts that increases fuel efficiency by 7-12% for long-haul (18-wheeler) trucks
30
Education and Training at OLCF
• Tutorials online
• Events open to general public
• Upcoming events
– OpenACC Hackathon
– Data Workshop
• Contact me
31
LCF User Programs
60% INCITE
6 Billion Hours
60 Projects
30% ALCC
10% Director’s Discretionary
32
Getting Started at OLCF:
Project Allocation Requests
https://www.olcf.ornl.gov/support/getting-started/
33
INCITE 2015 Call opens soon
• Planning Request for Information (RFI)
• Call opens April, 2014. Closes June, 2014
• Expect to allocate more than 5 billion core-hours
• Expect 3X oversubscription
• Awards to be announced in November for CY 2015
• Average award to exceed 50 million core-hours
• INCITE Proposal Writing Webinars!
– April 22, 2014 1:30pm EST
– May 15, 2014 9:30am EST
Contact information
Julia C. White, INCITE Manager
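The call figures give a rough sense of scale for proposers. The sketch below treats the quoted "more than 5 billion core-hours", "3X oversubscription", and "average award to exceed 50 million core-hours" as round numbers, so the outputs are order-of-magnitude estimates rather than program statistics. The function names are hypothetical.

```python
def requested_core_hours(allocated_core_hours, oversubscription):
    # Total demand implied by the expected oversubscription factor.
    return allocated_core_hours * oversubscription


def approximate_award_count(allocated_core_hours, average_award):
    # Number of awards if every award were exactly the average size.
    return allocated_core_hours // average_award


ALLOCATED = 5_000_000_000      # "more than 5 billion core-hours"
AVERAGE_AWARD = 50_000_000     # "average award to exceed 50 million"

if __name__ == "__main__":
    print(f"Implied requests: {requested_core_hours(ALLOCATED, 3):,} core-hours")
    print(f"Approximate awards: {approximate_award_count(ALLOCATED, AVERAGE_AWARD)}")
```

Taken at face value, the round numbers suggest on the order of 15 billion core-hours requested and about 100 awards.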
34
Conclusions
• DOE will continue to develop and deploy high-performance computing hardware and software systems through exascale platforms.
• Exascale will require investment in preparing applications
• OLCF will continue to support users via training and staffing.
• The road to exascale will make impossible science possible.
35
Acknowledgements
OLCF Staff: Jack Wells
Visualization: Mike Matheson and Dave Pugmire (ORNL)
OLCF user requirements process: Ricky Kendall & Doug Kothe (ORNL)
OLCF-3 Vendor Partners: Cray, AMD, NVIDIA, CAPS, Allinea
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.