About the Test of Time Award
• The “Test of Time” award recognizes an outstanding paper from a past SC Conference that has deeply influenced the HPC discipline. It recognizes the historical impact of the authors and a clear demonstration that the paper has changed HPC trends. The award is also an incentive for researchers and students to send their best work to SC and a tool to understand why and how results last in the HPC discipline.
• Eligible papers are those published in the SC Proceedings between 10 and 25 years ago.
Brief History of the Award
• The Test of Time Award was established and first awarded at the SC13 Conference for the conference’s 25th anniversary.
• The first winner was William Pugh, for “The Omega test: a fast and practical integer programming algorithm for dependence analysis,” Proc. SC91.
The ToTA Committee for SC14
• Ewing Lusk, co-chair
• Katherine Yelick, co-chair
• Franck Cappello
• Michael Heroux
• Jeffrey Hollingsworth
• Lennart Johnsson
• Ken Miura
• Leonid Oliker
• Vivek Sarkar
• Rob Schreiber
• Mateo Valero
The SC14 Test of Time Award Winners
• Bruce Hendrickson and Rob Leland, for “A multilevel algorithm for partitioning graphs,” Proc. SC95.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Graphs and HPC in the '90s: A Distant Mirror or Merely Distant?
Bruce Hendrickson
Senior Manager for Extreme-Scale Computing
Sandia National Laboratories, Albuquerque, NM
University of New Mexico, Computer Science Dept.
Robert Leland
Vice President for Research and CTO
Sandia National Laboratories, Albuquerque, NM
Outline
• Context:
– HPC in the early 90s
• Content:
– Multilevel graph partitioning
• Thread 1:
– Direct Impact
• Thread 2:
– Combinatorial Scientific Computing
• Thread 3:
– Abstractions
• Conclusions
Merely Distant?
• “Scaled Speedup” concept introduced in ’87
– by Gustafson, Montry & Benner at Sandia
• Eugene Brooks’ “Attack of the Killer Micros” talk at SC’90
• Draft MPI standard presented at SC’93
• First Top 500 list in 1993
– Of top 10 machines, only one had as many as 1024 processors
– Four were vector machines with fewer than 20 processors
• 1993 Gordon Bell Prize
– Solving Boltzmann’s Equation on a 1024 processor CM5
– 60 Gflops
A Distant Mirror?
• Parallelism was replacing vectors for HPC, but the details weren’t at all clear
• Technology and economic drivers were understood, but multiple visions were competing for the future
– Remember Thinking Machines? NCube? Kendall Square?
• The community was messily groping towards clarity about the right fundamental perspectives and questions for massive parallelism
Parallelism Exposed: New Algorithmic Challenges
• Efficient collective communication operations
• Bulk-synchronous processing
• Load balancing
• Horst Simon had proposed a graph partitioning model for load balancing for mesh-based computational science problems
– Vertices are computations. Edges encode data dependencies.
– Cut few edges to evenly divide the vertices
– Model is flawed, but broadly applicable
– “All models are wrong, but some are useful.” – George Box
• Building on work with Alex Pothen, Horst championed spectral partitioning – using an eigenvector of a matrix to partition graph
– Intuition from structural analysis
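A minimal sketch of the spectral idea, assuming an unweighted graph stored as adjacency lists: approximate the Laplacian eigenvector belonging to the second-smallest eigenvalue (the Fiedler vector), then split the vertices at its median. The function names, the shifted power iteration, and the tiny path graph are illustrative assumptions, not the method used in the original spectral codes, which relied on far more efficient eigensolvers.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    verts = sorted(adj)
    n = len(verts)
    # 2 * max degree bounds every Laplacian eigenvalue, so shift*I - L is
    # positive semidefinite and power iteration on it favors small eigenvalues.
    shift = 2 * max(len(adj[u]) for u in verts)
    x = {u: float(i) for i, u in enumerate(verts)}  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x.values()) / n
        x = {u: x[u] - mean for u in verts}  # project out the constant eigenvector
        # y = (shift*I - L) x, with (L x)[u] = deg(u) * x[u] - sum of neighbor values
        y = {u: shift * x[u] - (len(adj[u]) * x[u] - sum(x[v] for v in adj[u]))
             for u in verts}
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {u: y[u] / norm for u in verts}
    return x


def spectral_bisect(adj):
    """Split vertices at the median of the approximate Fiedler vector."""
    x = fiedler_vector(adj)
    order = sorted(adj, key=lambda u: x[u])
    half = len(order) // 2
    return {u: (0 if i < half else 1) for i, u in enumerate(order)}


def edge_cut(adj, part):
    """Number of edges whose endpoints land in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


# A six-vertex path 0-3-1-5-2-4 with deliberately scrambled vertex labels.
adj = {0: [3], 3: [0, 1], 1: [3, 5], 5: [1, 2], 2: [5, 4], 4: [2]}
part = spectral_bisect(adj)
print(part, "edge cut:", edge_cut(adj, part))
```

The labels carry no geometric information here, yet the eigenvector still orders the vertices along the path, so the median split cuts a single edge and keeps the two halves the same size.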
Multilevel Graph Partitioning
• Eigenvectors are expensive to compute, so Horst and Steve Barnard devised a multigrid-based algorithm
– We couldn’t improve on their method
• We hit upon the idea of adapting the multilevel concept to refine partitions rather than refining numerical values
– Used simple graph notions like matching and edge contraction
– Discrete algorithm techniques from computer science
– Popularized by our Chaco software
[Figure: the three phases of the multilevel algorithm: Contract, Partition, Expand & Refine]
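The contract, partition, expand-and-refine cycle can be sketched in a few dozen lines. This is an illustrative toy under stated assumptions, not Chaco's implementation: the graph is a dict of dicts with edge weights, coarsening uses a greedy heavy-edge matching, the coarsest graph is split by a crude fill-to-half rule, and refinement is a simple greedy pass chosen for brevity. All function names and the `coarse_size` and `max_imbalance` parameters are hypothetical.

```python
def coarsen(adj, vw):
    """Contract a greedy heavy-edge matching; return coarse graph and fine-to-coarse map."""
    cmap, next_id = {}, 0
    for u in sorted(adj):
        if u in cmap:
            continue
        free = [v for v in adj[u] if v not in cmap]
        cmap[u] = next_id
        if free:  # match u with the unmatched neighbor joined by the heaviest edge
            cmap[max(free, key=lambda v: adj[u][v])] = next_id
        next_id += 1
    cadj = {c: {} for c in range(next_id)}
    cvw = {c: 0 for c in range(next_id)}
    for u in adj:
        cvw[cmap[u]] += vw[u]  # merged vertices add their weights
        for v, w in adj[u].items():
            cu, cv = cmap[u], cmap[v]
            if cu != cv:  # parallel edges merge by summing weights
                cadj[cu][cv] = cadj[cu].get(cv, 0) + w
    return cadj, cvw, cmap

def cut_weight(adj, part):
    """Total weight of edges whose endpoints are in different parts."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and part[u] != part[v])

def initial_bisection(adj, vw):
    """Fill part 0 in sorted order until it holds about half the vertex weight."""
    half = sum(vw.values()) / 2
    part, load = {}, 0
    for u in sorted(adj):
        if load < half:
            part[u], load = 0, load + vw[u]
        else:
            part[u] = 1
    return part

def refine(adj, vw, part, max_imbalance=1.1, passes=4):
    """Greedy refinement: move a vertex if that lowers the cut and keeps balance."""
    limit = max_imbalance * sum(vw.values()) / 2
    load = [0, 0]
    for u, p in part.items():
        load[p] += vw[u]
    for _ in range(passes):
        moved = False
        for u in sorted(adj):
            p = part[u]
            external = sum(w for v, w in adj[u].items() if part[v] != p)
            internal = sum(w for v, w in adj[u].items() if part[v] == p)
            if external > internal and load[1 - p] + vw[u] <= limit:
                part[u] = 1 - p
                load[p] -= vw[u]
                load[1 - p] += vw[u]
                moved = True
        if not moved:
            break
    return part

def multilevel_bisect(adj, vw, coarse_size=4):
    """Contract until the graph is small, partition it, then expand and refine."""
    if len(adj) <= coarse_size:
        return refine(adj, vw, initial_bisection(adj, vw))
    cadj, cvw, cmap = coarsen(adj, vw)
    if len(cadj) == len(adj):  # nothing could be contracted (no edges left)
        return refine(adj, vw, initial_bisection(adj, vw))
    cpart = multilevel_bisect(cadj, cvw, coarse_size)
    part = {u: cpart[cmap[u]] for u in adj}  # expand: fine vertices inherit the coarse part
    return refine(adj, vw, part)

def grid_graph(rows, cols):
    """Unit-weight grid mesh with vertices numbered row by row."""
    adj = {r * cols + c: {} for r in range(rows) for c in range(cols)}
    for u in adj:
        if (u + 1) % cols:  # right neighbor exists
            adj[u][u + 1] = adj[u + 1][u] = 1
        if u + cols < rows * cols:  # lower neighbor exists
            adj[u][u + cols] = adj[u + cols][u] = 1
    return adj

adj = grid_graph(4, 4)
vw = {u: 1 for u in adj}
part = multilevel_bisect(adj, vw)
print("cut:", cut_weight(adj, part))
```

On this 4 x 4 grid the sketch contracts 16 vertices to 8 and then to 4, splits the 4-vertex graph, and carries that split back up to an 8/8 bisection with a cut of 4. The partitioning decision is made on a tiny graph, and each finer level only needs cheap local repair.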
An Idea Whose Time Had Come
• Graph partitioning is used as an abstraction in multiple domains
– Natural concept in many divide-and-conquer settings
• Researchers in two other communities independently proposed essentially the same algorithm at about the same time.
– Bui and Jones for sparse matrix reordering
– Cong and Smith for VLSI placement
Thread 1: Direct Impact
• Excellent cost/performance tradeoff made multilevel partitioning the algorithm of choice. Embraced and enhanced by many others
– Metis (Kumar & Karypis), JOSTLE (Walshaw), SCOTCH (Pellegrini) and nearly all subsequent tools.
• Better load balancing abstraction proposed by Umit Çatalyürek and Cevdet Aykanat in late 90’s using a hypergraph model
• Longevity of utility enabled by a remarkable 20 years of stability in the way we design and program HPC machines
– 6 orders of magnitude improvement in HPC performance.
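One way to see why the hypergraph model is the better load balancing abstraction: when one value is consumed by several vertices in another part, it only has to be sent there once, but the graph model charges one cut edge per consumer. The sketch below compares edge cut with the hypergraph "connectivity minus one" metric on a hypothetical four-vertex example. The function names and data are assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Graph model: count dependency edges that cross between parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def connectivity_minus_one(nets, part):
    """Hypergraph model: each net costs (number of parts it touches) - 1."""
    return sum(len({part[v] for v in net}) - 1 for net in nets)


# Vertex 0 produces a value that vertices 1, 2 and 3 all consume.
part = {0: 0, 1: 1, 2: 1, 3: 1}
edges = [(0, 1), (0, 2), (0, 3)]  # graph model: one edge per dependency
nets = [{0, 1, 2, 3}]             # hypergraph model: one net per produced value

print("edge cut:", edge_cut(edges, part))
print("connectivity - 1:", connectivity_minus_one(nets, part))
```

The edge cut is 3, while the net touches two parts and so costs 1, which matches the single message that actually has to be sent.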
A Virtuous Circle…
[Figure: cycle diagram. Labels: Architectures, Programming Models, Algorithms, Software; Commodity Clusters, Explicit Message Passing, Bulk Synchronous Parallel, MPI]
… is Coming to an End
• Moore’s Law continues: transistor count still doubles every 24 months
• Dennard scaling stalls: key parameters flatline
– Voltage
– Clock speed
– Power
– Performance/clock
Thread 2: Combinatorial Scientific Computing
• Graph algorithms play important niche roles in many areas of computational science
– Parallelism, Sparse matrices, Multigrid, Mesh generation, Computational biology, Chemistry, Statistical physics, etc.
• Alex Pothen and I helped stand up a community on this theme: Combinatorial Scientific Computing
• The community is thriving
– 6 SIAM workshops
– Several journal special issues
• Discrete algorithms have become widely recognized as playing an important role in computational science
Thread 3: Abstractions
• Graph partitioning is an abstraction to simplify thinking about algorithm / machine interplay
– Supports performance portability across machines
– Imperfectly but usefully represents both algorithm and machine
• Future machines are vastly more complex (and still unknown)
– Heterogeneous nodes, more prone to errors, complex memory hierarchies, etc.
• How do we shield application developers from this complexity!?
– Need good abstractions at multiple layers!
– Note: these could be software interfaces, but could also be conceptual instead
Needed Abstractions
• Simplified machine model for programmers
– Simple interface for managing (or hiding) node heterogeneity
– Managing resilience
– Dealing with complex memory hierarchies
• Performance portability across diverse architectures
• These need to intersect in a natural way with our application / library / runtime software stack
Promising Abstractions
• Task-based programming models
– E.g. Charm++, Legion, Uintah, PaRSEC
– Create many more tasks than processors
– Schedule tasks at runtime
– Allows for performance portability of high-level code
• Kokkos memory abstractions
– Polymorphic Multidimensional Arrays
– Decouple array layout and memory space from algorithm
– Match layout to architecture without modifying algorithm’s implementation
– Supports performance portability across different architectures
– Employs template metaprogramming (another CS contribution)
– By Carter Edwards and others at Sandia Labs
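The layout-decoupling idea can be illustrated with a toy array wrapper. This is not the Kokkos API (Kokkos is a C++ library that resolves layouts at compile time through templates). `View2D`, its `layout` argument, and the helper functions are hypothetical names used only to show an algorithm written once against `(i, j)` indexing while the storage order changes underneath it.

```python
class View2D:
    """Hypothetical 2-D array view whose memory layout is chosen at construction."""

    def __init__(self, rows, cols, layout="right"):
        if layout not in ("right", "left"):
            raise ValueError("layout must be 'right' (row-major) or 'left' (column-major)")
        self.rows, self.cols, self.layout = rows, cols, layout
        self.data = [0] * (rows * cols)  # flat backing storage

    def _offset(self, i, j):
        # The only place that knows how (i, j) maps to a position in memory.
        if self.layout == "right":
            return i * self.cols + j
        return j * self.rows + i

    def __getitem__(self, ij):
        return self.data[self._offset(*ij)]

    def __setitem__(self, ij, value):
        self.data[self._offset(*ij)] = value


def fill(a):
    """Store 10*i + j at position (i, j); written without reference to layout."""
    for i in range(a.rows):
        for j in range(a.cols):
            a[i, j] = 10 * i + j


def row_sums(a):
    """An algorithm expressed purely in (i, j) terms."""
    return [sum(a[i, j] for j in range(a.cols)) for i in range(a.rows)]


right = View2D(2, 3, layout="right")
left = View2D(2, 3, layout="left")
fill(right)
fill(left)
print(right.data, row_sums(right))
print(left.data, row_sums(left))
```

Both views give the same row sums, but the flat storage differs: row-major keeps each row contiguous, column-major keeps each column contiguous. Choosing the layout per architecture while leaving `fill` and `row_sums` untouched is the point of the abstraction.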
What Happens Next?
• Virtuous circle will not survive the coming disruptions in high performance computing
• But existing codes cannot be allowed to die
– Billions of dollars in investment in software
• Computer science will have to play an ever larger role
• New programming models, algorithms and abstractions will be needed
Conclusions
• Everything has changed
– C++ instead of Fortran
– Unstructured instead of structured
– Multi-physics instead of single physics
– UQ and optimization instead of forward solution
– Template metaprogramming instead of huh?
• Yet some key things are the same
– Technology changes are forcing a major paradigm shift
– We need to prepare for a future we can’t yet discern
• Computer science will play a pivotal role in the challenges ahead
• Objects in the mirror are closer than they appear!
Thanks
• Fred Howes and the DOE Office of Science Applied Math Program for funding this work
• Ed Barsis, Bill Camp & Dick Allen for a great research environment
• Collaborators from early 90’s not already mentioned:
– Bob Benner, Karen Devine, John Gilbert, Mike Heath
– Scott Hutchinson, John Lewis, Steve Plimpton
– John Shadid, Ray Tuminaro, Courtenay Vaughan
– David Womble