Presentation at Emory regarding current projects and possibilities for interaction
HPC Lab
David A. Bader, E. Jason Riedy, Henning Meyerhenke, (horde of students...)
HPC Lab Projects
• UHPC (DARPA)
– Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
– CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
• GTFOLD (NIH): Combinatorial and Computational Methods for the Analysis, Prediction, and Design of Viral RNA Structures
• PETA-APPS (NSF): Petascale Simulation for Understanding Whole-Genome Evolution
• Graph500 (Sandia): Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms
• STING (Intel): An open-source dynamic graph package for Intel platforms
• CASS-MT (DoD): Graph Analytics for Streaming Data on Emerging Platforms
• GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
HPC Lab Projects
And yet more...
• Burton (NSF): Develop software and algorithmic infrastructure for massively multithreaded architectures.
• Dynamic Graph Data Structures in X10 (IBM): Develop and evaluate graph data structures in X10
• I/UCRC Center for Hybrid and Multicore Productivity Research, CHMPR (NSF)
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Program Objectives:
• One PFLOPS in a single cabinet, including self-contained cooling
• 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
• Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
• Security embedded at all system levels
• Parallel, efficient execution models
• Highly programmable parallel systems
• Scalable systems, from terascale to petascale
Architectural Drivers:
• Energy Efficiency
• Security and Dependability
• Programmability
Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” -MarketWatch
David A. Bader (CSE)
Echelon Leadership Team
Ubiquitous High Performance Computing (DARPA): Echelon
Ubiquitous High Performance Computing (DARPA): CHASM
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Program Objectives:
• Develop applications, benchmarks, and metrics
• Drive UHPC development
• Support performance analysis of UHPC systems
Architectural Drivers:
• New architectures require new benchmarks
• Evaluating usability requires applications
• Existing metrics do not encompass all UHPC goals
CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
Dan Campbell, GTRI, co-PI
GTFold (NIH): RNA Secondary Structure Prediction
Program Goals
Accurate structure of large viruses such as:
• Influenza
• HIV
• Polio
• Tobacco Mosaic
• Hanta
FACULTY
Christine Heitsch (Mathematics), David A. Bader, Steve Harvey (Biology)
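GTfold itself predicts structure with thermodynamic (free-energy) models; as a hedged, much simpler stand-in, the sketch below uses the classic Nussinov dynamic program, which only maximizes the number of complementary base pairs. The sequence and the `min_loop` hairpin constraint are illustrative choices, not taken from GTfold.

```python
# Nussinov-style base-pair maximization: a toy stand-in for RNA
# secondary structure prediction (GTfold's real model is thermodynamic).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs, with hairpin loops >= min_loop."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]          # j unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:  # pair k with j
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3: stem G-C, G-C, G-U closing an AAA loop
```

Real predictors replace the "+1 per pair" score with experimentally measured stacking and loop energies, which is what makes accurate structures for large viral RNAs computationally demanding.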
PetaApps (NSF): Phylogenetics Research on IBM Blue Waters
As part of the IBM PERCS team, we designed the IBM Blue Waters supercomputer that will sustain petascale performance on our applications, under the DARPA High Productivity Computing Systems program.
www.phylo.org
FACULTY
David A. Bader, CSE
• GRAPPA: Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
• Freely available, open source, GNU GPL
• Already used by other computational phylogeny groups: Caprara, Pevzner, LANL, FBI, Smithsonian Institution, Aventis, GlaxoSmithKline, PharmCos
• Gene-order Phylogeny Reconstruction
• Breakpoint Median
• Inversion Median
• Over one-billion-fold speedup from previous codes
• Parallelism scales linearly with the number of processors
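The breakpoint median mentioned above minimizes breakpoint distance: the number of gene adjacencies in one genome that are broken in another. This is a toy illustration of that count (unsigned genes, circularity ignored), not GRAPPA's implementation:

```python
# Toy breakpoint count between two gene orders (lists of gene IDs).
# GRAPPA's breakpoint median seeks a genome minimizing the sum of this
# distance to three given genomes; this sketch only computes the distance.
def breakpoints(g1, g2):
    """Adjacent gene pairs in g1 that are not adjacent (either order) in g2."""
    adj = {frozenset(p) for p in zip(g2, g2[1:])}
    return sum(frozenset(p) not in adj for p in zip(g1, g1[1:]))

print(breakpoints([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0: identical orders
print(breakpoints([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 2: (1,2) and (3,4) broken
```

The inversion median is the harder signed variant, scoring reversals of gene segments rather than broken adjacencies.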
Graph500 (SNL): Exploration of shared-memory graph benchmarks
• Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms.
• NOT LINPACK!
• Spec, reference implementations at http://graph500.org
• Ranking debuted at SC10
• Press: IEEE Spectrum, Computerworld, HPCWire, MIT Tech. Review, EE Times, slashdot, etc...
Image Source: Nexus (Facebook application)
Problem Class (Size)
• Toy (10): 17 GiB
• Mini (11): 140 GiB
• Small (12): 1.1 TiB
• Medium (13): 18 TiB
• Large (14): 140 TiB
• Huge (15): 1.1 PiB
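The parenthesized numbers are roughly log10 of the problem size in bytes. Assuming the Graph500 specification's parameters (a class at scale s has 2**s vertices, an edge factor of 16, and each edge stored as two 8-byte vertex IDs), the sizes above can be reproduced; the scale values per class here are the spec's, not stated on the slide:

```python
# Sketch of where the Graph500 problem-class sizes come from, assuming
# the spec's parameters: 2**scale vertices, 16 edges per vertex,
# 16 bytes per stored edge, so memory ~= 2**(scale + 8) bytes.
SCALES = {"Toy": 26, "Mini": 29, "Small": 32, "Medium": 36, "Large": 39, "Huge": 42}

def class_bytes(scale, edge_factor=16, bytes_per_edge=16):
    """Approximate in-memory size of one Graph500 problem class."""
    return (2 ** scale) * edge_factor * bytes_per_edge

for name, s in SCALES.items():
    print(f"{name:6s} scale {s}: {class_bytes(s) / 1e9:,.0f} GB")
```

For example, the Toy class at scale 26 needs 2**34 bytes, about 17 GB, matching the table.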
Image Source: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003
STING (Intel): Spatio-Temporal Interaction Networks and Graphs
An open-source dynamic graph package for Intel platforms
• Develop and tune the STING package to analyze streaming, graph-structured data for Intel multi- and manycore platforms.
• To support platforms from server farms (NYSE, Facebook) to hand-held devices
• Span update scales from terabytes per day to human entry rates
• Basis for algorithmic and performance work
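The core model here is applying streams of edge updates to a graph that stays resident in memory. The sketch below illustrates that batched-update idea only; it is not STING's API, and a real package batches updates precisely to expose parallelism and amortize data-structure costs:

```python
# Illustration of the streaming graph-update model (not STING's API):
# edge insertions and deletions arrive in batches against a live graph.
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        self.adj = defaultdict(set)  # undirected adjacency sets

    def apply_batch(self, updates):
        """updates: iterable of (op, u, v) with op '+' (insert) or '-' (delete)."""
        for op, u, v in updates:
            if op == '+':
                self.adj[u].add(v)
                self.adj[v].add(u)
            else:
                self.adj[u].discard(v)
                self.adj[v].discard(u)

g = DynamicGraph()
g.apply_batch([('+', 1, 2), ('+', 2, 3), ('-', 1, 2)])
print(sorted(g.adj[2]))  # [3]
```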
Photo © Intel
Photo © CTL Corp.
Photo © Intel
Intel: Parallel Algorithms in Non-Numeric Computing
CASS-MT: Center for Adaptive Supercomputing Software
• DoD-sponsored, launched July 2008
• Pacific Northwest Lab
– Georgia Tech, Sandia, WA State, Delaware
• The newest breed of supercomputers have hardware set up not just for speed, but also to better tackle large networks of seemingly random data. And now, a multi-institutional group of researchers has been awarded more than $12M to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.
Example: Mining Twitter for Social Good
ICPP 2010
Image credit: bioethicsinstitute.org
GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
Next Generation Sequencing experiments produce a large number of small base-pair strings (reads).
Task: Assemble (concatenate) reads appropriately into larger substrings (contigs)
Two main assembly approaches, both graph-based (de Bruijn vs. overlap/string graph)
Objectives: Improve running time and ultimately also assembly accuracy
Approach:
Use overlap/string graph for higher accuracy
Parallelism to reduce running time
Compression to reduce memory consumption
Assembly
Parallel Genome Sequence Assembly
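The overlap/string-graph approach builds a graph whose edges record suffix-prefix overlaps between reads; contigs then correspond to paths. A minimal sketch of that idea, with made-up reads and a naive quadratic overlap check (Pasqual's actual data structures and parallel algorithms are far more sophisticated):

```python
# Minimal overlap-graph sketch for sequence assembly (illustration only).
def overlap(a, b, min_len=3):
    """Length of the longest suffix of read a that is a prefix of read b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

reads = ["ATTAGACCTG", "CTGCCGGAA", "AGACCTGCCG"]
# Edges of the overlap graph: (source read, target read) -> overlap length
edges = {(i, j): overlap(reads[i], reads[j])
         for i in range(len(reads)) for j in range(len(reads)) if i != j}
print({e: w for e, w in edges.items() if w})
# {(0, 1): 3, (0, 2): 7, (2, 1): 6}
```

Following the heaviest path 0 -> 2 -> 1 and merging overlapping ends yields the contig "ATTAGACCTGCCGGAA"; on real data, transitive-edge reduction and error tolerance make this far harder.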
Pasqual: New memory-efficient, fast parallel sequence assembler
● Pasqual: Our parallel (shared memory, OpenMP) sequence assembler
● Run on commodity server (8 cores, 16 hyperthreads)
● Memory usage reduced to ca. 50% for large data sets
● Running time compared to sequential assemblers: 24 to 325 times faster!
● Biologists can assemble larger data sets faster
Experimental Results: Memory Usage and Running Time