Presentation at Emory regarding current projects and possibilities for interaction
HPC Lab
David A. Bader, E. Jason Riedy, Henning Meyerhenke, (horde of students...)
HPC Lab Projects
• UHPC (DARPA)
– Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
– CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
• GTFOLD (NIH): Combinatorial and Computational Methods for the Analysis, Prediction, and Design of Viral RNA Structures
• PETA-APPS (NSF): Petascale Simulation for Understanding Whole-Genome Evolution
• Graph500 (Sandia): Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms
• STING (Intel): An open-source dynamic graph package for Intel platforms
• CASS-MT (DoD): Graph Analytics for Streaming Data on Emerging Platforms
• GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
HPC Lab Projects
And yet more...
• Burton (NSF): Develop software and algorithmic infrastructure for massively multithreaded architectures.
• Dynamic Graph Data Structures in X10 (IBM): Develop and evaluate graph data structures in X10
• I/UCRC Center for Hybrid and Multicore Productivity Research, CHMPR (NSF)
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Program Objectives:
• One PFLOPS in a single cabinet, including self-contained cooling
• 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
• Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
• Security embedded at all system levels
• Parallel, efficient execution models
• Highly programmable parallel systems
• Scalable systems, from terascale to petascale
Architectural Drivers:
• Energy Efficiency
• Security and Dependability
• Programmability
Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” -MarketWatch
David A. Bader (CSE)
Echelon Leadership Team
Ubiquitous High Performance Computing (DARPA): Echelon
Ubiquitous High Performance Computing (DARPA): CHASM
Overall goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Program Objectives:
• Develop applications, benchmarks, and metrics
• Drive UHPC development
• Support performance analysis of UHPC systems
Architectural Drivers:
• New architectures require new benchmarks
• Evaluating usability requires applications
• Existing metrics do not encompass all UHPC goals
CHASM: Challenge Applications and Scalable Metrics for Ubiquitous High Performance Computing
Dan Campbell, GTRI, co-PI
GTFold (NIH): RNA Secondary Structure Prediction
Program Goals
Accurate structure of large viruses such as:
• Influenza
• HIV
• Polio
• Tobacco Mosaic
• Hanta
FACULTY
Christine Heitsch (Mathematics), David A. Bader, Steve Harvey (Biology)
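GTfold itself predicts structure with thermodynamic (free-energy) models; as a hedged, much simpler stand-in, the sketch below uses the classic Nussinov dynamic program, which only maximizes the number of complementary base pairs. The sequence and the `min_loop` hairpin constraint are illustrative choices, not taken from GTfold.

```python
# Nussinov-style base-pair maximization: a toy stand-in for RNA
# secondary structure prediction (GTfold's real model is thermodynamic).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs, with hairpin loops >= min_loop."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]          # j unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:  # pair k with j
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3: stem G-C, G-C, G-U closing an AAA loop
```

Real predictors replace the "+1 per pair" score with experimentally measured stacking and loop energies, which is what makes accurate structures for large viral RNAs computationally demanding.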
PetaApps (NSF): Phylogenetics Research on IBM Blue Waters
As part of the IBM PERCS team, we designed the IBM Blue Waters supercomputer that will sustain petascale performance on our applications, under the DARPA High Productivity Computing Systems program.
www.phylo.org
FACULTY
David A. Bader, CSE
• GRAPPA: Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
• Freely available, open source, GNU GPL
• Already used by other computational phylogeny groups: Caprara, Pevzner, LANL, FBI, Smithsonian Institution, Aventis, GlaxoSmithKline, PharmCos
• Gene-order Phylogeny Reconstruction
• Breakpoint Median
• Inversion Median
• Over one-billion-fold speedup from previous codes
• Parallelism scales linearly with the number of processors
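The breakpoint median mentioned above minimizes breakpoint distance: the number of gene adjacencies in one genome that are broken in another. This is a toy illustration of that count (unsigned genes, circularity ignored), not GRAPPA's implementation:

```python
# Toy breakpoint count between two gene orders (lists of gene IDs).
# GRAPPA's breakpoint median seeks a genome minimizing the sum of this
# distance to three given genomes; this sketch only computes the distance.
def breakpoints(g1, g2):
    """Adjacent gene pairs in g1 that are not adjacent (either order) in g2."""
    adj = {frozenset(p) for p in zip(g2, g2[1:])}
    return sum(frozenset(p) not in adj for p in zip(g1, g1[1:]))

print(breakpoints([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0: identical orders
print(breakpoints([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 2: (1,2) and (3,4) broken
```

The inversion median is the harder signed variant, scoring reversals of gene segments rather than broken adjacencies.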
Graph500 (SNL): Exploration of shared-memory graph benchmarks
• Establish benchmarks for high-performance data-intensive computations on parallel, shared-memory platforms.
• NOT LINPACK!
• Spec, reference implementations at http://graph500.org
• Ranking debuted at SC10
• Press: IEEE Spectrum, Computerworld, HPCWire, MIT Tech. Review, EE Times, slashdot, etc...
Image Source: Nexus (Facebook application)
Problem Class (Size)
• Toy (10): 17 GiB
• Mini (11): 140 GiB
• Small (12): 1.1 TiB
• Medium (13): 18 TiB
• Large (14): 140 TiB
• Huge (15): 1.1 PiB
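The parenthesized numbers are roughly log10 of the problem size in bytes. Assuming the Graph500 specification's parameters (a class at scale s has 2**s vertices, an edge factor of 16, and each edge stored as two 8-byte vertex IDs), the sizes above can be reproduced; the scale values per class here are the spec's, not stated on the slide:

```python
# Sketch of where the Graph500 problem-class sizes come from, assuming
# the spec's parameters: 2**scale vertices, 16 edges per vertex,
# 16 bytes per stored edge, so memory ~= 2**(scale + 8) bytes.
SCALES = {"Toy": 26, "Mini": 29, "Small": 32, "Medium": 36, "Large": 39, "Huge": 42}

def class_bytes(scale, edge_factor=16, bytes_per_edge=16):
    """Approximate in-memory size of one Graph500 problem class."""
    return (2 ** scale) * edge_factor * bytes_per_edge

for name, s in SCALES.items():
    print(f"{name:6s} scale {s}: {class_bytes(s) / 1e9:,.0f} GB")
```

For example, the Toy class at scale 26 needs 2**34 bytes, about 17 GB, matching the table.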
Image Source: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003
STING (Intel): Spatio-Temporal Interaction Networks and Graphs
An open-source dynamic graph package for Intel platforms
• Develop and tune the STING package to analyze streaming, graph-structured data for Intel multi- and manycore platforms.
• To support platforms from server farms (NYSE, Facebook) to hand-held devices
• Span update scales from terabytes per day to human entry rates
• Basis for algorithmic and performance work
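The core model here is applying streams of edge updates to a graph that stays resident in memory. The sketch below illustrates that batched-update idea only; it is not STING's API, and a real package batches updates precisely to expose parallelism and amortize data-structure costs:

```python
# Illustration of the streaming graph-update model (not STING's API):
# edge insertions and deletions arrive in batches against a live graph.
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        self.adj = defaultdict(set)  # undirected adjacency sets

    def apply_batch(self, updates):
        """updates: iterable of (op, u, v) with op '+' (insert) or '-' (delete)."""
        for op, u, v in updates:
            if op == '+':
                self.adj[u].add(v)
                self.adj[v].add(u)
            else:
                self.adj[u].discard(v)
                self.adj[v].discard(u)

g = DynamicGraph()
g.apply_batch([('+', 1, 2), ('+', 2, 3), ('-', 1, 2)])
print(sorted(g.adj[2]))  # [3]
```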
Photo © Intel
Photo © CTL Corp.
Photo © Intel
Intel: Parallel Algorithms in Non-Numeric Computing
CASS-MT: Center for Adaptive Supercomputing Software
• DoD-sponsored, launched July 2008
• Pacific Northwest Lab
– Georgia Tech, Sandia, WA State, Delaware
• The newest breed of supercomputers have hardware set up not just for speed, but also to better tackle large networks of seemingly random data. And now, a multi-institutional group of researchers has been awarded more than $12M to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.
Example: Mining Twitter for Social Good
ICPP 2010
Image credit: bioethicsinstitute.org
GALAXY (NIH, PI Dr. J. Taylor, Emory): Dynamically Scaling Parallel Execution for Cloud-based Bioinformatics
Next Generation Sequencing experiments produce a large number of small base-pair strings (reads).
Task: Assemble (concatenate) reads appropriately into larger substrings (contigs)
Two main assembly approaches, both graph-based (de Bruijn vs. overlap/string graph)
Objectives: Improve running time and ultimately also assembly accuracy
Approach:
Use overlap/string graph for higher accuracy
Parallelism to reduce running time
Compression to reduce memory consumption
Assembly
Parallel Genome Sequence Assembly
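The overlap/string-graph approach builds a graph whose edges record suffix-prefix overlaps between reads; contigs then correspond to paths. A minimal sketch of that idea, with made-up reads and a naive quadratic overlap check (Pasqual's actual data structures and parallel algorithms are far more sophisticated):

```python
# Minimal overlap-graph sketch for sequence assembly (illustration only).
def overlap(a, b, min_len=3):
    """Length of the longest suffix of read a that is a prefix of read b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

reads = ["ATTAGACCTG", "CTGCCGGAA", "AGACCTGCCG"]
# Edges of the overlap graph: (source read, target read) -> overlap length
edges = {(i, j): overlap(reads[i], reads[j])
         for i in range(len(reads)) for j in range(len(reads)) if i != j}
print({e: w for e, w in edges.items() if w})
# {(0, 1): 3, (0, 2): 7, (2, 1): 6}
```

Following the heaviest path 0 -> 2 -> 1 and merging overlapping ends yields the contig "ATTAGACCTGCCGGAA"; on real data, transitive-edge reduction and error tolerance make this far harder.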
Pasqual: New memory-efficient, fast parallel sequence assembler
● Pasqual: Our parallel (shared memory, OpenMP) sequence assembler
● Run on commodity server (8 cores, 16 hyperthreads)
● Memory usage reduced to ca. 50% for large data sets
● Running time compared to sequential assemblers: 24 to 325 times faster!
● Biologists can assemble larger data sets faster
Experimental Results: Memory Usage and Running Time