Introduction to Research 2011
Ashok Srinivasan
Florida State University
www.cs.fsu.edu/~asriniva
Images from ORNL, IBM, NVIDIA
Part of the machine room at ORNL
The Cell processor powers the Roadrunner at LANL
NVIDIA GPUs power Tianhe-1A in China
Outline
Research
High Performance Computing
Applications and Software: multicore processors, massively parallel processors, computational nanotechnology, simulation-based policy making
Potential Research Topics
Research Areas
High Performance Computing, Applications in Computational Sciences, Scalable Algorithms, Mathematical Software
Current topics: Computational Nanotechnology, HPC on Multicore Processors, Massively Parallel Applications
New Topics: Simulation-based policy analysis
Old Topics: Computational Finance, Parallel Random Number Generation, Monte Carlo Linear Algebra, Computational Fluid Dynamics, Image Compression
Importance of Supercomputing
Fundamental scientific understanding: nano-materials, drug design
Solution of bigger problems: climate modeling
More accurate solutions: automobile crash tests
Solutions with time constraints: disaster mitigation
Study of complex interactions for policy decisions: urban planning
Some Applications
Increasing relevance to industry
In 1993, fewer than 30% of the top 500 supercomputers were commercial; now, 57% are commercial
A variety of application areas
Commercial: finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, and even shoes (Nike), potato chips, and toys!
Scientific: weather prediction, earthquake modeling, epidemic modeling, materials, energy, computational biology, astrophysics
Supercomputing Power
The amount of parallelism too is increasing, with the high end having over 200,000 cores
Geographic Distribution
North America has over half of the top 500 systems
However, Europe and East Asia also have a significant share
China is determined to be a supercomputing superpower
Two of its national supercomputing centers have top-five supercomputers
Japan has the top machine and two in the top five
Planning a $1.3 billion exascale supercomputer in 2020
Challenges in Supercomputing
Hardware can be obtained with enough money
But obtaining good performance on large systems is difficult
Some DOE applications ran at 1% efficiency on 10,000 cores
They will have to deal with a million threads soon, and with a billion at the exascale
Don't think of supercomputing as a means of solving current problems faster, but as a means of solving problems we earlier thought we could not solve
Development of software tools to make the machines easier to use
Architectural Trends
Massive parallelism
10K processor systems will be commonplace
The high end already has over 500K processors
Single chip multiprocessing
All processors will be multicore
Heterogeneous multicore processors
Cell used in the PS3
GPGPU
80-core processor from Intel
Processors with hundreds of cores are already commercially available
Distributed environments, such as the Grid
But it is hard to get good performance on these systems
Accelerating Applications with GPUs
Over a hundred cores per GPU
Hide memory latency with thousands of threads
Can accelerate a traditional computer to a teraflop
GPU cluster at FSU
Quantum Monte Carlo applications
Algorithms: linear algebra, FFT, compression, etc.
Small Discrete Fourier Transforms (DFT) on GPUs
GPUs are effective for large DFTs, but not small DFTs
However, they can be effective for a large number of small DFTs
Useful for AFQMC
We use the asymptotically slow matrix-multiplication based DFT for very small sizes
We combine it with mixed-radix for larger sizes
We use asynchronous memory transfer to deal with host-device data transfer overhead
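As a rough illustration of the first two points above, here is a minimal CPU-side sketch in plain Python. It is not the GPU implementation from the talk: the function names, the `small` threshold, and the choice of a generic Cooley-Tukey factorization for the mixed-radix step are all assumptions made for this example. It shows a DFT computed as a matrix-vector product, a batch of small DFTs sharing one DFT matrix, and a mixed-radix split that falls back to the matrix form for small sizes.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with entries W[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]

def dft_matmul(x):
    """O(n^2) DFT of one sequence, computed as a matrix-vector product."""
    n = len(x)
    w = dft_matrix(n)
    return [sum(w[j][k] * x[k] for k in range(n)) for j in range(n)]

def batched_dft(batch):
    """Many small DFTs of equal length; the DFT matrix is built once and reused."""
    n = len(batch[0])
    w = dft_matrix(n)
    return [[sum(w[j][k] * x[k] for k in range(n)) for j in range(n)]
            for x in batch]

def dft_mixed_radix(x, small=4):
    """Cooley-Tukey split n = n1 * n2, with the matrix form as the base case.

    The threshold `small` is a hypothetical tuning parameter.
    """
    n = len(x)
    if n <= small:
        return dft_matmul(x)
    # Smallest nontrivial factor of n; prime lengths fall back to the matrix form.
    n1 = next((f for f in range(2, n) if n % f == 0), None)
    if n1 is None:
        return dft_matmul(x)
    n2 = n // n1
    # inner[b][k1]: length-n1 DFT of the strided subsequence x[b], x[b+n2], ...
    inner = [dft_mixed_radix(x[b::n2], small) for b in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Apply twiddle factors, then a length-n2 DFT across the subsequences.
        col = [inner[b][k1] * cmath.exp(-2j * cmath.pi * b * k1 / n)
               for b in range(n2)]
        outer = dft_mixed_radix(col, small)
        for k2 in range(n2):
            out[k1 + n1 * k2] = outer[k2]
    return out
```

On a GPU, the batched form maps naturally onto a matrix-matrix product, which is one reason the asymptotically slower method can still pay off at very small sizes.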
Comparison of DFT Performance
Figure: comparison of 512 simultaneous DFTs without host-device data transfer (panels: 2-D DFTs, 3-D DFTs)
Petascale Quantum Monte Carlo
Originally a DOE funded project involving collaboration between ORNL, UIUC, Cornell, UTK, CWM, and NCSU
Now funded by ORAU/ORNL
Scale Quantum Monte Carlo applications to petascale (one million gigaflops) machines
Load balancing, fault tolerance, other optimizations
Load Balancing
In current implementations, such as QWalk and QMCPack, cores send excess walkers to cores with fewer walkers
In the new algorithm (alias method), cores may send more than their excess, and receive walkers even if they originally had an excess
Load can be balanced with each core receiving from at most one other core
Also optimal in maximum number of walkers received
Total number of walkers sent may be twice the optimal
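The properties listed above can be seen in a small serial sketch based on the standard alias-method construction. This is an illustration under assumptions, not the implementation from the talk: the function name `alias_balance`, the way the remainder is spread over the first cores, and the serial pairing order are all choices made for this example.

```python
def alias_balance(counts):
    """Plan walker transfers so that every core ends at its target load.

    counts[i] is the number of walkers on core i. Returns a list of
    (sender, receiver, n_walkers) tuples. Each core receives at most once.
    """
    p = len(counts)
    total = sum(counts)
    # Assumed target: an even share, with any remainder going to the first cores.
    targets = [total // p + (1 if i < total % p else 0) for i in range(p)]
    load = list(counts)
    small = [i for i in range(p) if load[i] < targets[i]]
    large = [i for i in range(p) if load[i] > targets[i]]
    transfers = []
    while small:
        s = small.pop()
        l = large.pop()
        # One sender fills the whole deficit of s, so s is finished afterwards.
        need = targets[s] - load[s]
        transfers.append((l, s, need))
        load[s] += need
        load[l] -= need
        # The sender may have given away more than its excess; if so, it now
        # has a deficit and will itself receive once from another core.
        if load[l] > targets[l]:
            large.append(l)
        elif load[l] < targets[l]:
            small.append(l)
    return transfers
```

For example, with `counts = [5, 0, 7]` the target is 4 walkers per core. Core 2 sends 4 walkers to core 1, which is more than its excess of 3, and then receives 1 walker from core 0. Five walkers move in total, against an optimum of four, but no core receives from more than one other core.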
Performance Comparison
Figure: comparisons with QWalk (panels: mean number of walkers migrated, maximum number of receives)
Process-Node Affinity
Node allocation is not necessarily ideal for minimizing communication
Process-node affinity can, therefore, be important
Allocated nodes for a 12,000 core run on Jaguar
Load Balancing with Affinity
Renumbering the nodes improves load balancing and AllGather time
Figure: results on Jaguar (panels: basic load balancing, load balancing after renumbering)
Potential Research Topics
High Performance Computing on Multicore Processors
Algorithms, applications, and libraries on GPUs
Applications on Massively Parallel Processors
Quantum Monte Carlo applications
Load balancing and communication optimizations
Simulation-based policy decisions
Combine scientific computing with models of social interactions to help make policy decisions