
Page 1: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction toHigh Performance Computing:

Parallel Computing, Distributed  Computing, Grid Computing and More

Dr. Jay Boisseau

Director, Texas Advanced Computing Center

[email protected]

December 3, 2001

The University of Texas at AustinTexas Advanced Computing Center

Page 2: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Outline

• Preface

• What is High Performance Computing?

• Parallel Computing

• Distributed Computing, Grid Computing, and More

• Future Trends in HPC

Page 3: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Purpose

• Purpose of this workshop:– to educate researchers about the value and

impact of high performance computing (HPC) techniques and technologies in conducting computational science and engineering

• Purpose of this presentation:– to educate researchers about the techniques and

tools of parallel computing, and to show them the possibilities presented by distributed computing and Grid computing

Page 4: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Goals

• Goals of this presentation are to help you:1. understand the ‘big picture’ of high performance

computing

2. develop a comprehensive understanding of parallel computing

3. begin to understand how Grid and distributed computing will further enhance computational science capabilities

Page 5: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Content and Context

• This material is an introduction and an overview– It is not a comprehensive HPC reference, so further reading

(much more!) is recommended.

• Presentation is followed by additional speakers with detailed presentations on specific HPC and science topics

• Together, these presentations will help prepare you to use HPC in your scientific discipline.

Page 6: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Background - me

• Director of the Texas Advanced Computing Center (TACC) at the University of Texas

• Formerly at San Diego Supercomputer Center (SDSC), Arctic Region Supercomputing Center

• 10+ years in HPC

• Have known Luis for 4 years - plan to develop a strong relationship between TACC and CeCalCULA

Page 7: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Background – TACC

• Mission:– to enhance the academic research capabilities of

the University of Texas and its affiliates through the application of advanced computing resources and expertise

• TACC activities include:– Resources– Support– Development– Applied research

Page 8: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

TACC Activities

• TACC resources and support includes:– HPC systems – Scientific visualization resources– Data storage/archival systems

• TACC research and development areas: – HPC– Scientific Visualization– Grid Computing

Page 9: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Current HPC Systems

[System diagram: HPC systems connected by HiPPI and FDDI networks through an Ascend router]

– CRAY SV1: 16 CPUs, 16 GB memory
– CRAY T3E: 256+ procs, 128 MB/proc, 500 GB disk
– IBM SP: 64+ procs, 256 MB/proc, 300 GB disk
– Archive: 640 GB
– (host names shown in the diagram: aurora, golden, azure)

Page 10: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

New HPC Systems

• Four IBM p690 HPC servers– 16 Power4 Processors

• 1.3 GHz: 5.2 Gflops per proc, 83.2 Gflops per server

– 16 GB Shared Memory• >200 GB/s memory bandwidth!

– 144 GB Disk

• 1 TB disk to partition across servers

• Will configure as single system (1/3 Tflop) with single GPFS system (1 TB) in 2Q02

Page 11: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

New HPC Systems

• IA64 Cluster– 20 2-way nodes

• Itanium (800 MHz) processors

• 2 GB memory/node

• 72 GB disk/node

– Myrinet 2000 switch – 180GB shared disk

• IA32 Cluster– 32 2-way nodes

• Pentium III (1 GHz) processors

• 1 GB Memory

• 18.2 GB disk/node

– Myrinet 2000 Switch

750 GB IBM GPFS parallel file system for both clusters

Page 12: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

World-Class Vislab

• SGI Onyx2– 24 CPUs, 6 Infinite Reality 2 Graphics Pipelines– 24 GB Memory, 750 GB Disk

• Front and Rear Projection Systems– 3x1 cylindrically-symmetric Power Wall– 5x2 large-screen, 16:9 panel Power Wall

• Matrix switch between systems, projectors, rooms

Page 13: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

More Information

• URL: www.tacc.utexas.edu

• E-mail Addresses:– General Information: [email protected]– Technical assistance: [email protected]

• Telephone Numbers:– Main Office: (512) 475-9411– Facsimile transmission: (512) 475-9445– Operations Room: (512) 475-9410

Page 14: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Outline

• Preface

• What is High Performance Computing?

• Parallel Computing

• Distributed Computing, Grid Computing, and More

• Future Trends in HPC

Page 15: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

‘Supercomputing’

• First HPC systems were vector-based systems (e.g. Cray)– named ‘supercomputers’ because they were an

order of magnitude more powerful than commercial systems

• Now, ‘supercomputer’ has little meaning– large systems are now just scaled up versions of

smaller systems

• However, ‘high performance computing’ has many meanings

Page 16: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

HPC Defined

• High performance computing:– can mean high flop count

• per processor• totaled over many processors working on the same

problem• totaled over many processors working on related

problems

– can mean faster turnaround time• more powerful system• scheduled to first available system(s)• using multiple systems simultaneously

Page 17: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

My Definitions

• HPC: any computational technique that solves a large problem faster than possible using single, commodity systems– Custom-designed, high-performance processors

(e.g. Cray, NEC)– Parallel computing– Distributed computing– Grid computing

Page 18: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

My Definitions

• Parallel computing: single systems with many processors working on the same problem

• Distributed computing: many systems loosely coupled by a scheduler to work on related problems

• Grid Computing: many systems tightly coupled by software and networks to work together on single problems or on related problems

Page 19: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Importance of HPC

• HPC has had tremendous impact on all areas of computational science and engineering in academia, government, and industry.

• Many problems have been solved with HPC techniques that were impossible to solve with individual workstations or personal computers.

Page 20: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Outline

• Preface

• What is High Performance Computing?

• Parallel Computing

• Distributed Computing, Grid Computing, and More

• Future Trends in HPC

Page 21: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

What is a Parallel Computer?

• Parallel computing: the use of multiple computers or processors working together on a common task

• Parallel computer: a computer that contains multiple processors:– each processor works on its section of the

problem– processors are allowed to exchange information

with other processors

Page 22: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel vs. Serial Computers

• Two big advantages of parallel computers:1. total performance

2. total memory

• Parallel computers enable us to solve problems that:– benefit from, or require, fast solution– require large amounts of memory– example that requires both: weather forecasting

Page 23: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel vs. Serial Computers

• Some benefits of parallel computing include:– more data points

• bigger domains• better spatial resolution• more particles

– more time steps • longer runs• better temporal resolution

– faster execution• faster time to solution• more solutions in same time• larger simulations in real time

Page 24: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Serial Processor Performance

[Chart: single-processor performance vs. time, comparing Moore's Law growth with a projected future(?) flattening]

Although Moore’s Law ‘predicts’ that single processor performance doubles every 18 months, eventually physical limits on manufacturing technology will be reached

Page 25: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Types of Parallel Computers

• The simplest and most useful way to classify modern parallel computers is by their memory model:– shared memory– distributed memory

Page 26: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagrams: shared memory - several processors (P) connected by a bus to a single memory; distributed memory - processor/memory (P/M) pairs connected by a network]

Shared memory - single address space. All processors have access to a pool of shared memory. (Ex: SGI Origin, Sun E10000)

Distributed memory - each processor has its own local memory. Must do message passing to exchange data between processors. (Ex: CRAY T3E, IBM SP, clusters)

Shared vs. Distributed Memory

Page 27: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: UMA - all processors on one bus to a single shared memory]

Uniform memory access (UMA): Each processor has uniform access to memory. Also known as symmetric multiprocessors, or SMPs (Sun E10000)

[Diagram: NUMA - two bus-based processor/memory groups connected by a network]

Non-uniform memory access (NUMA): Time for memory access depends on location of data. Local access is faster than non-local access. Easier to scale than SMPs (SGI Origin)

Shared Memory: UMA vs. NUMA

Page 28: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Distributed Memory: MPPs vs. Clusters

• Processor-memory nodes are connected by some type of interconnect network– Massively Parallel Processor (MPP): tightly

integrated, single system image.– Cluster: individual computers connected by s/w

[Diagram: many CPU/MEM nodes connected by an interconnect network]

Page 29: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processors, Memory, & Networks

• Both shared and distributed memory systems have:1. processors: now generally commodity RISC

processors

2. memory: now generally commodity DRAM

3. network/interconnect: between the processors and memory (bus, crossbar, fat tree, torus, hypercube, etc.)

• We will now begin to describe these pieces in detail, starting with definitions of terms.

Page 30: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Related Terms

Clock period (cp): the minimum time interval between successive actions in the processor. Fixed: depends on design of processor. Measured in nanoseconds (~1-5 for fastest processors). Inverse of frequency (MHz).

Instruction: an action executed by a processor, such as a mathematical operation or a memory operation.

Register: a small, extremely fast location for storing data or instructions in the processor.

Page 31: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Related Terms

Functional Unit (FU): a hardware element that performs an operation on an operand or pair of operands. Common FUs are ADD, MULT, INV, SQRT, etc.

Pipeline : technique enabling multiple instructions to be overlapped in execution.

Superscalar: multiple instructions are possible per clock period.

Flops: floating point operations per second.

Page 32: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Related Terms

Cache: fast memory (SRAM) near the processor. Helps keep instructions and data close to functional units so processor can execute more instructions more rapidly.

Translation-Lookaside Buffer (TLB): keeps addresses of pages (block of memory) in main memory that have recently been accessed (a cache for memory addresses)

Page 33: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Memory-Related Terms

SRAM: Static Random Access Memory (RAM). Very fast (~10 nanoseconds), made using the same kind of circuitry as the processors, so speed is comparable.

DRAM: Dynamic RAM. Longer access times (~100 nanoseconds), but hold more bits and are much less expensive (10x cheaper).

Memory hierarchy: the hierarchy of memory in a parallel system, from registers to cache to local memory to remote memory. More later.

Page 34: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Interconnect-Related Terms

• Latency: – Networks: How long does it take to start sending a

"message"? Measured in microseconds.– Processors: How long does it take to output

results of some operations (such as floating point add, divide, etc.) that are pipelined?

• Bandwidth: What data rate can be sustained once the message is started? Measured in Mbytes/sec or Gbytes/sec

Page 35: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Interconnect-Related Terms

Topology: the manner in which the nodes are connected. – Best choice would be a fully connected network

(every processor to every other). Unfeasible for cost and scaling reasons.

– Instead, processors are arranged in some variation of a grid, torus, or hypercube.

3-d hypercube 2-d mesh 2-d torus

Page 36: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Memory Problem

• Processors issue instructions roughly every nanosecond.

• DRAM can be accessed roughly every 100 nanoseconds (!).

• DRAM cannot keep processors busy! And the gap is growing:– processors getting faster by 60% per year– DRAM getting faster by 7% per year (SDRAM and

EDO RAM might help, but not enough)

Page 37: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Memory Performance Gap

[Chart, 1980-2000 (log scale): CPU performance ("Moore's Law") grows ~60%/yr while DRAM performance grows ~7%/yr; the processor-memory performance gap grows ~50% per year]

From D. Patterson, CS252, Spring 1998 ©UCB

Page 38: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processor-Memory Performance Gap

• Problem becomes worse when remote (distributed or NUMA) memory is needed– network latency is roughly 1000-10000

nanoseconds (roughly 1-10 microseconds)– networks getting faster, but not fast enough

• Therefore, cache is used in all processors– almost as fast as processors (same circuitry)– sits between processors and local memory– expensive, can only use small amounts– must design system to load cache effectively

Page 39: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: CPU connected to main memory through a cache]

Processor-Cache-Memory

• Cache is much smaller than main memory and hence there is mapping of data from main memory to cache.

Page 40: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: memory hierarchy from CPU to cache to local memory to remote memory; moving away from the CPU, speed decreases while size increases and cost per bit decreases]

Memory Hierarchy

Page 41: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Cache-Related Terms

• ICACHE : Instruction cache

• DCACHE (L1) : Data cache closest to registers

• SCACHE (L2) : Secondary data cache– Data from SCACHE has to go through DCACHE

to registers– SCACHE is larger than DCACHE – Not all processors have SCACHE

Page 42: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Cache Benefits

• Data cache was designed with two key concepts in mind– Spatial Locality

• When an element is referenced its neighbors will be referenced also

• Cache lines are fetched together• Work on consecutive data elements in the same cache

line

– Temporal Locality• When an element is referenced, it might be referenced

again soon• Arrange code so that data in cache is reused often
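
To make spatial locality concrete, here is a small sketch (added for illustration, not from the original slides): Fortran stores arrays column by column, so keeping the leftmost index in the innermost loop walks consecutive memory locations and uses each fetched cache line fully. The array name and size are arbitrary.

      program cache_order
      parameter (n=1000)
      real a(n,n), s
      integer i, j
      s = 0.0
c     cache-friendly order: i (the leftmost, fastest-varying index
c     in Fortran) is the innermost loop, so successive iterations
c     touch adjacent memory locations in the same cache line
      do j = 1, n
         do i = 1, n
            a(i,j) = 1.0
         enddo
      enddo
      do j = 1, n
         do i = 1, n
            s = s + a(i,j)
         enddo
      enddo
      print *, s
      end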

Page 43: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: blocks of main memory mapping to a direct-mapped cache]

Direct-Mapped Cache

• Direct mapped cache: A block from main memory can go in exactly one place in the cache. This is called direct mapped because there is direct mapping from any block address in memory to a single location in the cache.

Page 44: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: blocks of main memory mapping to a fully associative cache]

Fully Associative Cache

• Fully Associative Cache : A block from main memory can be placed in any location in the cache. This is called fully associative because a block in main memory may be

associated with any entry in the cache.

Page 45: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: blocks of main memory mapping to a 2-way set-associative cache]

Set Associative Cache

• Set associative cache : The middle range of designs between direct mapped cache and fully associative cache is called set-associative cache. In a n-way set-associative cache a block from main memory can go into N (N > 1) locations in the cache.
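
For reference (standard cache-placement formulas, not on the original slides): a direct-mapped cache places a memory block at index = (block address) mod (number of cache blocks); an N-way set-associative cache places it in set = (block address) mod (number of sets), anywhere within that set.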

Page 46: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Cache-Related Terms

Least Recently Used (LRU): Cache replacement strategy for set associative caches. The cache block that is least recently used is replaced with a new block.

Random Replace: Cache replacement strategy for set associative caches. A cache block is randomly replaced.

Page 47: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Example: CRAY T3E Cache

• The CRAY T3E processors can execute– 2 floating point ops (1 add, 1 multiply) and– 2 integer/memory ops (includes 2 loads or 1 store)

• To help keep the processors busy– on-chip 8 KB direct-mapped data cache– on-chip 8 KB direct-mapped instruction cache– on-chip 96 KB 3-way set associative secondary

data cache with random replacement.

Page 48: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Putting the Pieces Together

• Recall:– Shared memory architectures:

• Uniform Memory Access (UMA): Symmetric Multi-Processors (SMP). Ex: Sun E10000

• Non-Uniform Memory Access (NUMA): Most common are Distributed Shared Memory (DSM), or cc-NUMA (cache coherent NUMA) systems. Ex: SGI Origin 2000

– Distributed memory architectures:• Massively Parallel Processor (MPP): tightly integrated

system, single system image. Ex: CRAY T3E, IBM SP• Clusters: commodity nodes connected by interconnect.

Example: Beowulf clusters.

Page 49: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Symmetric Multiprocessors (SMPs)

• SMPs connect processors to global shared memory using one of:– bus– crossbar

• Provides simple programming model, but has problems:– buses can become saturated– crossbar size must increase with # processors

• Problem grows with number of processors, limiting maximum size of SMPs

Page 50: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Shared Memory Programming

• Programming models are easier since message passing is not necessary. Techniques:– autoparallelization via compiler options– loop-level parallelism via compiler directives– OpenMP– pthreads

• More on programming models later.

Page 51: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Massively Parallel Processors

• Each processor has its own memory:– memory is not shared globally– adds another layer to memory hierarchy (remote

memory)

• Processor/memory nodes are connected by interconnect network– many possible topologies– processors must pass data via messages– communication overhead must be minimized

Page 52: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Communications Networks

• Custom– Many vendors have custom interconnects that

provide high performance for their MPP system– CRAY T3E interconnect is the fastest for MPPs:

lowest latency, highest bandwidth

• Commodity– Used in some MPPs and all clusters– Myrinet, Gigabit Ethernet, Fast Ethernet, etc.

Page 53: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Types of Interconnects

• Fully connected– not feasible

• Array and torus– Intel Paragon (2D array), CRAY T3E (3D torus)

• Crossbar– IBM SP (8 nodes)

• Hypercube– SGI Origin 2000 (hypercube), Meiko CS-2 (fat tree)

• Combinations of some of the above– IBM SP (crossbar & fully connected for 80 nodes)– IBM SP (fat tree for > 80 nodes)

Page 54: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clusters

• Similar to MPPs– Commodity processors and memory

• Processor performance must be maximized

– Memory hierarchy includes remote memory– No shared memory--message passing

• Communication overhead must be minimized

• Different from MPPs– All commodity, including interconnect and OS– Multiple independent systems: more robust– Separate I/O systems

Page 55: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Cluster Pros and Cons

• Pros– Inexpensive– Fastest processors first– Potential for true parallel I/O– High availability

• Cons:– Less mature software (programming and system)– More difficult to manage (changing slowly)– Lower performance interconnects: not as scalable

to large numbers (but have almost caught up!)

Page 56: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Distributed Memory Programming

• Message passing is most efficient– MPI– MPI-2– Active/one-sided messages

• Vendor: SHMEM (T3E), LAPI (SP)• Coming in MPI-2

• Shared memory models can be implemented in software, but are not as efficient.

• More on programming models in the next section.

Page 57: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

“Distributed Shared Memory”

• More generally called cc-NUMA (cache coherent NUMA)

• Consists of m SMPs with n processors in a global address space:– Each processor has some local memory (SMP)– All processors can access all memory: extra

“directory” hardware on each SMP tracks values stored in all SMPs

– Hardware guarantees cache coherency– Access to memory on other SMPs slower (NUMA)

Page 58: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

“Distributed Shared Memory”

• Easier to build because of slower access to remote memory (no expensive bus/crossbar)

• Similar cache problems

• Code writers should be aware of data distribution

• Load balance: Minimize access of “far” memory

Page 59: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

DSM Rationale and Realities

• Rationale: combine the ease of SMP programming with the scalability of MPP programming, at much the same cost as an MPP

• Reality: NUMA introduces additional layers in the memory hierarchy relative to SMPs, so scalability is limited if programmed as an SMP

• Reality: Performance and high scalability require programming to the architecture.

Page 60: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clustered SMPs

• Simpler than DSMs:– composed of nodes connected by network, like an

MPP or cluster– each node is an SMP– processors on one SMP do not share memory on

other SMPs (no directory hardware in SMP nodes)– communication between SMP nodes is by

message passing– Ex: IBM Power3-based SP systems

Page 61: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clustered SMP Diagram

[Diagram: two bus-based SMP nodes (processors sharing memory over a bus) connected by a network]

Page 62: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Reasons for Clustered SMPs

• Natural extension of SMPs and clusters– SMPs offer great performance up to their

crossbar/bus limit– Connecting nodes is how memory and

performance are increased beyond SMP levels– Can scale to larger number of processors with less

scalable interconnect– Maximum performance:

• Optimize at SMP level - no communication overhead• Optimize at MPP level - fewer messages necessary for

same number of processors

Page 63: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clustered SMP Drawbacks

• Clustering SMPs has drawbacks– No shared memory access over entire system,

unlike DSMs– Has other disadvantages of DSMs

• Extra layer in memory hierarchy• Performance requires more effort from programmer than

SMPs or MPPs

• However, clustered SMPs provide a means for obtaining very high performance and scalability

Page 64: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clustered SMP: NPACI “Blue Horizon”

• IBM SP system:– Power3 processors: good peak performance (~1.5

Gflops)– better sustained performance (highly superscalar

and pipelined) than for many other processors– SMP nodes have 8 Power3 processors– System has 144 SMP nodes (1152 processors

total)

Page 65: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Programming Clustered SMPs

• NSF: Most users use only MPI, even for intra-node messages

• DoE: Most applications are being developed with MPI (between nodes) and OpenMP (intra-node)

• MPI+OpenMP programming is more complex, but might yield maximum performance

• Active messages and pthreads would theoretically give maximum performance

Page 66: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Data parallelism Task parallelism

Types of Parallelism

• Data parallelism: each processor performs the same task on different sets or sub-regions of data

• Task parallelism: each processor performs a different task

• Most parallel applications fall somewhere on the continuum between these two extremes.

Page 67: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Data vs. Task Parallelism

• Example of data parallelism:– In a bottling plant, we see several ‘processors’, or

bottle cappers, applying bottle caps concurrently on rows of bottles.

• Example of task parallelism:– In a restaurant kitchen, we see several chefs, or

‘processors’, working simultaneously on different parts of different meals.

– A good restaurant kitchen also demonstrates load balancing and synchronization--more on those topics later.

Page 68: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Example: Master-Worker Parallelism

• A common form of parallelism used in developing applications years ago (especially in PVM) was Master-Worker parallelism:– a single processor is responsible for distributing

data and collecting results (task parallelism)– all other processors perform same task on their

portion of data (data parallelism)

Page 69: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel Programming Models

• The primary programming models in current use are– Data parallelism - operations are performed in

parallel on collections of data structures. A generalization of array operations.

– Message passing - processes possess local memory and communicate with other processes by sending and receiving messages.

– Shared memory - each processor has access to a single shared pool of memory

Page 70: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel Programming Models

• Most parallelization efforts fall under the following categories.– Codes can be parallelized using message-passing

libraries such as MPI.– Codes can be parallelized using compiler

directives such as OpenMP.– Codes can be written in new parallel languages.

Page 71: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Programming Models and Architectures

• Natural mappings– data parallel → CM-2 (SIMD machine)

– message passing → IBM SP (MPP)

– shared memory → SGI Origin, Sun E10000

• Implemented mappings– HPF (a data parallel language) and MPI (a

message passing library) have been implemented on nearly all parallel machines

– OpenMP (a set of directives, etc. for shared memory programming) has been implemented on most shared memory systems.

Page 72: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

SPMD

• All current machines are MIMD systems (Multiple Instruction, Multiple Data) and are capable of either data parallelism or task parallelism.

• The primary paradigm for programming parallel machines is the SPMD paradigm: Single Program, Multiple Data– each processor runs a copy of same source code– enables data parallelism (through data

decomposition) and task parallelism (through intrinsic functions that return the processor ID)
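
A minimal SPMD sketch (illustrative, not from the original slides): every processor runs the same source, obtains its own ID with MPI_COMM_RANK, and branches on that ID, which is how one program can express both data and task parallelism.

      program spmd_example
      include 'mpif.h'
      integer ierr, myid, nprocs
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
c     task parallelism: branch on the processor ID
      if (myid .eq. 0) then
         print *, 'rank 0 of', nprocs, ': coordinating'
      else
         print *, 'rank', myid, ': working on my block of the data'
      endif
      call MPI_FINALIZE(ierr)
      end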

Page 73: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

OpenMP - Shared Memory Standard

• OpenMP is a new standard for shared memory programming: SMPs and cc-NUMAs.– OpenMP provides a standard set of directives,

run-time library routines, and– environment variables for parallelizing code under

a shared memory model.– Very similar to Cray PVP autotasking directives,

but with much more functionality. (Cray now supports OpenMP.)

– See http://www.openmp.org for more information

Page 74: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Fortran 77:

      program add_arrays
      parameter (n=1000)
      real x(n),y(n),z(n)
      read(10) x,y,z
      do i=1,n
         x(i) = y(i) + z(i)
      enddo
      ...
      end

Fortran 77 + OpenMP:

      program add_arrays
      parameter (n=1000)
      real x(n),y(n),z(n)
      read(10) x,y,z
!$OMP PARALLEL DO
      do i=1,n
         x(i) = y(i) + z(i)
      enddo
      ...
      end

Highlighted directive specifies that loop is executed in parallel. Each processor executes a subset of the loop iterations.

OpenMP Example

Page 75: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPI - Message Passing Standard

• MPI has emerged as the standard for message passing in both C and Fortran programs. No longer need to know MPL, PVM, TCGMSG, etc.

• MPI is both large and small:– MPI is large, since it contains 125 functions which

give the programmer fine control over communications

– MPI is small, since message passing programs can be written using a core set of just six functions.
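
For reference (not listed on the slide), the six functions usually cited as this core set are MPI_INIT, MPI_FINALIZE, MPI_COMM_SIZE, MPI_COMM_RANK, MPI_SEND, and MPI_RECV.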

Page 76: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

PE 0 calls MPI_SEND to pass the real variable x to PE 1. PE 1 calls MPI_RECV to receive the real variable y from PE 0.

      if (myid.eq.0) then
         call MPI_SEND(x,1,MPI_REAL,1,100,MPI_COMM_WORLD,ierr)
      endif

      if (myid.eq.1) then
         call MPI_RECV(y,1,MPI_REAL,0,100,MPI_COMM_WORLD,status,ierr)
      endif

MPI Examples - Send and Receive

MPI messages are two-way: they require a send and a matching receive:

Page 77: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPI Example - Global Operations

PE 6 collects the single (1) integer value n from all other processors and puts the sum (MPI_SUM) into allsum

call MPI_REDUCE(n,allsum,1,MPI_INTEGER,MPI_SUM,6, MPI_COMM_WORLD,ierr)

MPI also has global operations to broadcast and reduce (collect) information

PE 5 broadcasts the single (1) integer value n to all other processors

call MPI_BCAST(n,1,MPI_INTEGER,5, MPI_COMM_WORLD,ierr)

Page 78: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPI Implementations

• MPI is typically implemented on top of the highest performance native message passing library for every distributed memory machine.

• MPI is a natural model for distributed memory machines (MPPs, clusters)

• MPI offers higher performance on DSMs beyond the size of an individual SMP

• MPI is useful between SMPs that are clustered

• MPI can be implemented on shared memory machines

Page 79: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Extensions to MPI: MPI-2

• A standard for MPI-2 has been developed which extends the functionality of MPI. New features include:– One sided communications - eliminates the need

to post matching sends and receives. Similar in functionality to the shmem PUT and GET on the CRAY T3E (most systems have analogous library)

– Support for parallel I/O– Extended collective operations– No full implementation yet - it is difficult for

vendors

Page 80: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPI vs. OpenMP

• There is no single best approach to writing a parallel code. Each has pros and cons:– MPI - powerful, general, and universally available

message passing library which provides very fine control over communications, but forces the programmer to operate at a relatively low level of abstraction.

– OpenMP - conceptually simple approach for creating parallel codes on a shared memory machines, but not applicable to distributed memory platforms.

Page 81: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPI vs. OpenMP

• MPI is the most general (problem types) and portable (platforms, although not efficient for SMPs)

• The architecture and the problem type often make the decision for you.

Page 82: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel Libraries

• Finally, there are parallel mathematics libraries that enable users to write (serial) codes, then call parallel solver routines:– ScaLAPACK is for solving dense linear systems of

equations, eigenvalue and least squares problems. Also see PLAPACK.

– PETSc is for solving linear and non-linear partial differential equations (includes various iterative solvers for sparse matrices).

– Many others: check NETLIB for complete survey:http://www.netlib.org

Page 83: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Hurdles in Parallel Computing

There are some hurdles in parallel computing:
– Scalar performance: Fast parallel codes require efficient use of the underlying scalar hardware
– Parallel algorithms: Not all scalar algorithms parallelize well; may need to rethink the problem
• Communications: Need to minimize the time spent doing communications
• Load balancing: All processors should do roughly the same amount of work
– Amdahl's Law: Fundamental limit on parallel computing

Page 84: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Scalar Performance

• Underlying every good parallel code is a good scalar code.

• If a code scales to 256 processors but only gets 1% of peak performance, it is still a bad parallel code.– Good news: Everything that you know about serial

computing will be useful in parallel computing!– Bad news: It is difficult to get good performance

out of the processors and memory used in parallel machines. Need to use cache effectively.

Page 85: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Chart (log-log): run time vs. number of processors for the serial code and the parallel code]

In this case, the parallel code achieves perfect scaling, but does not match the performance of the serial code until 32 processors are used

Serial Performance

Page 86: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Diagram: a simplified memory hierarchy - the CPU, a small and fast cache, and a big but slow main memory]

The data cache was designed with two key concepts in mind:

Spatial locality - cache is loaded an entire line (4-32 words) at a time to take advantage of the fact that if a location in memory is required, nearby locations will probably also be required

Temporal locality - once a word is loaded into cache it remains there until the cache line is needed to hold another word of data.

Use Cache Effectively

Page 87: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Non-Cache Issues

• There are other issues to consider to achieve good serial performance:– Strength reduction, e.g., replacing divisions

with multiplications-by-inverse– Evaluate and replace common sub-expressions– Pushing loops inside subroutines to minimize

subroutine call overhead– Force function inlining (compiler option)– Perform interprocedural analysis to eliminate

redundant operations (compiler option)

Page 88: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel Algorithms

• The algorithm must be naturally parallel!– Certain serial algorithms do not parallelize well.

Developing a new parallel algorithm to replace a serial algorithm can be one of the most difficult tasks in parallel computing.

– Keep in mind that your parallel algorithm may involve additional work or a higher floating point operation count.

Page 89: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Parallel Algorithms

– Keep in mind that the algorithm should• need the minimum amount of communication (Monte

Carlo algorithms are excellent examples)• balance the load among the processors equally

– Fortunately, a lot of research has been done in parallel algorithms, particularly in the area of linear algebra. Don’t reinvent the wheel, take full advantage of the work done by others:

• use parallel libraries supplied by the vendor whenever possible!

• use ScaLAPACK, PETSc, etc. when applicable

Page 90: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Timeline diagrams: busy vs. idle time for PE 0 and PE 1 between synchronization points]

The figures show the timeline for parallel codes run on two processors. In both cases, the total amount of work done is the same, but in the second case the work is distributed more evenly between the two processors, resulting in a shorter time to solution.

Load Balancing

Page 91: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Communications

• Two key parameters of the communications network are– Latency: time required to initiate a message. This

is the critical parameter in fine grained codes, which require frequent interprocessor communications. Can be thought of as the time required to send a message of zero length.

– Bandwidth: steady-state rate at which data can be sent over the network. This is the critical parameter in coarse grained codes, which require infrequent communication of large amounts of data.
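
These two parameters are often combined into a simple cost model (an approximation added here for clarity, not on the original slide):

t(message) ≈ latency + (message size) / bandwidth

For example, with 10 microseconds of latency and 100 MB/s of bandwidth, an 8-byte message costs about 10 microseconds (latency dominated), while an 8 MB message costs about 80 milliseconds (bandwidth dominated).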

Page 92: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Latency and Bandwidth Example

• Bucket brigade: the old style of fighting fires in which the townspeople formed a line from the well to the fire and passed buckets of water down the line– latency - the delay until the first bucket arrives

at the fire– bandwidth - the rate at which buckets arrive at the

fire

Page 93: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Sequential: t = t(comp) + t(comm)
Overlapped: t = t(comp) + t(comm) - t(comp ∩ comm)

More on Communications

• Time spent performing communications is considered overhead. Try minimize the impact of communications:– minimize the effect of latency by combining large

numbers of small messages into small numbers of large messages.

– communications and computation do not have to be done sequentially, can often overlap communication and computations

Page 94: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The following examples of “phoning home” illustrate the value of combining many small messages into a single larger one.

Many small messages:
• dial, “Hi mom”, hang up
• dial, “How are things?”, hang up
• dial, “in the U.S.?”, hang up
• dial... At this point many mothers would not pick up the next call.

One large message:
• dial, “Hi mom. How are things in the U.S.? Yak, yak...”, hang up

By transmitting a single large message, I only have to pay the price for the dialing latency once. I transmit more information in less time.

Combining Small Messages into Larger Ones
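
A sketch of the same idea in Fortran/MPI (illustrative; assumes MPI is initialized and that myid and status are set up as in the earlier send/receive example): pack three values into one buffer so the latency cost is paid once.

      real buf(3)
c     one message instead of three: the latency cost is paid once
      if (myid .eq. 0) then
         buf(1) = x
         buf(2) = y
         buf(3) = z
         call MPI_SEND(buf,3,MPI_REAL,1,200,MPI_COMM_WORLD,ierr)
      endif
      if (myid .eq. 1) then
         call MPI_RECV(buf,3,MPI_REAL,0,200,MPI_COMM_WORLD,status,ierr)
      endif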

Page 95: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

In the following example, a stencil operation is performed on a 10 x 10 array that has been distributed over two processors. Assume periodic boundary conditions.

[Figure: the 10 x 10 array split between PE0 and PE1; boundary elements require data from the neighboring processor, interior elements do not]

Stencil operation: y(i,j) = x(i+1,j) + x(i-1,j) + x(i,j+1) + x(i,j-1)

• Initiate communications
• Perform computations on interior elements
• Wait till communications are finished
• Perform computations on boundary elements

Overlapping Communications and Computations
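
A sketch of these four steps using MPI's non-blocking calls (illustrative; assumes MPI is initialized, left and right hold neighbor ranks, bsend and brecv are boundary buffers of length n, and update_interior/update_boundary are hypothetical routines standing in for the stencil computation; only one direction of the halo exchange is shown):

      integer req(2), stats(MPI_STATUS_SIZE,2)
c     1. initiate communications (non-blocking send/receive)
      call MPI_IRECV(brecv,n,MPI_REAL,left,10,MPI_COMM_WORLD,req(1),ierr)
      call MPI_ISEND(bsend,n,MPI_REAL,right,10,MPI_COMM_WORLD,req(2),ierr)
c     2. perform computations on interior elements
      call update_interior(x,y)
c     3. wait till communications are finished
      call MPI_WAITALL(2,req,stats,ierr)
c     4. perform computations on boundary elements
      call update_boundary(x,y,brecv)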

Page 96: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Amdahl’s Law places a strict limit on the speedup that can be realized by using multiple processors. Two equivalent expressions for Amdahl’s Law are given below:

tN = (fp/N + fs)t1 Effect of multiple processors on run time

S = 1/(fs + fp/N) Effect of multiple processors on speedup

where: fs = serial fraction of code, fp = parallel fraction of code = 1 - fs,

N = number of processors

Amdahl’s Law
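
A quick worked example (added for illustration): with fs = 0.01 (fp = 0.99) and N = 256 processors, S = 1/(0.01 + 0.99/256) ≈ 72, far below the ideal speedup of 256.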

Page 97: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

[Chart: speedup vs. number of processors (up to 250) for fp = 1.000, 0.999, 0.990, and 0.900]

It takes only a small fraction of serial content in a code to degrade the parallel performance. It is essential to determine the scaling behavior of your code before doing production runs using large numbers of processors

Illustration of Amdahl’s Law

Page 98: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Amdahl’s Law provides a theoretical upper limit on parallel speedup assuming that there are no costs for communications. In reality, communications (and I/O) will result in a further degradation of performance.

[Chart: speedup vs. number of processors for fp = 0.99, comparing the Amdahl's Law prediction with reality]

Amdahl’s Law Vs. Reality

Page 99: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

More on Amdahl’s Law

• Amdahl’s Law can be generalized to any two processes with different speeds

• Ex.: Apply to f(processor) and f(memory):– The growing processor-memory performance gap

will undermine our efforts at achieving maximum possible speedup!

Page 100: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Generalized Amdahl’s Law

• Amdahl’s Law can be further generalized to handle an arbitrary number of processes of various speeds. (The fractions representing the processes must still sum to 1.)

• This is a weighted Harmonic mean. Application performance is limited by performance of the slowest component as much as it is determined by the fastest.

Ravg = 1 / [ Σ(i = 1 to N) fi / Ri ]

Page 101: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Gustafson’s Law

• Thus, Amdahl’s Law predicts that there is a maximum scalability for an application, determined by its parallel fraction, and this limit is generally not large.

• There is a way around this: increase the problem size– bigger problems mean bigger grids or more

particles: bigger arrays– number of serial operations generally remains

constant; number of parallel operations increases: parallel fraction increases
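
This observation is often quoted as Gustafson's scaled speedup (not written out on the original slide): S(scaled) = fs + N*fp = N - fs*(N - 1), which grows nearly linearly with N when the problem size grows with the machine. For fs = 0.01 and N = 256, S(scaled) ≈ 253.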

Page 102: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The 1st Question to Ask Yourself Before You Parallelize Your Code

• Is it worth my time? – Do the CPU requirements justify parallelization?– Do I need a parallel machine in order to get

enough aggregate memory?– Will the code be used just once or will it be a major

production code?

• Your time is valuable, and it can be very time consuming to write, debug, and test a parallel code. The more time you spend writing a parallel code, the less time you have to spend doing your research.

Page 103: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The 2nd Question to Ask Yourself Before You Parallelize Your Code

• How should I decompose my problem?
– Do the computations consist of a large number of small, independent problems (trajectories, parameter space studies, etc.)? If so, you may want to consider a scheme in which each processor runs the calculation for a different set of data (see the sketch below).
– Does each computation have large memory or CPU requirements? If so, you will probably have to break up a single problem across multiple processors.
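A minimal sketch of the first scheme (each processor handling its own independent cases) using MPI; run_case() and the cyclic assignment of cases are illustrative assumptions, not part of the original material:

/* Task-farm style decomposition: each MPI process independently handles
 * its own subset of the input cases (trajectories, parameter sets, ...). */
#include <stdio.h>
#include <mpi.h>

#define NCASES 1000

static void run_case(int id) { (void)id; /* user computation for one case */ }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* cyclic assignment: process 'rank' handles cases rank, rank+size, ... */
    for (int id = rank; id < NCASES; id += size)
        run_case(id);

    MPI_Finalize();
    return 0;
}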

Page 104: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Distributing the Data

• Decision on how to distribute the data should consider these issues:
– Load balancing: often implies an equal distribution of data, but more generally means an equal distribution of work
– Communications: want to minimize the impact of communications, taking into account both size and number of messages
– Physics: choice of distribution will depend on the processes that are being modeled in each direction

Page 105: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

A Data Distribution Example

[Figure: two alternative distributions of a 2D grid across processors]

A good distribution if the physics of the problem is the same in both directions. Minimizes the amount of data that must be communicated between processors.

If expensive global operations need to be carried out in the x-direction (e.g., FFTs), this is probably a better choice.
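A small sketch, assuming MPI, of how the two distributions above might be expressed as process grids; the use of MPI_Dims_create here is illustrative, not prescribed by the slides:

/* Let MPI suggest a balanced 2D process grid, and contrast it with a 1D
 * distribution that keeps each x-line whole (useful when global operations
 * such as FFTs run in the x-direction). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims2d[2] = {0, 0};          /* let MPI choose a balanced 2D grid */
    MPI_Dims_create(size, 2, dims2d);

    int dims1d[2] = {1, size};       /* 1D: do not split the x-direction  */

    if (rank == 0) {
        printf("2D distribution: %d x %d processes\n", dims2d[0], dims2d[1]);
        printf("1D distribution: %d x %d processes\n", dims1d[0], dims1d[1]);
    }

    MPI_Finalize();
    return 0;
}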

Page 106: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

A More Difficult Example

[Figure: a 2D grid with a shaded object covering part of the domain]

Imagine that we are doing a simulation in which more work is required for the grid points covering the shaded object.

Neither data distribution from the previous example will result in good load balancing.

May need to consider an irregular grid or a different data structure.

Page 107: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Choosing a Resource

• The following factors should be taken into account when choosing a resource:
– What is the granularity of my code?
– Are there any special hardware features that I need or can take advantage of?
– How many processors will the code be run on?
– What are my memory requirements?

• By carefully considering these points, you can make the right choice of computational platform.

Page 108: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Granularity is a measure of the amount of work done by each processor between synchronization events.

[Figure: execution timelines for PE 0 and PE 1 in a low-granularity application (frequent synchronization, little work between sync points) and a high-granularity application (long stretches of computation between sync points)]

Generally, latency is the critical parameter for low-granularity codes, while processor performance is the key factor for high-granularity applications.

Choosing a Resource: Granularity
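One way to gauge granularity in practice, sketched below under the assumption of an MPI code, is to time the work done between synchronization events against the synchronization itself; compute_step() is a hypothetical stand-in for the application's real work:

/* Estimate the work-to-synchronization time ratio over many timesteps. */
#include <stdio.h>
#include <mpi.h>

static void compute_step(void) { /* application work between sync points */ }

int main(int argc, char **argv)
{
    int rank;
    double t_work = 0.0, t_sync = 0.0, local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 100; step++) {
        double t0 = MPI_Wtime();
        compute_step();
        local += 1.0;                         /* stand-in partial result */
        double t1 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);        /* the synchronization event */
        double t2 = MPI_Wtime();
        t_work += t1 - t0;
        t_sync += t2 - t1;
    }

    if (rank == 0)
        printf("work/sync time ratio: %.1f\n", t_work / t_sync);

    MPI_Finalize();
    return 0;
}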

Page 109: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Choosing a Resource: Special Hardware Features

• Various HPC platforms have different hardware features that your code may be able to take advantage of. Examples include:
– Hardware support for divide and square root operations (IBM SP)
– Parallel I/O file system (IBM SP)
– Data streams (CRAY T3E)
– Control over cache alignment (CRAY T3E)
– E-registers for bypassing the cache hierarchy (CRAY T3E)

Page 110: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Importance of Parallel Computing

• High performance computing has become almost synonymous with parallel computing.

• Parallel computing is necessary to solve big problems (high resolution, lots of timesteps, etc.) in science and engineering.

• Developing and maintaining efficient, scalable parallel applications is difficult. However, the payoff can be tremendous.

Page 111: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Importance of Parallel Computing

• Before jumping in, think about
– whether or not your code truly needs to be parallelized
– how to decompose your problem

• Then choose a programming model based on your problem and your available architecture.

• Take advantage of the resources that are available - compilers, libraries, debuggers, performance analyzers, etc. - to help you write efficient parallel code.

Page 112: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Useful References

• Hennessy, J. L. and Patterson, D. A. Computer Architecture: A Quantitative Approach.

• Patterson, D.A. and Hennessy, J.L., Computer Organization and Design: The Hardware/Software Interface.

• K. Dowd, High Performance Computing.

• D. Kuck, High Performance Computing. Oxford U. Press (New York) 1996.

• D. Culler and J. P. Singh, Parallel Computer Architecture.

Page 113: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Outline

• Preface

• What is High Performance Computing?

• Parallel Computing

• Distributed Computing, Grid Computing, and More

• Future Trends in HPC

Page 114: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Distributed Computing

• Concept has been used for two decades

• Basic idea: run a scheduler across systems to run processes on the least-used systems first
– Maximize utilization
– Minimize turnaround time

• Have to load executables and input files onto the selected resource
– Shared file system
– File transfers upon resource selection

Page 115: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Examples of Distributed Computing

• Workstation farms, Condor flocks, etc.
– Generally share a file system

• SETI@home, Entropia, etc.
– Only one source code; a central server copies the correct binary code and input data to each system

• Napster, Gnutella: file/data sharing

• NetSolve
– Runs a numerical kernel on any of multiple independent systems, much like a Grid solution

Page 116: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

SETI@home: Global Distributed Computing

• Running on 500,000 PCs, ~1000 CPU years per day
– 485,821 CPU years so far

• Sophisticated Data & Signal Processing Analysis

• Distributes Datasets from Arecibo Radio Telescope

Page 117: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Distributed vs. Parallel Computing

• Different
– Distributed computing executes independent (but possibly related) applications on different systems; jobs do not communicate with each other

– Parallel computing executes a single application across processors, distributing the work and/or data but allowing communication between processes

• Non-exclusive: can distribute parallel applications to parallel computing systems

Page 118: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Computing

• Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, or trust relationships.

• Resources (HPC systems, visualization systems & displays, storage systems, sensors, instruments, people) are integrated via ‘middleware’ to facilitate use of all resources.

Page 119: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Why Grids?

• Resources have different functions, but multiple classes of resources are necessary for most interesting problems.

• Power of any single resource is small compared to aggregations of resources

• Network connectivity is increasing rapidly in bandwidth and availability

• Large problems require teamwork and computation

Page 120: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Network Bandwidth Growth

• Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years

• 1986 to 2000
– Computers: x 500
– Networks: x 340,000

• 2001 to 2010
– Computers: x 60
– Networks: x 4000

[Figure: Moore’s Law vs. storage improvements vs. optical (network) improvements. Graph from Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.]

Page 121: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Possibilities

• A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour

• 1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data

• Civil engineers collaborate to design, execute, & analyze shake table experiments

• Climate scientists visualize, annotate, & analyze terabyte simulation datasets

• An emergency response team couples real time data, weather model, population data

Page 122: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Some Grid Usage Models

• Distributed computing: job scheduling on Grid resources with secure, automated data transfer

• Workflow: synchronized scheduling and automated data transfer from one system to the next in a pipeline (e.g. HPC system to visualization lab to storage system)

• Coupled codes, with pieces running on different systems simultaneously

• Meta-applications: parallel apps spanning multiple systems

Page 123: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Usage Models

• Some models are similar to models already being used, but are much simpler due to:
– single sign-on
– automatic process scheduling
– automated data transfers

• But Grids can encompass new resources like sensors and instruments, so new usage models will arise

Page 124: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Selected Major Grid Projects

• Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies

• BlueGrid (IBM): Grid testbed linking IBM laboratories

• DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories

• DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities

• Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community

• European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics

Page 125: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Selected Major Grid Projects

• EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus

• Fusion Collaboratory (fusiongrid.org; DOE Office of Science): Create a national computational collaboratory for fusion research

• Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): Research on Grid technologies; development and support of the Globus Toolkit; application and deployment

• GridLab (gridlab.org; European Union): Grid technologies and applications

• GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research

• Grid Research Integration Dev. & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education

Page 126: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Selected Major Grid Projects

• Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications

• Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS

• Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions

• International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create an international Data Grid to enable large-scale experimentation on Grid technologies & applications

• Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering

• Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments

Page 127: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Selected Major Grid Projects

• TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s

• UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.

• Unicore (BMBFT): Technologies for remote access to supercomputers

There are also many technology R&D projects: e.g., Globus, Condor, NetSolve, Ninf, NWS, etc.

Page 128: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Example Application Projects

• Earth Systems Grid: environment (US DOE)

• EU DataGrid: physics, environment, etc. (EU)

• EuroGrid: various (EU)

• Fusion Collaboratory (US DOE)

• GridLab: astrophysics, etc. (EU)

• Grid Physics Network (US NSF)

• MetaNEOS: numerical optimization (US NSF)

• NEESgrid: civil engineering (US NSF)

• Particle Physics Data Grid (US DOE)

Page 129: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Some Grid Requirements – Systems/Deployment Perspective

• Identity & authentication

• Authorization & policy

• Resource discovery

• Resource characterization

• Resource allocation

• (Co-)reservation, workflow

• Distributed algorithms

• Remote data access

• High-speed data transfer

• Performance guarantees

• Monitoring

• Adaptation

• Intrusion detection

• Resource management

• Accounting & payment

• Fault management

• System evolution

• Etc.

Page 130: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Some Grid Requirements –User Perspective

• Single allocation (or none needed)

• Single sign-on: authentication to any Grid resources authenticates for all others

• Single compute space: one scheduler for all Grid resources

• Single data space: can address files and data from any Grid resources

• Single development environment: Grid tools and libraries that work on all grid resources

Page 131: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The Systems Challenges:Resource Sharing Mechanisms That…

• Address security and policy concerns of resource owners and users

• Are flexible enough to deal with many resource types and sharing modalities

• Scale to large numbers of resources, many participants, many program components

• Operate efficiently when dealing with large amounts of data & computation

Page 132: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The Security Problem

• Resources being used may be extremely valuable & the problems being solved extremely sensitive

• Resources are often located in distinct administrative domains
– Each resource may have its own policies & procedures

• The set of resources used by a single computation may be large, dynamic, and/or unpredictable
– Not just client/server

• The security solution must be broadly available & applicable
– Standard, well-tested, well-understood protocols
– Integration with a wide variety of tools

Page 133: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The Resource Management Problem

• Enabling secure, controlled remote access to computational resources and management of remote computation
– Authentication and authorization
– Resource discovery & characterization
– Reservation and allocation
– Computation monitoring and control

Page 134: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Systems Technologies

• Systems and security problems are addressed by new protocols & services. E.g., Globus:
– Grid Security Infrastructure (GSI) for security
– Globus Metadata Directory Service (MDS) for discovery
– Globus Resource Allocation Manager (GRAM) protocol as a basic building block
• Resource brokering & co-allocation services
– GridFTP for data movement

Page 135: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The Programming Problem

• How does a user develop robust, secure, long-lived applications for dynamic, heterogeneous Grids?

• Presumably need:
– Abstractions and models to add to the speed/robustness/etc. of development
– Tools to ease application development and diagnose common problems
– Code/tool sharing to allow reuse of code components developed by others

Page 136: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Programming Technologies

• “Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …)
– Seems unlikely there is one solution

• Most applications have been written “from scratch,” with or without Grid services

• Application-specific libraries have been shown to provide significant benefits

• No new language, programming model, etc., has yet emerged that transforms things
– But certainly still quite possible

Page 137: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Examples of GridProgramming Technologies

• MPICH-G2: Grid-enabled message passing

• CoG Kits, GridPort: Portal construction, based on N-tier architectures

• GDMP, Data Grid Tools, SRB: replica management, collection management

• Condor-G: simple workflow management

• Legion: object models for Grid computing

• Cactus: Grid-aware numerical solver framework
– Note tremendous variety, application focus

Page 138: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

MPICH-G2: A Grid-Enabled MPI

• A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
– Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)

• Globus services for authentication, resource allocation, executable staging, output, etc.

• Programs run in wide area without change!

• See also: MetaMPI, PACX, STAMPI, MAGPIE

www.globus.org/mpi
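For reference, a generic MPI program of the kind MPICH-G2 can run unchanged across sites might look like the sketch below; the Grid-specific work happens at launch time (through Globus services for authentication, resource allocation, and staging), not in the source:

/* An ordinary MPI program: no Grid-specific calls are needed. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    printf("process %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}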

Page 139: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Events

• Global Grid Forum: working meeting
– Meets 3 times/year, alternates U.S.-Europe, with the July meeting as the major event

• HPDC: major academic conference
– HPDC-11 in Scotland with GGF-8, July 2002

• Other meetings include
– IPDPS, CCGrid, EuroGlobus, Globus Retreats

www.gridforum.org, www.hpdc.org

Page 140: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Useful References

• Book (Morgan Kaufmann)
– www.mkp.com/grids

• Perspective on Grids
– “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, IJSA, 2001
– www.globus.org/research/papers/anatomy.pdf

• All URLs in this section of the presentation, especially:
– www.gridforum.org, www.grids-center.org, www.globus.org

Page 141: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Outline

• Preface

• What is High Performance Computing?

• Parallel Computing

• Distributed Computing, Grid Computing, and More

• Future Trends in HPC

Page 142: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Value of Understanding Future Trends

• Monitoring and understanding future trends in HPC is important:
– users: applications should be written to be efficient on current and future architectures
– developers: tools should be written to be efficient on current and future architectures
– computing centers: system purchases are expensive and should have upgrade paths

Page 143: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

The Next Decade

• 1980s and 1990s:
– academic and government requirements strongly influenced parallel computing architectures
– academic influence was greatest in developing parallel computing software (for science & eng.)
– commercial influence grew steadily in late 1990s

• In the next decade:
– commercialization will become dominant in determining the architecture of systems
– academic/research innovations will continue to drive the development of HPC software

Page 144: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Commercialization

• Computing technologies (including HPC) are now propelled by profits, not sustained by subsidies
– Web servers, databases, transaction processing, and especially multimedia applications drive the need for computational performance.

– Most HPC systems are ‘scaled up’ commercial systems, with relatively little additional hardware and software compared to the commercial versions.

– It’s not engineering, it’s economics.

Page 145: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Processors and Nodes

• Easy predictions:
– microprocessor performance increases continue at ~60% per year (Moore’s Law) for 5+ years
– total migration to 64-bit microprocessors
– use of even more cache, more memory hierarchy
– increased emphasis on SMPs

• Tougher predictions:
– resurgence of vectors in microprocessors? Maybe
– dawn of multithreading in microprocessors? Yes

Page 146: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Building Fat Nodes: SMPs

• More processors are faster, of course
– SMPs are the simplest form of parallel systems
– efficient if not limited by memory bus contention: small numbers of processors

• Commercial market for high performance servers at low cost drives need for SMPs

• HPC market for highest performance, ease of programming drives development of SMPs

Page 147: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Building Fat Nodes: SMPs

• Trends are to:
– build bigger SMPs
– attempt to share memory across SMPs (cc-NUMA)

Page 148: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Resurgence of Vectors

• Vectors keep functional units busy
– vector registers are very fast
– vectors are more efficient for loops of any stride
– vectors are great for many science & eng. apps

• Possible resurgence of vectors
– SGI/Cray has built the SV1ex and is building the SV2
– NEC continues building (CMOS) parallel-vector, Cray-like systems
– Microprocessors (Pentium 4, G4) have added vector-like functionality for multimedia purposes

Page 149: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Dawn of Multithreading?

• Memory speed will always be a bottleneck

• Must overlap computation with memory accesses: tolerate latency
– requires an immense amount of parallelism
– requires processors with multiple streams and compilers that can define multiple threads

Page 150: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Multithreading Diagram

Page 151: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Multithreading

• Tera MTA was the first multithreaded HPC system
– scientific success, production failure
– MTA-2 will be delivered in a few months

• Multithreading will be implemented (in more limited fashion) in commercial processors.

Page 152: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Networks

• Commercial network bandwidth and latency approaching custom performance.

• Dramatic performance increases likely
– “the network is the computer” (Sun slogan)
– more companies, more competition
– no severe physical, economic limits

• Implications of faster networks
– more clusters
– collaborative, visual supercomputing
– Grid computing

Page 153: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Commodity Clusters

• Clusters provide some real advantages:
– computing power: leverage workstations and PCs
– high availability: replace one at a time
– inexpensive: leverage existing competitive market
– simple path to installing a parallel computing system

• Major disadvantages were robustness of hardware and software, but both have improved

• NCSA has huge clusters in production based on Pentium III and Itanium.

Page 154: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Clustering SMPs

• Inevitable (already here!):
– leverages SMP nodes effectively for the same reasons clusters leverage individual processors
– Commercial markets drive the need for SMPs

• Combine advantages of SMPs, clusters
– more powerful nodes through multiprocessing
– more powerful nodes -> more powerful cluster
– Interconnect scalability requirements reduced for the same number of processors

Page 155: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Continued Linux Growth in HPC

• Linux popularity growing due to price and availability of source code

• Major players now supporting Linux, esp. IBM

• Head start on Intel Itanium

Page 156: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Programming Tools

• However, programming tools will continue to lag behind hardware and OS capabilities:
– Researchers will continue to drive the need for the most powerful tools to create the most efficient applications on the largest systems

– Such technologies will look more like MPI than the Web… maybe worse due to multi-tiered clusters of SMPs (MPI + OpenMP; Active messages + threads?).

– Academia will continue to play a large role in HPC software development.
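A minimal sketch of the MPI + OpenMP style mentioned above for multi-tiered clusters of SMPs, assuming one MPI process per SMP node with OpenMP threads inside it; the loop body is arbitrary and used only for illustration:

/* Hybrid model: OpenMP threads within a node, MPI between nodes. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OpenMP parallel loop inside each MPI process */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);

    /* MPI reduction across processes (nodes) */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}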

Page 157: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid Computing

• Parallelism will continue to grow in the form of
– SMPs
– clusters
– clusters of SMPs (and maybe DSMs)

• Grids provide the next level
– connect multiple computers into virtual systems
– Already here:
• IBM, other vendors supporting Globus
• SC2001 dominated by Grid technologies
• Many major government awards (>$100M in past year)

Page 158: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Emergence of Grids

• But Grids enable much more than apps running on multiple computers (which can be achieved with MPI alone)
– virtual operating system: provides a global workspace/address space via a single login
– automatically manages files, data, accounts, and security issues
– connects other resources (archival data facilities, instruments, devices) and people (collaborative environments)

Page 159: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grids Are Inevitable

• Inevitable (at least in HPC):
– leverages the computational power of all available systems
– manages resources as a single system: easier for users
– provides the most flexible resource selection and management, load sharing
– researchers’ desire to solve bigger problems will always outpace performance increases of single systems; just as multiple processors are needed, ‘multiple multiprocessors’ will be deemed so

Page 160: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid-Enabled Software

• Commercial applications on single parallel systems and Grids will require that:
– underlying architectures must be invisible: no parallel computing expertise required
– usage must be simple
– development must not be too difficult

• Developments in ease-of-use will benefit scientists as users (not as developers)

• Web-based interfaces: transparent supercomputing (MPIRE, Meta-MEME, etc.).

Page 161: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid-Enabled Collaborative andVisual Supercomputing

• Commercial world demands:
– multimedia applications
– real-time data processing
– online transaction processing
– rapid prototyping and simulation in engineering, chemistry and biology
– interactive, remote collaboration
– 3D graphics, animation and virtual reality visualization

Page 162: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Grid-enabled Collaborative, Visual Supercomputing

• Academic world will leverage resulting Grids linking computing and visualization systems via high-speed networks:
– collaborative post-processing of data already here
– simulations will be visualized in 3D, virtual worlds in real-time
– such simulations can then be ‘steered’
– multiple scientists can participate in these visual simulations
– the ‘time to insight’ (SGI slogan) will be reduced

Page 163: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Web-based Grid Computing

• Web currently used mostly for content delivery

• Web servers on HPC systems can execute applications

• Web servers on Grids can launch applications, move/store/retrieve data, display visualizations, etc.

• NPACI HotPage already enables single sign-on to NPACI Grid Resources

Page 164: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Summary of Expectations

• HPC systems will grow in performance but probably change little in design (5-10 years):
– HPC systems will be larger versions of smaller commercial systems, mostly large SMPs and clusters of inexpensive nodes

– Some processors will exploit vectors, as well as more/larger caches.

– Best HPC systems will have been designed ‘top-down’ instead of ‘bottom-up’, but all will have been designed to make the ‘bottom’ profitable.

– Multithreading is the only likely, near-term major architectural change.

Page 165: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Summary of Expectations

• Using HPC systems will change much more:
– Grid computing will become widespread in HPC and in commercial computing
– Visual supercomputing and collaborative simulation will be commonplace.
– WWW interfaces to HPC resources will make transparent supercomputing commonplace.

• But programming the most powerful resources most effectively will remain difficult.

Page 166: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Caution

• Change is difficult to predict (and I am an astrophysicist, not an astrologer):
– The accuracy of linear extrapolation predictions degrades over long times (like weather forecasts)
– Entirely new ideas can change everything:

• WWW is an excellent example; Grid computing is probably the next

• Eventually, something truly different will replace CMOS technology (nanotechnology? molecular computing? DNA computing?)

Page 167: Introduction to High Performance Computing: Parallel Computing, Distributed Computing, Grid Computing and More Dr. Jay Boisseau Director, Texas Advanced

Introduction to High Performance Computing

Final Prediction

“The thing about change is that things will be different afterwards.”

Alan McMahon (Cornell University)