33
Introduction to Parallel Computing George Karypis Parallel Programming Platforms

Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

  • Upload
    others

  • View
    29

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Introduction to Parallel Computing

George KarypisParallel Programming Platforms

Page 2: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Elements of a Parallel ComputerHardware

Multiple ProcessorsMultiple MemoriesInterconnection Network

System SoftwareParallel Operating SystemProgramming Constructs to Express/Orchestrate Concurrency

Application SoftwareParallel Algorithms

Goal: Utilize the Hardware, System, & Application Software to either

Achieve Speedup: Tp = Ts/pSolve problems requiring a large amount of memory.

Page 3: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Parallel Computing PlatformLogical Organization

The user’s view of the machine as it is being presented via its system software

Physical OrganizationThe actual hardware architecture

Physical Architecture is to a large extent independent of the Logical Architecture

Page 4: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Logical Organization ElementsControl Mechanism

SISD/SIMD/MIMD/MISDSingle/Multiple Instruction Stream & Single/Multiple Data Stream

SPMD: Single Program Multiple Data

Page 5: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Logical Organization Elements

Communication ModelShared-Address Space

UMA/NUMA/ccNUMA

Message-Passing

Page 6: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Physical OrganizationIdeal Parallel Computer Architecture

PRAM: Parallel Random Access MachinePRAM Models

EREW/ERCW/CREW/CRCWExclusive/Concurrent Read and/or Write

Concurrent Writes are resolved viaCommon/Arbitrary/Priority/Sum

Page 7: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Physical OrganizationInterconnection Networks (ICNs)

Provide processor-to-processor and processor-to-memory connectionsNetworks are classified as:

DynamicThe network consists of switching elements that the various processors attach to

indirect networkHistorically used to link processors-to-memory

shared-memory systems

StaticConsist of a number of point-to-point links

direct networkHistorically used to link processors-to-processors

distributed-memory system

Page 8: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Static & Dynamic ICNs

Page 9: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Evaluation Metrics for ICNsDiameter

The maximum distance between any two nodesSmaller the better.

ConnectivityThe minimum number of arcs that must be removed to break it into two disconnected networks

Larger the betterMeasures the multiplicity of paths

Bisection widthThe minimum number of arcs that must be removed to partition the network into two equal halves.

Larger the betterBisection bandwidth

Applies to networks with weighted arcs—weights correspond to the link width (how much data it can transfer)The minimum volume of communication allowed between any two halves of a network

Larger the betterCost

The number of links in the networkSmaller the better

Page 10: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Metrics and Dynamic Networks

Page 11: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesBus-Based Networks

Shared mediumInformation is being broadcastedEvaluation:

Diameter: O(1)Connectivity: O(1)Bisection width: O(1)Cost: O(p)

Page 12: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesCrossbar Networks

Switch-based networkSupports simultaneous connectionsEvaluation:

Diameter: O(1)Connectivity: O(1)?Bisection width: O(p)?Cost: O(p2)

Page 13: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesMultistage Interconnection Networks

Page 14: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Multistage Switch Architecture

Pass-through

Cross-over

Page 15: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Connecting the Various Stages

Page 16: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Blocking in a Multistage SwitchRouting is done by comparing the bit-levelrepresentation of source and destination addresses.-match goes via pass-through-mismatch goes via cross-over

Page 17: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesComplete and star-connected networks.

Page 18: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesCartesian Topologies

Page 19: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesHypercubes

Page 20: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Network TopologiesTrees

Page 21: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Summary of Performance Metrics

Page 22: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Physical OrganizationCache Coherence in Shared Memory Systems

A certain level of consistency must be maintained for multiple copies of the same dataRequired to ensure proper semantics and correct program execution

serializabilityTwo general protocols for dealing with it

invalidate & update

Page 23: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Invalidate/Update Protocols

Page 24: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Invalidate/Update ProtocolsThe preferred scheme depends on the characteristics of the underlying application

frequency of reads/writes to shared variablesClassical trade-off between communication overhead (updates) and idling (stalling in invalidates)Additional problems with false sharingExisting schemes are based on the invalidate protocol

A number of approaches have been developed for maintaining the state/ownership of the shared data

Page 25: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Communication Costs in Parallel Systems

Message-Passing SystemsThe communication cost of a data-transfer operation depends on:

start-up time: tsadd headers/trailer, error-correction, execute the routing algorithm, establish the connection between source & destination

per-hop time: thtime to travel between two directly connected nodes.

node latencyper-word transfer time: tw

1/channel-width

Page 26: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Store-and-Forward & Cut-Through Routing

Page 27: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Cut-through Routing Deadlocks

Messages 0, 1, 2, and 3need to go to nodes A, B,C, and D, respectively

Page 28: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Communication Model Used for this Class

We will assume that the cost of sending a message of size m is:

In general true because ts is much larger than th and for most of the algorithms that we will study mtw is much larger than lth

Page 29: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Routing MechanismsRouting:

The algorithm used to determine the path that a message will take to go from the source to destination

Can be classified along different dimensions

minimal vs non-minimaldeterministic vs adaptive

Page 30: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Dimension Ordered RoutingThere is a predefined ordering of the dimensionsMessages are routed along the dimensions in that order until they cannot move any further

X-Y routing for meshesE-cube routine for hypercubes

Page 31: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Topology EmbeddingsMapping between networks

Useful in the early days of parallel computing when topology specific algorithms were being developed.

Embedding quality metricsdilation

maximum number of lines an edge is mapped tocongestion

maximum number of edges mapped on a single link

Page 32: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Mapping a Cartesian Topology onto a Hypercube

Cool things ☺

Page 33: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of

Mapping a Cartesian Topology onto a Hypercube