CSE539: Advanced Computer Architecture Chapter 7 Multiprocessors and Multicomputers Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani Sumit Mittu Assistant Professor, CSE/IT Lovely Professional University [email protected]

In this chapter…

• Multiprocessor System Interconnects

• Cache Coherence and Synchronization Mechanisms

• Three Generations of Multi-computers

• Message Routing Schemes

MULTIPROCESSOR SYSTEM INTERCONNECTS

• Network Characteristics

o Topology

• Dynamic Networks

o Timing control protocol

• Synchronous (with global clock)

• Asynchronous (with handshake or interlocking mechanism)

o Switching method

• Circuit switching

• Packet switching

o Control Strategy

• Centralized (global controller to receive requests from all devices and grant network access)

• Distributed (requests handled by local devices independently)

• Hierarchical Bus System

o Local Bus (board level)

• Memory bus, data bus

o Backplane Bus (backplane level)

• VME bus (IEEE 1014-1987), Multibus II (IEEE 1296-1987), Futurebus+ (IEEE 896.1-1991)

o I/O Bus (I/O level)

o E.g. the Encore Multimax multiprocessor’s Nanobus

• 20 slots

• 32-bit address path

• 64-bit data path

• Clock rate: 12.5 MHz

• Total Memory bandwidth: 100 Megabytes per second
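The quoted bandwidth follows from the data-path width and the clock rate. The sketch below assumes one full 64-bit transfer per bus cycle, which the slide does not state explicitly; the function name is hypothetical.

```python
def peak_bus_bandwidth_mb_per_s(data_path_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in megabytes per second.

    Assumption: exactly one full-width transfer completes per clock cycle.
    """
    bytes_per_transfer = data_path_bits / 8
    # (bytes per cycle) x (millions of cycles per second) = megabytes per second
    return bytes_per_transfer * clock_mhz


# Nanobus figures from the list above: 64-bit data path, 12.5 MHz clock.
nanobus_bandwidth = peak_bus_bandwidth_mb_per_s(data_path_bits=64, clock_mhz=12.5)
print(nanobus_bandwidth)  # 100.0
```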

• Hierarchical Buses and Caches

o Cache Levels

• First level caches

• Second level caches

o Buses

• (Intra) Cluster Bus

• Inter-cluster bus

o Cache coherence

• Snoopy cache protocol for coherence among first level caches of same cluster

• Inter-cluster cache coherence is controlled among second level caches, and the results are passed down to the first level caches

o Use of Bridges between multiprocessor clusters

• Crossbar Switch Design

o Based on number of network stages

• Single stage (or recirculating) networks

• Multistage networks

o Blocking networks

o Non-blocking (rearrangeable) networks

• Crossbar networks

o n × m and n² cross-point switch design

o Crossbar benefits and limitations

• Multiport Memory Design

o Multiport Memory

CACHE COHERENCE MECHANISMS

• Cache Coherence Problem

o Inconsistent copies of the same memory block in different caches

o Sources of inconsistency:

• Sharing of writable data

• Process migration

• I/O activity

• Protocol Approaches

o Snoopy Bus Protocol

o Directory Based Protocol

• Write Policies

o (Write-back, Write-through) × (Write-invalidate, Write-update)

• Snoopy Bus Protocols

o Write-through caches

• Write-invalidate coherence protocol for write-through caches

• Write-update coherence protocol for write-through caches

• Data item states:

o VALID

o INVALID

• Possible operations:

o Read by the same processor: R(i); read by a different processor: R(j)

o Write by the same processor: W(i); write by a different processor: W(j)

o Replace by the same processor: Z(i); replace by a different processor: Z(j)

• Snoopy Bus Protocols

o Write-through caches – write-invalidate scheme

Current State    Operation    New State
Valid            R(i)         Valid
Valid            W(i)         Valid
Valid            Z(i)         Invalid
Valid            R(j)         Valid
Valid            W(j)         Invalid
Valid            Z(j)         Valid
Invalid          R(i)         Valid
Invalid          W(i)         Valid
Invalid          Z(i)         Invalid
Invalid          R(j)         Invalid
Invalid          W(j)         Invalid
Invalid          Z(j)         Invalid
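The transition table above can be encoded directly as a lookup table. This is a minimal sketch for one block copy in cache i; the names `WT_INVALIDATE` and `final_state` are hypothetical, and bus signalling is not modelled.

```python
# State of one block copy in cache i under the write-through,
# write-invalidate scheme. "i" is the local processor, "j" any remote one.
WT_INVALIDATE = {
    ("Valid", "R(i)"): "Valid",
    ("Valid", "W(i)"): "Valid",
    ("Valid", "Z(i)"): "Invalid",
    ("Valid", "R(j)"): "Valid",
    ("Valid", "W(j)"): "Invalid",
    ("Valid", "Z(j)"): "Valid",
    ("Invalid", "R(i)"): "Valid",
    ("Invalid", "W(i)"): "Valid",
    ("Invalid", "Z(i)"): "Invalid",
    ("Invalid", "R(j)"): "Invalid",
    ("Invalid", "W(j)"): "Invalid",
    ("Invalid", "Z(j)"): "Invalid",
}


def final_state(state, operations):
    """Apply a sequence of operations and return the resulting state."""
    for op in operations:
        state = WT_INVALIDATE[(state, op)]
    return state


# Local read loads the block, a remote write invalidates it,
# and a local write makes it valid again.
print(final_state("Invalid", ["R(i)", "W(j)", "W(i)"]))  # Valid
```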

• Snoopy Bus Protocols

o Write-back caches

• Ownership protocol: write-invalidate coherence protocol for write-back caches

• Data item states:

o RO : Read Only (Valid state)

o RW : Read Write (Valid state)

o INV : Invalid state

• Possible operations:

o Read by the same processor: R(i); read by a different processor: R(j)

o Write by the same processor: W(i); write by a different processor: W(j)

o Replace by the same processor: Z(i); replace by a different processor: Z(j)

• Snoopy Bus Protocols

o Write-back caches – write-invalidate (ownership protocol) scheme

Current State    Operation    New State
RO (Valid)       R(i)         RO
RO (Valid)       W(i)         RW
RO (Valid)       Z(i)         INV
RO (Valid)       R(j)         RO
RO (Valid)       W(j)         INV
RO (Valid)       Z(j)         RO
RW (Valid)       R(i)         RW
RW (Valid)       W(i)         RW
RW (Valid)       Z(i)         INV
RW (Valid)       R(j)         RO
RW (Valid)       W(j)         INV
RW (Valid)       Z(j)         RW
INV (Invalid)    R(i)         RO
INV (Invalid)    W(i)         RW
INV (Invalid)    Z(i)         INV
INV (Invalid)    R(j)         INV
INV (Invalid)    W(j)         INV
INV (Invalid)    Z(j)         INV
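To follow how ownership moves, the three-state table above can be traced step by step. A minimal sketch, with hypothetical names `OWNERSHIP` and `trace`; it tracks only the state of the copy in cache i and does not model the bus or main memory.

```python
# Next state of the copy in cache i, indexed by current state and operation.
OWNERSHIP = {
    "RO":  {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RO"},
    "RW":  {"R(i)": "RW", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RW"},
    "INV": {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "INV", "W(j)": "INV", "Z(j)": "INV"},
}


def trace(state, operations):
    """Return the list of states visited, starting with the initial state."""
    visited = [state]
    for op in operations:
        state = OWNERSHIP[state][op]
        visited.append(state)
    return visited


# Local read, local write (gain ownership), remote read (downgrade),
# remote write (lose the copy).
print(trace("INV", ["R(i)", "W(i)", "R(j)", "W(j)"]))
# ['INV', 'RO', 'RW', 'RO', 'INV']
```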

• Snoopy Bus Protocols

o Write-once Protocol

• First write using write-through policy

• Subsequent writes using write-back policy

• In both cases, data item copy in remote caches is invalidated

• Data item states:

o Valid: cache block is consistent with the main memory copy

o Reserved: data has been written exactly once and is consistent with the main memory copy

o Dirty: data has been written more than once and is not consistent with the main memory copy

o Invalid: block is not found in the cache or is inconsistent with the main memory copy

• Cache events and actions:

o Read-miss

o Read-hit

o Write-miss

o Write-hit

o Block replacement
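The slide lists the states and events but not the transitions. The sketch below fills them in from the usual textbook description of the write-once protocol, so treat the transition rules as an assumption rather than as content of the slide; the function names are hypothetical, and data movement (write-through, write-back, cache-to-cache supply) is only noted in comments.

```python
def local_event(state: str, event: str) -> str:
    """Next state of the local copy after an event from the local processor."""
    if event == "read":
        # Read-hit: no change. Read-miss: a clean copy is loaded.
        return "Valid" if state == "Invalid" else state
    if event == "write":
        if state == "Valid":
            return "Reserved"  # first write: written through to memory
        # Write-miss (Invalid), or a later write (Reserved/Dirty):
        # the block is only updated locally (write-back policy).
        return "Dirty"
    if event == "replace":
        return "Invalid"       # a Dirty block is written back first
    raise ValueError(f"unknown event: {event}")


def snooped_event(state: str, event: str) -> str:
    """Next state of the local copy after a bus event from a remote cache."""
    if event == "read":
        # A Dirty block is supplied and memory updated; copies end up Valid.
        return "Valid" if state in ("Reserved", "Dirty") else state
    if event in ("write", "read-invalidate"):
        return "Invalid"       # remote writes invalidate all other copies
    raise ValueError(f"unknown event: {event}")


state = "Invalid"
for event in ["read", "write", "write"]:
    state = local_event(state, event)
print(state)  # Dirty
```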

• Multilevel Cache Coherence

• Protocol Performance Issues

o Snoopy Cache Protocol Performance determinants:

• Workload Patterns

• Implementation Efficiency

o Goals/Motivation behind using snooping mechanism

• Reduce bus traffic

• Reduce effective memory access time

o Data Pollution Point

• The miss ratio decreases as block size increases, up to a data pollution point: as blocks become larger, the probability of finding a desired data item in the cache increases.

• Beyond the data pollution point, the miss ratio starts to increase as the block size increases further.

o Ping-Pong effect on data shared between multiple caches

• If two processes update a data item alternately, the data will continually migrate between the two caches, resulting in a high miss rate
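A toy count makes the ping-pong effect concrete. This sketch assumes a write-invalidate policy and a single shared block, so at most one cache holds a valid copy at any time; the function name and the write sequences are hypothetical.

```python
def count_write_misses(writers):
    """Count misses on one shared block under write-invalidate.

    writers: cache ids issuing writes to the block, in program order.
    """
    holder = None  # cache that currently holds the only valid copy
    misses = 0
    for cache_id in writers:
        if cache_id != holder:
            misses += 1        # copy was invalidated (or never loaded)
            holder = cache_id  # this write invalidates the other copy
    return misses


alternating = [0, 1] * 4       # two caches take turns: 8 writes
batched = [0] * 4 + [1] * 4    # same 8 writes, grouped per cache

print(count_write_misses(alternating))  # 8
print(count_write_misses(batched))      # 2
```

With alternating updates every write misses, while grouping the same writes per cache leaves only the two unavoidable misses.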

THREE GENERATIONS OF MULTICOMPUTERS

• Multicomputer vs. Multiprocessor

• Design Choices for Multi-computers

o Processors

• Low cost commodity (off-the-shelf) processors

o Memory Structure

• Distributed memory organization

• Local memory with each processor

o Interconnection Schemes

• Message-passing, point-to-point, direct networks with send/receive semantics, with or without uniform message communication speed

o Control Strategy

• Asynchronous MIMD, MPMD and SPMD operations

• The Past, Present and Future Development

o First Generation

• Example Systems: Caltech’s Cosmic Cube, Intel iPSC/1, Ametek S/14, nCube/10

o Second Generation

• Example Systems: iPSC/2, i860, Delta, nCube/2, Supernode 1000, Ametek Series 2010

o Third Generation

• Example Systems: Caltech’s Mosaic C, J-Machine, Intel Paragon

o First and second generation multi-computers are regarded as medium-grain systems.

o Third generation multi-computers are regarded as fine-grain systems.

o A fine-grain, shared-memory approach can, in theory, combine the relative merits of multiprocessors and multi-computers in a heterogeneous processing environment.

Attribute                               1st Generation   2nd Generation   3rd Generation
Typical node attributes
  MIPS                                  1                10               100
  MFLOPS (scalar)                       0.1              2                40
  MFLOPS (vector)                       10               40               200
  Memory size (MB)                      0.5              4                32
Typical system attributes
  Number of nodes (N)                   64               256              1024
  MIPS                                  64               2560             100 K
  MFLOPS (scalar)                       6.4              512              40 K
  MFLOPS (vector)                       640              10 K             200 K
  Memory size (MB)                      32               1 K              32 K
Communication latency (microseconds)
  Local neighbour                       2000             5                0.5
  Non-local node                        6000             5                0.5
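The system rows in the table appear to be the node values multiplied by the number of nodes N, with "K" read as 1024. The sketch below checks that reading; both the multiplication rule and the meaning of K are assumptions inferred from the numbers, and the dictionary keys are hypothetical names.

```python
K = 1024  # assumption: "K" in the table stands for 1024

# Typical node attributes per generation, copied from the table.
NODE = {
    "1st": {"mips": 1, "mflops_scalar": 0.1, "mflops_vector": 10, "memory_mb": 0.5},
    "2nd": {"mips": 10, "mflops_scalar": 2, "mflops_vector": 40, "memory_mb": 4},
    "3rd": {"mips": 100, "mflops_scalar": 40, "mflops_vector": 200, "memory_mb": 32},
}
NODES_PER_SYSTEM = {"1st": 64, "2nd": 256, "3rd": 1024}


def system_attributes(generation):
    """Scale every node attribute by the number of nodes in the system."""
    n = NODES_PER_SYSTEM[generation]
    return {name: value * n for name, value in NODE[generation].items()}


for generation in NODE:
    print(generation, system_attributes(generation))
```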

MESSAGE PASSING SCHEMES

• Message Routing Schemes

• Message Formats

o Messages

o Packets

o Flits (flow control digits)

• Data Only Flits

• Sequence Number

• Routing Information

• Store-and-forward routing

• Wormhole routing

• Asynchronous Pipelining

• Latency Analysis

o L: Packet length (in bits)

o W: Channel Bandwidth (in bits per second)

o D: Distance (number of nodes traversed minus 1)

o F: Flit length (in bits)

o Communication Latency in Store-and-forward Routing

• $T_{SF} = \frac{L}{W}(D + 1)$

o Communication Latency in Wormhole Routing

• $T_{WH} = \frac{L}{W} + \frac{F}{W} D$
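The two latency formulas can be compared numerically. This is a minimal sketch using the symbols defined above; the function names and the sample values (4096-bit packet, 32-bit flit, 2^20 bits per second, distance 3) are illustrative assumptions, not figures from the slides.

```python
def store_and_forward_latency(L: float, W: float, D: int) -> float:
    """T_SF = (L / W) * (D + 1): the whole packet is retransmitted at every hop."""
    return (L / W) * (D + 1)


def wormhole_latency(L: float, W: float, D: int, F: float) -> float:
    """T_WH = L / W + (F / W) * D: only the header flit pays the per-hop delay."""
    return L / W + (F / W) * D


# Hypothetical values chosen for illustration only.
L, F, W, D = 4096, 32, 2**20, 3

t_sf = store_and_forward_latency(L, W, D)
t_wh = wormhole_latency(L, W, D, F)
print(f"store-and-forward: {t_sf * 1e3:.3f} ms")
print(f"wormhole:          {t_wh * 1e3:.3f} ms")
```

When L is much larger than F, the wormhole latency is dominated by L / W and is nearly independent of the distance D, whereas the store-and-forward latency grows in proportion to D + 1.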