CSE 8383 - Advanced Computer Architecture Week-12 April 8, 2004 engr.smu.edu/~rewini/8383

Page 1: CSE 8383 - Advanced Computer Architecture

CSE 8383 - Advanced Computer Architecture

Week-12, April 8, 2004

engr.smu.edu/~rewini/8383

Page 2: CSE 8383 - Advanced Computer Architecture

Contents

  • Dynamic Networks
  • Message Passing Mechanisms
  • Message Passing in PVM

Page 3: CSE 8383 - Advanced Computer Architecture

Dynamic Network Analysis

Parameters:

  • Cost: number of switches
  • Delay: latency
  • Blocking characteristics
  • Fault tolerance

Page 4: CSE 8383 - Advanced Computer Architecture

MIMD Distributed Memory Systems

[Figure: an interconnection network connecting four memory modules (M) and four processors (P)]

Page 5: CSE 8383 - Advanced Computer Architecture

Interconnection Network Taxonomy

Interconnection Network

  • Static
      • 1-D
      • 2-D
      • HC (hypercube)
  • Dynamic
      • Bus-based
          • Single
          • Multiple
      • Switch-based
          • SS (single-stage)
          • MS (multistage)
          • Crossbar

Page 6: CSE 8383 - Advanced Computer Architecture

Dynamic Interconnection Networks

  • Communication patterns are based on program demands
  • Connections are established on the fly during program execution
  • Multistage Interconnection Network (MIN) and Crossbar

Page 7: CSE 8383 - Advanced Computer Architecture

Switch Modules

  • A x B switch module: A inputs and B outputs
  • In practice, A = B = power of 2
  • Each input is connected to one or more outputs (conflicts must be avoided)
  • One-to-one (permutation) and one-to-many are allowed

Page 8: CSE 8383 - Advanced Computer Architecture

Binary Switch

2x2 Switch

Legitimate States = 4

Permutation Connections = 2

Page 9: CSE 8383 - Advanced Computer Architecture

Legitimate Connections

  • Straight
  • Exchange
  • Upper-broadcast
  • Lower-broadcast

The different settings of the 2x2 switching element (SE)
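The counts on the Binary Switch slide can be checked by brute force. The following is a minimal Python sketch, assuming that a "legitimate state" means every output is driven by exactly one input (an input may drive several outputs, but no output has two drivers). The function names are illustrative and do not come from the lecture.

```python
from itertools import product

def switch_states(n_inputs, n_outputs):
    """Return every legitimate state; state[j] is the input driving output j."""
    return list(product(range(n_inputs), repeat=n_outputs))

def is_permutation(state):
    """A state is one-to-one when no input drives more than one output."""
    return len(set(state)) == len(state)

states = switch_states(2, 2)
perms = [s for s in states if is_permutation(s)]
# (0, 1) is straight, (1, 0) is exchange,
# (0, 0) is upper-broadcast, (1, 1) is lower-broadcast.
print(len(states), len(perms))  # 4 2
```

Under this assumption the same two functions can be called with other sizes to explore larger switch modules.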

Page 10: CSE 8383 - Advanced Computer Architecture

Group Work

General Case ??

Page 11: CSE 8383 - Advanced Computer Architecture

Multistage Interconnection Networks

[Figure: stages of switches joined by inter-stage connection patterns ISC1, ISC2, ..., ISCn]

ISC: Inter-stage Connection Patterns

Page 12: CSE 8383 - Advanced Computer Architecture

Perfect-Shuffle Routing Function

Given x = {a_n, a_{n-1}, ..., a_2, a_1}

P(x) = {a_{n-1}, ..., a_2, a_1, a_n}

Example: x = 110001, P(x) = 100011

Page 13: CSE 8383 - Advanced Computer Architecture

Perfect Shuffle Example

x     P(x)
000   000
001   010
010   100
011   110
100   001
101   011
110   101
111   111

Page 14: CSE 8383 - Advanced Computer Architecture

Perfect-Shuffle

[Figure: inputs 000 to 111 wired to outputs 000 to 111 according to the perfect shuffle]
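The perfect shuffle is a one-bit left rotation of the address, which can be sketched in a few lines of Python. The function name and the choice to pass the address width n explicitly are assumptions made for this illustration.

```python
def perfect_shuffle(x, n):
    """Rotate the n-bit address x left by one position: a_n moves to the a_1 slot."""
    msb = (x >> (n - 1)) & 1           # a_n, the most significant bit
    mask = (1 << n) - 1                # keep the result within n bits
    return ((x << 1) & mask) | msb

# Reproduce the 3-bit example table.
for x in range(8):
    print(format(x, "03b"), format(perfect_shuffle(x, 3), "03b"))
```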

Page 15: CSE 8383 - Advanced Computer Architecture

Exchange Routing Function

Given x = {a_n, a_{n-1}, ..., a_2, a_1}

E_i(x) = {a_n, a_{n-1}, ..., a_i', ..., a_2, a_1}, where a_i' is the complement of a_i

Example: x = 0000000, E_3(x) = 0000100

Page 16: CSE 8383 - Advanced Computer Architecture

Exchange E1

x     E_1(x)
000   001
001   000
010   011
011   010
100   101
101   100
110   111
111   110

Page 17: CSE 8383 - Advanced Computer Architecture

Exchange E1

[Figure: inputs 000 to 111 wired to outputs 000 to 111 according to the exchange function E1]
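The exchange function complements a single address bit, which is one XOR in code. A minimal Python sketch follows; the function name is illustrative, and bit positions are counted from 1 at the least significant end to match a_1.

```python
def exchange(x, i):
    """Complement bit a_i of address x, where a_1 is the least significant bit."""
    return x ^ (1 << (i - 1))

# Reproduce the E1 table for 3-bit addresses.
for x in range(8):
    print(format(x, "03b"), format(exchange(x, 1), "03b"))
```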

Page 18: CSE 8383 - Advanced Computer Architecture

Butterfly Routing Function

Given x = {a_n, a_{n-1}, ..., a_2, a_1}

B(x) = {a_1, a_{n-1}, ..., a_2, a_n}

Example: x = 010001, B(x) = 110000

Page 19: CSE 8383 - Advanced Computer Architecture

Butterfly Example

x     B(x)
000   000
001   100
010   010
011   110
100   001
101   101
110   011
111   111

Page 20: CSE 8383 - Advanced Computer Architecture

Butterfly

[Figure: inputs 000 to 111 wired to outputs 000 to 111 according to the butterfly function]
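The butterfly function swaps the most and least significant address bits and leaves the middle bits alone. A minimal Python sketch, with an illustrative function name and an explicit address width n:

```python
def butterfly(x, n):
    """Swap bits a_n (most significant) and a_1 (least significant) of the n-bit address x."""
    msb = (x >> (n - 1)) & 1
    lsb = x & 1
    if msb != lsb:
        # Swapping two different bits is the same as flipping both of them.
        x ^= (1 << (n - 1)) | 1
    return x

# Reproduce the 3-bit example table.
for x in range(8):
    print(format(x, "03b"), format(butterfly(x, 3), "03b"))
```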

Page 21: CSE 8383 - Advanced Computer Architecture

Multi-stage network

[Figure: a multistage network with inputs 000 to 111 on one side and outputs 000 to 111 on the other]

Page 22: CSE 8383 - Advanced Computer Architecture

MIN (cont.)

[Figure: an 8x8 Banyan network with switches numbered 1 to 12 and with inputs and outputs labeled 000 to 111]

Page 23: CSE 8383 - Advanced Computer Architecture

MIN Implementation

[Figure: a network block with Source (S), Destination (D), and Control (X)]

X = f(S,D)

Page 24: CSE 8383 - Advanced Computer Architecture

Example

[Figure: a 2x2 switch with ports A, B, C, D shown in two settings: X = 0 (crossed) and X = 1 (straight)]

Page 25: CSE 8383 - Advanced Computer Architecture

Consider this MIN

[Figure: a three-stage MIN (stage 1, stage 2, stage 3) with sources S1 to S8 and destinations D1 to D8]

Page 26: CSE 8383 - Advanced Computer Architecture

Example (Cont.)

  • Let the control variables be X1, X2, X3
  • Find the values of X1, X2, X3 to connect:
      • S1 → D6
      • S7 → D5
      • S4 → D1

Page 27: CSE 8383 - Advanced Computer Architecture

The 3 connections

[Figure: the three-stage MIN with sources S1 to S8 and destinations D1 to D8, showing the three connections]

Page 28: CSE 8383 - Advanced Computer Architecture

Boolean Functions

  • X = x1, x2, x3
  • S = s1, s2, s3
  • D = d1, d2, d3

Find X = f(S,D)
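The wiring of the MIN in this example survives only as a figure, so the function f for that particular network cannot be derived here. As an illustration of what such a function looks like, the Python sketch below assumes an omega network (a perfect shuffle in front of each stage of 2x2 switches), which may differ from the network drawn on the slide. It uses the convention from the earlier example, X = 1 for straight and X = 0 for crossed, and all names are illustrative.

```python
def perfect_shuffle(x, n):
    """Rotate the n-bit address x left by one position."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb

def omega_controls(s, d, n):
    """Route source s to destination d through an assumed n-stage omega network.

    Returns (controls, final_position); controls[k] is the setting of the
    switch used in stage k+1 (1 = straight, 0 = crossed).
    """
    controls = []
    pos = s
    for stage in range(n):
        pos = perfect_shuffle(pos, n)            # inter-stage wiring
        want = (d >> (n - 1 - stage)) & 1        # destination bit, MSB first
        have = pos & 1                           # 0 = upper input, 1 = lower input
        controls.append(1 if have == want else 0)
        pos = (pos & ~1) | want                  # leave on the upper or lower output
    return controls, pos

controls, reached = omega_controls(0b000, 0b101, 3)
print(controls, format(reached, "03b"))
```

For this assumed network each control bit works out to the complement of s_i XOR d_i, taking the bits most significant first, which is one concrete instance of X = f(S,D).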

Page 29: CSE 8383 - Advanced Computer Architecture

Crossbar Switch

[Figure: an 8x8 crossbar connecting processors P1 to P8 to memory modules M1 to M8]

Page 30: CSE 8383 - Advanced Computer Architecture

Analysis and performance metrics: dynamic networks

Performance comparison of dynamic networks

Network        Delay      Cost         Blocking   Degree of FT
Bus            O(N)       O(1)         Yes        0
Multiple-bus   O(mN)      O(m)         Yes        (m-1)
MIN            O(log N)   O(N log N)   Yes        0
Crossbar       O(1)       O(N^2)       No         0
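To make the cost and delay columns concrete, the Python sketch below counts switching elements for a crossbar and for a MIN. It assumes the MIN is built from 2x2 switches with N/2 switches in each of log2(N) stages, and that crossbar cost is counted in cross points; the table itself only gives the asymptotic orders. Function names are illustrative.

```python
def log2_exact(n):
    """Return log2(n) for n a power of two (hypothetical helper)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1

def crossbar_cost(n):
    """Cross points in an n x n crossbar: O(N^2)."""
    return n * n

def min_cost(n):
    """2x2 switches in an n x n MIN: N/2 per stage, log2(N) stages, O(N log N)."""
    return (n // 2) * log2_exact(n)

def min_delay(n):
    """Stages a message crosses in the MIN: O(log N)."""
    return log2_exact(n)

for n in (8, 64, 1024):
    print(n, crossbar_cost(n), min_cost(n), min_delay(n))
```

For N = 8 this gives 12 switches, which matches the twelve numbered switches in the 8x8 Banyan network shown earlier.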

Page 31: CSE 8383 - Advanced Computer Architecture

Message Passing Mechanisms

Message Format

  • Message: arbitrary number of fixed-length packets
  • Packet: basic unit containing the destination address; a sequence number is needed
  • A packet can be further divided into flits (flow control digits)
  • Routing and sequence information occupy the header flit

Page 32: CSE 8383 - Advanced Computer Architecture

Message, Packets, Flits

[Figure: a message divided into packets, and a packet divided into data flits plus destination and sequence fields]

Page 33: CSE 8383 - Advanced Computer Architecture

Store and Forward Routing

  • Packets are the basic units of information flow
  • Each node uses a packet buffer
  • A packet is transferred from S to D through a sequence of intermediate nodes
  • Channel and buffer must be available

Page 34: CSE 8383 - Advanced Computer Architecture

Wormhole Routing

  • Flits are the basic units of information flow
  • Each node uses a flit buffer
  • Flits are transferred from S to D through a sequence of intermediate routers in order (pipeline)
  • Can be visualized as a railroad train
  • Flits from different packets cannot be mixed up

Page 35: CSE 8383 - Advanced Computer Architecture

Latency Analysis

  • L: packet length (in bits)
  • W: channel bandwidth (bits/sec)
  • D: distance (number of hops)
  • F: flit length (in bits)

T_SF = D * L / W

T_WH = L / W + D * F / W ≈ L / W if L >> F (independent of D)
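The two latency formulas can be compared numerically. The Python sketch below evaluates them exactly as written above; the packet size, bandwidth, hop count, and flit size in the example are made-up values chosen only for illustration.

```python
def store_and_forward_latency(L, W, D):
    """T_SF = D * L / W: the whole packet is retransmitted on every hop."""
    return D * L / W

def wormhole_latency(L, W, D, F):
    """T_WH = L / W + D * F / W: only the flit time is paid per hop."""
    return L / W + D * F / W

# Assumed example: 1024-bit packet, 1024 bits/sec channel, 8-bit flits.
for hops in (1, 4, 16):
    t_sf = store_and_forward_latency(1024, 1024, hops)
    t_wh = wormhole_latency(1024, 1024, hops, 8)
    print(hops, t_sf, t_wh)
```

With L much larger than F, the wormhole figure stays close to L / W as the hop count grows, while the store-and-forward figure grows linearly with D.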

Page 36: CSE 8383 - Advanced Computer Architecture

Communication Patterns

  • Point to point: 1 - 1
  • Multicast: 1 - n
  • Broadcast: 1 - all
  • Conference: n - n

Page 37: CSE 8383 - Advanced Computer Architecture

Routing Efficiency

Two parameters:

  • Channel traffic (number of channels used to deliver the message involved)
  • Communication latency (distance)
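As a sketch of how the two parameters can be computed, the Python code below handles the simplest case: a multicast implemented as separate unicasts on a 2-D mesh. It assumes each unicast takes a shortest path, so it uses as many channels as the Manhattan distance between the nodes, and it takes latency to be the longest of those distances. The node coordinates in the example are hypothetical and are not the ones in the slide figures.

```python
def hops(src, dst):
    """Shortest-path distance between two mesh nodes given as (row, column)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def multicast_by_unicasts(src, dests):
    """Return (channel_traffic, latency) when each destination gets its own unicast."""
    distances = [hops(src, d) for d in dests]
    return sum(distances), max(distances)

# Hypothetical source and destinations on a mesh.
traffic, latency = multicast_by_unicasts((0, 0), [(1, 0), (0, 2), (2, 2)])
print(traffic, latency)  # 7 4
```

A multicast pattern that shares channels between destinations can lower the traffic figure, sometimes at the price of a longer path to some destination.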

Page 38: CSE 8383 - Advanced Computer Architecture

Multicast on a mesh (5 unicasts)

Traffic ?

Latency ?

Page 39: CSE 8383 - Advanced Computer Architecture

Multicast on a mesh (multicast pattern 1)

Traffic ?

Latency ?

Page 40: CSE 8383 - Advanced Computer Architecture

Multicast on a mesh (multicast pattern 2)

Traffic ?

Latency ?

Page 41: CSE 8383 - Advanced Computer Architecture

Broadcast (tree structure)

[Figure: a mesh whose nodes are labeled with numbers along the broadcast tree]

3 2 3 4
2 1 2 3
1 1 2

Page 42: CSE 8383 - Advanced Computer Architecture

Message Passing in PVM

[Figure: a sending task and a receiving task, each layered as user application, library, and daemon; points 1 to 4 are marked on the sending side and points 5 to 8 on the receiving side]

Page 43: CSE 8383 - Advanced Computer Architecture

Standard PVM asynchronous communication

  • A sending task issues a send command (point 1)
  • The message is transferred to the daemon (point 2)
  • Control is returned to the user application (points 3 & 4)
  • The daemon will transmit the message on the physical wire sometime after returning control to the user application (point 3)

Page 44: CSE 8383 - Advanced Computer Architecture

Standard PVM asynchronous communication (cont.)

  • The receiving task issues a receive command (point 5) at some other time
  • In the case of a blocking receive, the receiving task blocks on the daemon waiting for a message (point 6). After the message arrives, control is returned to the user application (points 7 & 8)
  • In the case of a non-blocking receive, control is returned to the user application immediately (points 7 & 8)

Page 45: CSE 8383 - Advanced Computer Architecture

Send (3 steps)

  1. A send buffer must be initialized
  2. The message is packed into the buffer
  3. The completed message is sent to its destination(s)

Page 46: CSE 8383 - Advanced Computer Architecture

Receive (2 steps)

  1. The message is received
  2. The received items are unpacked

Page 47: CSE 8383 - Advanced Computer Architecture

Message Buffers

Buffer Creation (before packing)

    bufid = pvm_initsend(encoding_option)
    bufid = pvm_mkbuf(encoding_option)

Encoding option   Meaning
0                 XDR
1                 No encoding
2                 Leave data in place

Page 48: CSE 8383 - Advanced Computer Architecture

Message Buffers (cont.)

Data Packing: pvm_pk*()

  • pvm_pkstr(): one argument

      pvm_pkstr("This is my data");

  • Others: three arguments
      1. Pointer to the first item
      2. Number of items to be packed
      3. Stride

      pvm_pkint(my_array, n, 1);

  • Packing functions can be called multiple times to pack data into a single message
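The meaning of the count and stride arguments is easy to get wrong, so here is a small Python model of them. This is not PVM code: pack_items is a hypothetical helper that only mimics how a call such as pvm_pkint(my_array, n, stride) selects n items spaced stride apart, and a plain list stands in for the message buffer.

```python
def pack_items(buffer, items, count, stride):
    """Append `count` items to `buffer`, taking every `stride`-th element of `items`."""
    for k in range(count):
        buffer.append(items[k * stride])
    return buffer

message = []                                     # stands in for the send buffer
my_array = [10, 11, 12, 13, 14, 15]

pack_items(message, my_array, 6, 1)              # stride 1: all six items
pack_items(message, my_array, 3, 2)              # stride 2: every other item
pack_items(message, ["This is my data"], 1, 1)   # a second kind of data in the same message

print(message)
```

Each call appends to the same buffer, which mirrors the point above that several packing calls build up one message. The receiver then has to unpack in the same order.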

Page 49: CSE 8383 - Advanced Computer Architecture

Sending a message

  • Point to point (one receiver)

      info = pvm_send(tid, tag)

  • Broadcast (multiple receivers)

      info = pvm_mcast(tids, n, tag)
      info = pvm_bcast(group_name, tag)

  • Pack and send (one step)

      info = pvm_psend(tid, tag, my_array, length, data_type)

Page 50: CSE 8383 - Advanced Computer Architecture

Receiving a message

  • Blocking

      bufid = pvm_recv(tid, tag)
      -1 is a wild card in either tid or tag

  • Nonblocking

      bufid = pvm_nrecv(tid, tag)
      bufid = 0 (no message was received)

  • Timeout

      bufid = pvm_trecv(tid, tag, timeout)
      bufid = 0 (no message was received)
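The difference between the three receive flavors can be tried out locally with a thread-safe queue. The Python sketch below is an analogy only and does not call PVM: the functions recv, nrecv, and trecv are hypothetical stand-ins, a queue.Queue plays the role of the incoming message store, and a return value of 0 mirrors the "bufid = 0" convention for "no message was received".

```python
import queue

def recv(mailbox):
    """Blocking receive: wait until a message is available."""
    return mailbox.get()

def nrecv(mailbox):
    """Non-blocking receive: return at once, 0 if nothing has arrived."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return 0

def trecv(mailbox, timeout_s):
    """Timed receive: wait at most timeout_s seconds, 0 if the time expires."""
    try:
        return mailbox.get(timeout=timeout_s)
    except queue.Empty:
        return 0

mailbox = queue.Queue()
print(nrecv(mailbox))            # nothing has been sent yet
print(trecv(mailbox, 0.05))      # waits briefly, then gives up
mailbox.put("hello")
print(recv(mailbox))             # message already there, returns immediately
```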

Page 51: CSE 8383 - Advanced Computer Architecture

Different Receive in PVM

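[Figure: timelines for the three receive calls, each starting when the function is called]

  • pvm_recv() (blocking): wait until message arrival, then resume execution
  • pvm_nrecv() (non-blocking): continue execution
  • pvm_trecv() (timeout): wait until message arrival or until the time is expired, then resume execution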

Page 52: CSE 8383 - Advanced Computer Architecture

Data unpacking: pvm_upk*()

  • pvm_upkstr(): one argument

      pvm_upkstr(string);

  • Others: three arguments
      1. Pointer to the first item
      2. Number of items to be unpacked
      3. Stride

      pvm_upkint(my_array, n, 1);