24
Interconnect & Communication 6.004 – Fall 2002 11/14/0 L20 – Communication 1

Interconnect & Communication

Embed Size (px)

DESCRIPTION

Interconnect & Communication. 6.004 – Fall 2002 11/14/0L20 – Communication 1. Computer Technologies What’s the most important part of this picture?. Raid systems. Graphics Acceleration. 6.004 – Fall 2002 11/14/0L20 – Communication 2. Ancient Times (Ad hoc connections). - PowerPoint PPT Presentation

Citation preview

Page 1: Interconnect & Communication

Interconnect & Communication

6.004 – Fall 2002 11/14/0 L20 – Communication 1

Page 2: Interconnect & Communication

Computer TechnologiesWhat’s the most important part of this picture?

6.004 – Fall 2002 11/14/0 L20 – Communication 2

Raid systems

Graphics

Acceleration

Page 3: Interconnect & Communication

System Interfaces & Modularity

6.004 – Fall 2002 11/14/0 L20 – Communication 3

Ancient Times (Ad hoc connections)

Late 60s (Processor-dependent Bus)

80s (Processor-independent Bus)

Today Buses Galore

Back-side bus

Front-side bus (PCI and EISA)

“AGP” bus

Page 4: Interconnect & Communication

Interface Standard: Backplane Bus

6.004 – Fall 2002 11/14/0 L20 – Communication 4

The backplane provides:

Power

Common system clock

Wires for communication

Modular cards that plug into a common backplane:

CPUs

Memories

Bulk storage

I/O devices

S/W?

BUS LINES

Printed Circuit Cards MODULE LOGIC

address

data

operationstart

finishclock

Page 5: Interconnect & Communication

Dawn of the Dumb Bus: ISA & EISA

6.004 – Fall 2002 11/14/0 L20 – Communication 5

ISA bus (Original IBM PC bus) - Pin out and timing is nearly identicalto the 8088 spec.

Original primitive approach -- Just take the control signals and data bus from the CPU module, buffer it, and call it a bus.

Page 6: Interconnect & Communication

Smarter “Processor Independent” Buses

6.004 – Fall 2002 11/14/0 L20 – Communication 6

TERMINOLOGY –

BUS MASTER – a module that initiates a bus transaction. (CPU, disk controller, etc.)

BUS SLAVE – a module that responds to a bus request. (Memory, I/O device, etc.)

BUS CYCLE – The period from whena transaction isrequested until it is served.

NuBus, PCI…Isolate basic communication primitives from processor architecture: • Simple read/write protocols • Symmetric: any module can become “Master” (smart I/O, multiple processors, etc) • Support for “plug & play” expansionGoal: vendor-independent interface standard

Page 7: Interconnect & Communication

Buses, Interconnect…what’s the big deal?

Aren’t buses simply logic circuits with long wires?

6.004 – Fall 2002 11/14/0 L20 – Communication 7

Wires: interconnect engineer’s view:

Transmission lines.

Finite signal propagation

velocity.

Space matters.

Time matters.

Reality matters.

Wires: circuit theorist’s view:

Equipotential “nodes” of a

circuit.

Instant propagation of v, i

over entire node.

“space” abstracted out of

design model.

Time issues dictated by RLC

elements; wires are

timeless.

Page 8: Interconnect & Communication

Bus Lines as Transmission Lines

6.004 – Fall 2002 11/14/0 L20 – Communication 8

ANALOG ISSUES:● Propagation times ■ Light travels about 1 ft / ns (about 7”/ns in a wire)● Skew ■ Different points along the bus see the signals at different times● Reflections & standing waves ■ At each interface (places where the propagation medium changes) the signal may reflect if the impedances are not matched. ■ Make a transition on a long line – may have to wait many transition times for echos to subside.

TIME

Page 9: Interconnect & Communication

Coping with Analog Issues...

6.004 – Fall 2002 11/14/0 L20 – Communication 9

We’d like our bus to be technology independent... • Self-timed protocols allow bus transactions to accommodate

varying response times;

• Asynchronous protocols avoid the need to pick a (technologydependent)

clock frequency.

BUT... asynchronous protocols are vulnerable to analog-domain

problems, like the infamous

WIRED-OR GLITCH: what happens

when a switch is opened???

COMMON COMPROMISE: Synchronous, Self-Timed protocols

• Broadcast bus clock

• Signals sampled at “safe” times

* DEAL WITH: noise, clock skew (wrt signals)

Page 10: Interconnect & Communication

Synchronous Bus Clock Timing

6.004 – Fall 2002 11/14/0 L20 – Communication 10

Allow for several “round-trip” bus delays so that ringing can die down.

sample edge assertion edge

Signal at source

Signal at destination

“Settling

Time”“ds-skew time”

Page 11: Interconnect & Communication

A Simple Bus Transaction

6.004 – Fall 2002 11/14/0 L20 – Communication 11

sample edge assertion edge

MASTER:1) Chooses bus operation 2) Asserts an address3) Waits for a slave to answer.

SLAVE:1) Monitors start2) Check address3) If meant for me a) look at bus operation b) do operation c) signal finish of cycle

BUS:1) Monitors start2) Start count down3) If no one answers

beforecounter reaches 0 then

“timeout”

CLK

start

finish

operation

address

data (Master)

(Master)

(Master)

WRITE (Master)

(Slave)

Page 12: Interconnect & Communication

Multiplexed Bus: Write Transaction

6.004 – Fall 2002 11/14/0 L20 – Communication 12

sample edge assertion edge We let the address and data buses share the same wires.

Slave sends a status message by driving the operation control signals when it finishes. Possible indications: - request succeeded - request failed - try again

A slave can stall the write by waiting several cycles before asserting the finish signal.

CLK

start

finish

operation

address

/dataadr (Master)

OK (Slave)

(Master)

WRITE (Master)

(Slave)

data (Master)

Page 13: Interconnect & Communication

Multiplexed Bus: Read Transaction

6.004 – Fall 2002 11/14/0 L20 – Communication 13

sample edge assertion edge

On reads, we allot one cycle forthe bus to “turn around” (stopdriving and begin receiving). Itgenerally takes some time toread data anyway.

A slave can stall the read (forinstance if the device is slowcompared to the bus clock) bywaiting several clocks beforeasserting the finish signal.These delays are sometimes called “WAIT-STATES”

CLK

start

finish

operation

address

/dataadr (Master)

OK (Slave)

(Master)

WRITE (Master)

(Slave)

data (Master)

Turn around time

Turn around time

Throughput: 3 Clocks/word

Page 14: Interconnect & Communication

Block Write Transfers

6.004 – Fall 2002 11/14/0 L20 – Communication 14

Block transfers are the way to get peak performance from a bus. Athroughput of nearly 1 Clock/word is achievable on large blocks. SlavesMust generate sequential addresses.

CLK

start

finish

operation

address

/dataadr A (M)

OK (Slave)

(Master)

WBLK 2 (M)

(Slave)

data (A)

CONT (M)

CONT (M)

CONT (M)

CONT (M)

data (A+1) data (A+2) data (A+3)

Page 15: Interconnect & Communication

Block Write Transfers

6.004 – Fall 2002 11/14/0 L20 – Communication 15

Block read transfers still require at least one cycle to turn-around the bus. More WAIT-STATES can be added if initial latency is high. The throughput is nearly 1 Clock/word on large blocks. Great for reading long cache lines!

CLK

start

finish

operation

address

/dataadr A (M)

OK (Slave)

(Master)

RBLK 4 (M)

(Slave)

data [A] (S)

CONT (S)CONT S)CONT (S)HOLD (S)

data [A+1] (S)

data [A+2] (S)

data [A+3] (S)

Page 16: Interconnect & Communication

Split-Transaction Bus Operation… you knew we’d work pipelining in somehow!

6.004 – Fall 2002 11/14/0 L20 – Communication 16

The bus master can postseveral read requestsbefore the first request isserved.

Generally, accesses areserved in the same orderthat they are requested.Slaves must queue upmultiple requests, untilmaster releases bus.

The master must keeptrack of outstandingrequests and their status.

CLK

start

finish

operation

address

/dataadr A1 (M

1)

OK #1 (S1)

(M1)

Rd #1 (M1)

(S1)

data [A1] (S1)

(M2)

Rd #2 (M2)

adr A2 (M2)

Throughput: 2 Clocks/word, independent of read latency

Page 17: Interconnect & Communication

Bus Arbitration: Multiple Bus Masters

6.004 – Fall 2002 11/14/0 L20 – Communication 17

ISSUES: • Fairness - Given uniform requests, bus cycles should be divided evenly among modules (to each, according to their needs…) • Bounded Wait – An upper bound on how long a module has to wait between requesting and receiving a grant • Utilization - Arbitration scheme should allow for maximum bus performance • Scalability - Fixed-cost per module (both in terms of arbitration H/W and arbitration time.

STATE OF THE ART ARBITRATION: N masters, log N time, log N wires.

Request

Grant In

Grant Out

Request

Grant In

Grant Out

Request

Grant In

Grant Out

Request

Grant In

Grant Out

Request“Daisy-Chain Arbitration”

Module 1 Module 2 Module 3 Module 4

Page 18: Interconnect & Communication

Outside the box…The Network as an interface standard

6.004 – Fall 2002 11/14/0 L20 – Communication 18

EMERGING IDEA: Protocol “stacks” that isolate application-level

interface from low-level physical devices:

ETHERNET: In the mid-70’s Bob Metcalf (at Xerox PARC, an MIT alum) devised a bus for networking computers together.

• Bit-serial (optimized for long wires)

• Asynchronous (no clock distribution)

• Variable-length “packets”

Application

Session

IP

TCP UDP

Ethernet

Token Ring

Physical

Network

Transport

Page 19: Interconnect & Communication

Beyond Buses: CommunicationTopologies

6.004 – Fall 2002 11/14/0 L20 – Communication 19

COMPLETE GRAPH:

Dedicated lines connecting each pair of communicating nodes. Θ(n) simultaneous communications.

CROSSBAR SWITCH: switch dedicated between each pair of nodes; each A can be connected to one B at any time.

Special cases: • A = processors, B = memories. • A and B are same type.

DRAWBACK: Quadratic Cost!

Page 20: Interconnect & Communication

Communication Topologies:Low-Cost Networks

6.004 – Fall 2002 11/14/0 L20 – Communication 20

Θ(n) steps for random message delivery

BUS

One step for random message delivery (but only one message at a time!)

RING

Page 21: Interconnect & Communication

Mesh Topologies

6.004 – Fall 2002 11/14/0 L20 – Communication 21

4-Neighbor

Nearest-neighbor connectivity: Point-to-point interconnect -minimizes delays -minimizes “analog” effects Store-and-forward (some overhead associated with communication routing)

2-Dimensional Meshes8-Neighbor

3-D, 6-Neighbor Mesh

Thruput

Latency

Cost

Thruput

Latency

Cost

Page 22: Interconnect & Communication

Communication Topology:Logarithmic Latency Networks

6.004 – Fall 2002 11/14/0 L20 – Communication 22

Maximum path length is Θ(log n) steps;

HYPERCUBE (n-cube): Cost = Θ(n log n) Worst-case path length = Θ(log n)

BINARY TREE:

1-cube

Cost/node constant.

2-cube 3-cube

4-cube

Page 23: Interconnect & Communication

Communication Topologies: Latency

6.004 – Fall 2002 11/14/0 L20 – Communication 23

Topology

IS IT REAL? • Speed of Light: ~ 1 ns/foot (typical bus propagation: 5 ns/foot) • Density limits: can a node shrink forever? How about Power, Heat, etc … ?

OBSERVATION: Links on Tree, N-cube must grow with n; hence time/link must grow.

Theorist's view:

• • Each point-to-point link requires one hardware unit.

• • Each point-to-point communication requires one time unit.

Complete Graph

Crossbar

1D Bus

2D Mesh

3D Mesh

Tree

N-cube

TheoreticalLatency

ActualLatency

Page 24: Interconnect & Communication

Backplane Buses - still the standard + easy hardware configurability

+ vendor-independent standards

- serialized communications

- bottleneck as systems scale up

New-generation communications...

• Log networks (trees, hypercubes, …)

• 2D Meshes (IWARP, ...)

• 3D Meshes …

• 4-neighbor, 3D mesh (NuMesh Diamond lattice)

• 6-neighbor, 3D mesh (cube cut on its diagonal)

• Nodes plug together like Legos!

Communications Futures

6.004 – Fall 2002 11/14/0 L20 – Communication 24