Advanced computer architecture

CSE 8383 - Advanced Computer Architecture

Week 3: Week of Jan 26, 2004

engr.smu.edu/~rewini/8383

Contents
  Linear Pipelines
  Nonlinear Pipelines
  Instruction Pipelines
  Arithmetic Operations
  Design of Multifunction Pipeline

Linear Pipeline
  Processing stages are linearly connected
  Each stage performs a fixed function
  Synchronous pipeline
    Clocked latches between Stage i and Stage i+1
    Equal delays in all stages
  Asynchronous pipeline (handshaking)

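Latches

[Figure: stages S1, S2, S3 separated by latches L1 and L2]

  With equal stage delays, the stage delay sets the clock period
  Otherwise, the slowest stage determines the clock period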

Reservation Table

Time   1  2  3  4
S1     X
S2        X
S3           X
S4              X

5 tasks on 4 stages

Time   1  2  3  4  5  6  7  8
S1     X  X  X  X  X
S2        X  X  X  X  X
S3           X  X  X  X  X
S4              X  X  X  X  X
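The space-time diagram for a linear pipeline can be generated mechanically. The sketch below assumes every stage takes exactly one clock cycle and that a new task enters on every cycle, so task t occupies stage s during cycle t + s - 1 and n tasks on k stages finish in k + n - 1 cycles. The function name `space_time` and the labels T1, T2, ... are illustrative only.

```python
def space_time(num_stages, num_tasks):
    """Return the rows of a space-time diagram for a linear pipeline.

    Assumes one cycle per stage and one new task per cycle, so task t
    (1-based) occupies stage s (1-based) during cycle t + s - 1.
    """
    total_cycles = num_stages + num_tasks - 1
    rows = []
    for s in range(1, num_stages + 1):
        row = []
        for cycle in range(1, total_cycles + 1):
            task = cycle - s + 1
            # Mark the cell with the task number, or "--" if the stage is idle.
            row.append(f"T{task}" if 1 <= task <= num_tasks else "--")
        rows.append(row)
    return rows


rows = space_time(num_stages=4, num_tasks=5)
for s, row in enumerate(rows, start=1):
    print(f"S{s}: " + " ".join(row))
```

For 5 tasks on 4 stages each row has 8 columns, and the last task leaves S4 in cycle 8.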

Nonlinear Pipelines
  Variable functions
  Feed-forward connections
  Feedback connections

3 stages & 2 functions

[Figure: stages S1, S2, S3 with feed-forward and feedback connections; outputs X and Y]

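Reservation Tables for X & Y

[Tables: marks per stage; time-step columns not recoverable from this transcript]

Function X
  S1: X X X
  S2: X X
  S3: X X X

Function Y
  S1: Y Y
  S2: Y
  S3: Y Y Y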

Linear Instruction Pipelines
  Assume the following instruction execution phases:
    Fetch (F)
    Decode (D)
    Operand Fetch (O)
    Execute (E)
    Write results (W)

Pipeline Instruction Execution

Time  1   2   3   4   5   6   7
F     I1  I2  I3
D         I1  I2  I3
O             I1  I2  I3
E                 I1  I2  I3
W                     I1  I2  I3

Dependencies
  Data dependency (operand is not ready yet)
  Instruction dependency (branching)

Will that cause a problem?

Data Dependency

I1 -- Add R1, R2, R3

I2 -- Sub R4, R1, R5

Time  1   2   3   4   5   6
F     I1  I2
D         I1  I2
O             I1  I2
E                 I1  I2
W                     I1  I2

Solutions
  Stall
  Forwarding
  Write and read in one cycle
  ….
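To see why the Add/Sub pair is a problem, it helps to count cycles. The sketch below assumes the five phases F, D, O, E, W each take one cycle, that I1 (Add) produces R1 in its W phase, and that I2 (Sub) reads R1 in its O phase; the function names are illustrative. It covers only the "stall" and "write and read in one cycle" options, not forwarding.

```python
STAGES = ["F", "D", "O", "E", "W"]  # one cycle per phase (assumption)


def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction fetched in issue_cycle reaches a stage."""
    return issue_cycle + STAGES.index(stage)


def stalls_needed(producer_issue, consumer_issue, same_cycle_write_read=False):
    """Stall cycles the consumer needs so its O phase sees the producer's result.

    Without same-cycle write/read, the operand fetch must happen in a cycle
    strictly after the producer's W phase; with it, the same cycle is enough.
    """
    write_cycle = stage_cycle(producer_issue, "W")
    read_cycle = stage_cycle(consumer_issue, "O")
    earliest_read = write_cycle if same_cycle_write_read else write_cycle + 1
    return max(0, earliest_read - read_cycle)


# I1 is fetched in cycle 1 and I2 in cycle 2.
print(stalls_needed(1, 2))                              # plain stalling
print(stalls_needed(1, 2, same_cycle_write_read=True))  # write and read in one cycle
```

Under these assumptions I1 writes R1 in cycle 5 while I2 would fetch it in cycle 4, so I2 must be delayed by two cycles, or by one cycle if a register can be written and read in the same cycle.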

Instruction Dependency

I1 – Branch o

I2 –

Time  1   2   3   4   5   6
F     I1  I2
D         I1  I2
O             I1  I2
E                 I1  I2
W                     I1  I2

Solutions
  Stall
  Predict branch taken
  Predict branch not taken
  ….

Floating Point Multiplication
  Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2)
  Add the two exponents to get Exponent-out
  Multiply the two mantissas
  Normalize the mantissa and adjust the exponent
  Round the product mantissa to a single-length mantissa; the exponent may need to be adjusted again
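The steps above can be sketched in a few lines. This is a minimal illustration, not a full floating-point unit: it assumes unsigned, nonzero operands, a P-bit integer mantissa m representing the fraction m / 2**P with its top bit set (so the fraction lies in [0.5, 1)), and round-half-up. The width P, the encoding, and the function names are assumptions made for this example.

```python
P = 8  # mantissa width in bits (assumption for this sketch)


def fp_multiply(m1, e1, m2, e2):
    """Multiply two normalized values (m / 2**P) * 2**e; return (mantissa, exponent)."""
    exponent = e1 + e2   # add the two exponents
    product = m1 * m2    # double-length product, value = product / 2**(2*P)

    # Normalize: the product of two fractions in [0.5, 1) lies in [0.25, 1),
    # so at most one left shift is needed to set the top bit again.
    if product < 1 << (2 * P - 1):
        product <<= 1
        exponent -= 1

    # Round the double-length product to a single-length mantissa (half up).
    mantissa = (product + (1 << (P - 1))) >> P

    # Rounding may carry out of the top bit; renormalize and adjust the exponent.
    if mantissa == 1 << P:
        mantissa >>= 1
        exponent += 1
    return mantissa, exponent


def value(mantissa, exponent):
    """Decode (mantissa, exponent) back to a Python float."""
    return mantissa / (1 << P) * 2.0 ** exponent


# 1.0 = 0.5 * 2**1 and 3.0 = 0.75 * 2**2
m, e = fp_multiply(128, 1, 192, 2)
print(m, e, value(m, e))
```

Each commented step corresponds to one candidate pipeline stage: exponent addition, mantissa multiplication, normalization, and rounding with a possible renormalization.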

Linear Pipeline for floating-point multiplication

[Figure: pipeline blocks Add Exponents, Multiply Mantissa, Normalize, Round]

[Figure: refined pipeline blocks Partial Products, Accumulator, Add Exponents, Normalize, Round, Renormalize]

Linear Pipeline for floating-point Addition

[Figure: pipeline blocks Subtract Exponents, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]

Combined Adder and Multiplier

[Figure: combined pipeline with stages labeled A to H; blocks shown: Exponents Subtract / ADD, Partial Products, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]

Reservation Table for Multiply

1 2 3 4 5 6 7

A X

B X X

C X X

D X X

E X

F

G

H

Reservation Table for Addition

1 2 3 4 5 6 7 8 9

A Y

B

C Y

D Y

E Y

F Y Y

G Y

H Y Y

Nonlinear Pipeline Design
  Latency: the number of clock cycles between two initiations of a pipeline
  Collision: a resource conflict
  Forbidden latencies: latencies that cause collisions

Nonlinear Pipeline Design (cont.)
  Latency sequence: a sequence of permissible latencies between successive task initiations
  Latency cycle: a latency sequence that repeats the same subsequence
  Collision vector: C = (Cm, Cm-1, …, C2, C1), m <= n-1
    n = number of columns in the reservation table
    Ci = 1 if latency i causes a collision, 0 otherwise

Mul – Mul Collision (launch after 1 cycle)

1 2 3 4 5 6 7

A X Z
B X X Z Z
C X X Z Z
D X Z X
E X Z
F
G
H

Mul – Mul Collision (launch after 2 cycles)

1 2 3 4 5 6 7

A X Z
B X X Z Z
C X X Z Z
D X X Z
E X
F
G
H

Mul – Mul Collision (launch after 3 cycles)

1 2 3 4 5 6 7

A X Z
B X X Z Z
C X X Z Z
D X X
E X
F
G
H

Collision Vector for Multiply after Multiply
  Forbidden latencies: 1, 2
  Collision vector: 0 0 0 0 1 1, which reduces to 1 1
  Maximum forbidden latency = 2, so m = 2

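Example

[Figure: stages S1, S2, S3 with outputs X and Y]

Reservation Tables for X & Y

[Tables: marks per stage; time-step columns not recoverable from this transcript]

Function X
  S1: X X X
  S2: X X
  S3: X X X

Function Y
  S1: Y Y
  S2: Y
  S3: Y Y Y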

Forbidden Latencies
  X after X
  X after Y
  Y after X
  Y after Y

X after X

[Figure: two space-time diagrams over S1, S2, S3 in which a second initiation X2 collides with X1; latencies labeled 5 and 2]

X after X

[Figure: two space-time diagrams over S1, S2, S3 in which a second initiation X2 collides with X1; latencies labeled 4 and 7]

Collision Vector (X after X)
  Forbidden latencies: 2, 4, 5, 7
  Collision vector = 1 0 1 1 0 1 0

Y after Y

[Figure: two space-time diagrams over S1, S2, S3 showing the collisions between two Y initiations]

Collision Vector (Y after Y)
  Forbidden latencies: 2, 4
  Collision vector = 1 0 1 0
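Forbidden latencies can be read off a reservation table mechanically: any distance between two marks in the same row is forbidden. The sketch below is a hedged illustration. The time-step positions in `X_TABLE` and `Y_TABLE` are assumptions, chosen to match the number of marks per stage and the forbidden latencies stated in these slides; the function names are also illustrative.

```python
def forbidden_latencies(table):
    """table maps a stage name to the set of cycles in which that stage is used.

    A latency is forbidden if two marks in the same row are that far apart,
    because a second initiation would then need the stage in the same cycle.
    """
    forbidden = set()
    for cycles in table.values():
        for first in cycles:
            for second in cycles:
                if second > first:
                    forbidden.add(second - first)
    return forbidden


def collision_vector(forbidden):
    """Return the bits C_m ... C_1 as a string, where m is the largest forbidden latency."""
    if not forbidden:
        return ""
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Assumed mark positions (the transcript preserves only the marks per row).
X_TABLE = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
Y_TABLE = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}

for name, table in (("X", X_TABLE), ("Y", Y_TABLE)):
    latencies = forbidden_latencies(table)
    print(name, sorted(latencies), collision_vector(latencies))
```

With these assumed layouts the computed sets agree with the slides: 2, 4, 5, 7 for X after X and 2, 4 for Y after Y.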

Exercise – Find the collision vector

1 2 3 4 5 6 7

A X X X

B X X

C X X

D X

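State Diagram for X

States: 1 0 1 1 0 1 0 (initial), 1 1 1 1 1 1 1, 1 0 1 1 0 1 1

Transitions (edge labels are latencies)
  1011010 --1*--> 1111111
  1011010 --3, 6--> 1011011
  1011010 --8+--> 1011010
  1111111 --8+--> 1011010
  1011011 --3*, 6--> 1011011
  1011011 --8+--> 1011010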

Cycles
  Simple cycles: each state appears only once
    (3), (6), (8), (1, 8), (3, 8), and (6, 8)
  Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states
    (1, 8), (3); one of them is the MAL
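The state diagram and its cycles can be derived from the collision vector alone. The sketch below uses the usual construction: from a state, a latency p is permissible when bit C_p is 0, and the next state is the current state shifted right by p and OR-ed with the initial collision vector; any latency above m returns to the initial state (written 8+ in the slides, represented here as m + 1). Function names are illustrative, and MAL is taken to mean the minimum average latency over the greedy cycles.

```python
def build_state_diagram(initial):
    """initial: collision vector as a bit string C_m ... C_1.

    Returns {state: {latency: next_state}} with states stored as integers.
    """
    m = len(initial)
    init_val = int(initial, 2)  # C_1 is the least significant bit
    edges = {}
    todo = [init_val]
    while todo:
        state = todo.pop()
        if state in edges:
            continue
        edges[state] = {}
        for latency in range(1, m + 1):
            if (state >> (latency - 1)) & 1:
                continue  # this latency is forbidden in this state
            nxt = (state >> latency) | init_val
            edges[state][latency] = nxt
            todo.append(nxt)
        edges[state][m + 1] = init_val  # the "m+1 or more" edge
    return edges


def simple_cycles(edges):
    """Return each simple cycle as a list of (state, latency) steps."""
    order = {s: i for i, s in enumerate(edges)}
    cycles = []

    def dfs(start, state, path, visited):
        for latency, nxt in edges[state].items():
            step = path + [(state, latency)]
            if nxt == start:
                cycles.append(step)
            elif order[nxt] > order[start] and nxt not in visited:
                # Only visit states ranked after the start, so each cycle is
                # reported once, beginning at its earliest-ranked state.
                dfs(start, nxt, step, visited | {nxt})

    for s in edges:
        dfs(s, s, [], {s})
    return cycles


def is_greedy(cycle, edges):
    """True if every edge uses the minimum latency leaving its state."""
    return all(latency == min(edges[state]) for state, latency in cycle)


edges = build_state_diagram("1011010")
cycles = simple_cycles(edges)
latency_cycles = sorted(tuple(lat for _, lat in c) for c in cycles)
greedy = sorted(tuple(lat for _, lat in c) for c in cycles if is_greedy(c, edges))
mal = min(sum(c) / len(c) for c in greedy)
print(latency_cycles)
print(greedy, mal)
```

For the collision vector 1011010 this yields three states and the six simple cycles listed above; of the two greedy cycles, (1, 8) averages 4.5 and (3) averages 3, so (3) gives the MAL.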
