Pipeline Programming

CS 352: Computer Organization and Design
University of Wisconsin-Eau Claire
Dan Ernst


Programming Techniques & Datapath Requirements


Hazard Avoidance Example

• Sample code:

for (int i = 0; i < 100; i++) {
    A[i] = B[i] + c;
}

• Assume:
– the base address of A is in reg 1
– the base address of B is in reg 2
– c is in reg 3

• Write the LC2K code to minimize stalls. Assume:
– a single-cycle delayed branch (branch delay slot of size 1)
– any necessary/possible data forwarding


Hazard Avoidance: Example

• The unoptimized code, based on a simple hand-compile:

     lw 7 0 hundr    # 7 = 100
     add 6 0 0       # I = 0
L:   beq 6 7 End     # exit when I == 100
     noop            # branch delay slot
     add 5 2 6       # 5 = B + I
     lw 4 5 0        # 4 = B[I]
     add 4 4 3       # 4 = B[I] + c
     add 5 1 6       # 5 = A + I
     sw 4 5 0        # store A[I]
     lw 5 0 one      # load const 1
     add 6 6 5       # I++
     beq 0 0 L       # always taken: back to the loop test
     noop            # branch delay slot
End: …

Hazard Avoidance: Unoptimized

Cycle               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
lw 7 0 hundr        F  D  E  M  W
add 6 0 0              F  D  E  M  W
L: beq 6 7 End            F  D  E  M  W
noop                         F  D
add 5 2 6                       F  D  E  M  W
lw 4 5 0                           F  D  E  M  W
add 4 4 3                             F  D  *  E  M  W
add 5 1 6                                F  *  D  E  M  W
sw 4 5 0                                       F  D  E  M  W
lw 5 0 one                                        F  D  E  M  W
add 6 6 5                                            F  D  *  E  M  W
beq 0 0 L                                               F  *  D  E  M  W
noop                                                          F  D
L: beq 6 7 End                                                   F  D  …

* Load stall! The instruction right after each lw uses the loaded register, so it waits one cycle, and the instruction behind it is held in fetch.
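The load stalls in the unoptimized loop can also be found mechanically. The following is a minimal Python sketch, not part of the original slides. It assumes the destination-first operand order implied by the listing's comments (for example, add 5 2 6 means reg 5 = reg 2 + reg 6) and full forwarding, so the only remaining data stall is a lw followed immediately by an instruction that reads the loaded register. The tuple encoding and the names reads_of and load_use_stalls are illustrative only.

```python
def reads_of(instr):
    """Registers read by an instruction (operand order assumed from the slide comments)."""
    op, *args = instr
    if op == "add":            # add dest src1 src2
        return {args[1], args[2]}
    if op == "lw":             # lw dest base offset
        return {args[1]}
    if op in ("sw", "beq"):    # sw src base offset / beq regA regB label
        return {args[0], args[1]}
    return set()               # noop reads nothing


def load_use_stalls(body):
    """Return (lw, consumer) pairs where the consumer directly follows the load."""
    pairs = []
    for prev, cur in zip(body, body[1:]):
        if prev[0] == "lw" and prev[1] in reads_of(cur):
            pairs.append((prev, cur))
    return pairs


# One iteration of the unoptimized loop, including both noops.
unoptimized_body = [
    ("beq", 6, 7, "End"),
    ("noop",),
    ("add", 5, 2, 6),
    ("lw", 4, 5, 0),
    ("add", 4, 4, 3),
    ("add", 5, 1, 6),
    ("sw", 4, 5, 0),
    ("lw", 5, 0, "one"),
    ("add", 6, 6, 5),
    ("beq", 0, 0, "L"),
    ("noop",),
]

stalls = load_use_stalls(unoptimized_body)
# Each slot costs one cycle; each load-use pair adds one stall cycle.
cycles_per_iteration = len(unoptimized_body) + len(stalls)

for load, consumer in stalls:
    print("stall between", load, "and", consumer)
print("cycles per iteration:", cycles_per_iteration)
```

Under these assumptions the scan reports two load-use pairs (lw 4 5 0 / add 4 4 3 and lw 5 0 one / add 6 6 5), and 11 slots plus 2 stalls gives 13 cycles per iteration.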

Hazard Avoidance: Optimized

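Cycle               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
lw 7 0 hundr        F  D  E  M  W
add 6 0 0              F  D  E  M  W
L: beq 6 7 End            F  D  E  M  W
add 5 2 6                    F  D  E  M  W      <- branch delay slot
lw 4 5 0                        F  D  E  M  W
add 5 1 6                          F  D  E  M  W
add 4 4 3                             F  D  E  M  W
sw 4 5 0                                 F  D  E  M  W
lw 5 0 one                                  F  D  E  M  W
beq 0 0 L                                      F  D  E  M  W
add 6 6 5                                         F  D  E  M  W      <- branch delay slot
L: beq 6 7 End                                       F  D  …

• Both branch delay slots now hold useful instructions, and an independent instruction separates each lw from the first use of the loaded value, so no load stall remains.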
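Reordering is only safe if the loop still computes A[i] = B[i] + c. The sketch below is a small functional interpreter in Python, not part of the original slides, that runs both versions with a one-instruction branch delay slot and compares the results. Several things here are assumptions made for illustration: the destination-first operand order (taken from the listing's comments), the memory addresses HUNDR, ONE, A_BASE and B_BASE, the halt placed at End, and the use of add 5 1 6 (A + I) and beq 0 0 L in the optimized loop, matching the hand-compiled listing.

```python
HUNDR, ONE = 0, 1                 # hypothetical addresses of the constants 100 and 1
A_BASE, B_BASE = 1000, 2000       # hypothetical array base addresses


def run(program, labels, regs, mem):
    """Execute until 'halt'; the instruction after a branch always executes."""
    pc, pending = 0, None         # pending = branch target awaiting its delay slot
    while program[pc][0] != "halt":
        op, *args = program[pc]
        target = None
        if op == "add":           # add dest src1 src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "lw":          # lw dest base offset
            regs[args[0]] = mem[regs[args[1]] + args[2]]
        elif op == "sw":          # sw src base offset
            mem[regs[args[1]] + args[2]] = regs[args[0]]
        elif op == "beq" and regs[args[0]] == regs[args[1]]:
            target = labels[args[2]]
        if pending is not None:   # this instruction was the delay slot
            pc, pending = pending, None
        else:
            pc, pending = pc + 1, target


def simulate(program, labels, b_values, c):
    """Set up registers and memory as the slides assume, run, and return A."""
    mem = {HUNDR: len(b_values), ONE: 1}
    mem.update({B_BASE + i: v for i, v in enumerate(b_values)})
    regs = [0] * 8
    regs[1], regs[2], regs[3] = A_BASE, B_BASE, c
    run(program, labels, regs, mem)
    return [mem[A_BASE + i] for i in range(len(b_values))]


unoptimized = [
    ("lw", 7, 0, HUNDR), ("add", 6, 0, 0),
    ("beq", 6, 7, "End"), ("noop",),             # L is index 2
    ("add", 5, 2, 6), ("lw", 4, 5, 0), ("add", 4, 4, 3),
    ("add", 5, 1, 6), ("sw", 4, 5, 0),
    ("lw", 5, 0, ONE), ("add", 6, 6, 5),
    ("beq", 0, 0, "L"), ("noop",),
    ("halt",),                                   # End is index 13
]
optimized = [
    ("lw", 7, 0, HUNDR), ("add", 6, 0, 0),
    ("beq", 6, 7, "End"), ("add", 5, 2, 6),      # delay slot filled
    ("lw", 4, 5, 0), ("add", 5, 1, 6), ("add", 4, 4, 3),
    ("sw", 4, 5, 0), ("lw", 5, 0, ONE),
    ("beq", 0, 0, "L"), ("add", 6, 6, 5),        # delay slot filled
    ("halt",),                                   # End is index 11
]

b_values = list(range(0, 300, 3))                # 100 sample values for B
expected = [b + 7 for b in b_values]
unoptimized_result = simulate(unoptimized, {"L": 2, "End": 13}, b_values, 7)
optimized_result = simulate(optimized, {"L": 2, "End": 11}, b_values, 7)
print(unoptimized_result == expected, optimized_result == expected)
```

The interpreter models only instruction semantics, not timing. Note that on loop exit the optimized version still executes add 5 2 6 in the delay slot; this is harmless because reg 5 is a temporary.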


Performance Analysis

• N iterations (N = 100 in the example)

• Unoptimized:
– Overhead: 2 cycles are required to get to the loop
– Each iteration takes 15 - 3 + 1 = 13 cycles (fetch cycles 3 through 15)
– 1 more cycle to finish the last instruction in the pipeline
– Total cycles for unoptimized version: 2 + 13 * N + 1 = 1303
– Total instructions: 2 + 9 * N = 902 (noops are not counted)
– CPI = 1303 / 902 ≈ 1.445

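• Optimized:
– Each iteration takes 11 - 3 + 1 = 9 cycles (fetch cycles 3 through 11)
– 4 more cycles to finish the last instruction in the pipeline
– Total cycles for optimized version: 2 + 9 * N + 4 = 906
– Total instructions: 2 + 9 * N = 902 (unchanged from the unoptimized version)
– CPI = 906 / 902 ≈ 1.004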
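The arithmetic above can be checked with a few lines of Python. This is a sketch, not part of the original slides; the function name loop_stats and its parameter names are illustrative. The instruction count of 2 + 9 * N for the optimized loop is inferred (the reordered loop has the same nine real instructions per iteration), which is consistent with the stated CPI of 1.004.

```python
def loop_stats(n, setup_cycles, cycles_per_iter, drain_cycles,
               setup_instrs=2, instrs_per_iter=9):
    """Return (total cycles, total instructions, CPI) for n loop iterations."""
    cycles = setup_cycles + cycles_per_iter * n + drain_cycles
    instrs = setup_instrs + instrs_per_iter * n
    return cycles, instrs, cycles / instrs


N = 100
# Per-iteration and drain cycle counts are taken directly from the bullets above.
unopt = loop_stats(N, setup_cycles=2, cycles_per_iter=13, drain_cycles=1)
opt = loop_stats(N, setup_cycles=2, cycles_per_iter=9, drain_cycles=4)

for name, (cycles, instrs, cpi) in [("unoptimized", unopt), ("optimized", opt)]:
    print(f"{name}: {cycles} cycles, {instrs} instructions, CPI {cpi:.3f}")
print(f"speedup: {unopt[0] / opt[0]:.2f}x")
```

With N = 100 this reproduces 1303 and 906 total cycles, a speedup of roughly 1.44x from scheduling alone.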
