CS 352: Computer Organization and Design
University of Wisconsin-Eau Claire, Dan Ernst
Pipeline Programming
Programming Techniques & Datapath Requirements
Hazard Avoidance Example
• Sample code:
    for (int i = 0; i < 100; i++) {
        A[i] = B[i] + c;
    }
• Assume:
  – A is in reg 1
  – B is in reg 2
  – c is in reg 3
• Write the LC2K code to minimize stalls. Assume:
  – a single-cycle delayed branch (branch delay slot of size 1)
  – any necessary/possible data forwarding
Hazard Avoidance: Example
• The unoptimized code, based on a simple hand-compile:
        lw   7 0 hundr    # 100
        add  6 0 0        # I
   L:   beq  6 7 End
        noop
        add  5 2 6        # 5 = B + I
        lw   4 5 0        # 4 = B[I]
        add  4 4 3        # 4 = B[I] + c
        add  5 1 6        # 5 = A + I
        sw   4 5 0        # store A[I]
        lw   5 0 one      # load const 1
        add  6 6 5        # I++
        beq  0 0 L
        noop
   End: …
Hazard Avoidance: Unoptimized
Cycle               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
   lw 7 0 hundr     F  D  E  M  W
   add 6 0 0           F  D  E  M  W
L: beq 6 7 End            F  D  E  M  W
   noop                      F  D
   add 5 2 6                    F  D  E  M  W
   lw 4 5 0                        F  D  E  M  W
   add 4 4 3                          F  D  -  E  M  W
   add 5 1 6                             F  -  D  E  M  W
   sw 4 5 0                                    F  D  E  M  W
   lw 5 0 one                                     F  D  E  M  W
   add 6 6 5                                         F  D  -  E  M  W
   beq 0 0 L                                            F  -  D  E  M  W
   noop                                                       F  D
L: beq 6 7 End                                                   F  D  …
Load stall! (marked -): an add that immediately follows a lw and reads the loaded register waits one cycle, and the instruction behind it waits with it.
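The load stalls can be counted mechanically. The sketch below is a minimal illustration, not part of the original slides: it assumes full forwarding, so the only stall left is a lw followed immediately by an instruction that reads the loaded register. It also assumes the field order the slide comments imply (destination first). The names regs_read, load_use_stalls and UNOPTIMIZED_LOOP are made up for this example.

```python
# Instructions are (op, f1, f2, f3) tuples in the field order used on the slides:
#   add dest src1 src2 | lw dest base offset | sw src base offset | beq r1 r2 label
# That order is inferred from the slide comments (an assumption of this sketch).

def regs_read(instr):
    """Registers an instruction reads, under the field order assumed above."""
    op, f1, f2, f3 = instr
    if op == "add":
        return {f2, f3}
    if op == "lw":
        return {f2}
    if op in ("sw", "beq"):
        return {f1, f2}
    return set()  # noop reads nothing


def load_use_stalls(body):
    """Count lw instructions whose result is read by the very next instruction."""
    return sum(
        1
        for prev, cur in zip(body, body[1:])
        if prev[0] == "lw" and prev[1] in regs_read(cur)
    )


# One iteration of the unoptimized loop, from "L:" through the trailing noop.
UNOPTIMIZED_LOOP = [
    ("beq", 6, 7, "End"),
    ("noop", 0, 0, 0),
    ("add", 5, 2, 6),
    ("lw", 4, 5, 0),
    ("add", 4, 4, 3),      # reads reg 4 right after it is loaded
    ("add", 5, 1, 6),
    ("sw", 4, 5, 0),
    ("lw", 5, 0, "one"),
    ("add", 6, 6, 5),      # reads reg 5 right after it is loaded
    ("beq", 0, 0, "L"),
    ("noop", 0, 0, 0),
]

stalls = load_use_stalls(UNOPTIMIZED_LOOP)
cycles_per_iteration = len(UNOPTIMIZED_LOOP) + stalls
print(stalls, cycles_per_iteration)  # prints: 2 13
```

Under these assumptions the 11 instruction slots plus 2 load stalls give 13 cycles per iteration.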
Hazard Avoidance: Optimized
Cycle               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   lw 7 0 hundr     F  D  E  M  W
   add 6 0 0           F  D  E  M  W
L: beq 6 7 End            F  D  E  M  W
   add 5 2 6                 F  D  E  M  W                <- Branch delay slot
   lw 4 5 0                     F  D  E  M  W
   add 5 1 6                       F  D  E  M  W
   add 4 4 3                          F  D  E  M  W
   sw 4 5 0                              F  D  E  M  W
   lw 5 0 one                               F  D  E  M  W
   beq 0 0 L                                   F  D  E  M  W
   add 6 6 5                                      F  D  E  M  W   <- Branch delay slot
L: beq 6 7 End                                       F  D  …
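A reordering like this is only useful if the loop still computes A[i] = B[i] + c. The sketch below is a hypothetical mini-interpreter, not an LC2K simulator, and it rests on three assumptions:

• the destination-first field order implied by the slide comments
• a one-slot delayed branch
• the second address computation read as add 5 1 6 (A + I), as in the unoptimized listing

The memory layout (A_BASE, B_BASE, the addresses of hundr and one) and the value of C are arbitrary choices for illustration.

```python
def run(program, labels, regs, mem):
    """Execute (op, f1, f2, f3) tuples with a one-slot delayed branch."""
    def value(x):
        # An offset field is either a number or a symbolic data label.
        return labels[x] if isinstance(x, str) else x

    pc, pending = 0, None
    while pc < len(program):
        op, f1, f2, f3 = program[pc]
        # A branch taken by the previous instruction redirects control only
        # after this instruction (the delay slot) has executed.
        jump, pending = pending, None
        if op == "add":
            regs[f1] = regs[f2] + regs[f3]
        elif op == "lw":
            regs[f1] = mem[regs[f2] + value(f3)]
        elif op == "sw":
            mem[regs[f2] + value(f3)] = regs[f1]
        elif op == "beq" and regs[f1] == regs[f2]:
            pending = labels[f3]
        pc = jump if jump is not None else pc + 1
    return mem


OPTIMIZED = [
    ("lw", 7, 0, "hundr"),
    ("add", 6, 0, 0),
    ("beq", 6, 7, "End"),   # L:
    ("add", 5, 2, 6),       # delay slot: 5 = B + I
    ("lw", 4, 5, 0),
    ("add", 5, 1, 6),       # 5 = A + I, fills the load delay
    ("add", 4, 4, 3),
    ("sw", 4, 5, 0),
    ("lw", 5, 0, "one"),
    ("beq", 0, 0, "L"),
    ("add", 6, 6, 5),       # delay slot: I++
]

# Branch labels map to instruction indexes, data labels to memory addresses.
A_BASE, B_BASE, C = 100, 300, 7
labels = {"L": 2, "End": len(OPTIMIZED), "hundr": 1000, "one": 1001}
mem = {1000: 100, 1001: 1}
mem.update({B_BASE + i: i * i for i in range(100)})
regs = [0] * 8
regs[1], regs[2], regs[3] = A_BASE, B_BASE, C

run(OPTIMIZED, labels, regs, mem)
ok = all(mem[A_BASE + i] == mem[B_BASE + i] + C for i in range(100))
print(ok)
```

On the exit path the delay slot after beq 6 7 End still executes add 5 2 6. That is harmless because it only overwrites the scratch register 5.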
Performance Analysis
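• N iterations (N = 100 in the example)
• Unoptimized:
  – Overhead: 2 cycles are required to get to the loop
  – Each iteration takes 15 - 3 + 1 = 13 cycles (fetch cycles 3 through 15 in the diagram: 11 instruction slots plus 2 load stalls)
  – 1 more cycle to finish the last instruction in the pipeline
  – Total cycles for unoptimized version = 2 + 13 * N + 1 = 1303
  – Total instructions = 2 + 9 * N = 902 (noops not counted)
  – CPI = 1303 / 902 ≈ 1.445
• Optimized:
  – Each iteration takes 11 - 3 + 1 = 9 cycles (fetch cycles 3 through 11 in the diagram: 9 instructions, no stalls)
  – 4 more cycles to finish the last instruction in the pipeline
  – Total cycles for optimized version = 2 + 9 * N + 4 = 906
  – Total instructions = 2 + 9 * N = 902 (unchanged)
  – CPI = 906 / 902 ≈ 1.004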
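The arithmetic above can be checked with a few lines of Python. This is only a sketch of the slide's formulas. The function name pipeline_stats and its parameter names are made up for this example, and the drain-cycle counts (1 and 4) are taken from the slide as given.

```python
def pipeline_stats(n, cycles_per_iter, drain_cycles, setup=2, instrs_per_iter=9):
    """Return (total cycles, total instructions, CPI) for n loop iterations."""
    cycles = setup + cycles_per_iter * n + drain_cycles
    instructions = setup + instrs_per_iter * n  # noops are not counted
    return cycles, instructions, cycles / instructions


unopt = pipeline_stats(n=100, cycles_per_iter=13, drain_cycles=1)
opt = pipeline_stats(n=100, cycles_per_iter=9, drain_cycles=4)

print(unopt[0], unopt[1], round(unopt[2], 3))  # prints: 1303 902 1.445
print(opt[0], opt[1], round(opt[2], 3))        # prints: 906 902 1.004
```

Changing n shows how the fixed overhead fades as the loop gets longer: the CPI approaches 13/9 for the unoptimized loop and 1 for the optimized one.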