View
216
Download
0
Category
Preview:
Citation preview
ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
ECE 232
Hardware Organization and Design
Lecture 18
Pipelining
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
ECE 232 L18.Pipeline.2 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Outline
° Review • Microprogramming
° Pipelining, basic concept
° Hazards• Structural
• Data hazards
• Control hazards
ECE 232 L18.Pipeline.3 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Pipelining is Natural!
Laundry Example
° Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
° Washer takes 30 minutes
° Dryer takes 40 minutes
° “Folder” takes 20 minutes
A B C D
ECE 232 L18.Pipeline.4 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Sequential Laundry
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry take?
A
B
C
D
Task
Order
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Time
ECE 232 L18.Pipeline.5 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Pipelined Laundry: Start work ASAP
° Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
Task
Order
6 PM 7 8 9 10 11 Midnight
Time
30 40 40 40 40 20
ECE 232 L18.Pipeline.6 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Pipelining Lessons
° Pipelining doesn’t help latency of single task, it helps throughput of entire workload
° Pipeline rate limited by slowest pipeline stage
° Multiple tasks operating simultaneously using different resources
° Potential speedup = Number pipe stages
° Unbalanced lengths of pipe stages reduces speedup
° Time to “fill” pipeline and time to “drain” it reduces speedup
° Stall for Dependences
A
B
C
D
Task
Order
6 PM 7 8 9
Time
30 40 40 40 40 20
ECE 232 L18.Pipeline.7 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
The Five Stages of Load
° IFetch: Instruction Fetch• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
ECE 232 L18.Pipeline.8 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Pipelined Execution
° Utilization?
° Now we just have to make it work
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WBProgram Flow
Time
ECE 232 L18.Pipeline.9 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Single Cycle, Multiple Cycle, vs. Pipeline
Clk
Cycle 1
Multiple Cycle Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem WrStore
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch
R-type
Ifetch Reg Exec Mem WrR-type
Cycle 1 Cycle 2
ECE 232 L18.Pipeline.10 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Why Pipeline?
° Suppose we execute 100 instructions
° Single Cycle Machine• 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
° Multicycle Machine• 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
° Ideal pipelined machine• 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
ECE 232 L18.Pipeline.11 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Why Pipeline? Because the resources are there!
Time (clock cycles)
Instr.
Order
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
AL
UIm Reg Dm Reg
ECE 232 L18.Pipeline.12 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Can pipelining get us into trouble?
° Yes: Pipeline Hazards• structural hazards: attempt to use the same resource two different
ways at the same time
- E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
• data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
- instruction depends on result of prior instruction still in the pipeline
• control hazards: attempt to make a decision before condition is evaulated
- E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
- branch instructions
° Can always resolve hazards by waiting• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
ECE 232 L18.Pipeline.13 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Single Memory is a Structural Hazard
Time (clock cycles)
Mem
Instr.
Order
Load
Instr 1
Instr 2
Instr 3
Instr 4A
LUMem Reg Mem Reg
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
AL
UReg Mem Reg
AL
UMem Reg Mem Reg
Detection is easy in this case! (right half highlight means read, left half write)
ECE 232 L18.Pipeline.14 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
° Stall: wait until decision is clear• It is possible to move up decision to 2nd stage by adding hardware
to check registers as being read
° Impact: 2 clock cycles per branch instruction => slow
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem RegMem
ECE 232 L18.Pipeline.15 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
° Predict: guess one direction then back up if wrong• Predict not taken
° Impact: 1 clock cycles per branch instruction if right, 2 if wrong (right 50% of time)
° More dynamic scheme: history of 1 branch ( 90%)
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Load
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
ECE 232 L18.Pipeline.16 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
° Redefine branch behavior (takes place after next instruction)
“delayed branch”
° Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time)
° As launch more instruction per clock cycle, less useful
Control Hazard Solutions
Instr.
Order
Time (clock cycles)
Add
Beq
Misc
AL
UMem Reg Mem Reg
AL
UMem Reg Mem Reg
MemA
LUReg Mem Reg
Load Mem
AL
UReg Mem Reg
ECE 232 L18.Pipeline.17 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
• Dependencies backwards in time are hazards
Data Hazard on r1:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
IF
ID/RF
EX MEM WB
AL
UIm Reg Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
ECE 232 L18.Pipeline.18 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
• “Forward” result from one stage to another
• “or” OK if define read/write properly
Data Hazard Solution:
Instr.
Order
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
AL
UIm Reg
IF
ID/RF
EX MEM WB
Dm Reg
AL
UIm Reg Dm RegA
LUIm Reg Dm Reg
Im
AL
UReg Dm Reg
AL
UIm Reg Dm Reg
ECE 232 L18.Pipeline.19 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
• Dependencies backwards in time are hazards
• Can’t solve with forwarding: • Must delay/stall instruction dependent on loads
Forwarding (or Bypassing): What about Loads
Time (clock cycles)
IF
ID/RF
EX MEM WB
lw r1,0(r2)
sub r4,r1,r3
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
ECE 232 L18.Pipeline.20 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Designing a Pipelined Processor
° Go back and examine your datapath and control diagrams
° Associated resources with states
° Ensure that flows do not conflict, or figure out how to resolve
° Assert control in appropriate stage
ECE 232 L18.Pipeline.21 Adapted from Patterson 97 ©UCB Copyright 1998 Morgan Kaufmann Publishers
Summary
° Microprogramming is a fundamental concept• implement an instruction set by building a very simple
processor and interpreting the instructions
• essential for very complex instructions and when few register transfers are possible
° Pipelining is a fundamental concept• multiple steps using distinct resources
° Utilize capabilities of the Datapath by pipelined instruction processing
• start next instruction while working on the current one
• limited by length of longest stage (plus fill/flush)
• detect and resolve hazards
Recommended