CS 230: Computer Organization and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering, School of Computing and Informatics, Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU; Prof. Mary Jane Irwin, PSU; Ande Carle, UCB

Announcements
• Alternate Project
  – Submit Nov 24
• Quiz 5
  – Thursday, Nov 19, 2009
  – Pipelining
• Finals
  – Tuesday, Dec 08, 2009
  – Please come on time (you’ll need all the time)
  – Open book, notes, and internet
  – No communication with any other human

Benefits of Pipelining
• Pipeline latches: pass the status and result of the current instruction to the next stage
• Comparison:
[Figure: clock cycles 1 to 10. Single-cycle implementation: lw goes through Ifetch, Dec/Reg, Exec, Mem, Wr before sw begins its Ifetch, Dec/Reg, Exec, Mem. Pipelined implementation: successive instructions each go through Ifetch, Dec/Reg, Exec, Mem, Wr, overlapped in time.]

Branch Hazards
• So far, we’ve limited discussion of hazards to:
  – Arithmetic/logic operations
  – Data transfers
• Also need to consider hazards involving branches:
  – Example:
    • 40: beq $1, $3, 28
    • 44: and $12, $2, $5
    • 48: or $13, $6, $2
    • 52: add $14, $2, $2
    • 72: lw $4, 50($7)
• How long will it take before the branch decision takes effect?
  – What happens in the meantime?

Branch signal determined in MEM stage
[Figure: pipelined datapath with PC, Instruction Memory, Registers, ALU, Data Memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers. Control signals shown: RegWrite, ALUSrc, Branch, MemWrite, MemRead, MemtoReg, RegDst, ALUOp, PCSrc. Instruction fields shown: Inst[15-0], Inst[20-16], Inst[15-11].]

Pipeline impact on branch
• If branch condition true, must skip 44, 48, 52
  – But these have already started down the pipeline
  – They will complete unless we do something about it
• How do we deal with this?
  – We’ll consider 2 possibilities
[Figure: pipeline diagram over clock cycles CC 1 to CC 9 for 40 beq $1, $3, 28; 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2; 72 lw $4, 50($7), each passing through IM, Reg, DM, Reg. PC changed during Mem cycle of beq.]
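To make "these have already started down the pipeline" concrete, here is a minimal Python sketch. It assumes an ideal 5-stage pipeline with one instruction fetched per cycle and no stalls; the function and variable names are illustrative, not from the slides.

```python
# Sketch: which instructions are already in flight when beq resolves?
# Assumes an ideal 5-stage pipeline, one fetch per cycle, no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Addresses and instructions from the slide's example (branch first).
PROGRAM = [
    (40, "beq $1, $3, 28"),
    (44, "and $12, $2, $5"),
    (48, "or $13, $6, $2"),
    (52, "add $14, $2, $2"),
]

def stage_in_cycle(issue_index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction fetched in cycle issue_index + 1."""
    pos = cycle - 1 - issue_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def in_flight_when_resolved(resolve_stage="MEM"):
    """Instructions fetched after the branch by the cycle in which it reaches resolve_stage."""
    resolve_cycle = STAGES.index(resolve_stage) + 1  # the branch is fetched in cycle 1
    result = []
    for i, (addr, text) in enumerate(PROGRAM):
        stage = stage_in_cycle(i, resolve_cycle)
        if i > 0 and stage is not None:
            result.append((addr, text, stage))
    return result

for addr, text, stage in in_flight_when_resolved("MEM"):
    print(f"{addr}: {text:<18} is in {stage}")
```

With the branch resolved in MEM, three younger instructions (44, 48, 52) are in the pipe; calling `in_flight_when_resolved("ID")` shows only one would be.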

Dealing w/branch hazards: always stall
• Branch taken
  – Wait 3 cycles
  – No proper instructions in the pipeline
  – Same delay as without stalls (no time lost)
[Figure: pipeline diagram over clock cycles CC 1 to CC 12. 40 beq $1, $3, 28 is followed by three stall cycles (bubbles), then 72 lw $4, 50($7) passes through IM, Reg, DM, Reg.]

Dealing w/branch hazards: always stall
• Branch not taken
  – Still must wait 3 cycles
  – Time lost
  – Could have spent cycles fetching and decoding next instructions
[Figure: pipeline diagram over clock cycles CC 1 to CC 12. 40 beq $1, $3, 28 is followed by three stall cycles (bubbles), then 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2 pass through IM, Reg, DM, Reg.]

Assume branch not taken
• On average, branches are taken ½ the time
  – If branch not taken…
    • Continue normal processing
  – Else, if branch is taken…
    • Need to flush improper instructions from pipeline
• Cuts overall time for branch processing in ½
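The "cuts time in ½" claim can be checked with a short expected-value sketch. It assumes the 3-cycle wait from the always-stall slides and the 50% taken rate stated above; the function name and policy strings are illustrative only.

```python
def expected_branch_penalty(policy, taken_fraction=0.5, penalty_cycles=3):
    """Average extra cycles per branch under a given policy (illustrative model)."""
    if policy == "always_stall":
        # Every branch waits, taken or not.
        return float(penalty_cycles)
    if policy == "assume_not_taken":
        # Only taken branches pay: their wrongly fetched instructions are flushed.
        return taken_fraction * penalty_cycles
    raise ValueError(f"unknown policy: {policy}")

stall = expected_branch_penalty("always_stall")
guess = expected_branch_penalty("assume_not_taken")
print(stall, guess, guess / stall)
```

Under these assumptions the average penalty drops from 3 cycles to 1.5 cycles per branch.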

Flushing unwanted instructions from pipeline
• Useful to compare w/stalling pipeline:
  – Simple stall: inject bubble into pipe at ID stage only
    • Change control to 0 in the ID stage
    • Let “bubbles” percolate to the right
  – Flushing pipe: must change inst. in IF, ID, and EX
    • IF Stage:
      – Zero instruction field of IF/ID pipeline register
      – Use new control signal IF.Flush
    • ID Stage:
      – Use existing “bubble injection” mux that zeros control for stalls
      – Signal ID.Flush is ORed w/stall signal from hazard detection unit
    • EX Stage:
      – Add new muxes to zero EX pipeline register control lines
      – Both muxes controlled by single EX.Flush signal
• Control determines when to flush:
  – Depends on Opcode and value of branch condition
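The three flush actions can be sketched as one function that computes what gets latched into the pipeline registers at the end of a cycle. This is a behavioral model only, not the hardware: the function name, argument names, and the reduced set of control signals are assumptions made for illustration.

```python
NOP = 0  # an all-zero instruction word is a MIPS nop (sll $0, $0, 0)

def zero_control():
    """Control bundle with every write/branch signal deasserted (a bubble)."""
    return {"RegWrite": 0, "MemRead": 0, "MemWrite": 0, "Branch": 0}

def next_pipeline_regs(fetched_inst, decoded_ctrl, ex_ctrl,
                       stall=False, if_flush=False, id_flush=False, ex_flush=False):
    """Return (IF/ID instruction, ID/EX control, EX/MEM control) to latch this cycle."""
    # IF stage: IF.Flush zeros the instruction field of the IF/ID register.
    if_id_inst = NOP if if_flush else fetched_inst
    # ID stage: ID.Flush is ORed with the stall signal and drives the
    # existing bubble-injection mux.
    id_ex_ctrl = zero_control() if (id_flush or stall) else decoded_ctrl
    # EX stage: EX.Flush zeros the control lines headed to later stages.
    ex_mem_ctrl = zero_control() if ex_flush else ex_ctrl
    return if_id_inst, id_ex_ctrl, ex_mem_ctrl

# Taken branch detected: flush all three younger instructions.
live = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0, "Branch": 0}
print(next_pipeline_regs(0x00A62824, live, live,
                         if_flush=True, id_flush=True, ex_flush=True))
```

A plain stall touches only the ID/EX control, which matches the "inject bubble at ID stage only" bullet.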

Flushing Pipeline
[Figure: pipeline with PC and the IF/ID, ID/EX, EX/MEM, MEM/WB registers. The Control unit and Hazard Detection Unit drive muxes that select 0 for the control lines. The Branch Decision generates IF.Flush, ID.Flush, and EX.Flush to flush the pipeline.]

Assume “branch not taken”… and branch is not taken…
• Execution proceeds normally: no penalty
[Figure: pipeline diagram over clock cycles CC 1 to CC 9 for 40 beq $1, $3, 28; 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2, each passing through IM, Reg, DM, Reg with no bubbles.]

Assume “branch not taken”… and branch is taken…
• Bubbles injected into 3 stages during cycle 5
[Figure: pipeline diagram over clock cycles CC 1 to CC 9. 40 beq $1, $3, 28 completes normally; 44 and, 48 or, and 52 add are turned into bubbles for their remaining stages; 72 lw $4, 50($7) then passes through IM, Reg, DM, Reg.]

Reservation Table Picture
• Another way of looking at it…
  40: beq $1, $3, 72
  44: and $12, $2, $5
  48: or $13, $6, $2
  52: add $14, $2, $2
  …
  72: lw $4, 50($7)

Assume Branch Not Taken and Correct: no penalty
Cycle  1    2    3    4    5    6    7    8    9
IF     Beq  And  Or   Add  56
ID          Beq  And  Or   Add  56
EX               Beq  And  Or   Add  56
Mem                   Beq  And  Or   Add  56
WB                         Beq  And  Or   Add  56

Assume Branch Not Taken and NOT Correct: 3 cycle penalty
Cycle  1    2    3    4    5    6    7    8    9
IF     Beq  And  Or   Add  Lw
ID          Beq  And  Or   Add  Lw
EX               Beq  And  Or   Add  Lw
Mem                   Beq  ---  ---  ---  Lw
WB                         Beq  ---  ---  ---  Lw

(FYI: branch frequency is ~20%, and the 3 cycle penalty applies ~50% of the time)

Branch Penalty Impact
• Assume 16% of all instructions are branches
  – 4% unconditional branches: 3 cycle penalty
  – 12% conditional: 50% taken
• For a sequence of N instructions (assume N is large)
  – N cycles to initiate each
  – 3 * 0.04 * N delays due to unconditional branches
  – 0.5 * 3 * 0.12 * N delays due to conditional taken
  – Also, an extra 4 cycles for pipeline to empty
• Total:
  – 1.3*N + 4 total cycles (or 1.3 cycles/instruction (CPI))
• 30% Performance Hit!!! (Bad thing)
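The arithmetic on this slide can be reproduced with a few lines of Python. The function name and parameter names are illustrative; the default values are the slide's assumptions.

```python
def total_cycles(n, uncond_frac=0.04, cond_frac=0.12,
                 taken_frac=0.5, penalty=3, drain=4):
    """Cycles to run n instructions with the slide's branch assumptions."""
    issue = n                                       # one cycle to initiate each
    uncond = penalty * uncond_frac * n              # every unconditional branch pays
    cond = taken_frac * penalty * cond_frac * n     # only taken conditionals pay
    return issue + uncond + cond + drain            # plus cycles to empty the pipe

for n in (100, 10_000, 1_000_000):
    cycles = total_cycles(n)
    print(n, round(cycles, 2), round(cycles / n, 4))
```

As N grows, the 4 drain cycles become negligible and the cycles-per-instruction figure approaches 1.3.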

Branch Penalty Impact
• Some solutions:
  – In ISA: branches always execute next 1 or 2 instructions
    • Instruction so executed said to be in delay slot
    • See SPARC ISA
    • (example: loop counter update)
  – In organization: move comparator to ID stage and decide in the ID stage
    • Reduces branch delay by 2 cycles
    • Increases the cycle time
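As a rough sketch of what deciding in ID buys, the CPI formula from the branch mix given earlier (4% unconditional, 12% conditional, 50% taken) can be re-evaluated with a 1-cycle penalty instead of 3. This assumes both kinds of branch see the shorter penalty and it ignores the longer cycle time noted above, so it is an upper bound on the benefit. The function name is illustrative.

```python
def branch_cpi(penalty, uncond_frac=0.04, cond_frac=0.12, taken_frac=0.5):
    """Steady-state CPI when only taken/unconditional branches pay `penalty` cycles."""
    return 1 + penalty * (uncond_frac + taken_frac * cond_frac)

decide_in_mem = branch_cpi(penalty=3)  # branch resolved in MEM
decide_in_id = branch_cpi(penalty=1)   # comparator moved to ID: 2 fewer cycles
print(round(decide_in_mem, 3), round(decide_in_id, 3))
```

Whether this is a net win depends on how much the extra comparator in ID stretches the clock period, which this sketch does not model.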

Branch Prediction
• Prior solutions are “ugly”
• Better (& more common): guess in IF stage
  – Technique is called “branch predicting”; needs 2 parts:
    • “Predictor” to guess if the instruction will branch (and to where)
    • “Recovery Mechanism”: i.e. a way to fix your mistake
  – Prior strategy:
    • Predictor: always guess branch never taken
    • Recovery: flush instructions if branch taken
  – Alternative: accumulate info. in IF stage as to…
    • Whether or not, for any particular PC value, a branch was taken next
    • To where it is taken
    • How to update with information from later stages
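
A Branch Predictor
[Figure: the PC feeds Instruction Memory and the Branch Prediction Logic. The logic outputs "Guess Branch" and a guess as to where to branch, which are selected against the Normal PC value; it is updated with Branch Update Information.]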

Branch History Table
[Figure: the PC indexes both Instruction Memory and the Branch History Table. The table outputs a Branch Prediction and a Predicted PC Value, selected against the Normal PC value.]
• Given a PC, look up an entry in the table. Each table entry has two fields:
  – 1 bit Branch Prediction
  – New PC value
• BHT updated by Mem stage when each real branch is resolved
• Questions:
  – How to keep BHT from being too big
  – How to generate prediction
• Answer to BHT size question: use only bottom N bits (e.g., N=8) of PC
  – This means that multiple instructions will “share” the same entry, causing potential mistakes
• Branch Prediction Accuracy: how often is our prediction correct
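The table lookup and the entry-sharing problem can be sketched in Python. This is a software model under assumptions: the class and method names are made up for illustration, the table is a dict rather than a fixed array, and an empty entry is treated as "predict not taken".

```python
class BranchHistoryTable:
    """Toy BHT indexed by the bottom N bits of the PC (illustrative model)."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.entries = {}  # index -> (predict_taken, new_pc)

    def index(self, pc):
        # Only the bottom N bits of the PC select the entry.
        return pc & self.mask

    def predict(self, pc):
        """Return the predicted next PC for the instruction at pc."""
        taken, new_pc = self.entries.get(self.index(pc), (False, None))
        return new_pc if taken else pc + 4  # pc + 4 is the normal PC value

    def update(self, pc, taken, target):
        """Called when the real branch outcome is known (Mem stage in the slides)."""
        self.entries[self.index(pc)] = (taken, target)

bht = BranchHistoryTable(index_bits=8)
bht.update(40, taken=True, target=72)   # beq at 40 was taken to 72
print(bht.predict(40))                  # uses the recorded prediction
print(bht.predict(40 + 256))            # different instruction, same entry
```

The second lookup is for an unrelated instruction at address 296, yet it inherits the prediction for address 40 because both map to the same entry. That is the "potential mistakes" cost of keeping the table small.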

Branch Prediction Information
• One bit predictor:
  – Use result from last time we saw this instruction
• Problem:
  – Even if branch is almost always taken, we will be wrong at least twice
    • 1st time we see the instruction
    • 1st time the branch is not taken
    • Also, 1st time branch is taken again after that
    • And if branch alternates b/t taken, not taken…
      – We get 0% accuracy
• Can we do better? Yep.
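A short simulation shows both failure cases. It is a sketch under assumptions: the predictor starts out predicting "not taken", and the branch histories are made-up examples (a loop branch taken 9 times then falling through, and a strictly alternating branch).

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Fraction of correct predictions for a last-outcome (1-bit) predictor."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember only the most recent outcome
    return correct / len(outcomes)

# Hypothetical loop branch: taken 9 times, then not taken, run 3 times over.
mostly_taken = ([True] * 9 + [False]) * 3
# Hypothetical branch that alternates taken / not taken.
alternating = [True, False] * 10

print(one_bit_accuracy(mostly_taken))
print(one_bit_accuracy(alternating))
```

For the loop branch, each exit costs two mispredictions (the fall-through and the first iteration of the next run), giving 80% on this trace; for the alternating branch every prediction is wrong.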

Branch Prediction Information
• How to do better?
  – Keep a “counter” in each entry of the number of times taken in the last N times executed
  – Keep information about the “pattern” of previous branches
• Book’s scheme: a “2-bit saturating counter”
  – Increment when branch is taken
  – Decrement when branch is not taken
  – Don’t increment or decrement above or below a max/min count
  – Use sign of count as predictor
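The counter rules above can be simulated directly. This sketch assumes the count runs from 0 to 3, starts at 0, and that "sign of count" means predicting taken when the count is in the upper half (2 or 3); those encodings are assumptions, and the branch histories are made-up examples.

```python
def two_bit_accuracy(outcomes, counter=0):
    """Fraction of correct predictions for a 2-bit saturating counter (range 0..3)."""
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2          # upper half of the range predicts taken
        if prediction == taken:
            correct += 1
        if taken:
            counter = min(counter + 1, 3)  # saturate at the max count
        else:
            counter = max(counter - 1, 0)  # saturate at the min count
    return correct / len(outcomes)

# Hypothetical loop branch: taken 9 times, then not taken, run 3 times over.
mostly_taken = ([True] * 9 + [False]) * 3
# Hypothetical branch that alternates taken / not taken.
alternating = [True, False] * 10

print(two_bit_accuracy(mostly_taken))
print(two_bit_accuracy(alternating))
```

Once warmed up, the loop branch is mispredicted only once per loop exit, because a single not-taken outcome moves the count from 3 to 2 and the prediction stays "taken".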

Book’s 2 Bit Branch Counter
[Figure: state diagram with four states, two labeled "Predict Taken" and two labeled "Predict Not Taken", connected by transitions labeled "Actually Taken" and "Actually Not Taken".]
As soon as (and only when) we have two mispredictions in a row do we change our prediction.

Computing Performance
• Program assumptions:
  – 23% loads, and in ½ of cases the next instruction uses the load value
  – 13% stores
  – 19% conditional branches
  – 2% unconditional branches
  – 43% other
• Machine assumptions:
  – 5 stage pipe with all forwarding
    • Only penalty is 1 cycle on use of load value immediately after a load
    • Jumps are totally resolved in ID stage for a 1 cycle branch penalty
    • 75% branch prediction accuracy
    • 1 cycle delay on misprediction

The Answer:
• CPI penalty calculation:
  – Loads:
    • 50% of the 23% of loads have 1 cycle penalty: 0.5*0.23 = 0.115
  – Jumps:
    • All of the 2% of jumps have 1 cycle penalty: 0.02*1 = 0.02
  – Conditional Branches:
    • 25% of the 19% are mispredicted for a 1 cycle penalty: 0.25*0.19*1 = 0.0475
• Total Penalty: 0.115 + 0.02 + 0.0475 = 0.1825
• Average CPI: 1 + 0.1825 = 1.1825
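The same calculation as a small Python sketch, so the assumptions can be varied. The function and parameter names are illustrative; the default values are the program and machine assumptions stated for this exercise.

```python
def average_cpi(load_frac=0.23, load_use_frac=0.5,
                jump_frac=0.02, cond_frac=0.19,
                prediction_accuracy=0.75, penalty=1):
    """Base CPI of 1 plus the per-instruction stall contributions."""
    load_penalty = load_use_frac * load_frac * penalty            # load-use stalls
    jump_penalty = jump_frac * penalty                            # every jump pays
    branch_penalty = (1 - prediction_accuracy) * cond_frac * penalty  # mispredictions
    return 1 + load_penalty + jump_penalty + branch_penalty

print(round(average_cpi(), 4))
print(round(average_cpi(prediction_accuracy=0.9), 4))
```

The second call is a what-if with a hypothetical 90% accurate predictor; it only changes the conditional-branch term.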
Yoda says…
Death is a natural part of life. Rejoice for those around you who transform into the Force. Mourn them do not. Miss them do not