Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
1
COSC 6385
Computer Architecture
Dynamic Branch Prediction
Edgar Gabriel
Spring 2018
Pipelining
• Pipelining allows for overlapping the execution of
instructions
• Limitations on the (pipelined) execution of instruction
– Data Dependencies
– Control Dependencies
• Minimizing the effect of the limitations can be done in
– Hardware
– Software (Compiler)
2
Branch prediction
• No instruction is allowed to initiate execution until all
branches preceding the instruction have completed
• How to deal with control hazards
– Stall the pipeline until we know the next fetch address
– Guess the next fetch address (branch prediction)
– Employ delayed branching (branch delay slot)
– Do something else (fine-grained multithreading)
– Eliminate control-flow instructions (predicated
execution)
– Fetch from both possible paths (multipath execution)
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University
https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf
• Idea: Predict the next fetch address (to be used in the
next cycle)
• Requires three things to be predicted at fetch stage:
– Whether the fetched instruction is a branch
– (Conditional) branch direction
– Branch target address (if taken)
• In most instances, target address remains the same for a
branch instruction across multiple instances
• Branch Target Buffer: Store the target address from previous
instance and access it with the Program Counter
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University
https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf
3
An example
for ( i=0; i < 1000 ; i++ ) {
x[i] = x[i] + s;
}
Loop: L.D F0, 0(R1)
ADD.D F4, F0, F2
S.D F4, 0(R1)
ADD R1, R1, #8
BNE R1, R2, Loop
Dynamic branch prediction
• Algorithms using the previous execution of a branch to
predict the outcome of the next execution
• Several techniques for dynamic branch prediction
– 1bit branch prediction buffer
– 2bit branch prediction buffer
– Correlating Branch Prediction Buffer
– Tournament predictors
– Tagged hybrid predictors
4
1bit Branch prediction buffer (I)
• Branch prediction buffer:
– Small memory area indexed by the lower portion of the
address of the branch instruction
– Records whether the branch was taken the last time or
not (1 bit is sufficient)
Address of branch instruction
Branch Target Buffer (BTB)
• 1 bit per entry
• 2k entries
Prediction for this branchk-bits
BTB indextag
1bit BTB entry is updated after actual outcome of the branch is known
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology
https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf
1-bit predictor using Branch Target
Address
Address of branch instruction
k-bits
BTB indextag
tag 1-bit Branch Target Address
=?Instruction found in
BTB?
PC+4
next PC
prediction
Slide based on a lecture by Onur Mutlu, Carnegie Mellon University
https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf
5
1bit Branch Prediction Buffer
• Limitations
– Even for a regular loop (embedded in another large loop)
the 1bit Branch Prediction Buffer will mispredict at least
the first and the last iteration
• 1st iteration: the bit has been set by the last iteration
of the same loop to ‘not-taken’, but the branch will
be taken
• Last iteration: the bit says ‘taken’, but the branch
won’t be taken
2bit Branch Prediction Buffer
• A prediction must miss twice before the prediction is
changed
– Can be extended to n-bits
Predict taken
11
Predict taken
10
Predict not taken
01
Predict not taken
00
Taken
Taken
Not taken
Taken
Not taken
Not takenTaken
6
Correlated branches
• Partial correlations:
– one branch could test for cond1, and another branch
could test for cond1 && cond2 (if cond1 is false,
then the second branch can be predicted as false)
• Multiple correlations:
– one branch tests cond1, a second tests cond2 , and
a third tests cond1 ⊕ cond2 (which can always be
predicted if the first two branches are known).
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology
https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf
Correlated branches
• Local History
– What is the predicted outcome of Branch A given the
outcomes of previous instances of Branch A?
• Global History
– What is the predicted outcome of Branch Z given the
outcomes of (all/last n) previous branches A, B, …, X and
Y executed before Branch Z?
Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology
https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf
7
Correlated branches
• For a (1,1) predictor: each branch has two different branch prediction buffers:
• The content of the two branch prediction buffers are determined by the branch to which they belong (local history)
• Which of the two branch prediction buffers are used is depending on the outcome of the previous branch in the application (global history)
X / Y
Predictor used in case the previous branch in the application has not been taken
Predictor used in case the previous branch in the application has been taken
Correlated branches - example
if ( d==0 )
d = 1;
if ( d==1 )
…
BNEZ R1, L1 !branch b1
MOV R1, #1
L1: ADD R3, R1, #-1
BNEZ R3, L2 !branch b2
…
L2:
Initial value of d
d==0? b1 Value of d
before b2
d==1? b2
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
8
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT NT/NT
• the branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction ‘Not taken’ for both option (previous
branch not taken/taken)
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT NT/NT
• assuming BPB for b1 uses the ‘Not Taken’ predictor because the previous branch in the application has not been taken
→ BPB for b1 predicts that b1 will not be taken
9
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT T NT/NT
→ BPB for b1 predicts that b1 will not be taken
→ b1 is taken (see table for d=2)
Initial value of d
d==0? b1 Value of d
before b2
d==1? b2
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT T NT/NT
T/NT
→ updating the ‘Previous branch has not been taken’ part of BPB for b1 to Taken
→ because b1 has been taken, the ‘last branch has been taken’ part of BPB b2 will be used
→ BPB b2 predicts, that b2 will not be taken
10
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT T NT/NT T
T/NT NT/T
→ b2 is taken (see table for d=2)
→ updating the ‘Previous branch has been taken’ part of BPB for b2 to Taken
→ because b2 has been taken, the ‘last branch has been taken’ part of BPB b1 will be used
→ BPB b1 predicts, that b1 will not be takenInitial value of d
d==0? b1 Value of d
before b2
d==1? b2
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
Correlated branches - example
d=? BPB b1 b1 act. BPB b2 B2 act.
2 NT/NT T NT/NT T
0 T/NT NT NT/T
→ b1 is not taken (see table for d=0) → matches prediction!
→ update of BPB b1 does not modify any entry taken
→ because b1 has not been taken, the ‘last branch has not been taken’ part of BPB b2 will be used
→ BPB b2 predicts that b2 will not be taken
Initial value of d
d==0? b1 Value of d
before b2
d==1? b2
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
11
Correlated branches
• A (2,1) correlated branch predictor
– Uses outcome of the last 2 branches to choose from 22
different predictions
– Uses a 1 bit predictor for each of the 4 prediction buffers
A / B / C / D
Predictor used in case the previous 2 branches in the application have both not been taken (00)
Predictor used in case the previous branches have the history :second last branch not taken, last branch taken (01)
Predictor used in case the previous branches have the history: second last branch taken, last branch not taken (10)
Predictor used in case the previous 2 branches in the application have both been taken (11)
Correlated branches
• How do we know which of the four sections of our
branch predictor to use
– Need to record the behavior of all branches in the
application
• e.g. 11001100110011
Initial value of d
d==0? b1 Value of d
before b2
d==1? b2
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
2 No Taken 2 No Taken
0 Yes Not taken 1 Yes Not taken
12
Global branch history
• For a (2,n) branch predictor, the outcome of last two
branches is relevant
11
110
1100
11001
110011
1100110
11001100
• 2-bit global branch history
• implemented using a 2bit shift register
1%
0%
1%
5%
6% 6%
11%
4%
6%
5%
0%
2%
4%
6%
8%
10%
12%
14%
16%
18%
20%
nasa7 matrix300 tomcatv doducd spice fpppp gcc espresso eqntott li
Fre
qu
en
cy o
f M
isp
red
icti
on
s
4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)
Accuracy of Different Schemes
4096 Entries 2-bit BHT
Unlimited Entries 2-bit BHT
1024 Entries (2,2) BHT
0%
18%
Fre
quency o
f M
ispre
dic
tions
Slide based on a lecture by David A. Patterson, University of California, Berkley
http://www.cs.berkeley.edu/~pattrsn/252S01
13
Tournament predictors
• Combine multiple prediction algorithms
• Use a predictor to predict which predictor works best
– Keeps track which predictor was the most accurate for
the last execution of a branch
– Allows to use different prediction algorithms for different
types of branches
14
Gshare predictor
• Another example of a correlating branch predictor
• Combines branch history and branch address to
determine index in a branch predictor table
Branch addressBranch history (e.g. 10 bit)
XOR1024 2-bit predictors
prediction10
TAGE: Tagged hybrid predictor
• Uses the same algorithm but for different history length
– Some branches are detected to require long history, while
others are predicted accurately with short history
– Index calculated using a hash of branch history and branch
address (of various history length)
• Entries in history tables are uniquely identified by a tag
– Tag can be short ( 4-8 bits) since hash already had to match
– Allows to avoid ‘accidental’ matches by two different branch
instructions to the same entry
– A prediction is only used if tags match and hash match
• The prediction for a given branch is the predictor with the longest
branch history
15
TAGE: Tagged hybrid predictor
Base
pre
dic
tor
pc
P(0) P(1)
pre
dic
tion
tag
hash hash
pc h(0:9)
=?pre
dic
tion
tag
hash hash
pc h(0:19)
=?
pre
dic
tion
tag
hash hash
pc h(0:39)
=?
pre
dic
tion
tag
hash hash
pc h(0:79)
=?1
1
1
1
1111111
1
10 10 10 10 8888