COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

1

COSC 6385

Computer Architecture

Dynamic Branch Prediction

Edgar Gabriel

Spring 2018

Pipelining

• Pipelining allows for overlapping the execution of

instructions

• Limitations on the (pipelined) execution of instruction

– Data Dependencies

– Control Dependencies

• Minimizing the effect of the limitations can be done in

– Hardware

– Software (Compiler)

2

Branch prediction

• No instruction is allowed to initiate execution until all

branches preceding the instruction have completed

• How to deal with control hazards

– Stall the pipeline until we know the next fetch address

– Guess the next fetch address (branch prediction)

– Employ delayed branching (branch delay slot)

– Do something else (fine-grained multithreading)

– Eliminate control-flow instructions (predicated

execution)

– Fetch from both possible paths (multipath execution)

Slide based on a lecture by Onur Mutlu, Carnegie Mellon University

https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf

• Idea: Predict the next fetch address (to be used in the

next cycle)

• Requires three things to be predicted at fetch stage:

– Whether the fetched instruction is a branch

– (Conditional) branch direction

– Branch target address (if taken)

• In most instances, target address remains the same for a

branch instruction across multiple instances

• Branch Target Buffer: Store the target address from previous

instance and access it with the Program Counter





3

An example

for ( i=0; i < 1000 ; i++ ) {

x[i] = x[i] + s;

}

Loop: L.D F0, 0(R1)

ADD.D F4, F0, F2

S.D F4, 0(R1)

ADD R1, R1, #8

BNE R1, R2, Loop

Dynamic branch prediction

• Algorithms using the previous execution of a branch to

predict the outcome of the next execution

• Several techniques for dynamic branch prediction

– 1bit branch prediction buffer

– 2bit branch prediction buffer

– Correlating Branch Prediction Buffer

– Tournament predictors

– Tagged hybrid predictors

4

1bit Branch prediction buffer (I)

• Branch prediction buffer:

– Small memory area indexed by the lower portion of the

address of the branch instruction

– Records whether the branch was taken the last time or

not (1 bit is sufficient)

Address of branch instruction

Branch Target Buffer (BTB)

• 1 bit per entry

• 2k entries

Prediction for this branchk-bits

BTB indextag

1bit BTB entry is updated after actual outcome of the branch is known

Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf

1-bit predictor using Branch Target

Address

Address of branch instruction

k-bits

BTB indextag

tag 1-bit Branch Target Address

=?Instruction found in

BTB?

PC+4

next PC

prediction





5

1bit Branch Prediction Buffer

• Limitations

– Even for a regular loop (embedded in another large loop)

the 1bit Branch Prediction Buffer will mispredict at least

the first and the last iteration

• 1st iteration: the bit has been set by the last iteration

of the same loop to ‘not-taken’, but the branch will

be taken

• Last iteration: the bit says ‘taken’, but the branch

won’t be taken

2bit Branch Prediction Buffer

• A prediction must miss twice before the prediction is

changed

– Can be extended to n-bits

Predict taken

11

Predict taken

10

Predict not taken

01

Predict not taken

00

Taken

Taken

Not taken

Taken

Not taken

Not takenTaken

6

Correlated branches

• Partial correlations:

– one branch could test for cond1, and another branch

could test for cond1 && cond2 (if cond1 is false,

then the second branch can be predicted as false)

• Multiple correlations:

– one branch tests cond1, a second tests cond2 , and

a third tests cond1 ⊕ cond2 (which can always be

predicted if the first two branches are known).



Correlated branches

• Local History

– What is the predicted outcome of Branch A given the

outcomes of previous instances of Branch A?

• Global History

– What is the predicted outcome of Branch Z given the

outcomes of (all/last n) previous branches A, B, …, X and

Y executed before Branch Z?





7

Correlated branches

• For a (1,1) predictor: each branch has two different branch prediction buffers:

• The content of the two branch prediction buffers are determined by the branch to which they belong (local history)

• Which of the two branch prediction buffers are used is depending on the outcome of the previous branch in the application (global history)

X / Y

Predictor used in case the previous branch in the application has not been taken

Predictor used in case the previous branch in the application has been taken

Correlated branches - example

if ( d==0 )

d = 1;

if ( d==1 )

…

BNEZ R1, L1 !branch b1

MOV R1, #1

L1: ADD R3, R1, #-1

BNEZ R3, L2 !branch b2

…

L2:

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken



8


d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT NT/NT

• the branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction ‘Not taken’ for both option (previous

branch not taken/taken)



2 NT/NT NT/NT

• assuming BPB for b1 uses the ‘Not Taken’ predictor because the previous branch in the application has not been taken

→ BPB for b1 predicts that b1 will not be taken

9



2 NT/NT T NT/NT

→ BPB for b1 predicts that b1 will not be taken

→ b1 is taken (see table for d=2)

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2





2 NT/NT T NT/NT

T/NT

→ updating the ‘Previous branch has not been taken’ part of BPB for b1 to Taken

→ because b1 has been taken, the ‘last branch has been taken’ part of BPB b2 will be used

→ BPB b2 predicts, that b2 will not be taken

10



2 NT/NT T NT/NT T

T/NT NT/T

→ b2 is taken (see table for d=2)

→ updating the ‘Previous branch has been taken’ part of BPB for b2 to Taken

→ because b2 has been taken, the ‘last branch has been taken’ part of BPB b1 will be used

→ BPB b1 predicts, that b1 will not be takenInitial value of d

d==0? b1 Value of d

before b2

d==1? b2





2 NT/NT T NT/NT T

0 T/NT NT NT/T

→ b1 is not taken (see table for d=0) → matches prediction!

→ update of BPB b1 does not modify any entry taken

→ because b1 has not been taken, the ‘last branch has not been taken’ part of BPB b2 will be used

→ BPB b2 predicts that b2 will not be taken

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2



11

Correlated branches

• A (2,1) correlated branch predictor

– Uses outcome of the last 2 branches to choose from 22

different predictions

– Uses a 1 bit predictor for each of the 4 prediction buffers

A / B / C / D

Predictor used in case the previous 2 branches in the application have both not been taken (00)

Predictor used in case the previous branches have the history :second last branch not taken, last branch taken (01)

Predictor used in case the previous branches have the history: second last branch taken, last branch not taken (10)

Predictor used in case the previous 2 branches in the application have both been taken (11)

Correlated branches

• How do we know which of the four sections of our

branch predictor to use

– Need to record the behavior of all branches in the

application

• e.g. 11001100110011

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2





12

Global branch history

• For a (2,n) branch predictor, the outcome of last two

branches is relevant

11

110

1100

11001

110011

1100110

11001100

• 2-bit global branch history

• implemented using a 2bit shift register

1%

0%

1%

5%

6% 6%

11%

4%

6%

5%

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

nasa7 matrix300 tomcatv doducd spice fpppp gcc espresso eqntott li

Fre

qu

en

cy o

f M

isp

red

icti

on

s

4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)

Accuracy of Different Schemes

4096 Entries 2-bit BHT

Unlimited Entries 2-bit BHT

1024 Entries (2,2) BHT

0%

18%

Fre

quency o

f M

ispre

dic

tions

Slide based on a lecture by David A. Patterson, University of California, Berkley

http://www.cs.berkeley.edu/~pattrsn/252S01

http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

http://www.cs.berkeley.edu/~pattrsn/252S01

13

Tournament predictors

• Combine multiple prediction algorithms

• Use a predictor to predict which predictor works best

– Keeps track which predictor was the most accurate for

the last execution of a branch

– Allows to use different prediction algorithms for different

types of branches

14

Gshare predictor

• Another example of a correlating branch predictor

• Combines branch history and branch address to

determine index in a branch predictor table

Branch addressBranch history (e.g. 10 bit)

XOR1024 2-bit predictors

prediction10

TAGE: Tagged hybrid predictor

• Uses the same algorithm but for different history length

– Some branches are detected to require long history, while

others are predicted accurately with short history

– Index calculated using a hash of branch history and branch

address (of various history length)

• Entries in history tables are uniquely identified by a tag

– Tag can be short ( 4-8 bits) since hash already had to match

– Allows to avoid ‘accidental’ matches by two different branch

instructions to the same entry

– A prediction is only used if tags match and hash match

• The prediction for a given branch is the predictor with the longest

branch history

15

TAGE: Tagged hybrid predictor

Base

pre

dic

tor

pc

P(0) P(1)

pre

dic

tion

tag

hash hash

pc h(0:9)

=?pre

dic

tion

tag

hash hash

pc h(0:19)

=?

pre

dic

tion

tag

hash hash

pc h(0:39)

=?

pre

dic

tion

tag

hash hash

pc h(0:79)

=?1

1

1

1

1111111

1

10 10 10 10 8888

Documents

COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping