15
1 COSC 6385 Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining Pipelining allows for overlapping the execution of instructions Limitations on the (pipelined) execution of instruction Data Dependencies Control Dependencies Minimizing the effect of the limitations can be done in Hardware Software (Compiler)

COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

1

COSC 6385

Computer Architecture

Dynamic Branch Prediction

Edgar Gabriel

Spring 2018

Pipelining

• Pipelining allows for overlapping the execution of

instructions

• Limitations on the (pipelined) execution of instruction

– Data Dependencies

– Control Dependencies

• Minimizing the effect of the limitations can be done in

– Hardware

– Software (Compiler)

Page 2: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

2

Branch prediction

• No instruction is allowed to initiate execution until all

branches preceding the instruction have completed

• How to deal with control hazards

– Stall the pipeline until we know the next fetch address

– Guess the next fetch address (branch prediction)

– Employ delayed branching (branch delay slot)

– Do something else (fine-grained multithreading)

– Eliminate control-flow instructions (predicated

execution)

– Fetch from both possible paths (multipath execution)

Slide based on a lecture by Onur Mutlu, Carnegie Mellon University

https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf

• Idea: Predict the next fetch address (to be used in the

next cycle)

• Requires three things to be predicted at fetch stage:

– Whether the fetched instruction is a branch

– (Conditional) branch direction

– Branch target address (if taken)

• In most instances, target address remains the same for a

branch instruction across multiple instances

• Branch Target Buffer: Store the target address from previous

instance and access it with the Program Counter

Slide based on a lecture by Onur Mutlu, Carnegie Mellon University

https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf

Page 3: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

3

An example

for ( i=0; i < 1000 ; i++ ) {

x[i] = x[i] + s;

}

Loop: L.D F0, 0(R1)

ADD.D F4, F0, F2

S.D F4, 0(R1)

ADD R1, R1, #8

BNE R1, R2, Loop

Dynamic branch prediction

• Algorithms using the previous execution of a branch to

predict the outcome of the next execution

• Several techniques for dynamic branch prediction

– 1bit branch prediction buffer

– 2bit branch prediction buffer

– Correlating Branch Prediction Buffer

– Tournament predictors

– Tagged hybrid predictors

Page 4: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

4

1bit Branch prediction buffer (I)

• Branch prediction buffer:

– Small memory area indexed by the lower portion of the

address of the branch instruction

– Records whether the branch was taken the last time or

not (1 bit is sufficient)

Address of branch instruction

Branch Target Buffer (BTB)

• 1 bit per entry

• 2k entries

Prediction for this branchk-bits

BTB indextag

1bit BTB entry is updated after actual outcome of the branch is known

Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf

1-bit predictor using Branch Target

Address

Address of branch instruction

k-bits

BTB indextag

tag 1-bit Branch Target Address

=?Instruction found in

BTB?

PC+4

next PC

prediction

Slide based on a lecture by Onur Mutlu, Carnegie Mellon University

https://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture11-branch-prediction-afterlecture.pdf

Page 5: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

5

1bit Branch Prediction Buffer

• Limitations

– Even for a regular loop (embedded in another large loop)

the 1bit Branch Prediction Buffer will mispredict at least

the first and the last iteration

• 1st iteration: the bit has been set by the last iteration

of the same loop to ‘not-taken’, but the branch will

be taken

• Last iteration: the bit says ‘taken’, but the branch

won’t be taken

2bit Branch Prediction Buffer

• A prediction must miss twice before the prediction is

changed

– Can be extended to n-bits

Predict taken

11

Predict taken

10

Predict not taken

01

Predict not taken

00

Taken

Taken

Not taken

Taken

Not taken

Not takenTaken

Page 6: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

6

Correlated branches

• Partial correlations:

– one branch could test for cond1, and another branch

could test for cond1 && cond2 (if cond1 is false,

then the second branch can be predicted as false)

• Multiple correlations:

– one branch tests cond1, a second tests cond2 , and

a third tests cond1 ⊕ cond2 (which can always be

predicted if the first two branches are known).

Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf

Correlated branches

• Local History

– What is the predicted outcome of Branch A given the

outcomes of previous instances of Branch A?

• Global History

– What is the predicted outcome of Branch Z given the

outcomes of (all/last n) previous branches A, B, …, X and

Y executed before Branch Z?

Slide based on a lecture by Milos Prvulovic, Georgia Institute of Technology

https://www.cc.gatech.edu/~milos/Teaching/CS6290F07/6_BranchPred.pdf

Page 7: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

7

Correlated branches

• For a (1,1) predictor: each branch has two different branch prediction buffers:

• The content of the two branch prediction buffers are determined by the branch to which they belong (local history)

• Which of the two branch prediction buffers are used is depending on the outcome of the previous branch in the application (global history)

X / Y

Predictor used in case the previous branch in the application has not been taken

Predictor used in case the previous branch in the application has been taken

Correlated branches - example

if ( d==0 )

d = 1;

if ( d==1 )

BNEZ R1, L1 !branch b1

MOV R1, #1

L1: ADD R3, R1, #-1

BNEZ R3, L2 !branch b2

L2:

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

Page 8: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

8

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT NT/NT

• the branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction ‘Not taken’ for both option (previous

branch not taken/taken)

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT NT/NT

• assuming BPB for b1 uses the ‘Not Taken’ predictor because the previous branch in the application has not been taken

→ BPB for b1 predicts that b1 will not be taken

Page 9: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

9

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT T NT/NT

→ BPB for b1 predicts that b1 will not be taken

→ b1 is taken (see table for d=2)

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT T NT/NT

T/NT

→ updating the ‘Previous branch has not been taken’ part of BPB for b1 to Taken

→ because b1 has been taken, the ‘last branch has been taken’ part of BPB b2 will be used

→ BPB b2 predicts, that b2 will not be taken

Page 10: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

10

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT T NT/NT T

T/NT NT/T

→ b2 is taken (see table for d=2)

→ updating the ‘Previous branch has been taken’ part of BPB for b2 to Taken

→ because b2 has been taken, the ‘last branch has been taken’ part of BPB b1 will be used

→ BPB b1 predicts, that b1 will not be takenInitial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

Correlated branches - example

d=? BPB b1 b1 act. BPB b2 B2 act.

2 NT/NT T NT/NT T

0 T/NT NT NT/T

→ b1 is not taken (see table for d=0) → matches prediction!

→ update of BPB b1 does not modify any entry taken

→ because b1 has not been taken, the ‘last branch has not been taken’ part of BPB b2 will be used

→ BPB b2 predicts that b2 will not be taken

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

Page 11: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

11

Correlated branches

• A (2,1) correlated branch predictor

– Uses outcome of the last 2 branches to choose from 22

different predictions

– Uses a 1 bit predictor for each of the 4 prediction buffers

A / B / C / D

Predictor used in case the previous 2 branches in the application have both not been taken (00)

Predictor used in case the previous branches have the history :second last branch not taken, last branch taken (01)

Predictor used in case the previous branches have the history: second last branch taken, last branch not taken (10)

Predictor used in case the previous 2 branches in the application have both been taken (11)

Correlated branches

• How do we know which of the four sections of our

branch predictor to use

– Need to record the behavior of all branches in the

application

• e.g. 11001100110011

Initial value of d

d==0? b1 Value of d

before b2

d==1? b2

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

2 No Taken 2 No Taken

0 Yes Not taken 1 Yes Not taken

Page 12: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

12

Global branch history

• For a (2,n) branch predictor, the outcome of last two

branches is relevant

11

110

1100

11001

110011

1100110

11001100

• 2-bit global branch history

• implemented using a 2bit shift register

1%

0%

1%

5%

6% 6%

11%

4%

6%

5%

0%

2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

nasa7 matrix300 tomcatv doducd spice fpppp gcc espresso eqntott li

Fre

qu

en

cy o

f M

isp

red

icti

on

s

4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)

Accuracy of Different Schemes

4096 Entries 2-bit BHT

Unlimited Entries 2-bit BHT

1024 Entries (2,2) BHT

0%

18%

Fre

quency o

f M

ispre

dic

tions

Slide based on a lecture by David A. Patterson, University of California, Berkley

http://www.cs.berkeley.edu/~pattrsn/252S01

Page 13: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

13

Tournament predictors

• Combine multiple prediction algorithms

• Use a predictor to predict which predictor works best

– Keeps track which predictor was the most accurate for

the last execution of a branch

– Allows to use different prediction algorithms for different

types of branches

Page 14: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

14

Gshare predictor

• Another example of a correlating branch predictor

• Combines branch history and branch address to

determine index in a branch predictor table

Branch addressBranch history (e.g. 10 bit)

XOR1024 2-bit predictors

prediction10

TAGE: Tagged hybrid predictor

• Uses the same algorithm but for different history length

– Some branches are detected to require long history, while

others are predicted accurately with short history

– Index calculated using a hash of branch history and branch

address (of various history length)

• Entries in history tables are uniquely identified by a tag

– Tag can be short ( 4-8 bits) since hash already had to match

– Allows to avoid ‘accidental’ matches by two different branch

instructions to the same entry

– A prediction is only used if tags match and hash match

• The prediction for a given branch is the predictor with the longest

branch history

Page 15: COSC 6385 Computer Architecture Dynamic Branch Prediction · Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 2018 Pipelining • Pipelining allows for overlapping

15

TAGE: Tagged hybrid predictor

Base

pre

dic

tor

pc

P(0) P(1)

pre

dic

tion

tag

hash hash

pc h(0:9)

=?pre

dic

tion

tag

hash hash

pc h(0:19)

=?

pre

dic

tion

tag

hash hash

pc h(0:39)

=?

pre

dic

tion

tag

hash hash

pc h(0:79)

=?1

1

1

1

1111111

1

10 10 10 10 8888