15
CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs152 3/20/2008 CS152-Spring!08 2 Last time in Lecture 13 Register renaming removes WAR, WAW hazards by giving a new internal destination register for every new result Pipeline is structured with in-order fetch/decode/rename, followed by out-of-order execution/complete, followed by in-order commit At commit time, can detect exceptions and roll back buffer to provide precise interrupts

CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

CS 152 Computer Architecture and

Engineering

Lecture 14 - Branch Prediction

Krste AsanovicElectrical Engineering and Computer Sciences

University of California at Berkeley

http://www.eecs.berkeley.edu/~krste

http://inst.eecs.berkeley.edu/~cs152

3/20/2008 CS152-Spring!08 2

Last time in Lecture 13

• Register renaming removes WAR, WAW hazards bygiving a new internal destination register for everynew result

• Pipeline is structured with in-orderfetch/decode/rename, followed by out-of-orderexecution/complete, followed by in-order commit

• At commit time, can detect exceptions and roll backbuffer to provide precise interrupts

Page 2: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 3

Recap: Overall Pipeline Structure

• Instructions fetched and decoded into instruction reorder buffer in-order• Execution is out-of-order ( ! out-of-order completion)

• Commit (write-back to architectural state, i.e., regfile & memory) is in-order

Temporary storage needed to hold results before commit(shadow registers and store buffers)

Fetch Decode

Execute

CommitReorder Buffer

In-order In-orderOut-of-order

KillKill Kill

Exception?Inject handler PC

3/20/2008 CS152-Spring!08 4

I-cache

FetchBuffer

IssueBuffer

Func.Units

Arch.State

Execute

Decode

ResultBuffer Commit

PC

Fetch

Branchexecuted

Next fetchstarted

Modern processors may have> 10 pipeline stages betweennext PC calculation and branchresolution !

Control Flow Penalty

How much work is lost ifpipeline doesn’t followcorrect instruction flow?

~ Loop length x pipeline width

Page 3: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 5

Instruction Taken known? Target known?

J

JR

BEQZ/BNEZ

MIPS Branches and Jumps

Each instruction fetch depends on one or two piecesof information from the preceding instruction:

1) Is the preceding instruction a taken branch?

2) If so, what is the target address?

After Inst. Decode

After Inst. Decode After Inst. Decode

After Inst. Decode After Reg. Fetch

After Reg. Fetch*

*Assuming zero detect on register read

3/20/2008 CS152-Spring!08 6

Branch Penalties in Modern Pipelines

A PC Generation/Mux

P Instruction Fetch Stage 1

F Instruction Fetch Stage 2

B Branch Address Calc/Begin Decode

I Complete Decode

J Steer Instructions to Functional units

R Register File Read

E Integer Execute

Remainder of execute pipeline (+ another 6 stages)

UltraSPARC-III instruction fetch pipeline stages(in-order issue, 4-way superscalar, 750MHz, 2000)

BranchTargetAddressKnown

BranchDirection &JumpRegisterTargetKnown

Page 4: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 7

Reducing Control Flow Penalty

Software solutions• Eliminate branches - loop unrolling

Increases the run length• Reduce resolution time - instruction scheduling

Compute the branch condition as earlyas possible (of limited value)

Hardware solutions• Find something else to do - delay slots

Replaces pipeline bubbles with useful work(requires software cooperation)

• Speculate - branch predictionSpeculative execution of instructions beyondthe branch

3/20/2008 CS152-Spring!08 8

Branch Prediction

Motivation:Branch penalties limit performance of deeply pipelinedprocessors

Modern branch predictors have high accuracy(>95%) and can reduce branch penalties significantly

Required hardware support:Prediction structures:

• Branch history tables, branch target buffers, etc.

Mispredict recovery mechanisms:• Keep result computation separate from commit• Kill instructions following branch in pipeline• Restore state to state following branch

Page 5: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 9

Static Branch Prediction

Overall probability a branch is taken is ~60-70% but:

ISA can attach preferred direction semantics to branches,e.g., Motorola MC88110

bne0 (preferred taken) beq0 (not taken)

ISA can allow arbitrary choice of statically predicteddirection, e.g., HP PA-RISC, Intel IA-64 typically reported as ~80% accurate

JZ

JZbackward

90%forward

50%

3/20/2008 CS152-Spring!08 10

Dynamic Branch Predictionlearning based on past behavior

Temporal correlationThe way a branch resolves may be a goodpredictor of the way it will resolve at the nextexecution

Spatial correlationSeveral branches may resolve in a highlycorrelated manner (a preferred path ofexecution)

Page 6: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 11

• Assume 2 BP bits per instruction• Change the prediction after two consecutive mistakes!

¬takewrong

taken¬ taken

taken

taken

taken¬takeright

takeright

takewrong

¬ taken

¬ taken¬ taken

BP state:(predict take/¬take) x (last prediction right/wrong)

Branch Prediction Bits

3/20/2008 CS152-Spring!08 12

Branch History Table

4K-entry BHT, 2 bits/entry, ~80-90% correct predictions

0 0Fetch PC

Branch? Target PC

+

I-Cache

Opcode offset

Instruction

k

BHT Index

2k-entryBHT,2 bits/entry

Taken/¬Taken?

Page 7: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 13

Exploiting Spatial CorrelationYeh and Patt, 1992

History register, H, records the direction of the lastN branches executed by the processor

if (x[i] < 7) then

y += 1;

if (x[i] < 5) then

c -= 4;

If first condition false, second condition also false

3/20/2008 CS152-Spring!08 14

Two-Level Branch PredictorPentium Pro uses the result from the last two branchesto select one of the four sets of BHT bits (~95% correct)

0 0

kFetch PC

Shift inTaken/¬Takenresults of eachbranch

2-bit global branchhistory shift register

Taken/¬Taken?

Page 8: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 15

Limitations of BHTs

Only predicts branch direction. Therefore, cannot redirectfetch stream until after branch target is determined.

UltraSPARC-III fetch pipeline

Correctly

predicted

taken branchpenalty

Jump Registerpenalty

A PC Generation/Mux

P Instruction Fetch Stage 1

F Instruction Fetch Stage 2

B Branch Address Calc/Begin Decode

I Complete Decode

J Steer Instructions to Functional units

R Register File Read

E Integer Execute

Remainder of execute pipeline(+ another 6 stages)

3/20/2008 CS152-Spring!08 16

Branch Target Buffer

BP bits are stored with the predicted target address.

IF stage: If (BP=taken) then nPC=target else nPC=PC+4later: check prediction, if wrong then kill the instruction and update BTB & BPb else update BPb

IMEM

PC

Branch Target Buffer (2k entries)

k

BPbpredicted

target BP

target

Page 9: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 17

Address Collisions

What will be fetched after the instruction at 1028?BTB prediction = Correct target =

!

Assume a 128-entry BTB

BPbtarget

take236

1028 Add .....

132 Jump 100

InstructionMemory

2361032

kill PC=236 and fetch PC=1032

Is this a common occurrence?Can we avoid these bubbles?

3/20/2008 CS152-Spring!08 18

BTB is only for Control Instructions

BTB contains useful information for branch andjump instructions only

! Do not update it for other instructions

For all other instructions the next PC is PC+4 !

How to achieve this effect without decoding theinstruction?

Page 10: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 19

Branch Target Buffer (BTB)

• Keep both the branch PC and target PC in the BTB• PC+4 is fetched if match fails• Only taken branches and jumps held in BTB• Next PC determined before branch fetched and decoded

2k-entry direct-mapped BTB(can also be associative)

I-Cache PC

k

Valid

valid

Entry PC

=

match

predicted

target

target PC

3/20/2008 CS152-Spring!08 20

Consulting BTB Before Decoding

1028 Add .....

132 Jump 100

BPbtarget

take236

entry PC

132

• The match for PC=1028 fails and 1028+4 is fetched ! eliminates false predictions after ALU instructions

• BTB contains entries only for control transfer instructions! more room to store branch targets

Page 11: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 21

CS152 Administrivia

• Lab 4, branch predictor competition, due April 3

– PRIZE (TBD) for winners in both unlimited and realistic categories

• Quiz 4, Tuesday April 8

3/20/2008 CS152-Spring!08 22

Combining BTB and BHT• BTB entries are considerably more expensive than BHT, but can

redirect fetches at earlier stage in pipeline and can accelerateindirect branches (JR)

• BHT can hold many more entries and is more accurate

A PC Generation/Mux

P Instruction Fetch Stage 1

F Instruction Fetch Stage 2

B Branch Address Calc/Begin Decode

I Complete Decode

J Steer Instructions to Functional units

R Register File Read

E Integer Execute

BTB

BHTBHT in laterpipeline stagecorrects whenBTB misses apredictedtaken branch

BTB/BHT only updated after branch resolves in E stage

Page 12: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 23

Uses of Jump Register (JR)

• Switch statements (jump to address of matching case)

• Dynamic function call (jump to run-time function address)

• Subroutine returns (jump to return address)

How well does BTB work for each of these cases?

BTB works well if same case used repeatedly

BTB works well if same function usually called, (e.g., inC++ programming, when objects have same type invirtual function call)

BTB works well if usually return to the same place

! Often one function called from many distinct call sites!

3/20/2008 CS152-Spring!08 24

Subroutine Return Stack

Small structure to accelerate JR for subroutine returns,typically much more accurate than BTBs.

&fb()

&fc()

Push call address whenfunction call executed

Pop return addresswhen subroutinereturn decoded

fa() { fb(); }

fb() { fc(); }

fc() { fd(); }

&fd()k entries(typically k=8-16)

Page 13: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 25

Mispredict Recovery

In-order execution machines:

– Assume no instruction issued after branch can write-back beforebranch resolves

– Kill all instructions in pipeline behind mispredicted branch

–Multiple instructions following branch in programorder can complete before branch resolves

Out-of-order execution?

3/20/2008 CS152-Spring!08 26

In-Order Commit for Precise Exceptions

• Instructions fetched and decoded into instruction reorder buffer in-order• Execution is out-of-order ( ! out-of-order completion)

• Commit (write-back to architectural state, i.e., regfile & memory, is in-order

Temporary storage needed in ROB to hold results before commit

Fetch Decode

Execute

CommitReorder Buffer

In-order In-orderOut-of-order

KillKill Kill

Exception?Inject handler PC

Page 14: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 27

Branch Misprediction in Pipeline

Fetch Decode

Execute

CommitReorder Buffer

Kill

Kill Kill

BranchResolution

Inject correct PC

• Can have multiple unresolved branches in ROB• Can resolve branches out-of-order by killing all the instructions in ROB that follow a mispredicted branch

BranchPrediction

PC

Complete

3/20/2008 CS152-Spring!08 28

t vt vt v

Recovering ROB/Renaming Table

RegisterFile

Reorder buffer Load

UnitFU FU FU

Store Unit

< t, result >

t1

t2

.

.tn

Ins# use exec op p1 src1 p2 src2 pd dest data

Commit

Rename Table r1

t v

r2

Take snapshot of register rename table at each predictedbranch, recover earlier snapshot if branch mispredicted

Rename Snapshots

Ptr2

next to commit

Ptr1

next available

rollback

next available

Page 15: CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

3/20/2008 CS152-Spring!08 29

Speculating Both Directions

• resource requirement is proportional to the number of concurrent speculative executions

An alternative to branch prediction is to executeboth directions of a branch speculatively

• branch prediction takes less resources than speculative execution of both paths

• only half the resources engage in useful work when both directions of a branch are executed speculatively

With accurate branch prediction, it is more costeffective to dedicate all resources to the predicteddirection

3/20/2008 CS152-Spring!08 30

Acknowledgements

• These slides contain material developed andcopyright by:

– Arvind (MIT)

– Krste Asanovic (MIT/UCB)

– Joel Emer (Intel/MIT)

– James Hoe (CMU)

– John Kubiatowicz (UCB)

– David Patterson (UCB)

• MIT material derived from course 6.823

• UCB material derived from course CS252