EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE A Survey on BRANCH PREDICTION METHODOLOGY By, Baris Mustafa Kazar Resit Sendag

Page 1: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

A Survey on

BRANCH PREDICTION METHODOLOGY

By,

Baris Mustafa Kazar

Resit Sendag

Page 2: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Outlines

• Introduction

• Static Branch Prediction Schemes: A Brief View

• Dynamic Branch Prediction

– One-Bit Scheme: BHT

– To provide target instructions quickly: BTBs

– Bimodal Branch Prediction (Two-Bit Prediction Scheme)

– Two-level Branch Prediction

• Global History Schemes: GAg, GAs, GAp

• Per-Address History Schemes: PAg, PAs, PAp

• Per-Set History Schemes: SAg, SAs, SAp

– Correlation Branch Prediction

– More on Global Branch Prediction: Gselect and Gshare

– Hybrid Branch Predictors

Page 3: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Outlines (cont.)

– Hybrid Branch Predictors

• Where did the idea come from?

• Branch Classification

• An alternative Selection Mechanism

– Reducing PHT Interference

• Bi-Mode Branch Predictor

• The Agree Predictor

• The Skewed Branch Predictor

• The YAGS Branch Prediction Scheme

• Concluding Remarks

Page 4: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Introduction

• Pipeline flushes due to branch mispredictions are one of the most serious problems facing the designer of a deeply pipelined, superscalar processor. Many branch predictors have been proposed to alleviate this problem.

• compile-time schemes (static scheduling)

• Focus on hardware-based prediction schemes (dynamic scheduling)

Page 5: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Static Branch Prediction Schemes: A Brief View

• Based on program behavior (i.e., branch direction) known before run time

• Can use profile information collected from previous runs

• Some schemes:

☼ Always not taken,

☼ Always Taken,

☼ Op-code Based,

☼ Backward Taken and Forward not Taken.
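
The last scheme in the list above, Backward Taken and Forward Not Taken, can be illustrated with a minimal sketch. The function name and the example addresses are assumptions made for illustration only; they do not come from the slides.

```python
def predict_btfn(branch_pc: int, target_pc: int) -> bool:
    """Static rule: predict taken for backward branches, not taken for forward ones."""
    # A backward branch jumps to a lower address (typically a loop-closing branch).
    return target_pc < branch_pc


# Hypothetical addresses: a loop branch at 0x400 jumping back to 0x3F0,
# and a forward skip from 0x400 to 0x420.
print(predict_btfn(branch_pc=0x400, target_pc=0x3F0))  # True
print(predict_btfn(branch_pc=0x400, target_pc=0x420))  # False
```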

Page 6: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Dynamic Prediction Schemes (DPS)

• One-Bit Scheme

– Branch Prediction Buffer or Branch History Table (BHT)

– indexed by lower bits of the branch instruction address

– one prediction bit per entry, recording whether the branch was taken the last time

• To provide target instructions quickly: BTBs

– Lee and Smith (1984)

– Special instruction cache designed to store the target instructions
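
The one-bit scheme above can be sketched roughly as follows. This is an illustrative model, not the hardware described by any cited paper: the table size, the 4-byte instruction alignment, the reset state, and the helper `count_misses` are all assumptions.

```python
class OneBitBHT:
    """One prediction bit per entry, indexed by low bits of the branch address."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # False = predict not taken (assumed reset state).
        self.table = [False] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[self._index(pc)] = taken  # remember only the last outcome


def count_misses(predictor, pc, outcomes):
    """Hypothetical helper: replay outcomes for one branch and count mispredictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch taken 9 times and then not taken, with the loop run 3 times.
loop = ([True] * 9 + [False]) * 3
misses = count_misses(OneBitBHT(), 0x400, loop)
print(misses)  # 6
```

The single bit mispredicts twice per loop execution: once at the loop exit and once more when the loop is entered again.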

Page 7: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

DPS (cont.)

• Two-Bit Prediction Scheme

– Bimodal (usually taken or usually not taken)

– 2-bit Saturating Counter, Smith, 1981

• Two-level Branch Predictors

– Yeh and Patt, 1991

– correlating predictors

– First-level is the history of the last k branches encountered.

– Second-level is the history of branch behavior of the last j occurrences of that unique pattern of the last k branches.

– Branch History Register and Pattern History Table (PHT)

– Run-time collection of the history

– Performs better than the schemes given previously (up to 97% prediction accuracy)
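
The 2-bit saturating counter listed above can be sketched as below. The table size, the initial counter value, and the helper `count_misses` are assumptions chosen for illustration.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        # Starting at 1 (weakly not taken) is an assumption.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_misses(predictor, pc, outcomes):
    """Hypothetical helper: replay outcomes for one branch and count mispredictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Same loop branch as a one-bit example would use: 9 taken, 1 not taken, 3 times.
loop = ([True] * 9 + [False]) * 3
misses = count_misses(BimodalPredictor(), 0x400, loop)
print(misses)  # 4
```

After warm-up the counter mispredicts only the loop exit, because a single not-taken outcome does not flip a strongly taken counter.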

Page 8: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Two-Level Branch Predictors (cont.)

• Two-level predictors are classified into 3 classes (Yeh and Patt, 1993)

• Global History Schemes (GAg, GAs, GAp)

– The first-level branch history is the actual last k branches.

– Only one global history register (GHR)

– updated with the results from all branches
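
A minimal GAg-style sketch of the global history scheme above: one global history register selects an entry in a single pattern history table of 2-bit counters. The history length and initial counter values are assumptions.

```python
class GAgPredictor:
    """One global history register (GHR) indexing one global PHT."""

    def __init__(self, history_bits: int = 4):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                          # outcomes of the last k branches
        self.pht = [1] * (1 << history_bits)  # 2-bit counters, assumed weakly not taken

    def predict(self) -> bool:
        return self.pht[self.ghr] >= 2

    def update(self, taken: bool) -> None:
        c = self.pht[self.ghr]
        self.pht[self.ghr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# An alternating taken / not-taken stream defeats a plain 2-bit counter,
# but each history pattern here is always followed by the same outcome.
p = GAgPredictor()
late_misses = 0
for i in range(100):
    taken = (i % 2 == 0)
    if i >= 50 and p.predict() != taken:
        late_misses += 1
    p.update(taken)
print(late_misses)  # 0
```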

Page 9: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Two-Level Branch Predictors (cont.)

• Per-address History Schemes

– Local Branch Prediction

– The first-level history refers to the last k occurrences of the same branch instruction

– The branch prediction is independent of other branches’ execution history
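
A minimal PAg-style sketch of the per-address scheme above: each branch has its own history register, and all history registers index one shared PHT. Table sizes, initial values, and the two example branch addresses are assumptions.

```python
class PAgPredictor:
    """Per-address history registers with a single shared PHT."""

    def __init__(self, history_bits: int = 4, bht_bits: int = 6):
        self.hist_mask = (1 << history_bits) - 1
        self.bht_mask = (1 << bht_bits) - 1
        self.bht = [0] * (1 << bht_bits)      # one history register per slot
        self.pht = [1] * (1 << history_bits)  # shared 2-bit counters

    def _slot(self, pc: int) -> int:
        return (pc >> 2) & self.bht_mask  # assumes 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.pht[self.bht[self._slot(pc)]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        s = self._slot(pc)
        h = self.bht[s]
        c = self.pht[h]
        self.pht[h] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.bht[s] = ((h << 1) | int(taken)) & self.hist_mask


# Two hypothetical branches with different periodic behavior, interleaved.
p = PAgPredictor()
late_misses = 0
for i in range(300):
    for pc, taken in ((0x100, i % 3 != 2), (0x104, i % 2 == 0)):
        if i >= 100 and p.predict(pc) != taken:
            late_misses += 1
        p.update(pc, taken)
print(late_misses)  # 0
```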

Page 10: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Two-Level Branch Predictors (cont.)

• Per-set History Schemes

– The first-level history means the last k occurrences of the branch instructions from the same sub-set.

– The set attribute

– The prediction is influenced by the other branches in the same set

Page 11: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

DPS (cont.)

• Comparison results for Two-level Branch Prediction Classes

– Comparison is based on performance and cost effectiveness

– Global history schemes perform better on integer programs

– Per-address history schemes perform better on floating-point programs

– PAs is the most cost effective among low-cost schemes

• Correlation Branch Prediction

– Pan, So, and Rahmeh, 1992

– GAp and GAs

Page 12: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

DPS (cont.)

• More on Global Branch Prediction

– Local Branch Prediction: history of each branch independently

– Global Branch Prediction: combined history of all recent branches

– Gselect: Global Predictor with Index Selection, Pan, So, Rahmeh, 1992

• PHT is indexed by the concatenation of global history and branch address bits

• performs better than either bimodal or global prediction

– Gshare: Global Predictor with Index Sharing, McFarling, 1993

• PHT is indexed by the XOR of global history and branch address
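
The difference between the two indexing schemes above can be sketched with two small functions. The bit widths, the ordering of the concatenated fields (address bits high, history bits low), and the example values are assumptions for illustration.

```python
def gselect_index(pc: int, ghr: int, addr_bits: int, hist_bits: int) -> int:
    """Concatenate low branch-address bits with low global-history bits."""
    addr = (pc >> 2) & ((1 << addr_bits) - 1)  # assumes 4-byte instructions
    hist = ghr & ((1 << hist_bits) - 1)
    return (addr << hist_bits) | hist


def gshare_index(pc: int, ghr: int, index_bits: int) -> int:
    """XOR the branch address with the global history."""
    return ((pc >> 2) ^ ghr) & ((1 << index_bits) - 1)


# Hypothetical branch at 0x2C (word address 0b1011) with history 0b0110.
print(gselect_index(0x2C, 0b0110, addr_bits=4, hist_bits=4))  # 182 (0b1011_0110)
print(gshare_index(0x2C, 0b0110, index_bits=8))               # 13  (0b1011 ^ 0b0110)
```

For the same table size, gselect must split its index bits between address and history, whereas gshare lets both use the full index width.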

Page 13: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Hybrid Predictors : Combining Branch Predictors

• McFarling 1993

• Different prediction schemes have different advantages

• Combined Predictor: Bimodal and Gshare

• A 2-bit counter is used to select one of the predictors

• always performs better than either predictor alone

• 98.1% prediction accuracy vs. 97.1% for the most accurate previously known scheme

• The idea of combining predictors was introduced here for the first time.
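
The 2-bit selection counter above can be sketched as follows. The class name, the initial counter value, and the convention that high counter values select the second predictor are assumptions; the component predictions are passed in as plain booleans to keep the sketch short.

```python
class Chooser:
    """Per-branch 2-bit counters that pick between two component predictors."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # 0-1 select predictor A (e.g. bimodal), 2-3 select predictor B (e.g. gshare).
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def select(self, pc: int, pred_a: bool, pred_b: bool) -> bool:
        return pred_b if self.counters[self._index(pc)] >= 2 else pred_a

    def update(self, pc: int, pred_a: bool, pred_b: bool, taken: bool) -> None:
        if pred_a == pred_b:
            return  # both right or both wrong: nothing to learn about the choice
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if pred_b == taken else max(c - 1, 0)


ch = Chooser()
first = ch.select(0x100, pred_a=True, pred_b=False)   # counter favors A
ch.update(0x100, pred_a=True, pred_b=False, taken=False)  # B was right
second = ch.select(0x100, pred_a=True, pred_b=False)  # counter now favors B
print(first, second)  # True False
```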

Page 14: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Hybrid Predictors : Branch Classification

• Chang, Hao, Yeh, and Patt, 1994

• Partitions a program’s branches into sets or branch classes

• Classes are based on run-time and compile-time info.

• Associates each branch class with the most suitable predictor

• 2-bit counter is used to select the branch predictor

Page 15: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

An Alternative Selection Mechanism

• Single-scheme predictors and selection mechanisms

• 2-level Selector, Chang, Hao, and Patt, 1995

• The concept of Two-level Branch Prediction is embodied.

• The performance of the 2-level BPS (Branch Predictor Selector) is shown to be better than that of McFarling's 2-bit counter BPS mechanism.

Page 16: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Reducing Pattern History Table Interference

• Aliasing is the main problem that reduces the prediction rate in the global schemes.

• Neutral aliasing: no mispredictions

• Destructive aliasing: mispredictions

Page 17: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Reducing PHT Interference: The Agree Predictor

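• Sprangle, Chappell, Alsup, Patt, 1997

• Assigns a biasing bit to each branch in the BTB

• The PHT counters predict whether the branch will agree with its bias bit, rather than whether it is taken.

• Relies on the highly biased behavior of most branches being visible the first time a branch is introduced into the BTB.

• Branches that alias in the PHT mostly agree with their bias bits, so destructive aliasing becomes neutral aliasing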
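
A rough sketch of the agree idea: the PHT counters record agreement with a per-branch bias bit instead of the direction itself. The dictionary standing in for the BTB, the gshare-style index, the default for unseen branches, and the initial counter value are all assumptions.

```python
class AgreePredictor:
    """PHT counters predict 'agrees with the bias bit' rather than 'taken'."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.pht = [2] * (1 << index_bits)  # >= 2 means "agree" (assumed start)
        self.bias = {}                      # stands in for the bias bit kept in the BTB
        self.ghr = 0

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        bias = self.bias.get(pc, False)  # unseen branch: assume not taken
        agree = self.pht[self._index(pc)] >= 2
        return bias if agree else not bias

    def update(self, pc: int, taken: bool) -> None:
        bias = self.bias.setdefault(pc, taken)  # bias bit = first observed outcome
        i = self._index(pc)
        c = self.pht[i]
        self.pht[i] = min(c + 1, 3) if taken == bias else max(c - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# index_bits=0 forces both branches onto the same PHT entry (worst-case aliasing).
p = AgreePredictor(index_bits=0)
misses = 0
for _ in range(20):
    for pc, taken in ((0x100, True), (0x200, False)):  # opposite biases
        if p.predict(pc) != taken:
            misses += 1
        p.update(pc, taken)
print(misses)  # 1
```

Both branches push the shared counter toward "agree", so the only misprediction is the very first one, made before any bias bit is known.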

Page 18: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Reducing PHT Interference: Bi-Mode Predictors

• Lee, Chen, Mudge, 1997

• Tries to replace destructive aliasing with neutral aliasing

• It splits the PHT into even parts

• choice PHT

• direction PHTs: Taken and Not taken

• XORed (gshare-style) indexing of the direction PHTs
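
The structure above can be sketched as follows. The table sizes and initial counter values are assumptions, and the partial update rule for the choice PHT follows a common description of the bi-mode scheme rather than anything stated on the slide.

```python
class BiModePredictor:
    """Choice PHT (indexed by address) selects one of two direction PHTs (XOR index)."""

    def __init__(self, choice_bits: int = 10, dir_bits: int = 10):
        self.choice_mask = (1 << choice_bits) - 1
        self.dir_mask = (1 << dir_bits) - 1
        self.choice = [1] * (1 << choice_bits)
        self.direction = {
            False: [1] * (1 << dir_bits),  # "not taken" PHT, assumed weakly not taken
            True: [2] * (1 << dir_bits),   # "taken" PHT, assumed weakly taken
        }
        self.ghr = 0

    def _lookup(self, pc: int):
        ci = (pc >> 2) & self.choice_mask
        di = ((pc >> 2) ^ self.ghr) & self.dir_mask
        bank = self.choice[ci] >= 2
        return ci, di, bank, self.direction[bank][di] >= 2

    def predict(self, pc: int) -> bool:
        return self._lookup(pc)[3]

    def update(self, pc: int, taken: bool) -> None:
        ci, di, bank, pred = self._lookup(pc)

        def bump(c: int) -> int:
            return min(c + 1, 3) if taken else max(c - 1, 0)

        # Only the selected direction PHT is trained.
        self.direction[bank][di] = bump(self.direction[bank][di])
        # The choice PHT is trained unless it disagreed with the outcome
        # while the selected direction PHT still predicted correctly.
        if not (bank != taken and pred == taken):
            self.choice[ci] = bump(self.choice[ci])
        self.ghr = ((self.ghr << 1) | int(taken)) & self.dir_mask


# dir_bits=0 makes every branch share one entry in each direction PHT.
p = BiModePredictor(choice_bits=4, dir_bits=0)
misses = 0
for _ in range(20):
    for pc, taken in ((0x100, True), (0x204, False)):  # opposite biases
        if p.predict(pc) != taken:
            misses += 1
        p.update(pc, taken)
print(misses)  # 2
```

Once the choice PHT has steered each branch to the direction PHT matching its bias, the two branches no longer disturb each other even though they share direction PHT entries.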

Page 19: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Reducing PHT Interference: The Skewed Branch Predictor

• Michaud, Seznec, Uhlig, 1997

• Lack of Associativity in PHT

• conflict aliasing, rather than capacity

• set associative PHT? (tags, etc)

• Skewing Function

• splits PHT into 3 banks

• uses unique hashing function per bank

• majority vote

• partial updating of the banks
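
A rough sketch of the three-bank organization above, with majority vote and partial update. The three index hashes are simple placeholders invented for this illustration and are not the skewing functions from the paper; the bank size and initial counter values are also assumptions.

```python
class SkewedPredictor:
    """Three PHT banks, each indexed by a different hash, combined by majority vote."""

    def __init__(self, index_bits: int = 8):
        self.bits = index_bits
        self.mask = (1 << index_bits) - 1
        self.banks = [[1] * (1 << index_bits) for _ in range(3)]

    def _rotl(self, x: int) -> int:
        return ((x << 1) | (x >> (self.bits - 1))) & self.mask

    def _indices(self, pc: int, ghr: int):
        a = (pc >> 2) & self.mask
        h = ghr & self.mask
        # Placeholder hashes: entries that collide in one bank usually
        # do not collide in the others.
        return (a ^ h, a ^ self._rotl(h), self._rotl(a) ^ h)

    def _votes(self, idx):
        return [self.banks[b][i] >= 2 for b, i in enumerate(idx)]

    def predict(self, pc: int, ghr: int) -> bool:
        return sum(self._votes(self._indices(pc, ghr))) >= 2

    def update(self, pc: int, ghr: int, taken: bool) -> None:
        idx = self._indices(pc, ghr)
        votes = self._votes(idx)
        correct = (sum(votes) >= 2) == taken
        for b, i in enumerate(idx):
            # Partial update: after a correct prediction, leave a dissenting
            # bank alone, since its entry probably belongs to another branch.
            if correct and votes[b] != taken:
                continue
            c = self.banks[b][i]
            self.banks[b][i] = min(c + 1, 3) if taken else max(c - 1, 0)


p = SkewedPredictor()
before = p.predict(0x100, 0b1010)
p.update(0x100, 0b1010, taken=True)  # misprediction: all three banks are trained
after = p.predict(0x100, 0b1010)
print(before, after)  # False True
```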

Page 20: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Reducing PHT Interference: The YAGS Branch Predictor

• Eden, Mudge, 1998

• Yet Another Global Scheme (YAGS)

• Combines the strong points of several previous schemes

• Introduces tags into the PHT that allow it to be reduced without sacrificing key branch outcome information. The size reduction more than offsets the cost of the tags.

• Gives better prediction accuracy for the SPEC95 benchmark suite than several leading prediction schemes, for the same cost.

Page 21: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Conclusions

• The Branch Prediction Methodology is studied.

• 2-level Branch Prediction was the most important step on the topic.

• Hybrid predictors, which combine the advantages of the single predictors, are the most effective ones in branch prediction.

• The selection of the predictors in a hybrid predictor requires a good study of branch behavior and depends to a great extent upon the programs.

• Branch classification could be a promising method for Hybrid predictors.

• Using 2-level BPS gives better performance than the 2-bit BPS

Page 22: EE8365/CS8203 ADVANCED COMPUTER ARCHITECTURE

Conclusions (cont.)

• The Bi-Mode and Agree predictors, which suggest splitting the PHT into two branch streams, have done a good job in reducing the aliasing in global schemes.

• The YAGS scheme further reduces the aliasing by combining the strong points of previous schemes.