Neural Methods for Dynamic Branch Prediction. Daniel A. Jiménez, Department of Computer Science, Rutgers University


Page 1: Neural Methods for Dynamic Branch Prediction

Neural Methods for Dynamic Branch Prediction

Daniel A. Jiménez

Department of Computer Science, Rutgers University

Page 2

The Context

I'll be discussing the implementation of microprocessors: microarchitecture

I study deeply pipelined, high clock frequency CPUs

The goal is to improve performance: make the program go faster

How can we exploit program behavior to make it go faster?

Remove control dependences

Increase instruction-level parallelism

Page 3

An Example

This C++ code computes a useful result: for each i, it adds either w[i] or its bitwise complement ~w[i] to the sum, depending on v[i]. The inner loop executes two statements each time through.

int foo (int w[], bool v[], int n) {
    int sum = 0;
    for (int i=0; i<n; i++) {
        if (v[i])
            sum += w[i];
        else
            sum += ~w[i];
    }
    return sum;
}

Page 4

An Example continued

This C++ code computes the same thing with three statements in the loop.

This version is 55% faster on a Pentium 4; the previous version incurred many mispredicted branch instructions.

int foo2 (int w[], bool v[], int n) {
    int sum = 0;
    for (int i=0; i<n; i++) {
        int a = w[i];
        int b = - (int) v[i];
        sum += ~(a ^ b);
    }
    return sum;
}

Page 5

How an Instruction is Processed

Processing can be divided into five stages:

Instruction fetch

Instruction decode

Execute

Memory access

Write back

Page 6

Instruction-Level Parallelism

To speed up processing, pipelining overlaps the execution of multiple instructions, exploiting the parallelism between them.

(Figure: the five pipeline stages overlapped across successive instructions)

Page 7

Control Hazards: Branches

Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.


Page 8

Pipelining and Branches

Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle.

(Figure: a five-stage pipeline stalled behind an unresolved branch instruction)

Page 9

Branch Prediction

A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.

(Figure: speculative execution proceeding past a predicted branch)

Branch predictors must be highly accurate to avoid mispredictions!

Page 10

Branch Predictors Must Improve

The cost of a misprediction is proportional to pipeline depth. As pipelines deepen, we need more accurate branch predictors.

The Pentium 4 pipeline has 20 stages; future pipelines will have > 32 stages.

Simulations with SimpleScalar/Alpha

Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage

Decreasing the misprediction rate from 9% to 4% results in a 31% speedup for a 32-stage pipeline.

Page 11

Overview

Branch prediction background

Applying machine learning to branch prediction

Results and analysis

Circuit-level implementation

Future work and conclusions

Page 12

Branch Prediction Background

Page 13

Branch Prediction Background

The basic mechanism: 2-level adaptive prediction [Yeh & Patt `91]

Uses correlations between branch history and outcome. Examples:

gshare [McFarling `93], agree [Sprangle et al. `97], hybrid predictors [Evers et al. `96]

This scheme is highly accurate in practice

Page 14

Branch Predictor Accuracy

Larger tables and smarter organizations yield better accuracy. Longer histories provide more context for finding correlations.

Table size is exponential in history length. The cost is increased access delay and chip area.

Page 15

Applying Machine Learning to Branch Prediction

Page 16

Branch Prediction is a Machine Learning Problem

So why not apply a machine learning algorithm? Replace 2-bit counters with a more accurate predictor

Tight constraints on prediction mechanism

Must be fast and small enough to work as a component of a microprocessor

Artificial neural networks: a simple model of the neural networks in brain cells

Learn to recognize and classify patterns

Most neural nets are slow and complex relative to tables

For branch prediction, we need a small and fast neural method

Page 17

A Neural Method for Branch Prediction

We investigated several neural methods

Most were too slow, too big, or not accurate enough

Our choice: The perceptron [Rosenblatt `62, Block `62]

Very high accuracy for branch prediction

Prediction and update are quick, relative to other neural methods

Sound theoretical foundation: the perceptron convergence theorem

Proven to work well for many classification problems

Page 18

Branch-Predicting Perceptron

Inputs (x’s) are from the branch history register

Weights (w’s) are small integers learned by on-line training

Output (y) gives the prediction: the dot product of the x’s and w’s

Training finds correlations between history and outcome

Page 19

Training Algorithm

Page 20

Organization of the Perceptron Predictor

Keeps a table of perceptrons, indexed by branch address

Inputs are from the branch history register

Predict taken if the output is ≥ 0, otherwise predict not taken

Key intuition: table size isn't exponential in history length, so we can consider much longer histories

Page 21

Results and Analysis for the Perceptron Predictor

Page 22

Experimental Evaluation

Execution- and trace-driven simulations: measure instruction throughput (IPC) and misprediction rates

SimpleScalar/Alpha [Burger & Austin `97]

Alpha 21264-like configuration:

4-wide issue, 64KB I-cache, 64KB D-cache, 512 entry BTB

SPECint 2000 benchmarks

Technological estimates: HSPICE for circuit delay estimates

Modified CACTI 2.0 [Agarwal 2000] for PHT delay estimates

Page 23

Results: Predictor Accuracy

The perceptron outperforms a competitive hybrid predictor by 36% at a ~4KB hardware budget: a 1.71% vs. 2.66% misprediction rate

Page 24

Results: Large Hardware Budgets

Multi-component hybrid was the most accurate fully dynamic predictor known in the literature [Evers 2000]

Perceptron predictor is even more accurate

Page 25

Delay-Sensitive Implementation

Even the relatively simple perceptron has high access delay

Our solution: An overriding perceptron predictor

First level is a single-cycle gshare

Second level is a 4KB, 23-bit history perceptron predictor

HSPICE total prediction delay estimates:

2 cycles at 833 MHz (like Alpha 21264)

4 cycles at 1.76 GHz (like Pentium 4)

Compare with 4KB hybrid predictor

Page 26

Results: IPC with high clock rate

Pentium 4-like: 20-cycle misprediction penalty, 1.76 GHz

15.8% higher IPC than gshare, 5.7% higher than hybrid

Page 27

Analysis: History Length

The fixed-length path branch predictor can also use long histories [Stark, Evers & Patt `98]

Page 28

Analysis: Training Times

Perceptron "warms up" faster

Page 29

Circuit-Level Implementation of a Neural Branch Predictor

Page 30

Circuit-Level Implementation

Example output computation: 12 weights, a Wallace tree of depth 6 followed by a 14-bit carry-lookahead adder

Delay is 2-4 cycles for longer histories

Carry-save adders have O(1) depth; the carry-lookahead adder has O(log n) depth

Page 31

HSPICE Perceptron Simulations

2 cycles at 833 MHz, 4 cycles at 1.76 GHz, 180 nm technology

Page 32

Future Work and Conclusions

Page 33

Future Work with Perceptron Predictor

Let's make the best predictor even better

Better representation

Better training algorithm

Latency is a problem

Crazy people are saying that overriding organizations don't work as well as simple but large predictors [Me, HPCA 2003]

How can we eliminate the latency of the perceptron predictor?

Page 34

Future Work with Perceptron Predictor

Value prediction

Predict value of a load to mitigate memory latency

Indirect branch prediction

Virtual dispatch

Switch statements in C

Exit prediction

Predict the taken exit from predicated hyperblocks

Page 35

Future Work: Characterizing Predictability

Branch predictability, value predictability

How can we characterize algorithms in terms of their predictability?

Given an algorithm, how can we transform it so that its branches and values are easier to predict?

How much predictability is inherent in the algorithm, and how much is an artifact of the program structure?

How can we compare different algorithms' predictability?

Page 36

Conclusions

Neural predictors can improve performance for deeply pipelined microprocessors

Perceptron learning is well-suited for microarchitectural implementation

There is still a lot of work left to be done on the perceptron predictor in particular and microarchitectural prediction in general

Page 37

The End