Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003


Perceptron-based Global Confidence Estimation for Value Prediction

Master’s Thesis

Michael Black

June 26, 2003

Thesis Objectives

• To present a viable global confidence estimator using perceptrons

• To quantify predictability relationships between instructions

• To study the performance of the global confidence estimator when used with common value prediction methods

Presentation Outline

• Background:
  – Data Value Prediction
  – Confidence Estimation

• Predictability Relationships

• Perceptrons

• Perceptron-based Confidence Estimator

• Experimental Results and Conclusions

Value Locality

Suppose instruction 1 has been executed several times before:

I1: A = B + C   (5 = 3 + 2)

. . .

I1: A = B + C   (6 = 4 + 2)

. . .

I1: A = B + C   (7 = 5 + 2)

Next time, its outcome A will probably be 8

Data Value Prediction

• A data value predictor predicts A from instruction 1’s past outcomes

• Instruction 2 speculatively executes using the prediction

1. ADD A = B + C   (last outcome: 7 = 5 + 2)

1. ADD A = B + C   (this time: B = 6, C = 2)
2. ADD D = E + A   (E = 5; uses predicted A = 8)

Predictor: +1 (stride)

Types of Value Predictors

• Computational: performs a mathematical operation on past values
  – Last-Value: 5, 5, 5, 5 → 5
  – Stride: 1, 3, 5, 7 → 9

• Context: learns repeating sequences of numbers
  – 3, 6, 5, 3, 6, 5, 3 → 6
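The three predictor types can be sketched as simple functions over an instruction's past outcome history. This is an illustrative sketch, not the thesis implementation; the two-value context length (`order=2`) is an assumption:

```python
# Sketch (not the thesis code): minimal last-value, stride, and context
# value predictors over a single static instruction's outcome history.

def last_value_predict(history):
    """Predict that the most recent outcome repeats."""
    return history[-1]

def stride_predict(history):
    """Predict the last value plus the most recent stride."""
    return history[-1] + (history[-1] - history[-2])

def context_predict(history, order=2):
    """Predict the value that followed the most recent earlier
    occurrence of the current last-`order` context."""
    context = tuple(history[-order:])
    for i in range(len(history) - order - 1, -1, -1):
        if tuple(history[i:i + order]) == context:
            return history[i + order]
    return history[-1]  # fall back to last value if context is new

print(last_value_predict([5, 5, 5, 5]))        # 5
print(stride_predict([1, 3, 5, 7]))            # 9
print(context_predict([3, 6, 5, 3, 6, 5, 3]))  # 6
```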

Types of Value History

• Local History: predicts using data from past instances of the same instruction

• Global History: predicts using data from other instructions

Local value prediction is more conventional

Are mispredictions a problem?

• If a prediction is incorrect, speculatively executed instructions must be re-executed

• This can result in:
  – Cycle penalties for detecting the misprediction
  – Cycle penalties for restarting dependent instructions
  – Incorrect resolution of dependent branch instructions

It is better to not predict at all than to mispredict

Confidence Estimator

• Decides whether to make a prediction for an instruction

• Bases decisions on the accuracy of past predictions

• Common confidence estimation method: the saturating up-down counter

Up-Down Counter

[State diagram: three “Don’t Predict” states followed by one “Predict” state, starting at the leftmost “Don’t Predict” state. Each correct prediction moves one state toward “Predict”; each incorrect prediction moves one state back. The threshold lies just before the “Predict” state.]
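The state diagram above can be sketched as a small class. This is an illustrative sketch, not the thesis code; the 2-bit width matches the baseline used later, and placing the threshold at the top state is an assumption:

```python
# Sketch: a 2-bit saturating up-down confidence counter.
# Assumption: predict only in the top state (threshold = 3).

class UpDownCounter:
    def __init__(self, bits=2, threshold=3):
        self.max_state = (1 << bits) - 1   # 3 for a 2-bit counter
        self.threshold = threshold
        self.state = 0                     # start in "don't predict"

    def should_predict(self):
        return self.state >= self.threshold

    def update(self, prediction_correct):
        if prediction_correct:
            self.state = min(self.state + 1, self.max_state)
        else:
            self.state = max(self.state - 1, 0)

c = UpDownCounter()
for _ in range(3):
    c.update(True)            # three correct outcomes reach the threshold
print(c.should_predict())     # True
c.update(False)               # one misprediction drops below the threshold
print(c.should_predict())     # False
```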

Local vs. Global

• The up-down counter is local
  – Only past instances of an instruction affect its counter

• Global confidence estimation uses the prediction accuracy (“predictability”) of past dynamic instructions

• Problem with global:
  – Not every past instruction affects the predictability of the current instruction

Example

I 1. A = B + C

I 2. F = G – H

I 3. E = A + A

• Instruction 3 depends on 1 but not on 2
  – Instruction 3’s predictability is related to 1 but not 2

• If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly

Is global confidence worthwhile?

• Fewer mispredictions than local
  – If an instruction mispredicts, its dependent instructions know not to predict

• Less warm-up time than local
  – Instructions need not be executed several times before accurate confidence decisions can be made

How common are predictability relationships?

Simulation study:
– How many instructions in a program predict correctly only when a previous instruction predicts correctly?
– Which past instructions have the most influence?

Predictability Relationships

[Chart: percentage of instructions with the same predictability as a previous instruction, for the Stride, Last-Value, and Context predictor types, at agreement levels of 100%, 99%, 95%, and 90%.]

Over 70% of instructions for Stride and Last-Value, and over 90% for Context, have the same prediction accuracy as a past instruction 90% of the time!

Predictability Relationships

[Chart: percentage of instructions with the same predictability as the current instruction, versus the previous instruction’s distance from the current (1 to 251), for the Stride, Context, and Last-Value predictors.]

The most recent 10 instructions have the most influence

Global Confidence Estimation

A global confidence estimator must:

1. Identify for each instruction which past instructions have similar predictability

2. Use their prediction accuracy to decide whether or not to predict

Neural Network

• Used to iteratively learn unknown functions from examples

• Consists of nodes and links

• Each link has a numeric weight

• Data is fed to input nodes and propagated to output nodes by the links

• Desired output used to adjust (“train”) the weights

Perceptron

• Perceptrons only have input and output nodes

• They are much easier to implement and train than larger neural networks

• Can only learn linearly separable functions

Perceptron Computation

• Each bit of input data sourced to an input node

• Dot product calculated between input data and weights

• Output is “1” if dot product exceeds a threshold; otherwise “0”

Perceptron Training

• Weights adjusted so that the perceptron output = the desired output for the given input

• Error value: ε = desired output – perceptron output
• ε times each input bit is added to the corresponding weight:

w_k ← w_k + ε · i_k
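The compute and train steps above can be sketched as two small functions. This is an illustrative sketch, not the thesis hardware; bipolar (±1) inputs and a threshold of 0 are assumptions:

```python
# Sketch: perceptron computation (dot product vs. threshold) and the
# training rule w_k <- w_k + eps * i_k, with eps = desired - output.
# Assumptions: bipolar (+1/-1) inputs, a bias weight, threshold 0.

def perceptron_output(weights, inputs, bias):
    """Output 1 if the dot product plus bias exceeds the threshold, else 0."""
    dot = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if dot > 0 else 0

def perceptron_train(weights, inputs, bias, desired):
    """One training step: add eps times each input bit to each weight."""
    eps = desired - perceptron_output(weights, inputs, bias)
    bias += eps
    weights = [w + eps * x for w, x in zip(weights, inputs)]
    return weights, bias

# Learn "output 1 exactly when the first input is +1":
weights, bias = [0, 0], 0
examples = [([1, 1], 1), ([1, -1], 1), ([-1, 1], 0), ([-1, -1], 0)]
for _ in range(5):
    for inputs, desired in examples:
        weights, bias = perceptron_train(weights, inputs, bias, desired)
print(perceptron_output(weights, [1, -1], bias))   # 1
print(perceptron_output(weights, [-1, 1], bias))   # 0
```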

Weights

• Weights determine the effect of each input on the output

• Positive weight: output varies directly with the input bit

• Negative weight: output varies inversely with the input bit

• Large weight: input has a strong effect on the output

• Zero weight: input bit has no effect on the output

Linear Separability

• An input may have a direct influence on the output

• An input may instead have an inverse influence on the output

• But an input cannot have a direct influence sometimes and an inverse influence at other times
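The limit can be seen on XOR, the classic non-linearly-separable function, where each input's influence flips direction depending on the other input. This demonstration is my own illustration, not from the thesis; it trains the same simple perceptron on AND (separable) and XOR and compares final accuracy:

```python
# Sketch: a perceptron learns AND perfectly but can never reach 100%
# accuracy on XOR, because XOR is not linearly separable.

def output(w, b, x):
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train(patterns, epochs=25):
    w, b = [0, 0], 0
    for _ in range(epochs):
        for x, d in patterns:
            eps = d - output(w, b, x)
            b += eps
            w = [wi + eps * xi for wi, xi in zip(w, x)]
    return w, b

AND = [([1, 1], 1), ([1, -1], 0), ([-1, 1], 0), ([-1, -1], 0)]
XOR = [([1, 1], 0), ([1, -1], 1), ([-1, 1], 1), ([-1, -1], 0)]

accs = {}
for name, pats in [("AND", AND), ("XOR", XOR)]:
    w, b = train(pats)
    accs[name] = sum(output(w, b, x) == d for x, d in pats) / len(pats)
print(accs["AND"])  # 1.0
print(accs["XOR"])  # strictly less than 1.0
```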

Perceptron Confidence Estimator

• Each input node is a past instruction’s prediction outcome (1 = correct, –1 = incorrect)

• The output is the decision to predict (1 = predict, 0 = don’t predict)

• Weights determine a past instruction’s predictability influence on the current instruction:
  – Positive weight: current instruction mispredicts when the past instruction mispredicts
  – Negative weight: current instruction mispredicts when the past instruction predicts correctly
  – Zero weight: past instruction does not affect the current one

Perceptron Confidence Estimator

Example weights (bias weight = –1):

I1: A = B  C    weight = 1

I2: D = E + F   weight = 1

I3: P = Q  R    weight = 0

I4: G = A + D   (current instruction)

Instruction 4 predicts correctly only when 1 and 2 predict correctly
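The decision for instruction 4 can be sketched directly from the example weights above. A threshold of 0 is an assumption:

```python
# Sketch: the confidence decision for instruction 4, using the example
# weights (bias = -1; weights for I1, I2, I3 are 1, 1, 0) and an assumed
# threshold of 0. Inputs are past outcomes: +1 = correct, -1 = incorrect.

def confident(outcomes, weights=(1, 1, 0), bias=-1):
    dot = bias + sum(w * o for w, o in zip(weights, outcomes))
    return dot > 0   # True = predict, False = don't predict

print(confident([+1, +1, -1]))  # True:  I1 and I2 both predicted correctly
print(confident([+1, -1, +1]))  # False: I2 mispredicted, so don't predict
```

Because I3's weight is zero, its outcome never changes the decision; only the instructions that instruction 4 actually depends on matter.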

Confidence Estimator Organization

Perceptron Implementation

Weight Value Distribution

Simulation study:
– What are typical perceptron weight values?
– How does the type of predictor influence the weight distribution?
– What minimum range do the weights need to have?

Weight Value Distribution

[Chart: percentage of weights at each value, on a log scale from 0.00001% to 10%, over possible perceptron weight values from –255 to 249, for the Stride, Last-Value, and Context predictors.]

Simulation Methodology

• Measurements simulated using SimpleScalar 2.0a
• SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex
• Each benchmark is run for 500 million instructions
• Value predictors: Stride, Last-Value, Context
• Baseline confidence estimator: 2-bit up-down counter

Simulation Metrics

P_CORRECT: # of correct predictions

P_INCORRECT: # of incorrect predictions

N: # of cases where no prediction was made

Coverage = (P_CORRECT + P_INCORRECT) / (P_CORRECT + P_INCORRECT + N)

Accuracy = P_CORRECT / (P_CORRECT + P_INCORRECT)
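The two metrics can be sketched as simple ratio functions over the three counts; the sample numbers below are illustrative, not thesis results:

```python
# Sketch: coverage and accuracy from the three simulation counts.

def coverage(p_correct, p_incorrect, n):
    """Fraction of instructions for which a prediction was made."""
    return (p_correct + p_incorrect) / (p_correct + p_incorrect + n)

def accuracy(p_correct, p_incorrect):
    """Fraction of the predictions made that were correct."""
    return p_correct / (p_correct + p_incorrect)

# Illustrative counts only:
print(coverage(400, 20, 580))  # 0.42
print(accuracy(400, 20))       # about 0.952
```

Coverage rewards predicting more often; accuracy rewards being right when a prediction is made. A confidence estimator trades these off: refusing to predict raises accuracy but lowers coverage.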

Stride Results

[Chart: coverage and accuracy of the up-down counter vs. the perceptron estimator with the Stride predictor, for bzip2, gcc, gzip, perlbmk, twolf, and vortex.]

Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter

Last-Value Results

[Chart: coverage and accuracy of the up-down counter vs. the perceptron estimator with the Last-Value predictor, for bzip2, gcc, gzip, perlbmk, twolf, and vortex.]

Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter

Context Results

[Chart: coverage and accuracy of the up-down counter vs. the perceptron estimator with the Context predictor, for bzip2, gcc, gzip, perlbmk, twolf, and vortex.]

Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter

Sensitivity to GPH size

[Chart: coverage and accuracy of the perceptron estimator at varying GPH sizes, for the Stride, Last-Value, and Context predictors.]

Coverage Sensitivity to the Unavailability of Past Instructions

[Chart: percentage of instructions covered vs. the number of unavailable instruction outcomes (0 to 70), for the Stride, Last-Value, and Context predictors.]

Accuracy Sensitivity to the Unavailability of Past Instructions

[Chart: percentage of correct predictions vs. the number of unavailable instruction outcomes (0 to 70), for the Stride, Last-Value, and Context predictors.]

Coverage Sensitivity to Weight Range Limitations

[Chart: percentage of instructions covered vs. perceptron weight size (2, 3, 4, 7, 9 bits, and unlimited), for the Stride, Last-Value, and Context predictors.]

Accuracy Sensitivity to Weight Range Limitations

[Chart: percentage of correct predictions vs. perceptron weight size (2, 3, 4, 7, 9 bits, and unlimited), for the Stride, Last-Value, and Context predictors.]

Conclusions

• Mispredictions are a problem in data value prediction

• Benchmark programs exhibit strong predictability relationships between instructions

• Perceptrons enable confidence estimators to exploit these predictability relationships

• Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation