1
André Seznec Caps Team
IRISA/INRIA
Analysis of the O-GEHL branch predictor
Optimized GEometric History Length
André Seznec IRISA/INRIA/HIPEAC
2
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Objectives
State of the art accuracy:
Any gain in branch prediction accuracy results in performance gain and power consumption gain
Keep the design implementable: Rely on a single prediction scheme Use only global history information:
Designers hate maintaining speculative local history
3
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
L(0) ?
L(4)
L(3)
L(2)L(1)
TOT1
T2
T3
T4
The basis : A Multiple history length predictor
4
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Selecting between multiple predictions
Classic solution: Use of a meta predictor
“wasting” storage !?!
chosing among 5 or 10 predictions ??
Neural inspired predictors: Use an adder tree instead of a meta-predictor
Let’s use the adder tree
5
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
L(0) ∑
L(4)
L(3)
L(2)L(1)
TOT1
T2
T3
T4
Final computation through a sum
Prediction=Sign
6
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
From “old experience” on 2bcgskew
Some applications benefit from 100+ bits histories
Some don’t !!
7
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
GEometric History Length predictor
L(1)1iαL(i)
0 L(0)
The set of history lengths forms a geometric series
{0, 2, 4, 8, 16, 32, 64, 128}
What is important: L(i)-L(i-1) is drastically increasing
Spends most of the storage for short history !!
8
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Update policy
Perceptron inspired threshold based
Perceptron-like threshold did not work
Reasonable fixed threshold= Number of tables
9
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Dynamic update threshold fitting
On an O-GEHL predictor, best threshold depends on the application the predictor size the counter width
By chance, on most applications, for the best fixed threshold,
updates on mispredictions ≈ updates on correct predictions
Monitor the difference
and adapt the update threshold
10
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Adaptative history length fitting (inspired by Juan et al 98)
(½ applications: L(7) < 50) ≠
(½ applications: L(7) > 150)
Let us adapt some history lengths to the behavior of each application
8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)
11
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Adaptative history length fitting (2)
Intuition: if high degree of conflicts on T7, stick with
short history
Implementation: monitoring of aliasing on updates on T7
through a tag bit and a counter
Simple is sufficient: Flipping from short to long histories and
vice-versa
12
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Evaluation framework
1st Championship Branch Prediction traces:
20 traces including system activity
Floating point apps : loop dominated
Integer apps: usual SPECINT
Multimedia apps
Server workload apps: very large footprint
13
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Reference configurationpresented for Championship Branch Prediction
8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters
otherwise 1 Kbits of one bit tags associated with T7
10K + 5K + 6x8K + 1K = 64K
L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}
14
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
A case for the OGEHL predictor
2nd at CBP: 2.82 misp/KI
Best practice award: The predictor the closest to a possible hardware
implementation Does not use exotic features:
• Various prime numbers, etc• Strange initial state
Very short warming intervals: Chaining all simulations: 2.84 misp/KI
15
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
A case for the OGEHL predictor (2)
High accuracy 32Kbits (3,150): 3.41 misp/KI
better than any implementable 128 Kbits predictor before CBP
• 128 Kbits 2bcgkew (6,6,24,48): 3.55 misp/KI• 176 Kbits PBNP (43) : 3.67 misp/KI
1Mbits (5,300): 2.27 misp/KI• 1Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI• 1888 Kbits PBNP (58): 3.23 misp/KI
16
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
A case for the OGEHL predictor (3)
Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI
Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, 2.80 misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145,
266} 2.78 misp/KI
17
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Impact of the number of components
4 components — 8 components• 64 Kbits: 3.02 -- 2.84 misp/KI• 256Kbits: 2.59 -- 2.44 misp/KI• 1Mbit: 2.40 -- 2.27 misp/KI
6 components — 12 components• 48 Kbits: 3.02 – 3.03 misp/KI• 768Kbits: 2.35 – 2.25 misp/KI
4 to 12 components bring high accuracy
18
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Impact of counter width
Robustness to counter width variations: 3-bit counter, 49 Kbits: 3.09 misp/KI
Dynamic update threshold fitting helps a lot
5-bit counter 79 Kbits: 2.79 misp/KI
4-bit is the best tradeoff
19
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Prediction computation time
3 successive steps: Index computation: a 3-entry XOR gate Table read Adder tree
May not fit on a single cycle: But can be ahead pipelined !
20
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Ahead pipelining a global history branch predictor (principle)
Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available:
• X-block ahead instruction address• X-block ahead history
To ensure accuracy: Use intermediate path information
21
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Practice
Ahead OGEHL:8 // prediction computations
bcd
Ha
A
A B C D
22
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
Ahead Pipelined 64 Kbits OGEHL
3-block: 2.94 misp/KI 4-block: 2.99 misp/KI 5-block: 3.04 misp/KI
Not such a huge accuracy loss
23
An
aly
sis
of
th
e O
-GE
HL
bra
nch
pre
dic
tor
André SeznecCaps Team
Irisa
A final case for the O-GEHL predictor
delivers state-of-the-art accuracy uses only global information:
Very long history: 200+ bits !! can be ahead pipelined many effective design points
Nb of tables, counter width, history lengths prediction computation logic complexity is low
(compared with concurrent predictors)