Trace Substitution
Hans Vandierendonck, Hans Logie, Koen De Bosschere
Ghent University
Euro-Par 2003, Klagenfurt
August 27, 2003, Euro-Par 2003
Instruction Fetch
• Wide-issue superscalar processors need to fetch multiple branches per cycle
– IPC=8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
– Multi-ported instruction cache?
• Trace cache:
– Packs fetch groups in a trace
– Trace tagged with PC, path, next fetch PC
– Multiple branch predictor (MBP) predicts branch directions
The Trace Cache
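[Figure: fetch unit block diagram. Components: instruction cache, trace cache, MBP, fill unit, and a MUX that selects between the predicted trace (pred. trace) and the predicted instructions (pred. insn) using the trace cache hit signal. Legend: fetch address, next address, instructions, hit/miss, pred. path. Annotation: only executed paths!]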
Overview
• Observation
– Trace cache misses are (sometimes) branch mispredictions
• Trace Substitution
– How to make use of it
• Evaluation
– Is it worth it?
• Conclusion
Observation
• Multiple branch predictor affects trace cache:
– Non-perfect branch predictors reduce the trace cache hit rate
– FIPA (fetched instructions per fetch unit access) correlates better with TC hit rate than with MBP accuracy
TC: 16K traces, 4-way set-assoc., path associativity
MGAg, Mgshare: 12-bit history
repeat: 8Kbit hybrid, accessed 3x
[Figure: bar chart for gcc, vortex, and avg, each with the predictors MGAg, Mgshare, repeat, and perfect. Left axis: FIPA (0 to 16). Right axis: hit rate (70% to 100%). Series: FIPA, MBP hits, TC hits.]
TC Misses Are a Tell-Tale for MBP Misses
• Trace cache misses coincide with branch mispredictions, e.g.:
– 16K-entry trace cache, 12-bit MGAg (high accuracy, low coverage):
  • 84.9% of TC misses are also MBP misses
  • 37.6% of MBP misses are also TC misses
– 256-entry trace cache, 12-bit MGAg (low accuracy, higher coverage):
  • 25.1% of TC misses are also MBP misses
  • 55.9% of MBP misses are also TC misses
• This work: use TC misses to detect MBP misses and fix them
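
The two percentages quoted per configuration can be read as the accuracy and coverage of a detector that flags an MBP miss whenever the trace cache misses. A minimal sketch of that arithmetic follows; the function name and the event counts are hypothetical and chosen only for illustration, they are not taken from the evaluation.

```python
def detector_quality(tc_misses, mbp_misses, both):
    """Treat 'TC miss' as a detector for 'MBP miss'.

    tc_misses  -- accesses that missed in the trace cache
    mbp_misses -- accesses where the MBP mispredicted the path
    both       -- accesses that were a TC miss and an MBP miss at once
    All three counts are hypothetical inputs.
    """
    accuracy = both / tc_misses    # fraction of TC misses that are MBP misses
    coverage = both / mbp_misses   # fraction of MBP misses that are TC misses
    return accuracy, coverage


# Made-up counts, used only to show the calculation.
acc, cov = detector_quality(tc_misses=1000, mbp_misses=2000, both=800)
print(f"accuracy={acc:.1%} coverage={cov:.1%}")  # accuracy=80.0% coverage=40.0%
```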
Trace Substitution
• Assumption: TC miss implies MBP miss
– Correlation between branches implies that some paths never occur
– TC stores only those paths that do occur
• If the predicted path is wrong …
– Fetch a different trace
– Override MBP with MRU trace starting at fetch PC
  • Detect MRU trace from LRU bits stored in TC
  • No trace substitution applied if it does not exist
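
The lookup rule on this slide can be sketched as a small software model of one trace cache set. This is an illustrative sketch under stated assumptions, not the hardware design: the class name, the string encoding of paths (e.g. "TNT"), and the use of an ordered dictionary in place of LRU bits are all hypothetical. It also assumes that a substitution does not update the recency order, which the slides do not specify.

```python
from collections import OrderedDict


class TraceCacheSet:
    """One set of a path-associative trace cache (illustrative model).

    Entries are keyed by (start_pc, path). The OrderedDict order stands in
    for the LRU bits: the last entry is the most recently used one.
    """

    def __init__(self, ways=4):
        self.ways = ways
        self.entries = OrderedDict()  # (start_pc, path) -> trace

    def fill(self, start_pc, path, trace):
        key = (start_pc, path)
        if key not in self.entries and len(self.entries) >= self.ways:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[key] = trace
        self.entries.move_to_end(key)         # mark as MRU

    def lookup(self, fetch_pc, predicted_path):
        key = (fetch_pc, predicted_path)
        if key in self.entries:
            self.entries.move_to_end(key)
            return "hit", self.entries[key]
        # TC miss: assume the MBP mispredicted and override it with the
        # MRU trace that starts at the same fetch PC, if one exists.
        for pc, path in reversed(self.entries):
            if pc == fetch_pc:
                return "substituted", self.entries[(pc, path)]
        return "miss", None  # no substitution possible


s = TraceCacheSet()
s.fill(0x100, "TNT", "trace A")
s.fill(0x100, "TTN", "trace B")
print(s.lookup(0x100, "TTN"))  # ('hit', 'trace B')
print(s.lookup(0x100, "NNN"))  # ('substituted', 'trace B')
print(s.lookup(0x200, "TTT"))  # ('miss', None)
```

In this model a plain miss would fall back to the instruction cache, as in the original fetch unit.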
Implementation
[Figure: fetch unit block diagram extended for trace substitution. Components: instruction cache, trace cache, MBP, MUX, fill unit. Signals: select, hit, MRU hit, MRU, pred. trace, pred. insn. Legend: fetch address, next address, instructions, hit/miss, pred. path.]
Evaluation Setup
• Benchmarks
– SPECint95 (except compress, go), reference inputs
– 500 million instructions from start of program
– Compiled for Alpha ISA, Compaq C compiler, -O4
• Fetch Unit
– TC: 1 trace = 16 instructions, 3 cond. branches, trace ends at system call, indirect jump
– TC: 4-way set-assoc., path associativity
– MBP: MGAg, varying history length
– Instruction cache: 32K, 2-way, 32-byte blocks, LRU
• Metric
– FIPA = fetched instructions per fetch unit access
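
The trace termination rules and the FIPA metric can be illustrated with a short sketch. It rests on several assumptions that are not stated in the slides: the instruction kind names are hypothetical, a trace is assumed to end right after its third conditional branch, and FIPA is computed for the idealized case in which every fetch unit access delivers exactly one complete trace.

```python
MAX_INSNS = 16            # 1 trace = 16 instructions
MAX_COND_BRANCHES = 3     # at most 3 conditional branches per trace
TERMINATORS = {"syscall", "indirect_jump"}  # a trace ends at these


def build_traces(stream):
    """Split a dynamic instruction stream into traces.

    `stream` is a list of instruction kinds; the kind names ("alu",
    "cond_branch", "syscall", "indirect_jump") are hypothetical.
    """
    traces, current, branches = [], [], 0
    for kind in stream:
        current.append(kind)
        if kind == "cond_branch":
            branches += 1
        if (len(current) == MAX_INSNS
                or branches == MAX_COND_BRANCHES
                or kind in TERMINATORS):
            traces.append(current)
            current, branches = [], 0
    if current:  # keep a trailing partial trace
        traces.append(current)
    return traces


def fipa(traces):
    """Fetched instructions per fetch unit access, one trace per access."""
    return sum(len(t) for t in traces) / len(traces)


stream = ["alu"] * 5 + ["cond_branch"] + ["alu"] * 20 + ["syscall"] + ["alu"] * 2
traces = build_traces(stream)
print([len(t) for t in traces])  # [16, 11, 2]
print(round(fipa(traces), 2))    # 9.67
```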
Evaluation (1)
• Observations:
– Gap between MGAg and perfect prediction increases with TC size
– 20-40% of gap filled with trace substitution
– Substitution applies only on a TC miss, thus the performance increase drops with TC size
TC: 4-way set-associative
MGAg: 12-bit history
[Figure: FIPA (8 to 14) versus trace cache size in traces (64, 256, 1024, 4096, 16384). Series: perfect, MGAg+subst, MGAg.]
Evaluation (2)
• Observations:
– Compensates for a poor branch predictor
– No history + substitution ~ MGAg with 10-bit history
– Improvement drops with more accurate predictor
TC: 256 traces, 4-way
[Figure: FIPA (8.0 to 12.0) versus branch history length (0 to 16). Series: MGAg+subst, MGAg.]
Accuracy vs. Usage
• Definitions:
– Usage = substitutions per fetch unit access
– Accuracy = fraction of substitutions that are correct
• Note
– Accuracy is limited because the correct-path trace is not always present!
[Figure: fraction of accesses (0% to 45%) versus branch history length (0 to 16). Series: Usage, Accuracy. TC: 256 traces, 4-way.]
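
The usage and accuracy definitions on this slide reduce to two ratios over simulation counters. A minimal sketch, assuming hypothetical counter names and made-up counts that are not results from the evaluation:

```python
def substitution_stats(accesses, substitutions, correct_substitutions):
    """Usage and accuracy of trace substitution from hypothetical counters."""
    usage = substitutions / accesses  # substitutions per fetch unit access
    # Fraction of substitutions that fetched the correct-path trace.
    accuracy = correct_substitutions / substitutions if substitutions else 0.0
    return usage, accuracy


# Made-up counts, used only to show the calculation.
usage, accuracy = substitution_stats(
    accesses=10_000, substitutions=1_500, correct_substitutions=450)
print(f"usage={usage:.0%} accuracy={accuracy:.0%}")  # usage=15% accuracy=30%
```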
Conclusion
• Proposed trace substitution
– TC miss flags MBP miss
  • Not always correct, not all MBP misses found
  • Fetch MRU trace instead: cheap implementation
• Results in
– Consistent performance improvement
  • No history + substitution ~ MGAg with 10-bit history
  • In other cases: 0.2 instructions/access, or same performance as with 16 times smaller MBP
• Most effective when MBP or TC is small