Statistical Simulation of Superscalar Architectures
using Commercial Workloads
Lieven Eeckhout and Koen De Bosschere
Dept. of Electronics and Information Systems (ELIS), Ghent University, Belgium
CAECW’01, January 21, 2001
Outline
• Introduction
• Statistical Simulation
  – Statistical profiling
  – Synthetic trace generation
• Methodology
• Evaluation
• Conclusion
Introduction
• Architectural simulation
  – trace-driven or execution-driven
  – accurate
  – long simulation times
  – long traces to be stored
• Need for fast simulation techniques
  – simulate only part of a full trace
  – analytical modeling
  – trace sampling
  – statistical simulation
Goal
• Previous work used SPEC benchmarks to evaluate statistical simulation
• In this talk, we use both commercial and scientific workloads
  – SPECint, SPECfp, system traces, multimedia, X graphics, database
Statistical Simulation
• Three steps:
  – extract a statistical profile from a program execution
  – generate a synthetic trace from it
  – simulate the synthetic trace on a trace-driven simulator
• Two major advantages:
  – the statistical profile is more compact than a full trace
  – simulation is fast due to its statistical nature, enabling design space exploration in limited time
Statistical Simulation
real trace (e.g. SPEC benchmark)
  → branch profiling, cache profiling, instruction profiling
  → branch statistics, cache statistics, instruction statistics (together: the statistical profile)
  → synthetic trace generator
  → synthetic trace
  → trace-driven simulator
Statistical Profiling
• Microarchitecture-independent statistics
  – instruction statistics
• Microarchitecture-dependent statistics
  – branch statistics
  – cache statistics
• Consequence: statistical simulation can only explore design options of the processor core, because the cache and branch predictor are fixed when the profile is collected
Statistical Profiling: Instruction Statistics
• Instruction mix (13 classes)
• Number of register operands
• Age of register operands
  – probability that a register operand was produced d instructions before it in the trace (RAW dependencies only)
• Memory dependencies
  – probability that a load is memory-dependent on the n-th store before it in the trace (RAW dependencies only)
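The instruction statistics above can be gathered in a single pass over a trace. The sketch below is a minimal illustration, not the authors' profiler: it assumes a hypothetical `Instr` record (class label, destination register, source registers, memory address), measures register operand age as the distance in dynamic instructions back to the most recent write of that register (RAW only), and measures memory dependency as the position n of the most recent earlier store to the same address.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical trace record; the field names are illustrative only."""
    iclass: str                  # instruction class, e.g. "add", "load", "store"
    dest: Optional[int] = None   # destination register, if any
    srcs: Tuple[int, ...] = ()   # source (input) registers
    addr: Optional[int] = None   # effective address of a load or store


def normalize(counter: Counter) -> dict:
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counter.values())
    return {k: v / total for k, v in sorted(counter.items())} if total else {}


def profile_instructions(trace):
    mix = Counter()           # instruction class -> count
    num_operands = Counter()  # number of source registers -> count
    reg_age = Counter()       # RAW distance in dynamic instructions -> count
    mem_dep = Counter()       # n (load depends on the n-th previous store) -> count
    last_writer = {}          # register -> index of its most recent producer
    store_addrs = []          # addresses of all stores seen so far, oldest first
    loads = 0

    for i, ins in enumerate(trace):
        mix[ins.iclass] += 1
        num_operands[len(ins.srcs)] += 1
        # Read-after-write only: age = distance back to the last write of the register.
        for r in ins.srcs:
            if r in last_writer:
                reg_age[i - last_writer[r]] += 1
        if ins.iclass == "load":
            loads += 1
            if ins.addr is not None:
                # Walk back over earlier stores; n = 1 is the most recent one.
                # (Unbounded here; a real profiler would cap n.)
                for n, a in enumerate(reversed(store_addrs), start=1):
                    if a == ins.addr:
                        mem_dep[n] += 1
                        break
        elif ins.iclass == "store" and ins.addr is not None:
            store_addrs.append(ins.addr)
        # Update the producer map after reading sources (handles r1 = r1 + 1).
        if ins.dest is not None:
            last_writer[ins.dest] = i

    return {
        "mix": normalize(mix),
        "num_operands": normalize(num_operands),
        "reg_age": normalize(reg_age),
        # Per-load probabilities; 1 - sum(...) is the chance of no memory dependency.
        "mem_dep": {n: c / loads for n, c in sorted(mem_dep.items())} if loads else {},
    }


example = [
    Instr("store", srcs=(1,), addr=100),
    Instr("add", dest=2, srcs=(3, 4)),
    Instr("store", srcs=(2,), addr=200),
    Instr("load", dest=5, addr=100),
    Instr("add", dest=6, srcs=(5, 2)),
]
profile = profile_instructions(example)
```

In the example, the load reads address 100, which was last written by the second store back, so it contributes to n = 2.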
Statistical Profiling: Branch Statistics
• Six branch types
  – conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return
• Two accuracies are distinguished
  – branch prediction accuracy: the pipeline is refilled on a branch misprediction
  – branch target prediction accuracy: a single-cycle pipeline bubble on a correct branch prediction but a target misprediction
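A minimal sketch of how these branch statistics might be collected, assuming they are kept per branch type and that each dynamic branch has already been run through the fixed branch predictor. `BranchEvent`, its fields, and the short type names are hypothetical. `branch_penalty` restates the slide's penalty model; its `refill_cycles` parameter is an assumption standing in for whatever refill cost the trace-driven simulator actually charges.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six branch types from the slide (the short names are hypothetical).
BRANCH_TYPES = ("cond", "uncond", "call", "ind_jump", "ind_call", "return")


@dataclass
class BranchEvent:
    """Hypothetical record: one dynamic branch and the predictor's verdict."""
    btype: str            # one of BRANCH_TYPES
    dir_correct: bool     # taken/not-taken prediction was right
    target_correct: bool  # predicted target address was right


def profile_branches(events):
    """Per branch type: P(misprediction) and P(target miss | correct prediction)."""
    total = defaultdict(int)
    mispredicted = defaultdict(int)
    correct = defaultdict(int)
    target_miss = defaultdict(int)
    for e in events:
        if e.btype not in BRANCH_TYPES:
            raise ValueError(f"unknown branch type: {e.btype}")
        total[e.btype] += 1
        if not e.dir_correct:
            mispredicted[e.btype] += 1
        else:
            correct[e.btype] += 1
            if not e.target_correct:
                target_miss[e.btype] += 1
    return {
        t: {
            "p_mispredict": mispredicted[t] / total[t],
            "p_target_miss": target_miss[t] / correct[t] if correct[t] else 0.0,
        }
        for t in total
    }


def branch_penalty(dir_correct, target_correct, refill_cycles):
    """Penalty model from the slide: refill the pipeline on a misprediction,
    one bubble cycle on a correct prediction with a wrong target.
    refill_cycles is a hypothetical simulator parameter."""
    if not dir_correct:
        return refill_cycles
    return 0 if target_correct else 1


example = [
    BranchEvent("cond", True, True),
    BranchEvent("cond", False, False),
    BranchEvent("cond", True, False),
    BranchEvent("return", True, True),
]
stats = profile_branches(example)
```

The target miss probability is conditioned on a correct direction prediction, because a direction misprediction already triggers the larger refill penalty.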
Statistical Profiling: Cache Statistics
• D-cache statistics
  – L1 D-cache miss rate
  – L2 D-cache miss rate
• I-cache statistics
  – L1 I-cache miss rate
  – L2 I-cache miss rate
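The miss rates can be derived from per-access outcomes recorded while the real trace runs through the fixed cache hierarchy. In this sketch, the encoding of outcomes as "L1", "L2" or "MEM" is hypothetical, and the L2 miss rate is assumed to be a local miss rate (misses per L2 access); the slide does not say which definition is used. The same function applies to the I-side and the D-side.

```python
from collections import Counter


def miss_rates(levels):
    """levels: where each access was served: "L1", "L2" or "MEM" (hypothetical encoding).
    Returns (L1 miss rate, local L2 miss rate); the L2 rate is misses per
    L2 access, i.e. per L1 miss (an assumption)."""
    c = Counter(levels)
    unknown = set(c) - {"L1", "L2", "MEM"}
    if unknown:
        raise ValueError(f"unexpected levels: {unknown}")
    accesses = sum(c.values())
    l1_misses = c["L2"] + c["MEM"]
    l1_rate = l1_misses / accesses if accesses else 0.0
    l2_rate = c["MEM"] / l1_misses if l1_misses else 0.0
    return l1_rate, l2_rate


# Hypothetical D-cache outcomes: 6 L1 hits, 3 L2 hits, 1 main-memory access.
d_l1, d_l2 = miss_rates(["L1"] * 6 + ["L2"] * 3 + ["MEM"])
```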
Synthetic Trace Generation
• Instruction by instruction, through random number generation
• For each instruction, determine:
  – instruction type
  – number of operands
  – age of register operands
  – memory dependency
  – branch behavior
  – D-cache behavior
  – I-cache behavior
[Figure: example synthetic trace fragment (st, add, ld, br) with the annotations "br mispredicted", "D-cache miss" and "I-cache miss"]
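The generation step can be sketched as one random draw per attribute per instruction. This is an illustration under simplifying assumptions, not the authors' generator: the profile layout (`mix`, `num_operands`, `reg_age`, `mem_dep`, `p_mispredict`, `dcache`, `icache`) is hypothetical, the number of operands is drawn independently of the instruction class, and branch target mispredictions are omitted. `mem_dep` holds per-load probabilities, so its sum is the chance that a load depends on an earlier store at all.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SynthInstr:
    iclass: str
    src_ages: List[int]     # producer of each operand is this many instructions back
    mem_dep: Optional[int]  # load depends on the n-th previous store; None if independent
    mispredicted: bool      # branch direction mispredicted
    dcache: str             # "L1", "L2" or "MEM" for loads/stores; "" otherwise
    icache: str             # level that served the instruction fetch


def sample(rng, dist):
    """Draw one key from a {value: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def cache_level(rng, l1_miss, l2_miss):
    """Decide where an access hits, given an L1 miss rate and a local L2 miss rate."""
    if rng.random() >= l1_miss:
        return "L1"
    return "MEM" if rng.random() < l2_miss else "L2"


def generate(profile, n, seed=0):
    rng = random.Random(seed)  # fixed seed -> reproducible synthetic trace
    trace = []
    for _ in range(n):
        iclass = sample(rng, profile["mix"])
        n_ops = sample(rng, profile["num_operands"])
        ages = [sample(rng, profile["reg_age"]) for _ in range(n_ops)]
        mem_dep = None
        if iclass == "load" and rng.random() < sum(profile["mem_dep"].values()):
            mem_dep = sample(rng, profile["mem_dep"])
        mispredicted = iclass == "branch" and rng.random() < profile["p_mispredict"]
        dcache = cache_level(rng, *profile["dcache"]) if iclass in ("load", "store") else ""
        icache = cache_level(rng, *profile["icache"])
        trace.append(SynthInstr(iclass, ages, mem_dep, mispredicted, dcache, icache))
    return trace


example_profile = {
    "mix": {"add": 0.5, "load": 0.3, "branch": 0.2},
    "num_operands": {1: 0.5, 2: 0.5},
    "reg_age": {1: 0.6, 2: 0.3, 5: 0.1},
    "mem_dep": {1: 0.2, 2: 0.1},  # 30% of loads depend on an earlier store
    "p_mispredict": 0.05,
    "dcache": (0.1, 0.3),         # (L1 miss rate, local L2 miss rate)
    "icache": (0.02, 0.1),
}
synthetic = generate(example_profile, 1000, seed=42)
```

The resulting list would then be fed to the trace-driven simulator in place of the real trace; using an explicit seed makes each synthetic run repeatable.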
Methodology: microarchitecture
• Out-of-order processor
  – 8-issue and 16-issue
  – windows of 64 and 128 instructions
• McFarling branch predictor
• ‘small’ cache configuration
  – 8KB DM L1 I-cache, 8KB DM L1 D-cache, 64KB 2WSA unified L2 cache
• ‘large’ cache configuration
  – 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB 4WSA unified L2 cache
• Access time
  – L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2 cache (10 cycles), main memory (80 cycles)
• DM = direct-mapped; nWSA = n-way set-associative
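For scripting a design space exploration, the configurations above could be captured as plain data. In this sketch, DM is encoded as associativity 1 and nWSA as associativity n; the class and field names are hypothetical, the cache line size is not given on the slide and is therefore omitted, and the four design points pair a 64-entry window with 8-issue and a 128-entry window with 16-issue, as in the plot of IPC prediction error versus static instruction count.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cache:
    size_kb: int
    assoc: int    # 1 = direct-mapped (DM); n = n-way set-associative (nWSA)
    latency: int  # access time in cycles


@dataclass(frozen=True)
class CacheConfig:
    l1i: Cache
    l1d: Cache
    l2: Cache     # unified L2


@dataclass(frozen=True)
class CoreConfig:
    issue_width: int
    window: int   # instruction window size in entries
    caches: CacheConfig


MEM_LATENCY = 80  # main memory access time in cycles

SMALL = CacheConfig(l1i=Cache(8, 1, 1), l1d=Cache(8, 1, 2), l2=Cache(64, 2, 10))
LARGE = CacheConfig(l1i=Cache(32, 1, 1), l1d=Cache(64, 2, 2), l2=Cache(512, 4, 10))

# (window, issue width) pairs as plotted: w = 64 with i = 8, w = 128 with i = 16.
DESIGN_POINTS = [
    CoreConfig(issue_width=i, window=w, caches=c)
    for (w, i), c in product([(64, 8), (128, 16)], [SMALL, LARGE])
]
```

Frozen dataclasses keep each design point immutable and hashable, so they can serve directly as dictionary keys for collected IPC results.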
Methodology: benchmarks
• 8 SPECint95 benchmarks
• 5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5)
• 8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff)
• 4 MediaBench applications (g721, gs, gsm, mpeg2)
• 4 X graphics benchmarks (DooM, POVRay, Xanim, Quake)
• 2 TPC-D queries running on Postgres 6.3
• ~200 million instructions per trace
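Evaluation
• IPC prediction error:
  $$\text{IPC prediction error} = \frac{\mathrm{IPC}_{\text{real trace}} - \mathrm{IPC}_{\text{synthetic trace}}}{\mathrm{IPC}_{\text{real trace}}}$$
• $\mathrm{IPC}_{\text{real trace}}$ = IPC when running the real trace on the trace-driven simulator
• $\mathrm{IPC}_{\text{synthetic trace}}$ = IPC when running the synthetic trace generated from the statistical profile of the real trace
• Simulation speed: $s_{\mathrm{IPC}} / \bar{x}_{\mathrm{IPC}}$ (standard deviation over mean of the IPC) is less than 1% after simulating 1 million instructions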
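Both evaluation metrics are one-liners. The sketch below reads sIPC/xIPC as s/x̄, the sample standard deviation over the sample mean of IPC; which IPC samples enter that ratio (for instance, synthetic runs with different random seeds) is an assumption, since the slide does not say. All numbers in the example are hypothetical.

```python
import statistics


def ipc_prediction_error(ipc_real, ipc_synthetic):
    """(IPC_real - IPC_synthetic) / IPC_real, as defined on the slide."""
    return (ipc_real - ipc_synthetic) / ipc_real


def coefficient_of_variation(samples):
    """s / x-bar: sample standard deviation over sample mean (needs >= 2 samples)."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical numbers: the real trace reaches IPC 2.0, the synthetic trace 1.5.
err = ipc_prediction_error(2.0, 1.5)
# Hypothetical IPCs of three synthetic runs with different seeds.
cv = coefficient_of_variation([1.98, 2.00, 2.02])
```

With this sign convention, a synthetic trace that overestimates IPC yields a negative error.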
IPC prediction error (1)
[Figure: IPC prediction error (y-axis, -30% to 40%) per benchmark, grouped by suite: SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D. Configuration: 16-issue, 128-entry window, ‘small’ cache configuration. Two off-scale bars are labeled 157% and 135%; two annotations read "high D-cache miss rate".]
IPC prediction error (2)
[Figure: IPC prediction error (y-axis, -30% to 30%) per benchmark, grouped by suite: SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D. Configuration: 16-issue, 128-entry window, ‘large’ cache configuration.]
IPC prediction error vs. static instruction count
[Figure: scatter plot of IPC prediction error (y-axis, -40% to 160%) versus static instruction count (x-axis, number of instructions executed at least once, 0 to 160,000) for four configurations (w = window size, i = issue width): w = 64, i = 8 and w = 128, i = 16, each with the ‘small’ and the ‘large’ cache. Labeled points include DooM, Quake, gs (IBS), gcc, gcc (IBS), mpeg (IBS), groff, nroff, jpeg (IBS), verilog, sdet, TPC-D, vortex and go.]
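The static instruction count on the x-axis is the number of distinct instructions that appear in the trace. Assuming each trace record carries the instruction's program counter (a hypothetical field here), it can be computed in one streaming pass; memory use grows with the static footprint, not with the roughly 200 million dynamic instructions per trace.

```python
def static_instruction_count(pcs):
    """Number of distinct instructions executed at least once, given the
    program counter of every dynamic instruction in the trace."""
    return len(set(pcs))


# Hypothetical PC stream: a three-instruction loop body executed twice.
pcs = [0x400, 0x404, 0x408, 0x400, 0x404, 0x408]
count = static_instruction_count(pcs)
```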
Conclusion (1)
• Higher IPC prediction errors for applications with a smaller static instruction count:
  – MediaBench applications
  – SPECfp95 benchmarks
  – 2 X graphics benchmarks (POVRay and Xanim)
  – 5 SPECint95 benchmarks
Conclusion (2)
• Smaller IPC prediction errors for applications with a larger instruction footprint:
  – IBS system traces
  – TPC-D traces
  – 2 X graphics benchmarks (DooM and Quake)
  – 3 SPECint95 benchmarks (go, gcc, vortex)
• For these applications, the IPC prediction error lies between -1% and 25%
Conclusion (3)
• Statistical simulation is a useful fast simulation technique for commercial workloads
  – due to higher variability in the executed instructions
  – since commercial workloads have a larger instruction footprint
  – which makes a statistical technique more powerful