Statistical Simulation of Superscalar Architectures using Commercial Workloads Lieven Eeckhout and Koen De Bosschere Dept. of Electronics and Information Systems (ELIS) Ghent University, Belgium CAECW’01, January 21, 2001



Page 1: Statistical Simulation of Superscalar Architectures using Commercial Workloads

Statistical Simulation of Superscalar Architectures using Commercial Workloads

Lieven Eeckhout and Koen De Bosschere

Dept. of Electronics and Information Systems (ELIS), Ghent University, Belgium

CAECW’01, January 21, 2001


Outline

• Introduction
• Statistical Simulation
  – Statistical profiling
  – Synthetic trace generation
• Methodology
• Evaluation
• Conclusion


Introduction

• Architectural simulation
  – trace-driven or execution-driven
  – accurate
  – long simulation times
  – long traces to be stored
• Need for fast simulation techniques
  – take part of a full trace
  – analytical modeling
  – trace sampling
  – statistical simulation


Goal

• Previous work used SPEC benchmarks to evaluate statistical simulation

• In this talk we use both commercial and scientific workloads
  – SPECint, SPECfp, system traces, multimedia, X graphics, database


Statistical Simulation

• Three steps:
  – extract statistical profile from a program execution
  – generate synthetic trace from it
  – simulate on a trace-driven simulator
• Two major advantages:
  – statistical profile is more compact than full trace
  – fast simulation due to statistical nature, enabling design space exploration in limited time


Statistical Simulation

[Diagram: real trace (e.g. SPEC benchmark) → branch profiling, cache profiling, instruction profiling → branch statistics, cache statistics, instruction statistics (together: the statistical profile) → synthetic trace generator → synthetic trace → trace-driven simulator]


Statistical Profiling

• Microarchitecture-independent statistics
  – instruction statistics
• Microarchitecture-dependent statistics
  – branch statistics
  – cache statistics
• Result: statistical simulation can only be used to explore design options of the processor core (cache and branch predictor are fixed)


Statistical Profiling: Instruction Statistics

• Instruction mix (13 classes)
• Number of register operands
• Age of register operands
  – probability that register operand was produced n instructions before it in the trace (only RAW)
• Memory dependencies
  – probability that load is memory-dependent on the n-th store before it in the trace (only RAW)
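As an illustration of how the instruction mix and the age of register operands could be collected, here is a minimal sketch. The record layout `(opcode_class, dest_reg, src_regs)`, the function name `profile_instructions` and the toy trace are assumptions made for this example; they are not taken from the talk, and the real profiler distinguishes 13 instruction classes.

```python
from collections import Counter

def profile_instructions(trace):
    """Collect the instruction mix and the RAW register-dependency distances.

    `trace` is a list of (opcode_class, dest_reg, src_regs) tuples; this
    record layout is an assumption made for the sketch.
    """
    mix = Counter()      # opcode class -> number of occurrences
    age = Counter()      # distance n -> number of source operands at distance n
    last_writer = {}     # register -> trace index of its most recent producer
    for i, (op, dest, srcs) in enumerate(trace):
        mix[op] += 1
        for reg in srcs:
            # Operands whose producer lies outside the trace are skipped.
            if reg in last_writer:
                age[i - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
    total = sum(mix.values())
    operands = sum(age.values())
    mix_prob = {op: count / total for op, count in mix.items()}
    age_prob = {n: count / operands for n, count in age.items()} if operands else {}
    return mix_prob, age_prob

# Hypothetical four-instruction trace, for illustration only.
toy_trace = [
    ("load", "r1", []),
    ("add", "r2", ["r1"]),
    ("add", "r3", ["r1", "r2"]),
    ("store", None, ["r3"]),
]
mix_prob, age_prob = profile_instructions(toy_trace)
```

Only read-after-write distances are recorded, matching the "only RAW" restriction on the slide. Memory dependencies between loads and stores could be profiled the same way by tracking the last store to each address.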


Statistical Profiling: Branch Statistics

• Six branch types
  – conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return
• Distinction
  – branch prediction accuracy: refill pipeline on branch misprediction
  – branch target prediction accuracy: single-cycle bubble in pipeline on correct branch prediction but target misprediction
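The distinction between the two accuracies can be sketched as a penalty function. The refill cost `refill_cycles` is a hypothetical parameter (the slides do not give a pipeline depth), and `expected_branch_penalty` assumes the target accuracy is measured only over branches whose direction was predicted correctly.

```python
def branch_penalty(direction_correct, target_correct, refill_cycles):
    """Penalty in cycles for one branch, following the distinction above."""
    if not direction_correct:
        return refill_cycles   # misprediction: the pipeline is refilled
    if not target_correct:
        return 1               # right direction, wrong target: one-cycle bubble
    return 0

def expected_branch_penalty(direction_accuracy, target_accuracy, refill_cycles):
    """Average penalty per branch.

    Assumes target_accuracy is conditional on a correct direction prediction.
    """
    mispredict = 1.0 - direction_accuracy
    target_miss = direction_accuracy * (1.0 - target_accuracy)
    return mispredict * refill_cycles + target_miss * 1
```

For example, with a hypothetical 10-cycle refill, 90% direction accuracy and 50% target accuracy, the average penalty is 0.1 × 10 + 0.9 × 0.5 = 1.45 cycles per branch.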


Statistical Profiling: Cache Statistics

• D-cache statistics
  – L1 D-cache miss rate
  – L2 D-cache miss rate
• I-cache statistics
  – L1 I-cache miss rate
  – L2 I-cache miss rate
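A minimal sketch of how two miss rates could be turned into a per-access outcome during trace generation. It assumes the L2 miss rate is a local rate (L2 misses per access that already missed in L1); the slides do not say whether the profiled rate is local or global, and `sample_cache_level` is a hypothetical helper name.

```python
import random

def sample_cache_level(l1_miss_rate, l2_miss_rate, rng):
    """Return where one access is served: 'L1', 'L2' or 'memory'.

    Assumes l2_miss_rate is local, i.e. measured over L1 misses only.
    """
    if rng.random() >= l1_miss_rate:
        return "L1"
    if rng.random() >= l2_miss_rate:
        return "L2"
    return "memory"

rng = random.Random(1)  # fixed seed so the sketch is reproducible
outcomes = [sample_cache_level(0.05, 0.20, rng) for _ in range(10_000)]
```

The same helper would serve both the D-cache and the I-cache statistics, each with its own pair of rates.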


Synthetic Trace Generation

Instruction-by-instruction through random number generation

Determine
• instruction type
• number of operands
• age of register operands
• memory dependency
• branch behavior
• D-cache behavior
• I-cache behavior

[Diagram: example synthetic trace containing st, add, ld and br instructions, with the labels “mispredicted”, “D-cache miss” and “I-cache miss”]
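The generation loop can be sketched as follows. The profile layout (keys such as `mix`, `num_src`, `age`, `l1i_miss_rate`), the four opcode names and all probabilities are hypothetical values chosen for illustration. Memory dependencies, L2 behavior and branch target mispredictions are left out to keep the sketch short.

```python
import random

def draw(dist, rng):
    """Sample one key of `dist`, a dict mapping value -> probability."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values])[0]

def generate_synthetic_trace(profile, length, seed=0):
    """Generate `length` synthetic instructions from a statistical profile."""
    rng = random.Random(seed)
    trace = []
    for _ in range(length):
        op = draw(profile["mix"], rng)              # instruction type
        n_src = draw(profile["num_src"][op], rng)   # number of operands
        record = {
            "op": op,
            # RAW distance to the producer of each source operand
            "src_ages": [draw(profile["age"], rng) for _ in range(n_src)],
            "icache_miss": rng.random() < profile["l1i_miss_rate"],
        }
        if op == "ld":
            record["dcache_miss"] = rng.random() < profile["l1d_miss_rate"]
        elif op == "br":
            record["mispredicted"] = rng.random() < profile["br_mispredict_rate"]
        trace.append(record)
    return trace

# Hypothetical profile; a real one would come from the profiling step.
toy_profile = {
    "mix": {"add": 0.5, "ld": 0.2, "st": 0.1, "br": 0.2},
    "num_src": {"add": {1: 0.3, 2: 0.7}, "ld": {1: 1.0}, "st": {2: 1.0}, "br": {1: 1.0}},
    "age": {1: 0.5, 2: 0.3, 4: 0.2},
    "l1i_miss_rate": 0.01,
    "l1d_miss_rate": 0.05,
    "br_mispredict_rate": 0.08,
}
synthetic = generate_synthetic_trace(toy_profile, 1000, seed=42)
```

Seeding the generator makes a synthetic trace reproducible, and different seeds give independent traces drawn from the same profile.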


Methodology: microarchitecture

• Out-of-order processor
  – 8 and 16 issue
  – windows of 64 and 128 instructions
• McFarling branch predictor
• ‘small’ cache configuration
  – 8KB DM L1 I-cache, 8KB DM L1 D-cache, 64KB 2WSA unified L2 cache
• ‘large’ cache configuration
  – 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB 4WSA unified L2 cache
• Access time
  – L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2 cache (10 cycles), main memory (80 cycles)
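To give a feel for how strongly the D-cache miss rate weighs on performance with these access times, the sketch below computes an average load latency. It assumes the latencies add up along the miss path and that the L2 miss rate is local; neither is stated on the slide, and the miss rates in the example are made up.

```python
# Access times from the slide, in cycles.
L1D_CYCLES, L2_CYCLES, MEM_CYCLES = 2, 10, 80

def avg_load_latency(l1_miss_rate, l2_local_miss_rate):
    """Average data-access latency in cycles.

    Assumes a miss pays the next level's latency on top of the previous one,
    and that l2_local_miss_rate is measured over L1 misses only.
    """
    return L1D_CYCLES + l1_miss_rate * (L2_CYCLES + l2_local_miss_rate * MEM_CYCLES)

# Hypothetical miss rates: 10% in L1, half of those also miss in L2.
example = avg_load_latency(0.10, 0.50)   # 2 + 0.1 * (10 + 0.5 * 80) = 7 cycles
```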


Methodology: benchmarks

• 8 SPECint95 benchmarks
• 5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5)
• 8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff)
• 4 MediaBench applications (g721, gs, gsm, mpeg2)
• 4 X graphics benchmarks (DooM, POVRay, Xanim, Quake)
• 2 TPC-D queries running on Postgres 6.3
• ~200 million instructions / trace


Evaluation

• IPC prediction error = (IPC real trace - IPC synthetic trace) / IPC real trace
• IPC real trace = IPC when running real trace on trace-driven simulator
• IPC synthetic trace = IPC when running synthetic trace generated from the statistical profile of the real trace
• Simulation speed: s_IPC / x̄_IPC (standard deviation of IPC over mean IPC) less than 1% after simulating 1 million instructions
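Both metrics can be written down directly. The sketch reads the error as (IPC real trace - IPC synthetic trace) / IPC real trace and the convergence ratio as sample standard deviation over mean. Treating the IPC samples as results from several synthetic traces generated with different random seeds is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev

def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative IPC error of the synthetic trace with respect to the real trace."""
    return (ipc_real - ipc_synthetic) / ipc_real

def coefficient_of_variation(ipc_samples):
    """Sample standard deviation of IPC divided by mean IPC.

    Assumes the samples come from independent synthetic traces
    (for example, different random seeds); needs at least two samples.
    """
    return stdev(ipc_samples) / mean(ipc_samples)
```

With this sign convention a positive error means the synthetic trace yields a lower IPC than the real trace: `ipc_prediction_error(2.0, 1.5)` gives 0.25, i.e. 25%.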


IPC prediction error (1)

[Figure: bar chart of IPC prediction error (y-axis, -30% to 40%) per benchmark, grouped by suite: SPECint95 (li, gcc, compress, go, ijpeg, vortex, m88ksim, perl), SPECfp95, IBS, MediaBench, X graphics, TPC-D. Configuration: 16-issue, 128-entry window, ‘small’ cache configuration. Two values beyond the axis range are labeled 157% and 135%; two annotations read “high D-cache miss rate”.]


IPC prediction error (2)

[Figure: bar chart of IPC prediction error (y-axis, -30% to 30%) per benchmark, grouped by suite: SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D. Configuration: 16-issue, 128-entry window, ‘large’ cache configuration.]


IPC prediction error vs. static instruction count

[Figure: scatter plot. x-axis: static instruction count (number of instructions executed at least once), 0 to 160000. y-axis: IPC prediction error, -40% to 160%. Four series: w = 64, i = 8, 'small' cache; w = 128, i = 16, 'small' cache; w = 64, i = 8, 'large' cache; w = 128, i = 16, 'large' cache. Labeled points: DooM, Quake, gs (IBS), gcc, gcc (IBS), mpeg (IBS), groff, nroff, jpeg (IBS), verilog, sdet, TPC-D, vortex, go.]


Conclusion (1)

• Higher IPC prediction errors for applications with smaller static instruction count:
  – MediaBench applications
  – SPECfp95 benchmarks
  – 2 X graphics benchmarks (POVRay and Xanim)
  – 5 SPECint95 benchmarks


Conclusion (2)

• Smaller IPC prediction errors for applications with larger instruction footprint:
  – IBS system traces
  – TPC-D traces
  – 2 X graphics benchmarks (DooM and Quake)
  – 3 SPECint95 benchmarks (go, gcc, vortex)
• IPC prediction error between -1% and 25%


Conclusion (3)

• Statistical simulation is a useful fast simulation technique for commercial workloads
  – due to higher variability in instructions
  – since commercial workloads have larger instruction footprint
  – which makes a statistical technique more powerful