Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Geoff Stoker, University of Maryland

Page 1: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Geoff Stoker

University of Maryland

Page 2: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Why Deliberate Statistical Profiling?

• Move statistical profiling sample rate selection out of the realm of ad hoc

• Use mathematical model to balance change in statistical accuracy and effects of perturbation

• Allow measured program parameters and system context to inform sampling

• Likely to scale better

Page 3: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Abstract Model

[Figure: measured performance (y-axis) versus total # of samples (x-axis), relative to true performance. Annotations: Perturbation Error, Measurement Error, Best Possible Measurement Point.]

How much execution is attributable to foo?

Page 4: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Analytical Model

How much execution time is attributable to foo?

• t(n) – execution time of foo, after n samples
• n – number of samples
• o – overhead time cost per sample
• p – foo’s proportion of total program execution time
• T – uninstrumented (true) total program execution time
• z – standard score (z-value)

$$t(n) = pon + T\left(p \pm z\sqrt{\frac{p(1-p)}{n}}\right)$$

[Figure: measured performance (y-axis) versus total # of samples (x-axis), relative to true performance. Annotations: Perturbation Error, Measurement Error, Best Possible Measurement Point.]

Page 5: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Assumptions

• Time is a representative surrogate for all perturbation effects

• Hypergeometric distribution is appropriately approximated by a Normal (Gaussian) where n>=100; and M, N >> n

• Systematic sampling provides results similar to random sampling and occurs asynchronously with periodic events in the measured program

Page 6: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Example

• Predict the expected range of measurement results for foo (20% of a program’s execution) at the 95% confidence level
• o = 250 µseconds
• p = .20
• T = 300 seconds
• z = 1.96 for 95% confidence level

$$t(n) = pon + T\left(p \pm z\sqrt{\frac{p(1-p)}{n}}\right)$$
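The prediction for this example can be tabulated with a short script. This is a minimal sketch, not the author's tooling: the function name `predicted_interval` is hypothetical, and the sample counts (900 to 44,400 in steps of 1,500) are taken from the axis of the prediction plot.

```python
import math


def predicted_interval(n, T=300.0, p=0.20, o=250e-6, z=1.96):
    """Return (low, high) for t(n) = p*o*n + T*(p +/- z*sqrt(p*(1-p)/n)).

    Defaults are the example values: T in seconds, o in seconds per sample.
    """
    center = p * o * n + T * p                          # perturbed expected time of foo
    half_width = z * T * math.sqrt(p * (1.0 - p) / n)   # statistical uncertainty
    return center - half_width, center + half_width


if __name__ == "__main__":
    for n in range(900, 44_401, 1_500):
        low, high = predicted_interval(n)
        print(f"{n:>6}  {low:7.3f}  {high:7.3f}")
```

With few samples the interval is wide because the square-root term dominates; with many samples the `p * o * n` term pushes the whole interval upward.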

Page 7: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Analytical Model Prediction

[Figure: predicted 95% confidence interval time (sec) for foo (y-axis, 45 to 75) versus total # samples taken during program execution (x-axis, 900 to 44,400 in steps of 1,500). The minimum is marked.]

Page 8: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

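Example Continued

$$n = \sqrt[3]{\left(\frac{zT\sqrt{p(1-p)}}{2op}\right)^{2}}$$

$$n = \sqrt[3]{\left(\frac{(1.96)(300)\sqrt{.2(1-.2)}}{2(.000250)(.2)}\right)^{2}}$$

$$n = \sqrt[3]{\left(\frac{235.2}{.0001}\right)^{2}} = 17{,}686$$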

T = 300, z = 1.96, p = .2, o = .000250
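The arithmetic above can be checked numerically. The sketch below assumes the sample count on this slide minimizes the upper edge of the predicted interval; the function names `upper_bound` and `optimal_sample_count` are hypothetical.

```python
import math


def upper_bound(n, T, p, o, z):
    """Upper edge of the predicted interval: p*o*n + T*(p + z*sqrt(p*(1-p)/n))."""
    return p * o * n + T * (p + z * math.sqrt(p * (1.0 - p) / n))


def optimal_sample_count(T, p, o, z):
    """Sample count n = ((z*T*sqrt(p*(1-p))) / (2*o*p)) ** (2/3)."""
    ratio = z * T * math.sqrt(p * (1.0 - p)) / (2.0 * o * p)
    return ratio ** (2.0 / 3.0)


if __name__ == "__main__":
    n = optimal_sample_count(T=300.0, p=0.2, o=250e-6, z=1.96)
    print(round(n))  # about 17,686 for the example values
    print(upper_bound(n, 300.0, 0.2, 250e-6, 1.96))
```

Evaluating `upper_bound` slightly to either side of the returned count is a quick way to confirm that the value is a minimum rather than a maximum.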

Page 9: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Analytical Model Prediction

[Figure: predicted 95% confidence interval time (sec) for foo (y-axis, 45 to 75) versus total # samples taken during program execution (x-axis, 900 to 44,400 in steps of 1,500). The minimum is marked.]

Page 10: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Simulation

• 1,000,000 int array
• Each int = 300 µsec of execution
• 200,000 ints (p = .2) belong to foo
• Shuffle
• Draw random sample 1000x
– sample sizes: 900, then every 1,500 up to 44,400
• Sample rate 3/sec – 148/sec
• Assess 250 µsec/sample

[Diagram: array, shuffled array, and sample, with sampled positions marked s.]
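The simulation steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the author's code: the function name `simulate`, the seed, and the choice to report the share of trials that land inside the model's interval are all assumptions.

```python
import math
import random


def simulate(n, trials=1000, population=1_000_000, p=0.20,
             T=300.0, o=250e-6, z=1.96, seed=0):
    """Draw `trials` random samples of size n; return (times, coverage)."""
    rng = random.Random(seed)
    foo_ints = round(population * p)
    # 1 marks an int whose execution belongs to foo, 0 marks everything else.
    array = [1] * foo_ints + [0] * (population - foo_ints)
    rng.shuffle(array)

    measured_total = T + o * n          # each sample is assessed o seconds
    center = p * o * n + T * p
    half_width = z * T * math.sqrt(p * (1.0 - p) / n)

    times = []
    for _ in range(trials):
        hits = sum(rng.sample(array, n))            # sample without replacement
        times.append(hits / n * measured_total)     # time attributed to foo

    inside = sum(center - half_width <= t <= center + half_width for t in times)
    return times, inside / trials


if __name__ == "__main__":
    # Fewer trials than the slide's 1000 to keep the demo quick.
    for n in (900, 17_400, 44_400):
        times, coverage = simulate(n, trials=100)
        print(n, round(min(times), 2), round(max(times), 2), coverage)
```

Sampling without replacement matches the hypergeometric setting named in the assumptions; at these sizes the normal approximation should cover roughly 95% of trials.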

Page 11: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Simulation Results

[Figure: time (sec) calculated for foo (y-axis, 45 to 75) versus total # samples taken during simulated program execution (x-axis, 900 to 44,400 in steps of 1,500).]

Page 12: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Experiment

• Measured program
– executes ≈ 300 sec
– 1,000,000 function calls
– 300 µsec functions
– Compiled with -g
• Tool
– Forks measured program
– Initialization, signal handler, close-out
– 23 different sample rates: 3/sec to 166/sec

[Diagram: Signal Handler timeline with Setup, repeated samples (s), and Close-out.]
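The setup, signal handler, close-out structure can be illustrated with an in-process sampler. This is only a sketch under stated assumptions: it is Unix-only, it samples the current Python process with `SIGPROF` and `ITIMER_PROF` rather than forking a separate measured program as the slide's tool does, and the names `profile`, `foo`, and `bar` are hypothetical.

```python
import signal
import time
from collections import Counter


def foo(cpu_seconds):
    """Busy-loop until this process has used cpu_seconds more CPU time."""
    end = time.process_time() + cpu_seconds
    while time.process_time() < end:
        pass


def bar(cpu_seconds):
    end = time.process_time() + cpu_seconds
    while time.process_time() < end:
        pass


def profile(workload, interval):
    """Run workload() while sampling every `interval` seconds of CPU time."""
    counts = Counter()

    def handler(signum, frame):
        # Attribute the sample to the function that was interrupted.
        counts[frame.f_code.co_name] += 1

    # Setup: install the handler and start the profiling timer.
    previous = signal.signal(signal.SIGPROF, handler)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        workload()
    finally:
        # Close-out: stop the timer, then restore the previous handler.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        signal.signal(signal.SIGPROF, previous)
    return counts


if __name__ == "__main__":
    counts = profile(lambda: (foo(0.2), bar(0.8)), interval=0.01)
    total = sum(counts.values())
    print(counts, counts["foo"] / total if total else None)
```

The estimated share for `foo` is `counts["foo"] / total`; shrinking `interval` raises the sample count and also the time spent inside the handler.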

Page 13: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Experimental Results

[Figure: time (sec) calculated for foo (y-axis, 50 to 75) versus total # samples taken during program execution (x-axis, 23 measured sample counts from 896.55 to 51,774.86).]

Page 14: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Combined Results

[Figure: time (sec) calculated for foo (y-axis, 45.00 to 75.00) versus total # samples taken during program execution (x-axis, 900 to 44,400). Legend entries: Series11, simulation, Series1.]

Page 15: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Experiments with SPEC

• Omnetpp
– Runtime ≈ 340 sec
– 115 runs at 2 samples/sec to establish “truth”
– 10 runs at 28 different sample rates (hpcrun 4.9.9)
– Use Mean Absolute Percent Error (MAPE) to determine closest sample set to “truth”
• Bzip2
– Similar experiment procedure
– Runtime ≈ 90 sec
– Look at functions 1-4
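The MAPE comparison can be sketched as below. The slides do not say exactly which values are compared, so this assumes per-function execution-time proportions are compared against the "truth" proportions; the function names and all numbers in the demo are made up for illustration.

```python
def mape(truth, measured):
    """Mean Absolute Percent Error between paired truth and measured values."""
    if len(truth) != len(measured) or not truth:
        raise ValueError("truth and measured must be non-empty and equal length")
    errors = [abs((m - t) / t) for t, m in zip(truth, measured)]
    return 100.0 * sum(errors) / len(errors)


def closest_sample_set(truth, sample_sets):
    """Return the key of the sample set with the lowest MAPE against truth."""
    return min(sample_sets, key=lambda key: mape(truth, sample_sets[key]))


if __name__ == "__main__":
    # Hypothetical proportions for three functions; not data from the slides.
    truth = [0.1585, 0.0900, 0.0500]
    sample_sets = {
        "rate A": [0.1700, 0.0800, 0.0550],
        "rate B": [0.1590, 0.0910, 0.0495],
        "rate C": [0.1650, 0.0930, 0.0480],
    }
    for key, values in sample_sets.items():
        print(key, round(mape(truth, values), 3))
    print("closest:", closest_sample_set(truth, sample_sets))
```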

Page 16: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Omnetpp Analysis

[Figure, two panels: Execution Time (y-axis, 44.7 to 64.7) and % Execution Time (y-axis, 0.1385 to 0.1785), each versus Total Samples (x-axis, 0 to 140,000).]

Distribution of 10 runs at 28 different sampling intervals - cMessageHeap::shiftup(int)

Page 17: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Omnetpp Analysis

[Figure: Mean Absolute % Error (y-axis, 0.01 to 0.06) versus Total Samples (x-axis, 1,665 to 114,635).]

T=343, z=1.96, p=.1585, o=.000068
n=50,623
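As a check on the annotation, substituting the listed values into the sample-count expression from the Example Continued slide reproduces the stated n (intermediate values are rounded here):

```latex
n = \sqrt[3]{\left(\frac{(1.96)(343)\sqrt{.1585(1-.1585)}}{2(.000068)(.1585)}\right)^{2}}
  \approx \sqrt[3]{\left(\frac{245.52}{.000021556}\right)^{2}}
  \approx 50{,}623
```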

Page 18: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Bzip2 Analysis

[Figure: % Execution Time (y-axis, 0.105 to 0.185) versus Total Samples (x-axis, 0 to 30,000).]

mainGtU – distribution of 10 runs at 28 different sampling intervals

Page 19: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Bzip2 Analysis

[Figure: % Execution Time (y-axis, 0.11 to 0.19) versus Total Samples (x-axis, 0 to 30,000). Series: mainGtU, BZ2_decompress, BZ2_compressBlock.]

Comparison of 2nd – 4th most computationally expensive functions at selected sample intervals

Page 20: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Bzip2 Analysis

[Figure: Mean Absolute % Error (y-axis, 0.00000 to 0.12000) versus Total Samples (x-axis, 431 to 29,198). Series: mainGtU, BZ2_blockSort.]

T=87, z=1.96, p=.385, o=.000050
n=16,685
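The same substitution, with intermediate values rounded, reproduces the sample count listed for Bzip2:

```latex
n = \sqrt[3]{\left(\frac{(1.96)(87)\sqrt{.385(1-.385)}}{2(.000050)(.385)}\right)^{2}}
  \approx \sqrt[3]{\left(\frac{82.97}{.0000385}\right)^{2}}
  \approx 16{,}685
```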

Page 21: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Some Concerns of Oversampling

• Tool performance variation
• Misleading results
– Functions heavily perturbed by sampling (SPEC OMP examples)

Page 22: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Example Analytical Result

[Figure: performance (time) (y-axis, 60.00 to 68.00) versus total # of samples (x-axis, 632 to 61,304). Series: Tool1, Tool2. Annotations: Best Possible Measurement Point, Published Result.]

Page 23: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Apsi Analysis

[Figure: % Execution Time (y-axis, 0 to 0.12) versus Total Samples (x-axis, 0 to 140,000; sample intervals of 1 s, 500 ms, 200 ms, 100 ms, 80 ms, 60 ms, 50 ms, 40 ms, 25 ms). Series: radb3_, radf3_, radb2_, leapfr_.omp_fn.23.]

3rd – 6th most expensive functions; 11 run sets; 4 core machine

Page 24: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Fma3d Analysis

[Figure: % Execution Time (y-axis, 0.09 to 0.17) versus Total Samples on a log scale (x-axis, 1,000 to 1,000,000; sample intervals of 1 s, 500 ms, 200 ms, 100 ms, 50 ms, 25 ms, 10 ms). Series: scatter_element_nodal_forces.omp_fn.5, khplq_gradient_operator_.]

2nd and 3rd most expensive functions; 11 run sets; 4 core machine

Page 25: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Future Work

• More experiments with additional sequential and parallel SPEC programs
• Overhead calculation process
• Overhead function enhancement
• Deliberate statistical profiling methodology refinement

Page 26: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Conclusion

• Oversampling can generate misleading analysis

• Deliberate statistical profiling can lead to better results

• Questions??

Page 27: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Backup Slides

Page 28: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Determining Sample Size

• Sample size for determining proportions

– Jain: r is CI for p/100

$$n = \frac{z^{2}\,p(1-p)}{r^{2}}$$

– Lilja: r is CI for ci/p

$$n = \frac{z^{2}\,p(1-p)}{(rp)^{2}}$$

$$p \pm z\sqrt{\frac{p(1-p)}{n}}$$
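Both sample-size formulas are easy to evaluate directly. In the sketch below the function names are hypothetical, and reading r as an absolute half-width in the first formula and as a fraction of p in the second is an interpretation of the slide, not something it states outright.

```python
def sample_size_absolute(z, p, r):
    """n = z^2 * p * (1 - p) / r^2, with r read as an absolute half-width."""
    return z * z * p * (1.0 - p) / (r * r)


def sample_size_relative(z, p, r):
    """n = z^2 * p * (1 - p) / (r * p)^2, with r read as a fraction of p."""
    return z * z * p * (1.0 - p) / ((r * p) ** 2)


if __name__ == "__main__":
    z = 1.96  # 95% confidence level
    for p in (0.5, 0.2, 0.05):
        print(p,
              round(sample_size_absolute(z, p, 0.01)),
              round(sample_size_relative(z, p, 0.05)))
```

Because r is squared in the denominator, tightening r by a factor of 10 multiplies the required sample size by 100 in either form.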

Page 29: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Effective Sample Rates

[Figure: actual sample rate (y-axis, 0 to 500) versus target sample rate (x-axis, 0 to 500).]

Page 30: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Omnetpp “Truth”

[Figure: x-axis ticks from 660 to 720, y-axis ticks from 0.1185 to 0.1985. Axis labels are not recoverable.]

Page 31: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Analytical Model

$$t(n) = pon + T\left(p \pm z\sqrt{\frac{p(1-p)}{n}}\right)$$

$$t(n) = p_m T_m$$

$$T_m = T + on \qquad t(n) = p\,(on + T)$$

$$t(n) = pon + pT \qquad \sqrt{\frac{p(1-p)}{n}}$$

How much execution time is attributable to foo?

Page 32: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Sample Size and Accuracy

[Figure, three panels: sample size (y-axis) versus confidence levels (x-axis ticks: 90%, 95%, 99%, 100%, 100%). Legend entries: 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01.
Sample sizes required for +/- 1% accuracy: y-axis 0 to 40,000.
Sample sizes required for +/- .1% accuracy: y-axis 0 to 4,000,000.
Sample sizes required for +/- .01% accuracy: y-axis 0 to 400,000,000.]

One order of magnitude accuracy change, two orders of magnitude sample size change

Page 33: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Value of p(1-p)

[Figure: p(1-p) (y-axis, 0 to 0.25) versus value of p (x-axis, 0.01 to 0.97).]

$$t(n) = pon + T\left(p \pm z\sqrt{\frac{p(1-p)}{n}}\right)$$

Page 34: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Mathematical Model

$$t(n) = pon + T\left(p \pm z\sqrt{\frac{p(1-p)}{n}}\right)$$

$$t'(n) = po - \frac{zT\sqrt{p(1-p)}}{2\sqrt{n^{3}}}$$

$$n = \sqrt[3]{\left(\frac{zT\sqrt{p(1-p)}}{2op}\right)^{2}}$$
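The step from the derivative to the sample count is not spelled out on the slide. One way to fill it in, assuming the derivative is taken along the upper (+) edge of the interval, is:

```latex
\begin{aligned}
t_{+}(n) &= pon + Tp + zT\sqrt{p(1-p)}\;n^{-1/2} \\
t_{+}'(n) &= po - \tfrac{1}{2}\,zT\sqrt{p(1-p)}\;n^{-3/2} \\
t_{+}'(n) = 0 \;&\Longrightarrow\; n^{3/2} = \frac{zT\sqrt{p(1-p)}}{2op}
  \;\Longrightarrow\; n = \left(\frac{zT\sqrt{p(1-p)}}{2op}\right)^{2/3} \\
t_{+}''(n) &= \tfrac{3}{4}\,zT\sqrt{p(1-p)}\;n^{-5/2} > 0
\end{aligned}
```

The positive second derivative confirms that this n is a minimum of the upper edge.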

Page 35: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Sample Size and Accuracy cont

[Figure: sample count (y-axis, 0 to 18,000) versus p values (x-axis, 0.01 to 0.99). Series: 99%, 95%, 90%.]

Page 36: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Another Look at Statistical Results

[Figure, two panels: x-axis sample counts from 900 to 44,400 in steps of 1,500; y-axes 0 to 3 and 0 to 1000. Panel titles and axis labels are not recoverable.]

Page 37: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Sample of Sampling Practices

• 100 samples/sec [gprof, XProfiler]
• 200 samples/sec [T09]
• 1000 samples/sec [Intel VTune]
• 5200 samples/sec [DCPI, A91]
• 10,000 samples/sec [A05]
• 2.5% all memory ops [O05]
• 15 sec CPU, 10 sec mem analysis [R08]
• 1,000,000 mem accesses, skip 9,000,000 [W08]

Page 38: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

Current Practice

“To ensure good statistical coverage of profiled code, one must collect a large number of samples, either by measuring over a long interval, or by using a high sample rate.”

“Volume and accuracy are antithetical”

Page 39: Towards a Methodology for Deliberate Sample-Based Statistical Performance Analysis

References

• A91 – Anderson, Berc, Dean, Ghemawat, …
• A91 – Andersland, Casavant
• A05 – Azimi, Stumm, Wisniewski
• K71 – Knuth
• K05 – Kumar, Childers, Soffa
• L07 – Lahiri, Chatterjee, Maiti
• M92 – Malony, Reed, Wijshoff
• M07 – Malony, Shende, Morris, Wolf
• N04 – Najafzadeh, Chaiken
• O05 – Odom, Hollingsworth, DeRose, Ekanadham, …
• P95 – Miller, Callaghan, Cargille, Hollingsworth, …
• R08 – Reiss
• T09 – Tallent, Mellor-Crummey, Fagan
• V99 – Vetter, Reed
• W47 – Wald
• W08 – Weinberg, Snavely