University of Maryland
Towards a Methodology for Deliberate Sample-Based
Statistical Performance Analysis
Geoff Stoker
Why Deliberate Statistical Profiling?
• Move statistical profiling sample rate selection out of the realm of ad hoc
• Use mathematical model to balance change in statistical accuracy and effects of perturbation
• Allow measured program parameters and system context to inform sampling
• Likely to scale better
Abstract Model
[Figure: Measured Performance vs. total # of samples, showing true performance, Measurement Error, Perturbation Error, and the Best Possible Measurement Point]
How much execution is attributable to foo?
Analytical Model
How much execution time is attributable to foo?
• t(n) – execution time of foo, after n samples
• n – number of samples
• o – overhead time cost per sample
• p – foo's proportion of total program execution time
• T – uninstrumented (true) total program execution time
• z – standard score (z-value)

t(n) = p(T + no) ± zT·√(p(1−p)/n)
Assumptions
• Time is a representative surrogate for all perturbation effects
• Hypergeometric distribution is well approximated by a Normal (Gaussian) when n ≥ 100 and M, N ≫ n
• Systematic sampling provides results similar to random sampling and occurs asynchronously with periodic events in the measured program
Example
• Predict the expected range of measurement results for foo (20% of a program's execution) at the 95% confidence level
• o = 250 µseconds
• p = .20
• T = 300 seconds
• z = 1.96 for 95% confidence level

t(n) = p(T + no) ± zT·√(p(1−p)/n)
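The predicted range can be checked by evaluating the model directly. A minimal sketch (Python used purely for illustration; the function name is mine, not from the talk):

```python
import math

def predicted_interval(n, p, T, o, z):
    """t(n) = p(T + n*o) +/- z*T*sqrt(p*(1-p)/n), all times in seconds."""
    estimate = p * (T + n * o)                    # point estimate, perturbed by sampling overhead
    margin = z * T * math.sqrt(p * (1 - p) / n)   # confidence half-width
    return estimate - margin, estimate, estimate + margin

# parameters from the example: o = 250 usec, p = .20, T = 300 s, z = 1.96
low, est, high = predicted_interval(n=900, p=0.20, T=300.0, o=250e-6, z=1.96)
```

At n = 900 this predicts foo's measured time between roughly 52.2 and 67.9 seconds around a 60-second true value.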
Analytical Model Prediction
[Chart: predicted 95% confidence interval for foo's time (sec), roughly 45–75, vs. total # samples taken during program execution (900–44,400), with the interval minimum marked]
Example Continued

Minimizing the combined perturbation and measurement error gives:

n = ∛( (zT·√(p(1−p)) / (2po))² )

With T = 300, z = 1.96, p = .2, o = .000250:

n = ∛( (235.2 / .0001)² ) ≈ 17,686
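The optimum just computed can be reproduced numerically; a quick sketch (variable names mine):

```python
import math

# n = cbrt((z*T*sqrt(p*(1-p)) / (2*p*o))**2) with the example's parameters
T, z, p, o = 300.0, 1.96, 0.2, 250e-6
ratio = z * T * math.sqrt(p * (1 - p)) / (2 * p * o)  # 235.2 / .0001
n_opt = ratio ** (2.0 / 3.0)
```

The intermediate ratio is 235.2 / .0001 = 2,352,000, and n_opt comes out near 17,686 samples.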
Analytical Model Prediction
[Chart repeated: predicted 95% confidence interval for foo's time (sec) vs. total # samples (900–44,400), with the minimum marked]
Simulation
• 1,000,000-int array
• Each int = 300 µsec of exec
• 200,000 ints (p = .2) represent foo
• Shuffle
• Draw random sample 1000x
  – 900, then every 1,500 up to 44,400
• Sample rate 3/sec – 148/sec
• Assess 250 µsec/sample
[Figure: samples (s) drawn from the shuffled array]
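The simulation can be re-created roughly as below for a single draw at n = 900 (the talk repeated each sample size 1000×; the seed and structure here are mine):

```python
import random

# 1,000,000 slots, each representing 300 usec of execution;
# 200,000 of them (p = .2) belong to foo; 250 usec assessed per sample
SLOTS, FOO_SLOTS, UNIT, OVERHEAD = 1_000_000, 200_000, 300e-6, 250e-6

def simulate_once(n_samples, rng):
    """Shuffle the array, draw n_samples slots, estimate foo's time."""
    program = [1] * FOO_SLOTS + [0] * (SLOTS - FOO_SLOTS)   # 1 marks a foo slot
    rng.shuffle(program)
    p_hat = sum(rng.sample(program, n_samples)) / n_samples  # sampled proportion
    measured_total = SLOTS * UNIT + n_samples * OVERHEAD     # perturbed runtime
    return p_hat * measured_total                            # time credited to foo

est = simulate_once(900, random.Random(42))
```

With p = .2 and a true runtime of 300 s, a single 900-sample draw should land inside the analytical ±7.8 s band most of the time.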
Simulation Results
[Chart: time (sec) calculated for foo, roughly 45–75, vs. total # samples taken during simulated program execution (900–44,400)]
Signal Handler Experiment
• Measured program
  – executes ≈ 300 sec
  – 1,000,000 function calls
  – 300 µsec functions
  – Compiled with -g
• Tool
  – Forks measured program
  – Initialization, signal handler, close-out
  – 23 different sample rates: 3/sec to 166/sec
[Figure: tool timeline — setup, periodic samples (s) during the measured run, close-out]
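A loose in-process sketch of the sampling mechanism (POSIX interval timers; the actual tool forks the measured program, and everything named here is illustrative):

```python
import signal
import time

samples = []  # function names observed at each timer tick

def on_tick(signum, frame):
    # record which function the interrupted frame was executing
    samples.append(frame.f_code.co_name)

signal.signal(signal.SIGPROF, on_tick)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)  # ~100 samples/sec of CPU time

def foo():
    start = time.process_time()
    while time.process_time() - start < 0.3:  # burn ~0.3 s of CPU
        pass

foo()
signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling
```

The fraction of samples landing in foo, times the measured runtime, estimates foo's execution time.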
Experimental Results
[Chart: time (sec) calculated for foo, roughly 45–75, vs. total # samples taken during program execution (≈897 to ≈51,775)]
Combined Results
[Chart: simulation and experimental series overlaid — time (sec) calculated for foo vs. total # samples taken during program execution]
Experiments with SPEC
• Omnetpp
  – Runtime ≈ 340 sec
  – 115 runs at 2 samples/sec to establish "truth"
  – 10 runs at 28 different sample rates (hpcrun 4.9.9)
  – Use Mean Absolute Percent Error (MAPE) to determine closest sample set to "truth"
• Bzip2
  – Similar experiment procedure
  – Runtime ≈ 90 sec
  – Look at functions 1–4
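MAPE over per-function timings can be computed as below (the numbers are invented for illustration; only the metric itself is from the talk):

```python
def mape(truth, measured):
    """Mean Absolute Percent Error between paired per-function times."""
    return 100.0 * sum(abs(t - m) / t for t, m in zip(truth, measured)) / len(truth)

# hypothetical "truth" vs. one sampled run, three functions
err = mape([54.4, 20.1, 10.2], [52.2, 21.0, 10.2])
```

The run set with the lowest MAPE against "truth" is taken as the closest.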
Omnetpp Analysis
[Charts: execution time (44.7–64.7 sec) and % execution time (.1385–.1785) vs. total samples (0–140,000)]
Distribution of 10 runs at 28 different sampling intervals – cMessageHeap::shiftup(int)
Omnetpp Analysis
[Chart: Mean Absolute % Error (0–.06) vs. total samples]
T = 343, z = 1.96, p = .1585, o = .000068 → n = 50,623
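Plugging the slide's omnetpp parameters into the model's optimum-sample-count formula reproduces the quoted n (sketch; variable names mine):

```python
import math

# n = cbrt((z*T*sqrt(p*(1-p)) / (2*p*o))**2), omnetpp parameters from the slide
T, z, p, o = 343.0, 1.96, 0.1585, 68e-6
n_opt = (z * T * math.sqrt(p * (1 - p)) / (2 * p * o)) ** (2.0 / 3.0)  # ~50,623
```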
Bzip2 Analysis
[Chart: % execution time (.105–.185) vs. total samples (0–30,000)]
mainGtU – distribution of 10 runs at 28 different sampling intervals
Bzip2 Analysis
[Chart: % execution time (.11–.19) vs. total samples (0–30,000)]
Comparison of 2nd–4th most computationally expensive functions at selected sample intervals: mainGtU, BZ2_decompress, BZ2_compressBlock
Bzip2 Analysis
[Chart: Mean Absolute % Error (0–.12) vs. total samples — mainGtU, BZ2_blockSort]
T = 87, z = 1.96, p = .385, o = .000050 → n = 16,685
Some Concerns of Oversampling
• Tool performance variation
• Misleading results
  – Functions heavily perturbed by sampling (SPEC OMP examples)
Example Analytical Result
[Chart: performance (time) vs. total # of samples for Tool1 and Tool2, marking the Best Possible Measurement Point and the Published Result]
Apsi Analysis
[Chart: % execution time (0–.12) vs. total samples (0–140,000); sample intervals of 1 s, 500 ms, 200 ms, 100 ms, 80 ms, 60 ms, 50 ms, 40 ms, 25 ms]
3rd–6th most expensive functions; 11 run sets; 4-core machine: radb3_, radf3_, radb2_, leapfr_.omp_fn.23
Fma3d Analysis
[Chart: % execution time (.09–.17) vs. total samples, log scale 1,000–1,000,000; sample intervals of 1 s, 500 ms, 200 ms, 100 ms, 50 ms, 25 ms, 10 ms]
2nd and 3rd most expensive functions; 11 run sets; 4-core machine: scatter_element_nodal_forces.omp_fn.5, khplq_gradient_operator_
Future Work
• More experiments with additional sequential and parallel SPEC programs
• Overhead calculation process
• Overhead function enhancement
• Deliberate statistical profiling methodology refinement
Conclusion
• Oversampling can generate misleading analysis
• Deliberate statistical profiling can lead to better results
• Questions??
Backup Slides
Determining Sample Size
• Sample size for determining proportions, from the margin of error p ± z·√(p(1−p)/n)
  – Jain (r is the absolute CI half-width on p): n = z²p(1−p) / r²
  – Lilja (r is the CI half-width relative to p): n = z²p(1−p) / (rp)²
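The two conventions imply very different counts for the same nominal r; an illustrative comparison at p = .2, r = .01, 95% confidence (the absolute-vs-relative reading of r is my interpretation of the slide):

```python
def n_jain(p, r, z=1.96):
    """Jain: r read as an absolute half-width on the proportion p."""
    return z * z * p * (1 - p) / (r * r)

def n_lilja(p, r, z=1.96):
    """Lilja: r read as a half-width relative to p itself."""
    return z * z * p * (1 - p) / (r * p) ** 2

n_abs = n_jain(0.2, 0.01)   # half-width of one percentage point
n_rel = n_lilja(0.2, 0.01)  # half-width of 1% of p
```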
Effective Sample Rates
[Chart: actual sample rate vs. target sample rate, both 0–500]
Omnetpp "Truth"
[Chart: % execution time (.1185–.1985) across the "truth" runs (x-axis 660–720)]
Analytical Model

t(n) = p_m · T_m (measured proportion × measured total time)
T_m = T + no
t(n) = p(T + no) ± zT·√(p(1−p)/n)

How much execution time is attributable to foo?
Sample Size and Accuracy
[Charts: sample sizes required for ±1% accuracy (up to ~40,000), ±.1% accuracy (up to ~4,000,000), and ±.01% accuracy (up to ~400,000,000), across confidence levels 90%–100% and p values .01–.5]
One order of magnitude accuracy change → two orders of magnitude sample size change
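The quoted two-orders-of-magnitude jump follows from n ∝ 1/r² in the proportion sample-size formula; a quick check (illustrative):

```python
def n_required(p, r, z=1.96):
    # n = z^2 * p * (1 - p) / r^2
    return z * z * p * (1 - p) / (r * r)

n_1pct = n_required(0.5, 0.01)    # +/- 1% accuracy at p = .5
n_01pct = n_required(0.5, 0.001)  # +/- .1% accuracy: 100x the samples
```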
Value of p(1−p)
[Chart: p(1−p), ranging 0–.25, as p varies from .01 to .97]

t(n) = p(T + no) ± zT·√(p(1−p)/n)
Mathematical Model

t(n) = p(T + no) ± zT·√(p(1−p)/n)

Minimizing the error e(n) = pno + zT·√(p(1−p)/n), set

e′(n) = po − zT·√(p(1−p)) / (2n^(3/2)) = 0

which gives

n = ∛( (zT·√(p(1−p)) / (2po))² )
Sample Size and Accuracy (cont.)
[Chart: sample counts (0–18,000) at 90%, 95%, and 99% confidence for p values from .01 to .99]
Another Look at Statistical Results
[Charts: the statistical results re-plotted across total sample counts 900–44,400]
Sample of Sampling Practices
• 100 samples/sec [gprof, XProfiler]
• 200 samples/sec [T09]
• 1000 samples/sec [Intel VTune]
• 5200 samples/sec [DCPI, A91]
• 10,000 samples/sec [A05]
• 2.5% all memory ops [O05]
• 15 sec CPU, 10 sec mem analysis [R08]
• 1,000,000 mem accesses, skip 9,000,000 [W08]
Current Practice
“To ensure good statistical coverage of profiled code, one must collect a large number of samples, either by measuring over a long interval, or by using a high sample rate.”
“Volume and accuracy are antithetical”
References
• A91 – Anderson, Berc, Dean, Ghemawat, …
• A91 – Andersland, Casavant
• A05 – Azimi, Stumm, Wisniewski
• K71 – Knuth
• K05 – Kumar, Childers, Soffa
• L07 – Lahiri, Chatterjee, Maiti
• M92 – Malony, Reed, Wijshoff
• M07 – Malony, Shende, Morris, Wolf
• N04 – Najafzadeh, Chaiken
• O05 – Odom, Hollingsworth, DeRose, Ekanadham, …
• P95 – Miller, Callaghan, Cargille, Hollingsworth, …
• R08 – Reiss
• T09 – Tallent, Mellor-Crummey, Fagan
• V99 – Vetter, Reed
• W47 – Wald
• W08 – Weinberg, Snavely