7/27/2019 SummarizingPerformanceData Confidence Intervals.pdf
Summarizing Performance Data
Confidence Intervals
Important
Easy to Difficult
Warning: some mathematical content
Contents
1. Summarized data
2. Confidence Intervals
3. Independence Assumption
4. Prediction Intervals
5. Which Summarization to Use?
1. Summarizing Performance Data
How do you quantify:
Central value
Dispersion (Variability)
[Figure: old vs. new]
Histogram is one answer
[Figure: histograms, old vs. new]
ECDFs allow easy comparison
[Figure: ECDFs, old vs. new]
Summarized Measures
Median, Quantiles
Median
Quartiles
P-quantiles
Mean and standard deviation
Mean
Standard deviation
What is the interpretation of standard deviation?
A: if data is normally distributed, with 95% probability, a new data sample lies in the interval $\mu \pm 1.96\,\sigma$
Example
[Figure: mean and standard deviation; quantiles]
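Coefficient of Variation Summarizes Variability
Scale free
Second order
For a data set with n samples: $\mathrm{CoV} = s / m$, where $m$ is the sample mean and $s$ the sample standard deviation
Exponential distribution: CoV = 1
What does CoV = 0 mean?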
Lorenz Curve Gap is an Alternative to CoV
For a data set with n samples: $\mathrm{gap} = \frac{1}{2nm}\sum_{i=1}^{n} |x_i - m|$, where $m$ is the sample mean
Scale free, index of unfairness
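Jain's Fairness Index is an Alternative to CoV
Quantifies fairness of x: $\mathrm{JFI} = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$
Ranges from 1 (all $x_i$ equal) to 1/n (maximum unfairness)
Fairness and variability are two sides of the same coin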
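The three indices can be compared on the same data with a few lines of code. This is a minimal sketch, assuming the usual definitions (CoV = s/m with the 1/n convention for s, gap = mean absolute deviation divided by 2m, JFI = (sum of x)^2 / (n times sum of x^2)); the function name `variability_indices` is hypothetical.

```python
import math


def variability_indices(xs):
    """Return (CoV, Lorenz curve gap, JFI) for a list of non-negative values.

    Assumes the sample mean is strictly positive.
    """
    n = len(xs)
    m = sum(xs) / n
    # Standard deviation with the 1/n convention (an assumption of this sketch).
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    cov = s / m
    # Lorenz curve gap: mean absolute deviation divided by twice the mean.
    gap = sum(abs(x - m) for x in xs) / (2 * n * m)
    # Jain's fairness index.
    jfi = sum(xs) ** 2 / (n * sum(x * x for x in xs))
    return cov, gap, jfi
```

With the 1/n convention, JFI = 1 / (1 + CoV^2), which is one way to see that fairness and variability carry the same information.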
Lorenz Curve
Old code, new code: is JFI larger? Gap?
Gini's index is also used; Def: 2 x area between diagonal and Lorenz curve
More or less equivalent to Lorenz curve gap
[Figure: Lorenz curve, showing the Lorenz curve gap and the diagonal of perfect equality (fairness)]
Which Summarization Should One Use?
There are (too) many synthetic indices to choose from
Traditional measures in engineering are standard deviation, mean and CoV
Traditional measures in computer science are mean and JFI
JFI is equivalent to CoV
In economy, gap and Gini's index (a variant of Lorenz curve gap)
Statisticians like medians and quantiles (robust to statistical assumptions)
We will come back to the issue after discussing confidence intervals
2. Confidence Interval
Do not confuse with prediction interval
Quantifies uncertainty about an estimation
[Figure: mean and standard deviation; quantiles]
Confidence Intervals for Mean of Difference
Mean reduction =
0 is outside the confidence intervals for mean and for median
Confidence interval for median
Computing Confidence Intervals
This is simple if we can assume that the data comes from an iid model
Independent Identically Distributed
CI for median
Is the simplest of all
Robust: always true provided iid assumption holds
Confidence Interval for Median, level 95%
n = 31
n = 32
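Example n = 100, confidence interval for median
The median estimate is $\frac{x_{(50)} + x_{(51)}}{2}$
Confidence level 95%: 50 - 9.8 ≈ 40 and 51 + 9.8 ≈ 60; a confidence interval for the median is $[x_{(40)}; x_{(60)}]$
Confidence level 99%: 50 - 12.8 ≈ 37 and 51 + 12.8 ≈ 64; a confidence interval for the median is $[x_{(37)}; x_{(64)}]$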
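The order-statistic indices can also be computed exactly from the Binomial(n, 1/2) distribution rather than from a large-n approximation. The sketch below assumes a symmetric choice k = n + 1 - j and the coverage condition P(j <= B <= k - 1) >= level; the function name `median_ci_indices` is hypothetical, and the exact symmetric answer may differ by one index from an approximate formula.

```python
from math import comb


def median_ci_indices(n, level=0.95):
    """Return 1-based (j, k) so that [x_(j), x_(k)] is a CI for the median.

    Picks the largest j, with k = n + 1 - j, such that
    P(j <= B <= k - 1) >= level for B ~ Binomial(n, 1/2).
    Returns None when n is too small for the requested level.
    """
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    best = None
    for j in range(1, n // 2 + 1):
        k = n + 1 - j
        coverage = sum(pmf[j:k])  # terms i = j, ..., k - 1
        if coverage >= level:
            best = (j, k)  # coverage shrinks as j grows; keep the last success
    return best
```

Only the iid assumption is used here: the count of samples below the true median is binomial whatever the underlying distribution.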
CI for mean and Standard Deviation
This is another method, and the most commonly used one
But it requires some assumptions to hold, and may be misleading if they do not hold
There is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.
CI for mean, asymptotic case
If central limit theorem holds (in practice: n is large and distribution is not wild)
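Example
n = 100; 95% confidence level
CI for mean: $\hat{\mu}_n \pm 1.96\, s_n / \sqrt{n}$, where $\hat{\mu}_n$ is the sample mean and $s_n$ the sample standard deviation
amplitude of CI decreases in $1/\sqrt{n}$
compare to prediction interval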
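As a sketch of the asymptotic case, the interval is the sample mean plus or minus a normal quantile times s divided by the square root of n. The function name `mean_ci` and the 1/n convention for the variance estimate are assumptions of this example; `statistics.NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist


def mean_ci(xs, level=0.95):
    """Asymptotic confidence interval for the mean of an iid sample."""
    n = len(xs)
    m = sum(xs) / n
    # Standard deviation with the 1/n convention (assumption of this sketch).
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    # Two-sided normal quantile, about 1.96 for level = 0.95.
    eta = NormalDist().inv_cdf((1 + level) / 2)
    half_width = eta * s / math.sqrt(n)
    return m - half_width, m + half_width
```

Quadrupling n halves the width, which is the 1/sqrt(n) decrease of the amplitude.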
Normal Case
Assume data comes from an iid + normal distribution
Useful for very small data samples (n
Example
n=100;95%confidencelevelCIformean:
CIforstandarddeviation:
sameasbeforeexceptsinsteadof forallninsteadof1.98forn=100
Inpracticeboth(normalcaseandlargenasymptotic)arethesameifn>30Butlargenasymptoticdoesnotrequirenormalassumption
Tables in [WeberTables]
Standard Deviation: n or n - 1?
Bootstrap Percentile Method
A heuristic that is robust (requires only iid assumption)
But be careful with heavy tail, see next
but tends to underestimate CI
Simple to implement with a computer
Idea: use the empirical distribution in place of the theoretical (unknown) distribution
For example, with confidence level = 95%:
the data set is S
Do r = 1 to r = 999
(replay experiment) Draw n bootstrap replicates with replacement from S
Compute sample mean T_r
Bootstrap percentile estimate is (T_(25), T_(975))
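The procedure above can be sketched directly. The function name `bootstrap_percentile_ci`, its parameters and the fixed seed are assumptions of this example; with 999 replicates and level 95% it returns the 25th and 975th ordered replicate values, as on the slide.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(data, stat=mean, replicates=999, level=0.95, seed=0):
    """Bootstrap percentile confidence interval for stat(data), iid data assumed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Replay the experiment: resample n values with replacement, recompute stat.
    t = sorted(stat(rng.choices(data, k=n)) for _ in range(replicates))
    # Rank of the lower bound, e.g. 25 when replicates = 999 and level = 0.95.
    r0 = round((replicates + 1) * (1 - level) / 2)
    lower = t[r0 - 1]           # T_(r0)
    upper = t[replicates - r0]  # T_(replicates + 1 - r0)
    return lower, upper
```

Passing another function as `stat` (for example one that computes a fairness index) gives a confidence interval for that statistic under the same iid assumption.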
Example: Compiler Options
Does data look normal? No
Methods 2.3.1 and 2.3.2 give same result (n > 30)
Method 2.3.3 (Bootstrap) gives same result
=> Asymptotic assumption valid
Confidence Interval for Fairness Index
Use bootstrap if data is iid
We test a system 10000 times for failures and find 200 failures: give a 95% confidence interval for the failure probability.
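Let $X_i = 0$ or $1$ (success / failure); so we are estimating the mean. The asymptotic theory applies (no heavy tail)
$\hat{\mu}_n = 0.02$
$s_n^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \hat{\mu}_n^2 = \hat{\mu}_n (1 - \hat{\mu}_n) = 0.02 \times 0.98 \approx 0.02$, so $s_n \approx 0.14$
Confidence interval: $0.02 \pm 0.003$ at level 0.95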
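The arithmetic can be checked with a short sketch. It assumes the asymptotic formula for the mean applied to 0/1 outcomes, so the standard deviation is sqrt(p(1 - p)); the function name `failure_probability_ci` is hypothetical.

```python
import math


def failure_probability_ci(failures, trials, eta=1.96):
    """Return (estimate, half_width) of the asymptotic CI for a failure probability."""
    p = failures / trials
    # For 0/1 data the variance (1/n convention) is p * (1 - p).
    s = math.sqrt(p * (1 - p))
    half_width = eta * s / math.sqrt(trials)
    return p, half_width


p, half_width = failure_probability_ci(200, 10000)
print(f"{p:.2f} +/- {half_width:.3f}")
```

This approximation is only reasonable when both failures and successes are numerous; it breaks down when the count of failures is zero or very small.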
We test a system 10 times for failures and find 0 failures: give a 95% confidence interval for the failure probability.
1. [0; 0]
2. [0; 0.1]
3. [0; 0.11]
4. [0; 0.21]
5. [0; 0.31]
Confidence Interval for Success Probability
Problem statement: want to estimate the probability of failure; observe n outcomes; no failure; confidence interval?
Example: we test a system 10 times for failures and find 0 failures: give a 95% confidence interval for the failure probability.
Is this a confidence interval for the mean? (explain why)
The general theory does not give good results when the mean is very small
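One way to handle the zero-failure case is an exact binomial bound. The sketch below uses the two-sided Clopper-Pearson upper limit for zero observed failures, 1 - (alpha/2)^(1/n); choosing this particular construction is an assumption of the example, and the function name `upper_bound_zero_failures` is hypothetical.

```python
def upper_bound_zero_failures(n, level=0.95):
    """Upper end of an exact two-sided CI [0, p0] when 0 failures are seen in n trials.

    p0 solves (1 - p0) ** n = alpha / 2 with alpha = 1 - level
    (Clopper-Pearson construction, assumed here).
    """
    alpha = 1 - level
    return 1 - (alpha / 2) ** (1 / n)


print(round(upper_bound_zero_failures(10), 2))
```

By contrast, plugging zero failures into the asymptotic formula for the mean gives a zero-width interval, which illustrates why the general theory is not adequate here.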
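We test a system 10 000 times for failures and find 200 failures: give a 95% confidence interval for the failure probability.
Apply formula 2.29 ($200 \ge 6$ and $10000 - 200 \ge 6$): $0.02 \pm \frac{1.96}{10000}\sqrt{200\,(1 - 0.02)} \approx 0.02 \pm \frac{1.96}{10000} \times 10\sqrt{2} \approx 0.02 \pm 0.003$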
Take Home Message
Confidence interval for median (or other quantiles) is easy to get from the Binomial distribution
Requires iid
No other assumption
Confidence interval for the mean
Requires iid
And
Either if data sample is normal and n is small
Or data sample is not wild and n is large enough
The bootstrap is more robust and more general but is more than a simple formula to apply
Confidence interval for success probability requires special attention when success or failure is rare
We need to verify the assumptions
3. The Independence Assumption
Confidence Intervals require that we can assume that the data comes from an iid model
Independent Identically Distributed
How do I know if this is true?
Controlled experiments: draw factors randomly with replacement
Simulation: independent replications (with random seeds)
Else: we do not know; in some cases we will have methods for time series
What does independence mean?
Example
Pretend data is iid: CI for mean is [69; 69.8]
Is this biased?
[Figure: data; ACF]
What happens if data is not iid?
If data is positively correlated
Neighbouring values look similar
Frequent in measurements
CI is underestimated: there is less information in the data than one thinks
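A quick check for positive correlation is the sample autocorrelation at small lags. The following is a minimal sketch using the usual biased estimator (lagged covariance divided by the lag-0 sum of squares); the function name `autocorrelation` is hypothetical.

```python
def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (0 <= lag < len(xs))."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((x - m) ** 2 for x in xs)
    ck = sum((xs[t] - m) * (xs[t + lag] - m) for t in range(n - lag))
    return ck / c0
```

Values close to 0 at all lags are compatible with independence; values well above 0 at small lags mean that neighbouring samples look similar, so an interval computed as if the data were iid is too narrow.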
4. Prediction Interval
CI for mean or median summarize
Central value + uncertainty about it
Prediction interval summarizes variability of data
Prediction Interval based on Order Statistic
Assume data comes from an iid model
Simplest and most robust result (not well known, though):
Prediction Interval for small n
For n = 39, [xmin, xmax] is a prediction interval at level 95%
For n 18
For n = 10 we have a prediction interval [xmin, xmax] at level 81%
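The two levels quoted above are consistent with the standard order-statistic result that, for iid data with a continuous distribution, [x_(j), x_(k)] is a prediction interval at level (k - j)/(n + 1). That result is assumed here, and the function name `minmax_prediction_level` is hypothetical; with j = 1 and k = n the level is (n - 1)/(n + 1).

```python
from fractions import Fraction


def minmax_prediction_level(n):
    """Level of the prediction interval [xmin, xmax] built from n iid samples."""
    return Fraction(n - 1, n + 1)


print(float(minmax_prediction_level(39)))  # 38/40
print(float(minmax_prediction_level(10)))  # 9/11
```

Solving (n - 1)/(n + 1) >= 0.95 gives n >= 39, so 39 is the smallest sample size for which the sample range reaches level 95%.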
Prediction Interval based on Mean
If data is not normal, there is no general result; bootstrap can be used
If data is assumed normal, how do CI for mean and Prediction Interval based on mean compare?
$\hat{\mu}_n$ = estimated mean, $s_n^2$ = estimated variance
CI for mean at level 95% = $\hat{\mu}_n \pm 1.96\, s_n / \sqrt{n}$
Prediction interval at level 95% = $\hat{\mu}_n \pm 1.96\, s_n$
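A small numeric sketch makes the comparison concrete, assuming the large-n normal formulas (half-width 1.96 s / sqrt(n) for the confidence interval and 1.96 s for the prediction interval); the function name `ci_and_pi_half_widths` is hypothetical.

```python
import math


def ci_and_pi_half_widths(s, n, eta=1.96):
    """Return (CI half-width for the mean, prediction-interval half-width)."""
    ci_half = eta * s / math.sqrt(n)
    pi_half = eta * s
    return ci_half, pi_half


ci_half, pi_half = ci_and_pi_half_widths(s=1.0, n=100)
print(ci_half, pi_half)
```

The confidence interval shrinks as n grows, while the prediction interval does not: their ratio is sqrt(n), because one describes uncertainty about the mean and the other the variability of the data itself.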
Re-Scaling
Many results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to have it look more normal?
Ex: log of the data instead of the data
A generic transformation used in statistics is the Box-Cox transformation: $b_s(x) = \frac{x^s - 1}{s}$ for $s \neq 0$, and $b_0(x) = \ln x$
Continuous in s
s = 0: log
s = -1: 1/x
s = 1: identity
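The transformation is short to write down in code. This sketch assumes the standard one-parameter Box-Cox form, (x^s - 1)/s for s different from 0 and log x for s = 0, defined for positive x; the function name `box_cox` is hypothetical.

```python
import math


def box_cox(x, s):
    """Box-Cox transform of a positive value x with exponent s."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if s == 0:
        return math.log(x)  # limit of (x**s - 1) / s as s -> 0
    return (x ** s - 1) / s
```

With this form, s = 1 gives x - 1 and s = -1 gives 1 - 1/x, that is, the identity and 1/x up to an affine change of scale, which does not affect how normal the data looks.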
Prediction Intervals for File Transfer Times
[Figure: prediction intervals based on mean and standard deviation, on mean and standard deviation on rescaled data, and on order statistic]
Which Summarization Should I Use?
Two issues
Robustness to outliers
Compactness
QQ plot is a common tool for verifying assumptions
Normal QQ plot
X-axis: standard normal quantiles
Y-axis: ordered statistic of sample: $x_{(1)} \le \dots \le x_{(n)}$
If data comes from a normal distribution, the QQ plot is close to a straight line (except for end points)
Visual inspection is often enough
If not possible or doubtful, we will use tests later
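The points of a normal QQ plot can be computed without a plotting library. This sketch assumes plotting positions i/(n + 1) for the theoretical quantiles (one common convention among several); the function name `normal_qq_points` is hypothetical, and `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return a list of (standard normal quantile, ordered sample value) pairs."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # The i-th smallest value is paired with the normal quantile at i / (n + 1).
    return [
        (std_normal.inv_cdf(i / (n + 1)), x)
        for i, x in enumerate(ordered, start=1)
    ]
```

Plotting these pairs and checking whether they fall near a straight line is the visual inspection described above; applying the same function to rescaled data (for example its logarithm) shows whether rescaling helps.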
QQ Plots of File Transfer Times
Take Home Message
The interpretation of standard deviation as a measure of variability is meaningful if the data is normal (or close to normal). Else, it is misleading. The data is best rescaled first.
5. Which Summarization to Use?
Issues
Robustness to outliers
Distribution assumptions
A Distribution with Infinite Variance
[Figure: true mean and true median compared with CI based on standard deviation, CI based on bootstrap, and CI for median]
Outlier in File Transfer Time
Robustness of Conf/Prediction Intervals
[Figure: confidence intervals (mean + std dev, CI for median, geometric mean) and prediction intervals (order statistic, based on mean + std dev, based on mean + std dev + re-scaling), with outlier removed and outlier present]
Fairness Indices
Confidence Intervals obtained by Bootstrap
How?
JFI is very dependent on one outlier
As expected, since JFI is essentially CoV, i.e. standard deviation
Gap is sensitive, but less
Does not use squaring; why?
Compactness
If normal assumption (or, for CI, asymptotic regime) holds, the mean and standard deviation are more compact
two values give both: CIs at all levels, prediction intervals
Derived indices: CoV, JFI
In contrast, CIs for median do not give information on variability
Prediction interval based on order statistic is robust (and, IMHO, best)
Take Home Message
Use methods that you understand
Mean and standard deviation make sense when data sets are not wild
Close to normal, or not heavy tailed and large data sample
Use quantiles and order statistics if you have the choice
Rescale
Questions