ASTR633 Astrophysical Techniques STATISTICS PRIMER


(Original notes by Pat Henry, edited by Mike Liu and Jonathan Williams)

Astronomers cannot avoid statistics:

1. We always deal with probabilities: observing time is limited, so we want to observe just long enough that there is a high probability we have seen what we sought. Sample sizes are inevitably finite. What is the probability that a particular interesting effect is real?

2. No experimentally determined quantity is of use unless it has an error associated with it. We need statistics to calculate errors.

Here are some of the most common situations where astronomers use statistics:

1. Detection of a signal: is the gamma-ray burst visible in the optical? Have I detected an emission line?

2. Are two quantities correlated? How significantly?

3. Estimate the parameters of a model. What are the errors on the parameters? Was the model reasonable in the first place?

4. Comparison of samples with
   (a) the predictions of a model (do they agree?)
   (b) each other (are they from the same population?)

It generally comes down to common sense.

1. If it doesn't look right, it probably isn't.
2. There are lots of ways to screw up, but only one way to be right.
3. Most results are not revolutionary. Before you start drafting a press release, make sure you haven't made a mistake...

1. Sample and Parent Population

Suppose we make N measurements, x_i, of a quantity x (e.g., multiple measurements of a stellar magnitude, the declinations of N stars in the galactic plane, etc.). These N measurements are called a sample. The parent population is a hypothetical infinite set of measurements of which our original N is assumed to be a random subset. The parent population is the "truth", which we can never obtain. The fundamental task of statistics is to infer the properties of the parent population from the sample. (Note: this is the so-called "frequentist" interpretation of statistics. We'll discuss the alternative Bayesian point of view later.)

1.1 Probability Density Function

The parent population leads to the probability density function, p(x), where p(x) dx is the probability that an x_i will be in the range [x, x+dx]:

$p(x)\,dx = A(x)/A_{\rm tot}$

i.e. the fraction of the total area falling in that interval. Clearly, the total area is unity probability:

$\int_{-\infty}^{\infty} p(x)\,dx = 1$

Probability that $a < x < b$ = $\int_a^b p(x)\,dx$

Probability is positive-definite: $p(x) \ge 0$

All quantities of interest are obtained from integrals of p(x). If x is a discrete variable, then the integrals turn into sums.

1.2 Properties of the Parent Population

Location

Mean:

$\mu \equiv \int_{-\infty}^{\infty} x\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} x_i$

Median:

$\mu_{1/2}$, such that $\frac{1}{2} = \int_{-\infty}^{\mu_{1/2}} p(x)\,dx = \int_{\mu_{1/2}}^{\infty} p(x)\,dx$, or for a discrete variable, $\sum_{x_i \le \mu_{1/2}} p(x_i) = \sum_{x_i \ge \mu_{1/2}} p(x_i)$

Mode:

$\mu_{\rm mode}$, such that $p(\mu_{\rm mode}) \ge p(x \ne \mu_{\rm mode})$

For a symmetrical PDF, these 3 are usually equal. For an asymmetrical PDF, the median is more stable than the mean: a single bad point can bias the mean by a large amount but hardly changes the median. (This is an example of a "robust" statistic.)

However, the median is slightly noisier (or "less efficient") than the mean, with a variance that is π/2 = 1.57x larger (for large N).

Width

Variance:

$\sigma^2 \equiv \int_{-\infty}^{\infty}(x-\mu)^2\,p(x)\,dx = \int_{-\infty}^{\infty}x^2\,p(x)\,dx - \mu^2 \equiv \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2$

Standard deviation: $\sigma \equiv (\text{variance})^{1/2}$

Moments:

$\mu_k \equiv \int_{-\infty}^{\infty}(x-\mu)^k\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^k$

$\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = \sigma^2$

skewness $\equiv \beta_1 \equiv \mu_3/\mu_2^{3/2}$ : deviation from symmetry; = 0 for a symmetric PDF, > 0 for a tail extending to positive x, and vice versa.

kurtosis $\equiv \beta_2 \equiv \mu_4/\mu_2^2 - 3$ : degree of "peakiness", where the $-3$ makes the kurtosis of a Gaussian = 0.

1.3 Estimating Properties of the Parent Population from a Sample

Sample Mean

$\bar{x} \equiv \frac{1}{N}\sum_{i=1}^{N} x_i$

• Is $\bar{x}$ a good estimator of μ? What is its average value?

$\langle\bar{x}\rangle = \int_{-\infty}^{\infty}\bar{x}\,p(x)\,dx = \int_{-\infty}^{\infty}\left[\frac{1}{N}\sum_{i=1}^{N}x_i\right]p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\int_{-\infty}^{\infty}x_i\,p(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\mu = \mu$

Thus $\bar{x}$ is an unbiased estimator of μ. On average, it gets the right answer.

What is the variance of $\bar{x}$?

$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^{N}x_i\right) = \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}(x_i) = \frac{1}{N^2}\sum_{i=1}^{N}\sigma^2 = \frac{\sigma^2}{N}$

Its square root, $\sigma/\sqrt{N}$, is known as the "standard deviation of the mean" or the "standard error".

Sample Variance

$s^2 \equiv \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2 = \frac{1}{N-1}\sum_{i=1}^{N}x_i^2 - \frac{N}{N-1}\bar{x}^2$

Note the 1/(N−1) instead of 1/N. With this choice the expected value of s² is σ², so s² is an unbiased estimator of the variance (left as an exercise for the reader...).
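As a quick numerical illustration (a minimal sketch; the array x below is made-up data), numpy computes these estimators directly:

import numpy as np

x = np.array([10.3, 9.8, 10.1, 10.6, 9.9])   # hypothetical repeated measurements

xbar = x.mean()                # sample mean, an unbiased estimator of mu
s = x.std(ddof=1)              # sample standard deviation with the 1/(N-1) normalization
sem = s / np.sqrt(x.size)      # standard error of the mean, ~ sigma/sqrt(N)
print(xbar, s, sem)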

2. Errors

No measurement of x_i is infinitely precise; it has an error associated with it. What are the errors on various computed quantities?

2.1 Analytical considerations

Consider some function of u and v, f(u, v). The propagation of errors can be determined as follows (expanding about the mean values $\bar{u}$, $\bar{v}$):

$f(u,v) = f(\bar{u},\bar{v}) + (u-\bar{u})\frac{\partial f}{\partial u} + (v-\bar{v})\frac{\partial f}{\partial v} + \text{higher order terms}$

$f - \bar{f} = (u-\bar{u})\frac{\partial f}{\partial u} + (v-\bar{v})\frac{\partial f}{\partial v} + \text{higher order terms}, \quad\text{where } \bar{f} = f(\bar{u},\bar{v})$

$\sigma_f^2 = \lim_{N\to\infty}\frac{1}{N}\sum(f-\bar{f})^2 \approx \lim_{N\to\infty}\frac{1}{N}\sum\left[(u_i-\bar{u})\frac{\partial f}{\partial u} + (v_i-\bar{v})\frac{\partial f}{\partial v}\right]^2$

$\sigma_f^2 = \lim_{N\to\infty}\frac{1}{N}\sum\left[(u_i-\bar{u})^2\left(\frac{\partial f}{\partial u}\right)^2 + (v_i-\bar{v})^2\left(\frac{\partial f}{\partial v}\right)^2 + 2\left(\frac{\partial f}{\partial u}\right)\left(\frac{\partial f}{\partial v}\right)(u_i-\bar{u})(v_i-\bar{v})\right]$

$\sigma_f^2 = \left(\frac{\partial f}{\partial u}\right)^2\sigma_u^2 + \left(\frac{\partial f}{\partial v}\right)^2\sigma_v^2 + 2\left(\frac{\partial f}{\partial u}\right)\left(\frac{\partial f}{\partial v}\right)\sigma_{uv}$

where we have defined the covariance

$\sigma_{uv} = \lim_{N\to\infty}\frac{1}{N}\sum(u_i-\bar{u})(v_i-\bar{v})$

Notes:
- Expansion to first order only, so only true for "small" errors (e.g. σ_u/u ~ σ_v/v ~ 10%), i.e. in the regime where a 1st-order Taylor series is adequate.
- The equations are similar if more than 2 variables are involved.
- The first two terms dominate since they are positive definite, while the 3rd term (the covariance) can have some cancellation, as it can be negative. It is zero for uncorrelated u & v (which is often the case).

See Problem Set #2 for examples.
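As a quick check (a sketch with made-up numbers, not one of the problem-set examples): for f = u/v with uncorrelated errors, the formula above gives σ_f/f = [(σ_u/u)² + (σ_v/v)²]^(1/2), which can be compared against a brute-force simulation:

import numpy as np

u, sig_u = 50.0, 5.0          # hypothetical measurement u +/- sigma_u
v, sig_v = 20.0, 2.0          # hypothetical measurement v +/- sigma_v

f = u / v
# first-order propagation with sigma_uv = 0 (uncorrelated):
sig_f = f * np.sqrt((sig_u/u)**2 + (sig_v/v)**2)

# brute-force check: draw many (u, v) pairs and look at the spread of f
rng = np.random.default_rng(0)
fs = rng.normal(u, sig_u, 100000) / rng.normal(v, sig_v, 100000)
print(sig_f, fs.std())        # the two estimates should roughly agree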

2.2 Monte-Carlo error propagation

Empirically determine errors by creating fake datasets. Do this if you wish to avoid making any assumptions about the underlying distribution.

a) If the errors are well characterized, jiggle each data point using Gaussian random numbers.

b) Bootstrap a new sample by picking at random with replacement.

Fit the model to each fake dataset. Do this many times and make a histogram of the best-fit parameters. Compute the mean (or median) and standard deviation of the parameters. This readily extends to any function of the parameters.
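A minimal sketch of the bootstrap version (b), using the simplest possible "model" (the mean of a sample); the data array is made up:

import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100.0, 15.0, 50)          # hypothetical sample of 50 measurements

nboot = 10000
means = np.empty(nboot)
for k in range(nboot):
    fake = rng.choice(data, size=data.size, replace=True)  # resample with replacement
    means[k] = fake.mean()                  # "fit" the model to the fake dataset

print(data.mean(), means.std())             # bootstrap error on the mean; compare to s/sqrt(N)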

3. Commonly used Probability Density Functions (PDFs)

3.1 Uniform Distribution

$p(x; a, b) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$

$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}$

This simple distribution is used as a tool in studies of general continuous distributions and is particularly valuable in non-parametric statistics, e.g., generating random values from a specific PDF, as explained in the following. For any given function $y = F(x)$, the PDFs of x and y are related by

$|p(x)\,dx| = |p(y)\,dy|$   (the fundamental transformation law of probability)

Now consider the specific case:

$p(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & x < 0 \text{ or } x > 1 \end{cases}$

Then (for 0 < x < 1),

$x = \int_0^x p(x')\,dx' = \int_{y(0)}^{y(x)} p(y')\,dy'$

This states that the cumulative distribution of p(y) is uniformly distributed. The utility is best shown by example (see problem set). Consider a stellar IMF, $\xi(M) \propto M^{-2.35}$, between 1 and 100 M_sun. Calculate the cumulative PDF,

$\mathrm{CDF}(M) = \frac{\int_1^M \xi(M')\,dM'}{\int_1^{100}\xi(M')\,dM'} = C\,(1 - M^{-1.35})$

where $C\,[= (1 - 100^{-1.35})^{-1}]$ is the constant that normalizes the CDF to unity at 100 M_sun. The CDF has a uniform distribution, so we generate a random set of uniformly distributed numbers {x_0, x_1, x_2, ...} and invert to a mass distribution,

$M_i = (1 - x_i/C)^{-1/1.35}$
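A sketch of this inversion in Python (slope and mass limits as in the example above):

import numpy as np

rng = np.random.default_rng(1)
C = 1.0 / (1.0 - 100.0**-1.35)          # normalization so CDF(100 Msun) = 1

x = rng.uniform(0.0, 1.0, 100000)       # uniform deviates = values of the CDF
M = (1.0 - x / C)**(-1.0 / 1.35)        # invert the CDF to get masses in [1, 100] Msun

print(M.min(), M.max(), np.median(M))   # median mass is ~1.7 Msun for this IMF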

3.2 Binomial Distribution

Recall that the number of different ways n items can be taken x at a time is ("n choose x"):

$\binom{n}{x} \equiv \frac{n!}{x!\,(n-x)!}$

where $n! = n(n-1)(n-2)\cdots 1$ and $0! = 1$.

Consider an observation with only 2 possible outcomes (e.g. red galaxies or blue galaxies; planet detection or non-detection). Let the probability of obtaining one outcome (red galaxy, planet detection = "success") be p, and the probability of obtaining the other outcome (blue galaxy, planet non-detection = "failure") be q = 1 − p. The probability of obtaining x successes in n observations is (# of ways to get x successes) × (probability of one such set of x successes):

$f(x; n, p, q) \equiv \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$

This does indeed add up to 1, as it should:

$\sum_{x=0}^{n}\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} = (p+q)^n = 1^n = 1$

Mean

$\mu \equiv \int_{-\infty}^{\infty} x f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{n}x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$

$\mu = \sum_{x=0}^{n}\frac{n!}{x!\,(n-x)!}\left(p\frac{\partial}{\partial p}p^x\right)q^{n-x} = p\frac{\partial}{\partial p}\left[\sum_{x=0}^{n}\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}\right]$

$\mu = p\frac{\partial}{\partial p}(p+q)^n = pn(p+q)^{n-1} = np$

Variance

$\sigma^2 \equiv \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{n}x^2\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} - \mu^2$

$= \left(p\frac{\partial}{\partial p}\right)\sum_{x=0}^{n}x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} - \mu^2$

$= \left(p\frac{\partial}{\partial p}\right)\left[pn(p+q)^{n-1}\right] - \mu^2$

$= pn + p^2 n(n-1) - (np)^2 = np(1-p) = npq$

E.g., suppose we roll ten dice. What is the probability that x dice land with the 1 up? If we throw one die, the probability of landing with 1 up is p = 1/6. If we throw 10 dice, the probability of x of them landing with 1 up is given by the binomial distribution with n = 10 and p = 1/6:

$p\left(x;\, n=10,\, p=\tfrac{1}{6},\, q=\tfrac{5}{6}\right) = \frac{10!}{x!\,(10-x)!}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{10-x}$

$\mu = np = 10\times\tfrac{1}{6} = 1.67, \qquad \sigma = \sqrt{npq} = \sqrt{10\times\tfrac{1}{6}\times\tfrac{5}{6}} = 1.18$
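scipy has the binomial distribution built in, so the dice example can be checked directly (a minimal sketch):

from scipy.stats import binom

n, p = 10, 1.0/6.0
dist = binom(n, p)

print(dist.pmf(0), dist.pmf(1), dist.pmf(2))   # P(x ones) for x = 0, 1, 2
print(dist.mean(), dist.std())                 # np = 1.67, sqrt(npq) = 1.18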

3.3 Poisson Distribution

The binomial distribution gets hard to evaluate for large n (because of the factorials), and often in such experiments neither the number of possible events n nor the probability p is known. We need an expression that describes the statistics of detecting an average number of events per time interval (μ = np).

Example: you are using a Geiger counter to measure the emissions from a block of radioactive material. You don't know the total number of atoms (= n, the number of trials) or the decay probability (= p), but you do measure the mean count rate μ. You want to know the probability distribution associated with μ.

Let the average rate at which photons arrive be λ per second. Let P(x, λt) be the probability of x photons arriving during an interval t. Then the probability of 1 photon arriving in dt is

$P(1, \lambda\,dt) = \lambda\,dt$

for very small dt. The probability of ≥ 2 arriving is negligibly small if dt is small enough. So:

$P(0, \lambda\,dt) = 1 - P(1,\lambda\,dt) - P(2,\lambda\,dt) - P(3,\lambda\,dt) - \ldots = 1 - \lambda\,dt$

Now consider an arbitrary number of counts in the time interval (t + dt), which can be written as 2 terms, based on what happened in the "last" instant (a photon arrived or no photon arrived):

$P(x, \lambda(t+dt)) = P(x-1,\lambda t)\,P(1,\lambda\,dt) + P(x,\lambda t)\,P(0,\lambda\,dt)$

$P(x,\lambda t) + \frac{dP(x,\lambda t)}{dt}\,dt = P(x-1,\lambda t)\,\lambda\,dt + P(x,\lambda t)\,(1-\lambda\,dt)$

$\frac{dP(x,\lambda t)}{dt} = \lambda P(x-1,\lambda t) - \lambda P(x,\lambda t)$

The solution to this differential equation is

$P(x,\lambda t) = \frac{(\lambda t)^x}{x!}\,e^{-\lambda t}$

Setting μ = λt gives us the Poisson distribution

$p(x;\mu) = \frac{\mu^x}{x!}\,e^{-\mu}$

where x = # of events (an integer) and μ = the mean number of events expected in the interval (= λt, the count rate times the interval).

Let's check that this is properly normalized:

$\sum_{x=0}^{\infty}f(x;\mu) = \sum_{x=0}^{\infty}\frac{\mu^x}{x!}\,e^{-\mu} = e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^x}{x!} = e^{-\mu}e^{\mu} = 1$

Mean

$\mu \equiv \int_{-\infty}^{\infty}x f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{\infty}x\,\frac{\mu^x}{x!}\,e^{-\mu} = e^{-\mu}\sum_{x=0}^{\infty}x\,\frac{\mu^x}{x!} = e^{-\mu}\left[0 + \sum_{x=1}^{\infty}\frac{\mu^x}{(x-1)!}\right] = e^{-\mu}\,\mu\sum_{x=1}^{\infty}\frac{\mu^{x-1}}{(x-1)!} = e^{-\mu}\mu e^{\mu} = \mu$

Variance

$\sigma^2 \equiv \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx \;\longrightarrow\; \sum_{x=0}^{\infty}x^2\,\frac{\mu^x}{x!}\,e^{-\mu} - \mu^2 = e^{-\mu}\mu\,(\mu e^{\mu} + e^{\mu}) - \mu^2 = \mu$

Famous result that $\sigma = \sqrt{\mu}$: e.g., if we detect N photons, then the error is $\pm\sqrt{N}$. Note that some care is required for very low count rates, where N = 0 can occur commonly. When N = 0, it would be silly to say the uncertainty is also 0. The uncertainty in N counts is the square root of the expected number of counts, $\sigma(N) = \sqrt{\mu}$ where $\mu = \langle N\rangle$.
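A short sketch using scipy.stats.poisson, for a hypothetical expected value of μ = 3.2 counts:

from scipy.stats import poisson

mu = 3.2                                   # hypothetical expected number of counts
print(poisson.pmf(0, mu))                  # probability of detecting zero counts, e^-mu ~ 0.04
print(poisson.mean(mu), poisson.std(mu))   # mean = mu, sigma = sqrt(mu)
print(poisson.interval(0.9, mu))           # central 90% range of observed counts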

3.4 Gaussian (or Normal) Distribution

The Normal distribution is an approximation to the binomial distribution for the limiting case where the number of possible different outcomes is large and the probability of success is large enough that np ≫ 1. It is also the limiting case of the Poisson distribution when μ becomes large.

$p(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$

It is the most important distribution in statistics!

It is left as an exercise to the reader to show that the expression is indeed properly normalized, that the mean is μ, and that the variance is σ². Binomial and Poisson PDFs tend toward Gaussians as their mean increases (μ ≳ 20). You'll also see references to the "log-normal distribution": this is when the log of a variable has a Gaussian (aka normal) distribution.

3.4.1 Central Limit Theorem

Suppose that n independent random variables x_i, of unknown probability density function, are identically distributed with the same mean μ and variance σ² (both finite). As n becomes large, the distribution of $\bar{x} = \frac{1}{n}\sum x_i$ tends to a Gaussian distribution with mean μ and variance σ²/n. Closely related to the law of large numbers, it allows quantitative probabilities to be estimated in experimental situations involving an average.

3.4.2 Confidence Limits of the Gaussian Distribution

The probability that a measurement will fall within ±nσ of the mean is

$P(n\sigma) = \int_{\mu-n\sigma}^{\mu+n\sigma} p(x;\mu,\sigma)\,dx$

n    P(|x − μ| < nσ)
1    68.27%
2    95.45%
3    99.73%
4    99.9937%
5    99.999943%

In other words, for a Gaussian, 100 ± 20 means:
- There is a 68% probability that 80 ≤ μ ≤ 120 (with a 16% chance that μ is larger and 16% that it is smaller)
- There is a 95.5% chance that 60 ≤ μ ≤ 140
- There is a 99.7% chance that 40 ≤ μ ≤ 160

Note that the probability table is for a "2-tailed" probability, e.g. if we want to know the probability of another trial landing within a given range. We also care about the "1-tailed" probability.

For example: what is the chance that the 100 ± 20 result is consistent with 0? The separation from 0 is 5 standard deviations, so the chance is

$\frac{1 - P(5\sigma)}{2} = \frac{1 - 0.99999943}{2} = \frac{5.7\times10^{-7}}{2} \approx 2.9\times10^{-7}$

We say that 100 ± 20 is a "5-sigma measurement."
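These numbers come straight out of scipy.stats.norm (a minimal sketch reproducing the table and the one-tailed 5σ value above):

from scipy.stats import norm

for n in (1, 2, 3, 4, 5):
    two_tail = norm.cdf(n) - norm.cdf(-n)    # P(|x - mu| < n sigma)
    print(n, two_tail)

print(norm.sf(5))     # one-tailed P(> 5 sigma) ~ 2.9e-7, as in the example above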

100 ± 50 is a 2σ measurement, and it is consistent with 0. The 2σ (95.5% confidence) upper limit is 100 + (2 × 50) = 200. This means that the measurement is ≤ 200 at the 97.7% confidence level.

By convention (though this is field-dependent), measurements which are < 5σ are treated with caution, and those which are < 3σ are seen as consistent with zero. Why such stringent limits as 3-5σ? Because any observer

• is biased, e.g. terminated the observation when the expected result was found;
• can't estimate σ perfectly, e.g. chose a "nice quiet piece" of the data or a "source-free region" to get the background.

You can similarly calculate confidence intervals for different distributions such as the binomial and Poisson.

Example: Detection of a source and measurement of its brightness. We measure 101 counts in a 1'' radius aperture centered on the source and 1800 background counts in an annulus ranging from 1'' to 5'' radius.

(a) How confident are we that there is a source there? We need to assess how different the source is from the background, i.e. is the source statistically distinct from just fluctuations in the background counts?

Expected background in the 1'' aperture

$= \frac{1800}{\pi[(5'')^2 - (1'')^2]}\,\pi(1'')^2 = \frac{1800}{24} = 75 \text{ counts}$

$\sigma_{\rm bg}(1'') = \sqrt{75} = 8.66$ from Poisson statistics (we can treat this as Gaussian-like because we have > 20 counts).

Net counts above background = 101 − 75 = 26

Significance of detection above background = 26/8.66 = 3.0σ. Confidence = (99.73 + 0.27/2) = 99.87% (the extra 0.135 is because it's single-tailed).

(b) How bright is it? Total flux = $101 \pm \sqrt{101}$

Background in the 1'' aperture $= \frac{1800 \pm \sqrt{1800}}{\pi[(5'')^2 - (1'')^2]}\,\pi(1'')^2 = \frac{1800 \pm 42.4}{24} = 75 \pm 1.8$ counts

Net flux = 101 − 75 = 26
Uncertainty on the net flux = $\sqrt{101 + 1.8^2} = 10.2$ (only marginally larger because of the uncertainty in the sky background)
Significance of the measurement = 26/10.2 ≈ 2.5σ
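The same aperture arithmetic in a few lines of Python (a sketch using the numbers from this example):

import numpy as np

src_counts = 101.0                         # counts in the 1" aperture
bkg_counts = 1800.0                        # counts in the 1"-5" annulus
area_ratio = (5.0**2 - 1.0**2) / 1.0**2    # annulus area / aperture area = 24

bkg = bkg_counts / area_ratio              # expected background = 75 counts
sig_bkg = np.sqrt(bkg)                     # 8.66, for the detection question
net = src_counts - bkg                     # 26 counts above background
print(net / sig_bkg)                       # ~3.0 sigma detection

sig_net = np.sqrt(src_counts + (np.sqrt(bkg_counts) / area_ratio)**2)
print(net / sig_net)                       # ~2.5 sigma measurement of the flux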

It is more difficult to measure the flux than to determine existence: establishing existence is only a 1-bit measurement (yes vs no), whereas measuring the flux requires more information.

3.4.3 Bivariate Gaussian Distribution

As its name suggests, it's the joint Gaussian distribution of two variables.

$p(x,y;\,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\,e^{-z^2/[2(1-\rho^2)]}$

where

$z^2 = \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}$

and the Pearson correlation is

$\rho = \mathrm{Cor}(x,y) = \frac{\mathrm{Cov}(x,y)}{\sigma_x\sigma_y}, \qquad \mathrm{Cov}(x,y) = \sigma_{xy}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$

In matrix notation,

$p(x,y;\,\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2}D^{T}C^{-1}D\right)$

where

$D = \begin{pmatrix} x-\mu_x \\ y-\mu_y \end{pmatrix}, \qquad C = \begin{pmatrix} \sigma_x^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_y^2 \end{pmatrix} = \text{the "covariance matrix"}$

The contours look more circular for small ρ and more elliptical for large ρ. This readily generalizes to more than 2 variables.

4. Maximum Likelihood of Gaussian Variables

Suppose we have N data points y_i with errors σ_i that are independent of each other and Gaussian distributed about ȳ, which you do not know but want to find out. The probability of any y_i is

$P(y_i)\,\Delta y = \frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-(y_i-\bar{y})^2/2\sigma_i^2}\,\Delta y$

The probability of all N y_i's is just the product of the individual probabilities:

$P(\{y_i\})\,\Delta y^N = \prod_{i=1}^{N}P(y_i)\,\Delta y = \prod_{i=1}^{N}\frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-(y_i-\bar{y})^2/2\sigma_i^2}\,\Delta y$

Define the likelihood statistic $\mathcal{L} \equiv -2\ln P$ (i.e. −2 times the log of the probability), which for our case is:

$\mathcal{L} = \sum_{i=1}^{N}\frac{(y_i-\bar{y})^2}{\sigma_i^2} + 2\sum_{i=1}^{N}\ln\!\left(\sigma_i\sqrt{2\pi}\right) - 2N\ln\Delta y$

The principle of maximum likelihood: the most probable estimate of ȳ occurs when $\mathcal{L}$ is minimized.

[Excerpted textbook figure and discussion, belonging with Sec. 3.4.3: contours of the bivariate normal PDF are ellipses, $x^2/\sigma_X^2 - 2\rho x y/(\sigma_X\sigma_Y) + y^2/\sigma_Y^2 = \text{constant}$; for ρ = 0 the ellipse axes are horizontal and vertical, while for ρ ≠ 0 they are tilted, with the orientation set by the sign of ρ.]

4.1 Weighted Mean

The most likely value of ȳ is at the minimum of $\mathcal{L}$:

$\frac{\partial\mathcal{L}}{\partial\bar{y}} = \sum_{i=1}^{N}\frac{\partial}{\partial\bar{y}}\left[\frac{(y_i-\bar{y})^2}{\sigma_i^2}\right] = -2\sum_{i=1}^{N}\frac{(y_i-\bar{y})}{\sigma_i^2} = 0$

$\Rightarrow \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2} = \bar{y}\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \quad\Rightarrow\quad \bar{y} = \frac{\sum_{i=1}^{N}w_i y_i}{\sum_{i=1}^{N}w_i}, \qquad w_i = 1/\sigma_i^2$

aka "inverse-variance weighting". Note that for equal weights, σ_i = σ,

$\bar{y} = \frac{1}{N}\sum_{i=1}^{N}y_i$   as before.

The error on the weighted mean comes from our previous result on error propagation:

$\sigma_{\bar{y}}^2 = \sum_{i=1}^{N}\left(\frac{\partial\bar{y}}{\partial y_i}\right)^2\sigma_i^2 = \sum_{i=1}^{N}\sigma_i^2\left[\frac{\partial}{\partial y_i}\left(\frac{\sum_j w_j y_j}{\sum_j w_j}\right)\right]^2 = \frac{\sum_i \sigma_i^2 w_i^2}{\left(\sum_i w_i\right)^2} = \frac{\sum_i w_i}{\left(\sum_i w_i\right)^2} = \frac{1}{\sum_i w_i}$

In other words,

$\frac{1}{\sigma_{\bar{y}}^2} = \sum_{i=1}^{N}\frac{1}{\sigma_i^2}$

If σ_i = σ (the data all have the same uncertainties), then $\sigma_{\bar{y}} = \sigma/\sqrt{N}$ as before.
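In code (a minimal sketch; the y and sigma arrays are made-up measurements):

import numpy as np

y = np.array([10.2, 9.7, 10.5, 10.0])    # hypothetical measurements
sigma = np.array([0.3, 0.5, 0.4, 0.2])   # their 1-sigma errors

w = 1.0 / sigma**2                       # inverse-variance weights
ybar = np.sum(w * y) / np.sum(w)         # weighted mean
err = 1.0 / np.sqrt(np.sum(w))           # its uncertainty, 1/sqrt(sum of the weights)

print(ybar, err)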

4.2 Linear Regression or Weighted Least Squares

Now suppose we have measured 2 quantities in pairs, {x_i, y_i}. We think that y = a + bx. How do we derive a and b from our data? Note: this form is not as restrictive as it seems. We can transform lots of relations into this form, e.g. $y = Ax^{\alpha} \Rightarrow \log y = \log A + \alpha\log x$, and then perform the analysis on log y and log x.

$\mathcal{L} = \sum_{i=1}^{N}\frac{(y_i - a - bx_i)^2}{\sigma_i^2} + 2\sum_{i=1}^{N}\ln\!\left(\sigma_i\sqrt{2\pi}\right) - 2N\ln\Delta y$

$\frac{\partial\mathcal{L}}{\partial a} = 0 \;\Rightarrow\; \sum\frac{y_i}{\sigma_i^2} = a\sum\frac{1}{\sigma_i^2} + b\sum\frac{x_i}{\sigma_i^2}$

$\frac{\partial\mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum\frac{x_i y_i}{\sigma_i^2} = a\sum\frac{x_i}{\sigma_i^2} + b\sum\frac{x_i^2}{\sigma_i^2}$

Two linear equations in two unknowns, a and b, with solution

$a = \frac{1}{\Delta}\left(\sum\frac{x_i^2}{\sigma_i^2}\sum\frac{y_i}{\sigma_i^2} - \sum\frac{x_i}{\sigma_i^2}\sum\frac{x_i y_i}{\sigma_i^2}\right), \qquad b = \frac{1}{\Delta}\left(\sum\frac{1}{\sigma_i^2}\sum\frac{x_i y_i}{\sigma_i^2} - \sum\frac{x_i}{\sigma_i^2}\sum\frac{y_i}{\sigma_i^2}\right)$

where the denominator is

$\Delta = \sum\frac{1}{\sigma_i^2}\sum\frac{x_i^2}{\sigma_i^2} - \left(\sum\frac{x_i}{\sigma_i^2}\right)^2$

The uncertainties in a and b follow from error propagation,

$\sigma_a^2 = \sum\left(\frac{\partial a}{\partial y_i}\right)^2\sigma_i^2 = \frac{1}{\Delta}\sum\frac{x_i^2}{\sigma_i^2}, \qquad \sigma_b^2 = \sum\left(\frac{\partial b}{\partial y_i}\right)^2\sigma_i^2 = \frac{1}{\Delta}\sum\frac{1}{\sigma_i^2}$

Note 1: numpy.polyfit with deg=1 does this.
Note 2: don't throw away upper limits – they have information. We're getting ahead of ourselves, but modern folks use a Bayesian approach to line fitting that handles "censored" data: Kelly 2007, ApJ, 665, 1489.
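A sketch implementing these formulas directly (the x, y, sigma arrays are fake data generated for the demo), with a cross-check against numpy.polyfit, whose optional w keyword is documented to take weights of 1/σ:

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)
y = 2.0 + 0.7*x + rng.normal(0.0, sigma)     # fake data with a = 2.0, b = 0.7

w = 1.0 / sigma**2
S, Sx, Sy = np.sum(w), np.sum(w*x), np.sum(w*y)
Sxx, Sxy = np.sum(w*x*x), np.sum(w*x*y)
Delta = S*Sxx - Sx**2

a = (Sxx*Sy - Sx*Sxy) / Delta
b = (S*Sxy - Sx*Sy) / Delta
sig_a = np.sqrt(Sxx / Delta)
sig_b = np.sqrt(S / Delta)
print(a, sig_a, b, sig_b)

# cross-check with numpy.polyfit (returns [b, a], highest power first)
print(np.polyfit(x, y, 1, w=1.0/sigma))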

4.3 Linear Correlation

In the absence of any hypothesis, any knowledge, or anything better to do, we often correlate y_i against x_i in the hope of discovering some New and Universal Truth. Before doing so, you should ask:

- Does the eye see anything? If not, stop, unless you are trying to disprove some hypothesis.
- Is the apparent correlation due to a selection effect? Flux limits are a common culprit.
- Apply the Rule of Thumb: does the correlation go away if you place your thumb over some of the data?

Use the Pearson correlation, defined earlier for bivariate Gaussian distributions,

$\rho \text{ (or } r\text{)} = \frac{\sigma_{xy}^2}{\sigma_x\sigma_y}$

r = −1 or +1 means the data perfectly fit a straight line.

• "r" measures the degree of linear correlation between 2 variables without knowledge of the errors. It should not be used for a non-linear relationship. (Check for this by plotting the data and taking a look!)
• There is no distinction between x & y in the formula; either can be the dependent/independent variable.
• It assumes the variables are approximately normally distributed.
• It tells nothing about the slope of the best-fitting line, only the degree of correlation.

For finite-sized datasets, one can find a nonzero r even for uncorrelated data (and vice versa) simply due to the uncertainty in "r". One can show that

$\sigma_r = \frac{1-r^2}{\sqrt{N-1}}$

Note that $\sigma_r$ CANNOT be used directly to indicate the significance of a correlation and/or whether one observed correlation is significantly stronger than another. It only gives the error on the measurement of r in this sample.

4.3.1 Student's t distribution

In the case where x & y form a 2-dimensional Gaussian about their mean values, we can test whether the observed "r" is consistent with a parent population with no correlation (ρ = 0). To do so, use the quantity

$t \equiv r\sqrt{\frac{N-2}{1-r^2}}$

which, in the no-correlation case, is distributed as the Student's t-distribution with ν = N − 2 degrees of freedom:

$f(t;\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}$

where the gamma function is defined as

$\Gamma(z) = \int_0^{\infty}t^{z-1}e^{-t}\,dt$

and is just the factorial function extended to non-integers; for integer x, Γ(x+1) = x!.

$f(t;\nu)$ has μ = 0 and $\sigma^2 = \nu/(\nu-2)$ for ν > 2.

In this case, we want the probability for the 2-tailed distribution, $\int_{-t}^{t}f(t';\nu)\,dt'$. If we already knew the sign of the correlation, we would use the 1-tailed probability, $\int_{-\infty}^{t}f(t';\nu)\,dt'$.

(Figure from Wikipedia: https://en.wikipedia.org/wiki/Student%27s_t-distribution)

Python has this coded up, of course. Example: suppose we find r = 0.5 for N = 10 data points.

$t = 0.5\sqrt{\frac{8}{0.75}} = 1.63$

from scipy.stats import t
c = t.cdf(1.63, 8)             # -> 0.93
sig = 100*(1 - 2*(1 - c))      # -> 86% significance in the two-tailed test

=> 7% chance that t would be this high or higher in a one-sided test; 86% significance in the two-tailed test.

"Student" was the pseudonym of W. S. Gosset (1876-1937). He was a chemist who worked for the Guinness Brewery in Dublin, Ireland, and developed the t-test to study the quality of brewing ingredients. Guinness did not allow its chemists to publish their findings (to keep competitors from learning they were employing statisticians), hence the pseudonym.

4.3.2 Fisher z-transformation

With the same assumption that x & y are drawn from a two-dimensional Gaussian distribution, we can test whether the difference of two nonzero "r" values is significant for datasets with N > 10, e.g. if a change in some control variable significantly alters an existing correlation between two variables. Use Fisher's z-transformation:

$z = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) = \operatorname{arctanh}(r)$

This converts the Pearson coefficient to an approximately normally distributed variable z, with a mean value of

$\bar{z} = \frac{1}{2}\left[\ln\!\left(\frac{1+\rho}{1-\rho}\right) + \frac{\rho}{N-1}\right]$

and a standard deviation of

$\sigma_z \approx \frac{1}{\sqrt{N-3}}$

where ρ is the true value of the correlation coefficient for the parent population (to be tested against the measured "r"). We can use the Gaussian probability tables for this and also to test the significance of a difference in two values of "r". You can also assume a ρ, create a Gaussian z, and invert (r = tanh z) to create confidence intervals for r for large samples.

4.4 Chi-squared

We have discussed probability distributions and their statistics. Now we discuss whether a particular distribution and model actually fit the data. Define a "badness of fit" metric:

$\chi^2 = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i;\,a_j)}{\sigma_i}\right]^2$

where $y(x_i;\,a_j)$ is a general function of x_i with model parameters a_j.

Note that for Poisson variables $\sigma_i = \sqrt{y(x_i;\,a_j)}$, but in order to get confidences from χ² we need $y(x_i;\,a_j) > 20$ for all x_i. If that is not possible, then bin up the data (sum the y_i over enough x_i) to make it so.

The procedure is then to adjust the model parameters a_j until χ² is minimized (i.e. least bad) → χ²_min. This also maximizes the likelihood.

# of degrees of freedom = ν = (N − M), where
N = # of data points
M = # of model parameters

4.4.1 Distribution of chi-squared

$p(\chi^2;\nu) = \frac{(\chi^2)^{(\nu-2)/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,e^{-\chi^2/2}$

where μ = ν, σ² = 2ν, and ν = N − M. The distribution tends to a Gaussian as N → ∞.

4.4.2 Goodness of fit

For the best fit, the expected value is

$\chi^2_{\rm min} = (N-M) \pm \sqrt{2(N-M)} = \nu \pm \sqrt{2\nu}$

So if it isn't, we know we have a bad fit. This can come from:

• Wrong model: χ² ≫ ν
• Wrong measurement errors: χ² ≪ ν (errors too pessimistic), χ² ≫ ν (errors too optimistic)
• Data are not normally distributed: χ² ≫ ν

This is often written in terms of the "reduced χ²", $\chi^2_{\nu,{\rm min}} = \chi^2_{\rm min}/\nu$:

$\chi^2_{\nu,{\rm min}} = 1 \pm \sqrt{\frac{2}{N-M}}$

so the best fit occurs around $\chi^2_{\nu,{\rm min}} \approx 1$, though the scatter around this depends on the number of degrees of freedom.

4.4.3 Confidence Limits for Goodness of Fit

What we need is the probability of observing $\chi^2 > \chi^2_{\rm min}$:

$P(>\chi^2_{\rm min};\,\nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_{\chi^2_{\rm min}}^{\infty}(\chi^2)^{(\nu-2)/2}\,e^{-\chi^2/2}\,d(\chi^2)$

See scipy.stats.chi2 and use it in the same way as for the Student-t distribution above.
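For example (a sketch with made-up numbers, say χ²_min = 35 with ν = 25 degrees of freedom):

from scipy.stats import chi2

chi2_min, nu = 35.0, 25          # hypothetical best-fit chi^2 and degrees of freedom
p = chi2.sf(chi2_min, nu)        # P(> chi2_min), the survival function
print(p)                         # ~0.09: acceptable, though not a great fit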

4.4.4 Confidence Limits for the Fitted Parameters (OBSOLETE)

We typically want to identify a region in the M-dimensional space of the parameters {a_j} about the best fit that contains a given percentage of the total probability distribution, e.g. "there is a 99% chance that the true parameter values are in this region." The observer gets to pick the confidence value, but certain ones are common, e.g. 68% (1-sigma), 95% (2-sigma), etc. Note that these values are a convention tied to a Gaussian distribution, even though the probability distributions of the parameters are often not normally distributed.

Conventionally, one chooses {a_j} such that $\chi^2 = \chi^2_{\rm min} + \Delta\chi^2(\mathrm{CL}, \#\text{parameters})$. $\Delta\chi^2$ is distributed as χ² with ν = M (= the number of parameters) degrees of freedom, NOT with ν = N − M (= the actual degrees of freedom of the fit). You can marginalize this over all but a subset of the parameters; then $\Delta\chi^2$ is distributed as χ² with ν = (the number of interesting parameters) degrees of freedom.

Example: M = 1 (i.e. only fitting 1 parameter). In this case, the chi-squared distribution for ν = 1 is the same as the square of a single normally distributed (i.e. Gaussian) variable.

Δχ² < 1 occurs 68.3% of the time (1 sigma for a normal distribution)
Δχ² < 4 occurs 95.5% of the time (2 sigma)
Δχ² < 9 occurs 99.7% of the time (3 sigma)

Example: M = 2. The 1-sigma CL for a1 alone is an interval [X1, X2] and for a2 alone is [Y1, Y2]; the 1-sigma CL on both parameters jointly is an ellipse in the (a1, a2) plane, i.e. 68.3% of all determinations of a1 and a2 should lie in the ellipse.

CAUTION: this has very limited application in real astrophysical situations. See "Dos and Don'ts of Reduced Chi-Squared" by Andrae et al. (arXiv:1012.3754). Use Bayesian methods instead.

4.4.5 Significance of Adding Another Parameter

We often want to know if it is necessary to add another parameter to a model we have been fitting to our data; e.g., there might be reddening both in the Milky Way and in the host galaxy of a supernova. χ² will of course be lower, because the additional freedom enables us to get the model curve closer to the data. But is the decrease significant?

Suppose we have two models with $\chi_n^2$ and $\chi_m^2$ and n and m degrees of freedom, respectively. There are two useful statistics to test whether the χ² values of the 2 models are significantly different.

(1) The F statistic:

$F_{n,m} \equiv \frac{\chi_n^2/n}{\chi_m^2/m}$

follows the F probability distribution:

$p_{n,m}(x) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma(n/2)\,\Gamma(m/2)}\,n^{n/2}\,m^{m/2}\,\frac{x^{n/2-1}}{(m+nx)^{(n+m)/2}}$

which has mean and variance

$\mu = \frac{m}{m-2}, \qquad \sigma^2 = \frac{2m^2(m+n-2)}{n(m-2)^2(m-4)}$

We want the probability of observing such a large F value:

$P(>F;\,n,m) = \int_F^{\infty}p_{n,m}(x)\,dx$

Caution: the definition of the statistic does not distinguish between experiment "1" and "2", so we can form 2 statistics, one the reciprocal of the other (F12 and F21). Both are distributed according to the F distribution. Typically, we test both, checking that F12 is not too large and F21 is not too small. As usual, python is your friend: scipy.stats.f.
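A sketch of the tail-probability calculation with scipy.stats.f (the χ² values and degrees of freedom below are made up):

from scipy.stats import f

chi2_1, n = 45.0, 30     # hypothetical model 1: chi^2 and degrees of freedom
chi2_2, m = 28.0, 28     # hypothetical model 2 (extra parameters, fewer DOF)

F12 = (chi2_1 / n) / (chi2_2 / m)
print(F12, f.sf(F12, n, m))      # P(> F12) for the F(n, m) distribution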

CAVEAT: Two important conditions must be satisfied to use this (Protassov et al. 2002, ApJ, 571, 545 – read the summary, section 6):

1. The two models you are comparing must be "nested", i.e. the allowed parameter values of one model must be a subset of those of the other model (see also Freeman et al. 1999, ApJ, 524, 53). E.g. you cannot compare a blackbody model with a synchrotron emission model, but you can compare the goodness of fit for each with χ² fitting.

2. Zero values of the additional parameters must not be on the boundary of the possible parameters. E.g. you cannot compare 2 point sources with 1, nor test for the detection of an emission line in a spectrum, because neither case allows for negative flux.

Legitimate uses:
- broken power law vs single power law
- non-solar vs solar abundance
- comparing the variances of 2 samples

4.4.6 Pros and Cons of Chi-squared

Pros:
• Most people have heard of this; some even accept the results :)
• Since it is additive by definition, different samples can be tested all at once.
• Automatically gives an estimate of whether the model is acceptable.

Cons:
• Data must usually be binned → loss of information.
• Data must be normally distributed.
• If the data do not agree with the model, it cannot tell which direction is off.
• Cannot be used with small samples (≲ 20).
• See Andrae et al. 2010 (arXiv:1012.3754).

5. Rank Tests

A general set of tests comes from replacing the values of N pairs of measurements (x_i, y_i) with their ranks (R_i, S_i). E.g. if we have 4 pairs

(x_i, y_i) = (1,3) (5,0) (3,2) (4,1)

then we have

(R_i, S_i) = (1,4) (4,1) (2,3) (3,2)

For simplicity, assume no ties. (See Numerical Recipes for the more general case.) Why do this? Because ranks are drawn from a uniform distribution between 1 and N, with each rank occurring only once (assuming no ties). From the uniform distribution: $\bar{R} = \bar{S} = (N+1)/2$, $\sigma_R^2 = \sigma_S^2 = (N-1)^2/12$, etc.

We have therefore transformed from variables with unknown distributions to ones with known distributions. There is some loss of information in replacing the original numbers by their ranks, but not much. And the statistics of ranks are more robust than statistics of the original variables, just as the median is more robust than the mean (and slightly noisier).

5.2.1 Spearman Rank Correlation Coefficient

Define $r_S$ to be the linear correlation coefficient of the ranks:

$r_S = \frac{\sum_i (R_i-\bar{R})(S_i-\bar{S})}{\sqrt{\sum_i(R_i-\bar{R})^2}\,\sqrt{\sum_i(S_i-\bar{S})^2}} = 1 - \frac{6\sum_i(R_i-S_i)^2}{N^3-N}$

(The last step holds if there are no ties, because the values of R_i and S_i are then known.)

Obviously, when x & y are correlated, R & S will be too. As before, $-1 \le r_S \le +1$, and a high value indicates a significant correlation. To test the level of significance, one can calculate the distribution explicitly for small N when R & S are uncorrelated. If N > 50, one can compute

$t \equiv r_S\sqrt{\frac{N-2}{1-r_S^2}}$

which is distributed according to Student's t statistic with N − 2 degrees of freedom: scipy.stats.spearmanr.
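A minimal sketch (made-up data with a monotonic but non-linear relation, where Spearman is more appropriate than Pearson):

import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(7)
x = rng.uniform(1.0, 10.0, 30)
y = x**3 * rng.lognormal(0.0, 0.3, 30)      # monotonic, non-linear, noisy

rs, p_s = spearmanr(x, y)                   # rank correlation and its 2-tailed p-value
rp, p_p = pearsonr(x, y)                    # linear (Pearson) correlation for comparison
print(rs, p_s)
print(rp, p_p)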

5.2.2 Kolmogorov-Smirnov (K-S) Test

Used for unbinned data that are a continuous function of a single variable, i.e. a list of values, e.g. the distribution of protoplanetary disk masses. It determines whether a sample agrees with a function ("one-sided/sample") or whether 2 samples come from the same parent population ("two-sided/sample").

Pros:
- no loss of information
- can be used for very small samples
Cons:
- cannot be used for parameter estimation

The test compares the cumulative distribution of the ranked data:

• Rank your sample in ascending order of x.
• Calculate $S_N(x)$, where

$S_N(x) = \begin{cases} 0, & x < x_1 \\ r/N, & x_r \le x < x_{r+1} \\ 1, & x \ge x_N \end{cases}$

and N is the size of the sample. $S_N(x)$ is the fraction of the data with values less than x.

• If you have two samples, then calculate $S_{N_1}(x)$ and $S_{N_2}(x)$.
• If you have a function, then calculate $F(x) \equiv \int_{-\infty}^{x}f(y)\,dy$.
• The K-S statistic is

$D_{\rm 1-samp} = \max_{-\infty<x<\infty}|S_N(x) - F(x)|, \qquad D_{\rm 2-samp} = \max_{-\infty<x<\infty}|S_{N_1}(x) - S_{N_2}(x)|$

What makes the K-S statistic useful is that its distribution, in the case of data drawn from the same (unknown) distribution, can be calculated. And it's all here, ready to plug 'n' play: scipy.stats.kstest.
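Both flavours are one-liners in scipy (a sketch with made-up samples):

import numpy as np
from scipy.stats import kstest, ks_2samp, norm

rng = np.random.default_rng(11)
a = rng.normal(0.0, 1.0, 200)
b = rng.normal(0.3, 1.0, 150)

print(kstest(a, norm.cdf))    # one-sample: compare a against a standard normal CDF
print(ks_2samp(a, b))         # two-sample: are a and b from the same parent population?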

6. Parting Comments

• Don't hide data.
• Try to use distribution-free tests.
• There are lots of tests to choose from, but not all are equally powerful for a given application.
• Don't get too enamored of/lost in the world of statistics. You are budding astronomers, not budding statisticians.
• USE COMMON SENSE.

Recommended