day6 testing proportions - sjspielman.github.io · Hypothesis testing frameworks t-testscompare...

Preview:

Citation preview

TestingproportionsBIO5312FALL2017

STEPHANIE J. SPIELMAN,PHD

EstimationAnestimatorisastatistic(~formula)forestimatingaparameterAgoodestimatorisunbiased◦ Theexpectedvalue(expectation)oftheestimatorshouldequaltheparameterbeingestimated

◦ Meanofthesamplingdistributionofthestatisticshouldequaltheparameterbeingestimated

Agoodestimatorisconsistent◦ IncreasingthesamplesizeproducesanestimatewithsmallerSE

Agoodestimatorisefficient◦ HasthesmallestSEamonganyestimatoryoucouldhavechosen

Weareusuallyinterestedinpointestimate,SE,andCINormally-distributedvariable◦ 𝜇" = �̅�

◦ 𝜎() = ∑ (,-.,̅)01-234.5

◦ Knownσ◦ SE= 6

4�

◦ 95%CI=�̅� ± 𝑍:.:)<𝑆𝐸◦ Unknownσ◦ SE= ?

4�

◦ 95%CI=�̅� ± 𝑡:.:)<𝑆𝐸

Hypothesistestingframeworkst-tests comparemeansforcontinuousquantitativedata

Todaywewilllearntoanalyzediscretecountdata("proportions"):◦ Binomialtest◦ 𝝌2 goodness-of-fit◦ Contingencytableanalysis◦ 𝝌2 association/homogeneityandFisherexacttest

Binomialtest𝑃 𝑘𝑠𝑢𝑐𝑐𝑒𝑠𝑠𝑒𝑠 = 4

H 𝑝H 1 − 𝑝 (4.H) = 4

H 𝑝H𝑞(4.H)

◦ Binomialcoefficient: 4H = 4!H! 4.H !

Hypothesistest:◦ H0 :Therelativefrequencyofsuccessintheunderlyingpopulationisp0◦ HA :Therelativefrequencyofsuccessintheunderlyingpopulationisnotp0◦ HA :Therelativefrequencyofsuccessintheunderlyingpopulationis>/<p0

Nullproportionofsuccessestotestagainst

Binomialtestassumption:BInSconditionsaresatisfied

Binaryoutcomes

Independenttrials(outcomesdonotinfluenceeachother)

n isfixedbeforethetrialsbegin

Sameprobabilityofsuccess,p,foralltrials

Binomialtest:ExampleInacertainspeciesofwasp,eachwasphasa30%chanceofbeingmale.Icollect12wasps,ofwhich5aremale.Doesmysampleshowevidencethat30%ofwaspsaremale?Useα=0.05.

Inotherwords,istheobservedsuccessproportion5/12(41.67%)consistentwithapopulationwhoseprobabilityofsuccessis0.3?

VerifyingassumptionsBinaryoutcomes:Maleorfemale

Independenttrials:Waspsexdoesnotinfluencesexofotherwasps

n isfixedbeforethetrialsbegin: Icollect12wasps

Sameprobabilityofsuccess,p,foralltrials: P(male)=0.3foreverywasp

PerformingthebinomialtestMysample:◦ p=5/12=0.417◦ n=12◦ X=5 WegenerallysayXinsteadofkwhenperforminghypothesistests,byconvention

H0 :Theprobabilityofbeingamalewaspisp0 =0.3

HA:Theprobabilityofbeingamalewaspdiffersfromp0 =0.3

ThePMFforwaspsex

0.014

0.071

0.17

0.240.23

0.16

0.079

0.029

0.0078 0.0015 0.00019 1.5e−05 5.3e−070.00

0.05

0.10

0.15

0.20

0.25

0 1 2 3 4 5 6 7 8 9 10 11 12Number of males (successes)

Prob

abilit

y m

ass

Thesamplingdistributionforthebinomialteststatisticisbinomial:Thisiseffectivelyournull.

PerformingthetestRecall,theP-valueistheprobabilityofobtainingaresultasextremeormore◦ Therefore,P-valueisP(numberofsuccesses>=5)

𝑃(𝑋 ≥ 5) = 5)< 0.3<0.7(5).<) + 5)

U 0.3U0.7(5).U) + ⋯+ 5)5) 0.3

5)0.7(5).5))

p0 =0.3n=12X=5

0.014

0.071

0.17

0.240.23

0.16

0.079

0.029

0.0078 0.0015 0.00019 1.5e−05 5.3e−070.00

0.05

0.10

0.15

0.20

0.25

0 1 2 3 4 5 6 7 8 9 10 11 12Number of males (successes)

Prob

abilit

y m

ass

> 1 – pbinom(4, 12, 0.3)[1] 0.2673445

Conclusions,round1OurP-valueof0.276ismuchgreaterthanα.Thereforewefailtorejectthenullhypothesisandwehavenoevidencethatthepopulationproportionofmalescorrespondingtooursamplediffersfrom0.3.

NotesonbinomialtestsComputingtwo-sidedP-valuesisnon-trivial◦ Binomialdistributionsymmetriconlywhenp=0.5

> binom.test(5, 12, 0.3)Exact binomial test

data: 5 and 12number of successes = 5, number of trials = 12, p-value = 0.3614alternative hypothesis: true probability of success is not equal to 0.395 percent confidence interval:0.1516522 0.7233303

sample estimates:probability of success

0.4166667

Thisisnot0.276*2!

Computingthebinomialstandarderror𝑺𝑬𝒑" = 𝒑" 𝟏 − 𝒑" /𝒏�

= 𝟎.𝟒𝟏𝟕(𝟏.𝟎.𝟒𝟏𝟕)𝟏𝟐

�= 0.142

Whatisthisvalue?1. Thestandarddeviationofthesamplingdistributionoftheprobabilityofsuccess2. Quantifiestheprecisionof�̂�,ourestimateofthepopulationprob.ofsuccess

ComputingthebinomialconfidenceintervalClassically,weusetheWaldmethod◦ Note:Only"precise"whennisnotverylarge(>0.8)orsmall(<0.2)

◦ 𝒑" istheestimatedproportionofsuccess,X/n=0.417◦ 𝒁𝟎.𝟎𝟐𝟓 is1.96

◦ 𝑺𝑬𝒑" = 𝒑"(𝟏.𝒑")

𝒏�

= 𝟎.𝟒𝟏𝟕(𝟏.𝟎.𝟒𝟏𝟕)𝟏𝟐

�=0.142

𝒑" − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑" < 𝒑 < 𝒑" + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑"

CalculatingthebinomialCI𝒑" − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑" < 𝒑 < 𝒑" + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒑"

0.417– 0.278<p<0.417+0.278à 0.417± 0.278> binom.test(5, 12, 0.3)Exact binomial test

data: 5 and 12number of successes = 5, number of trials = 12, p-value = 0.3614alternative hypothesis: true probability of success is not equal to 0.395 percent confidence interval:0.1516522 0.7233303

sample estimates:probability of success

0.4166667

Rusesamoreexactmethod,theClopper-Pearsoninterval

FinalconclusionsOurP-valueof0.276ismuchgreaterthanα.Thereforewefailtorejectthenullhypothesisandwehavenoevidencethatthepopulationproportionofmalescorrespondingtooursamplediffersfrom0.3.

Ourestimatedproportionofsuccessis0.417withSE=0.142anda95%CIof0.417± 0.278.

Pause:Binomialexercise

Use𝟀2Goodness-of-fittestifwedonothavebinaryoutcomesGoodness-of-fittestasksifobservedproportionsareequaltoanullproportion

df =(numberofcategories)– 1– (numberofparametersestimatedfromdata)0forgoodness-of-fittest

Example:Arebabiesbornwiththesamefrequencyeverydayoftheweek?

Freq

uenc

y

Sun.Mon.

Tue.W

ed.Thu.

Fri. Sat.

70

60

50

40

30

20

10

0

Dayin1999 #birthsSunday 33Monday 41Tuesday 63Wednesday 63Thursday 47Friday 56Saturday 47

H0 :Theprobabilityofbirthwasthesameeverydayoftheweekin1999.

HA:Theprobabilityofbirthwasnotthesameeverydayoftheweekin1999.

Teststatistic𝜒) = ∑ #ij?klmkn-.#k,okpqkn- 0

#k,okpqkn-�r

Day# Observed

births #daysin1999 Expectedprop # ExpectedbirthsSunday 33 52 52/365 =0.142 0.142*52=49.863Monday 41 52 0.142 49.863Tuesday 63 52 0.142 49.863Wednesday 63 52 0.142 49.863Thursday 47 52 0.142 49.863Friday 56 53 0.145 50.822Saturday 47 52 0.142 49.863

Total 350 365 1 1

Calculatingtheteststatisticanddf

Day # Observedbirths # ExpectedbirthsSunday 33 0.142*52=49.863Monday 41 49.863Tuesday 63 49.863Wednesday 63 49.863Thursday 47 49.863Friday 56 50.822Saturday 47 49.863

Total 350 1

𝜒) = s#𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑r − #𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑r )

#𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑r

r

= (yy.z{.|Uy)0

z{.|Uy+(z5.z{.|Uy)

0

z{.|Uy+(Uy.z{.|Uy)

0

z{.|Uy+(Uy.z{.|Uy)

0

z{.|Uy+(z}.z{.|Uy)

0

z{.|Uy+(<U.<:.|)))

0

<:.|))+(z}.z{.|Uy)

0

z{.|Uy

=15.05

df =#categories – 1=7– 1=6

OurcategoricalvariableisDaysofweekIthassevencategories

Reportsandconclusions

At0.0199,we reject the nullhypothesis that are births are equally distributedacross days in1999.We haveevidence that frequency of births differs acrossdays.

Prob

abili

ty d

ensi

ty

0.160.140.120.100.080.060.040.02

00

26

!5 10 15

!2 = 15.05

20

> 1 - pchisq(15.05, 6) [1] 0.01987137

Noteson𝟀2Goodness-of-fittestAssumptionsforall𝟀2 tests◦ Randomlysampleddatafrompopulation◦ Twoormorecategoriesofacategoricalvariable(dataiscounts)◦ Expected frequenciesmustbe>=1◦ Nomorethan20%ofexpectedfrequenciesare<5

Wetakeonly>=teststatisticforP-value◦ Generaltoall𝟀2tests

𝟀2goodness-of-fitinR#### Prepare data: Observed counts and expected proportions ####> births <- c(33,41,63,63,47,56,47)> expected <- c(52,52,52,52,52,53,52)> expected <- expected/sum(expected)> expected[1] 0.1424658 0.1424658 0.1424658 0.1424658 0.1424658 0.1452055 0.1424658

> chisq.test(births, p = expected)

Chi-squared test for given probabilities

data: birthsX-squared = 15.057, df = 6, p-value = 0.01982

BinomialispreferredfortwogroupsTempleUniversitystudentsare52%female,48%male.DoesthisclassreflecttheTemplestudentpopulation?

Wehave19students:7femalesand12males.

BinomialP-valuesaremoreprecise> binom.test(7, 19, 0.52)

Exact binomial test

data: 7 and 19number of successes = 7, number of trials = 19, p-value = 0.251alternative hypothesis: true probability of success is not equal to 0.5295 percent confidence interval:0.1628859 0.6164221

sample estimates:probability of success

0.3684211

> chisq.test(c(7,12), p = c(0.52, 0.48))

Chi-squared test for given probabilities

data: c(7, 12)X-squared = 1.749, df = 1, p-value = 0.186

Pause:Goodnessoffitexercise

ContingencytableanalysisTestforanassociationbetweentwo(ormore)categoricalvariables◦ Areheartattacksmorelikelyforpeoplewhotakeaspirindaily?◦ Aresmokersmorelikelytodrinkthannon-smokers?

Twoflavors:◦ 𝟀2testforindependence(orhomogeneity)◦ Fisher'sExacttest

Contingencytablesshowassociatedcountsfortwo+categoricalvariables

Takes dailyaspirin Nodailyaspirin

Heartattack 75 62

Noheartattack 108 71

Example:𝟀2testforindependence/associationLifecycleofR.ondatrae

Uninfectedfrog Infectedfrog

Eatenbybird 1 47

Noteatenbybird 49 44

2variables:Eaten (2categoriesyes/no)Infected (2categoriesyes/no)

Example:𝟀2testforindependence

Uninfected frog Infectedfrog TOTAL

Eatenbybird 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

H0 :Infectionandbeingeatenareindependent

HA:Infectionandbeingeatenarenotindependent

Computingtheteststatistic

𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,�0

#k,okpqkn~,��l

�p

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

Underthenullhypothesis,thevariablesareindependent.ExpectedcalculationsemployP[AandB]=P[A]xP[B]

P[eatenanduninfected]=P[eaten]xP[uninfected]=48/141x50/141=0.1207

Expectedcount=P[eatenanduninfected] xtotal=17.02… =(row/total)x(column/total)x(total)

Performingthetest

𝜒) = s s#𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑l,p − #𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑l,p

)

#𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑l,p

l

p

= (5.5}.:))0

5}.:)+(zz.y:.{)

0

y:.{+(z{.yy.y)

0

yy.y+(z}.U:.))

0

U:.)=31.9

df =(#r– 1)(#c– 1)=(2– 1)(2– 1)=1

Uninfected Infected

Eaten 117.02 4430.9

Noteaten 4933.3 4760.2

1 – pchisq(31.9, 1)[1] 1.623172e-08

Werejectthenullhypothesis(P<<α)thatinfectionandbeingeatenareindependent.Wehaveevidencethatbeinginfectedwiththistrematodeisassociatedwithbeingeatenbyabird.

PerformingthetestinR> data.table <- rbind(c(1,49), c(44,47)) > data.table

[,1] [,2][1,] 1 49[2,] 44 47

> chisq.test(data.table)Pearson's Chi-squared test with Yates' continuity correctiondata: data.tableX-squared = 29.809, df = 1, p-value = 4.768e-08

> chisq.test(data.table, correct=FALSE)Pearson's Chi-squared testdata: data.tableX-squared = 31.906, df = 1, p-value = 1.618e-08

Thisiswhatwecalculatedonthelastslide("Rascalculator").Differencesarefromusingroundedexpectedcounts.

Yatescontinuitycorrection

𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,�0

#k,okpqkn~,��l

�p

𝜒) = ∑ ∑ #ij?klmkn~,�.#k,okpqkn~,� .:.<0

#k,okpqkn~,��l

�p

Withoutcorrection

Yatescontinuitycorrection

DecreasestheteststatisticandincreasestheP-value

OddsTheodds ofsuccessaretheprobabilityofsuccessdividedbyfailure

𝑂 = o5.o

Theoddsofbeingeatenwhileinfected

𝑂 = �[k�qk4�4nr4�kpqkn]5.�[k�qk4�4nr4�kpqkn]

= z}/{55.z}/{5

= 1.07

𝑂 = �[k�qk4�4nr4�kpqkn]�[4iqk�qk4�4nr4�kpqkn]

= z}zz= 1.07

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

Oddsratio,for2x2tablesTheodds ratioistheoddsofsuccessinonegroupdividedbyoddsofsuccessinasecondgroup

𝑂𝑅 = o3 (5.o3)⁄o0 (5.o0)⁄

Interpretation◦ OR=1:Oddsofsuccessisthesameforeithergroup◦ OR<1:Oddsofsuccessingroup2arehigherthangroup1◦ OR>1:Oddsofsuccessingroup1arehigherthangroup2

ORsquantifythedeviationfromnullin2x2contingencytabletests.

Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?

𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =4744 = 1.07

𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =149 = 0.02

𝑶𝑹 = 𝟏. 𝟎𝟕𝟎. 𝟎𝟐 = 𝟓𝟐. 𝟑 Infectedfrogshave52.3 theoddsofbeingeaten

comparedtouninfectedfrogs.

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?

𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =4744 = 1.07

𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

1 − 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] =149 = 0.02

Infected frogshave52.3 theoddsofbeingeatencomparedtouninfected frogs.

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

𝑶𝑹 = 𝟏. 𝟎𝟕𝟎. 𝟎𝟐 = 𝟓𝟐. 𝟑

Oddsratiocalculations:Aretheoddshigherthatyouareeatenwhileinfected?

𝑂5 = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] = 1.07

𝑂) = 𝑃[𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑]

𝑃[𝑛𝑜𝑡𝑒𝑎𝑡𝑒𝑛𝑎𝑛𝑑𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑] = 0.02

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

𝑂5 = 𝑃[𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑒𝑎𝑡𝑒𝑛]𝑃[𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑒𝑎𝑡𝑒𝑛] =

471 = 47

𝑂) = 𝑃[𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑢𝑛𝑒𝑎𝑡𝑒𝑛]𝑃[𝑢𝑛𝑖𝑛𝑓𝑒𝑐𝑡𝑒𝑑𝑎𝑛𝑑𝑢𝑛𝑒𝑎𝑡𝑒𝑛] =

4449 = 0.899

𝑶𝑹 = 𝟒𝟕

𝟎. 𝟖𝟗𝟗 = 𝟓𝟐. 𝟑Infectedfrogshave52.3theoddsofbeingeatencomparedtouneaten frogs.

Eatenfrogshave52.3theoddsofbeinginfectedcomparedtouneatenfrogs.

TherearetwowaystocalculateOROnewillbe>1(52.3)andonewillbe<1(1/52.3 =0.019)◦ Wegenerallyusethe>1option◦ Convinceyourselfthatthisistrue.

Funfact:𝑶𝑹 = 𝒂∗𝒅𝒃∗𝒄

= 𝟏∗𝟒𝟒𝟒𝟗∗𝟒𝟕

= 𝟎. 𝟎𝟏𝟗

Oftenwereportlogodds =ln(OR) > log(52.3) [1] 3.956996

Uninfected Infected TOTAL

Eaten a 1 c 47 48

Noteaten b 49 d 44 93

TOTAL 50 91 141

CalculatingtheORstandarderror

𝑆𝐸 ln 𝑂𝑅 = 5�+ 5

j+ 5

p+ 5

n�

𝑆𝐸 ln 𝑂𝑅 = 55+ 5

z{+ 5

z}+ 5

zz� = 1.03

blah blah2

blob a c

blob1 b d

Uninfected Infected

Eaten 1 47

Noteaten 49 44

CalculatingthelogoddsCI

𝒍𝒏(𝑶𝑹� ) − 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝑶𝑹� < 𝒍𝒏(𝑶𝑹) < 𝒍𝒏(𝑶𝑹)� + 𝒁𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝑶𝑹�

3.96– (1.96*1.03)<𝒍𝒏(𝑶𝑹)<3.96+(1.96*1.03)à 3.96± 2.02

Conclusions,withlogoddsWerejectthenullhypothesis(P<<α)thatinfectionandbeingeatenareindependent.Wehaveevidencethatbeinginfectedwiththistrematodeisassociatedwithbeingeatenbyabird.

Furthermore,frogsthatareeatenaremorelikelytobeinfectedcomparedtouneatenfrogs,withalogoddsratioof3.96andlogoddsCI of1.94– 5.98.

𝟀2testforhomogeneityIndependence:measuretwoproperties fromonesetofsubjects◦ Wemeasuredeaten andinfection forfrogs

Homogeneity:measureoneproperty ontwosetsofsubjectsfromdifferentpopulations◦ Measureeffectofmedicine insampleofcancerindividualsandsampleofhealthyindividuals

Example:testofhomogeneityDrug Placebo

Cancer 75 62

Healthy 108 71

H0 :Theprobabilitythatsymptomsimproveisthesameforbothcancerandhealthygroups.HA:Theprobabilitythatsymptomsimprovediffersbetweencancerandhealthygroups.

Inpracticalterms,thisusestheexactsameprocedureasatestforindependence.

Fisher'sExacttestMoreexactthan𝟀2andusedforlow-counttablesComputetheexactprobabilityofobservingtablewithcounts:

Fisher'stestcomputesthisvalueforallpossibletables withthesamerow/columntotals(margins)ComputesP-valuebysummingprobabilitiesfortableswithasextremeormorecountdistributions

blah blah2

blob a c

blob1 b d

𝑃 𝑎, 𝑏, 𝑐, 𝑑 =𝑎 + 𝑏 ! 𝑐 + 𝑑 ! 𝑎 + 𝑐 ! 𝑏 + 𝑑 !

𝑛! 𝑎! 𝑏! 𝑐! 𝑑!

Fisher'sexacttest

Uninfected Infected TOTAL

Eaten 1 47 48

Noteaten 49 44 93

TOTAL 50 91 141

> chisq.test(data.table, correct=FALSE)Pearson's Chi-squared test

data: data.tableX-squared = 31.906, df = 1, p-value = 1.618e-08

> fisher.test(data.table)Fisher's Exact Test for Count Data

data: data.tablep-value = 8.37e-10alternative hypothesis: true odds ratio is not equal to 195 percent confidence interval:0.0005344122 0.1417331275

sample estimates:odds ratio0.02222648

ExactP-value

OurOR=52.3,or0.019.Slightdifferencesareexpectedbecausefisher.test()usesML

ApproximateP-value

Relativerisk:It'snottheORCommonlymeasuredinepidemiologicalstudies

Relativeriskistheprobabilityofanevent(ie disease)inanexposedgroup,relativetounexposedgroup◦ RR=P(eventwhenexposed)/P(eventwhennotexposed)

RelativeriskexampleLungcancer

Nolungcancer

Smoker 525 450

Non-smoker

32 621

RR=P(eventwhenexposed)/P(eventwhennotexposed)

RRofcancerduetosmokingexposure:=P(cancer|smoker)/P(cancer|notsmoker)=[525/(525+450)]/[32/(32+621)]

=10.99

à Smokershavea10.99timeshigherriskthandonon-smokerstodeveloplungcancer.

Liveexercise:Calculatetheoddsratioforasmokerdevelopingcancerrelativetoanon-smoker.

TheOddsRatioLungcancer

Nolungcancer

Smoker 525 450

Non-smoker

32 621

𝑂5 = 𝑃[𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑐𝑎𝑛𝑐𝑒𝑟]

𝑃[𝑛𝑜𝑛 − 𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑐𝑎𝑛𝑐𝑒𝑟] =52532

𝑂) = 𝑃[𝑠𝑚𝑜𝑘𝑒𝑟𝑎𝑛𝑑𝑛𝑜𝑐𝑎𝑛𝑐𝑒𝑟]𝑃[𝑛𝑜𝑛 − 𝑠𝑚𝑜𝑘𝑒𝑟𝑛𝑜𝑐𝑎𝑛𝑐𝑒𝑟] =

450621

𝑂𝑅 = 525/32450/621 = 𝟐𝟐. 𝟔𝟒

à Smokershave22.64timestheoddsofgettinglungcancerthannon-smokers.

What'sthepracticaldifference?Oddsratiosmeasuretheextentofassociationbetweenvariables.◦ Itistheratiooftwoodds(ratioofprob event:prob non-event)

Relativeriskisthemoreintuitivequantitythatwe"understand"◦ Itistheratiooftwoprobabilities(prob event)

RecaponestimationNormally-distributed variable◦ 𝜇" = �̅�◦ 𝜎() = 𝑠)

◦ Knownσ

◦ 𝑆𝐸,̅ =64�

◦ 95%CI=�̅� ± 𝑍:.:)<𝑆𝐸◦ Unknownσ

◦ 𝑆𝐸,̅ =?4�

◦ 95%CI=�̅� ± 𝑡:.:)<𝑆𝐸

Binomially-distributedvariable

◦ �̂� = H4

◦ 𝑆𝐸o( = �̂�(1 − �̂�)/𝑛�

◦ 95%CI=�̂� ± 𝑍:.:)<𝑆𝐸o(

Log-Oddsratio

◦ 𝑙𝑜𝑔𝑂𝑅� = ln o(3 5.o(3⁄o(0 5.o(0⁄

◦ 𝑆𝐸¤i¥¦§� = 5�+ 5

j+ 5

p+ 5

n�

◦ 95%CI=𝑙𝑜𝑔𝑂𝑅� ± 𝑍:.:)<𝑆𝐸o(

Chooseyourownadventure,sofar

(Orfisher'sexacttest)

Recommended