16
7/21/12 1 The Urn Model, Simula6on, and Resampling methods Deborah Nolan University of California, Berkeley Overview Introductory example Basics elements of the Urn model Simula6on studies Hypothesis tes6ng Confidence Interval and Survey Sampling Salk Field Trials Polio leN many children crippled and some who could survive only with respirators

Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

1

TheUrnModel,Simula6on,andResamplingmethods

DeborahNolanUniversityofCalifornia,Berkeley

Overview

•  Introductoryexample•  BasicselementsoftheUrnmodel

•  Simula6onstudies

•  Hypothesistes6ng•  ConfidenceIntervalandSurveySampling

SalkFieldTrials

PolioleNmanychildrencrippledandsomewhocouldsurviveonlywithrespirators

Page 2: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

2

FieldTrial

•  JonasSalkhaddevelopedavaccinethatshowedpromisingresultsinthelaboratory

•  Readytotestonalargerscale•  1954‐ayear‐longfieldtrialconducted•  1.8Mchildrenfrom217areasinUS,Canada,andFinland

•  Cost$17.5M

Randomassignment

•  AcarefullyappliedchanceprocessthatgiveseachvolunteeranequalprobabilityofgeYngvaccineorsaltsolu6on.

•  Therandomiza6onturnspoten6albiasesintochanceerror,i.e.thetwogroupswillbesimilarwithrespecttofactorsthatmightbiastheresult,evenifthesefactorsareunobserved.

Placebocontrols

•  Childrenwereinoculatedwithsimplesaltsolu6on.

•  Placeboeffect–biasthatcomesfromreassuranceoftakinganotherwiseworthlessdrugsubs6tute.

Double‐blindevalua6on

•  Neitherchildrennorphysicianswhoevaluatedtheirsubsequenthealthstatusknowwhohadbeengiventhevaccine/salt.

•  Behaviorofparentsandchildrenwouldn’tbeaffectedbyknowledgethattheyreceivedthevaccine.

•  Physicianwouldn’tbebiasedwhenfacingaborderlinecasethatwasdifficulttodiagnose.

Page 3: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

3

Randomizedplacebocontrolmethod

•  Randomlyassigntreatmenttochildren.

•  Maximumeffortstoeliminate“observerbias”byuseofplacebo(injec6onwithsaltsolu6on)and“blinding”.

Outcome

•  56ofthechildrenreceivingthevaccinecontractedpolio•  142ofthosewhodidnotreceivethevaccinecontractedthedisease.•  Itseemslikethevaccineworks,butcouldthisresulthaveeasilyhappenedbychance?•  Therandomizeddesignlet’susanswerthatques6on

Ineffec6veVaccineScenario

•  The198childrenwhocontractedpoliowouldhavebecomesickwhetherornottheyreceivedthevaccine

•  Itwasonlytherandomiza6onthatledto142ofthe198sickchildrenbeingassignedtothesaltsolu6on.Nothingelsewasgoingon

HowLikelyIsThat?

•  400,000children:198aresickand399,802arehealthy.

•  200,000ofthe400,000childrenarepickedatrandomtoreceivethevaccine.

• What’sthechancethatasfewas56orfewerofthesickchildrenaregiventhevaccine?

Page 4: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

4

TheUrnModel

•  Marbles:400,000–oneforeachchild– 0forhealthyand1forcontractpolio– 399,8020sand1981s

•  Draws:– 200,000drawswithoutreplacementfromtheurn(thetreatedchildren)

•  Summary:– Sumthevaluesdrawn

BOXModel

01

0

BOXOF400,000Children

Sumofdraws=numberofpoliocasesintreatmentgroup

200,000drawsWithoutreplacement

>vals=c(0,1)>counts=c(399802,198)

>urn=rep(vals,6mes=counts)>results=replicate(50000,

sum(sample(urn,size=200000,replace=FALSE)))

>mean(results)[1]98.97672

>sd(results)[1]7.029245

>sum(results<=56)[1]0

>hist(results,breaks=50)ORplot(table….)

0500

1000

1500

2000

2500

3000

198 of 400,000 children are sick.

Draw 200,000 at random and count of number of sick

Number of sick children in sample of 200000

Counts

out of 50000 trials

72 75 78 81 84 87 90 93 96 99 103 107 111 115 119 123 128

Page 5: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

5

•  In50,000trialsitneverhappened–wedidnotgetasinglesampleof200,000with56orfewercases

•  TheSalkfieldtrialwasaturningpointinmedicalresearch

Anotherexample

Calciumandbloodpressure

•  Experiment:Studytheeffectofcalciumsupplementsonbloodpressureformalesubjects

•  Randomizedinto2groups–– Onereceivedcalciumsupplementsfor12wks– Onereceivedaplacebofor12wks

•  Double‐blindexperiment•  Response:reduc6oninbloodpressure: (ini6albp–bpaNer12wks)

Outcome

•  Treatmentgroup:7,‐4,18,17,‐3,‐5,1,10,11,‐2

•  Sothefirstpersonsbloodpressuredecreasedby7fromthestarttotheendoftheprogram.

•  Thesecondperson’sincreasedby4,thethirddecreasedby18,andsoon.

•  Forthegroupthatreceivedtheplacebo:‐1,12,‐1,‐3,3,‐5,5,2,‐11,‐1,‐3

Page 6: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

6

InformalAnalysisThoseinthecalciumgroupreducedtheirbloodpressureby5onaverage,but..

-10 -5 0 5 10 15

Calcium

Placebo

Blood Pressure

InformalAnalysis

•  Thedistribu6onshaveroughlythesamespread

•  Themeanofthecalciumgrouplookstobeabout5mmhigher

•  Themeanofthecontrolgrouplookstobeabout0mm

Scenario

•  Whatifthecalciummakesnodifference,andit’sjustbychancethatintherandomassignmentthecalciumgroupgotmorepeoplewhohadalowerbloodpressureaNer12weeks.

•  Wecanuseanurnmodeltoexaminethischanceprocess.

BOXModel

7…

‐4

Averageofdrawsfrombox

18

Boxof20pa6ents10drawsw/outreplacement

Page 7: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

7

TheUrnModel

•  Marbles:20– Values:bloodpressurechange7,‐4,18,17,‐3,‐5,1,10,11,‐2,‐1,12,‐1,‐3,3,‐5,5,2,‐11,‐1,‐3

– Oneforeachsubject•  Draws:– 10drawsfromtheurn(thecalciumtreatment)

•  Summary:– Averagechangeinbloodpressure

UsingtheurninR

Ifthecalciumsupplementhasnoeffectthentheallsubjectswouldhavethesameresponsewhethertheyreceivedthesupplementornot

> avgs = replicate(100000, mean(sample(bp, 10, replace = FALSE))) > plot(table(avgs))

Let’sfindtheapproximatedistribu6onofthesampleaverageundertheUrnModel.

>sum(results>=5)/100000[1]0.0616

TheBasicUrnModel

Page 8: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

8

Informa6onabouturn:

•  Marbles:– Whatvaluesdowewriteonthemarbles?– Howmanymarblesofeachvaluedowehave?

•  Draws:– Howmanydrawsdowetakefromtheurn?– Wewereplacemarblebetweendraws?

•  Summary:– Howdowesummarizethevaluesdrawn?

UsingtheUrninR:Simula6on

•  Setupanurnwithmarbles•  Samplefromtheurn(specifynumberofdrawsandwithorwithoutreplacement)

•  Dosomethingwiththeresults,e.g.takesum

•  Repeatmany,many,many6mes

•  Useempiricaldistribu6ontoapproximatetruedistribu6on

SeYnguptheurn

> vals = c(…) # values on marbles > counts = c(…) #counts for each val # Create the urn > urn = rep(vals, counts)

SamplingfromtheurninR

#Drawfromtheurnn6meswithreplacement> sample(urn, n, replace = TRUE)

# sum the draws from the urn > sum(sample(urn, n, replace = TRUE)) # repeat this process m times > replicate(m, sum(sample(urn, n, replace = TRUE)))

Page 9: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

9

Useurntoanswerques6ons

•  Rolladie96mesandtakethesumoftherolls.

• What’sthechancethesumis25orless?

Simula6on

•  Draw9marbleswithreplacementfromtheurn,recordthesum.

•  Repeat10,0006mestoget10,000sums

•  Findthepropor6onofthose10,000sumsthatare25orless

•  ThisempiricalproporFonshouldbeclosetothechancethatthesumwillbe25orless

StepsinR

•  Setuptheurnwiththe“marbles”urn=1:6

•  Samplefromtheurn

sample(urn,9,replace=TRUE)

•  Dosomethingwiththeresults,e.g.takesum

sum(sample(urn,9,replace=TRUE))

StepsinR

•  Repeatmany,many,many6mesobserva6ons=replicate(10000,

sum(sample(urn,9,replace=TRUE))

•  Makeatableofthepropor6onsof6mesobservedeachvalue

approxChance=table(observa6ons)/10000

Page 10: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

10

cumsum(approxChance)["25"]25

0.12220.00

0.02

0.04

0.06

0.08

observations

approxChance

14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48

ThePMFhasanormalshape

Connec6ontoProbability

•  TheUrnmodelisasimplechanceprocess•  Considerthefirstdraw:– Valuedrawnisrandom

•  Avgofurn=Expectedvalueforonedraw•  SDofurn=SDforonedraw

Connec6ontoProbability

•  Considersumofndraws:•  ExpectedValueofsum=Avgofurn

•  StandardErrorforsum=root(n)SDofurn

•  Whendrawwithoutreplacement– Expectedvalueremainssame– Correc6onfactorforSE

Oursimpleexample

•  Wecanfindtheexpectedvalueforthesum:•  Wecandeterminethestandarddevia6onforthesum:

•  Buthowdowecomputethechancethesumis25orless?

!

962"1

12= 5.12

n!UrnAVG=9*3.5= 31.5

Page 11: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

11

WhyusetheUrnModel?

•  Connectssta6s6caltechniques,– HypothesisTests– ConfidenceInterval

tounderlyingchanceprocess

•  Ifclearonthechanceprocessthenclearonthesta6s6calmethodtoapply

•  Simula6onstudypowerfultoolfordataanalysis

WhyusetheUrnModel?

•  Simula6onstudypowerfultoolforunderstandingachanceprocess

•  Discoverimportantproper6es–lawoflargenumbers,CLT,etc.

•  Offeralterna6veapproachtolearningconcepts

•  Helpstudentdiscoverfeaturesofachanceprocess,e.g.root(n)behavior

Urnmodelfordistribu6ons•  DiscreteUniform(1,2,…,m)

–  mmarbles–  eachmarblehasauniquevalue:1tom–  Drawamarblefromtheurn,notethevalue,replaceit

•  Binomial(n,p)–  mmarblesintheurn–  marblesaremarkedwitha1(forsuccess)ora0(forfailure)–  propor6onpofthemarblesaremarked1–  Drawnmarbleswithreplacementandsumthevalues

•  Hypergeometric(m,n1,n2)–  n1+n2marblesintheurn–  n1marblesaremarkedwitha1(forsuccess)–  n2marblesaremarked0(forfailure)–  Drawmmarbleswithoutreplacementandsumthevalues

HypothesisTes6ng

Page 12: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

12

Tes6ng

•  Permuta6ontests•  Fisher’sexacttest•  Chi‐squaretests•  Canalsouseforzandttests,whereplaceassump6onsonthecontentsoftheurn(anduselargesampletheory)

Chi‐squaretest

•  GradeexpectinclassandGender> table(grades, sex) sex grades F M A 9 22 B 21 31 C 8 0

Isgenderindependentofexpectedgrade?

Urn

•  Valuesonmarbles:vals = c("A","B", "C")

•  Countsofeachtype:counts = c(31, 52, 8)

•  Urn:urn = rep(vals, counts)

Simulate

•  Samplefromurn:table(sample(urn, 53, replace = FALSE)) A B C 15 33 5

•  Repeatmany6mesandcalculatetheteststa6s6cforeachtable

results = replicate(50000, sum((table(sample(urn, 53)) – expect)^2/expect)))

Page 13: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

13

> sum(stats >= 5.538192)[1] 45

SurveySample

SimpleRandomSampling

•  Therandomselec6onprocessgivesussamplesthataretendtoberepresentaFveofthepopulaFon

•  Therandomselec6onprocessmeanswecanusechancetodeterminethechancesomeoneisincludedinthesampleANDtheprobabilitydistribuFonofthesampleaverage

UrnModelforSimpleRandomSampling

•  Onemarbleforeachunitinthepopula6on•  Valueonthemarbleistheresponsetotheques6on(Weconsideronlyoneques6on)

•  Drawnmarbleswithoutreplacementtogetasample

•  Useasummaryofthevaluesdrawntomakeinferenceaboutthepopula6on

Page 14: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

14

3Views–ATriptych

•  Popula6on–DescribestheUrn•  YourSample–Whatyoufoundfromtheexecu6onofthechanceprocess,whatyoudrewfromtheurn

•  SamplingDistribu6on–UnderstandthebehaviorofthechanceprocessanduseitalongwithYourSampletomakestatementsaboutthePopula6on;Wefindthisbehaviorbysimula6ngwiththeurnorbyapplyinglargesampletheory

Popula6onView

Popula6on=Urn•  Nmarbles‐

•  Unknowndistribu6onofvalues•  UnknownPopula6onAverage•  UnknownPopula6onSD

SampleView

Sample•  ndrawswithoutreplacement

•  Observe– Sampledistribu6on– Samplemean

– SampleSD

SimpleRandomSampleshouldresemblepopula6on–duetochanceprocess

SamplingDistribu6onView

ProbabilityModelforhowthesamplemightturnout

•  EV(SampleAVG)=Popula6onAVG•  SE(SampleAVG)=

ThePopAVGandSDareunknown,andwedon’tknowthesamplingdistribu6onofoursta6s6c

SD(Pop)

n

N ! n

N !1

Page 15: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

15

InferenceaboutthePopula6on

•  SRStellsusthesampleshouldberepresenta6veofthepopula6on

•  CreateabootstrappopulaFonfromthesample

•  FindthetheapproximatesamplingdistribuFonofthesta6s6cforthebootstrappopula6onanduseittomakeinferencesaboutthetruepopula6on

ClusterSampling

UrnModelforClusterSampling

•  Urn:packetsofmarbles•  N=numberofpacketsintheurn

•  n=numberofdraws/packetsfromtheurn

•  Drawpacketswithoutreplacement

•  Openupthepacketandtakeallofthemarblesfromthepacket

Stra6fiedSampling

Page 16: Overview The Urn Model, Simulaon, and Resampling methodsnolan/talks/beijing/Urn... · 2012-07-21 · 7/21/12 1 The Urn Model, Simulaon, and Resampling methods Deborah Nolan University

7/21/12

16

UrnModelforStra6fiedSampling

•  Murns:oneforeachstratum•  UrnihasNimarblesinit

•  DrawnimarbleswithoutreplacementfromUrni

Terence’sStuff:Simula6onIMSBulle6nMay2011

Itnowseemstomethatweareheadingintoanerawhenallsta6s6calanalysiscanbedonebysimula6on.Wedon’tneedlikelihoodfunc6ons;wejustneedtoknowhowtosimulatefromthemodelsweentertainforourdata.That’sbeenasinequanonofsta6s6calanalysisforsome6menow.…Wedon’tneedtheorytotellusourmethodworks;wejustneedtosimulateandsee.

Resources

•  BoxmodelinSta7s7csbyFreedman,Pisani,Purves

•  RVideo#13:hzp://www.stat.berkeley.edu/share/rvideos/R_Videos/R_Videos.html