52
Quick overview of basic concepts in Sta4s4cs (and a few bad jokes)

Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Quick overview of basic concepts in Sta4s4cs

(andafewbadjokes)

Page 2: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Mo4va4on

Sta1s1calconceptscanbesubtleanddifficulttoexplain

IthoughtitwouldbeusefultoreviewsomecentralideasinSta1s1cs

Page 3: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Overview

Descrip1vesta1s1cs•  Distribu1ons,sampling•  Descrip1vesta1s1csandplots•  Samplingdistribu1ons

Inferen1alsta1s1cs

•  Pointandintervales1mates•  Confidenceintervalsandthebootstrap•  Hypothesistests

Ifthereisinterest:neuraldataanalysis

Page 4: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Where do data come from?

Distribu(ons!

Truth!DataL

Page 5: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

How do we get data?

Simplerandomsample:eachmemberinthepopula1onisequallylikelytobeinthesample•  Randomselec1on

Soupanalogy!

Q:Whyisthisgood?

Page 6: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

An Example Dataset (flight delays) Cases

Variables

Page 7: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

CategoricalVariable Quan1ta1veVariable

Cases

An Example Dataset (flight delays)

Page 8: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

What is a good first step when analyzing data?

Whatisausefulplotforcategoricaldata?

Whatisausefulplotforquan1ta1vedata?

Page 9: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Box and violin plots

Page 10: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Joy plots

Neatfact:Somesta1s1ciansfindviolinplotsugly•  Whydotheyfindthemugly?

X-massornaments

Page 11: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Dynamite plots

Neatfact:Manysta1s1cianshatedynamiteplots•  Whydotheyhatethem?

Page 12: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

What is a sta4s4c?

A:Asingledeathisatragedy;amilliondeathsisasta1s1c.

-JosephStallin

Descrip1vesta1s1csdescribethedatayouhavecollected•  Usuallydenotedwithromancharacters

•  Asinglecategoricalvariable:propor1on •  Asinglequan1ta1vevariable:themean •  Apairofquan1ta1vevariables:thecorrela1on

Page 13: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Example sta4s4c: correla4on coefficient

Thecorrela1oncoefficientrisasta1s1cthatmeasuresthelinearassocia1onbetweentwovariablesX,andY

risanes1mateofρ

r

Page 14: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Side note: correla4on ≠ causa4on

Thecorrela1oncoefficientrisasta1s1cthatmeasuresthelinearassocia1onbetweentwovariablesX,andY

Page 15: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Descrip1vesta(s(csareusuallydenotedwithromancharacters

•  Asinglequan1ta1vevariable:themean•  Asinglecategoricalvariable:propor1on •  Apairofquan1ta1vevariables:thecorrela1on

A variety of sta4s4cs than can be used to describe data

psocialr

Page 16: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Popula1on/processparametersareusuallydenotedwithgreekcharacters

•  Asinglequan1ta1vevariable:themean•  Asinglecategoricalvariable:propor1on •  Apairofquan1ta1vevariables:thecorrela1on

Sta4s4cs have corresponding parameters

πsocial

ρ

(infinitedata)(infinitedata)

Page 17: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Sta4s4cal inference Weusesamplesta(s(cstomakejudgementsaboutpopula(onparameters

x

μ

Page 18: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

π,μ,σ,ρ

p,x,s,r

Sta4s4cal inference

Page 19: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Es4ma4on

Pointes3ma3on:

xisapointes1mateforμ

μ x

Page 20: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Regression Sharka`

acks

Icecreamsales

Thebi'sareusuallychosentominimizetheMSE:

Inlinearregression,wetrytofitalinetoourdata

Truth:y=β0+β1·xFit:ŷ=b0+b1·x

Page 21: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Regression cau4on # 1 Avoidtryingtoapplytheregressionlinetopredictvaluesfarfromthosethatwereusedtocreatetheline

Page 22: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Sta4s4cs have distribu4ons too!

Asamplingdistribu3onisthedistribu1onofsamplesta1s1cscomputedfordifferentsamplesofthesamesizefromthesamepopula1onSamplingdistribu1onareomenapproximatelyNormal•  Duetothecentrallimittheorem:sumsofmanyrandomvariatesleadtoaNormaldistribu1on

x1,x2,x3,…,xk

Page 23: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Es4ma4on

Pointes3ma3on:Intervales3mate:pointes1mate+marginoferror

xisapointes1mateforμ

Thepopula1onmeanissomewhereinhere

Page 24: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Confidence Intervals

Aconfidenceintervalforaparameterisanintervalcomputedbyamethodthatwillcapturetheparameteraspecifiedpropor(onof(mes•  (ifthesamplingprocesswererepeatedmany1mes)

Thesuccessrate(propor1onofallsampleswhoseintervalscontaintheparameter)isknownastheconfidencelevel

Page 25: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Think ring toss…

Parameterexistsintheidealworld

Wetossintervalsatit

95%ofthoseintervalscapturetheparameter

(fora95%CI)

WitsandWagers…

Page 26: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

The bootstrap to es4mate standard errors

Theplug-inprinciple

Page 27: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

95% Confidence Intervals

Ifthebootstrapdistribu1onis~symmetricwecanusepercen1lestocalculatea95%confidenceinterval

Formulasalsoexistforcalcula1ngCIs•  SEofthemeancanbees1matedas:•  CIis:Sta0s0c±2·SE

Page 28: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Ques4on: why is a p-value?

Ap-valueistheprobabilityofgeungasta1s1casormoreextremethantheobservedsta1s1cfromanullmodel(nulldistribu1on)

Page 29: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Hypothesis tests in 2 steps

1.Createadistribu1onshowingwhatthesta1s1cwouldlikelikeiftherewasnoeffect(H0=true)•  Nulldistribu1on(distribu1onifnoeffect)

2.Showthatthesta1s1cyouobservedisunlikelytocomefromthisnulldistribu1on•  P-valueis:Pr(S≥obs_stat|H0)•  P-valueisnot:Pr(H0|data)

Page 30: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Trial metaphor of hypothesis tes4ng

1.Assumeinnocence:H0istrue•  StateH0andHA

2.Gatherevidence

•  Calculatetheobservedsta1s1c3.Createadistribu1onofwhatevidencewouldlooklikeifH0istrue

•  Nulldistribu1on

4.Assesstheprobabilitythattheobservedevidencewouldcomefromthenulldistribu1on

•  p-value5.Makeajudgement

•  Assesswhethertheresultsaresta1s1callysignificant

Page 31: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Types of hypothesis tests test

Permuta3ontest:Createnulldistribu1onsimula1ngmanyrandomassignments•  1.Shufflethecondi1onlabels•  2.Computesta1s1conshuffledlabels•  3.Repeatmany1mestogetanulldistribu1on

Parametrictests:basedonthefactthatdistribu1onofsta1s1cisknown•  t-tests,ANOVAs,chi-squaredtests,etc...•  Lessrobust(basedonNormalapproxima1ons)

Page 32: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Visual hypothesis tests

“Visualhypothesistests”createvisualplotsofrandomlyrearrangedpoints•  i.e.,asetof‘innocent’plotswherethereisnopa`ern

Ifyoucantelltherealdatafromtheplotsofrandomlygenerateddatathisisequivalenttorejec1ngthenullhypothesis

Page 33: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Whichplotshowstherealdata?

Visual hypothesis tests

Page 34: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Visual hypothesis tests

SeeWickham,etal,2010

Page 35: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Permuta4on test example: Hypothesis tests for comparing two means

Ques3on:Isthispilleffec1ve?

Page 36: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Experimental design

Takeagroupofpar1cipantandrandomlyassign:•  Halftoatreatmentgroupwheretheygetthepill

•  Halfinacontrolgroupwheretheygetafakepill(placebo)•  Seeifthereismoreimprovementinthetreatmentgroupcomparedtothecontrolgroup

Par1cipantpool

ControlgroupTreatmentgroup

RandomAssignment

Page 37: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Hypothesis tests for differences in two group means

1.Statethenullandalterna1vehypothesis•  H0:μTreatment=μControlor μTreatment–μControl=0•  HA:μTreatment>μControlor μTreatment–μControl>0

2.Calculatesta1s1cofinterest•  xTreatment-xControl

3.Whatshouldwedonext…?

Page 38: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

3. Create the null distribu4on!

Reconstructedpar1cipantpooldataunderH0

Shuffled‘treatmentgroup’ Shuffled‘controlgroup’

ShuffledataforrandomassignmentconsistentwithH0

ControlgroupTreatmentgroup

Onenulldistribu1onsta1s1c:xShuff_Treatment-xShuff_control

Page 39: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Repeat 10,000 4mes

p-value=.064

Step4?

Page 40: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Fisher vs. Neyman

Ques3on:shouldyoureportexactp-values?•  p<.05•  p=.021

Null-hypothesissignificancetes1ng(NHST)isahybridoftwotheories:

•  1.Significancetes1ngofRonaldFisher

•  2.Hypothesistes1ngofJezyNeymanandEgonPearson

Fisher(1890-1962) Neyman(1894-1981)

Neatfact:theyhatedeachother

Page 41: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Type I and Type II Errors

RejectH0 DonotrejectH0

H0istrue TypeIerror(α)(falseposi1ve)

Noerror

H0isfalse Noerror TypeIIerror(β)(falsenega1ve)

Page 42: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example
Page 43: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Do reproducible research!

Toolsexistthatallowyoutoembedyouranalysesintoyourwri`enreports:

•  R:RMarkdown•  Python/Julia/R:Jupyter•  MATLAB:LiveScripts

Page 44: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

John Tukey

“Farbe`eranapproximateanswertotherightques1on,whichisomenvague,thananexactanswertothewrongques1on”

-Thefutureofdataanalysis.AnnalsofMathema1calSta1s1cs33(1),(1962),page13.

Page 45: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

What is Data Science?

SeeDonoho,2017

Page 46: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Ques4ons

Page 47: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example
Page 48: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example
Page 49: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Interac4ve single neuron analysis tutorial…

h`ps://neuraldata.net/spike/

CreatedbyBrookeFItzgerald

Page 50: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

Are we feeling ok about hypothesis tests?

Page 51: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example
Page 52: Quick overview of basic concepts in Sta4s4cs · Soup analogy! Q: Why is this good? An Example Dataset (flight delays) Variables Categorical Variable Quan1tave Variable An Example

AmericanSta1s1calAssocia1on’sStatementonp-values