40
Statistics: Unlocking the Power of Data Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal distribution for confidence intervals Normal distribution for p-values Standard normal

Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Embed Size (px)

Citation preview

Page 1: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Normal Distribution

STAT 250

Dr. Kari Lock Morgan

Chapter 5• Normal distribution• Central limit theorem• Normal distribution for confidence intervals• Normal distribution for p-values• Standard normal

Page 2: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

slope (thousandths)-60 -40 -20 0 20 40 60

Measures from Scrambled RestaurantTips Dot Plot

r-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6

Measures from Scrambled Collection 1 Dot Plot

Nullxbar98.2 98.3 98.4 98.5 98.6 98.7 98.8 98.9 99.0

Measures from Sample of BodyTemp50 Dot Plot

Diff-4 -3 -2 -1 0 1 2 3 4

Measures from Scrambled CaffeineTaps Dot Plot

xbar26 27 28 29 30 31 32

Measures from Sample of CommuteAtlanta Dot Plot

Slope :Restaurant tips

Correlation: Malevolent uniforms

Mean :Body Temperatures

Diff means: Finger taps

Mean : Atlanta commutes

phat0.3 0.4 0.5 0.6 0.7 0.8

Measures from Sample of Collection 1 Dot PlotProportion : Owners/dogs

What do you notice?

All bell-shaped distributions!

Bootstrap and Randomization Distributions

Page 3: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

• The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution

Normal Distribution

Fre

qu

en

cy

-3 -2 -1 0 1 2 3

05

00

10

00

15

00

Page 4: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Central Limit Theorem!

For a sufficiently large sample size, the distribution of sample

statistics for a mean or a proportion is normal

www.lock5stat.com/StatKey

Page 5: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Distribution of 100n30n10n1n

0.5p

0.0 0.5 1.0

50n

0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0

0.7p

0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0

0.0 0.5 1.0

0.1p

0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0

Page 6: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

CLT for a MeanPopulation Distribution of

Sample DataDistribution of Sample Means

n = 10

n = 30

n = 50

Fre

qu

en

cy

0 1 2 3 4 5 6

0.0

1.5

3.0

1.0 2.0 3.0

Fre

qu

en

cy

0 1 2 3 4 5

04

8

1.5 2.0 2.5 3.0

Fre

qu

en

cy

0 2 4 6 8 12

01

02

5

1.4 1.8 2.2 2.6

x

0 2 4 6 8 10

Page 7: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Central Limit Theorem

• The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies

• The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work

• For small samples, it is more important that the data itself is approximately normal

Page 8: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Accuracy• The accuracy of intervals and p-values generated using simulation methods (bootstrapping and randomization) depends on the number of simulations (more simulations = more accurate)

• The accuracy of intervals and p-values generated using formulas and the normal distribution depends on the sample size (larger sample size = more accurate)

• If the distribution of the statistic is truly normal and you have generated many simulated randomizations, the p-values should be very close

Page 9: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

• The normal distribution is fully characterized by it’s mean and standard deviation

Normal Distribution

mean,standard deviationN

Page 10: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Bootstrap DistributionsIf a bootstrap distribution is approximately normally distributed, we can write it as

a) N(parameter, sd)b) N(statistic, sd)c) N(parameter, se)d) N(statistic, se)

sd = standard deviation of variablese = standard error = standard deviation of statistic

Page 11: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Hearing Loss• In a random sample of 1771 Americans aged 12 to 19, 19.5% had some hearing loss (this is a dramatic increase from a decade ago!)

• What proportion of Americans aged 12 to 19 have some hearing loss? Give a 95% CI.

Rabin, R. “Childhood: Hearing Loss Grows Among Teenagers,” www.nytimes.com, 8/23/10.

Page 12: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Hearing Loss

(0.177, 0.214)

Page 13: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Hearing Loss

N(0.195, 0.0095)

Page 14: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Confidence Intervals

If the bootstrap distribution is normal:

To find a P% confidence interval , we just need to find the middle P% of the distribution

N(statistic, SE)

www.lock5stat.com/statkey

Page 15: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Hearing Loss

(0.176, 0.214)

www.lock5stat.com/statkey

Page 16: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Randomization DistributionsIf a randomization distribution is approximately normally distributed, we can write it as

a) N(null value, se)b) N(statistic, se)c) N(parameter, se)

Page 17: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

p-valuesIf the randomization distribution is normal:

To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution

N( , )

Page 18: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

First Born Children• Are first born children actually smarter?

• Explanatory variable: first born or not• Response variable: combined SAT score

• Based on a sample of college students, we find

• From a randomization distribution, we find SE = 37

Page 19: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

First Born Children

SE = 37

What normal distribution should we use to find the p-value?

a) N(30.26, 37)b) N(37, 30.26)c) N(0, 37)d) N(0, 30.26)

Page 20: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Hypothesis TestingDistribution of Statistic Assuming Null

Statistic

-3 -2 -1 0 1 2 3

Observed Statistic

Distribution of Statistic Assuming Null

Statistic

-3 -2 -1 0 1 2 3

Distribution of Statistic Assuming Null

Statistic

-3 -2 -1 0 1 2 3

Observed Statistic

p-value

Page 21: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

First Born ChildrenN(0, 37)

www.lock5stat.com/statkey

p-value = 0.207

Page 22: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Standardized DataOften, we standardize the data to have mean 0

and standard deviation 1

This is done with z-scores

From x to z : From z to x:

Places everything on a common scale

x meanz

sd

Page 23: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Standard Normal• The standard normal distribution is the normal distribution with mean 0 and standard deviation 1

0,1N

Distribution of Statistic Assuming Null

Statistic

-3 -2 -1 0 1 2 3

Page 24: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Standardized DataConfidence Interval (bootstrap distribution):

mean = sample statistic, sd = SE

From z to x: (CI)

x statist Eic Sz

Bootstrap Distribution: N(statistic, SE)

Page 25: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

z*-z*

P%

P% Confidence Interval2. Return to

original scale with statistic z* SE

1. Find z-scores (–z* and z*) that capture

the middle P% of the standard normal

Page 26: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Confidence Interval using N(0,1)

If a statistic is normally distributed, we find a confidence interval for the parameter using

statistic z* SE

where the area between –z* and +z* in the standard normal distribution is the desired

level of confidence.

Page 27: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Confidence IntervalsFind z* for a 99% confidence interval.

www.lock5stat.com/statkey

z* = 2.575

Page 28: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

z*Why use the standard normal?

z* is always the same, regardless of the data!

Common confidence levels:

95%: z* = 1.96 (but 2 is close enough)

90%: z* = 1.645

99%: z* = 2.576

Page 29: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

In March 2011, a random sample of 1000 US adults were asked

“Do you favor or oppose ‘sin taxes’ on soda and junk food?”

320 adults responded in favor of sin taxes.

Give a 99% CI for the proportion of all US adults that favor these sin taxes.

From a bootstrap distribution, we find SE = 0.015

Sin Taxes

Page 30: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Sin Taxes

Page 31: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Sin Taxes

Page 32: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Standardized DataHypothesis test (randomization distribution):

mean = null value, sd = SE

From x to z (test) :

x meanz

sd

Randomization Distribution: N(null value, SE)

Page 33: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

p-value using N(0,1)

If a statistic is normally distributed under H0, the p-value is the probability a standard normal is beyond

Page 34: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

First Born Children

SE = 37

1) Find the standardized test statistic

2) Compute the p-value

Page 35: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

z-statistic

If z = –3, using = 0.05 we would

(a) Reject the null(b) Not reject the null(c) Impossible to tell(d) I have no idea

Page 36: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

z-statistic

• Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale

Page 37: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Confidence Interval Formula

*sample statistic z SE

From original data

From bootstrap

distribution

From N(0,1)

IF SAMPLE SIZES ARE LARGE…

Page 38: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Formula for p-values

From randomization

distribution

From H0

sample statistic null value

SEz

From original data

Compare z to N(0,1) for p-value

IF SAMPLE SIZES ARE LARGE…

Page 39: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

Standard Error

• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?

• We can!!!

• Or at least we’ll be able to next class…

Page 40: Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal

Statistics: Unlocking the Power of Data Lock5

To DoRead Chapter 5

Do HW 5 (due Friday, 4/3)