68
Lecture 06 10.12.21

Lecture 06 - raw.githubusercontent.com

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 06 - raw.githubusercontent.com

Lecture 0610.12.21

Page 2: Lecture 06 - raw.githubusercontent.com

Refresher QuizThe heights of men in a certain population follow a normal distribution with mean 69.7 inches and standard deviation 2.8 inches.a) If a man is chosen at random from the population, find the probability that he will be more than 72 inches tall.

b) If two men were chosen at random from the population, find the probability that (i) both of them will be more than 72 inches tall; (ii) their mean height will be more than 72 inches.

Page 3: Lecture 06 - raw.githubusercontent.com

The heights of men in a certain population follow a normal distribution with mean 69.7 inches and standard deviation 2.8 inches.a) If a man is chosen at random from the population, find the probability that he will be more than 72 inches tall.

b) If two men were chosen at random from the population, find the probability that (i) both of them will be more than 72 inches tall; (ii) their mean height will be more than 72 inches.

> pnorm(72, 69.7, 2.8, lower.tail = F) = 20.5%

P(both > 72 in tall) = P(> 72) * P(> 72) = 0.205 * 0.205 = 4.20%

> pnorm(72, 69.7, 2.8/sqrt(2), lower.tail = F) = 12.26%

Page 4: Lecture 06 - raw.githubusercontent.com

The normal distribution

N(μ, σ2)

Page 5: Lecture 06 - raw.githubusercontent.com

SamplesPopulation

Summary: sampling distribution

μ, σy, s

y, s

y, s

y, s

μY = μσY = σ/ n

Y(Distribution of )

Page 6: Lecture 06 - raw.githubusercontent.com

Statistical estimation

• We view our data as a random sample from a population and use the information about our data to infer facts about the population

• Goals:

• (1) Determine estimate of some feature of the population (i.e. mean)

• (2) Assess the precision of the estimate

Page 7: Lecture 06 - raw.githubusercontent.com

Statistical estimation

Page 8: Lecture 06 - raw.githubusercontent.com

Statistical estimation

Population

μσ

Sample

y s

14x

Page 9: Lecture 06 - raw.githubusercontent.com

Statistical estimation

Population

μσ

Sample 1

y1 s1

14x

Sample 2

y2 s2

14x

Page 10: Lecture 06 - raw.githubusercontent.com

μY

σY

Standard error of the mean is estimated from the sampling distribution

σY = σn

Standard deviation of sampling distribution

SEY = sn

Standard error of the mean

Page 11: Lecture 06 - raw.githubusercontent.com

Standard error of the mean is estimated from the sampling distribution

SEY = sn

Standard error of the mean

• Standard error of the mean (SE) is a measure of reliability or precision of the sample mean as an estimate of the population mean

• SE incorporates the two factors that influence reliability:

• Variability of observations (s)

• Sample size (n)

Page 12: Lecture 06 - raw.githubusercontent.com

Standard error of the mean is estimated from the sampling distribution

n = 14 y = 32.81cm2

s = 2.48cm2

SEY = sn

Page 13: Lecture 06 - raw.githubusercontent.com

Standard error of the mean is estimated from the sampling distribution

n = 14 y = 32.81cm2

s = 2.48cm2

SEY = 2.4814

= 0.66cm2What does the SE mean?

Page 14: Lecture 06 - raw.githubusercontent.com

Standard error of the mean is estimated from the sampling distribution

Page 15: Lecture 06 - raw.githubusercontent.com

Standard error of the mean is estimated from the sampling distribution

sSE

Page 16: Lecture 06 - raw.githubusercontent.com

Standard error (SE) versus standard deviation (SD)

SD = dispersion of data SE = unreliability in the estimate of the population mean

Page 17: Lecture 06 - raw.githubusercontent.com

When would we choose to plot SE v. SD?

SD = dispersion of data SE = unreliability in the estimate of the population mean

Do you want to compare means or summarize data variability?

Page 18: Lecture 06 - raw.githubusercontent.com

Standard error (SE) versus standard deviation (SD)

32.69 32.12 31.97

1.68 2.47 2.52

0.451 0.209 0.067

ys

SE

n = 14 n = 140 n = 1400 n → ∞y → μs → σ

SE → 0

y = 32.81cm2 s = 2.48cm2

Page 19: Lecture 06 - raw.githubusercontent.com

Standard error (SE) versus standard deviation (SD)

32.69 32.12 31.97

1.68 2.47 2.52

0.451 0.209 0.067

ys

SE

n = 14 n = 140 n = 1400 n → ∞y → μs → σ

SE → 0

y = 32.81cm2 s = 2.48cm2

Page 20: Lecture 06 - raw.githubusercontent.com

Standard error (SE) versus standard deviation (SD)

32.69 32.12 31.97

1.68 2.47 2.52

0.451 0.209 0.067

ys

SE

n = 14 n = 140 n = 1400 n → ∞y → μs → σ

SE → 0

y = 32.81cm2 s = 2.48cm2

Page 21: Lecture 06 - raw.githubusercontent.com

A pharmacologist measured the concentration of dopamine in the brains of eight rats. The mean concentration was 1,269 ng/gm and the standard

deviation was 145 ng/gm. What was the standard error of the mean?

Example

SE =s

n

SE =145

8= 51.2

Page 22: Lecture 06 - raw.githubusercontent.com

This quantity is a measure of the accuracy of the sample mean as an estimate of the population mean

Practice

This quantity tends to stay the same as the sample size goes up

This quantity tends to go down as the sample size goes up

Standard Error (SE) Standard Deviation (SD)

Standard Error (SE) Standard Deviation (SD)

Standard Error (SE) Standard Deviation (SD)

Page 23: Lecture 06 - raw.githubusercontent.com

This quantity is a measure of the accuracy of the sample mean as an estimate of the population mean

Practice

This quantity tends to stay the same as the sample size goes up

This quantity tends to go down as the sample size goes up

Standard Error (SE) Standard Deviation (SD)

Standard Error (SE) Standard Deviation (SD)

Standard Error (SE) Standard Deviation (SD)

Page 24: Lecture 06 - raw.githubusercontent.com

The confidence intervalThe woman’s position

Page 25: Lecture 06 - raw.githubusercontent.com

The confidence interval

The woman’s position

Page 26: Lecture 06 - raw.githubusercontent.com

The confidence interval

The woman’s position

Page 27: Lecture 06 - raw.githubusercontent.com

The confidence interval

Not likely, but possible

The woman’s position

Page 28: Lecture 06 - raw.githubusercontent.com

The confidence intervalThe woman’s position

μ y2(SE)

95% confidence interval of woman’s position = position of the dog +/- 2 x SE

Page 29: Lecture 06 - raw.githubusercontent.com

The confidence interval

95%

Sampling distribution of Y - random sample from normal distribution

y 2SE2SE

Page 30: Lecture 06 - raw.githubusercontent.com

Y ± 1.96 σn

Will contain for 95% of all samples

μ

The confidence interval

95%

Pr[−1.96 < Z < 1.96] = 0.95

Y − μσ/ n

But… we don’t know σ

Page 31: Lecture 06 - raw.githubusercontent.com

Student’s t distribution for confidence intervals

95%

Pr[−1.96 < Z < 1.96 = 0.95

Y − μσ/ n

Y ± 1.96 σn

Will contain mu for 95% of all samples

μ

But… we don’t know muσ

s

t0.025y

“Critical value”

Page 32: Lecture 06 - raw.githubusercontent.com

Student’s t distribution for confidence intervals

Standard normal

Page 33: Lecture 06 - raw.githubusercontent.com

Student’s t distribution for confidence intervals

Standard normaldf = 3

df = 10

Shape of distribution depends on degrees of freedom (df = n - 1)

Student’s t distribution is similar to normal distribution

but has a larger SD

y ± t0.025sn

“Critical value”

“Two-tailed 5% critical

value”

Page 34: Lecture 06 - raw.githubusercontent.com

Critical value and Student’s t distribution

95%

“Two-tailed 5% critical value” = Combined area above t and below -t = 5%“Two-tailed 5% critical value” = Area between two tails = 95%

“Two-tailed 5% critical

value”

Page 35: Lecture 06 - raw.githubusercontent.com
Page 36: Lecture 06 - raw.githubusercontent.com

N-1: degrees of freedom explained

100-10 20 30

0-10 = -10 20-10 = 10

30-10 = 20-10-10 = -20

(-20) + (-10) + (10) + (20) = 0Sum of deviations is always zero!

Page 37: Lecture 06 - raw.githubusercontent.com

N-1: degrees of freedom explained

100-10 20 30

0-10 = -10 20-10 = 10

30-10 = 20-10-10 = -20

(-20) + (-10) + (10) + X = 0Sum of deviations is always zero!

Has to be +20 from mean…

Page 38: Lecture 06 - raw.githubusercontent.com

Calculating the confidence interval: butterflies

n = 14 y = 32.81cm2

s = 2.48cm2df = 13

y ± t0.025sn

Page 39: Lecture 06 - raw.githubusercontent.com
Page 40: Lecture 06 - raw.githubusercontent.com

qt(p, df)

Critical value

Calculating the confidence interval: butterflies

n = 14 y = 32.81cm2

s = 2.48cm2df = 13

y ± t0.025sn

32.81 ± 2.16 2.4814

32.81 ± 1.43qt(0.975, 13)

(31.4,34.2)

95%

Page 41: Lecture 06 - raw.githubusercontent.com

Note: why use Student’s t distribution?

n = 14 y = 32.81cm2

s = 2.48cm2df = 13

y ± z0.025sn

32.81 ± 1.96 2.4814

32.81 ± 1.29 (31.51,34.11)

95%

Normal

Student’s t

Page 42: Lecture 06 - raw.githubusercontent.com

sSE

Calculating the confidence interval: butterflies

32.81 ± 1.43

CI

Page 43: Lecture 06 - raw.githubusercontent.com

Critical value and Student’s t distribution

2.5% 2.5%95%

−t0.025 t0.0250

y ± t0.025sn

Page 44: Lecture 06 - raw.githubusercontent.com

Critical value and Student’s t distribution

5% 5%90%

−t0.05 t0.050

y ± t0.05sn

Page 45: Lecture 06 - raw.githubusercontent.com

Calculating the confidence interval: butterflies

n = 14 y = 32.81cm2

s = 2.48cm2df = 13

y ± t0.05sn

90%

Page 46: Lecture 06 - raw.githubusercontent.com
Page 47: Lecture 06 - raw.githubusercontent.com

Calculating the confidence interval: butterflies

n = 14 y = 32.81cm2

s = 2.48cm2df = 13

y ± t0.05sn

32.81 ± 1.77 2.4814

32.81 ± 1.17 (31.6,34.0)

The higher the confidence level, the wider the confidence interval

90%

Page 48: Lecture 06 - raw.githubusercontent.com

A pharmacologist measured the concentration of dopamine in the brains of eight rats. The mean concentration was 1,269 ng/gm and

the standard deviation was 145 ng/gm. Construct a 95% confidence interval for the population mean.

Example

SE =s

n

SE =145

8= 51.2

y ± t0.025sn

1269 ± (t0.025)(51.2)

(df = n - 1 = 7)

Page 49: Lecture 06 - raw.githubusercontent.com

qt(p = 0.025, df = 7, lower.tail = F)

qt(p = 0.975, df = 7, lower.tail = T)

Page 50: Lecture 06 - raw.githubusercontent.com

A pharmacologist measured the concentration of dopamine in the brains of eight rats. The mean concentration was 1,269 ng/gm and

the standard deviation was 145 ng/gm. Construct a 95% confidence interval for the population mean.

Example

SE =s

n

SE =145

8= 51.2

y ± t0.025sn

1269 ± (2.365)(51.2)1269 ± 121.1

(df = n - 1 = 7)

(1147.9,1390.1)

Page 51: Lecture 06 - raw.githubusercontent.com

One-sided confidence intervals

5%95%

t0.050

y + t0.05sn

“one-tailed 5% critical

value”

Page 52: Lecture 06 - raw.githubusercontent.com

y + t0.05sn

(df = n - 1 = 7)

One-sided confidence intervalsA pharmacologist measured the concentration of dopamine in the brains of eight rats after a treatment aimed to reduce dopamine.

The mean concentration was 1,269 ng/gm and the standard deviation was 145 ng/gm. Construct a 95% confidence interval for

the population mean after treatment.

We don’t expect dopamine to increase, so we can use a one-sided CI here with the upper

boundary fixed

Page 53: Lecture 06 - raw.githubusercontent.com

qt(p = 0.05, df = 7, lower.tail = F)

qt(p = 0.95, df = 7, lower.tail = T)

Page 54: Lecture 06 - raw.githubusercontent.com

y + t0.05sn

(df = n - 1 = 7)

1269 + (1.895)(51.2)

(∞,1366.0)

One-sided confidence intervalsA pharmacologist measured the concentration of dopamine in the brains of eight rats after a treatment aimed to reduce dopamine.

The mean concentration was 1,269 ng/gm and the standard deviation was 145 ng/gm. Construct a 95% confidence interval for

the population mean after treatment.

We don’t expect dopamine to increase, so we can use a one-sided CI here with the upper

boundary fixed

Page 55: Lecture 06 - raw.githubusercontent.com

One-sided confidence intervals

95%, two-sided

(1147.9,1390.1)(−∞,1366.0)

95%, one-sided

5%2.5%2.5%

Page 56: Lecture 06 - raw.githubusercontent.com

Confidence intervals and randomness

Population

μ = 25.4σ = 0.08

25.40 25.45 25.5025.3525.30

y = 25.419s = 0.085

y = 25.32s = 0.056

y = 25.39s = 0.091

y = 25.451s = 0.064

95% of the sample confidence intervals will

contain the true population mean

Page 57: Lecture 06 - raw.githubusercontent.com

Confidence intervals and randomness

Population

μ = 25.4σ = 0.08

y = 25.419s = 0.085

y = 25.32s = 0.056

y = 25.39s = 0.091

y = 25.451s = 0.064

25.40 25.45 25.5025.3525.30

95% of the sample confidence intervals will

contain the true population mean

Page 58: Lecture 06 - raw.githubusercontent.com

Confidence intervals and randomness• Larger samples produce narrower confidence intervals

• Because SE is smaller (divide by square root of n)

• A confidence interval can be interpreted as a probability… with caution!

• Pr{a sample will give us a CI that contains the true mean} = 0.95

• Pr{the true mean is within our CI} = 0.95

• An individual statement can be TRUE or FALSE, but if you create numerous statements, one statement will be TRUE 95% of the time

• “We are 95% confident that the true mean is between X and X”

Page 59: Lecture 06 - raw.githubusercontent.com

Confidence intervals and randomness

Pr[Y = 2] =16

Pr[5 = 2] ≠16

Y = 5

Pr{a sample will give us a CI that contains the true mean} = 0.95

Pr{the true mean is within our CI} = 0.95

Pr{31 < < 34} 0.95μ ≠

Page 60: Lecture 06 - raw.githubusercontent.com

Confidence intervals and randomness• Larger samples produce narrower confidence intervals

• Because SE is smaller (divide by square root of n)

• A confidence interval can be interpreted as a probability… with caution!

• Pr{a sample will give us a CI that contains the true mean} = 0.95

• Pr{the true mean is within our CI} = 0.95

• An individual statement can be TRUE or FALSE, but if you create numerous statements, one statement will be TRUE 95% of the time

• “We are 95% confident that the true mean is between X and X”

Page 61: Lecture 06 - raw.githubusercontent.com

A pharmacologist measured the concentration of dopamine in the brains of eight rats. The mean concentration was 1,269 ng/gm and

the standard deviation was 145 ng/gm. Construct a 95% confidence interval for the population mean.

SE =s

n

SE =145

8= 51.2

y ± t0.025sn

1269 ± (2.365)(51.2)1269 ± 121.1

(df = n - 1 = 7)

(1147.9,1390.1)

We are 95% confident that the mean concentration of dopamine of all rats is between 1,147.9 and 1,390.1 ng/gm

Page 62: Lecture 06 - raw.githubusercontent.com

Assumptions for estimating confidence intervals

• Conditions on the design of the study:

• (1) Data is a random sample from a large population

• (2) Observations in the sample must be independent of each other

Page 63: Lecture 06 - raw.githubusercontent.com

What does it mean for samples to be independent of each other?

#1 #2 #3

n = 3? n = 9?

Still a good experimental

design! Means > individual sample

“Hierarchical data structures”

“Biological replicates”

“Technical replicates”

Page 64: Lecture 06 - raw.githubusercontent.com

What does it mean for samples to be independent of each other?

A B C

“Hierarchical data structures”

How can we know for SURE the difference between A, B, and C is due to treatment and not due to difference between flasks?

Page 65: Lecture 06 - raw.githubusercontent.com

Assumptions for estimating confidence intervals

• Conditions on the design of the study:

• (1) Data is a random sample from a large population

• (2) Observations in the sample must be independent of each other

• Conditions on the form of the population distribution

• (3) If n is small, the population distribution must be ~normal

• (4) If n is large, the population distribution doesn’t have to be normal

Page 66: Lecture 06 - raw.githubusercontent.com

Why does it matter if our population is normally distributed?

Is the mean even a meaningful measure for

this population?

If n is large enough, the sample distribution will be normal regardless

of the shape of the population

Page 67: Lecture 06 - raw.githubusercontent.com

How can we tell if a population is normally distributed?

• Plot distribution (i.e. histogram)

• Every analysis should begin with an inspection of the data and the points that lie far from the center

• Quantile plot

• Shapiro-wilks test for non-normality

• If not normal distribution, try a data transformation

Page 68: Lecture 06 - raw.githubusercontent.com

Announcements

• Extra practice problems from the textbook posted to GitHub and Canvas

• Note: no solutions, but use your classmates/TAs for help!

• If you need help keeping track of the stats R functions we have been using, check out the stats_R_cheatsheet.md on GitHub!