25
June 23, 2008 Stat 111 - Lecture 13 - One Mean 1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two-sample Tests Statistics 111 - Lecture 13

June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

Embed Size (px)

Citation preview

Page 1: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 1

Inference for a Population Mean

Confidence Intervals and Tests with unknown variance and Two-

sample Tests

Statistics 111 - Lecture 13

Page 2: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 2

Administrative Notes

• Homework 4 due Wednesday• Homework 5 assigned tomorrow

• The final is ridiculously close (next Thursday)

Page 3: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 3

Outline• Review:• Confidence Intervals and Hypothesis Tests which

assume known variance• Population variance unknown:• t-distribution

• Confidence intervals and Tests using the t-distribution• Small sample situation• Two-Sample datasets: comparing two means• Testing the difference between two samples when

variances are known• Moore, McCabe and Craig: 7.1-7.2

Page 4: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 4

Chapter 5: Sampling Distribution of

• Distribution of values taken by sample mean in all possible samples of size n from the same population

• Standard deviation of sampling distribution:

• Central Limit Theorem:

Sample mean has a Normal distribution

• These results all assume that the sample size is large and that the population variance is known

Page 5: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 5

Chapter 6: Confidence Intervals

• We used sampling distribution results to create two different tools for inference

• Confidence Intervals: Use sample mean as the center of an interval of likely values for pop. mean

• Width of interval is a multiple Z* of standard deviation of sample mean

• Z* calculated from N(0,1) table for specific confidence level (eg. 95% confidence means Z*=1.96)

• We assume large sample size to use N(0,1) distribution, and we assume that is known (usually just use sample SD s)

Page 6: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 6

Chapter 6: Hypothesis Testing

• Compare sample mean to a hypothesized population mean 0

• Test statistic is also a multiple of standard deviation of the sample mean

• p-value calculated from N(0,1) table and compared to -level in order to reject or accept null hypothesis

• Eg. p-value < 0.05 means we reject null hypothesis• We again assume large sample size to use N(0,1)

distribution, and we assume that is known

Page 7: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 7

Unknown Population Variance

• What if we don’t want to assume that population SD is known?

• If is unknown, we can’t use our formula for the standard deviation of the sample mean:

• Instead, we use the standard error of the sample mean:

• Standard error involves sample SD s as estimate of

Page 8: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 8

t distribution

• If we have small sample size n and we need to use the standard error formula because the population SD is unknown, then:

The sample mean does not have a

normal distribution!

• Instead, the sample mean has a

T distribution with n - 1 degrees of freedom

• What the heck does that mean?!?

Page 9: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 9

t distribution• t distribution looks like a normal distribution, but has

“thicker” tails. The tail thickness is controlled by the degrees of freedom

• The smaller the degrees of freedom, the thicker the tails of the t distribution

• If the degrees of freedom is large (if we have a large sample size), then the t distribution is pretty much identical to the normal distribution

Normal distribution

t with df = 5

t with df = 1

Page 10: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 10

Known vs. Unknown Variance• Before: Known population SD

• Sample mean is centered at and has standard deviation:

• Sample mean has Normal distribution

• Now: Unknown population SD • Sample mean is centered at and has standard error:

• Sample mean has t distribution with n-1 degrees of freedom

Page 11: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 11

New Confidence Intervals• If the population SD is unknown, we need a new

formula for our confidence interval• Standard error used instead of standard deviation• t distribution used instead of normal distribution

• If we have a sample of size n from a population with unknown , then our 100·C % confidence interval for the unknown population mean is:

• The critical value is calculated using a table for the t distribution (back of textbook)

Page 12: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 12

Tables for the t distribution

• If we want a 100·C% confidence interval, we need to find the value so that we have a probability of C between -t* and t* in a t distribution with n-1 degrees of freedom

• Example: 95% confidence interval when n = 14 means that we need a

tail probability of 0.025, so t*=2.16

= 0.95

= 0.025

t*-t*

df = 13

Page 13: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 13

Example: NYC blackout baby boom• Births/day from August 1966:

• Before: we assumed that was known, and used the normal distribution for a 95% confidence interval:

• Now: let be unknown, and used the t distribution with n-1 = 13 degrees of freedom to calculate our a 95% confidence interval:

• Interval is now wider because we are now less certain about our population SD

Page 14: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 14

Another Example: Calcium in the Diet

• Daily calcium intake from 18 people below poverty line (RDA is 850 mg/day)

• Before: used known = 188 from previous study, used normal distribution for 95% confidence interval:

• Now: let be unknown, and use the t distribution with n-1 = 17 degrees of freedom to calculate our a 95% confidence interval:

Again, Wider interval because we have an unknown

Page 15: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 15

New Hypothesis Tests

• If the population SD is unknown, we need to modify our test statistics and p-value calculations as well

• Standard error used in test statistic instead of standard deviation

• t distribution used to calculate the p-value instead of standard normal distribution

Page 16: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 16

Example: Calcium in Diet• Daily calcium intake from 18 people below poverty line

• Test our data against the null hypothesis that 0 = 850 mg (recommended daily allowance)

• Before: we assumed known = 188 and calculated test statistic T= -2.32

• Now: is actually unknown, and we use test statistic with standard error instead of standard deviation:

Page 17: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 17

Example: Calcium in Diet• Before: used normal distribution to get p-value = 0.02

• Now: is actually unknown, and we use t distribution with n-1 = 17 degrees of freedom to get p-value ≈ 0.04

• With unknown , we have a p-value that is closer to the usual threshold of = 0.05 than before

Normal distribution

T= -2.32

prob = 0.01

T= 2.32

t17

distribution

T= -2.26

prob ≈ 0.02

T= 2.26

Page 18: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - One Mean 18

Review

• Known population SD • Use standard deviation of sample mean:

• Use standard normal distribution

• Unknown population SD • Use standard deviation of sample mean:

• Use t distribution with n-1 d.f.

Page 19: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 19

Small Samples

• We have used the standard error and t distribution to correct our assumption of known population SD

• However, even t distribution intervals/tests not as accurate if data is skewed or has influential outliers

• Rough guidelines from your textbook:• Large samples (n> 40): t distribution can be used even for

strongly skewed data or with outliers• Intermediate samples (n > 15): t distribution can be used

except for strongly skewed data or presence of outliers • Small samples (n < 15): t distribution can only be used if data

does not have skewness or outliers

• What can we do for small samples of skewed data?

Page 20: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 20

Techniques for Small Samples• One option: use log transformation on data

• Taking logarithm of data can often make it look more normal

• Another option: non-parametric tests like the sign test• Not required for this course, but mentioned in text book if

you’re interested

Page 21: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 21

Comparing Two Samples

• Up to now, we have looked at inference for one sample of continuous data

• Our next focus in this course is comparing the data from two different samples

• For now, we will assume that these two different samples are independent of each other and come from two distinct populations

Population 1:1 , 1

Sample 1: , s1

Population 2:2 , 2

Sample 2: , s2

Page 22: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 22

Blackout Baby Boom Revisited

• Nine months (Monday, August 8th) after Nov 1965 blackout, NY Times claimed an increased birth rate

• Already looked at single two-week sample: found no significant difference from usual rate (430 births/day)

• What if we instead look at difference between weekends and weekdays?

Sun Mon Tue Wed Thu Fri Sat

452 470 431 448 467 377

344 449 440 457 471 463 405

377 453 499 461 442 444 415

356 470 519 443 449 418 394

399 451 468 432

Weekdays Weekends

Page 23: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 23

Two-Sample Z test• We want to test the null hypothesis that the two

populations have different means• H0: 1 = 2 or equivalently, 1 - 2 = 0• Two-sided alternative hypothesis: 1 - 2 0

• If we assume our population SDs 1 and 2 are known, we can calculate a two-sample Z statistic:

• We can then calculate a p-value from this Z statistic using the standard normal distribution

• Next class, we will look at tests that do not assume known 1 and 2

Page 24: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 24

Two-Sample Z test for Blackout Data

• To use Z test, we need to assume that our pop. SDs are known: 1 = s1 = 21.7 and 2 = s2 = 24.5

• We can then calculate a two-sided p-value for Z=7.5 using the standard normal distribution• From normal table, P(Z > 7.5) is less than 0.0002, so our p-

value = 2 P(Z > 7.5) is less than 0.0004

• We reject the null hypothesis at -level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays

• Next class: get rid of assumption of known 1 and 2

Page 25: June 23, 2008Stat 111 - Lecture 13 - One Mean1 Inference for a Population Mean Confidence Intervals and Tests with unknown variance and Two- sample Tests

June 23, 2008 Stat 111 - Lecture 13 - Means 25

Next Class – Lecture 14

• More on Comparing Means between Two Samples

• Moore, McCabe and Craig: 7.1-7.2