View
222
Download
3
Embed Size (px)
Citation preview
Chapter 21 1
Chapter 21
What Is a Confidence Interval?
Chapter 21 2
Recall from previous chapters:• Parameter
– fixed, unknown number that describes the population
• Statistic– known value calculated from a sample– a statistic is used to estimate a parameter
• Sampling Variability– different samples from the same population may yield
different values of the sample statistic– estimates from samples will be closer to the true
values in the population if the samples are larger
Chapter 21 3
Recall from previous chapters:
• Sampling Distribution– tells what values a statistic takes and how often it
takes those values in repeated sampling.
• Example:– sample proportions ( ’s) from repeated sampling
would have a normal distribution with a certain mean and standard deviation.
p̂
• Example:– The amount by which the proportion obtained from the
sample ( ) will differ from the true population proportion (p) rarely exceeds the margin of error.
p̂
Chapter 21 4
Overview
• The two major applications of inferential statistics involve the use of sample data to (1) estimate the value of a population parameter, and (2) test some claim (or hypothesis) about a population.
• We introduce methods for estimating values of these important population parameters: proportions and means.
• We also present methods for determining sample sizes necessary to estimate those parameters.
This chapter presents the beginning of inferential statistics.
Chapter 21 5
Definition
A point estimate is a single value (or point) used to approximate a population parameter.
Chapter 21 6
The sample proportion p is the best point estimate of the population proportion p.
ˆ
Definition
Chapter 21 7
Sampling Distribution of Sample Proportions
If numerous simple random samples of size n are taken from the same population, the sample proportions from the various samples will have an approximately normal distribution. The mean of the sample proportions will be p (the true population proportion). The standard deviation will be:
(1 ), where 1- .
p p pqq p
n n
)ˆ( p
Chapter 21 8
Rule Conditions• For the sampling distribution of the sample proportions to be valid, we must have
Random samples “Large” sample size
Chapter 21 9
Definition
A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI.
Chapter 21 10
Confidence Interval for a Population Proportion
• An interval of values, computed from sample data, that is almost sure to cover the true population proportion.
• “We are ‘highly confident’ that the true population proportion is contained in the calculated interval.”
• Statistically (for a 95% C.I.): in repeated samples, 95% of the calculated confidence intervals should contain the true proportion.
Chapter 21 11
Chapter 21 12
• since we do not know the population proportion p (needed to calculate the standard deviation) we will use the sample proportion in its place.
Formula for a 95% Confidence Interval for the Population
Proportion (Empirical Rule)• sample proportion plus or minus two
standard deviations ofthe sample proportion:
p̂
n)p(p
p̂ 1
2
Chapter 21 13
ˆ ˆ(1 )ˆ 2
p pp
n
standard error (estimated standard deviation of )p̂
Formula for a 95% Confidence Interval for the Population
Proportion (Empirical Rule)
Chapter 21 14
Margin of Error
nn
..
n
p̂p̂
1)501(50
)1(
2
2
(plus or minus part of C.I.)
Chapter 21 15
Formula for a C-level (%) Confidence Interval for the
Population Proportion
ˆ ˆ(1 )ˆ * p pp
nz
where z* is the critical value of the standard normal distribution for confidence level C
Chapter 21 16
Chapter 21 17
Table 21.1: Common Values of z*
Confidence LevelC
Critical Valuez*
50% 0.67
60% 0.84
68% 1
70% 1.04
80% 1.28
90% 1.64
95% 1.96 (or 2)
99% 2.58
99.7% 3
99.9% 3.29
Chapter 21 18
Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of photo radars for issuing traffic tickets. Using the survey results:
a) Find the margin of error E that corresponds to a 95% confidence level.
b) Find the 95% confidence interval estimate of the population proportion p.
c) Based on the results, can we safely conclude that the majority of adult Nanaimo residents oppose use photo radars?
Chapter 21 19
a) Find the margin of error E that corresponds to a 95% confidence level.
Next, we calculate the margin of error. We have found
that p = 0.51, q = 1 – 0.51 = 0.49, z* = 1.96, and n = 829.
E = 1.96
ˆ
(0.51)(0.49)829
E = 0.03403
ˆ
Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radar for issuing traffic tickets. Using the survey results:
Chapter 21 20
b) Find the 95% confidence interval for the population proportion p.
We substitute our values from part a) to obtain:0.51 – 0.03403 < p < 0.51 + 0.03403,
0.476 < p < 0.544
Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radars for issuing traffic tickets. Using the survey results:
Chapter 21 21
c) Based on the results, can we safely conclude that the majority of adult Nanaimo residents oppose use of the photo radars?
Based on the survey results, we are 95% confident that the limits of 47.6% and 54.4% contain the true percentage of adult Nanaimo residents opposed to the photo radar. The percentage of adult Nanaimo residents who oppose the use of photo radars is likely to be any value between 47.6% and 54.4%. However, a majority requires a percentage greater than 50%, so we cannot safely conclude that the majority is opposed (because the entire confidence interval is not greater than 50%).
Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radars for issuing traffic tickets. Use these survey results.
Chapter 21 22
Example: Page 444 #21.20
A random sample of 1500 adults finds that 60% favour
balancing the federal budget over cutting Taxes. Use
this poll result and Table 21.1 to give 70%, 80%, 90%,
and 99% confidence intervals for the proportion of all
adults who fell this way. What do your results show
about the effect of changing the confidence level?
Chapter 21 23
Sample Size
Suppose we want to collect sample data with the objective of estimating some population proportion. The question is how many sample items must be obtained?
The required sample size depends on the confidence level and the desired margin of error.
Chapter 21 24
Determining Sample Size
(solve for n by algebra)
( )2 ˆp q
Z*n =
ˆE 2
Z*E =
p qˆ ˆn
Always round up.
Chapter 21 25
Sample Size for Estimating Proportion p
When an estimate of p is known: ˆ
ˆ( )2 p qn =
ˆE 2
z*
When no estimate of p is known:
Use P = 0.5 ( )2 0.25n =
E 2 Z*
ˆ^
- the “worst case” value
Chapter 21 26
Example: Suppose a sociologist wants to determine the current percentage of Nanaimo households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points?
a) Use this result from an earlier study: In 1997, 16.9%
of Nanaimo households used e-mail.
b) Assume that we have no prior information
suggesting a possible value of p.
ˆ
Chapter 21 27
a) Use this result from an earlier study: In 1997, 16.9% of Nanaimo households used e-mail.
n = [z* ]2 p q
E2
ˆ ˆ
0.042
= [1.96]2 (0.169)(0.831)
= 337.194= 338 households
To be 95% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 338 households.
Example: Suppose a sociologist wants to determine the current percentage of Nanaimo households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points?
Chapter 21 28
b) Assume that we have no prior information suggesting a possible value of p.
ˆ
Example: Suppose a sociologist wants to determine the current percentage of Nanaimo households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points?
0.042
n = [z* ]2 • 0.25
E2 = (1.96)2 (0.25)
= 600.25 = 601 households
With no prior information, we need a larger sample to achieve the same results with 95% confidence and an error of no more than 4%.
Chapter 21 29
Key Concepts (1st half of Ch. 21)
• Different samples (of the same size) will generally give different results.
• We can specify what these results look like in the aggregate.
• Rule for Sample Proportions• Compute and interpret confidence
intervals for population proportions based on sample proportions
Chapter 21 30
Inference for Population MeansSampling Distribution, Confidence Intervals
• The remainder of this chapter discusses the situation when interest is in making conclusions about population means rather than population proportions– includes the rule for the sampling distribution
of sample means ( )– includes confidence intervals for a mean
s'X
Chapter 21 31
Thought Question 1(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all women at a university is 135 pounds, with a standard deviation of 10 pounds.
• Recalling the material from Chapter 13 about bell-shaped curves, in what range would you expect 95% of the women’s weights to fall? 115 to 155 pounds
Chapter 21 32
Thought Question 1 (cont.)
• If you were to randomly sample 10 women at the university, how close do you think their average weight would be to 135 pounds?
• If you randomly sample 1000 women, would you expect the average to be closer to 135 pounds than it would be for the sample of 10 women?
Chapter 21 33
Thought Question 2
A study compared the serum HDL cholesterol levels in people with low-fat diets to people with diets high in fat intake. From the study, a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5...
a. Does this mean that 95% of all people with low-fat diets will have HDL cholesterol levels between 43.5 and 50.5? Explain.
Chapter 21 34
Thought Question 2 (cont.)
… a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5. A 95% confidence interval for the mean HDL cholesterol for the high-fat group extends from 54.5 to 61.5.
b. Based on these results, would you conclude that people with low-fat diets have lower HDL cholesterol levels, on average, than people with high-fat diets?
Chapter 21 35
Thought Question 3
The first confidence interval in Question 2 was based on results from 50 people. The confidence interval spans a range of 7 units. If the results had been based on a much larger sample, would the confidence interval for the mean cholesterol level have been wider, more narrow, or about the same? Explain.
Chapter 21 36
The Central Limit Theorem (CLT)
If simple random samples of size n (n large) are
taken from the same population, the sample
means from the various samples will have
an approximately normal distribution. The
mean of the sample means will be (the
population mean). The standard deviation will
be: ( is the population s.d.)
n
)(X
Chapter 21 37
Chapter 21 38
Conditions for the Rule for Sample Means
• Random sample
• Population of measurements…
– Follows a bell-shaped curve
- or -
– Not bell-shaped, but sample size is “large”
– When is not known, we use the sample standard
deviation to estimate . In this case, we use a t - distribution to find the critical value if the underlying distribution is normal.
Chapter 21 39
μ
σ
n
σn
135 lb ( mean for population and X)
10 lb ( s.d. for population)
10
10 3.16 ( s.d. for X)10
Case Study: Weights Sampling Distribution
(for n = 10)
Chapter 21 40
• Where should 95% of the sample mean weights fall (from samples of size n=10)? mean plus or minus two standard deviations
135 2(3.16) = 128.68 135 + 2(3.16) = 141.32
95% should fall between 128.68 lb & 141.32 lb
Case Study: Weights Answer to Question
(for n = 10)
Chapter 21 41
135
10
25
10 225
lb
lb
n
μ
σ
Case Study: Weights Sampling Distribution
(for n = 25)
Chapter 21 42
• Where should 95% of the sample mean weights fall (from samples of size n=25)? mean plus or minus two standard deviations
135 2(2) = 131 135 + 2(2) = 139
95% should fall between 131 lb & 139 lb
Case Study: Weights Answer to Question
(for n = 25)
Chapter 21 43
Sampling Distribution of Mean (n=25)Simulated Data: Sample Size=25
0
50
100
150
200
120
121.
5000
123.
0000
124.
5000
126.
0000
127.
5000
129.
0000
130.
5000
132.
0000
133.
5000
135.
0000
136.
5000
138.
0000
139.
5000
141.
0000
142.
5000
144.
0000
145.
5000
147.
0000
148.
5000
150.
0000
Sample Means
Chapter 21 44
135
10
100
10 1100
lb
lb
n
μ
σ
Case Study: Weights Sampling Distribution
(for n = 100)
Chapter 21 45
• Where should 95% of the sample mean weights fall (from samples of size n=100)? mean plus or minus two standard deviations
135 2(1) = 133 135 + 2(1) = 137
95% should fall between 133 lb & 137 lb
Case Study: Weights Answer to Question
(for n = 100)
Chapter 21 46
Sampling Distribution of Mean (n=100)Simulated Data: Sample Size=100
0
50
100
150
200
120
121.
5000
123.
0000
124.
5000
126.
0000
127.
5000
129.
0000
130.
5000
132.
0000
133.
5000
135.
0000
136.
5000
138.
0000
139.
5000
141.
0000
142.
5000
144.
0000
145.
5000
147.
0000
148.
5000
150.
0000
Sample Means
Chapter 21 47
Case Study
Hypothetical
Exercise and Pulse Rates
Is the mean resting pulse rate of adult subjects who regularly exercise different from the mean resting pulse rate of those who do not regularly exercise?
Find Confidence Intervals for the means
Chapter 21 48
n mean std. dev. Exercisers 29 66 8.6 Nonexercisers 31 75 9.0
Case Study: Results
Exercise and Pulse RatesA random sample of n1=29 exercisers yielded a sample
mean of =66 beats per minute (bpm) with a sample standard deviation of s1=8.6 bpm. A random sample of
n2=31 nonexercisers yielded a sample mean of =75
bpm with a sample standard deviation of s2=9.0 bpm.
1X
2X
Chapter 21 49
The Rule for Sample Means
We do not know the value of !
We assume that the pulse rates are normally distributed.
We need to use the t – distribution to find the critical value.
For large samples, we can use the normal distribution as an approximation.
Chapter 21 50
Standard Error of the (Sample) Mean
SEM = standard error of the mean
(standard deviation from the sample) = divided by
(square root of the sample size)
=n
s
Chapter 21 51
Case Study: Results
Exercise and Pulse Rates
n mean std. dev. std. err. Exercisers 29 66 8.6 1.6 Nonexer. 31 75 9.0 1.6
Typical deviation of an individual pulse rate(for Exercisers) is s = 8.6
Typical deviation of a mean pulse rate(for Exercisers) is = 1.6
ns
298.6
Chapter 21 52
Case Study: Confidence Intervals
Exercise and Pulse Rates
Exercisers: 66 ± 2(1.6) = 66 3.2 = (62.8, 69.2)
Non-exercisers: 75 ± 2(1.6) = 75 3.2 = (71.8, 78.2)
Do you think the population means are different?
95% C.I. for the population mean: sample mean z* (standard error)
X ns
2
Yes, because the intervals do not overlap
Chapter 21 53
Careful Interpretation of a Confidence Interval
• “We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm.” (We feel that plausible values for the population of exercisers’ mean resting pulse rate are between 62.8 and 69.2.)
• ** This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm. **
• Statistically: 95% of all samples of size 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the C.I.s should contain the true population mean.
Chapter 21 54
Exercise and Pulse Rates 95% C.I. for the difference in population
means (nonexercisers minus exercisers): (difference in sample means)
2 (SE of the difference) Difference in sample means: = 9 SE of the difference = 2.26 (given) 95% confidence interval: (4.4, 13.6)
– interval does not include zero ( means are different)
EN XX
Case Study: Confidence Intervals
Chapter 21 55
The NAEP test (Example 7, page 439) was given to a sample of 1077 women of ages 21 to 25 years. Their mean quantitativescore was 275 and the standard deviation was 58.
a) Give a 95% confidence interval for the mean score μ in the population of all young women.
Example: Page 445 # 21.26
Chapter 21 56
The NAEP test (Example 7, page 439) was given to a sample of 1077 women of ages 21 to 25 years. Their mean quantitativescore was 275 and the standard deviation was 58.
b) Give the 90% and 99% confidence intervals for μ.
Example: Page 445 # 21.26 Cont.
Chapter 21 57
The NAEP test (Example 7, page 439) was given to a sample of 1077 women of ages 21 to 25 years. Their mean quantitativescore was 275 and the standard deviation was 58.
c) What are the margins of error for 90%, 95%, and 99% confidence? How does increasing the confidence level affect the margin of error of a confidence interval?
Example: Page 445 # 21.26 Cont.
Chapter 21 58
Sample Size for Estimating Mean
(z*) n =
E
2
Where
Z* = critical z score based on the desired confidence level
E = desired margin of error
σ = population standard deviation
Chapter 21 59
Example: Assume that we want to estimate the mean IQ score for the population of statistics professors. How many statistics professors must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 2 IQ points of the population mean? Assume that = 15, as is found in the general population.
Chapter 21 60
Key Concepts (2nd half of Ch. 21)
• Sampling distribution of sample Means
• The Central Limit Theorem
• Compute confidence intervals for means
• Interpret Confidence Intervals for Means