Chapter 211 What Is a Confidence Interval?. Chapter 212 Recall from previous chapters: Parameter –fixed, unknown number that describes the population

Chapter 21 1

Chapter 21

What Is a Confidence Interval?

Chapter 21 2

Recall from previous chapters:• Parameter

– fixed, unknown number that describes the population

• Statistic– known value calculated from a sample– a statistic is used to estimate a parameter

• Sampling Variability– different samples from the same population may yield

different values of the sample statistic– estimates from samples will be closer to the true

values in the population if the samples are larger

Chapter 21 3

Recall from previous chapters:

• Sampling Distribution– tells what values a statistic takes and how often it

takes those values in repeated sampling.

• Example:– sample proportions ( ’s) from repeated sampling

would have a normal distribution with a certain mean and standard deviation.

p̂

• Example:– The amount by which the proportion obtained from the

sample ( ) will differ from the true population proportion (p) rarely exceeds the margin of error.

p̂

Chapter 21 4

Overview

• The two major applications of inferential statistics involve the use of sample data to (1) estimate the value of a population parameter, and (2) test some claim (or hypothesis) about a population.

• We introduce methods for estimating values of these important population parameters: proportions and means.

• We also present methods for determining sample sizes necessary to estimate those parameters.

This chapter presents the beginning of inferential statistics.

Chapter 21 5

Definition

A point estimate is a single value (or point) used to approximate a population parameter.

Chapter 21 6

The sample proportion p is the best point estimate of the population proportion p.

ˆ

Definition

Chapter 21 7

Sampling Distribution of Sample Proportions

If numerous simple random samples of size n are taken from the same population, the sample proportions from the various samples will have an approximately normal distribution. The mean of the sample proportions will be p (the true population proportion). The standard deviation will be:

(1 ), where 1- .

p p pqq p

n n

)ˆ( p

Chapter 21 8

Rule Conditions• For the sampling distribution of the sample proportions to be valid, we must have

Random samples “Large” sample size

Chapter 21 9

Definition

A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI.

Chapter 21 10

Confidence Interval for a Population Proportion

• An interval of values, computed from sample data, that is almost sure to cover the true population proportion.

• “We are ‘highly confident’ that the true population proportion is contained in the calculated interval.”

• Statistically (for a 95% C.I.): in repeated samples, 95% of the calculated confidence intervals should contain the true proportion.

Chapter 21 11

Chapter 21 12

• since we do not know the population proportion p (needed to calculate the standard deviation) we will use the sample proportion in its place.

Formula for a 95% Confidence Interval for the Population

Proportion (Empirical Rule)• sample proportion plus or minus two

standard deviations ofthe sample proportion:

p̂

n)p(p

p̂ 1

2

Chapter 21 13

ˆ ˆ(1 )ˆ 2

p pp

n

standard error (estimated standard deviation of )p̂

Formula for a 95% Confidence Interval for the Population

Proportion (Empirical Rule)

Chapter 21 14

Margin of Error

nn

..

n

p̂p̂

1)501(50

)1(

2

2

(plus or minus part of C.I.)

Chapter 21 15

Formula for a C-level (%) Confidence Interval for the

Population Proportion

ˆ ˆ(1 )ˆ * p pp

nz

where z* is the critical value of the standard normal distribution for confidence level C

Chapter 21 16

Chapter 21 17

Table 21.1: Common Values of z*

Confidence LevelC

Critical Valuez*

50% 0.67

60% 0.84

68% 1

70% 1.04

80% 1.28

90% 1.64

95% 1.96 (or 2)

99% 2.58

99.7% 3

99.9% 3.29

Chapter 21 18

Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of photo radars for issuing traffic tickets. Using the survey results:

a) Find the margin of error E that corresponds to a 95% confidence level.

b) Find the 95% confidence interval estimate of the population proportion p.

c) Based on the results, can we safely conclude that the majority of adult Nanaimo residents oppose use photo radars?

Chapter 21 19

a) Find the margin of error E that corresponds to a 95% confidence level.

Next, we calculate the margin of error. We have found

that p = 0.51, q = 1 – 0.51 = 0.49, z* = 1.96, and n = 829.

E = 1.96

ˆ

(0.51)(0.49)829

E = 0.03403

ˆ

Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radar for issuing traffic tickets. Using the survey results:

Chapter 21 20

b) Find the 95% confidence interval for the population proportion p.

We substitute our values from part a) to obtain:0.51 – 0.03403 < p < 0.51 + 0.03403,

0.476 < p < 0.544

Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radars for issuing traffic tickets. Using the survey results:

Chapter 21 21

c) Based on the results, can we safely conclude that the majority of adult Nanaimo residents oppose use of the photo radars?

Based on the survey results, we are 95% confident that the limits of 47.6% and 54.4% contain the true percentage of adult Nanaimo residents opposed to the photo radar. The percentage of adult Nanaimo residents who oppose the use of photo radars is likely to be any value between 47.6% and 54.4%. However, a majority requires a percentage greater than 50%, so we cannot safely conclude that the majority is opposed (because the entire confidence interval is not greater than 50%).

Example: 829 adult Nanaimo residents were surveyed, and 51% of them are opposed to the use of the photo radars for issuing traffic tickets. Use these survey results.

Chapter 21 22

Example: Page 444 #21.20

A random sample of 1500 adults finds that 60% favour

balancing the federal budget over cutting Taxes. Use

this poll result and Table 21.1 to give 70%, 80%, 90%,

and 99% confidence intervals for the proportion of all

adults who fell this way. What do your results show

about the effect of changing the confidence level?

Chapter 21 23

Sample Size

Suppose we want to collect sample data with the objective of estimating some population proportion. The question is how many sample items must be obtained?

The required sample size depends on the confidence level and the desired margin of error.

Chapter 21 24

Determining Sample Size

(solve for n by algebra)

( )2 ˆp q

Z*n =

ˆE 2

Z*E =

p qˆ ˆn

Always round up.

Chapter 21 25

Sample Size for Estimating Proportion p

When an estimate of p is known: ˆ

ˆ( )2 p qn =

ˆE 2

z*

When no estimate of p is known:

Use P = 0.5 ( )2 0.25n =

E 2 Z*

ˆ^

- the “worst case” value

Chapter 21 26

Example: Suppose a sociologist wants to determine the current percentage of Nanaimo households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points?

a) Use this result from an earlier study: In 1997, 16.9%

of Nanaimo households used e-mail.

b) Assume that we have no prior information

suggesting a possible value of p.

ˆ

Chapter 21 27

a) Use this result from an earlier study: In 1997, 16.9% of Nanaimo households used e-mail.

n = [z* ]2 p q

E2

ˆ ˆ

0.042

= [1.96]2 (0.169)(0.831)

= 337.194= 338 households

To be 95% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 338 households.


Chapter 21 28

b) Assume that we have no prior information suggesting a possible value of p.

ˆ


0.042

n = [z* ]2 • 0.25

E2 = (1.96)2 (0.25)

= 600.25 = 601 households

With no prior information, we need a larger sample to achieve the same results with 95% confidence and an error of no more than 4%.

Chapter 21 29

Key Concepts (1st half of Ch. 21)

• Different samples (of the same size) will generally give different results.

• We can specify what these results look like in the aggregate.

• Rule for Sample Proportions• Compute and interpret confidence

intervals for population proportions based on sample proportions

Chapter 21 30

Inference for Population MeansSampling Distribution, Confidence Intervals

• The remainder of this chapter discusses the situation when interest is in making conclusions about population means rather than population proportions– includes the rule for the sampling distribution

of sample means ( )– includes confidence intervals for a mean

s'X

Chapter 21 31

Thought Question 1(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)

Suppose the mean weight of all women at a university is 135 pounds, with a standard deviation of 10 pounds.

• Recalling the material from Chapter 13 about bell-shaped curves, in what range would you expect 95% of the women’s weights to fall? 115 to 155 pounds

Chapter 21 32

Thought Question 1 (cont.)

• If you were to randomly sample 10 women at the university, how close do you think their average weight would be to 135 pounds?

• If you randomly sample 1000 women, would you expect the average to be closer to 135 pounds than it would be for the sample of 10 women?

Chapter 21 33

Thought Question 2

A study compared the serum HDL cholesterol levels in people with low-fat diets to people with diets high in fat intake. From the study, a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5...

a. Does this mean that 95% of all people with low-fat diets will have HDL cholesterol levels between 43.5 and 50.5? Explain.

Chapter 21 34

Thought Question 2 (cont.)

… a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5. A 95% confidence interval for the mean HDL cholesterol for the high-fat group extends from 54.5 to 61.5.

b. Based on these results, would you conclude that people with low-fat diets have lower HDL cholesterol levels, on average, than people with high-fat diets?

Chapter 21 35

Thought Question 3

The first confidence interval in Question 2 was based on results from 50 people. The confidence interval spans a range of 7 units. If the results had been based on a much larger sample, would the confidence interval for the mean cholesterol level have been wider, more narrow, or about the same? Explain.

Chapter 21 36

The Central Limit Theorem (CLT)

If simple random samples of size n (n large) are

taken from the same population, the sample

means from the various samples will have

an approximately normal distribution. The

mean of the sample means will be (the

population mean). The standard deviation will

be: ( is the population s.d.)

n

)(X

Chapter 21 37

Chapter 21 38

Conditions for the Rule for Sample Means

• Random sample

• Population of measurements…

– Follows a bell-shaped curve

- or -

– Not bell-shaped, but sample size is “large”

– When is not known, we use the sample standard

deviation to estimate . In this case, we use a t - distribution to find the critical value if the underlying distribution is normal.

Chapter 21 39

μ

σ

n

σn

135 lb ( mean for population and X)

10 lb ( s.d. for population)

10

10 3.16 ( s.d. for X)10

Case Study: Weights Sampling Distribution

(for n = 10)

Chapter 21 40

• Where should 95% of the sample mean weights fall (from samples of size n=10)? mean plus or minus two standard deviations

135 2(3.16) = 128.68 135 + 2(3.16) = 141.32

95% should fall between 128.68 lb & 141.32 lb

Case Study: Weights Answer to Question

(for n = 10)

Chapter 21 41

135

10

25

10 225

lb

lb

n

μ

σ


(for n = 25)

Chapter 21 42


135 2(2) = 131 135 + 2(2) = 139

95% should fall between 131 lb & 139 lb


(for n = 25)

Chapter 21 43

Sampling Distribution of Mean (n=25)Simulated Data: Sample Size=25

0

50

100

150

200

120

121.

5000

123.

0000

124.

5000

126.

0000

127.

5000

129.

0000

130.

5000

132.

0000

133.

5000

135.

0000

136.

5000

138.

0000

139.

5000

141.

0000

142.

5000

144.

0000

145.

5000

147.

0000

148.

5000

150.

0000

Sample Means

Chapter 21 44

135

10

100

10 1100

lb

lb

n

μ

σ


(for n = 100)

Chapter 21 45


135 2(1) = 133 135 + 2(1) = 137

95% should fall between 133 lb & 137 lb


(for n = 100)

Chapter 21 46

Sampling Distribution of Mean (n=100)Simulated Data: Sample Size=100

0

50

100

150

200

120

121.

5000

123.

0000

124.

5000

126.

0000

127.

5000

129.

0000

130.

5000

132.

0000

133.

5000

135.

0000

136.

5000

138.

0000

139.

5000

141.

0000

142.

5000

144.

0000

145.

5000

147.

0000

148.

5000

150.

0000

Sample Means

Chapter 21 47

Case Study

Hypothetical

Exercise and Pulse Rates

Is the mean resting pulse rate of adult subjects who regularly exercise different from the mean resting pulse rate of those who do not regularly exercise?

Find Confidence Intervals for the means

Chapter 21 48

n mean std. dev. Exercisers 29 66 8.6 Nonexercisers 31 75 9.0

Case Study: Results

Exercise and Pulse RatesA random sample of n1=29 exercisers yielded a sample

mean of =66 beats per minute (bpm) with a sample standard deviation of s1=8.6 bpm. A random sample of

n2=31 nonexercisers yielded a sample mean of =75

bpm with a sample standard deviation of s2=9.0 bpm.

1X

2X

Chapter 21 49

The Rule for Sample Means

We do not know the value of !

We assume that the pulse rates are normally distributed.

We need to use the t – distribution to find the critical value.

For large samples, we can use the normal distribution as an approximation.

Chapter 21 50

Standard Error of the (Sample) Mean

SEM = standard error of the mean

(standard deviation from the sample) = divided by

(square root of the sample size)

=n

s

Chapter 21 51

Case Study: Results


n mean std. dev. std. err. Exercisers 29 66 8.6 1.6 Nonexer. 31 75 9.0 1.6

Typical deviation of an individual pulse rate(for Exercisers) is s = 8.6

Typical deviation of a mean pulse rate(for Exercisers) is = 1.6

ns

298.6

Chapter 21 52

Case Study: Confidence Intervals


Exercisers: 66 ± 2(1.6) = 66 3.2 = (62.8, 69.2)

Non-exercisers: 75 ± 2(1.6) = 75 3.2 = (71.8, 78.2)

Do you think the population means are different?

95% C.I. for the population mean: sample mean z* (standard error)

X ns

2

Yes, because the intervals do not overlap

Chapter 21 53

Careful Interpretation of a Confidence Interval

• “We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm.” (We feel that plausible values for the population of exercisers’ mean resting pulse rate are between 62.8 and 69.2.)

• ** This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm. **

• Statistically: 95% of all samples of size 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the C.I.s should contain the true population mean.

Chapter 21 54

Exercise and Pulse Rates 95% C.I. for the difference in population

means (nonexercisers minus exercisers): (difference in sample means)

2 (SE of the difference) Difference in sample means: = 9 SE of the difference = 2.26 (given) 95% confidence interval: (4.4, 13.6)

– interval does not include zero ( means are different)

EN XX

Case Study: Confidence Intervals

Chapter 21 55

The NAEP test (Example 7, page 439) was given to a sample of 1077 women of ages 21 to 25 years. Their mean quantitativescore was 275 and the standard deviation was 58.

a) Give a 95% confidence interval for the mean score μ in the population of all young women.

Example: Page 445 # 21.26

Chapter 21 56


b) Give the 90% and 99% confidence intervals for μ.

Example: Page 445 # 21.26 Cont.

Chapter 21 57


c) What are the margins of error for 90%, 95%, and 99% confidence? How does increasing the confidence level affect the margin of error of a confidence interval?

Example: Page 445 # 21.26 Cont.

Chapter 21 58

Sample Size for Estimating Mean

(z*) n =

E

2

Where

Z* = critical z score based on the desired confidence level

E = desired margin of error

σ = population standard deviation

Chapter 21 59

Example: Assume that we want to estimate the mean IQ score for the population of statistics professors. How many statistics professors must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 2 IQ points of the population mean? Assume that = 15, as is found in the general population.

Chapter 21 60

Key Concepts (2nd half of Ch. 21)

• Sampling distribution of sample Means

• The Central Limit Theorem

• Compute confidence intervals for means

• Interpret Confidence Intervals for Means

Documents

Chapter 211 What Is a Confidence Interval?. Chapter 212 Recall from previous chapters: Parameter –fixed, unknown number that describes the population