Lecture 5 • 12.2 Inference for a population mean when the stdev is unknown; one more example • 12.3 Testing a population variance • 12.4 Testing a population proportion


Lecture 5

• 12.2 Inference for a population mean when the stdev is unknown; one more example

• 12.3 Testing a population variance

• 12.4 Testing a population proportion


Announcements

• Answer to Problem 12.77: Sample size of 752.

• Extra office hour this week: Wednesday, 9-10

• Homework – due Thursday, see web page for correction on last problem

• Type II error calculation from last lecture – solution on web page


Hypothesis Testing – Basic Steps

1. Set up the null and alternative hypotheses.

2. Choose an appropriate test statistic and the values of the test statistic that will be considered evidence in favor of H1; e.g., for testing $H_1: \mu > \mu_0$, reject for large values of the z-score.

3. Find the critical value and compare the observed test statistic to it (rejection region method), or find the p-value (p-value method).

4. Draw substantive conclusions.
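The four steps can be sketched in Python for a one-sided z-test. All inputs below (`mu0`, `sigma`, `n`, `xbar`) are hypothetical values chosen only for illustration, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs, not taken from the lecture.
mu0, sigma, n, xbar = 100.0, 15.0, 36, 105.0
alpha = 0.05

# Step 1: H0: mu = mu0 versus H1: mu > mu0.
# Step 2: the test statistic is the z-score; large values favor H1.
z = (xbar - mu0) / (sigma / sqrt(n))

# Step 3a: rejection region method (compare z to the critical value).
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_by_region = z > z_crit

# Step 3b: p-value method (right-tail area beyond the observed z).
p_value = 1 - NormalDist().cdf(z)
reject_by_p = p_value < alpha

# Step 4: report the conclusion; the two methods give the same decision.
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_region else "Do not reject H0")
```

The rejection region and p-value methods are two views of the same comparison, so they cannot disagree for a given significance level.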


Estimating $\mu$ when $\sigma$ is unknown

• Example 12.2

– An investor is trying to estimate the return on investment in companies that won quality awards last year.

– A random sample of 83 such companies is selected, and the return he would have earned had he invested in them is calculated.

– Construct a 95% confidence interval for the mean return.

– Is there evidence that the returns are greater than 10%?


Estimating $\mu$ when $\sigma$ is unknown

• Solution:

– The problem objective is to describe the population of annual returns from buying shares of quality award-winners.

– Given: $\bar{x} = 15.02$, $s = 8.31$, $n = 83$

– Data: Xm12-02

– The 95% confidence interval, using $t_{.025,82} \approx t_{.025,80} = 1.990$:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 15.02 \pm 1.990\,\frac{8.31}{\sqrt{83}} = (13.19,\ 16.85)$$

– There is evidence that the returns are >10% at the 2.5% significance level. (Why?)
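A minimal sketch of the Example 12.2 calculation from the summary statistics, using the table value 1.990 quoted on the slide in place of the exact $t_{.025,82}$. It also suggests an answer to the "(Why?)": the lower end of the two-sided 95% interval lies above 10, which is equivalent to rejecting $H_0: \mu = 10$ in favor of $H_1: \mu > 10$ at the 2.5% level.

```python
from math import sqrt

xbar, s, n = 15.02, 8.31, 83   # summary statistics from the slide
t_crit = 1.990                 # t_{.025,80}, the table value quoted on the slide
mu0 = 10.0                     # hypothesized mean return (percent)

std_err = s / sqrt(n)

# Two-sided 95% confidence interval for the mean return.
lower = xbar - t_crit * std_err
upper = xbar + t_crit * std_err

# One-sided test of H0: mu = 10 versus H1: mu > 10 at the 2.5% level.
t_stat = (xbar - mu0) / std_err
reject = t_stat > t_crit

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"t = {t_stat:.2f}; reject H0 at the 2.5% level: {reject}")
```

Computed from the rounded summary statistics, the endpoints land within about 0.02 of the slide's (13.19, 16.85); the small gap is presumably because the slide's interval was computed from the raw data.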


12.3 Inference About a Population Variance

• Sometimes we are interested in making inference about the variability of processes.

• Examples:

– The consistency of a production process for quality control purposes.

– Investors use variance as a measure of risk.

• To draw inference about variability, the parameter of interest is $\sigma^2$.


12.3 Inference About a Population Variance

• The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.

• The statistic $(n-1)s^2/\sigma^2$ has a distribution called Chi-squared, if the population is normally distributed:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$

[Figure: Chi-squared density curves for d.f. = 5 and d.f. = 10]


Confidence Interval for Population Variance

• From the following probability statement

$$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$$

we have (by substituting $\chi^2 = [(n-1)s^2]/\sigma^2$)

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$


Testing the Population Variance

• Example 12.3 (operations management application)

– A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc (.001 liter).

– To test this belief, a random sample of 25 1-liter fills was taken and the results recorded (Xm12-03). $s^2 = 0.8659$.

– Do these data support the belief that the variance is less than 1 cc at the 5% significance level?

– Find a 99% confidence interval for the variance of fills.
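A sketch of how Example 12.3 could be worked from the summary statistics. The three chi-squared critical values for 24 degrees of freedom are not given in the slides; they are assumed here from a standard chi-squared table.

```python
n = 25
s2 = 0.8659        # sample variance from the slide
sigma2_0 = 1.0     # hypothesized variance

# Chi-squared table values for 24 d.f. (assumed from a standard table).
chi2_right_950 = 13.8484   # area 0.95 to the right, i.e. lower 5% point
chi2_right_005 = 45.5585   # area 0.005 to the right
chi2_right_995 = 9.88623   # area 0.995 to the right

# Test H0: sigma^2 = 1 versus H1: sigma^2 < 1 at the 5% level.
# Small values of the statistic support H1.
chi2_stat = (n - 1) * s2 / sigma2_0
reject = chi2_stat < chi2_right_950

# 99% confidence interval for sigma^2.
lcl = (n - 1) * s2 / chi2_right_005
ucl = (n - 1) * s2 / chi2_right_995

print(f"chi-squared = {chi2_stat:.4f}; reject H0 at the 5% level: {reject}")
print(f"99% CI for the variance: ({lcl:.4f}, {ucl:.4f})")
```

Because the statistic is not below the lower 5% point, these data would not support the claim that the variance is less than 1 at the 5% level.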


JMP implementation of two-sided test

Distributions: Fills

[Figure: JMP histogram of Fills; axis from 998.0 to 1001.5]

Test Standard Deviation=value

Hypothesized Value    1
Actual Estimate       0.93054
df                    24

                      ChiSquare
Test Statistic        20.7816
Prob > |ChiSq|        0.6969
Prob < ChiSq          0.3484
Prob > ChiSq          0.6516
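The JMP p-values can be approximately reproduced without a statistics package. The sketch below is not JMP's algorithm; it evaluates the chi-squared CDF with the standard power series for the regularized lower incomplete gamma function, and the helper name `chi2_cdf` is made up for this illustration.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-squared variable with df degrees of freedom,
    via the power series of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

chi2_stat, df = 20.7816, 24          # values from the JMP output

p_lower = chi2_cdf(chi2_stat, df)    # Prob < ChiSq
p_upper = 1 - p_lower                # Prob > ChiSq
p_two_sided = 2 * min(p_lower, p_upper)

print(f"Prob < ChiSq = {p_lower:.4f}")
print(f"Prob > ChiSq = {p_upper:.4f}")
print(f"two-sided    = {p_two_sided:.4f}")
```

For the one-sided question in Example 12.3 ($H_1: \sigma^2 < 1$) the relevant line is Prob < ChiSq, which is well above 0.05.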


12.4 Inference About a Population Proportion

• When the population consists of nominal data (e.g., does the customer prefer Pepsi or Coke), the only inference we can make is about the proportion of occurrence of a certain value.

• When there are two categories (success and failure), the parameter p describes the proportion of successes in the population. The probability of obtaining X successes in a random sample of size n from a large population can be calculated using the binomial distribution.


12.4 Inference About a Population Proportion

• Statistic and sampling distribution

– The statistic used when making inference about p is:

$$\hat{p} = \frac{x}{n}, \quad \text{where } x = \text{the number of successes and } n = \text{sample size.}$$

– Under certain conditions [$np > 5$ and $n(1-p) > 5$], $\hat{p}$ is approximately normally distributed, with $\mu = p$ and $\sigma^2 = p(1-p)/n$.


Testing and Estimating a Proportion

• Test statistic for p

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \quad \text{where } np > 5 \text{ and } n(1-p) > 5$$

• Interval estimator for p ($1-\alpha$ confidence level)

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}, \quad \text{provided } n\hat{p} > 5 \text{ and } n(1-\hat{p}) > 5$$


Why are Proportions Different?

• The true variance of a proportion is determined by the true proportion:

$$\sigma^2_{\hat{p}} = p(1-p)/n$$

• The CI of a proportion is NOT derived from the z-test:

• The denominator of the z-statistic is NOT estimated, but the width of the CI is estimated.

• => “CI test” and z-test can differ sometimes.
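A small illustration of how the two procedures can disagree. The data (26 successes in 40 trials, testing $p = 0.5$ two-sided at 5%) are hypothetical and were chosen only because they fall in the narrow zone where the decisions differ.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data, not from the lecture.
x, n, p0, alpha = 26, 40, 0.5, 0.05
p_hat = x / n
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# z-test: the standard error uses the hypothesized p0, so nothing is estimated.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_test_rejects = abs(z) > z_crit

# Confidence interval: the standard error is estimated from p_hat.
half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
ci_rejects = not (ci[0] <= p0 <= ci[1])

print(f"z = {z:.3f}, z-test rejects H0: {z_test_rejects}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), CI excludes p0: {ci_rejects}")
```

Here the estimated standard error is smaller than the one under $H_0$, so the interval just excludes 0.5 even though the z-test does not reject.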


Testing the Proportion

• Example 12.5 (Predicting the winner on election day)

– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.

– The exit poll consists of 765 voters. 407 say that they voted for the Republican candidate.

– The polls close at 8:00. Should the network announce at 8:01 that the Republican candidate will win?
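One way Example 12.5 might be worked is sketched below. The hypotheses ($H_0: p = 0.5$ versus $H_1: p > 0.5$, i.e. a two-candidate race) and the 5% significance level are assumptions; the slide states neither.

```python
from math import sqrt
from statistics import NormalDist

x, n = 407, 765    # exit poll counts from the slide
p0 = 0.5           # assumed: H0: p = 0.5 versus H1: p > 0.5
alpha = 0.05       # assumed significance level

p_hat = x / n

# Normal approximation requires n*p0 > 5 and n*(1 - p0) > 5.
conditions_ok = n * p0 > 5 and n * (1 - p0) > 5

# The standard error uses p0, as in the test statistic for p.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)
reject = p_value < alpha

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Evidence the Republican candidate will win" if reject
      else "Not enough evidence to call the race")
```

Whether a p-value of this size justifies an on-air announcement depends on how costly a wrong call is, which is a judgment the significance level has to encode.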


Selecting the Sample Size to Estimate the Proportion

• Recall: The confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$

• Thus, to estimate the proportion to within W, we can write

$$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$

• The required sample size is:

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$$


Sample Size to Estimate the Proportion

• Example

– Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence.

– Find the sample size needed.

– Solution

$W = .03$; $1 - \alpha = .95$, therefore $\alpha/2 = .025$, so $z_{.025} = 1.96$

$$n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$$

Since the sample has not yet been taken, the sample proportion is still unknown.

We proceed using either one of the following two methods:


Sample Size to Estimate the Proportion

• Method 1:

– There is no knowledge about the value of $\hat{p}$.

• Let $\hat{p} = .5$. This results in the largest possible n needed for a $1-\alpha$ confidence interval of the form $\hat{p} \pm .03$.

• If the sample proportion does not equal .5, the actual W will be narrower than .03 with the n obtained by the formula below.

$$n = \left(\frac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 = 1{,}068$$

• Method 2:

– There is some idea about the value of $\hat{p}$.

• Use the value of $\hat{p}$ to calculate the sample size. With $\hat{p} = .2$:

$$n = \left(\frac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 = 683$$
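Both methods apply the same formula with a different guess for $\hat{p}$. A short sketch, with `sample_size` as a made-up helper name, that rounds up to the next whole observation:

```python
from math import ceil, sqrt

def sample_size(p_guess, W, z=1.96):
    """Smallest whole n for which z * sqrt(p(1-p)/n) <= W, given a guessed p."""
    return ceil((z * sqrt(p_guess * (1 - p_guess)) / W) ** 2)

# Method 1: no knowledge of p-hat, so use the worst case p-hat = .5.
n_method1 = sample_size(0.5, 0.03)

# Method 2: a prior idea of p-hat, here .2 as on the slide.
n_method2 = sample_size(0.2, 0.03)

print(f"Method 1 (p-hat = .5): n = {n_method1}")
print(f"Method 2 (p-hat = .2): n = {n_method2}")
```

Rounding up rather than to the nearest integer keeps the half-width at or below W; $\hat{p}(1-\hat{p})$ peaks at $\hat{p} = .5$, which is why Method 1 gives the largest n.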


Chapter 12: Introduction

• A variety of techniques are presented whose objective is to compare two populations.

• We are interested in:

– The difference between two means.

– The ratio of two variances.

– The difference between two proportions.


Inference about the Difference between Two Means

• Example 13.1

– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?

– A sample of 150 people was randomly drawn. Each person was identified as an eater or non-eater of high-fiber cereal.

– For each person the number of calories consumed at lunch was recorded. There were 43 high-fiber eaters who had a mean of 604.02 calories for lunch with s=64.05. There were 107 non-eaters who had a mean of 633.23 calories for lunch with s=103.29.


13.2 Inference about the Difference between Two Means: Independent Samples

• Two random samples are drawn from the two populations of interest.

• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.


The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.

2. $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are sufficiently large (greater than 30).

3. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

4. The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.


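Making an inference about $\mu_1 - \mu_2$

• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.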


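Making an inference about $\mu_1 - \mu_2$

• Practically, the “Z” statistic is hardly used, because the population variances $\sigma_1^2, \sigma_2^2$ are not known.

• Instead, we construct a t statistic using the sample “variances” ($s_1^2$ and $s_2^2$) to estimate $\sigma_1^2, \sigma_2^2$:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\hat{\sigma}_1^2}{n_1} + \dfrac{\hat{\sigma}_2^2}{n_2}}}$$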


Making an inference about $\mu_1 - \mu_2$

• Two cases are considered when producing the t-statistic:

– The two unknown population variances are equal.

– The two unknown population variances are not equal.


Inference about $\mu_1 - \mu_2$: Equal variances

• Calculate the pooled variance estimate by:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

[Figure: two samples with variances $s_1^2$ ($n_1 = 43$) and $s_2^2$ ($n_2 = 107$) combined into the pooled variance estimator]

Example 1: $s_1^2 = 4103.02$; $s_2^2 = 10669.77$; $n_1 = 43$; $n_2 = 107$.

$$s_p^2 = \frac{(43 - 1)(4103.02) + (107 - 1)(10669.77)}{43 + 107 - 2} = 8806.23$$


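Inference about $\mu_1 - \mu_2$: Equal variances

• Construct the t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

• Perform a hypothesis test

$H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$

• Build a confidence interval

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad \text{where } 1 - \alpha \text{ is the confidence level.}$$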


Example 13.1

• Assuming that the variances are equal, test the scientist’s claim that people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast at the 5% significance level.

• There were 43 high-fiber eaters who had a mean of 604.02 calories for lunch with s=64.05. There were 107 non-eaters who had a mean of 633.23 calories for lunch with s=103.29.
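A sketch of the equal-variances test for Example 13.1 from the summary statistics. The sample variances are the ones quoted with the pooled variance formula (4103.02 and 10669.77). The Student t CDF is computed by Simpson's rule as a stand-in for a statistics library; `t_pdf` and `t_cdf` are names made up for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n1, xbar1, s1_sq = 43, 604.02, 4103.02     # high-fiber eaters
n2, xbar2, s2_sq = 107, 633.23, 10669.77   # non-eaters

# Pooled variance and t-statistic for H0: mu1 - mu2 = 0.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
df = n1 + n2 - 2
t_stat = (xbar1 - xbar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# H1: mu1 - mu2 < 0, so the p-value is the left-tail area.
p_value = t_cdf(t_stat, df)

print(f"pooled variance = {sp_sq:.2f}, t = {t_stat:.3f}, d.f. = {df}")
print(f"one-sided p-value = {p_value:.4f}; reject at 5%: {p_value < 0.05}")
```

Under the equal-variances assumption the claim is supported at the 5% level, though not by a wide margin.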


Inference about $\mu_1 - \mu_2$: Unequal variances

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$\text{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$


Inference about $\mu_1 - \mu_2$: Unequal variances

Conduct a hypothesis test as needed, or build a confidence interval:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \quad \text{where } 1 - \alpha \text{ is the confidence level.}$$


Which case to use: Equal variance or unequal variance?

• Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test.

• This is so, because for any two given samples

The number of degrees of freedom for the equal variances case ≥ The number of degrees of freedom for the unequal variances case


Example 13.1 continued

• Test the scientist’s claim that high-fiber cereal eaters consume fewer calories than non-eaters of high-fiber cereal, assuming unequal variances, at the 5% significance level.

• There were 43 high-fiber eaters who had a mean of 604.02 calories for lunch with s=64.05. There were 107 non-eaters who had a mean of 633.23 calories for lunch with s=103.29.
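A sketch of the unequal-variances version of the test, again from the summary statistics and using the sample variances 4103.02 and 10669.77 quoted with the pooled variance formula. The degrees of freedom are generally not an integer, so the p-value is obtained from a numerically integrated t density; `t_pdf` and `t_cdf` are illustrative helper names, not library functions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df may be non-integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n1, xbar1, s1_sq = 43, 604.02, 4103.02     # high-fiber eaters
n2, xbar2, s2_sq = 107, 633.23, 10669.77   # non-eaters

v1, v2 = s1_sq / n1, s2_sq / n2            # estimated variances of the two means

# t-statistic for H0: mu1 - mu2 = 0 and its approximate degrees of freedom.
t_stat = (xbar1 - xbar2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# H1: mu1 - mu2 < 0, so the p-value is the left-tail area.
p_value = t_cdf(t_stat, df)

print(f"t = {t_stat:.3f}, d.f. = {df:.1f}")
print(f"one-sided p-value = {p_value:.4f}; reject at 5%: {p_value < 0.05}")
```

The degrees of freedom come out below $n_1 + n_2 - 2 = 148$, as the comparison of the two cases predicts, yet the evidence is stronger here because the smaller sample also has the smaller variance.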


Practice Problems

• 12.58, 12.77, 12.98, 13.34