Copyright © 2010 Pearson Education, Inc. Slide 18 - 1


Solution: E

Chapter 9: Sampling Distribution Models

Sampling Distributions

Parameter – a number that describes the population (a fixed, unknown number)

Statistic – a number that describes a sample (can change from sample to sample)

Mean of a population – µ

Mean of a sample – x̄ (x-bar), an estimate of µ

Sampling Variability – the value of a statistic varies in repeated random sampling

Sample proportion – p̂ (p-hat) = X/n, which estimates the unknown parameter p


Sampling Distribution

The distribution of values taken by the statistic in all possible samples of the same size from the same population


Describing Sampling Distributions

Shape

Center

Spread

Outliers

New Scaling

Variability of a Statistic – the spread of its sampling distribution. It is determined by the sampling design and the size of the sample.

Unbiased Statistic – a statistic is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

Try problem 9.9 on page 500

Modeling the Distribution of Sample Proportions

To use a Normal model, we need to specify its mean and standard deviation. We’ll put µ, the mean of the Normal, at p.

When working with proportions, knowing the mean automatically gives us the standard deviation as well. The standard deviation we will use is

$\sqrt{\frac{pq}{n}}$, where q = 1 − p.

So, the distribution of the sample proportions is modeled with a probability model that is

$N\left(p, \sqrt{\frac{pq}{n}}\right)$

Assumptions and Conditions

The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.

There are two assumptions in the case of the model for the distribution of sample proportions:

1. The Independence Assumption: The sampled values must be independent of each other.

2. The Sample Size Assumption: The sample size, n, must be large enough, and the population should be at least 10 times as large as the sample.


Assumptions and Conditions (cont.)

The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the Randomization Condition, the 10% Condition and the Success/Failure Condition.


Assumptions and Conditions (cont.)

1. Randomization Condition: The sample should be a simple random sample of the population.

2. 10% Condition: The sample size, n, must be no larger than 10% of the population.

3. Success/Failure Condition: The sample size has to be big enough so that both np (number of successes) and nq (number of failures) are at least 10.

…So, we need a large enough sample that is not too large.
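The two numeric conditions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `check_proportion_conditions` and the population size of 10,000 are hypothetical, and the Randomization Condition is left out because it depends on how the data were collected rather than on a calculation.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Report whether the numeric conditions for the Normal model of p-hat hold.

    The Randomization Condition is not checked here; it has to be judged
    from the sampling design.
    """
    q = 1 - p
    results = {
        # Success/Failure Condition: np and nq are both at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # 10% Condition: the sample is no larger than 10% of the population.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


# Hypothetical usage: a sample of 200 from an assumed population of 10,000.
print(check_proportion_conditions(n=200, p=0.22, population_size=10_000))
```

Passing `population_size=None` skips the 10% Condition, which is convenient when the population size is unknown but clearly very large.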

The Sampling Distribution Model for a Proportion

Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with

Mean: $\mu(\hat{p}) = p$

Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$


Example: The Centers for Disease Control and Prevention reports that 22% of 18-year-old women in the US have a body mass index (BMI) of 25 or more. As part of a routine health check at a large college, the physical education department usually requires students to come in to be measured and weighed. This year, the department decided to try out a self-report system. It asked 200 randomly selected female students to report their heights and weights. Only 31 of these students had BMIs greater than 25. Is this proportion of high-BMI students unusually small?
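One way to approach this example is to compute how many standard deviations the observed proportion lies below p = 0.22. The sketch below assumes the conditions for the Normal model are met (np = 44 and nq = 156 are both at least 10); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

p = 0.22        # population proportion reported by the CDC
n = 200         # sample size
p_hat = 31 / n  # observed sample proportion

sd_p_hat = math.sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(pq/n)
z = (p_hat - p) / sd_p_hat             # standard deviations from the mean
lower_tail = NormalDist().cdf(z)       # P(Z <= z) under the Normal model

print(f"p-hat = {p_hat:.3f}, SD(p-hat) = {sd_p_hat:.4f}, z = {z:.2f}")
print(f"Probability of a proportion this small or smaller: {lower_tail:.4f}")
```

Under these assumptions the z-score is a little more than 2 standard deviations below the mean, so a proportion this low would be fairly unusual if the true proportion were 22%.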

What About Quantitative Data?

Proportions summarize categorical variables.

The Normal sampling distribution model can be used with quantitative data, too!


Means – The “Average” of One Die

Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:


Means – Averaging More Dice

Looking at the average of two dice after a simulation of 10,000 tosses:

The average of three dice after a simulation of 10,000 tosses looks like:


Means – Averaging Still More Dice

The average of 5 dice after a simulation of 10,000 tosses looks like:

The average of 20 dice after a simulation of 10,000 tosses looks like:

Means – What the Simulations Show

As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean. So, we see the shape continuing to tighten around 3.5.

And, it probably does not shock you that the sampling distribution of a mean becomes Normal.
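The dice simulations described on these slides can be reproduced with a few lines of code. This is a rough sketch using only the Python standard library; the helper name `simulate_dice_averages` and the fixed seed are choices made here for illustration, not details from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers


def simulate_dice_averages(num_dice, trials=10_000):
    """Average `num_dice` fair dice, repeated `trials` times."""
    return [
        statistics.fmean(random.randint(1, 6) for _ in range(num_dice))
        for _ in range(trials)
    ]


# Same numbers of dice as in the slides: 1, 2, 3, 5, and 20.
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    print(
        f"{num_dice:>2} dice: mean of averages = {statistics.fmean(averages):.3f}, "
        f"SD of averages = {statistics.pstdev(averages):.3f}"
    )
```

The mean of the averages should stay near 3.5 in every row while the SD shrinks as more dice are averaged, which is the tightening the histograms show.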

The Fundamental Theorem of Statistics

The sampling distribution of any mean becomes more nearly Normal as the sample size grows.

All we need is for the observations to be independent and collected with randomization.

The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution.

The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.
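To see the "regardless of the shape of the population" claim in action, one can draw samples from a strongly right-skewed population and watch the skewness of the sample means shrink toward 0 (the value for a Normal model). The sketch below uses an exponential population as an assumed example; the choice of population, the helper names, and the sample sizes are all illustrative and not taken from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers


def skewness(values):
    """Population skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, trials=5_000):
    """Means of `trials` samples of size n from a right-skewed exponential population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


for n in (1, 5, 30, 100):
    print(f"n = {n:>3}: skewness of sample means = {skewness(sample_means(n)):.2f}")
```

Because the population here is far from Normal, the skewness fades slowly, which matches the remark that the CLT works faster when the population is already close to Normal.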

Assumptions and Conditions (cont.)

We can’t check the assumptions directly, but we can think about whether the Independence Assumption is plausible. We can also check some related conditions:

Randomization Condition: The data values must be sampled randomly.

10% Condition: When the sample is drawn without replacement, the sample size, n, should be no more than 10% of the population.

Large Enough Sample Condition: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

Central Limit Theorem

For any population with mean µ and standard deviation σ, when n is large, the sampling distribution of the sample mean is close to N(µ, σ/√n).



Example: A college physical education department asked a random sample of 200 female students to self-report their heights and weights, but the percentage of students with body mass indexes over 25 seemed suspiciously low. One possible explanation may be that the respondents “shaded” their weights down a bit. The CDC reports that the mean weight of 18-year-old women is 143.74 lbs, with a standard deviation of 51.54 lbs, but these 200 randomly selected women reported a mean weight of only 140 lbs. Based on the CLT and the 68-95-99.7 Rule, does the mean weight in this sample seem exceptionally low, or might this just be random sample-to-sample variation?
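A possible way to work this example is to use the CLT model N(µ, σ/√n) for the sample mean and express the reported mean as a z-score. The sketch below assumes the sample of 200 is large enough for the Normal model to be reasonable; the variable names are illustrative.

```python
import math

mu = 143.74    # population mean weight in lbs (CDC figure from the example)
sigma = 51.54  # population standard deviation in lbs
n = 200        # sample size
y_bar = 140    # mean weight reported by the sample, in lbs

sd_y_bar = sigma / math.sqrt(n)  # SD of the sample mean: sigma / sqrt(n)
z = (y_bar - mu) / sd_y_bar      # standard deviations below the population mean

print(f"SD of the sample mean = {sd_y_bar:.3f} lbs, z = {z:.2f}")
```

Under these assumptions the reported mean is only about one standard deviation below 143.74 lbs, which the 68-95-99.7 Rule treats as ordinary sample-to-sample variation.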

Example: You want to poll a random sample of 100 students on campus to see if they are in favor of the proposed location for the new student center. Of course, you’ll get just one number, your sample proportion, p-hat.

a. But if you imagined all the possible samples of 100 students you could draw and imagined the histogram of all the sample proportions from these samples, what shape would it have?

b. Where would the center of that histogram be?

c. If you think that about half the students are in favor of the plan, what would the standard deviation of the sample proportions be?
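For part c, a short worked calculation using the standard deviation formula for a sample proportion, taking p = 0.5 (the "about half" in the question) and n = 100:

```latex
SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.5)(0.5)}{100}} = \sqrt{0.0025} = 0.05
```

So, under that assumption, sample proportions would typically fall within about 5 percentage points of the true proportion.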


Example: Human gestation times have a mean of about 266 days, with a SD = 16 days.

a. If we record the gestation times of a sample of 100 women, do we know that a histogram of the times will be well modeled by a Normal model?

b. Suppose we look at the average gestation times for a sample of 100 women. If we imagined all the possible random samples of 100 women we could take and looked at the histogram of all the sample means, what shape would it have?

c. Where would the center of the histogram be?

d. What would be the standard deviation of that histogram?
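For part d, a short worked calculation using the CLT model for a sample mean, with σ = 16 days and n = 100:

```latex
SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{16}{\sqrt{100}} = 1.6 \text{ days}
```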


About Variation

The standard deviation of the sampling distribution declines only with the square root of the sample size (the denominator contains the square root of n).

Therefore, the variability decreases as the sample size increases.

While we’d always like a larger sample, the square root limits how much we can make a sample tell about the population. (This is an example of the Law of Diminishing Returns.)
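The diminishing returns can be made concrete by tabulating the standard deviation of the sample mean for growing sample sizes. This sketch reuses σ = 16 days from the gestation example as an assumed population standard deviation; the function name is hypothetical.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 16  # SD of gestation times in days, taken from the earlier example

# Each step quadruples the sample size, yet the SD is only cut in half.
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: SD of sample mean = {sd_of_sample_mean(sigma, n):.2f} days")
```

Cutting the standard deviation in half requires four times as many observations, which is why ever-larger samples buy less and less extra precision.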