9.1 – Sampling Distributions

Page 2: 9.1 – Sampling Distributions

Many investigations and research projects try to draw conclusions about how the values of some variable x are distributed in a population. Often, attention is focused on a single characteristic of that distribution. Examples include:

1. x = fat content (in grams) of a quarter-pound hamburger, with interest centered on the mean fat content μ of all such hamburgers

Page 3: 9.1 – Sampling Distributions

2. x = fuel efficiency (in miles per gallon) for a 2003 Honda Accord, with interest focused on the variability in fuel efficiency as described by σ, the standard deviation for the fuel efficiency population distribution

3. x = time to first recurrence of skin cancer for a patient treated using a particular therapy, with attention focused on p, the proportion of such individuals whose first recurrence is within 5 years of the treatment.

Page 4: 9.1 – Sampling Distributions

Parameter: A number that describes the population. This number is typically unknown.

Statistic: A number that describes the sample. We use this number to estimate the parameter.

Page 5: 9.1 – Sampling Distributions

                                       Population (Parameter)    Sample (Statistic)

Mean                                   μ                         x̄

Standard Deviation                     σ                         s

Proportion                             p                         p̂

Standard deviation of the proportion                             σ_p̂

Page 6: 9.1 – Sampling Distributions

Sampling Distribution:

The distribution of all values taken by the statistic in all possible samples of the same size from the same population

Ex: Take 100 samples of size n = 20.

Page 7: 9.1 – Sampling Distributions

Sampling Variability:

The variation in a statistic from one sample to another, for samples of the same size.

If I compare many different samples and the statistic is very similar in each one, then the sampling variability is low. If I compare many different samples and the statistic is very different in each one, then the sampling variability is high.

Page 13: 9.1 – Sampling Distributions

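Unbiased:

A statistic is unbiased when the mean of its sampling distribution is equal to the true value of the parameter

Unbiased Estimator:

A statistic that is unbiased for the parameter it estimates

Ex: $\mu = 20$ and $\mu_{\bar{x}} = 20$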

Page 14: 9.1 – Sampling Distributions

How sampling works:

1. Take a large number of samples from the same population.

2. Calculate the sample mean or sample proportion for each sample

3. Make a histogram of the values of the statistics

4. Examine the distribution
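
The four steps above can be sketched as a small simulation. Everything specific here is an assumption made for illustration and is not from the slides: the population (10,000 values from a Normal model with mean 20 and standard deviation 5), the choice of 100 samples of size 20, and the helper name `sampling_distribution`.

```python
import random
import statistics
from collections import Counter

def sampling_distribution(population, n, num_samples, rng):
    """Steps 1-2: draw many samples of size n and return each sample's mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 values, mean about 20, sd about 5.
population = [rng.gauss(20, 5) for _ in range(10_000)]

means = sampling_distribution(population, n=20, num_samples=100, rng=rng)

# Step 3: a text histogram of the sample means, binned to the nearest whole number.
bins = Counter(round(m) for m in means)
for value in sorted(bins):
    print(f"{value:>3} | {'*' * bins[value]}")

# Step 4: examine the center and spread of the distribution.
print("mean of sample means:", round(statistics.mean(means), 2))
print("sd of sample means:  ", round(statistics.stdev(means), 2))
```

The sample means should pile up near the population mean, with a spread noticeably smaller than the spread of the individual values.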

Page 15: 9.1 – Sampling Distributions

Facts about Samples:

• If the population mean (μ) and the population standard deviation (σ) are unknown, we can use x̄ to estimate μ and use s to estimate σ. These estimates may or may not be reliable.

• If I chose a different sample, it would still represent the same population. A different sample almost always produces different statistics.

• A statistic can be unbiased and still have high variability. To reduce the variability, increase the size of the sample. Larger samples give smaller spread.

Page 16: 9.1 – Sampling Distributions

Example #1: Classify each underlined number as a parameter or statistic. Give the appropriate notation for each.

a. Forty-two percent of today’s 15-year-old girls will get pregnant in their teens.

Parameter: p = 0.42

Page 17: 9.1 – Sampling Distributions

b. The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure for these executives is 126.07.

128 and 15 are parameters; 126.07 is a statistic.

μ = 128

σ = 15

x̄ = 126.07

Page 18: 9.1 – Sampling Distributions

Example #2: Suppose you have a population in which 60% of the people approve of gambling.

a. Is 60% a parameter or a statistic? Give appropriate notation for this value.

Parameter, p = 0.60

Page 19: 9.1 – Sampling Distributions

You want to take many samples of size 10 from this population to observe how the sample proportion who approve of gambling varies in repeated samples.

b. Describe the design of a simulation using the partial random digits table below to estimate the sample proportion who approve of gambling. Label how you will conduct the simulation. Then carry out five trials of your simulation. What is the average of the samples? How close is it to 60%?

Assign: 0 – 5 approve of gambling, 6 – 9 don’t

Stop: After choosing 10 digits

Count: # of people who approve of gambling

Repeaters: OK to have repeated digits; each one represents a new person

Page 20: 9.1 – Sampling Distributions

36009 19365 15412 39638 85453 46816

38448 48789 18338 24697 39364 42006

82739 57890 20807 47511 81676 55300

60940 72024 17868 24943 61790 90656

68417 35013 15529 72765 85089 57067

Trial 1 (first 10 digits of row 1; A = approve, D = don’t): A D A A D A D A D A

Page 21: 9.1 – Sampling Distributions

1: 6/10 = 60%

2: 4/10 = 40%

3: 4/10 = 40%

4: 7/10 = 70%

5: 7/10 = 70%

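Average of the five sample proportions:

$\bar{\hat{p}} = \dfrac{0.6 + 0.4 + 0.4 + 0.7 + 0.7}{5} = 0.56$

This is 0.04 below the true proportion of 0.60.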
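
The five trials can be re-checked with a short script. This is only a sketch: the slides do not say which digits each trial uses, so reading the first 10 digits of each table row is an assumption (it does reproduce the five results listed). The names `rows`, `approves`, and `trial` are made up for illustration.

```python
# Rows of the random digits table, with the spaces removed.
rows = [
    "360091936515412396388545346816",
    "384484878918338246973936442006",
    "827395789020807475118167655300",
    "609407202417868249436179090656",
    "684173501315529727658508957067",
]

def approves(digit):
    """Assign: digits 0-5 approve of gambling, digits 6-9 don't."""
    return int(digit) <= 5

def trial(row, n=10):
    """Stop after n digits; count the share of those digits that approve."""
    return sum(approves(d) for d in row[:n]) / n

# Assumption: each trial reads the first 10 digits of one row.
p_hats = [trial(row) for row in rows]
average = sum(p_hats) / len(p_hats)

print(p_hats)
print(round(average, 2))
```

Repeated digits are simply counted again, which matches the "Repeaters" rule: each digit stands for a new person.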

Page 22: 9.1 – Sampling Distributions

c. The sampling distribution of p̂ is the distribution of p̂ from all possible SRSs of size 10 from this population. What would be the mean of this distribution if this process was repeated 100 times?

$\mu_{\hat{p}} = p = 0.60$

Page 23: 9.1 – Sampling Distributions

d. If you used samples of size 20 instead of size 10, which sampling distribution would give you a better estimate of the true proportion of people who approve of gambling? Explain your answer.

Size 20. A larger sample size means less variability in p̂.

Page 24: 9.1 – Sampling Distributions

e. Make a histogram of the sampling distribution. Describe the graph.

C (center): 60%

U (unusual features): none

S (shape): Approx. symmetrical

S (spread): Range = 10 – 1 = 9

Page 25: 9.1 – Sampling Distributions

9.2 – Sample Proportions

Page 26: 9.1 – Sampling Distributions

Using proportions:

$\hat{p} = \dfrac{X}{n} = \dfrac{\text{count of ``successes'' in sample}}{\text{size of sample}}$

Page 27: 9.1 – Sampling Distributions

Remember Ch8?

$\mu_X = np \qquad \sigma_X = \sqrt{np(1-p)}$

Use these when you know “p”

What if you only know the proportion of a sample?

Page 28: 9.1 – Sampling Distributions

Sampling Distribution of a Sample Proportion:

$\mu_{\hat{p}} = p$

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
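
One way to see where the mean p and standard deviation √(p(1 − p)/n) of p̂ come from, sketched under the assumption that the count of successes X follows the binomial model from Chapter 8 (so its mean is np and its standard deviation is √(np(1 − p))): dividing the count by n divides both the mean and the standard deviation by n.

```latex
\hat{p} = \frac{X}{n}
\quad\Longrightarrow\quad
\mu_{\hat{p}} = \frac{\mu_X}{n} = \frac{np}{n} = p,
\qquad
\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}
```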

Page 29: 9.1 – Sampling Distributions

Rule of Thumb #1:

You can only use $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$ when the population is at least 10X the sample size: $N \ge 10n$. A census should be impractical!

Page 30: 9.1 – Sampling Distributions

Rule of Thumb #2:

Only use the Normal approximation of the sampling distribution of p̂ when:

$np \ge 10$ and $n(1-p) \ge 10$

Page 31: 9.1 – Sampling Distributions

Conclusion:

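If p is the population proportion, then the count X is approximately

$N\left(np,\ \sqrt{np(1-p)}\right)$

If p̂ is the sample proportion, then p̂ is approximately

$N\left(p,\ \sqrt{\dfrac{p(1-p)}{n}}\right)$

ONLY if $np \ge 10$ and $n(1-p) \ge 10$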
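
The two rules of thumb and the Normal model for p̂ can be bundled into one helper. This is a minimal sketch, not a standard routine: the function name `p_hat_distribution`, its parameters, and the example values (p = 0.5, n = 100, N = 5000) are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_hat_distribution(p, n, population_size=None):
    """Return the approximate Normal model for p-hat, or raise if a rule of thumb fails.

    population_size=None means the population is treated as effectively infinite.
    """
    # Rule of Thumb #1: population at least 10 times the sample size.
    if population_size is not None and population_size < 10 * n:
        raise ValueError("Rule of Thumb #1 fails: need N >= 10n")
    # Rule of Thumb #2: at least 10 expected successes and 10 expected failures.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Rule of Thumb #2 fails: need np >= 10 and n(1 - p) >= 10")
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Hypothetical example values.
model = p_hat_distribution(p=0.5, n=100, population_size=5000)
print(model.mean, model.stdev)  # center p and spread sqrt(p(1 - p)/n)
```

Raising an error when a condition fails is a design choice for the sketch; it keeps a Normal calculation from being run silently where the approximation is not justified.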

Page 33: 9.1 – Sampling Distributions

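So, to calculate a z-score for p̂:

$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$

Standardized test statistic: $\dfrac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$

Or $Z = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$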

Page 34: 9.1 – Sampling Distributions

Example #1: Suppose you are going to roll a fair six-sided die 60 times and record p̂, the proportion of times that a 1 or a 2 is showing.

a. Where should the distribution of the 60 p̂-values be centered?

$p = \dfrac{2}{6} = \dfrac{1}{3}$

Page 35: 9.1 – Sampling Distributions

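b. What is the standard deviation of the sampling distribution of p̂, the proportion of all rolls of the die that show a 1 or a 2 out of the 60 rolls?

Rule of Thumb #1: the population (all possible rolls) is at least 10X the sample size

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{\tfrac{1}{3}\left(1-\tfrac{1}{3}\right)}{60}} \approx 0.060858$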

Page 36: 9.1 – Sampling Distributions

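c. Describe the shape of the sampling distribution of p̂. Justify your answer.

Rule of Thumb #2: $np \ge 10$ and $n(1-p) \ge 10$

$60\left(\tfrac{1}{3}\right) \ge 10 \;\Rightarrow\; 20 \ge 10$

$60\left(1-\tfrac{1}{3}\right) \ge 10 \;\Rightarrow\; 40 \ge 10$

Approximately Normal: $N\left(\tfrac{1}{3},\ 0.060858\right)$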
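
As a quick numeric check of parts a through c, the center, standard deviation, and Normal condition for the die example can be computed directly. The variable names are chosen here for illustration.

```python
from math import sqrt

p = 2 / 6          # a 1 or a 2 on a fair six-sided die
n = 60             # number of rolls

center = p                        # mean of the sampling distribution of p-hat
sd = sqrt(p * (1 - p) / n)        # standard deviation of p-hat, about 0.0609

# Rule of Thumb #2: expected successes and failures are both at least 10.
normal_ok = n * p >= 10 and n * (1 - p) >= 10

print(round(center, 4), round(sd, 6), normal_ok)
```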

Page 37: 9.1 – Sampling Distributions

Example #2: According to government data, 22% of American children under the age of 6 live in households with incomes less than the official poverty level. A study of learning in early childhood chooses an SRS of 300 children. What is the probability that more than 20% of the sample are from poverty households?

Rule of Thumb #1: $N \ge 10n$

$N \ge 10(300)$

$N \ge 3000$

The population is at least 10X the sample size, so it is OK to use the standard deviation formula

Page 38: 9.1 – Sampling Distributions

$\mu_{\hat{p}} = p = 0.22$

Page 39: 9.1 – Sampling Distributions

Rule of Thumb #2: $np \ge 10$ and $n(1-p) \ge 10$

$300(0.22) \ge 10 \;\Rightarrow\; 66 \ge 10$

$300(1-0.22) \ge 10 \;\Rightarrow\; 234 \ge 10$

Approximately Normal.

Page 40: 9.1 – Sampling Distributions

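$N\left(p,\ \sqrt{\dfrac{p(1-p)}{n}}\right)$

$\mu_{\hat{p}} = p = 0.22$

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.22(1-0.22)}{300}} = 0.0239$

$N(0.22,\ 0.0239)$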

Page 41: 9.1 – Sampling Distributions

$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \dfrac{0.20 - 0.22}{0.0239} = -0.8362$

Page 42: 9.1 – Sampling Distributions

[Figure: Normal curve for p̂ centered at 0.22 with σ_p̂ = 0.0239; the value 0.20 is marked]

P(Z > –0.8362) = 1 – P(Z < –0.8362)

Page 44: 9.1 – Sampling Distributions

P(Z > –0.8362) = 1 – P(Z < –0.8362)

= 1 – 0.2005

= 0.7995 (from the table, which rounds z to –0.84)

Or: normalcdf(0.20, 1000000, 0.22, 0.0239) = 0.7985
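
The same probability can be computed outside a calculator. The sketch below uses Python's `statistics.NormalDist` as a stand-in for `normalcdf`; the variable names are illustrative. Because it carries the unrounded standard deviation and z-score, it lands on the calculator value rather than the table value.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.22, 300
sd = sqrt(p * (1 - p) / n)            # about 0.0239
model = NormalDist(mu=p, sigma=sd)    # approximate distribution of p-hat

z = (0.20 - p) / sd                   # about -0.836
prob = 1 - model.cdf(0.20)            # P(p-hat > 0.20), about 0.7985

print(round(sd, 4), round(z, 4), round(prob, 4))
```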

Page 45: 9.1 – Sampling Distributions

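b. How large a sample would be needed to guarantee that the standard deviation of p̂ is no more than 0.01? Explain.

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

$0.01 = \sqrt{\dfrac{0.22(1-0.22)}{n}}$

$0.0001 = \dfrac{0.1716}{n}$

$0.0001\,n = 0.1716$

$n = 1716$, so a sample of at least 1716 children is needed.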
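
Solving the same inequality for n in general gives n ≥ p(1 − p)/(target standard deviation)². A small sketch follows; the function name `min_sample_size` is hypothetical, and exact fractions are used because the answer here sits right on a whole number, where floating-point round-off could push the result up by one.

```python
from fractions import Fraction
from math import ceil

def min_sample_size(p, max_sd):
    """Smallest n with sqrt(p(1 - p)/n) <= max_sd, i.e. n >= p(1 - p)/max_sd**2."""
    return ceil(p * (1 - p) / max_sd**2)

# Exact fractions avoid floating-point round-off right at the boundary.
n = min_sample_size(Fraction(22, 100), Fraction(1, 100))
print(n)
```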

Page 46: 9.1 – Sampling Distributions

9.3 – Sample Means

Page 47: 9.1 – Sampling Distributions

Sample Means Distribution:

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Page 49: 9.1 – Sampling Distributions

How do you determine normality?

• If the samples are drawn from a Normal population, the sampling distribution of x̄ is Normal, no matter how big n is.

• If the samples are drawn from a skewed population, the sampling distribution of x̄ is also skewed when n is small.

Page 50: 9.1 – Sampling Distributions

Central Limit Theorem: (CLT)

• No matter what the population distribution looks like, if n ≥ 30, then the sampling distribution of x̄ is approximately Normal.
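
The theorem can be illustrated with a simulation. The setup below is an assumption chosen for illustration, not something from the slides: a strongly right-skewed exponential population with mean 1 and standard deviation 1, samples of size n = 30, and 5000 repetitions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1, sd 1)."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

center = statistics.mean(means)       # should be close to the population mean, 1
spread = statistics.stdev(means)      # should be close to 1 / sqrt(30), about 0.18

# For a Normal shape, about 95% of values fall within 2 sd of the center.
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_2sd, 3))
```

Even though individual values are far from Normal, the sample means behave roughly the way a Normal model with mean μ and standard deviation σ/√n predicts.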

Page 52: 9.1 – Sampling Distributions

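To calculate z-scores:

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Standardized test statistic: $\dfrac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$

Or $Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$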

Page 53: 9.1 – Sampling Distributions

Example #1: A soft-drink bottler claims that, on average, cans contain 12 oz of soda. Let x denote the actual volume of soda in a randomly selected can. Suppose that x is normally distributed with σ = 0.16 oz. Sixteen cans are to be selected, and the soda volume will be determined for each one.

a. Describe the shape of the sampling distribution of x̄

Because the population is approx. normal, so is the sampling distribution of x̄

Page 54: 9.1 – Sampling Distributions

b. Calculate the mean and standard deviation of the sampling distribution of x̄

$\mu_{\bar{x}} = \mu = 12$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.16}{\sqrt{16}} = 0.04$

Page 55: 9.1 – Sampling Distributions

c. Determine the probability that the sample mean soda volume is between 11.9 oz and 12.1 oz, that is, within 0.1 oz of the company’s claim.

[Figure: Normal curve for x̄ centered at 12 with σ_x̄ = 0.04; the values 11.9 and 12.1 are marked]

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{12.1 - 12}{0.04} = 2.5$

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{11.9 - 12}{0.04} = -2.5$

Page 56: 9.1 – Sampling Distributions

P(-2.5 < Z < 2.5) = P(Z < 2.5) – P(Z < -2.5)

Page 59: 9.1 – Sampling Distributions

P(-2.5 < Z < 2.5) = P(Z < 2.5) – P(Z < -2.5)

= 0.9938 – 0.0062

= 0.9876

Or: normalcdf(11.9, 12.1, 12, 0.04) = 0.9876
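
The soda calculation can be reproduced with Python's `statistics.NormalDist`, used here as a stand-in for `normalcdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 0.16, 16
sd_xbar = sigma / sqrt(n)                       # 0.16 / 4 = 0.04
model = NormalDist(mu=mu, sigma=sd_xbar)        # distribution of the sample mean

prob = model.cdf(12.1) - model.cdf(11.9)        # P(11.9 < x-bar < 12.1), about 0.9876
print(round(sd_xbar, 2), round(prob, 4))
```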

Page 60: 9.1 – Sampling Distributions

Example #2: The weights of newborn children in the United States vary according to the normal distribution with mean 7.5 pounds and standard deviation 1.25 pounds. The government classifies a newborn as having low birth weight if the weight is less than 5.5 pounds.

a. What is the probability that a baby chosen at random weighs less than 5.5 pounds at birth?

[Figure: Normal curve centered at μ = 7.5 with σ = 1.25; the value 5.5 is marked]

$Z = \dfrac{x - \mu}{\sigma} = \dfrac{5.5 - 7.5}{1.25} = -1.6$

Page 62: 9.1 – Sampling Distributions

P(Z < -1.6) = 0.0548

Or: normalcdf(-1000000, 5.5, 7.5, 1.25) = 0.0548

Page 63: 9.1 – Sampling Distributions

b. You choose forty babies at random and compute their mean weight. What are the mean and standard deviation of the mean weight of the forty babies?

The sampling distribution of x̄ is approx. normal because the population is; also n ≥ 30

$\mu_{\bar{x}} = \mu = 7.5$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1.25}{\sqrt{40}} = 0.1976$

Page 64: 9.1 – Sampling Distributions

c. What is the probability that the forty babies’ average birth weight is less than 5.5 pounds?

[Figure: Normal curve for x̄ centered at 7.5 with σ_x̄ = 0.1976; the value 5.5 is marked]

$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{5.5 - 7.5}{0.1976} = -10.12$

Page 66: 9.1 – Sampling Distributions

P(Z < -10.12) ≈ 0

Or: normalcdf(-1000000, 5.5, 7.5, 0.1976) ≈ 0
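
Parts a through c can be checked side by side, which makes the contrast between one baby and the mean of forty babies easy to see. This is a sketch using Python's `statistics.NormalDist` in place of `normalcdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.5, 1.25

# Part a: one baby chosen at random.
single = NormalDist(mu, sigma)
p_single = single.cdf(5.5)                 # about 0.0548

# Parts b and c: the mean weight of n = 40 babies.
n = 40
sd_xbar = sigma / sqrt(n)                  # about 0.1976
mean_of_40 = NormalDist(mu, sd_xbar)
z = (5.5 - mu) / sd_xbar                   # about -10.12
p_mean = mean_of_40.cdf(5.5)               # far too small to show in a table

print(round(p_single, 4), round(sd_xbar, 4), round(z, 2), p_mean)
```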

Page 67: 9.1 – Sampling Distributions

d. Would your answers to a, b, or c be affected if the distribution of birth weights in the population were distinctly nonnormal?

Yes, you couldn’t use the normal approximation for part a.

Parts b and c are fine because n ≥ 30, and by the CLT, the sampling distribution of x̄ is approximately normal.