39
Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Sampling distribution

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Sampling distribution

Business Statistics: Communicating with Numbers

By Sanjiv Jaggia and Alison Kelly

McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Page 2: Sampling distribution

7-2

Chapter 7 Learning Objectives (LOs)LO 7.1: Differentiate between a population

parameter and a sample statistic.

LO 7.5: Describe the properties of the sampling distribution of the sample mean.

Page 3: Sampling distribution

7-3

Chapter 7 Learning Objectives (LOs)LO 7.6: Explain the importance of the central

limit theorem.

LO 7.7: Describe the properties of the sampling distribution of the sample proportion.

LO 7.8: Use a finite population correction factor.

Page 4: Sampling distribution

7-4

7.1 SamplingLO 7.1 Differentiate between a population parameter and sample statistic. Population—consists of all items of interest

in a statistical problem. Population Parameter is unknown.

Sample—a subset of the population. Sample Statistic is calculated from sample and

used to make inferences about the population. Bias—the tendency of a sample statistic to

systematically over- or underestimate a population parameter.

Page 5: Sampling distribution

7-5

7.1 Sampling

Classic Case of a “Bad” Sample: The Literary Digest Debacle of 1936 During the1936 presidential election, the Literary Digest

predicted a landslide victory for Alf Landon over Franklin D. Roosevelt (FDR) with only a 1% margin of error.

They were wrong! FDR won in a landslide election. The Literary Digest had committed selection bias by

randomly sampling from their own subscriber/ membership lists, etc.

In addition, with only a 24% response rate, the Literary Digest had a great deal of non-response bias.

LO 7.2 Explain common sample biases.

Page 6: Sampling distribution

7-6

7.1 SamplingLO 7.2

Selection bias—a systematic exclusion of certain groups from consideration for the sample. The Literary Digest committed selection bias by excluding

a large portion of the population (e.g., lower income voters).

Nonresponse bias—a systematic difference in preferences between respondents and non-respondents to a survey or a poll. The Literary Digest had only a 24% response rate. This

indicates that only those who cared a great deal about the election took the time to respond to the survey. These respondents may be atypical of the population as a whole.

Page 7: Sampling distribution

7-7

7.1 Sampling

Sampling Methods Simple random sample is a sample of n

observations which has the same probability of being selected from the population as any other sample of n observations. Most statistical methods presume simple

random samples. However, in some situations other sampling

methods have an advantage over simple random samples.

LO 7.3 Describe simple random sampling.

Page 8: Sampling distribution

7-8

7.1 SamplingLO 7.3

Example: In 1961, students invested 24 hours per week in their academic pursuits, whereas today’s students study an average of 14 hours per week. A dean at a large university in California wonders if this

trend is reflective of the students at her university. The university has 20,000 students and the dean would like a sample of 100. Use Excel to draw a simple random sample of 100 students.

In Excel, choose Formulas > Insert function > RANDBETWEEN and input the values shown here.

Page 9: Sampling distribution

7-9

Simple Random Sample: Using Table of Random NumbersA population consists of 845 employees of Nitra Industries. A sample of 52 employees is to be selected from that population.

A more convenient method of selecting a random sample is to use the identification number of each employee and a table of random numbers such as the one in Appendix B.6.

Page 10: Sampling distribution

7-10

Systematic Random Sampling

EXAMPLEA population consists of 845 employees of Nitra Industries. A sample

of 52 employees is to be selected from that population.

First, k is calculated as the population size divided by the sample size. For Nitra Industries, we would select every 16th (845/52) employee list. If k is not a whole number, then round down. Random sampling is used in the selection of the first name. Then, select every 16th name on the list thereafter.

Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected and then every kth member of the population is selected for the sample.

Page 11: Sampling distribution

7-11

7.1 Sampling

Stratified Random Sampling Divide the population into mutually exclusive and

collectively exhaustive groups, called strata. Randomly select observations from each stratum,

which are proportional to the stratum’s size. Advantages:

Guarantees that the each population subdivision is represented in the sample.

Parameter estimates have greater precision than those estimated from simple random sampling.

LO 7.4 Distinguish between stratified random sampling and cluster sampling.

Page 12: Sampling distribution

7-12

Stratified Random Sampling

Stratified Random Sampling: A population is first divided into subgroups, called strata, and a sample is selected from each stratum. Useful when a population can be clearly divided in groups based on some characteristics

Suppose we want to study the advertising expenditures for the 352 largest companies in the United States to determine whether firms with high returns on equity (a measure of profitability) spent more of each sales dollar on advertising than firms with a low return or deficit.

To make sure that the sample is a fair representation of the 352 companies, the companies are grouped on percent return on equity and a sample proportional to the relative size of the group is randomly selected.

Page 13: Sampling distribution

7-13

7.1 SamplingLO 7.4

Cluster Sampling Divide population into mutually exclusive and

collectively exhaustive groups, called clusters. Randomly select clusters. Sample every observation in those randomly

selected clusters. Advantages and disadvantages:

Less expensive than other sampling methods. Less precision than simple random sampling or

stratified sampling. Useful when clusters occur naturally in the population.

Page 14: Sampling distribution

7-14

Cluster SamplingCluster Sampling: A population is divided into clusters using naturally occurring geographic or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly selecting from each cluster.

Suppose you want to determine the views of residents in Oregon about state and federal environmental protection policies.

Cluster sampling can be used by subdividing the state into small units—either counties or regions, select at random say 4 regions, then take samples of the residents in each of these regions and interview them. (Note that this is a combination of cluster sampling and simple random sampling.)

Page 15: Sampling distribution

7-15

7.1 SamplingLO 7.4

Stratified versus Cluster Sampling Stratified Sampling

Sample consists of elements from each group.

Preferred when the objective is to increase precision.

Cluster Sampling

Sample consists of elements from the selected groups.

Preferred when the objective is to reduce costs.

Page 16: Sampling distribution

7-16

7.2 The Sampling Distribution of the Means

Population is described by parameters. A parameter is a constant, whose value may be unknown. Only one population.

Sample is described by statistics. A statistic is a random variable whose value depends on

the chosen random sample. Statistics are used to make inferences about the

population parameters. Can draw multiple random samples of size n.

LO 7.5 Describe the properties of the sampling distribution of the sample mean.

Page 17: Sampling distribution

7-17

Methods of Probability Sampling

In nonprobability sample inclusion in the sample is based on the judgment of the person selecting the sample.

The sampling error is the difference between a sample statistic and its corresponding population parameter.

Page 18: Sampling distribution

7-18

7.2 The Sampling Distribution of the Sample Mean Estimator

A statistic that is used to estimate a population parameter.

For example, , the mean of the sample, is an estimator of m, the mean of the population.

Estimate A particular value of the estimator. For example, the mean of the sample is an

estimate of m, the mean of the population.

LO 7.5

X

x

Page 19: Sampling distribution

7-19

7.2 The Sampling Distribution of the Sample Mean Sampling Distribution of the Mean

Each random sample of size n drawn from the population provides an estimate of m—the sample mean .

The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.

LO 7.5

X

x

Page 20: Sampling distribution

7-20

7.2 The Sampling Distribution of the Sample Mean Example

LO 7.5

  Random Variable    

  X1   X2   X3   X4  Mean of

X6 10 8 4 5.575 10 4 3 5.711 8 4 3 6.364 1 6 2 4.076 6 8 47 7 8 61 5 10 55 5 9 14 6 4 27 4 9 58 5 8 69 2 7 79 1 2 36 10 2 6

Means 5.57   5.71   6.36   4.07   5.43

One simple random sample drawn from the population—a single distribution of values of X.

Means from each distribution (random draw) from the population.

A distribution of means from each random draw from the population.

Page 21: Sampling distribution

7-21

7.2 The Sampling Distribution of the Sample Mean Example: Draw a sampling distribution of

the sample mean from a population of X={1,2,3,4,5,6} from a sample of n=3 without replacement.

Process: List all possible samples Calculate each mean of all possible samples Construct the distribution of the sample means

LO 7.5

Page 22: Sampling distribution

7-22

7.2 The Sampling Distribution of the Sample Mean The Expected Value and Standard Deviation

of the Sample Mean Expected Value

The expected value of X,

The expected value of the sample mean,

LO 7.5

E X

E X E X

Page 23: Sampling distribution

7-23

7.2 The Sampling Distribution of the Sample Mean The Expected Value and Standard Deviation

of the Sample Mean Variance of X

Standard Deviation of X

of

LO 7.5

2Var X

X Where n is the sample size. Also known as the Standard Error of the Mean.

2SD X

SD Xn

Page 24: Sampling distribution

7-24

7.2 The Sampling Distribution of the Sample Mean

The Central Limit Theorem For any population X with expected value m and standard

deviation s, the sampling distribution of X will be approximately normal if the sample size n is sufficiently large.

As a general guideline, the normal distribution approximation is justified when n > 30.

As before, if is approximately normal, then we can transform it to

LO 7.6 Explain the importance of the central limit theorem.

X XZ

n

Page 25: Sampling distribution

7-25

Central Limit Theorem

If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal.

If the population distribution is symmetrical (but not normal), the normal shape of the distribution of the sample mean emerge with samples as small as 10.

If a distribution that is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.

The mean of the sampling distribution equal to μ and the variance equal to σ2/n.

CENTRAL LIMIT THEOREM If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. This approximation improves with larger samples.

Page 26: Sampling distribution

7-26

Page 27: Sampling distribution

7-27

Using the SamplingDistribution of the Sample Mean (Sigma Known)

If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.

If the shape is known to be nonnormal, but the sample contains at least 30 observations, the central limit theorem guarantees the sampling distribution of the mean follows a normal distribution.

To determine the probability a sample mean falls within a particular region, use:

n

Xz

Page 28: Sampling distribution

7-28

If the population does not follow the normal distribution, but the sample is of at least 30 observations, the sample means will follow the normal distribution.

To determine the probability a sample mean falls within a particular region, use:

ns

Xt

Using the SamplingDistribution of the Sample Mean (Sigma Unknown)

Page 29: Sampling distribution

7-29

7.2 The Sampling Distribution of the Sample Mean Example: Given that m = 16 inches and s = 0.8

inches, determine the following: What is the expected value and the standard

deviation of the sample mean derived from a random sample of

2 pizzas

4 pizzas

LO 7.5

16E X 0.80.57

2SD X

n

16E X 0.80.40

4SD X

n

Page 30: Sampling distribution

7-30

7.2 The Sampling Distribution of the Sample Mean Sampling from a Normal Distribution

For any sample size n, the sampling distribution of is normal if the population X from which the sample is drawn is normally distributed.

If X is normal, then we can transform it into the standard normal random variable as:

LO 7.5

X

X E X XZ

SD X n

For a sampling distribution.

x E X xZ

SD X

For a distribution of the values of X.

Page 31: Sampling distribution

7-31

7.2 The Sampling Distribution of the Sample Mean

LO 7.5

Example: Given that m = 16 inches and s = 0.8 inches, determine the following: What is the probability that a randomly selected pizza is

less than 15.5 inches?

What is the probability that 2 randomly selected pizzas average less than 15.5 inches?

15.5 160.63

0.8

xZ

( 15.5) ( 0.63)

0.2643 or 26.43%

P X P Z

( 15.5) ( 0.88)

0.1894 or 18.94%

P X P Z

15.5 160.88

0.8 2

xZ

n

Page 32: Sampling distribution

7-32

Example: From the introductory case, Anne wants to determine if the marketing campaign has had a lingering effect on the amount of money customers spend on iced coffee.

Before the campaign, m = $4.18 and s = $0.84. Based on 50 customers sampled after the campaign, m = $4.26.

Let’s find . Since n > 30, the central limit theorem states that is approximately normal. So,

4.26P X X

7.2 The Sampling Distribution of the Sample Mean

LO 7.6

4.26 4.184.26

0.84 50

0.67 1 0.7486 0.2514

XP X P Z P Z

n

P Z

Page 33: Sampling distribution

7-33

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7 Describe the properties of the sampling distribution of the sample proportion.

Estimator Sample proportion is used to estimate the

population parameter p.

Estimate A particular value of the estimator .

P

p

Page 34: Sampling distribution

7-34

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

The Expected Value and Standard Deviation of the Sample Proportion Expected Value

The expected value of ,

The standard deviation of ,

E P p

P

P

1p pSD P

n

Page 35: Sampling distribution

7-35

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

The Central Limit Theorem for the Sample Proportion For any population proportion p, the sampling

distribution of is approximately normal if the sample size n is sufficiently large .

As a general guideline, the normal distribution approximation is justified when np > 5 and n(1 p) > 5.

P

Page 36: Sampling distribution

7-36

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

The Central Limit Theorem for the Sample Proportion If is normal, we can transform it into the

standard normal random variable as

Therefore any value on has a corresponding valuez on Z given by

P

1

P E P P pZ

SD P p p

n

Pp

1

p pZ

p p

n

Page 37: Sampling distribution

7-37

The Central Limit Theorem for the Sample Proportion

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

Sampling distribution of when the population proportionis p = 0.10.

P Sampling distribution of when the population proportionis p = 0.30.

P

Page 38: Sampling distribution

7-38

Example: From the introductory case, Anne wants to determine if the marketing campaign has had a lingering effect on the proportion of customers who are women and teenage girls. Before the campaign, p = 0.43 for women and p = 0.21 for

teenage girls. Based on 50 customers sampled after the campaign, p = 0.46 and p = 0.34, respectively.

Let’s find . Since n > 30, the central limit theorem states that is approximately normal.

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

0.46P P P

Page 39: Sampling distribution

7-39

7.3 The Sampling Distribution of the Sample Proportion

LO 7.7

0.46 0.430.46

1 0.43 1 0.43

50

0.43 1 0.6664 0.3336

p pP P P Z P Z

p p

n

P Z