09 Sampling Distributions


Sampling Distribution

St. Andrews

St. Andrews University receives 900 applications annually from prospective students. The application forms contain a variety of information, including the individual's Scholastic Aptitude Test (SAT) score and whether or not the individual desires on-campus housing.

St. Andrews

To get numerical or statistical information about the population (for example, the mean SAT score of all the applicants), we can take either a census of all 900 applicants or a survey of a portion of the applicants (for example, 30).

St. Andrews

Taking a Census of the 900 Applicants

SAT scores:
Population mean: $\mu = \frac{\sum x_i}{900} = 990$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

Applicants wanting on-campus housing:
Population proportion: $p = \frac{648}{900} = .72$

St. Andrews

Taking a Survey of 30 Applicants

| No. | Random Number | Applicant | SAT Score | On-Campus |
|---|---|---|---|---|
| 1 | 744 | Connie Reyman | 1025 | Yes |
| 2 | 436 | William Fox | 950 | Yes |
| 3 | 865 | Fabian Avante | 1090 | No |
| 4 | 790 | Eric Paxton | 1120 | Yes |
| 5 | 835 | Winona Wheeler | 1015 | No |
| . | . | . | . | . |
| 30 | 685 | Kevin Cossack | 965 | No |

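St. Andrews: Population and Sample

Population:
$\mu = \frac{\sum x_i}{900} = 990$
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$
$p = \frac{648}{900} = .72$

Sample:
$\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$
$\hat{p} = \frac{20}{30} = .67$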

Sampling Error

The absolute value of the difference between an unbiased point estimate and the population parameter it estimates is called the sampling error. For the case of a sample mean estimating a population mean:

Sampling Error $= |\bar{x} - \mu|$
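As a quick illustration of the definition, the sketch below computes the sampling errors of the St. Andrews sample mean and sample proportion. It uses only the summary figures quoted in this section (sample total 29,910 for 30 applicants, and 20 of 30 wanting on-campus housing); the helper name `sampling_error` is hypothetical.

```python
# Sampling errors for the St. Andrews example, computed from the summary figures above
# (the individual applicant records are not reproduced in this document).
population_mean = 990            # mu, from the census of all 900 applicants
population_prop = 648 / 900      # p = .72

sample_mean = 29_910 / 30        # x-bar from the survey of 30 applicants
sample_prop = 20 / 30            # p-hat from the same survey


def sampling_error(estimate: float, parameter: float) -> float:
    """Absolute difference between an unbiased point estimate and its parameter."""
    return abs(estimate - parameter)


mean_error = sampling_error(sample_mean, population_mean)
prop_error = sampling_error(sample_prop, population_prop)
print(f"sampling error of the mean:       {mean_error:.1f}")
print(f"sampling error of the proportion: {prop_error:.3f}")
```

Different samples of 30 would give different sampling errors; the sampling distribution, introduced next, describes how they vary.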


Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean is the probability distribution of the population of sample means obtainable from all possible samples of size n from a population of size N.

Example: Sampling Annual % Return of 6 Stocks

| Stock | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| % Return | 10% | 20% | 30% | 40% | 50% | 60% |

Assume that we have a population of 6 stocks (shown in the table). Computing the population parameters, we get: $N = 6$, $\mu = 35\%$, $\sigma = 17.078\%$.
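The population parameters can be reproduced with Python's standard `statistics` module. Note the use of `pstdev`, which divides by N because the six stocks are the entire population.

```python
from statistics import mean, pstdev

# Annual % returns of the six stocks in the table
returns = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60}
values = list(returns.values())

N = len(values)
mu = mean(values)
sigma = pstdev(values)   # population SD: divides by N, not N - 1

print(f"N = {N}, mu = {mu}%, sigma = {sigma:.3f}%")
```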

Let's try taking a random sample of size n = 1. There are 6 possible samples ($_6C_1 = 6$), each equally likely to be chosen, so each has a 1/6 chance of being chosen.


Now, let's try taking samples of size n = 2. There are a total of 15 possible samples ($_6C_2 = 15$) from the population of 6 stocks. Calculating the sample mean of each and every sample, we get:

| Sample Mean (%) | Frequency | Relative Frequency |
|---|---|---|
| 15 | 1 | 1/15 |
| 20 | 1 | 1/15 |
| 25 | 2 | 2/15 |
| 30 | 2 | 2/15 |
| 35 | 3 | 3/15 |
| 40 | 2 | 2/15 |
| 45 | 2 | 2/15 |
| 50 | 1 | 1/15 |
| 55 | 1 | 1/15 |
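The frequency table can be checked by brute force: list every sample of size n drawn without replacement (here with `itertools.combinations`) and tally the sample means. The sketch below does this for n = 1 and n = 2; the function name `sampling_distribution` is just an illustrative choice.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

returns = [10, 20, 30, 40, 50, 60]  # population of N = 6 stock returns (%)


def sampling_distribution(population, n):
    """Sample means of all C(N, n) samples drawn without replacement, plus their frequencies."""
    means = [sum(sample) / n for sample in combinations(population, n)]
    return means, Counter(means)


for n in (1, 2):
    means, freq = sampling_distribution(returns, n)
    total = len(means)
    print(f"n = {n}: {total} samples, mean of the sample means = {mean(means)}")
    for m in sorted(freq):
        print(f"  sample mean {m:4.1f}%  frequency {freq[m]}  relative frequency {freq[m]}/{total}")
```

For n = 1 the sampling distribution is simply the population itself (each return with probability 1/6); for n = 2 it reproduces the table above, and in both cases the mean of the sample means is 35%.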


Observations

Although the population of N = 6 stock returns has a uniform distribution, the histogram of the 15 sample mean returns (n = 2):
1. appears to be centered over the population mean return of 35%, and
2. appears to be bell-shaped and less spread out than the histogram of individual returns.

Example: Sampling All Stocks

Consider the population of returns of all 1,815 stocks listed on the NYSE for 1987. The mean rate of return $\mu$ was -3.5%, with a standard deviation $\sigma$ of 26%.


Draw all possible random samples of size n = 5 and calculate the sample mean return of each.


Results from Sampling All Stocks

Observations: Both histograms appear to be bell-shaped and centered over the same mean of -3.5%. The histogram of the sample mean returns looks less spread out than that of the individual returns.

Statistics

Mean of all sample means: $\mu_{\bar{x}} = \mu = -3.5\%$

Standard deviation of all possible sample means:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26}{\sqrt{5}} = 11.63\%$

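The Empirical Rule

The empirical rule holds for the sampling distribution of the sample mean when that distribution is normal (or approximately normal):
68.26% of all sample means lie within 1 standard deviation ($\sigma_{\bar{x}}$) of the mean $\mu$
95.44% lie within 2 standard deviations of $\mu$
99.73% lie within 3 standard deviations of $\mu$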
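To make the rule concrete for the all-stocks example ($\mu = -3.5\%$, $\sigma = 26\%$, n = 5), the sketch below computes $\sigma_{\bar{x}}$ and the 1-, 2- and 3-standard-deviation intervals, and gets the coverage percentages from `statistics.NormalDist` (Python 3.8+). It assumes the sampling distribution is normal; the computed coverages (68.27%, 95.45%, 99.73%) differ from the slide's 68.26% and 95.44% only through table rounding.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = -3.5, 26.0, 5       # NYSE 1987 returns (%): population mean, population SD, sample size
se = sigma / sqrt(n)               # standard deviation of the sample mean, about 11.63%

standard_normal = NormalDist()     # mean 0, SD 1
coverage = {}
for k in (1, 2, 3):
    # probability that a normal variable falls within k SDs of its mean
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    low, high = mu - k * se, mu + k * se
    print(f"{coverage[k]:.2%} of sample means lie within {k} SD: [{low:.2f}%, {high:.2f}%]")
```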

Properties of the Sampling Distribution of the Sample Mean

If the population being sampled is normal, then so is the sampling distribution of the sample mean, $\bar{x}$.

The mean $\mu_{\bar{x}}$ of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$. That is, the mean of all possible sample means is the same as the population mean.

The variance $\sigma^2_{\bar{x}}$ of the sampling distribution of $\bar{x}$ is

$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$

That is, the variance of the sampling distribution of $\bar{x}$ is directly proportional to the variance of the population and inversely proportional to the sample size.

The standard deviation $\sigma_{\bar{x}}$ of the sampling distribution of $\bar{x}$ is

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

That is, the standard deviation of the sampling distribution of $\bar{x}$ is
o directly proportional to the standard deviation of the population, and
o inversely proportional to the square root of the sample size.

- $\sigma_{\bar{x}}$ is also called the standard error of the mean.

Notes

The formulas for $\sigma^2_{\bar{x}}$ and $\sigma_{\bar{x}}$ hold exactly if the sampled population is infinite. They hold approximately if the sampled population is finite but N is much larger than n (at least 20 times larger, that is, $N/n \ge 20$).

$\bar{x}$ is the point estimate of $\mu$, and the larger the sample size n, the more accurate the estimate.

If the population is finite:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

where $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.
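The six-stock example is a good test of the correction factor, because there N/n = 3 is far below 20. The sketch below compares the exact standard deviation of all 15 sample means (n = 2) with the formula with and without the finite population correction.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [10, 20, 30, 40, 50, 60]   # the six stock returns (%)
N, n = len(population), 2

# Exact: SD of all 15 equally likely sample means (the whole sampling distribution)
sample_means = [mean(s) for s in combinations(population, n)]
exact_se = pstdev(sample_means)

sigma = pstdev(population)
fpc = sqrt((N - n) / (N - 1))           # finite population correction factor
corrected_se = sigma / sqrt(n) * fpc    # finite-population formula
uncorrected_se = sigma / sqrt(n)        # infinite-population formula

print(f"exact {exact_se:.3f}%, corrected {corrected_se:.3f}%, uncorrected {uncorrected_se:.3f}%")
```

With the correction the formula matches the enumeration exactly (about 10.80%), while $\sigma/\sqrt{n}$ alone overstates it (about 12.08%).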

Central Limit Theorem

Now consider sampling a non-normal population. We still have $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$:
o exactly correct if the population is infinite;
o approximately correct if the population size N is finite but much larger than the sample size n, especially if $N \ge 20n$.

But if the population is non-normal, what is the shape of the sampling distribution of the sample mean? Is it normal, as it is when the population is normal?

Yes, the sampling distribution is approximately normal if the sample is large enough, even if the population is non-normal

This is the Central Limit Theorem

For a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, with the mean of the sampling distribution equal to $\mu$ and the variance equal to $\sigma^2/n$, assuming that the sample size is sufficiently large ($n \ge 30$).

When the simple random sample is small (n < 30), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal probability distribution. Further, the larger the sample size n, the closer the sampling distribution of the sample mean is to being normal.
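The theorem can be seen in a small simulation. The sketch below assumes a hypothetical right-skewed population, an exponential distribution with mean 1 (so $\sigma = 1$), draws 20,000 samples of size n = 36 with Python's `random` module, and compares the sample means with the CLT predictions $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The population, n, and the number of repetitions are illustrative choices, not from the source.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population: exponential with mean 1 (its SD is also 1)
mu, sigma = 1.0, 1.0
n, reps = 36, 20_000            # sample size (n >= 30) and number of simulated samples

sample_means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
se = sigma / n ** 0.5           # CLT prediction for the SD of the sample means

print(f"mean of sample means: {mean(sample_means):.3f} (CLT: {mu})")
print(f"SD of sample means:   {stdev(sample_means):.3f} (CLT: {se:.3f})")

# Rough normality check against the empirical rule (68.26%, 95.44%, 99.73%)
for k in (1, 2, 3):
    share = sum(abs(m - mu) <= k * se for m in sample_means) / reps
    print(f"within {k} SD of mu: {share:.2%}")
```

Although individual exponential values are strongly skewed, the shares of sample means within 1, 2 and 3 standard errors come out close to the empirical-rule percentages.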

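[Figure: Central Limit Theorem. A right-skewed population distribution with mean $\mu$ and standard deviation $\sigma$; random samples $(x_1, x_2, \ldots, x_n)$ give sample means $\bar{x}$; as n becomes large, the sampling distribution of the sample mean is nearly normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.]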

Unbiased Estimates

A sample statistic is an unbiased point estimate of a population parameter if the mean of all possible values of the sample statistic equals the population parameter.
o $\bar{x}$ is an unbiased estimate of $\mu$ because $\mu_{\bar{x}} = \mu$.
o In general, the sample mean is always an unbiased estimate of $\mu$.
o The sample median is often an unbiased estimate of $\mu$, but not always.

Minimum Variance Estimates

We want the sample statistic to have a small standard deviation, so that all values of the sample statistic are clustered around the population parameter. Then the statistic from any sample should be close to the population parameter.

Given a choice between unbiased estimates, choose the one with the smallest standard deviation. For example, when the population is symmetric (such as a normal population), the sample mean and the sample median are both unbiased estimates of $\mu$, and the sampling distribution of sample means generally has a smaller standard deviation than that of sample medians.
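The comparison between the sample mean and the sample median can be checked by simulation. The sketch assumes a hypothetical normal population ($\mu = 50$, $\sigma = 10$) and samples of n = 25; for a normal population, theory puts the standard error of the median at roughly $\sqrt{\pi/2} \approx 1.25$ times that of the mean when n is large.

```python
import random
from statistics import mean, median, pstdev

random.seed(2)  # reproducible run

# Hypothetical normal population with mu = 50, sigma = 10; samples of size n = 25
mu, sigma, n, reps = 50.0, 10.0, 25, 10_000

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(median(sample))

# Both estimators center near mu, but the sample mean varies less from sample to sample
print(f"sample means:   center {mean(means):.2f}, SD {pstdev(means):.3f}")
print(f"sample medians: center {mean(medians):.2f}, SD {pstdev(medians):.3f}")
```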

Example

Suppose that we will randomly select a sample of 64 measurements from a population having a mean equal to 20 and a standard deviation equal to 4.
a) Describe the shape of the sampling distribution of the sample mean.
b) Find the mean and the standard deviation of the sampling distribution of the sample mean.
c) Calculate the probability that the sample mean is greater than 21.
d) Calculate the probability that the sample mean is less than 19.385.
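One way to work this example is sketched below with `statistics.NormalDist` (Python 3.8+). Because n = 64 is large, the Central Limit Theorem lets us treat the sampling distribution as approximately normal with mean 20 and standard deviation $4/\sqrt{64} = 0.5$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 20, 4, 64
se = sigma / sqrt(n)                      # 4 / 8 = 0.5
xbar = NormalDist(mu, se)                 # approximately normal by the CLT (n = 64 is large)

p_above_21 = 1 - xbar.cdf(21)             # z = (21 - 20) / 0.5 = 2
p_below_19_385 = xbar.cdf(19.385)         # z = (19.385 - 20) / 0.5 = -1.23

print(f"mean {mu}, standard error {se}")
print(f"P(x-bar > 21)     = {p_above_21:.4f}")
print(f"P(x-bar < 19.385) = {p_below_19_385:.4f}")
```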

Exercise: Pizza Delivery

When a pizza restaurant's delivery process is operating effectively, pizzas are delivered in an average of 45 minutes with a standard deviation of 6 minutes. To monitor its delivery process, the restaurant randomly selects five pizzas each night and records their delivery times.

Assume that the population of all delivery times on a given evening is normally distributed with a mean of 45 minutes and a standard deviation of 6 minutes.
a) Describe the shape of the population of all sample means. How do you know what the shape is?
b) Find the mean and standard deviation of the possible sample means.
c) Calculate an interval containing 99.73% of the sample means.

Suppose that a sample gave a mean of 55 minutes. Using the interval, what would you conclude about whether the restaurant's delivery process is operating effectively?
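A sketch of the calculation: since the population of delivery times is assumed normal, the sample means are normal even though n = 5 is small, with mean 45 and standard deviation $6/\sqrt{5}$. The 99.73% interval is the mean plus or minus 3 standard deviations.

```python
from math import sqrt

mu, sigma, n = 45, 6, 5                 # minutes; five pizzas sampled each night
se = sigma / sqrt(n)                    # about 2.683 minutes
low, high = mu - 3 * se, mu + 3 * se    # contains 99.73% of sample means (normal population)

observed = 55
print(f"99.73% interval: [{low:.2f}, {high:.2f}] minutes")
if low <= observed <= high:
    print("sample mean is consistent with an effective delivery process")
else:
    print("sample mean is outside the interval: evidence the process is not operating effectively")
```

A sample mean of 55 minutes falls above the upper limit of about 53.05 minutes.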

Exercise: Bank Customer Waiting Time Case

A bank manager wants to show that the new system reduces typical customer waiting times to less than six minutes. One way to do this is to demonstrate that the mean of the population of all customer waiting times is less than 6. Letting this mean be $\mu$, in this exercise we wish to investigate whether the sample of 100 waiting times provides evidence to support the claim that $\mu$ is less than 6. The mean of the sample of 100 waiting times is 5.46; assume that the standard deviation $\sigma$ of the population of all customer waiting times is known to be 2.47.

a) Consider the population of all sample means obtained from samples of 100 waiting times. What is the shape of this population of sample means?
b) Find the mean and standard deviation of the population of all possible sample means when we assume that $\mu$ is equal to 6.
c) The sample mean that we have actually observed is 5.46. Assuming that $\mu = 6$, find the probability of observing a sample mean that is less than or equal to 5.46.
d) Is it more reasonable to believe that $\mu = 6$ or that $\mu$ is less than 6? What do you conclude about whether the new system has reduced the typical customer waiting time to less than 6 minutes?
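A sketch of parts b) and c): with n = 100 the sampling distribution is approximately normal by the Central Limit Theorem, with mean 6 (under the assumption $\mu = 6$) and standard deviation $2.47/\sqrt{100} = 0.247$.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 6.0, 2.47, 100      # assumed mean, known population SD, sample size
xbar_observed = 5.46

se = sigma / sqrt(n)                                 # 0.247
z = (xbar_observed - mu0) / se                       # about -2.19
prob = NormalDist(mu0, se).cdf(xbar_observed)        # P(x-bar <= 5.46 | mu = 6)

print(f"standard error {se:.3f}, z = {z:.2f}, P(x-bar <= 5.46) = {prob:.4f}")
```

A probability of roughly 0.014 means a sample mean this low would be rare if $\mu$ were 6, which is the kind of evidence part d) asks you to weigh.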

Exercise: Aamco Heating and Cooling

Aamco Heating and Cooling Inc. advertises that any customer buying an air conditioner during the first 16 days of July will receive a 25 percent discount if the average high temperature for this 16-day period is more than five degrees above normal. If the daily high temperature in July is normally distributed with a mean of 84 degrees and a standard deviation of 8 degrees, what is the probability that Aamco Heating and Cooling will have to give its customers the 25 percent discount? Based on the probability you computed, do you think that Aamco's promotion is ethical?
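A sketch of the calculation, assuming (as the exercise implicitly does) that the 16 daily highs behave like a random sample from the normal population of July highs. The discount is paid if the 16-day average exceeds 84 + 5 = 89 degrees, and the average has standard deviation $8/\sqrt{16} = 2$.

```python
from math import sqrt
from statistics import NormalDist

normal_high, sigma, days = 84, 8, 16
threshold = normal_high + 5                  # average high must exceed 89 degrees
se = sigma / sqrt(days)                      # 8 / 4 = 2 degrees
# treats the 16 daily highs as independent draws (an assumption)
p_discount = 1 - NormalDist(normal_high, se).cdf(threshold)   # P(x-bar > 89), z = 2.5

print(f"P(discount) = {p_discount:.4f}")
```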

Exercise: Supplier Defects

A computer supply house receives a large shipment of floppy disks each week. Past experience has shown that the number of flaws per disk can be described by the following probability distribution:

| Number of flaws per disk | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Probability | 65% | 20% | 10% | 5% |

a) Calculate the mean and standard deviation of the number of flaws per floppy disk.
b) Suppose that we randomly select a sample of 100 floppy disks. Compute the mean and standard deviation of the sampling distribution of the sample mean.
c) Assume a random sample of 100 disks is drawn from each shipment from the supplier, with the shipment being rejected if the average number of flaws per disk for the 100 sample disks is greater than 0.75. Suppose the mean number of flaws per disk for this week's entire shipment is actually 0.55. What is the probability that the shipment will be rejected and sent back to the supplier?
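A sketch of all three parts: the mean and standard deviation of the discrete distribution, the standard error for n = 100, and a CLT-based normal approximation for the rejection probability.

```python
from math import sqrt
from statistics import NormalDist

# Probability distribution of flaws per disk
dist = {0: 0.65, 1: 0.20, 2: 0.10, 3: 0.05}

mu = sum(x * p for x, p in dist.items())                         # a) mean
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))    # a) standard deviation

n = 100
se = sigma / sqrt(n)                                             # b) SD of the sample mean

# c) reject if x-bar > 0.75 when the true mean is 0.55 (normal approximation, n = 100 is large)
p_reject = 1 - NormalDist(0.55, se).cdf(0.75)

print(f"mu = {mu:.2f}, sigma = {sigma:.4f}, se = {se:.4f}, P(reject) = {p_reject:.4f}")
```

The mean of the distribution is exactly 0.55, so part c) asks how often a shipment with typical quality would be rejected by chance alone.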

Sampling Distribution of the Sample Proportion

The probability distribution of all possible sample proportions is the sampling distribution of the sample proportion $\hat{p}$. If a random sample of size n is taken from a population, then the sampling distribution of $\hat{p}$ is approximately normal if n is large. $\hat{p}$ has mean

$\mu_{\hat{p}} = p$

and standard deviation

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

where p is the population proportion and $\hat{p}$ is a sample proportion. n can be considered large if both np and n(1 - p) are at least 5.

If the population is finite:

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$

$\sigma_{\hat{p}}$ is referred to as the standard error of the proportion.

Example

Suppose that we will randomly select a sample of n = 100 units from a population and that we will compute the sample proportion $\hat{p}$ of these units that fall into a category of interest. If the true population proportion p equals 90%:
a) Describe the shape of the sampling distribution of $\hat{p}$.
b) Find the mean and the standard deviation of the sampling distribution of $\hat{p}$.

Calculate the following probabilities about the sample proportion $\hat{p}$. In each case, sketch the sampling distribution and the probability: $P(\hat{p} \ge 0.96)$, $P(0.855 \le \hat{p} \le 0.945)$, $P(\hat{p} \le 0.915)$.
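A sketch of this example: check the large-sample condition, compute $\sigma_{\hat{p}} = \sqrt{0.9 \cdot 0.1 / 100} = 0.03$, and use the normal approximation. The inequality directions coded here, $P(\hat{p} \ge 0.96)$ and $P(\hat{p} \le 0.915)$, are one reading of the exercise; the z-values (2, plus or minus 1.5, and 0.5) are the same either way.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.90, 100
large_enough = n * p >= 5 and n * (1 - p) >= 5   # 90 and 10: normal approximation is reasonable
se = sqrt(p * (1 - p) / n)                       # about 0.03
phat = NormalDist(p, se)                         # approximate sampling distribution of p-hat

p_at_least_096 = 1 - phat.cdf(0.96)              # z = 2
p_between = phat.cdf(0.945) - phat.cdf(0.855)    # z from -1.5 to 1.5
p_at_most_0915 = phat.cdf(0.915)                 # z = 0.5

print(f"approximately normal: {large_enough}; mean {p}, SD {se:.3f}")
print(f"P(p-hat >= 0.96)           = {p_at_least_096:.4f}")
print(f"P(0.855 <= p-hat <= 0.945) = {p_between:.4f}")
print(f"P(p-hat <= 0.915)          = {p_at_most_0915:.4f}")
```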

Exercise: Bank of America

Historically, the percentage of Bank of America customers expressing customer delight has been 48%. Suppose that we wish to use the results of a survey of 350 Bank of America customers to justify the claim that more than 48% of all current Bank of America customers express customer delight. If we assume that the proportion of customer delight is p = .48, calculate the probability of observing a sample proportion greater than or equal to 189/350 = 0.54.
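A sketch of the calculation: with n = 350 and p = .48, both np = 168 and n(1 - p) = 182 exceed 5, so $\hat{p}$ is approximately normal with mean .48 and standard deviation $\sqrt{.48 \cdot .52 / 350}$.

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.48, 350
phat_observed = 189 / 350                          # 0.54
se = sqrt(p0 * (1 - p0) / n)                       # about 0.0267
z = (phat_observed - p0) / se                      # about 2.25
prob = 1 - NormalDist(p0, se).cdf(phat_observed)   # P(p-hat >= 0.54 | p = 0.48)

print(f"se = {se:.4f}, z = {z:.2f}, P(p-hat >= 0.54) = {prob:.4f}")
```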

Exercise: M&Ms Candy Bags

Each bag of M&Ms contains 455 pieces of candy. There are six colors, and according to an old edition of the M&M website, an ideal bag of M&Ms contains:

| Color | Red | Yellow | Green | Blue | Orange | Brown |
|---|---|---|---|---|---|---|
| Percentage | 12% | 15% | 15% | 23% | 23% | 12% |
| Number | 55 | 67 | 68 | 105 | 105 | 55 |

Suppose we counted the number of blue M&Ms in 30 M&M packs. The mean proportion from our sample is 23.04%. Can we therefore say that the percentage of blue candies in M&M bags these days is in fact greater than 23%?

This problem cannot be solved unless the population standard deviation is given, because it is not a sample proportion problem but a sample mean problem: the 23.04% is the mean of 30 per-bag proportions. Solve the problem assuming that the standard deviation of the population is $\sigma = 0.0197$.
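A sketch of the sample-mean approach, treating the 23.04% as the mean of 30 per-bag blue proportions, taking $\mu = 0.23$ as the claimed value and $\sigma = 0.0197$ as given, and relying on the CLT for n = 30.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 0.23, 0.0197, 30     # claimed blue proportion, given population SD, bags sampled
xbar = 0.2304                        # mean of the 30 per-bag blue proportions

se = sigma / sqrt(n)                           # about 0.0036
z = (xbar - mu0) / se                          # about 0.11
prob = 1 - NormalDist(mu0, se).cdf(xbar)       # P(x-bar >= 0.2304 | mu = 0.23)

print(f"se = {se:.5f}, z = {z:.2f}, P(x-bar >= 0.2304) = {prob:.3f}")
```

A probability near 0.46 means a sample mean of 23.04% is entirely ordinary if the true mean is 23%, so it gives no real evidence of an increase.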