7/28/2019 Doane Chapter 08a
Sampling Distributions and Estimation
Chapter 8, Part 1
Sampling Variation
Estimators and Sampling Distributions
Sample Mean and the Central Limit Theorem
Confidence Interval for a Mean (μ) with Known σ
Confidence Interval for a Mean (μ) with Unknown σ
Confidence Interval for a Proportion (π)
Sampling Variation
Sample statistic: a random variable whose value depends on which population items happen to be included in the random sample.
Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
This sampling variation can easily be illustrated.
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Sampling Variation
Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
The sample means (x̄ᵢ) tend to be close to the population mean (μ = 520.78).
Sampling Variation
[Figure: dot plot of eight samples of size n = 5]
[Figure: dot plot of eight sample means]
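The sampling variation described above can be imitated with a small simulation. This is only a sketch: it assumes GMAT scores are normally distributed with μ = 520.78 and σ = 86.8 (the values quoted in this chapter's GMAT illustration), and the function name draw_sample_means is hypothetical.

```python
import random
import statistics

# Assumption for illustration: scores are normal with the chapter's parameters.
MU, SIGMA = 520.78, 86.8

def draw_sample_means(num_samples=8, n=5, seed=1):
    """Return the mean of each of num_samples random samples of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(num_samples)
    ]

sample_means = draw_sample_means()
for i, m in enumerate(sample_means, start=1):
    # Each sample gives a different mean, scattered around MU.
    print(f"sample {i}: mean = {m:.1f}")
```

Changing the seed gives a different set of eight means, which is the point: the statistic varies from sample to sample.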
Estimators and Sampling Distributions
Some Terminology
Estimator: a statistic derived from a sample to infer the value of a population parameter.
Estimate: the value of the estimator in a particular sample.
Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.
Estimators and Sampling Distributions
Examples of Estimators
[Table: examples of estimators]
Estimators and Sampling Distributions
Sampling Distributions
The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
An estimator is a random variable since samples vary.
Sampling error = θ̂ − θ
Estimators and Sampling Distributions
Bias
Bias is the difference between the expected value of the estimator and the true parameter.
Bias = E(θ̂) − θ
An estimator is unbiased if E(θ̂) = θ.
On average, an unbiased estimator neither overstates nor understates the true parameter.
Estimators and Sampling Distributions
Bias
Sampling error is random whereas bias is systematic.
An unbiased estimator avoids systematic error.
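Bias can be checked exactly for a tiny population by listing every possible sample. The sketch below is an illustration, not part of the slides: it uses the population {0, 1, 2, 3} that appears later in this chapter, takes all samples of size 2 with replacement, and compares two variance estimators (dividing by n − 1 versus dividing by n). The helper name var_divide_by is hypothetical.

```python
from itertools import product
from statistics import mean

population = [0, 1, 2, 3]
n = 2
samples = list(product(population, repeat=n))  # 16 equally likely samples

def var_divide_by(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by divisor."""
    xbar = mean(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor

pop_mean = mean(population)
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# Expected value of each estimator = average over all equally likely samples.
expected_mean = mean(mean(s) for s in samples)
expected_var_n_minus_1 = mean(var_divide_by(s, n - 1) for s in samples)
expected_var_n = mean(var_divide_by(s, n) for s in samples)

print("bias of sample mean:      ", expected_mean - pop_mean)
print("bias of variance (n - 1): ", expected_var_n_minus_1 - pop_var)
print("bias of variance (n):     ", expected_var_n - pop_var)
```

The sample mean and the n − 1 variance estimator show zero bias here, while dividing by n understates the population variance on average.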
Estimators and Sampling Distributions
Efficiency
Efficiency refers to the variance of the estimator's sampling distribution.
A more efficient estimator has smaller variance.
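One way to see efficiency is to compare two estimators of the same parameter by simulation. The sketch below is an assumption-laden illustration rather than anything from the slides: it draws samples from a standard normal population and compares the variance of the sample mean with the variance of the sample median. The function name estimator_variances and all parameter values are hypothetical.

```python
import random
import statistics

def estimator_variances(n=25, reps=4000, seed=7):
    """Return (variance of sample means, variance of sample medians)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

For a normal population the mean should come out with the smaller variance, so it is the more efficient of the two estimators in this setting.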
Estimators and Sampling Distributions
Consistency
A consistent estimator converges toward the parameter being estimated as the sample size increases.
Sample Mean and the Central Limit Theorem
The sample mean is an unbiased estimator of μ; therefore,
E(X̄) = E(X) = μ
The standard error of the mean is the standard deviation of the sampling error of x̄:
σ_x̄ = σ/√n
Sample Mean and the Central Limit Theorem
If the population is exactly normal, then the sample mean follows a normal distribution.
Sample Mean and the Central Limit Theorem
For example, the average price, μ, of a 5 GB MP3 player is $80.00 with a standard deviation, σ, equal to $10.00. What will be the mean and standard error from a sample of 20 players?
E(X̄) = E(X) = μ = $80.00
σ_x̄ = σ/√n = 10/√20 = $2.236
If the distribution of prices for these players is a normal distribution, then the sampling distribution of x̄ is N(80.00, 2.236).
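The arithmetic in the MP3 player example can be reproduced in a few lines. This is a minimal sketch; the function name standard_error is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

se = standard_error(10.00, 20)
print(f"standard error = ${se:.3f}")  # prints "standard error = $2.236"
```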
Sample Mean and the Central Limit Theorem
Central Limit Theorem (CLT) for a Mean
If a random sample of size n is drawn from a population with mean μ and standard deviation σ, the distribution of the sample mean x̄ approaches a normal distribution with mean μ and standard deviation σ_x̄ = σ/√n as the sample size increases.
If the population is normal, the distribution of the sample mean is normal regardless of sample size.
Sample Mean and the Central Limit Theorem
Symmetric Population: Uniform Distribution
Rule of thumb: for the sample mean to be approximately normally distributed, n > 30.
A much smaller n will suffice if the population is symmetric.
For example, consider a uniform population U(0, 1000).
Sample Mean and the Central Limit Theorem
Symmetric Population: Uniform Distribution
The central limit theorem predicts that the means of samples drawn from this population (μ = 500, σ = 288.7) will be centered at 500, with a standard error of the mean of:
Predicted S.E.: σ_x̄ = σ/√n
n = 1: 288.7/√1 = 288.7
n = 2: 288.7/√2 = 204.1
n = 4: 288.7/√4 = 144.3
n = 16: 288.7/√16 = 72.2
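The predicted standard errors can be recomputed from the population itself. The sketch below assumes the uniform population runs from 0 to 1000, which is what σ = 288.7 implies (the standard deviation of a uniform distribution is its width divided by √12); the variable names are hypothetical.

```python
import math

# Assumed endpoints, chosen to be consistent with sigma = 288.7.
a, b = 0, 1000
mu = (a + b) / 2                   # mean of a uniform population
sigma = (b - a) / math.sqrt(12)    # standard deviation of a uniform population

# Predicted standard error of the mean for each sample size.
se_table = {n: sigma / math.sqrt(n) for n in (1, 2, 4, 16)}

print(f"mu = {mu:.0f}, sigma = {sigma:.1f}")
for n, se in se_table.items():
    print(f"n = {n:2d}: predicted S.E. = {se:.1f}")
```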
Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Uniform Population
[Figure: histograms of sample means from the uniform population]
Sample Mean and the Central Limit Theorem
Skewed Population: Waiting Time
Consider a strongly skewed population for waiting times at airport security screening with μ = 2.983 and σ = 2.451.
Sample Mean and the Central Limit Theorem
Skewed Population: Waiting Time
The CLT predicts that samples drawn from this population will have a mean of 2.983 minutes and standard error of the mean:
Predicted S.E.: σ_x̄ = σ/√n
n = 1: 2.451/√1 = 2.451
n = 2: 2.451/√2 = 1.733
n = 4: 2.451/√4 = 1.226
n = 16: 2.451/√16 = 0.613
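The prediction can be compared against a simulation. The slides do not say which skewed distribution the waiting times follow, so the sketch below assumes a gamma distribution whose shape and scale are chosen to match μ = 2.983 and σ = 2.451; that choice, and the function name simulated_standard_error, are hypothetical.

```python
import math
import random
import statistics

MU, SIGMA = 2.983, 2.451
SCALE = SIGMA ** 2 / MU   # gamma scale, so that shape * scale**2 = SIGMA**2
SHAPE = MU / SCALE        # gamma shape, so that shape * scale = MU

def simulated_standard_error(n, reps=5000, seed=42):
    """Standard deviation of reps simulated sample means of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

# Map each sample size to (predicted S.E., simulated S.E.).
results = {
    n: (SIGMA / math.sqrt(n), simulated_standard_error(n))
    for n in (1, 2, 4, 16)
}
for n, (predicted, simulated) in results.items():
    print(f"n = {n:2d}: predicted {predicted:.3f}, simulated {simulated:.3f}")
```

Even though the assumed population is strongly skewed, the simulated standard errors should track σ/√n closely.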
Sample Mean and the Central Limit Theorem
Histograms of Sample Means from Skewed Population
[Figure: histograms of sample means from the skewed population]
Sample Mean and the Central Limit Theorem
Range of Sample Means
The CLT permits us to define a range or interval within which the sample means are expected to fall.
If we know μ and σ, the range of sample means for samples of size n is predicted to be:
μ ± z σ/√n
where z is from the standard normal table.
90% interval: μ ± 1.645 σ/√n
95% interval: μ ± 1.960 σ/√n
99% interval: μ ± 2.576 σ/√n
Sample Mean and the Central Limit Theorem
Illustration: GMAT Scores
For samples of size n = 5 applicants, within what range would GMAT means be expected to fall?
The parameters are μ = 520.78 and σ = 86.8.
The predicted range for 95% of the sample means is:
μ ± 1.960 σ/√n = 520.78 ± 1.960 (86.8/√5) = 520.78 ± 76.08
That is, from 444.70 to 596.86.
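The GMAT range can be recomputed without a normal table. This is a minimal sketch that takes z from Python's statistics.NormalDist; the function name expected_range is hypothetical.

```python
import math
from statistics import NormalDist

def expected_range(mu, sigma, n, level=0.95):
    """Range expected to contain the stated share of sample means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.960 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

low, high = expected_range(520.78, 86.8, 5)
print(f"95% of sample means are expected between {low:.2f} and {high:.2f}")
```

Passing level=0.90 or level=0.99 gives the other two intervals listed above, with z near 1.645 and 2.576.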
Sample Mean and the Central Limit Theorem
Sample Size and Standard Error
The standard error declines as n increases, but at a decreasing rate.
Make the interval μ ± z σ/√n small by increasing n.
The distribution of sample means collapses at the true population mean μ as n increases.
Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population
Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
The population parameters are: μ = 1.5, σ = 1.118
Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population
All possible samples of size n = 2, with replacement, are given below along with their means.
[Table: the 4 × 4 = 16 possible samples and their means]
Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population
The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.
Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population
The CLT's predictions for the mean and standard error are
μ_x̄ = μ = 1.5 and σ_x̄ = σ/√n = 1.118/√2 = 0.7906
Sample Mean and the Central Limit Theorem
Illustration: All Possible Samples from a Uniform Population
The mean of the sample means is
[1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)] / 16 = 1.5
The standard deviation of the means is
√( Σ f (x̄ − 1.5)² / 16 ) = √(10/16) = 0.7906
Both values agree with the CLT's predictions.
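The whole illustration can be verified by enumeration, since there are only 16 possible samples. This is a minimal sketch; the variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

population = [0, 1, 2, 3]
n = 2

# Every ordered sample of size 2 drawn with replacement, and its mean.
means = [sum(s) / n for s in product(population, repeat=n)]

freq = Counter(means)  # how many samples produce each mean
mean_of_means = sum(means) / len(means)
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in means) / len(means)
)

for value in sorted(freq):
    print(f"mean {value:.1f}: {freq[value]} sample(s)")
print(f"mean of means = {mean_of_means}, sd of means = {sd_of_means:.4f}")
```

The frequencies rise from 1 to 4 and fall back to 1, which is the triangular shape described above.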
Confidence Interval for a Mean (μ) with Known σ
What is a Confidence Interval?
A sample mean x̄ is a point estimate of the population mean μ.
A confidence interval for the mean is a range μ_lower < μ < μ_upper.
The confidence level is the probability that the confidence interval contains the true population mean.
The confidence level (usually expressed as a %) is the area under the curve of the sampling distribution.
Confidence Interval for a Mean (μ) with Known σ
What is a Confidence Interval?
The confidence interval for μ with known σ is:
x̄ − z σ/√n < μ < x̄ + z σ/√n
Confidence Interval for a Mean (μ) with Known σ
Choosing a Confidence Level
A higher confidence level leads to a wider confidence interval.
Greater confidence implies loss of precision.
95% confidence is most often used.
Confidence Interval for a Mean (μ) with Known σ
Interpretation
A confidence interval either does or does not contain μ.
The confidence level quantifies the risk.
Out of 100 confidence intervals constructed at the 95% level, approximately 95 would contain μ, while approximately 5 would not contain μ.
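This interpretation can be checked by building many intervals from simulated samples and counting how often they contain μ. The sketch below is illustrative only: the population parameters (μ = 50, σ = 10), the sample size, and the function name coverage are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=4000, seed=3):
    """Fraction of known-sigma confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Each individual interval either contains mu or it does not.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(f"share of intervals containing mu: {rate:.3f}")
```

The share should land close to the chosen confidence level, though any single interval gives no indication of whether it is one of the misses.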
Confidence Interval for a Mean (μ) with Known σ
Is σ Ever Known?
Yes, but not very often.
In quality control applications with ongoing manufacturing processes, it is reasonable to assume σ stays the same over time.
In this case, confidence intervals are used to construct control charts to track the mean of a process over time.
Confidence Interval for a Mean (μ) with Unknown σ
Student's t Distribution
Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
The confidence interval for μ (unknown σ) is x̄ ± t s/√n, that is:
x̄ − t s/√n < μ < x̄ + t s/√n
Confidence Interval for a Proportion (π)
When is it Safe to Assume Normality?
Sample size to assume normality: nπ ≥ 10 and n(1 − π) ≥ 10.
Confidence Interval for a Proportion (π)
Standard Error of the Proportion
The standard error of the proportion σ_p depends on π, as well as n.
It is largest when π is near .50 and smaller when π is near 0 or 1.
Confidence Interval for a Proportion (π)
Standard Error of the Proportion
The formula for the standard error is symmetric about π = .50.
Confidence Interval for a Proportion (π)
Standard Error of the Proportion
Enlarging n reduces the standard error σ_p but at a diminishing rate.
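The three properties of the standard error of the proportion (largest near .50, symmetric, shrinking with n at a diminishing rate) can be seen by tabulating √(π(1 − π)/n). This is a minimal sketch; the function name se_proportion and the chosen values of π and n are hypothetical.

```python
import math

def se_proportion(pi, n):
    """Standard error of a sample proportion for population proportion pi."""
    return math.sqrt(pi * (1 - pi) / n)

for n in (100, 400, 1600):
    row = ", ".join(
        f"pi={pi:.1f}: {se_proportion(pi, n):.4f}"
        for pi in (0.1, 0.3, 0.5, 0.7, 0.9)
    )
    print(f"n = {n:4d} -> {row}")
```

Each fourfold increase in n only halves the standard error, which is why the gain from a larger sample diminishes.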
Confidence Interval for a Proportion (π)
Confidence Interval for π
The confidence interval for π is
π ± z √(π(1 − π)/n)
Since π is unknown, the confidence interval using p = x/n (assuming a large sample) is
p ± z √(p(1 − p)/n)
where z is based on the desired confidence.
Confidence Interval for a Proportion (π)
Confidence Interval for π
z can be chosen for any confidence level. For example, z = 1.645 for 90%, 1.960 for 95%, and 2.576 for 99% confidence.
Confidence Interval for a Proportion (π)
Example: Auditing
A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p?
p = x/n = 24/75 = .32
Is p normally distributed?
np = (75)(.32) = 24
n(1 − p) = (75)(.68) = 51
Both are > 10, so we may conclude normality.
Confidence Interval for a Proportion (π)
Example: Auditing
The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:
p ± z √(p(1 − p)/n) = .32 ± 1.96 √(.32(1 − .32)/75) = .32 ± .106
.214 < π < .426
We are 95% confident that this interval contains the true population proportion.
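The auditing example can be reproduced as follows. This is a minimal sketch of the large-sample interval; the function name proportion_ci is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Large-sample confidence interval for a proportion from x successes in n."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(24, 75)
print(f"{low:.3f} < pi < {high:.3f}")
```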
Confidence Interval for a Proportion (π)
Narrowing the Interval
The width of the confidence interval for π depends on
- the sample size
- the confidence level
- the sample proportion p
To obtain a narrower interval (i.e., more precision) either
- increase the sample size
- reduce the confidence level
Confidence Interval for a Proportion (π)
Using Excel and MegaStat
To find a confidence interval for a proportion in Excel, use (for example)
=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
Here NORMSINV(.95) returns z = 1.645, so these formulas give the lower and upper limits of a 90% confidence interval for p = .15 and n = 200.
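For readers without Excel, the same two limits can be computed in Python. This sketch mirrors the Excel formulas term by term, with NormalDist().inv_cdf playing the role of NORMSINV; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

p, n = 0.15, 200
z = NormalDist().inv_cdf(0.95)           # same role as NORMSINV(.95)
half_width = z * math.sqrt(p * (1 - p) / n)

lower = p - half_width                   # first Excel formula
upper = p + half_width                   # second Excel formula
print(f"z = {z:.3f}, interval = ({lower:.4f}, {upper:.4f})")
```

Because .95 is a one-tail cumulative probability, 5% is left in each tail and the interval has 90% confidence. A 95% interval would use inv_cdf(0.975) instead.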
Confidence Interval for a Proportion (π)
Using Excel and MegaStat
In MegaStat, enter p and n to obtain the confidence interval for a proportion.
MegaStat always assumes normality.
Confidence Interval for a Proportion (π)
Using Excel and MegaStat
If the sample is small, the distribution of p may not be well approximated by the normal.
Confidence limits around p can be constructed by using the binomial distribution.
Confidence Interval for a Proportion (π)
Polls and Margin of Error
In polls and surveys, the half-width of the confidence interval when π = .50 is called the margin of error.
Below are some margins of error for a 95% confidence interval assuming π = .50.
[Table: margins of error by sample size]
Each reduction in the margin of error requires a disproportionately larger sample size.
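A table of margins of error can be generated directly from the formula z √(π(1 − π)/n) with π = .50 and z = 1.96. The sample sizes below are illustrative choices, not necessarily those on the original slide, and the function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(n, z=1.96, pi=0.50):
    """Half-width of the confidence interval for a proportion."""
    return z * math.sqrt(pi * (1 - pi) / n)

# Illustrative sample sizes: each one is four times the previous.
for n in (100, 400, 1600):
    print(f"n = {n:4d}: margin of error = ±{margin_of_error(n):.1%}")
```

Halving the margin of error requires four times the sample size, which is the disproportionate cost noted above.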
Confidence Interval for a Proportion (π)
Rule of Three
If in n independent trials no events occur, the upper 95% confidence bound is approximately 3/n.
Very Quick Rule
A Very Quick Rule (VQR) for a 95% confidence interval when π is near .50 is
p ± 1/√n
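Both shortcuts can be compared with the calculations they approximate. In the sketch below, the exact bound for the Rule of Three is taken to be the value of π at which observing zero events in n trials has probability 5%, and the VQR is compared with the normal half-width at p = .50. The function names are hypothetical.

```python
import math

def exact_upper_bound(n, level=0.95):
    """Solve (1 - pi)**n = 1 - level for pi: zero events in n trials."""
    return 1 - (1 - level) ** (1 / n)

def rule_of_three(n):
    return 3 / n

def very_quick_halfwidth(n):
    return 1 / math.sqrt(n)

def normal_halfwidth(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 500):
    print(
        f"n = {n:3d}: exact bound {exact_upper_bound(n):.4f} vs 3/n {rule_of_three(n):.4f}; "
        f"normal half-width {normal_halfwidth(n):.4f} vs 1/sqrt(n) {very_quick_halfwidth(n):.4f}"
    )
```

The VQR works because 1.96 × √(.25) = 0.98, which is close to 1 when p is near .50.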
Applied Statistics in Business and Economics
End of Part 1 of Chapter 8