Chapter 7: Sampling and Sampling Distributions
• Simple Random Sampling
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of x̄
Example: St. Andrew’s
St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information, including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.
The director of admissions would like to know the following information:
• the average SAT score for the 900 applicants, and
• the proportion of applicants that want to live on campus.
We will now look at three alternatives for obtaining the desired information:
• Conducting a census of the entire 900 applicants
• Selecting a sample of 30 applicants, using a random number table
• Selecting a sample of 30 applicants, using Excel
Conducting a Census
• If the relevant data for the entire 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas presented in Chapter 3.
• We will assume for the moment that conducting a census is practical in this example.
• Population Mean SAT Score
$$\mu = \frac{\sum x_i}{900} = 990$$
• Population Standard Deviation for SAT Score
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$$
• Population Proportion Wanting On-Campus Housing
$$p = \frac{648}{900} = .72$$
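The census calculation can be sketched in Python. This is a minimal illustration, assuming the applicant records are available as two lists; the five-applicant population below is made up for the example and is not the college's data, and `population_parameters` is a hypothetical helper name.

```python
import math

def population_parameters(sat_scores, wants_housing):
    """Return (mu, sigma, p) computed over the entire population."""
    N = len(sat_scores)
    mu = sum(sat_scores) / N
    # The population standard deviation divides by N, not N - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sat_scores) / N)
    # Proportion of applicants who want on-campus housing.
    p = sum(wants_housing) / N
    return mu, sigma, p

# Hypothetical five-applicant population, for illustration only.
scores = [900, 950, 1000, 1050, 1100]
housing = [True, True, False, True, False]
mu, sigma, p = population_parameters(scores, housing)
print(mu, round(sigma, 1), p)
```

With the real database, the same function would be called once on all 900 records.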
Point Estimation
• x̄ as Point Estimator of μ
$$\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$$
• s as Point Estimator of σ
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$$
• p̄ as Point Estimator of p
$$\bar{p} = \frac{20}{30} = .67$$
Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
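A short Python sketch of drawing a simple random sample and computing the three point estimates. The population here is synthetic (generated from assumed values of 990, 80 and .72), so the printed estimates will not match the 997, 75.2 and 20/30 quoted above; `point_estimates` and the seed are illustrative choices.

```python
import math
import random

def point_estimates(sat_scores, wants_housing):
    """Return (x_bar, s, p_bar) for one sample."""
    n = len(sat_scores)
    x_bar = sum(sat_scores) / n
    # The sample standard deviation divides by n - 1.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sat_scores) / (n - 1))
    p_bar = sum(wants_housing) / n
    return x_bar, s, p_bar

# Hypothetical population of 900 applicants (synthetic, NOT the real data).
rng = random.Random(7)
population = [(rng.gauss(990, 80), rng.random() < 0.72) for _ in range(900)]

# Simple random sample of 30 applicants, drawn without replacement.
sample = rng.sample(population, 30)
x_bar, s, p_bar = point_estimates([a[0] for a in sample], [a[1] for a in sample])
print(round(x_bar, 1), round(s, 1), round(p_bar, 2))
```

Changing the seed selects a different sample and therefore different point estimates, which is the point of the note above.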
Summary of Point Estimates Obtained from a Simple Random Sample

| Population Parameter | Parameter Value | Point Estimator | Point Estimate |
| μ = Population mean SAT score | 990 | x̄ = Sample mean SAT score | 997 |
| σ = Population std. deviation for SAT score | 80 | s = Sample std. deviation for SAT score | 75.2 |
| p = Population proportion wanting campus housing | .72 | p̄ = Sample proportion wanting campus housing | .67 |
Sampling Distribution of x̄
• Process of Statistical Inference
1. Population with mean μ = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample mean x̄.
4. The value of x̄ is used to make inferences about the value of μ.
The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.
Expected Value of x̄
$$E(\bar{x}) = \mu$$
where: μ = the population mean
Standard Deviation of x̄
Finite Population:
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Infinite Population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
• σ_x̄ is referred to as the standard error of the mean.
• A finite population is treated as being infinite if n/N < .05.
• $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor.
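The two standard-error formulas can be wrapped in one small Python function. This is a sketch under the definitions above; the function name and the `N=None` convention for an infinite population are assumptions made for the example.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None treats the population as infinite. For a finite population the
    correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Andrew's values from the text: sigma = 80, n = 30, N = 900.
print(30 / 900 < 0.05)                               # sampling fraction check
print(round(standard_error_mean(80, 30), 1))         # infinite-population formula
print(round(standard_error_mean(80, 30, N=900), 1))  # with the correction factor
```

Because n/N = 30/900 is below .05, the two results differ only slightly, which is why the simpler formula is used in the example.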
Form of the Sampling Distribution of x̄
If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of x̄ can be approximated by a normal distribution.
When the simple random sample is small (n < 30), the sampling distribution of x̄ can be considered normal only if we assume the population has a normal distribution.
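A small simulation can make the central limit theorem concrete. The sketch below assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), which is not part of the St. Andrew's example; the helper names and the number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)

def draw():
    """One observation from a hypothetical right-skewed population."""
    return rng.expovariate(1.0)

def sample_means(draw_one, n, repetitions=5000):
    """Means of `repetitions` independent samples of size n."""
    return [statistics.fmean(draw_one() for _ in range(n))
            for _ in range(repetitions)]

for n in (5, 30, 100):
    means = sample_means(draw, n)
    # Theory: the mean of x_bar stays near 1 and its spread is near 1 / sqrt(n).
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

A histogram of `means` for n = 30 or n = 100 would look roughly bell-shaped even though the population itself is strongly skewed.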
Sampling Distribution of x̄ for SAT Scores
$$E(\bar{x}) = 990$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$$
With a population mean SAT score of 990 and a standard deviation of 80, what is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/−10 of the actual population mean μ?
In other words, what is the probability that x̄ will be between 980 and 1000?
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (1000 − 990)/14.6 = .68
Step 2: Find the area under the curve between the mean and the upper endpoint.
P(0 < z < .68) = .2517
Probabilities for the Standard Normal Distribution

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
| . | . | . | . | . | . | . | . | . | . | . |
| .5 | .1915 | .1950 | .1985 | .2019 | .2054 | .2088 | .2123 | .2157 | .2190 | .2224 |
| .6 | .2257 | .2291 | .2324 | .2357 | .2389 | .2422 | .2454 | .2486 | .2517 | .2549 |
| .7 | .2580 | .2611 | .2642 | .2673 | .2704 | .2734 | .2764 | .2794 | .2823 | .2852 |
| .8 | .2881 | .2910 | .2939 | .2967 | .2995 | .3023 | .3051 | .3078 | .3106 | .3133 |
| .9 | .3159 | .3186 | .3212 | .3238 | .3264 | .3289 | .3315 | .3340 | .3365 | .3389 |
| . | . | . | . | . | . | . | . | . | . | . |
[Figure: sampling distribution of x̄ with σ_x̄ = 14.6; the area between 990 and 1000 is .2517]
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (980 − 990)/14.6 = −.68
Step 4: Find the area under the curve between the mean and the lower endpoint.
P(−.68 < z < 0) = .2517
[Figure: sampling distribution of x̄ with σ_x̄ = 14.6; the area between 980 and 990 is .2517, and the area between 990 and 1000 is .2517]
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(−.68 < z < .68) = .2517 + .2517 = .5034
The probability that the sample mean SAT score will be between 980 and 1000 is:
P(980 < x̄ < 1000) = .5034
[Figure: sampling distribution of x̄ with σ_x̄ = 14.6; the area between 980 and 1000 is .5034]
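The five steps can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. It shows both the table method (z rounded to two decimals) and a direct evaluation without rounding, which is why the two printed values differ slightly from each other and the first agrees with the .5034 above to about three decimals.

```python
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
se = sigma / n ** 0.5                     # standard error, about 14.6

# Table method: round z to two decimals, then double the one-sided area.
z = round((1000 - mu) / se, 2)
area_one_side = NormalDist().cdf(z) - 0.5
table_style = 2 * area_one_side

# Direct method: evaluate the normal CDF at both endpoints without rounding z.
x_bar_dist = NormalDist(mu, se)
direct = x_bar_dist.cdf(1000) - x_bar_dist.cdf(980)

print(z, round(table_style, 4), round(direct, 4))
```

The small gap between the two answers comes entirely from rounding z before the lookup, not from a different model.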
Relationship Between the Sample Size and the Sampling Distribution of x̄
Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
E(x̄) = μ regardless of the sample size. In our example, E(x̄) remains at 990.
Whenever the sample size is increased, the standard error of the mean σ_x̄ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased to:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$$
[Figure: two sampling distributions of x̄, both centered at E(x̄) = 990; with n = 30, σ_x̄ = 14.6; with n = 100, σ_x̄ = 8]
Recall that when n = 30, P(980 < x̄ < 1000) = .5034.
We follow the same steps to solve for P(980 < x̄ < 1000) when n = 100 as we showed earlier when n = 30. The z-value at the upper endpoint is now (1000 − 990)/8 = 1.25, which gives an area of .3944 on each side of the mean.
Now, with n = 100, P(980 < x̄ < 1000) = .3944 + .3944 = .7888.
Because the sampling distribution with n = 100 has a smaller standard error, the values of x̄ have less variability and tend to be closer to the population mean than the values of x̄ with n = 30.
[Figure: sampling distribution of x̄ with σ_x̄ = 8; the area between 980 and 1000 is .7888]
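The effect of the sample size can be reproduced with a few lines of Python. This is a sketch under the normal approximation; `prob_within` is a hypothetical helper name. Because z is not rounded here, the n = 30 value comes out slightly above the table-based .5034.

```python
from statistics import NormalDist

def prob_within(mu, sigma, n, margin):
    """P(mu - margin < x_bar < mu + margin) under the normal approximation."""
    se = sigma / n ** 0.5
    dist = NormalDist(mu, se)
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)

for n in (30, 100):
    se = 80 / n ** 0.5
    print(n, round(se, 1), round(prob_within(990, 80, n, 10), 4))
```

The interval 980 to 1000 stays fixed while the standard error shrinks, so more of the sampling distribution falls inside it.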
Chapter 7: Sampling and Sampling Distributions
• Sampling Distribution of p̄
• Other Sampling Methods
Sampling Distribution of p̄
• Making Inferences about a Population Proportion
1. Population with proportion p = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample proportion p̄.
4. The value of p̄ is used to make inferences about the value of p.
The sampling distribution of p̄ is the probability distribution of all possible values of the sample proportion p̄.
Expected Value of p̄
$$E(\bar{p}) = p$$
where: p = the population proportion
Standard Deviation of p̄
Finite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{p(1-p)}{n}}$$
Infinite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$$
• σ_p̄ is referred to as the standard error of the proportion.
• A finite population is treated as being infinite if n/N < .05.
Example: St. Andrew’s College
Recall that 72% of the prospective students applying to St. Andrew’s College desire on-campus housing.
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
$$E(\bar{p}) = .72$$
$$\sigma_{\bar{p}} = \sqrt{\frac{.72(1-.72)}{30}} = .082$$
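The standard error of the proportion can be computed the same way as for the mean. A minimal Python sketch, where the function name and the `N=None` convention for an infinite population are assumptions made for the example:

```python
import math

def standard_error_proportion(p, n, N=None):
    """Standard error of the sample proportion.

    N=None treats the population as infinite. For a finite population the
    correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Andrew's values from the text: p = .72, n = 30, N = 900.
print(round(standard_error_proportion(0.72, 30), 3))
print(round(standard_error_proportion(0.72, 30, N=900), 3))
```

As with the mean, n/N = 30/900 is below .05, so the example uses the value without the correction.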
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (.77 − .72)/.082 = .61
Step 2: Find the area under the curve between the mean and the upper endpoint.
P(0 < z < .61) = .2291
Probabilities for the Standard Normal Distribution

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
| . | . | . | . | . | . | . | . | . | . | . |
| .5 | .1915 | .1950 | .1985 | .2019 | .2054 | .2088 | .2123 | .2157 | .2190 | .2224 |
| .6 | .2257 | .2291 | .2324 | .2357 | .2389 | .2422 | .2454 | .2486 | .2517 | .2549 |
| .7 | .2580 | .2611 | .2642 | .2673 | .2704 | .2734 | .2764 | .2794 | .2823 | .2852 |
| .8 | .2881 | .2910 | .2939 | .2967 | .2995 | .3023 | .3051 | .3078 | .3106 | .3133 |
| .9 | .3159 | .3186 | .3212 | .3238 | .3264 | .3289 | .3315 | .3340 | .3365 | .3389 |
| . | . | . | . | . | . | . | . | . | . | . |
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area between .72 and .77 is .2291]
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (.67 − .72)/.082 = −.61
Step 4: Find the area under the curve between the mean and the lower endpoint.
P(−.61 < z < 0) = .2291
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area between .67 and .72 is .2291]
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(−.61 < z < .61) = .2291 + .2291 = .4582
The probability that the sample proportion of applicants wanting on-campus housing will be within +/−.05 of the actual population proportion is:
P(.67 < p̄ < .77) = .4582
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area between .67 and .77 is .4582]
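The proportion calculation can be verified in the same manner as the one for the mean. This Python sketch relies on the normal approximation to the sampling distribution of p̄; the check that np and n(1 − p) are both at least 5 is a common rule of thumb and an assumption here, since the slides do not state it. Variable names are illustrative.

```python
from statistics import NormalDist

p, n = 0.72, 30
se = (p * (1 - p) / n) ** 0.5             # standard error, about .082

# Rule of thumb (an assumption, not stated in the slides): np >= 5 and n(1 - p) >= 5.
normal_ok = n * p >= 5 and n * (1 - p) >= 5

# Table method: round z to two decimals, then double the one-sided area.
z = round(0.05 / se, 2)
table_style = 2 * (NormalDist().cdf(z) - 0.5)

# Direct method: evaluate the normal CDF at both endpoints without rounding z.
p_bar_dist = NormalDist(p, se)
direct = p_bar_dist.cdf(p + 0.05) - p_bar_dist.cdf(p - 0.05)

print(normal_ok, z, round(table_style, 4), round(direct, 4))
```

Here the unrounded z is already very close to .61, so the table method and the direct method agree to about three decimals.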
Other Sampling Methods
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
Stratified Random Sampling
The population is first divided into groups of elements called strata.
Each element in the population belongs to one and only one stratum.
Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum sample results into one population parameter estimate.
Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.
Example: The basis for forming the strata might be department, location, age, industry type, and so on.
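A minimal Python sketch of stratified random sampling, assuming an equal-size simple random sample from each stratum (other allocation schemes exist). The employee list, department names and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of fixed size from every stratum."""
    strata = defaultdict(list)
    for e in elements:
        # Each element lands in exactly one stratum.
        strata[stratum_of(e)].append(e)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 employees, each tagged with a department.
rng = random.Random(1)
employees = [(i, ("sales", "ops", "it")[i % 3]) for i in range(60)]
sample = stratified_sample(employees, lambda e: e[1], 5, rng)
for dept, chosen in sample.items():
    print(dept, [e[0] for e in chosen])
```

Combining the per-stratum results into one population estimate would need the weighting formulas mentioned above, which this sketch does not implement.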
Cluster Sampling
The population is first divided into separate groups of elements called clusters.
Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster form the sample.
Advantage: The close proximity of elements can be cost effective (i.e., many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
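Cluster sampling can be sketched in Python as follows. The city blocks and households are made-up data, and `cluster_sample` is an illustrative helper: it draws a simple random sample of cluster names and then keeps every element of the chosen clusters.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters; every element in a chosen cluster is sampled."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [e for name in chosen for e in clusters[name]]

# Hypothetical city blocks, each listing its households.
rng = random.Random(2)
blocks = {f"block-{b}": [f"house-{b}-{h}" for h in range(4)] for b in range(10)}
chosen, households = cluster_sample(blocks, 3, rng)
print(chosen)
print(len(households))
```

Unlike stratified sampling, randomness enters only at the cluster level; nothing is sampled within a chosen block.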
Systematic Sampling
If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
We randomly select one of the first N/n elements from the population list.
We then select every (N/n)th element that follows in the population list.
This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
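A short Python sketch of systematic sampling, taking the sampling interval to be k = N/n (population size divided by sample size, rounded down). The list of 1,000 numbered listings and the helper name are hypothetical.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first k elements, then every k-th element."""
    k = len(population) // n          # sampling interval, integer part of N / n
    start = rng.randrange(k)          # random position within the first k elements
    return population[start::k][:n]

# Hypothetical directory of 1,000 numbered listings; sample 10 of them.
listings = list(range(1, 1001))
rng = random.Random(3)
picked = systematic_sample(listings, 10, rng)
print(picked)
```

With N = 1,000 and n = 10 the interval is 100, matching the telephone-book example of taking every 100th listing after a random start.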
Convenience Sampling
It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might use student volunteers to constitute a sample.
Advantage: Sample selection and data collection are relatively easy.
Disadvantage: It is impossible to determine how representative of the population the sample is.
Judgment Sampling
The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
It is a nonprobability sampling technique.
Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
Advantage: It is a relatively easy way of selecting a sample.
Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.