Chapter 8
Estimation
8.1 Introduction
• So far we have seen how to study the characteristics of samples (sampling distributions).
• Now we can formalize that by discussing statistical inference: how to learn about populations from random samples.
1. Estimation (Chapters 8-9): using observed data to make informed “guesses” about unknown parameters.
2. Hypothesis Testing (Chapter 10): testing whether a population has some property, given what we observe in a sample.
8.2 Some Principles
• Suppose that we face a population with an unknown parameter.
• A sample statistic which we use to estimate that parameter is called an estimator, and when we apply this rule to the sample we have an estimate or a point estimate. [See Transparency 8.1]
• A simple example: Estimate μ by X̄.
• The estimator is X̄ and the estimate is the specific number we get when we calculate the sample mean.
• Note that the actual value we calculate for the sample mean (like 4.2) is a realization of a random variable and is called the estimate.
• The estimator, X̄, is a random variable (i.e. it has a distribution).
Figure 8.1: Point and interval estimates. A point estimate is a single number; a confidence interval provides additional information about variability. The interval runs from the lower confidence limit to the upper confidence limit, with the point estimate between them; the distance between the limits is the width of the confidence interval. (Source: Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc., Chap 8-5.)
8.3 Desirable Properties in Choosing Estimators
8.3.1 Unbiasedness
• An estimator is unbiased if its expectation equals the population parameter.
• For instance, denote the true population parameter by θ and the estimator by θ̂. We say θ̂ is an unbiased estimator of θ if
E[θ̂] = θ
• Accordingly, we can define bias as
Bias(θ̂) = E[θ̂]− θ
• We have seen that:
E[X̄] = μ and E[s²] = σ². (8.1)
• Hence the sample mean and the sample variance are unbiased estimators of μ and σ².
Figure 8.2: Point estimates. We can estimate a population parameter with a sample statistic (a point estimate): the population mean μ is estimated by the sample mean x̄, and the population proportion P by the sample proportion p̂.
• The point of unbiasedness is not that we can check this directly, for we do not know the true values of μ and σ².
• The point is that whatever values they take, the average of our estimators will equal those values.
• The sampling distribution of the estimator is centered over the population parameter.
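The idea that the estimators are right "on average" can be illustrated with a small simulation. This is only a sketch under assumed values: the normal population with μ = 5 and σ = 2, the sample size, the number of repetitions, and the function name are all illustrative choices, not part of the notes. The divisor-n variance is included purely as a contrast to s².

```python
import random

def average_estimates(mu=5.0, sigma=2.0, n=10, reps=20000, seed=0):
    """Average X-bar, s^2 (divisor n-1) and the divisor-n variance over many samples."""
    rng = random.Random(seed)
    sum_mean = sum_s2 = sum_vn = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        sum_mean += xbar
        sum_s2 += ss / (n - 1)  # sample variance s^2
        sum_vn += ss / n        # divisor-n version, shown only for contrast
    return sum_mean / reps, sum_s2 / reps, sum_vn / reps

avg_mean, avg_s2, avg_vn = average_estimates()
print(f"average of X-bar       : {avg_mean:.3f}  (mu = 5)")
print(f"average of s^2         : {avg_s2:.3f}  (sigma^2 = 4)")
print(f"average of divisor-n v : {avg_vn:.3f}  (tends to fall below sigma^2)")
```

In a run like this the first two averages should sit close to μ and σ², while the divisor-n version should sit close to σ²(n − 1)/n.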
Figure 8.3: Unbiasedness (continued). Sampling distributions of two estimators: θ̂₁ is centered over θ and is an unbiased estimator; θ̂₂ is centered away from θ and is biased.
Figure 8.4: Bias. Let θ̂ be an estimator of θ. The bias in θ̂ is defined as the difference between its mean and θ: Bias(θ̂) = E(θ̂) − θ. The bias of an unbiased estimator is 0.
Figure 8.5: Consistency. Let θ̂ be an estimator of θ. θ̂ is a consistent estimator of θ if the difference between the expected value of θ̂ and θ decreases as the sample size increases. Consistency is desired when unbiased estimators cannot be obtained.
8.3.2 Efficiency: Minimum Variance
• A second criterion to apply in choosing an estimator is that it should have as small a sampling variance as possible.
Example:
• Suppose we want to estimate μ and we have two samples to choose from, one with 100 observations and one with 200 observations.
• Because the variance of X̄ is σ²/n, we will have a smaller variance by using the larger sample, though both sample means are unbiased.
• We say that the estimator using the larger number of observations is more efficient.
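The comparison between the two sample sizes can be checked numerically. The sketch below assumes a normal population with σ = 1 (so the theoretical variances are 1/100 and 1/200); the function name and the number of repetitions are illustrative.

```python
import random

def simulated_variance_of_mean(n, sigma=1.0, reps=5000, seed=1):
    """Variance of X-bar across many simulated samples of size n."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for _ in range(reps)]
    grand_mean = sum(means) / reps
    return sum((m - grand_mean) ** 2 for m in means) / (reps - 1)

var_100 = simulated_variance_of_mean(100)
var_200 = simulated_variance_of_mean(200)
print(f"n = 100: theory {1.0 / 100:.4f}, simulated {var_100:.4f}")
print(f"n = 200: theory {1.0 / 200:.4f}, simulated {var_200:.4f}")
```

Doubling the sample size should roughly halve the variance of the sample mean.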
8.4 Minimum Variance Unbiased Estimator
• Let θ̂ be an unbiased estimator and let θ̃ be any other unbiased estimator of θ.
• If V[θ̂] ≤ V[θ̃] for any θ̃, then θ̂ is a minimum variance unbiased estimator of θ.
• We can show that X̄ is a minimum variance unbiased estimator of μ.
8.4.1 Relative Efficiency
• The relative efficiency of θ̂ to θ̃ is
\[ \text{Relative Efficiency} = \frac{V[\tilde{\theta}]}{V[\hat{\theta}]} \]
• For the minimum variance unbiased estimator θ̂ this ratio is always greater than or equal to 1.
• For example, for a normally distributed population the sample median is less efficient than the sample mean in estimating the population mean.
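As a rough numerical check of the median example, the relative efficiency can be estimated by simulation. This is a sketch that assumes a standard normal population and samples of size 25; those values, the function name, and the seed are illustrative. Here θ̂ is the sample mean and θ̃ is the sample median, so the ratio is V[median]/V[mean].

```python
import random
import statistics

def relative_efficiency_median_vs_mean(n=25, reps=20000, seed=2):
    """Estimate V[median] / V[mean] for samples from a standard normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(medians) / statistics.variance(means)

rel_eff = relative_efficiency_median_vs_mean()
print(f"V[median] / V[mean] is roughly {rel_eff:.2f}")
```

A ratio above 1 means the median needs more observations than the mean to reach the same precision. For other population shapes the ranking can differ, so the result should be read as specific to the normal case assumed here.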
8.5 Confidence Interval Estimator
• So far we have seen simple examples of point estimates.
• But often we would like to estimate a range which might bracket the true parameter.
• These ranges are called interval estimates or confidence intervals. [See Transparency 8.4].
• A confidence interval estimator for a population parameter is a rule for determining (based on sample information) a range, or interval, that is likely to include the parameter.
• The corresponding estimate is called a confidence interval estimate.
Figure 8.6: Confidence intervals, content of this chapter. Confidence intervals for the population mean μ when the population variance σ² is known and when it is unknown; confidence intervals for the population proportion (large samples).
Figure 8.7: Point and interval estimates (lower confidence limit, point estimate, upper confidence limit, and width of the confidence interval).
Figure 8.8: Point estimates. The population mean μ is estimated by the sample mean x̄, and the population proportion P by the sample proportion p̂.
Figure 8.9: Confidence intervals. Two branches: intervals for the population mean (σ² known or σ² unknown) and intervals for the population proportion.
8.6 Example: Population Variance σ² Known
• Let us take an unrealistic but simple example in which we know σ² but do not know μ.
Figure 8.10: Confidence interval for μ (σ² known). Assumptions: the population variance σ² is known and the population is normally distributed (if the population is not normal, use a large sample). Confidence interval estimate:
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
where z_{α/2} is the normal distribution value for a probability of α/2 in each tail.
• Then we know that
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \tag{8.2} \]
• Because the estimator X̄ is unbiased, this statistic has a mean of zero.
• We can see from the tabulated standard normal distribution that there is a probability of .025 that Z < −1.96 and a probability of .025 that Z > 1.96. [See Transparency 8.3].
• Let us call the sum of those two cut-off probabilities α.
• And let us call the cut-off points −Zα/2 and Zα/2.
• Then the area between these points is 0.95 and α = .05 (so Zα/2 = Z.025 = 1.96):
\[ P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha. \tag{8.3} \]
• We can (after some careful thinking about inequalities) obtain:
\[ P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha. \tag{8.4} \]
• This gives us a 100(1 − α)% confidence interval for the population mean μ:
\[ \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
[See Transparencies 8.4 and 8.6].
• The margin of error (also called the sampling error, the bound, or the interval half width) is given by
\[ ME = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
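The "careful thinking about inequalities" that leads from (8.3) to (8.4) can be spelled out in three steps. This is one possible way to write the algebra, using only the symbols already defined above; it relies on σ/√n being positive.

```latex
\begin{align*}
-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}
  &\iff -Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } \sigma/\sqrt{n} > 0\text{)} \\
  &\iff -\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \bar{X}\text{)} \\
  &\iff \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } -1\text{, reversing the inequalities)}
\end{align*}
```

Because the three events are identical, they have the same probability, 1 − α.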
Figure 8.11: Margin of error. The confidence interval
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
can also be written as x̄ ± ME, where ME = z_{α/2} σ/√n is called the margin of error. The interval width, w, is equal to twice the margin of error.
Figure 8.12: Reducing the margin of error. Since ME = z_{α/2} σ/√n, the margin of error can be reduced if the population standard deviation can be reduced (σ ↓), the sample size is increased (n ↑), or the confidence level is decreased ((1 − α) ↓).
8.7 Example of a Confidence Interval
• Ten patients are given a sleep-inducing drug in clinical trials. The average increase in sleep is X̄ = 1.58 hours.
• Now suppose, unrealistically, that we know that σ² = 1.66.
• Then with n = 10 our 95 percent confidence interval for μ is:
\[ \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.58 \pm 1.96\,\frac{\sqrt{1.66}}{\sqrt{10}} = (.78,\ 2.38) \tag{8.5} \]
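The arithmetic in (8.5) can be reproduced with a few lines of code. This is a minimal sketch: the function name and its parameters are illustrative, and the critical value comes from Python's standard-library `statistics.NormalDist` rather than from the printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Interval xbar +/- z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                 # margin of error ME
    return xbar - margin, xbar + margin, margin

lower, upper, margin = z_confidence_interval(xbar=1.58, sigma=sqrt(1.66), n=10)
print(f"({lower:.2f}, {upper:.2f}), margin of error {margin:.2f}")
# prints "(0.78, 2.38), margin of error 0.80"
```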
8.8 Notes and Interpreting Confidence Intervals
• We know that
\[ P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha. \tag{8.6} \]
• However, from the above example we cannot say:
\[ P(.78 \le \mu \le 2.38) = .95 \tag{8.7} \]
• Once we have calculated the confidence interval (the realization of a random variable), μ is either inside it or not (i.e. the probability is zero or one).
8.8.1 Appropriate Interpretation of a Confidence Interval
• Imagine that we select another sample and work out another confidence interval. If we keep taking additional samples (of the same size), we obtain a set of confidence intervals.
• We can say that, in the long run, 95% of these confidence intervals contain the true μ.
• We do not know whether any particular interval contains μ or not.
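This repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with a known σ and a "true" μ that we fix ourselves (something we can only do in a simulation); the values μ = 1.5, σ = 1.3, the function name, and the seed are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=1.5, sigma=1.3, n=10, confidence=0.95, reps=10000, seed=3):
    """Share of simulated confidence intervals that contain the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)  # identical for every sample because sigma is known
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

share = coverage()
print(f"share of intervals containing mu: {share:.3f}")
```

The share should come out close to 0.95, even though each individual interval either contains μ or does not.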
8.8.2 Notes on Confidence Intervals
1. Notice that for a given α, the confidence interval becomes narrower as n (the sample size) increases.
2. If we wish to make a more confident statement (a smaller α) then the confidence interval must be wider (i.e. Zα/2 is larger).
3. If σ increases, the confidence interval becomes wider.
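The three notes above all follow from the width formula w = 2 Zα/2 σ/√n, and can be checked directly. A small sketch, with an illustrative function name and arbitrary baseline values (σ = 1, n = 10, 95% confidence):

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Width of the known-variance interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)

base = interval_width(sigma=1.0, n=10, confidence=0.95)
larger_n = interval_width(sigma=1.0, n=40, confidence=0.95)        # note 1
more_confident = interval_width(sigma=1.0, n=10, confidence=0.99)  # note 2
larger_sigma = interval_width(sigma=2.0, n=10, confidence=0.95)    # note 3

print(f"baseline width        : {base:.3f}")
print(f"n = 40 instead of 10  : {larger_n:.3f}")
print(f"99% instead of 95%    : {more_confident:.3f}")
print(f"sigma = 2 instead of 1: {larger_sigma:.3f}")
```

Quadrupling n halves the width, doubling σ doubles it, and raising the confidence level widens the interval.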
Figure 8.13: Finding the reliability factor, zα/2. For a 95% confidence interval, 1 − α = .95 and α/2 = .025 in each tail of the standard normal curve, so the cut-off points are z = −1.96 and z = 1.96 (z.025 = 1.96 from the standard normal distribution table). In Z units the lower confidence limit, point estimate, and upper confidence limit correspond to −1.96, 0, and 1.96; in X units they are the lower confidence limit, the point estimate, and the upper confidence limit.
Figure 8.14: Common levels of confidence. Commonly used confidence levels are 90%, 95%, and 99%.
Confidence Level    Confidence Coefficient (1 − α)    Zα/2 value
80%                 .80                               1.28
90%                 .90                               1.645
95%                 .95                               1.96
98%                 .98                               2.33
99%                 .99                               2.58
99.8%               .998                              3.09
99.9%               .999                              3.29
Figure 8.15: Example. A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
Figure 8.16: Example (continued). Solution:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068, \qquad 1.9932 < \mu < 2.4068 \]
8.9 Unknown Variance and the student’s t distribution
• With this background, we can now take the usual applied situation where we do not know σ.
• If we replace σ by its sample estimate, the sample standard deviation s, in our standardized test statistic we get:
\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \tag{8.8} \]
Figure 8.17: Student’s t distribution. Consider a random sample of n observations with mean x̄ and standard deviation s from a normally distributed population with mean μ. Then the variable
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
follows the Student’s t distribution with (n − 1) degrees of freedom.
Figure 8.18: Confidence interval for μ (σ² unknown). If the population standard deviation σ is unknown, we can substitute the sample standard deviation, s. This introduces extra uncertainty, since s varies from sample to sample, so we use the t distribution instead of the normal distribution.
• This statistic follows a t distribution with n − 1 degrees of freedom.
8.9.1 Notes on the student’s t distribution
1. If we have many samples, this statistic varies across them for two reasons: both X̄ and s will tend to differ from sample to sample.
2. This contrasts with the sample variation in Z, which arose only because of variation in X̄.
3. This new statistic will be more variable and its distribution will be more dispersed than the normal distribution, and it is said to follow student’s t distribution. [See Transparency 8.7].
4. The t-distribution is tabulated (Table 8) just like the normal but depends on the degrees of freedom, labelled ν = n − 1 for this problem. Hence we have a different value for each number of degrees of freedom.
5. The relationship between variables that are t-distributed and normally distributed is:
\[ t \Rightarrow Z \text{ as } n \to \infty \tag{8.9} \]
That is, the student’s t distribution becomes standard normal for large n. In fact, for n ≥ 30 the student’s t distribution is close to the normal.
Figure 8.19: Student’s t distribution. Density curves for t (df = 5), t (df = 13), and the standard normal (t with df = ∞), all centered at 0. t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal. Note: t → Z as n increases.
6. The student’s t distribution is symmetric about 0 and, like the normal distribution, has a single mean, median and mode (at 0).
7. The student’s t distribution has thicker tails, reflecting the added uncertainty from estimating the variance.
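The extra dispersion described in these notes shows up clearly in a simulation with a small sample. The sketch below assumes a standard normal population (μ = 0, σ = 1) and n = 5; it counts how often the Z statistic (which uses the known σ) and the t statistic (which uses s) fall outside ±1.96. The function name, sample size, and seed are illustrative.

```python
import random
from math import sqrt

def tail_shares(n=5, cutoff=1.96, reps=20000, seed=4):
    """Share of samples with |Z| > cutoff and with |t| > cutoff (mu = 0, sigma = 1)."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = xbar / (1.0 / sqrt(n))  # standardized with the known sigma = 1
        t = xbar / (s / sqrt(n))    # standardized with the estimate s
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps

z_share, t_share = tail_shares()
print(f"share with |Z| > 1.96: {z_share:.3f}")
print(f"share with |t| > 1.96: {t_share:.3f}")
```

The Z share should be near .05, while the t share should be noticeably larger, which is why the t table gives cut-off points beyond 1.96 for small samples.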
8.10 Confidence Intervals when the Variance is Unknown
• Now we can form confidence intervals for the population mean when we do not know the population variance.
\[ P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1 - \alpha. \tag{8.10} \]
• The confidence interval is:
\[ \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \tag{8.11} \]
Figure 8.20: t distribution values, with comparison to the Z value. Note: t → Z as n increases.
Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
.80                 1.372          1.325          1.310          1.282
.90                 1.812          1.725          1.697          1.645
.95                 2.228          2.086          2.042          1.960
.99                 3.169          2.845          2.750          2.576
8.11 Notes on Confidence Intervals with student’s t distribution
1. The width of the confidence interval will vary with the sample because s varies.
2. We have indexed the t as tα/2,n−1 to remind you that this value depends on both the confidence level 100(1 − α) and the degrees of freedom n − 1.
8.11.1 Confidence Interval Example with Unknown Variance
• Let us return to the example of drugs.
• Suppose that in our sample of ten trials X̄ = 1.58 and that s = 1.23. Then s/√n = .389.
• The value for the degrees of freedom is ν = n − 1 = 9.
• A 95 percent confidence interval implies α = .05 and α/2 = .025, and from Table VI we find t.025,9 = 2.262.
• Thus the confidence interval is 1.58 ± 2.262 × .389, or (.70, 2.46).
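The calculation above can be reproduced as follows. This is a minimal sketch: the function name is illustrative, and the critical value 2.262 is simply copied from the table lookup quoted in the text, because Python's standard library has no t quantile function.

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_critical):
    """Interval xbar +/- t_{alpha/2, n-1} * s / sqrt(n) for an unknown variance."""
    margin = t_critical * s / sqrt(n)
    return xbar - margin, xbar + margin

T_025_9 = 2.262  # t_{.025, 9}, taken from the t table as quoted in the text

lower, upper = t_confidence_interval(xbar=1.58, s=1.23, n=10, t_critical=T_025_9)
print(f"({lower:.2f}, {upper:.2f})")
# prints "(0.70, 2.46)"
```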
Questions and Points to Note
• Why is this wider even though the sample standard deviation is smaller than the population standard deviation which we used in the previous example? [See Transparency 8.8].
• Suppose that we had n = 20 clinical trials on the drug with the same sample standard deviation.
• What would our confidence interval be? Why should it be narrower?
• As we get more observations (n gets bigger), the fact that we do not know the variance and need to estimate it becomes unimportant.
• The interpretation of the confidence interval is the same as before: If we construct a large number of confidence intervals then we would expect that 95% of the intervals will bracket the true (unknown) population mean μ.