Chapter 8 – continued
Chapter 8: Sampling distributions of estimators
Sections:
8.1 Sampling distribution of a statistic
8.2 The Chi-square distributions
8.3 Joint Distribution of the sample mean and sample variance (skip: p. 476 - 478)
8.4 The t distributions (skip: derivation of the pdf, p. 483 - 484)
8.5 Confidence intervals
8.6 Bayesian Analysis of Samples from a Normal Distribution
8.7 Unbiased Estimators
8.8 Fisher Information
Review from Sections 8.1 - 8.4
Chi-square distribution: χ²ₘ, the same as Gamma(α = m/2, β = 1/2)
The tₘ distribution: If Y ∼ χ²ₘ and Z ∼ N(0,1) are independent, then Z/√(Y/m) ∼ tₘ.
Let X₁, . . . , Xₙ be a random sample from N(µ, σ²).
If µ is known but σ is not:

n σ̂₀²/σ² ∼ χ²ₙ  where  σ̂₀² = (1/n) ∑_{i=1}^n (Xᵢ − µ)²

If both (µ, σ) are unknown:

n Sₙ/σ² ∼ χ²_{n−1}  where  Sₙ = (1/n) ∑_{i=1}^n (Xᵢ − X̄ₙ)²

√n (X̄ₙ − µ)/σ′ ∼ t_{n−1}  where  σ′ = [ ∑_{i=1}^n (Xᵢ − X̄ₙ)²/(n − 1) ]^{1/2}
8.5 Confidence intervals
Confidence Interval – A frequentist tool
Say we want to estimate θ, or in general g(θ). We also want to know "how good" that estimate is.
Def: Confidence Interval (CI)
Let X₁, . . . , Xₙ be a random sample from f(x|θ), where θ is unknown (but not random). Let g(θ) be a real-valued function and let A and B be statistics where

P(A < g(θ) < B) ≥ γ ∀θ.

The random interval (A, B) is called a 100γ% confidence interval for g(θ). If equality holds, the CI is exact.
After the random variables X₁, . . . , Xₙ have been observed and the values of A = a and B = b have been computed, the interval (a, b) is called the observed confidence interval.
Confidence Interval - Mean of a Normal Distribution
Last time we saw the following example. Let X₁, . . . , Xₙ be a random sample from N(µ, σ²).
Let

X̄ₙ = (1/n) ∑_{i=1}^n Xᵢ  and  σ′ = ( ∑_{i=1}^n (Xᵢ − X̄ₙ)²/(n − 1) )^{1/2}
Then we know that
U = √n (X̄ₙ − µ)/σ′

has the t_{n−1} distribution. We can therefore calculate γ = P(−c < U < c). Turning this around, we get

γ = P( X̄ₙ − c σ′/√n < µ < X̄ₙ + c σ′/√n )
Confidence Interval - Mean of a Normal Distribution
Let Tₘ(x) denote the cdf of the tₘ distribution. Given γ we can find c so that P(−c < U < c) = γ:

γ = P(−c < U < c) = 2 T_{n−1}(c) − 1

since the t distribution is symmetric around 0. Solving for c we get

c = T⁻¹_{n−1}( (γ + 1)/2 )

where T⁻¹_{n−1} is the quantile function for the t_{n−1} distribution. So a 100γ% confidence interval for µ is

( X̄ₙ − T⁻¹_{n−1}((γ + 1)/2) σ′/√n ,  X̄ₙ + T⁻¹_{n−1}((γ + 1)/2) σ′/√n )
Example – Hotdogs
Exercise 8.5.7 in the book
Data on calorie content in 20 different beef hot dogs from Consumer Reports (June 1986 issue):

186, 181, 176, 149, 184, 190, 158, 139, 175, 148, 152, 111, 141, 153, 190, 157, 131, 149, 135, 132

Assume that these numbers are observed values from a random sample of twenty independent N(µ, σ²) random variables, where µ and σ² are unknown.
The observed sample mean and σ′ are

X̄ₙ = 156.85 and σ′ = 22.64201

Find a 95% confidence interval for µ.
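A quick numerical sketch of this calculation (my addition; it assumes Python with numpy and scipy, which the slides do not use):

```python
import numpy as np
from scipy import stats

# Calorie counts for the 20 beef hot dogs (Consumer Reports, June 1986)
calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])

n = len(calories)                    # n = 20
xbar = calories.mean()               # observed sample mean, 156.85
sigma_prime = calories.std(ddof=1)   # sigma' uses n - 1 in the denominator, ~22.64

gamma = 0.95
c = stats.t.ppf((gamma + 1) / 2, df=n - 1)   # T^{-1}_{n-1}((gamma + 1)/2), ~2.093

half_width = c * sigma_prime / np.sqrt(n)
print(f"95% CI for mu: ({xbar - half_width:.2f}, {xbar + half_width:.2f})")
# approximately (146.25, 167.45), the interval quoted on a later slide
```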
Interpretation of a confidence interval
Confidence intervals are a Frequentist tool
We know that
P( X̄ₙ − T⁻¹_{n−1}((γ + 1)/2) σ′/√n < µ < X̄ₙ + T⁻¹_{n−1}((γ + 1)/2) σ′/√n ) = γ
After observing the data we observe the random interval.
For example: (146.25, 167.45) is an observed 95% confidence interval for µ.
That does NOT mean that P(146.25 < µ < 167.45) = 0.95.
For this statement to make sense we need Bayesian thinking and Bayesian methods.
Interpretation of a confidence interval
Confidence intervals are a Frequentist tool
One way of thinking of this: repeated samples.
Take a random sample of size n from N(µ, σ²) and calculate the 95% confidence interval.
Take another random sample (of the same size n) and do the same calculations.
Repeat. Many times.
Since there is a 95% chance that the random interval covers the value of µ, we expect 95% of the intervals to cover the actual value of µ.
Problem: We never take more than one sample!
Properties of a confidence interval - Simulation Study
I simulated n = 20 r.v. from N(8, 2²) and calculated the 95% CI. I repeated that 100 times. 4 of the 100 intervals do not cover µ = 8 (the red intervals in the plot).
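A sketch reproducing this kind of study (my addition, in Python; I read N(8, 2²) as mean 8 and standard deviation 2, and the seed is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps, gamma = 8, 2, 20, 100, 0.95
c = stats.t.ppf((gamma + 1) / 2, df=n - 1)

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half = c * x.std(ddof=1) / np.sqrt(n)
    if x.mean() - half < mu < x.mean() + half:
        covered += 1

print(f"{covered} of {reps} intervals cover mu = {mu}")
# about 95 of 100 in the long run; the slide's run had 96 of 100 covering
```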
Non-symmetric confidence intervals
Mean of the normal distribution
More generally we want to find
P(c₁ < U < c₂) = γ

Symmetric confidence interval: equal probability on either side:

P(U ≤ c₁) = P(U ≥ c₂) = (1 − γ)/2
Since the distribution of U is symmetric around 0, the shortest possible confidence interval for µ is the symmetric one.
One-sided confidence interval: all the extra probability is on one side.
That is, either c₁ = −∞ or c₂ = ∞.
One-sided Confidence Interval
Def: Lower bound
Let A be a statistic so that
P(A < g(θ)) ≥ γ ∀θ
The random interval (A, ∞) is a one-sided 100γ% confidence interval for g(θ).
A is a 100γ% lower confidence limit for g(θ).
One-sided Confidence Interval
Def: Upper bound
Let B be a statistic so that
P(g(θ) < B) ≥ γ ∀θ
The random interval (−∞, B) is a one-sided 100γ% confidence interval for g(θ).
B is a 100γ% upper confidence limit for g(θ).
One-sided Confidence Interval - Mean of a normal
Let X₁, . . . , Xₙ be a random sample from N(µ, σ²), both µ and σ² unknown.
Find the one-sided 100γ% confidence intervals for µ.
Find the observed 95% upper confidence limit for µ for the hotdog example.
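A numerical sketch for the second task (my addition; same data and Python assumptions as before):

```python
import numpy as np
from scipy import stats

calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])
n, xbar = len(calories), calories.mean()
sigma_prime = calories.std(ddof=1)

# upper confidence limit: B = xbar + T^{-1}_{n-1}(gamma) * sigma' / sqrt(n);
# all the tail probability sits on one side, so use gamma, not (gamma + 1)/2
c = stats.t.ppf(0.95, df=n - 1)   # ~1.729, smaller than the two-sided 2.093
upper = xbar + c * sigma_prime / np.sqrt(n)
print(f"Observed 95% upper confidence limit for mu: {upper:.2f}")  # ~165.60
```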
Confidence intervals for other distributions
Def: Pivotal
Let X = (X₁, . . . , Xₙ) be a random sample from a distribution that depends on a parameter θ. Let V(X, θ) be a random variable whose distribution is the same for all θ. Then V is called a pivotal quantity.
To use this we need to be able to invert the pivotal relationship: find a function r(v, x) so that

r(V(X, θ), X) = g(θ).
If the function r is increasing in v for every x, V has a continuous distribution with cdf F(v), and γ₂ − γ₁ = γ, then

A = r(F⁻¹(γ₁), X)  and  B = r(F⁻¹(γ₂), X)

are the endpoints of an exact 100γ% confidence interval (Theorem 8.5.3).
Confidence interval using Pivotal quantities
Example: The rate parameter θ of the exponential distribution
X₁, . . . , Xₙ i.i.d. Expo(θ)
Find the 100γ% upper confidence limit for θ.
Find a symmetric 100γ% confidence interval for θ.
Example: Variance of the normal distribution
X₁, . . . , Xₙ i.i.d. N(µ, σ²), both unknown.
Find a symmetric 100γ% confidence interval for σ².
Find the observed symmetric 100γ% confidence interval for σ² for the hotdog example (see the sketch below).
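A sketch of both constructions (my addition, in Python; the exponential data are simulated since the slides give none; the pivots used are 2θ ∑Xᵢ ∼ χ²_{2n} and n Sₙ/σ² ∼ χ²_{n−1}):

```python
import numpy as np
from scipy import stats

gamma = 0.95

# --- Variance of a normal: pivot n*S_n / sigma^2 ~ chi^2_{n-1} (hotdog data) ---
calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])
n = len(calories)
ssq = ((calories - calories.mean()) ** 2).sum()        # n * S_n
lo = ssq / stats.chi2.ppf((1 + gamma) / 2, df=n - 1)   # divide by the UPPER quantile
hi = ssq / stats.chi2.ppf((1 - gamma) / 2, df=n - 1)   # divide by the LOWER quantile
print(f"Observed 95% CI for sigma^2: ({lo:.1f}, {hi:.1f})")

# --- Rate of an exponential: pivot 2*theta*sum(X_i) ~ chi^2_{2n} ---
rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 0.5, size=30)   # hypothetical sample, true theta = 0.5
upper = stats.chi2.ppf(gamma, df=2 * len(x)) / (2 * x.sum())
print(f"Observed 95% upper confidence limit for theta: {upper:.3f}")
```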
Problems with interpretation of a confidence interval
Example 8.5.11 is an interesting example.
Say X₁, X₂ are i.i.d. Uniform(θ − 0.5, θ + 0.5).
Let Y₁ = min(X₁, X₂) and Y₂ = max(X₁, X₂).
Then (Y₁, Y₂) is a 50% confidence interval for θ, since the interval covers θ exactly when one Xᵢ lands on each side of θ, which has probability 1/2.
However: if we observe Y₁ and Y₂ that are more than 0.5 apart, that is y₂ − y₁ > 0.5, then we know for certain that (y₁, y₂) contains θ! Yet we only assign 50% "confidence" to that interval, which ignores information we have.
Chapter 8 – continued
Chapter 8: Sampling distributions of estimators
Sections:
8.1 Sampling distribution of a statistic
8.2 The Chi-square distributions
8.3 Joint Distribution of the sample mean and sample variance (skip: p. 476 - 478)
8.4 The t distributions (skip: derivation of the pdf, p. 483 - 484)
8.5 Confidence intervals
8.6 Bayesian Analysis of Samples from a Normal Distribution
8.7 Unbiased Estimators
8.8 Fisher Information
8.7 Unbiased Estimators
Unbiased Estimators
Suppose that we use an estimator δ(X) to estimate g(θ).
Properties of an estimator (so far): consistency and sufficiency.
Another property of an estimator: unbiasedness.
Def: Unbiased Estimator / Bias
An estimator δ(X) is an unbiased estimator of g(θ) if
E(δ(X)) = g(θ) ∀θ .
Otherwise it is called a biased estimator. The bias is defined as

E(δ(X)) − g(θ)
Examples
X₁, . . . , Xₙ i.i.d. N(µ, σ²): X̄ₙ is an unbiased estimator of µ since E(X̄ₙ) = µ for all µ.
Unbiased estimators of the mean and variance of any distribution:
Let X₁, . . . , Xₙ be a random sample from f(x|θ). The mean and variance of the distribution (if they exist) are functions of θ.
X̄ₙ is an unbiased estimator of the mean E(X₁).
Theorem 8.7.1: If the variance is finite, then σ̂₁² is an unbiased estimator of Var(X), where

σ̂₁² = (1/(n − 1)) ∑_{i=1}^n (Xᵢ − X̄ₙ)²

Note: This means that the MLE of σ² in N(µ, σ²) is a biased estimator.
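To see why (a short standard derivation, added here since the slide does not spell it out): using $E\big[\sum_{i=1}^n (X_i - \bar X_n)^2\big] = (n-1)\sigma^2$,

$$E(S_n) = E\Big[\tfrac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2\Big] = \frac{n-1}{n}\,\sigma^2,$$

so the MLE $S_n$ has bias $E(S_n) - \sigma^2 = -\sigma^2/n$, which vanishes as $n \to \infty$; dividing by $n-1$ instead of $n$ removes the bias exactly.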
Mean Square Error (MSE)
Is unbiased good enough?
Useless if the estimator has high variance.
Look for unbiased estimators with the lowest variance.
Mean squared error: E[(δ(X) − g(θ))²]
We want estimators with small MSE.
Corollary 8.7.1
Let δ(X) be an estimator with finite variance. Then
MSE(δ(X)) = Var(δ(X)) + bias(δ(X))²
⇒ the MSE of an unbiased estimator is equal to its variance.
Searching for unbiased estimators with small variance is equivalent to searching for unbiased estimators with small MSE.
Example
Let X₁, . . . , Xₙ be a random sample from N(µ, σ²) (both µ and σ² are unknown).
Consider two estimators of σ²:
δ₁ = Sₙ (the MLE of σ²)
δ₂ = σ̂₁² (unbiased)
Find the MSE of each estimator (worked out in the sketch below).
Which estimator has smaller MSE?
Which estimator do you prefer?
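A worked comparison (my sketch; it uses $n S_n/\sigma^2 \sim \chi^2_{n-1}$ from the review slide and $\operatorname{Var}(\chi^2_m) = 2m$):

$$\operatorname{Var}(S_n) = \frac{\sigma^4}{n^2}\cdot 2(n-1), \quad \operatorname{bias}(S_n) = -\frac{\sigma^2}{n} \;\Longrightarrow\; \mathrm{MSE}(\delta_1) = \frac{2(n-1)+1}{n^2}\,\sigma^4 = \frac{2n-1}{n^2}\,\sigma^4,$$

$$\mathrm{MSE}(\delta_2) = \operatorname{Var}(\hat\sigma_1^2) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2}{n-1}\,\sigma^4.$$

Since $(2n-1)(n-1) = 2n^2 - 3n + 1 < 2n^2$ for all $n \ge 2$, the biased MLE $\delta_1$ has the smaller MSE.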
Why unbiased?
Sounds good - who wants to be "biased"?
However, the variance and the MSE are better measures of the quality of an estimator.
In many cases there exist biased estimators with smaller MSE.
8.8 Fisher Information
Let the pdf of X be f(x|θ).
The Fisher information I(θ) in the random variable X is defined as
I(θ) = E{ [ d log f(X|θ)/dθ ]² }
Under mild conditions, we have (Theorem 8.8.1)
I(θ) = Var[ d log f(X|θ)/dθ ] = −E[ d² log f(X|θ)/dθ² ]

For a random sample X₁, . . . , Xₙ, the Fisher information Iₙ(θ) satisfies

Iₙ(θ) = n I(θ)
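A quick added illustration (not on the slide): for $X \sim \text{Bernoulli}(\theta)$, $\log f(x|\theta) = x\log\theta + (1-x)\log(1-\theta)$, so

$$I(\theta) = -E\left[\frac{d^2\log f(X|\theta)}{d\theta^2}\right] = \frac{E(X)}{\theta^2} + \frac{1-E(X)}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)},$$

and a sample of $n$ trials carries information $I_n(\theta) = n/(\theta(1-\theta))$.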
Cramér-Rao Inequality
Let X₁, . . . , Xₙ be a random sample from a distribution for which the pdf is f(x|θ). For any statistic T, let m(θ) = E(T). Then, under mild conditions, we have
Var(T) ≥ [m′(θ)]² / (n I(θ)).
(Corollary 8.8.1) If T is an unbiased estimator of θ, then
Var(T) ≥ 1 / (n I(θ)).
Efficient estimator: an estimator is called an efficient estimator of its expectation if it achieves the lower bound in the Cramér-Rao inequality.
Example: X₁, . . . , Xₙ is a random sample from Poisson(θ). Show that the MLE is an efficient estimator of θ (worked below).
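A sketch of the solution (added): for Poisson($\theta$), $\log f(x|\theta) = x\log\theta - \theta - \log x!$, so $d\log f/d\theta = x/\theta - 1$ and

$$I(\theta) = \operatorname{Var}\!\left(\frac{X}{\theta} - 1\right) = \frac{\operatorname{Var}(X)}{\theta^2} = \frac{1}{\theta}.$$

The MLE is $\hat\theta_n = \bar X_n$, which is unbiased with $\operatorname{Var}(\bar X_n) = \theta/n = 1/(n I(\theta))$, so it attains the Cramér-Rao lower bound and is efficient.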
Asymptotic Distributions of MLE
Theorem 8.8.5
Let θ̂ₙ be the MLE of θ. Then, under mild conditions, we have

[n I(θ)]^{1/2} (θ̂ₙ − θ) →d N(0, 1).

The MLE is asymptotically efficient.
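A simulation sketch of this convergence for the Poisson example above (my addition; there I(θ) = 1/θ, and the seed is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 200, 10_000

# the MLE of a Poisson rate is the sample mean; standardize by [n I(theta)]^{1/2}
x = rng.poisson(theta, size=(reps, n))
theta_hat = x.mean(axis=1)
z = np.sqrt(n / theta) * (theta_hat - theta)

# the standardized MLE should look like N(0, 1)
print("mean ~ 0:", z.mean().round(3), "  sd ~ 1:", z.std().round(3))
print("P(z <= 1.96) ~", (z <= 1.96).mean().round(3), "vs", stats.norm.cdf(1.96).round(3))
```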
Chapter 8: Sampling distributions of estimators
Sections:
8.1 Sampling distribution of a statistic
8.2 The Chi-square distributions
8.3 Joint Distribution of the sample mean and sample variance (skip: p. 476 - 478)
8.4 The t distributions (skip: derivation of the pdf, p. 483 - 484)
8.5 Confidence intervals
8.6 Bayesian Analysis of Samples from a Normal Distribution
8.7 Unbiased Estimators
8.8 Fisher Information
8.6 Bayesian Analysis of Samples from a Normal Distribution
Bayesian alternative to confidence intervals
Bayesian inference is based on the posterior distribution.
Reporting a whole distribution may not be what you (or your client) want.
Point estimates: Bayesian estimators minimize the expected loss.
Interval estimates: simply use quantiles of the posterior distribution.
For example: we can find constants c₁ and c₂ so that

P(c₁ < θ < c₂ | X = x) ≥ γ

The interval (c₁, c₂) is called a 100γ% credible interval for θ.
Note: the interpretation is very different from the interpretation of confidence intervals.
Example: the normal distribution
Let X₁, . . . , Xₙ be a random sample from N(µ, σ²).
In Chapter 7.3 we saw:
If σ² is known, the normal distribution is a conjugate prior for µ.
Theorem 7.3.3: If the prior is µ ∼ N(µ₀, ν₀²), the posterior of µ is also normal, with mean and variance

µ₁ = (σ²µ₀ + nν₀² x̄ₙ) / (σ² + nν₀²)  and  ν₁² = σ²ν₀² / (σ² + nν₀²)

We can obtain credible intervals for µ from this N(µ₁, ν₁²) posterior distribution.
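A sketch of this in Python (my addition; the prior parameters and the data are made up for illustration, and σ is taken as known):

```python
import numpy as np
from scipy import stats

sigma, mu0, nu0 = 2.0, 7.0, 1.5          # known sigma; hypothetical N(mu0, nu0^2) prior
rng = np.random.default_rng(3)
x = rng.normal(8.0, sigma, size=20)      # hypothetical data
n, xbar = len(x), x.mean()

# Theorem 7.3.3: the posterior of mu is N(mu1, nu1^2)
mu1 = (sigma**2 * mu0 + n * nu0**2 * xbar) / (sigma**2 + n * nu0**2)
nu1 = np.sqrt(sigma**2 * nu0**2 / (sigma**2 + n * nu0**2))

# a 95% credible interval: central quantiles of the posterior
lo, hi = stats.norm.ppf([0.025, 0.975], loc=mu1, scale=nu1)
print(f"95% credible interval for mu: ({lo:.2f}, {hi:.2f})")
# unlike a confidence interval, P(lo < mu < hi | X = x) = 0.95 is a correct statement
```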
Example: the normal distribution
What if both µ and σ² are unknown?
Use the joint distribution of µ and σ² as the prior;
Conjugate priors are available: the Normal-Inverse Gamma distribution;
To give credible intervals for µ and σ² individually we need the marginal posterior distributions.
END OF CHAPTER 8