Lecture :Apply Gauss Markov Modeling
Regression with One Explanator
(Chapters 3.1–3.5, 3.7; Chapters 4.1–4.4)
Agenda
• Finding a good estimator for a straight line through the origin: Chapter 3.1–3.5, 3.7
• Finding a good estimator for a straight line with an intercept: Chapter 4.1–4.4
Where Are We? (Example)
• We wish to uncover quantitative features of an underlying process, such as the relationship between family income and financial aid.
• More precisely: how much less aid will I receive, on average, for each dollar of additional family income?
• Data: a sample of the process; for example, observations on 10,000 students’ aid awards and family incomes.
The Disturbance Term
• Other factors ($\varepsilon$), such as number of siblings, influence any individual student’s aid, so we cannot directly observe the relationship between income and aid.
• We need a rule for making a good guess about the relationship between income and financial aid, based on the data.
Guess
• A good guess is a guess which is right on average.
• We also desire a guess which will have a low variance around the true value.
Estimators
• Our rule is called an “estimator.”
• We started by brainstorming a number of estimators and then comparing their performances in a series of computer simulations.
• We found that the Ordinary Least Squares estimator dominated the other estimators.
• Why is Ordinary Least Squares so good?
Tools
• To make more general statements, we need to move beyond the computer and into the world of mathematics.
• Last time, we reviewed a number of mathematical tools: summations, descriptive statistics, expectations, variances, and covariances.
DGP
• As a starting place, we need to write down all our assumptions about the way the underlying process works, and about how that process led to our data.
• These assumptions are called the “Data Generating Process.”
• Then we can derive estimators that have good properties for the Data Generating Process we have assumed.
Model
• The DGP is a model to approximate reality. We trade off realism to gain parsimony and tractability.
• Models are to be used, not believed.
DGP assumptions
• Much of this course focuses on different types of DGP assumptions that you can make, giving you many options as you trade realism for tractability.
Two Ways to Screw Up in Econometrics
– Your Data Generating Process assumptions missed a fundamental aspect of reality (your DGP is not a useful approximation); or
– Your estimator did a bad job for your DGP.
• Today we focus on picking a good estimator for your DGP.
GMT
• Today, we will focus on deriving the properties of an estimator for a simple DGP: the Gauss–Markov Assumptions.
• First we will find the expectations and variances of any linear estimator under the DGP.
• Then we will derive the Best Linear Unbiased Estimator (BLUE).
Our Baseline DGP: Gauss–Markov(Chapter 3)
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X’s fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.
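To make the DGP concrete, here is a minimal Python sketch that simulates one sample from it. The parameter values (beta, sigma, n) and the fixed X’s are illustrative assumptions, not values from the text.

```python
# A minimal sketch of the Gauss–Markov DGP; beta, sigma, n, and the
# X values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, n = 0.5, 2.0, 100     # hypothetical parameter values
X = np.linspace(1.0, 10.0, n)      # X's fixed across samples

def draw_sample():
    eps = rng.normal(0.0, sigma, n)  # E(eps)=0, Var(eps)=sigma^2, independent draws
    return beta * X + eps            # Y_i = beta*X_i + eps_i

Y = draw_sample()
```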
A Strategy for Inference
• The DGP tells us the assumed relationships between the data we observe and the underlying process of interest.
• Using the assumptions of the DGP and the algebra of expectations, variances, and covariances, we can derive key properties of our estimators, and search for estimators with desirable properties.
An Example: g1
$Y_i = \beta X_i + \varepsilon_i$
$E(\varepsilon_i) = 0$; $Var(\varepsilon_i) = \sigma^2$; $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
X’s fixed across samples (so we can treat them as constants).
$$g_1 = \frac{1}{n}\sum_{i=1}^{n} \frac{Y_i}{X_i}$$
In our simulations, $g_1$ appeared to give estimates close to $\beta$.
Was this an accident, or does $g_1$ on average give us $\beta$?
An Example: g1 (OK on average)
$$E(g_1) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{\beta X_i + \varepsilon_i}{X_i}\right)$$
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i) = \frac{1}{n}n\beta + 0 = \beta$$
On average, $g_1 = \beta$: $E(g_1) = \beta$.
Using the DGP and the algebra of expectations, we conclude that $g_1$ is unbiased.
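A quick Monte Carlo sketch of the same conclusion, mirroring the simulations the lecture describes; the parameter values are illustrative assumptions.

```python
# Illustrative Monte Carlo check that g1 = (1/n) * sum(Yi/Xi) is
# unbiased under the Gauss–Markov DGP.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, n = 0.5, 2.0, 100     # hypothetical values
X = np.linspace(1.0, 10.0, n)      # fixed across samples

estimates = []
for _ in range(10_000):
    Y = beta * X + rng.normal(0.0, sigma, n)
    estimates.append(np.mean(Y / X))   # g1 for this sample
print(np.mean(estimates))              # close to beta = 0.5
```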
Checking Understanding
$$E(g_1) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{\beta X_i + \varepsilon_i}{X_i}\right)$$
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i) = \frac{1}{n}n\beta + 0 = \beta$$
Question: which DGP assumptions did we need to use?
Which assumption used?
$$E(g_1) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{\beta X_i + \varepsilon_i}{X_i}\right)$$
Here we used $Y_i = \beta X_i + \varepsilon_i$.
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i)$$
Here we used the assumption that the X’s are fixed across samples.
$$= \frac{1}{n}n\beta + 0 = \beta$$
Here we used $E(\varepsilon_i) = 0$.
Checking Point 2:
We did NOT use the assumptions about the variance and covariances of $\varepsilon_i$.
We will use those assumptions when we calculate the variance of the estimator.
Linear Estimators
• g1 is unbiased. Can we generalize?
• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.
$$\hat{\beta} = \sum_i w_i Y_i$$
Linear Estimators (weighted sum)
• Linear estimator: $\hat{\beta} = \sum_i w_i Y_i$
• Example: $g_1$ is a linear estimator:
$$g_1 = \frac{1}{n}\sum_i \frac{Y_i}{X_i} = \sum_i w_i Y_i, \qquad w_i = \frac{1}{nX_i}$$
A class of Linear Estimators
1) Mean of Ratios:
$$g_1 = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
2) Ratio of Means:
$$g_2 = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
3) Mean of Ratio of Changes:
$$g_3 = \frac{1}{n-1}\sum_{i=2}^{n} \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$
4) Ordinary Least Squares:
$$g_4 = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
• All of our “best guesses” are linear estimators!
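As a sketch of how simple these four estimators are to compute, the following Python evaluates g1–g4 on a small made-up data set (the numbers are purely illustrative).

```python
# Computing the four linear estimators on illustrative data.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([0.7, 0.9, 1.6, 2.1, 2.4])
n = len(X)

g1 = np.mean(Y / X)                    # 1) mean of ratios
g2 = Y.sum() / X.sum()                 # 2) ratio of means
g3 = np.mean(np.diff(Y) / np.diff(X))  # 3) mean of ratio of changes
g4 = (Y * X).sum() / (X ** 2).sum()    # 4) ordinary least squares
print(g1, g2, g3, g4)
```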
Expectation of Linear Estimators
Under the DGP $Y_i = \beta X_i + \varepsilon_i$, with $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$, $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$, and X’s fixed across samples (so we can treat them as constants):
$$\hat{\beta} = \sum_{i=1}^{n} w_i Y_i$$
$$E(\hat{\beta}) = E\left(\sum_{i=1}^{n} w_i Y_i\right) = \sum_{i=1}^{n} w_i E(Y_i) = \sum_{i=1}^{n} w_i E(\beta X_i + \varepsilon_i) = \sum_{i=1}^{n} w_i \left[\beta X_i + E(\varepsilon_i)\right] = \beta \sum_{i=1}^{n} w_i X_i$$
Condition for Unbiasedness
$$\hat{\beta} = \sum_{i=1}^{n} w_i Y_i, \qquad E(\hat{\beta}) = \beta \sum_{i=1}^{n} w_i X_i$$
A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.
Check others
• A linear estimator is unbiased if $\sum w_i X_i = 1$.
• Are $g_2$ and $g_4$ unbiased?
2) Ratio of Means: $g_2 = \frac{\sum Y_i}{\sum X_i}$, with $w_i = \frac{1}{\sum X_j}$, so
$$\sum w_i X_i = \frac{\sum X_i}{\sum X_j} = 1$$
4) Ordinary Least Squares: $g_4 = \frac{\sum Y_i X_i}{\sum X_j^2}$, with $w_i = \frac{X_i}{\sum X_j^2}$, so
$$\sum w_i X_i = \frac{\sum X_i^2}{\sum X_j^2} = 1$$
Both are unbiased.
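The condition is also easy to check numerically by writing each estimator’s weights out explicitly; a small sketch with illustrative X values (g3’s weights are omitted for brevity):

```python
# Checking the unbiasedness condition sum(wi * Xi) = 1 for g1, g2, g4.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative fixed X's
n = len(X)

w1 = 1.0 / (n * X)               # g1: mean of ratios
w2 = np.full(n, 1.0 / X.sum())   # g2: ratio of means
w4 = X / (X ** 2).sum()          # g4: OLS
for w in (w1, w2, w4):
    print((w * X).sum())         # each prints 1.0 (up to rounding)
```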
Better unbiased estimator
• Similar calculations hold for g3.
• All 4 of our “best guesses” are unbiased.
• But g4 did much better than g3. Not all unbiased estimators are created equal.
• We want an unbiased estimator with a low mean squared error.
First: A Puzzle…
• Suppose n = 1
–Would you like a big X or a small X for that observation?
–Why?
What Observations Receive More Weight?
1) Mean of Ratios:
$$g_1 = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
2) Ratio of Means:
$$g_2 = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
3) Mean of Ratio of Changes:
$$g_3 = \frac{1}{n-1}\sum_{i=2}^{n} \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$
4) Ordinary Least Squares:
$$g_4 = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
(Stat. significant)?
$$g_1 = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
$$g_3 = \frac{1}{n-1}\sum_{i=2}^{n} \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$
• g1 puts more weight on observations with low values of X.
• g3 puts more weight on observations with low values of X, relative to neighboring observations.
• These estimators did very poorly in the simulations.
What Observations Receive More Weight? (cont.)
$$g_2 = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
$$g_4 = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
• g2 weights all observations equally.
• g4 puts more weight on observations with high values of X.
• These estimators did very well in the simulations.
Why Weight More Heavily Observations With High X ’s?
• Under our Gauss–Markov DGP, the disturbances are drawn the same way for all values of X.
• To compare a high X choice and a low X choice, ask what effect a given disturbance will have for each.
Figure 3.1 Effects of a Disturbance for Small and Large X
Linear Estimators and Efficiency
• For our DGP, good estimators will place more weight on observations with high values of X.
• Inferences from these observations are less sensitive to the effects of the same disturbance.
• Only one of our “best guesses” had this property.
• g4 (a.k.a. OLS) dominated the other estimators.
• Can we do even better?
Min. MSE
• Mean Squared Error = Variance + Bias²
• To have a low Mean Squared Error, we want two things: a low bias and a low variance.
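A short illustrative sketch of the decomposition, comparing an unbiased sample mean with a shrunken (slightly biased, lower-variance) version of it; all values are assumptions for illustration.

```python
# Numeric check that Variance + Bias^2 equals the mean squared error.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 3.0, 10      # hypothetical true mean and noise
unbiased, shrunk = [], []
for _ in range(100_000):
    x = rng.normal(mu, sigma, n)
    unbiased.append(x.mean())        # unbiased estimator of mu
    shrunk.append(0.9 * x.mean())    # biased toward zero, smaller variance
for est in (np.array(unbiased), np.array(shrunk)):
    bias, var = est.mean() - mu, est.var()
    print(var + bias ** 2, ((est - mu) ** 2).mean())  # the two sides agree
```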
Need Low Variance
• An unbiased estimator with a low variance will tend to give answers close to the true value of $\beta$.
• Using the algebra of variances and our DGP, we can calculate the variance of our estimators.
Algebra of Variances
(1) $Var(k) = 0$
(2) $Var(kY) = k^2 \, Var(Y)$
(3) $Var(k + Y) = Var(Y)$
(4) $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$
(5) $Var\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} Var(Y_i) + \sum_{i=1}^{n}\sum_{j \neq i} Cov(Y_i, Y_j)$
• One virtue of independent observations is that Cov( Yi ,Yj ) = 0, killing all the cross-terms in the variance of the sum.
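A quick numeric check of rule (5) under independence, with illustrative numbers:

```python
# With independent draws, Var(sum Yi) = sum Var(Yi): no cross-terms.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(0.0, 2.0, size=(100_000, 5))  # 5 independent Y's per sample
print(draws.sum(axis=1).var())                   # approx 5 * 2^2 = 20
print(draws.var(axis=0).sum())                   # approx 20 as well
```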
Back again to Our Baseline DGP: Gauss–Markov
• Our benchmark DGP: Gauss–Markov
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X’s fixed across samples
We will refer to this DGP (very) frequently.
Variance of OLS
$$Var(\hat{\beta}_{OLS}) = Var\left(\frac{\sum X_i Y_i}{\sum X_k^2}\right) = \sum_{i=1}^{n} Var\left(\frac{X_i Y_i}{\sum X_k^2}\right) + 2\sum_{i=1}^{n}\sum_{j > i} Cov\left(\frac{X_i Y_i}{\sum X_k^2}, \frac{X_j Y_j}{\sum X_k^2}\right)$$
The covariance terms are 0, so
$$= \sum_i \left(\frac{X_i}{\sum X_k^2}\right)^2 Var(Y_i) = \sum_i \left(\frac{X_i}{\sum X_k^2}\right)^2 Var(\beta X_i + \varepsilon_i)$$
Variance of OLS (cont.)
$$Var(\hat{\beta}_{OLS}) = \sum_i \left(\frac{X_i}{\sum X_k^2}\right)^2 Var(\beta X_i + \varepsilon_i) = \sum_i \left(\frac{X_i}{\sum X_k^2}\right)^2 Var(\varepsilon_i)$$
$$= \sum_i \left(\frac{X_i}{\sum X_k^2}\right)^2 \sigma^2 = \sigma^2 \frac{\sum X_i^2}{\left(\sum X_k^2\right)^2} = \frac{\sigma^2}{\sum X_k^2}$$
• Note: the higher $\sum X_k^2$, the lower the variance.
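A Monte Carlo sketch (with illustrative parameter values) confirming that the simulated variance of the OLS estimator matches $\sigma^2 / \sum X_k^2$:

```python
# Simulated vs. theoretical variance of OLS through the origin.
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 0.5, 2.0             # hypothetical values
X = np.linspace(1.0, 10.0, 20)     # fixed across samples
ests = []
for _ in range(100_000):
    Y = beta * X + rng.normal(0.0, sigma, len(X))
    ests.append((Y * X).sum() / (X ** 2).sum())   # OLS estimate
print(np.var(ests))                  # simulated variance
print(sigma ** 2 / (X ** 2).sum())   # theoretical value
```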
Variance of a Linear Estimator
• More generally:
$$Var\left(\sum w_i Y_i\right) = \sum Var(w_i Y_i) + \text{covariance terms}$$
$$= \sum w_i^2 \, Var(Y_i) + 0 = \sum w_i^2 \, Var(\beta X_i + \varepsilon_i)$$
$$= \sum w_i^2 \left[0 + Var(\varepsilon_i)\right] = \sigma^2 \sum w_i^2$$
Variance of a Linear Estimator (cont.)
• The algebra of expectations and variances allows us to get exact results where the Monte Carlos gave only approximations.
• The exact results apply to ANY model meeting our Gauss–Markov assumptions.
Variance of a Linear Estimator (cont.)
• We now know mathematically that g1–g4 are all unbiased estimators of $\beta$ under our Gauss–Markov assumptions.
• We also think from our Monte Carlo models that g4 is the best of these four estimators, in that it is more efficient than the others.
• They are all unbiased (we know from the algebra), but g4 appears to have a smaller variance than the other 3.
Variance of a Linear Estimator (cont.)
• Is there an unbiased linear estimator better (i.e., more efficient) than g4?
– What is the Best Linear Unbiased Estimator?
– How do we find the BLUE estimator?
BLUE Estimators
• Mean Squared Error = Variance + Bias²
• An unbiased estimator is right “on average.”
• In practice, we don’t get to average. We see only one draw from the DGP.
BLUE Estimators (Trade-off ??)
• Some analysts would prefer an estimator with a small bias, if it gave them a large reduction in variance.
• What good is being right on average if you’re likely to be very wrong in your one draw?
BLUE Estimators (cont.)
• Mean Squared Error = Variance + Bias²
• In a particular application, there may be a favorable trade-off between accepting a little bias in return for a lot less variance.
• We will NOT look for these trade-offs.
• Only after we have made sure our estimator is unbiased will we try to make the variance small.
BLUE Estimators (cont.)
A Strategy for Finding the Best Linear Unbiased Estimator:
1. Start with linear estimators: $\sum w_i Y_i$
2. Impose the unbiasedness condition: $\sum w_i X_i = 1$
3. Calculate the variance of a linear estimator: $Var\left(\sum w_i Y_i\right) = \sigma^2 \sum w_i^2$
4. Use calculus to find the $w_i$ that give the smallest variance subject to the unbiasedness condition.
Result: the BLUE Estimator for Our DGP
BLUE Estimators (cont.)
Using calculus, we would find
$$w_i = \frac{X_i}{\sum_j X_j^2}$$
This formula is OLS!
OLS is the Best Linear Unbiased Estimator for
the Gauss–Markov DGP.
This result is called the Gauss–Markov Theorem.
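One way to see the theorem at work: compute the implied variance $\sigma^2 \sum w_i^2$ for each estimator’s weights; the OLS weights come out smallest. The X’s and $\sigma$ below are illustrative (g3’s weights are omitted for brevity).

```python
# Comparing sigma^2 * sum(wi^2) across unbiased weighting schemes.
import numpy as np

sigma = 2.0
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(X)
weights = {
    "g1 (mean of ratios)": 1.0 / (n * X),
    "g2 (ratio of means)": np.full(n, 1.0 / X.sum()),
    "g4 (OLS)": X / (X ** 2).sum(),
}
for name, w in weights.items():
    print(name, sigma ** 2 * (w ** 2).sum())  # g4 gives the minimum
```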
BLUE Estimators (cont.)
• OLS is a very good strategy for the Gauss–Markov DGP.
• OLS is unbiased: our guesses are right on average.
• OLS is efficient: it has a small variance (or at least the smallest possible variance for unbiased linear estimators).
• Our guesses will tend to be close to right (or at least as close to right as we can get; the minimum variance could still be pretty large!)
BLUE Estimator (cont.)
• According to the Gauss–Markov Theorem, OLS is the BLUE Estimator for the Gauss–Markov DGP.
• We will study other DGPs. For any DGP, we can follow this same procedure:
– Look at Linear Estimators
– Impose the unbiasedness conditions
– Minimize the variance of the estimator
Example: Cobb–Douglas Production Functions (Chapter 3.7)
• A classic production function in economics is the Cobb–Douglas function.
• $Y = aL^{\alpha}K^{1-\alpha}$
• If firms pay workers and capital their marginal products, then worker compensation equals a fraction $\alpha$ of total output (or national income).
Example: Cobb–Douglas
• To illustrate, we randomly pick 8 years between 1900 and 1995. For each year, we observe total worker compensation and national income.
• We use g1, g2, g3, and g4 to estimate $\text{Compensation} = \alpha \cdot \text{National Income} + \varepsilon$
TABLE 3.6 Estimates of the Cobb–Douglas Parameter $\alpha$, with Standard Errors
TABLE 3.7 Outputs from a Regression* of Compensation on National Income
Example: Cobb–Douglas
• All 4 of our estimators give very similar estimates.
• However, g2 and g4 have much smaller standard errors. (We will see the value of small standard errors when we cover hypothesis tests.)
• Using our estimate from g4, 0.738, a $1 billion increase in National Income is predicted to increase total worker compensation by $0.738 billion.
A New DGP
• Most lines do not go through the origin.
• Let’s add an intercept term and find the BLUE Estimator (from Chapter 4).
Gauss–Markov with an Intercept
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (i = 1 \ldots n)$$
$E(\varepsilon_i) = 0$
$Var(\varepsilon_i) = \sigma^2$
$Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
X’s fixed across samples.
All we have done is add a $\beta_0$.
Gauss–Markov with an Intercept (cont.)
• Example: let’s estimate the effect of income on college financial aid.
• Students whose families have 0 income do not receive 0 aid. They receive a lot of aid.
• $E[\text{financial aid} \mid \text{family income}] = \beta_0 + \beta_1 \cdot (\text{family income})$
Gauss–Markov with an Intercept (cont.)
• How do we construct a BLUE Estimator?
• Step 1: focus on linear estimators.
• Step 2: calculate the expectation of a linear estimator for this DGP, and find the condition for the estimator to be unbiased.
• Step 3: calculate the variance of a linear estimator. Find the weights that minimize this variance subject to the unbiasedness constraint.
Expectation of a Linear Estimator
$$E(\hat{\beta}) = E\left(\sum w_i Y_i\right) = \sum E(w_i Y_i) = \sum w_i E(Y_i)$$
$$= \sum w_i E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \sum w_i \left[\beta_0 + \beta_1 X_i + E(\varepsilon_i)\right]$$
$$= \beta_0 \sum w_i + \beta_1 \sum w_i X_i + 0 = \beta_0 \sum w_i + \beta_1 \sum w_i X_i$$
Checking Understanding
$$E(\hat{\beta}) = \beta_0 \sum w_i + \beta_1 \sum w_i X_i$$
• Question: What are the conditions for an estimator of $\beta_1$ to be unbiased? What are the conditions for an estimator of $\beta_0$ to be unbiased?
$$E(\hat{\beta}) = \beta_0 \sum w_i + \beta_1 \sum w_i X_i$$
Checking Understanding (cont.)
• When is the expectation equal to $\beta_1$? When $\sum w_i = 0$ and $\sum w_i X_i = 1$.
• What if we were estimating $\beta_0$? When is the expectation equal to $\beta_0$? When $\sum w_i = 1$ and $\sum w_i X_i = 0$.
• To estimate 1 parameter, we needed 1 unbiasedness condition. To estimate 2 parameters, we need 2 unbiasedness conditions.
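Both conditions are easy to verify for the OLS slope weights $w_i = (X_i - \bar{X}) / \sum_j (X_j - \bar{X})^2$; a small sketch with illustrative data:

```python
# The OLS slope weights satisfy sum(wi) = 0 and sum(wi * Xi) = 1.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative fixed X's
w = (X - X.mean()) / ((X - X.mean()) ** 2).sum()
print(w.sum())         # 0 (up to rounding): kills the beta0 term
print((w * X).sum())   # 1: passes beta1 through unchanged
```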
Variance of a Linear Estimator
$$Var(\hat{\beta}) = Var\left(\sum w_i Y_i\right) = \sum Var(w_i Y_i) + 0$$
$$= \sum w_i^2 \, Var(\beta_0 + \beta_1 X_i + \varepsilon_i) = \sum w_i^2 \left[0 + 0 + Var(\varepsilon_i)\right] = \sigma^2 \sum w_i^2$$
• Adding a constant to the DGP does NOT change the variance of the estimator.
BLUE Estimator
To compute the BLUE estimator for $\beta_1$, we want to minimize
$$\sigma^2 \sum w_i^2$$
subject to the constraints
$$\sum w_i = 0, \qquad \sum w_i X_i = 1$$
Solution:
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
BLUE Estimator of β1
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
• This estimator is OLS for the DGP with an intercept.
• It is the Best (minimum variance) Linear Unbiased Estimator for the Gauss–Markov DGP with an intercept.
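A minimal sketch (with made-up data) applying the slope formula, plus the intercept formula from the upcoming slides, and checking both against numpy’s own least-squares fit:

```python
# OLS slope and intercept for the model with an intercept.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
Y = np.array([2.1, 2.6, 3.8, 4.1, 5.2])

b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b0 = Y.mean() - b1 * X.mean()     # line passes through (Xbar, Ybar)
print(b0, b1)
print(np.polyfit(X, Y, 1))        # [slope, intercept]: same values
```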
BLUE Estimator of β1 (cont.)
• This formula is very similar to the formula for OLS without an intercept.
• However, now we subtract the mean values from both X and Y.
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
BLUE Estimator of β1 (cont.)
• OLS places more weight on observations with high values of $|X_i - \bar{X}|$.
• Observations are more valuable if X is far away from its mean.
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
BLUE Estimator of β1 (cont.)
$$w_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}$$
$$Var(\hat{\beta}_1) = \sigma^2 \sum_i w_i^2 = \sigma^2 \sum_i \frac{(X_i - \bar{X})^2}{\left[\sum_j (X_j - \bar{X})^2\right]^2} = \frac{\sigma^2}{\sum_j (X_j - \bar{X})^2}$$
BLUE Estimator of β0
• The easiest way to estimate the intercept: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
• Notice that the fitted regression line always goes through the point $(\bar{X}, \bar{Y})$.
• Our fitted regression line passes through “the middle of the data.”
Example: The Phillips Curve
• Phillips argued that nations face a trade-off between inflation and unemployment.
• He used annual British data on wage inflation and unemployment from 1861–1913 and 1914–1957 to regress inflation on unemployment.
Example: The Phillips Curve (cont.)
• The fitted regression line for 1861–1913 did a good job predicting the data from 1914 to 1957.
• “Out of sample predictions” are a strong test of an econometric model.
Example: The Phillips Curve (cont.)
• The US data from 1958–1969 also suggest a trade-off between inflation and unemployment.
$$\hat{\beta}_0 = 0.06, \qquad \hat{\beta}_1 = -0.55$$
$$\widehat{\text{Unemployment}}_t = 0.06 - 0.55 \cdot \text{Inflation}_t$$
Example: The Phillips Curve (cont.)
• How do we interpret these numbers?
• If Inflation were 0, our best guess of Unemployment would be 0.06 percentage points.
• A one percentage point increase of Inflation decreases our predicted Unemployment level by 0.55 percentage points.
$$\widehat{\text{Unemployment}}_t = 0.06 - 0.55 \cdot \text{Inflation}_t$$
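As a worked example (the inflation value is hypothetical, and the decimal units follow the slide’s convention):

```python
# Prediction from the fitted line Unemployment = 0.06 - 0.55 * Inflation.
b0, b1 = 0.06, -0.55
inflation = 0.02                   # assumed value, for illustration only
print(b0 + b1 * inflation)         # predicted unemployment: 0.049
```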
Figure 4.2 U.S. Unemployment and Inflation, 1958–1969
TABLE 4.1 The Phillips Curve
Example: The Phillips Curve
• We no longer need to assume our regression line goes through the origin.
• We have learned how to estimate an intercept.
• A straight line doesn’t seem to do a great job here. Can we do better?
Review
• As a starting place, we need to write down all our assumptions about the way the underlying process works, and about how that process led to our data.
• These assumptions are called the “Data Generating Process.”
• Then we can derive estimators that have good properties for the Data Generating Process we have assumed.
Review: The Gauss–Markov DGP
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X’s fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.
Review
• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.
$$\hat{\beta} = \sum_i w_i Y_i$$
Review (cont.)
$Y_i = \beta X_i + \varepsilon_i$
$E(\varepsilon_i) = 0$; $Var(\varepsilon_i) = \sigma^2$; $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
X’s fixed across samples (so we can treat them as constants).
$$\hat{\beta} = \sum_{i=1}^{n} w_i Y_i, \qquad E(\hat{\beta}) = \beta \sum_{i=1}^{n} w_i X_i$$
A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.
Review (cont.)
$Y_i = \beta X_i + \varepsilon_i$
$E(\varepsilon_i) = 0$; $Var(\varepsilon_i) = \sigma^2$; $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
X’s fixed across samples (so we can treat them as constants).
A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.
Many linear estimators will be unbiased. How do I pick the "best"
linear unbiased estimator (BLUE)?
Review: BLUE Estimators
A Strategy for Finding the Best Linear Unbiased Estimator:
1. Start with linear estimators: $\sum w_i Y_i$
2. Impose the unbiasedness condition: $\sum w_i X_i = 1$
3. Use calculus to find the $w_i$ that give the smallest variance subject to the unbiasedness condition.
Result: The BLUE Estimator for our DGP
Review: BLUE Estimators (cont.)
• Ordinary Least Squares (OLS) is BLUE for our Gauss–Markov DGP.
• This result is called the “Gauss–Markov Theorem.”
Review: BLUE Estimators (cont.)
• OLS is a very good strategy for the Gauss–Markov DGP.
• OLS is unbiased: our guesses are right on average.
• OLS is efficient: it has the smallest possible variance among unbiased linear estimators.
• Our guesses will tend to be close to right (or at least as close to right as we can get).
• Warning: the minimum variance could still be pretty large!
Gauss–Markov with an Intercept
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (i = 1 \ldots n)$$
$E(\varepsilon_i) = 0$
$Var(\varepsilon_i) = \sigma^2$
$Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
X’s fixed across samples.
All we have done is add a $\beta_0$.
Review: BLUE Estimator of β1
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
• This estimator is OLS for the DGP with an intercept.
• It is the Best (minimum variance) Linear Unbiased Estimator for the Gauss–Markov DGP with an intercept.
BLUE Estimator of β0
• The easiest way to estimate the intercept: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
• Notice that the fitted regression line always goes through the point $(\bar{X}, \bar{Y})$.
• Our fitted regression line passes through “the middle of the data.”