Upload
hoangdung
View
215
Download
0
Embed Size (px)
Simple Regression Model 1
The Simple Regression Model
Simple Regression Model 2
Simple regression model: Objectives
Given the model:
- where y is earnings and x years of education
- Or y is sales and x is spending in advertising
- Or y is the market value of an apartment and x is its area
Our objectives for the moment will be:
- Estimate the unknown parameters β0 andβ1
- Assess “how much”x explains y
Simple Regression Model 3
The Simple Linear Regression (SLR) Model with Cross-Sectional Data
• y is the dependent variable (or explained variable, or response variable, or theregressand)
• x is the independent variable (or explanatory variable, or control variable, orthe regressor)
• u is the error term or disturbance (y is allowed to vary for the same level ofx)
• This is a Population equation. It islinear in the (unknown) parameters
Assumption SLR.1(Linearity in parameters)
Simple Regression Model 4
Further assumptionsAssumption SLR.2(Random Sampling)
We have a random sample from the population such that:
Assumption SLR.3(Variation in the Regressor)
xi, i=1,2,...,n don’t have all the same value
Simple Regression Model 5
Further assumptions
ueducationEarnings ++= 10 ββ
This is not a restrictive assumption, since we can always use β0 so that SLR.4´holds
Assumption SLR.4(Zero Conditional Mean)
Assumption SLR.4(́Zero Mean)
Not really necessary given that SLR.4 implies SLR.4´
Crucial assumption! Remember previous examples:
uPolicemanCrimeRate ++= 10 ββIn these cases SLR.4 most likely fails.
Simple Regression Model 6
Implication of Zero Conditional Mean
..
x1 x2
E(y|x) = β0 + β1x
y
f(y)
implies
For anyx, distribution ofy is centered at E(y|x)
Simple Regression Model 7
.
..
.
y4
y1
y2
y3
x1 x2 x3 x4
}
}
{
{
u1
u2
u3
u4
x
y
Population regression line, sample data pointsand the associated error terms
E(y|x) = β0 + β1x
Simple Regression Model 8
Estimation of unknown parameters by method of moments
• Want to estimate the population parameters from a sample
• Let {(xi,yi): i=1, …,n} denote a random sample of size n from the population
• yi = β0 + β1xi + ui
implies
implies
since
i)
ii)
Simple Regression Model 9
Estimation of unknown parameters by method of moments (intuition)
Clever idea is to pick so that sample moments match the population moments
These are two equations in the two unknowns
The sample counterparts are:
Simple Regression Model 10
Estimation of unknown parameters by method of moments (solution)
Need sample variation of the x variable.Otherwise the slope parameter is notwell defined!
implies:
Plug-in in
to obtain:
this simplifies to:
Simple Regression Model 11
Slope estimator
• The slope estimate is the sample covariance between x and ydivided by the sample variance of x
• If x and y are positively correlated, the slope will be positive
• If x and y are negatively correlated, the slope will be negative
• Need x to vary in our sample
It is anestimator if we treat (xi,yi) as randomvariables. It is an estimate once we substitute(xi,yi) by the actual observations.
Simple Regression Model 12
Example and interpretation
• wageis net monthly wage andeducmeasures the number ofyears of schooling completed
• So, we estimate a positive relation between these twovariables; an additional year of schooling leads to an estimatedexpected increase of 52.513 euros in the wage
• Again, must be cautious about the interpretation of theseestimates
n=11064, Data from the Employment Survey (INE), 2003
Simple Regression Model 13
Definitions
Fitted value
Residual
Sum of Squared Residuals (RSS)
Simple Regression Model 14
.
..
.
y4
y1
y2
y3
x1 x2 x3 x4
}
}
{
{
û1
û2
û3
û4
x
y
Sample regression line, sample data pointsand the associated estimatederror terms
(called residuals); fitted values are on the line
xy 10ˆˆˆ ββ +=
Simple Regression Model 15
Alternate approach to derivationOLS – Ordinary Least Squares
• Can fit a line so as to minimise the sum of squared residuals
Equivalent to the equations we had before: same solution!
Simple Regression Model 16
Review – OLS estimators
OLS obtained by minimising:
Fitted value:
Residual:
Given a random sample:
with respect to
Alternatively, can exploit the fact that:
Simple Regression Model 17
Today
- Assess “how much”x explains y
- Assess “how close” our estimators
are from the true (population) values
Simple Regression Model 18
Looking for a Goodness of fit measure: Decomposition of the Total Sum of
Squares (SST) We know:
SSE, Explained Sum ofSquares
This implies: , to be used later
Also: Now define the Total Sum of SquaresSST
Simple Regression Model 19
Goodness-of-Fit
• How well does our sample regression line fit the data? If SSRis very small relative to SST(or SSEvery large relative to SST), then a lot of the variation in the dependent variable y is “explained” by the model
• Can compute the fraction of the total sum of squares (SST) that is explained by the model, denote this as the R-squared of the regression
• R2 = SSE/SST= 1 –SSR/SST
• The lower SSR, the higher will be R2
• OLS maximises R2 (minimisesSSR). We have always 0≤ R2 ≤1
Simple Regression Model 20
Example (continued)
• A lot of variation in the dependent variable (wage) is left unexplained by the model
• This isnot related to the fact thateducis most probably correlated withu
• Can have low R2 even if all the assumptions hold true
n=11064, Data from the Employment Survey (INE), 2003
Simple Regression Model 21
Assumptions again...Assumption SLR.1(Linearity in parameters)
Assumption SLR.2(Random Sampling from the population)
We have a random sample
Assumption SLR.3(Variation in the Regressor)
xi, i=1,2,...,n don’t have all the same value
Assumption SLR.4(Zero Conditional Mean)
This implies
Simple Regression Model 22
Properties of OLS EstimatorsThought experiment:
• Get a sample of sizen from the population many times (infinite times)• Compute the OLS estimates• Is the average of these estimates equal to the true parameter values?
• Under the previous assumption (SLR.1 to SLR.4) it turnsout that theanswer is positive
It will be the case that:
These are estimators if we treat (xi,yi) as random variables
Simple Regression Model 23
Unbiasedness of OLS (Proof)
∑ = −=n
i ix xxSST1
2)(
Let us rewrite our estimator as:
Put:
Now, note that: ( ) ( ) ( )2111
and 0 ∑∑∑ ===−=−=−
n
i ii
n
i i
n
i i xxxxxxx
Simple Regression Model 24
Unbiasedness of OLS (cont)
Now take expectationsConditional on the values of the x’s, that is, treat thex’s asconstants. To avoid heavy notation, I don´t make this explicit.
Now, it´s really easy to prove unbiasedness of
Try it yourself!
And if the conditional expectation is a constantthe unconditinal expectation is also that constant...
Simple Regression Model 25
Unbiasedness Summary
• The OLS estimators ofβ1 andβ0 are unbiased
• However, in a given sample we may be “close” or “far” from the true parameter, unbiasedness is a minimal requirement
• Unbiasedness depends on the 4 assumptions:if any assumption fails, then OLS is not necessarily unbiased
Simple Regression Model 26
Variance of the OLS Estimators
• We know already that the sampling distribution of our estimators is centered around the true parameter
• Is the distribution concentrated around the true parameter?
• To answer this question, we add an additional assumption
Assumption SLR.5(Homoskedasticity)
Simple Regression Model 27
Variance of OLS (cont.)
• Var(u|x) = E(u2|x)-[E(u|x)]2
• E(u|x) = 0, soσ2 = E(u2|x) = E(u2) = Var(u)
• So,σ2 is also the unconditional variance, called the variance of
the error
• σ, the square root of the error variance is called the standard deviation of the error
• Can write: E(y|x)=β0 + β1x and Var(y|x) = σ2
Assumption SLR.5(Homoskedasticity)
Simple Regression Model 28
..
x1 x2
Homoskedastic Case
E(y|x) = β0 + β1x
y
f(y|x)
Simple Regression Model 29
.
xx1 x2
yf(y|x)
Heteroskedastic Case
x3
..
E(y|x) = β0 + β1x
Simple Regression Model 30
Variance of OLS (cont.)
( )( ) ( )
1 1
2 22
2 222 2 2
ˆ 1 ( )
1 1( ) ( )
1 1( )
i ix
i i i ix x
i xx x x
Var Var x x uSST
Var x x u x x Var uSST SST
x x SSTSST SST SST
β β
σσ σ
= + − = = − = − = − = =
∑∑ ∑∑
This is a conditional variance!
∑ = −=n
i ix xxSST1
2)(
)()()(
)()()( 2
XVarYVarXYVar
XVarbbXVarbXaVar
+=+
==+
Note that:
- a andb are constants andX a Random Variable
- If X andYare independent
Simple Regression Model 31
Variance of OLS Summary
• The larger the error variance, σ2, the larger the variance of the slope estimator
• The larger the variability in the xi, the smaller the variance of the slope estimate
• SSTx tends to increase with the sample size n, so variance tends to decrease with the sample size
• Can also show:
( )xSST
Var2
1̂σβ = ∑ = −=
n
i ix xxSST1
2)(
Simple Regression Model 32
Estimating the Error Variance• In practice, the error variance σ2 is unknown, must estimate
it...
• Can show that this is an unbiased estimator of the error variance
• The so-called standar error of the regression is given by:
Simple Regression Model 33
Estimated standard error (se) of the parameters
And similarly for
In our leading example: