REGRESSION ANALYSIS
- MULTIPLE REGRESSION MODEL -

Basics of Regression Analysis

Multiple linear regression model with K explanatory variables:

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_K X_{Kt} + \varepsilon_t \qquad (t = 1, 2, \dots, n)$$

- Deterministic component: (complex), K explanatory variables
- $\varepsilon_t$: stochastic / random component
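As an illustration added to these notes (not from the original slides), a minimal sketch of estimating such a model in Python with statsmodels; the data and the "true" coefficient values are made up:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    X1 = rng.normal(size=n)
    X2 = rng.normal(size=n)
    eps = rng.normal(scale=0.5, size=n)
    Y = 2.0 + 1.5 * X1 - 0.8 * X2 + eps          # hypothetical true betas

    X = sm.add_constant(np.column_stack([X1, X2]))  # column of ones = beta_0
    results = sm.OLS(Y, X).fit()
    print(results.summary())  # coefficients, SEs, t-stats, R2, F, DW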
DEMONSTRATION: SALES FUNCTION
REGRESSION OUTPUT (Dependent variable: Sales Volume)
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Standard error of the estimated coefficients:
- gives an indication of the significance of the parameters in the function
- each different data set yields different estimates of the betas
- how do the estimated coefficients vary depending on the sample? The measure of variability is the standard error of an estimated coefficient: an estimate of the square root of the variance of the distribution of $\hat{\beta}$
- the larger the standard error, the more the estimates will vary from sample to sample
- the standard error of the slope coefficient $\hat{\beta}$ is given by:

$$SE(\hat{\beta}) = \frac{SEE}{\sqrt{\sum_{t}(X_t - \bar{X})^2}} \qquad (t = 1, 2, \dots, n)$$
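A sketch (added here, not from the slides) of this formula in the two-variable case, cross-checked against statsmodels on simulated data; SEE is computed with the usual n - 2 degrees of freedom:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 50
    X = rng.normal(size=n)
    Y = 1.0 + 2.0 * X + rng.normal(size=n)

    res = sm.OLS(Y, sm.add_constant(X)).fit()
    e = res.resid
    SEE = np.sqrt(np.sum(e**2) / (n - 2))            # standard error of estimate
    se_slope = SEE / np.sqrt(np.sum((X - X.mean())**2))
    print(se_slope, res.bse[1])                      # the two should agree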
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

t-statistic (much easier to evaluate):

$$t_k = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \qquad (k = 1, 2, \dots, K)$$

Rule of thumb:
- If the t-value of coefficient $\hat{\beta}_k$ > 1.96: variable k is significant
- If the t-value of coefficient $\hat{\beta}_k$ < 1.96: drop variable k
- OR use the t-table to derive a critical value to compare the t-statistic to:
  Reject H0 (where H0: $\beta_k = 0$) if $|t_k|$ > t_crit
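A small self-contained sketch (illustrative, not from the lecture) of the 1.96 rule of thumb; the third slope is constructed to be truly zero:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    X = sm.add_constant(rng.normal(size=(100, 2)))
    y = X @ [1.0, 0.5, 0.0] + rng.normal(size=100)  # last slope truly zero

    res = sm.OLS(y, X).fit()
    t = res.params / res.bse        # identical to res.tvalues
    print(t)
    print(np.abs(t) > 1.96)         # rule of thumb: keep or drop each variable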
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

R-squared ($R^2$):
- close to one: much of the variation in the dependent variable is explained
Rule of thumb:
- $R^2$ > 0.8: acceptable for time-series data; a lower value is acceptable for cross-section data
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

F-statistic:
- tests the overall significance of the equation (joint significance)

$$F = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2 / K}{\sum_i e_i^2 / (n - K - 1)}$$

Rule of thumb:
- If F > 4: the coefficients are jointly significant
- OR use the F-table to derive a critical value to compare the F-statistic to:
  Reject H0 (where H0: $\beta_1 = \beta_2 = \dots = \beta_K = 0$) if F > F_crit
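A sketch (added, with simulated data) of computing F directly from the explained and residual sums of squares, matching the value statsmodels reports:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, K = 80, 2
    X = rng.normal(size=(n, K))
    y = 1.0 + X @ [0.7, -0.4] + rng.normal(size=n)

    res = sm.OLS(y, sm.add_constant(X)).fit()
    ess = np.sum((res.fittedvalues - y.mean())**2)   # explained sum of squares
    rss = np.sum(res.resid**2)                       # residual sum of squares
    F = (ess / K) / (rss / (n - K - 1))
    print(F, res.fvalue)                             # should agree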
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (1):
- tests for the existence of first-order serial correlation
- first-order serial correlation will occur when the e_t's are correlated with each other
- Rule of thumb: 0 ≤ DW ≤ 4; a value close to 2 indicates no first-order serial correlation
EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (2):
Critical values indicating the upper (dU) and lower (dL) bounds for combinations of n (number of observations) and K' (number of independent variables excluding the constant) may be read off the DW table.

6 possible results for the Durbin-Watson statistic:

VALUE OF DW               RESULT
(4-dL) < DW < 4           Negative serial correlation
(4-dU) < DW < (4-dL)      Result undetermined
2 < DW < (4-dU)           No serial correlation
dU < DW < 2               No serial correlation
dL < DW < dU              Result undetermined
0 < DW < dL               Positive serial correlation
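The statistic itself is the ratio $\sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$; a sketch (added, simulated data) computed by hand and with statsmodels' helper:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(4)
    x = rng.normal(size=100)
    y = 1.0 + 2.0 * x + rng.normal(size=100)
    e = sm.OLS(y, sm.add_constant(x)).fit().resid

    dw = np.sum(np.diff(e)**2) / np.sum(e**2)  # sum (e_t - e_{t-1})^2 / sum e_t^2
    print(dw, durbin_watson(e))                # both near 2 here: no serial corr.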
STATIONARITY VERSUS NON-STATIONARITY

Stationarity

- Stationary series: mean-reverting and constant variance
- DGP (data-generating process), AR(1): $y_t = \rho y_{t-1} + v_t, \quad v_t \sim IID(0, \sigma^2)$
- y will be:
  - weakly stationary where $|\rho| < 1$
  - non-stationary where $\rho = 1$
  - explosive where $\rho > 1$
- Often we do not consider the latter a possible d.g.p.
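A sketch (added to these notes) reproducing the three cases shown in the figures below, simulating AR(1) paths for rho = 0.6, 1.0 and 1.1:

    import numpy as np

    rng = np.random.default_rng(5)

    def ar1(rho, n=90):
        """Simulate y_t = rho * y_{t-1} + v_t with y_0 = 0, v ~ N(0, 1)."""
        y = np.zeros(n)
        for t in range(1, n):
            y[t] = rho * y[t - 1] + rng.standard_normal()
        return y

    stationary = ar1(0.6)   # mean-reverting, bounded variance
    unit_root  = ar1(1.0)   # random walk: wanders, variance grows with t
    explosive  = ar1(1.1)   # diverges rapidly
    print(stationary[-1], unit_root[-1], explosive[-1])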
STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated stationary AR(1) series, y = 0.6*y(-1) + v]
[Figure: simulated random walk, y = y(-1) + v]
[Figure: simulated explosive series, y = 1.1*y(-1) + v]
STATIONARITY VERSUS NON-STATIONARITY

Proof:
$$y_t = \rho y_{t-1} + v_t$$
Starting from $y_0$:
$$y_1 = \rho y_0 + v_1$$
$$y_2 = \rho y_1 + v_2 = \rho(\rho y_0 + v_1) + v_2 = \rho^2 y_0 + \rho v_1 + v_2$$
$$y_3 = \rho y_2 + v_3 = \rho(\rho^2 y_0 + \rho v_1 + v_2) + v_3 = \rho^3 y_0 + \rho^2 v_1 + \rho v_2 + v_3$$
therefore
$$y_3 = \rho^3 y_0 + \sum_{i=0}^{2} \rho^i v_{3-i}$$
and in general
$$y_t = \rho^t y_0 + \sum_{i=0}^{t-1} \rho^i v_{t-i}$$
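From this representation the variance follows directly (a standard step, added here to connect the proof to the figures); with $y_0 = 0$ and the $v_t$ independent with variance $\sigma^2$:

$$\operatorname{Var}(y_t) = \sigma^2 \sum_{i=0}^{t-1} \rho^{2i} =
\begin{cases}
\sigma^2 \dfrac{1 - \rho^{2t}}{1 - \rho^2} \;\longrightarrow\; \dfrac{\sigma^2}{1 - \rho^2} & \text{if } |\rho| < 1 \\[1ex]
t\sigma^2 & \text{if } \rho = 1
\end{cases}$$

which is why the random walk's variance grows without bound while the stationary case settles at a finite value.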
STATIONARITY VERSUS NON-STATIONARITY

Conclusion: What are the properties of stationary data?
- $E(y_t) = E(y_{t+h}) = \mu < \infty$, i.e. constant finite mean over time
- $E(y_t^2) = E(y_{t+h}^2) = \sigma^2 < \infty$, i.e. constant finite variance over time
- $E(y_t y_{t-j}) = E(y_{t+h} y_{t+h-j}) = \gamma_j < \infty$, i.e. constant finite covariance over time
STATIONARITY VERSUS NON-STATIONARITY

Properties of stationary vs. non-stationary data:

                          Stationary          Non-stationary
Variance                  Finite              Unbounded, grows with t (to tσ²)
Memory                    Temporary           Permanent
Expected time between     Finite              Infinite
crossings of ȳ            (mean reverting)    (mean is changing over time)
STATIONARITY VERSUS NON-STATIONARITY

How are data series transformed?
Stationary variables are obtained by differencing (not necessarily only once) the non-stationary series. Such a DGP is referred to as difference stationary.

$$y_t = y_{t-1} + v_t \qquad (\rho = 1)$$
$$\Delta y_t = y_t - y_{t-1} = v_t$$
$$y_t \sim I(1)$$
General:
A variable is called I(p) if it must be differenced p times in order to render it stationary. We can also say the series contains p unit roots.

For $y_t = \rho y_{t-1} + u_t$ with $u_t$ = white noise:
- if $|\rho| < 1$: $y_t$ is stationary, i.e. I(0)
- if $\rho = 1$: $y_t$ contains a unit root, i.e. I(1)
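A sketch of difference stationarity (added, purely illustrative): build an I(2) series by cumulating white noise twice, then recover the stationary innovations by differencing twice:

    import numpy as np

    rng = np.random.default_rng(6)
    v = rng.standard_normal(200)   # stationary white noise, I(0)

    y1 = np.cumsum(v)              # one unit root: I(1)
    y2 = np.cumsum(y1)             # two unit roots: I(2)

    d1 = np.diff(y2)               # differenced once: still I(1)
    d2 = np.diff(y2, n=2)          # differenced twice: I(0) again
    print(np.allclose(d2, v[2:]))  # double differencing recovers the noise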
TRANSFORMING A SERIES I(1)

[Figure: simulated I(1) series y, 90 observations]
TRANSFORMING A SERIES I(2)

[Figures: CPI in levels (1970-2000), first difference of log CPI (D_LCPI), and second difference of log CPI (DD_LCPI)]
CONSEQUENCES OF NON-STATIONARITY

Non-stationary data series could result in spurious regression.

Example:
- Two non-stationary data series (y and x)
- By construction they have nothing in common:

$$y_t = \alpha + y_{t-1} + v_t, \qquad v_t \sim IID(0, \sigma_v^2)$$
$$x_t = \lambda + x_{t-1} + u_t, \qquad u_t \sim IID(0, \sigma_u^2)$$
CONSEQUENCES OF NON-STATIONARITY

Dependent Variable: Y
Method: Least Squares
Sample: 1901 1995
Included observations: 95

Variable    Coefficient   Std. Error   t-Statistic   Prob.
X           5.494891      0.091884     59.80278      0

R-squared            0.895314    Mean dependent var     51.97351
Adjusted R-squared   0.895314    S.D. dependent var     29.73505
S.E. of regression   9.620824    Akaike info criterion   7.376207
Sum squared resid    8700.663    Schwarz criterion       7.40309
Log likelihood      -349.3698    Durbin-Watson stat      0.059368
CONSEQUENCES OF NON-STATIONARITY

[Figure: residual, actual and fitted values from the spurious regression]
CONSEQUENCES OF NON-STATIONARITY

Test              Test Statistic   p-value    Conclusion
Jarque-Bera        2.4383          [0.2955]   Errors are normally distributed
ARCH LM           78.8189          [0.0000]   Heteroscedasticity
White             30.8526          [0.0000]   Heteroscedasticity
Breusch-Godfrey   89.5272          [0.0000]   Serial correlation
Ljung-Box         -0.35            [0.0000]   Serial correlation
Ramsey RESET       9.1089          [0.0105]   Misspecification
CONSEQUENCES OF NON-STATIONARITY

This non-stationarity of the residuals provides us with important information concerning the relationship between $y_t$ and $x_t$. Effectively, where the residuals are:
- Non-stationary: no causal relationship
- Stationary: some evidence that suggests a causal relationship
CONSEQUENCES OF NON-STATIONARITY

Given the problems associated with non-stationary data series, it is of the utmost importance to identify the univariate properties of the data being employed in regression analysis.
UNIT ROOT TESTING

Tests available in EViews:
- Dickey-Fuller (DF)
- Augmented Dickey-Fuller (ADF)
- Phillips-Perron (PP)

Null hypothesis for all the above tests:
H0: $\rho = 1$ (series contains a unit root)
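Outside EViews, the ADF test is also available in statsmodels; a sketch (added, simulated data) on a random walk, where the unit-root null should typically not be rejected:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(8)
    y = np.cumsum(rng.standard_normal(200))   # I(1) by construction

    adf_stat, pvalue, *_ = adfuller(y)
    print(adf_stat, pvalue)    # large p-value: cannot reject H0 of a unit root

    adf_d, p_d, *_ = adfuller(np.diff(y))
    print(adf_d, p_d)          # differenced series: H0 typically rejected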
UNIT ROOT TESTING

Testing for the presence of unit roots is not straightforward:
- The underlying (unknown) d.g.p. may include a time trend
- The d.g.p. may contain MA terms in addition to being a simple AR process
- Small-sample bias: standard tests for unit roots are biased towards accepting the null of non-stationarity (low power of the test), i.e. concluding I(1) when the series is I(0)
- An undetected structural break may cause under-rejection of the null (too often concluding I(1))
- Quarterly data may be tested for seasonal unit roots in addition
STATIONARITY

Example: Consider a random walk process, $y_t = y_{t-1} + u_t$ with $u_t$ = white noise:
- It is difference stationary, since the difference is stationary: $y_t - y_{t-1} = (1 - L)y_t = u_t$
- It is also integrated, denoted I(d), where d is the order of integration
- Order of integration = number of unit roots contained in the series / number of differencing operations necessary to render the series stationary
- $y_t$ = I(1) above

Long-run models and short-run effects
DIFFERENCE-STATIONARITY vs. TREND-STATIONARITY

The presence of a stochastic trend (non-stationary) vs. a deterministic trend (stationary) complicates unit root testing.

Source: Harris 1995:18
COINTEGRATION - THE CONCEPT -

An econometric concept which mimics the existence of a long-run equilibrium among economic time series.

"A drunk and her dog"
COINTEGRATION - THE CONCEPT -

Data series, although non-stationary, can be combined (in a linear combination) into a single series which is itself stationary.
COINTEGRATION

Example:
$y_t \sim I(1)$; $x_t \sim I(1)$, with $u_t$ the residual from the regression of $y_t$ on $x_t$:
If $u_t \sim I(0)$, then the two series are cointegrated of order CI(1,1).
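A sketch of this setup (added; the construction of the shared trend is illustrative) using statsmodels' Engle-Granger style cointegration test:

    import numpy as np
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(9)
    trend = np.cumsum(rng.standard_normal(200))       # shared I(1) trend
    x = trend + 0.3 * rng.standard_normal(200)
    y = 2.0 * trend + 0.3 * rng.standard_normal(200)  # linked to the same trend

    t_stat, pvalue, crit = coint(y, x)   # H0: no cointegration
    print(t_stat, pvalue)                # small p-value suggests cointegration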
COINTEGRATION

Example: A cointegrated, stable (equilibrium) relationship possibly exists between $m_t$ and $p_t$ ($\hat{\beta}$ = 1.1085; t = 41.92).

Source: Harris 1995:22
COINTEGRATION

Consider, for example, a three-variable case:
$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + e_t$$
Now it is possible for the variables to be integrated of different orders and for the error term to be stationary ($e_t \sim I(0)$).
Suppose that $y_t \sim I(0)$, $x_{1t} \sim I(1)$ and $x_{2t} \sim I(1)$:
- One may suspect that $e_t \sim I(1)$
- However, there could exist a cointegrating vector $[\beta_1, \beta_2]$ such that $(\beta_1 x_{1t} + \beta_2 x_{2t}) \sim I(0)$
- If this is the case, $e_t$ will be stationary, since $y_t \sim I(0)$ and also $(\beta_1 x_{1t} + \beta_2 x_{2t}) \sim I(0)$
COINTEGRATION

For example:
- $w_t$ is I(1), $x_t$ is I(1), $y_t$ is I(2), $z_t$ is I(2)
- The linear combination $\tilde{w}_t = a_1 y_t + a_2 z_t$ may be I(1)
- The linear combination $\upsilon_t = \beta_1 w_t + \beta_2 x_t + \beta_3 (a_1 y_t + a_2 z_t)$ may then be I(0)
COINTEGRATION

Regression Model: $Y_t = \alpha + \beta X_t + \varepsilon_t$

Case 1: Both $Y_t$ and $X_t$ are stationary: classical OLS is valid
Case 2: $Y_t$ and $X_t$ are integrated of different orders: regression is meaningless
Case 3: $Y_t$ and $X_t$ are integrated of the same order, residuals non-stationary: SPURIOUS REGRESSION PROBLEM
Case 4: $Y_t$ and $X_t$ are integrated of the same order, residuals stationary: COINTEGRATION
COINTEGRATION: ECONOMIC INTERPRETATION

Cointegration implies:
- Two or more series are linked to form an equilibrium relationship spanning the long run (even if the series contain stochastic trends / are non-stationary)
- The series move closely together over time
- The difference between them is stable (stationary)
- Cointegration mimics the existence of a long-run equilibrium towards which the economic system converges over time
- $u_t$ may be called the disequilibrium error (the distance that the system is away from equilibrium at time t)
DEMONSTRATION: COINTEGRATION
Estimate a demand function for skilled labour in SA:
Long-run equation: LNS = f(LGDPFAC, LWSCPR)
DEMONSTRATION: COINTEGRATION

Cointegration test:

[Figure: residuals from the long-run equation (RES_COINT), 1970-2000]
RESIDUAL TESTS - NORMALITY -

JARQUE-BERA test statistic:
- Tests whether a variable is normally distributed
- Measures the difference of the skewness and kurtosis of a series from those of a normal distribution

$$JB = \frac{N - k}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right) \sim \chi^2(2)$$

with S = skewness, K = kurtosis, k = number of estimated coefficients

H0: residuals are normally distributed
Reject H0 if JB > $\chi^2(2)$ = 5.991 or if p ≤ 0.05
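A sketch (added) of computing JB from the sample skewness and kurtosis; note scipy's kurtosis with fisher=False returns K itself (normal = 3), and k = 0 here since no regression coefficients are estimated for a raw series:

    import numpy as np
    from scipy.stats import skew, kurtosis, chi2

    rng = np.random.default_rng(10)
    e = rng.standard_normal(500)      # "residuals", normal by construction

    S = skew(e)
    K = kurtosis(e, fisher=False)     # raw kurtosis (normal -> 3)
    N, k = len(e), 0                  # k = number of estimated coefficients
    JB = (N - k) / 6 * (S**2 + (K - 3)**2 / 4)
    print(JB, chi2.ppf(0.95, df=2))   # reject normality if JB > 5.991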
RESIDUAL TESTS - SERIAL CORRELATION -
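The output for this slide is not reproduced in the text; as a hedged sketch, the Breusch-Godfrey serial-correlation test cited in the earlier diagnostics table can be run in statsmodels like this (simulated data):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    rng = np.random.default_rng(11)
    x = rng.standard_normal(100)
    y = 1.0 + 2.0 * x + rng.standard_normal(100)

    res = sm.OLS(y, sm.add_constant(x)).fit()
    lm, lm_pval, f, f_pval = acorr_breusch_godfrey(res, nlags=2)
    print(lm, lm_pval)   # H0: no serial correlation up to lag 2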
RESIDUAL TESTS - HETEROSKEDASTICITY -

ENGLE'S ARCH LM test:
- Belongs to the class of asymptotic (large-sample), Lagrange Multiplier (LM) tests
- This specification of heteroskedasticity is motivated by the observation that for many financial time series, the magnitude of the residuals appears to be related to the magnitude of recent residuals
- Test statistic based on the auxiliary regression:

$$e_t^2 = \gamma_0 + \gamma_1 e_{t-1}^2 + \gamma_2 e_{t-2}^2 + \dots + \gamma_q e_{t-q}^2 + v_t, \qquad nR^2 \sim \chi^2(q)$$

H0: no ARCH up to order q in the residuals
Reject H0 if $nR^2 > \chi^2(q)$ or if p ≤ 0.05
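A sketch (added) of the ARCH LM test via statsmodels; het_arch runs exactly this kind of auxiliary regression on the squared residuals, with nlags playing the role of q (parameter name as in recent statsmodels versions):

    import numpy as np
    from statsmodels.stats.diagnostic import het_arch

    rng = np.random.default_rng(12)
    e = rng.standard_normal(200)     # residuals with no ARCH effects

    lm_stat, lm_pval, f_stat, f_pval = het_arch(e, nlags=4)
    print(lm_stat, lm_pval)  # H0: no ARCH up to order 4; expect a large p-value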
RESIDUAL TESTS - HETEROSKEDASTICITY -

WHITE'S HETEROSKEDASTICITY LM test:
- Belongs to the class of asymptotic (large-sample), Lagrange Multiplier (LM) tests
- Test statistic based on an auxiliary regression (of, say, $y_t = b_1 + b_2 x_t + b_3 z_t + u_t$):

$$e_t^2 = a_0 + a_1 x_t + a_2 z_t + a_3 x_t^2 + a_4 z_t^2 + a_5 x_t z_t + v_t, \qquad nR^2 \sim \chi^2(\text{number of slope coefficients in the test regression})$$

H0: no heteroskedasticity in the residuals
Reject H0 if $nR^2 > \chi^2$ or if p ≤ 0.05
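A sketch (added, simulated data) of White's test via statsmodels; het_white forms the auxiliary regression with squares and cross-products itself, and the exog passed in must include the constant:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    rng = np.random.default_rng(13)
    x = rng.standard_normal((100, 2))          # the x_t and z_t regressors
    X = sm.add_constant(x)
    y = 1.0 + x @ [0.5, -0.3] + rng.standard_normal(100)

    res = sm.OLS(y, X).fit()
    lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
    print(lm_stat, lm_pval)   # H0: homoskedastic errors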
STABILITY TESTS - RAMSEY RESET -

RAMSEY'S RESET (Regression Specification Error Test):
- A general test for the following types of misspecification:
  - Inclusion of irrelevant variables
  - Exclusion of relevant variables
- Test based on the augmented regression $y = X\beta + Z\gamma + u$
- H0: the equation is correctly specified
- The LR and F-test both test H0
- Reject H0 if p ≤ 0.05
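A sketch (added) of RESET via statsmodels; linear_reset augments the regression with powers of the fitted values and is available in recent statsmodels versions:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_reset

    rng = np.random.default_rng(14)
    x = rng.standard_normal(100)
    y = 1.0 + 2.0 * x + rng.standard_normal(100)  # correctly specified model

    res = sm.OLS(y, sm.add_constant(x)).fit()
    reset = linear_reset(res, power=2, use_f=True)
    print(reset)   # F statistic and p-value; H0: correct specification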