REGRESSION ANALYSIS - lecture


REGRESSION ANALYSIS
- MULTIPLE REGRESSION MODEL -

Basics of Regression Analysis

Multiple linear regression model with K explanatory variables:

Yt = β0 + β1X1t + β2X2t + … + βKXKt + εt   (t = 1, 2, …, n)

Deterministic component: β0 + β1X1t + … + βKXKt, built from the K explanatory variables

εt: stochastic / random component


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Standard errors give an indication of the significance of the parameters in the function.

Each different data set yields different estimates of the βs.

How do the estimated coefficients vary depending on the sample? The measure of variability is the standard error of the estimated coefficients: an estimate of the square root of the variance of the distribution of β̂.

The larger the standard error, the more the estimates will vary.

The standard error of the slope coefficient β̂ is given by:

SE(β̂) = SEE / √( Σt (Xt − X̄)² )   (t = 1, 2, …, n)


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

t-statistic (much easier to evaluate):

tk = β̂k / SE(β̂k)   (k = 1, 2, …, K)

Rule of thumb:

If |t-value| of coefficient βk > 1.96: variable k is significant

If |t-value| of coefficient βk < 1.96: drop variable k

OR use the t-table to derive a critical value to compare the t-statistic to:

Reject H0 (where H0: βk = 0) if |tk| > tcrit
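As an illustration (not from the original slides; data and names are hypothetical), a minimal Python sketch with statsmodels of reading coefficient t-statistics off a fitted multiple regression:

```python
# A minimal sketch: OLS on simulated data, then the t-statistics t_k = beta-hat_k / SE(beta-hat_k).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                            # two explanatory variables
y = 1.0 + 2.0 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.params)    # estimated coefficients (beta-hats)
print(res.bse)       # their standard errors SE(beta-hat)
print(res.tvalues)   # t_k = beta-hat_k / SE(beta-hat_k)
# Rule of thumb: keep x1 (|t| > 1.96); consider dropping x2 (|t| < 1.96).
```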


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

R² close to one: much of the variation in the dependent variable is explained.

Rule of thumb:

R² > 0.8: acceptable for time-series data; a lower value is acceptable for cross-section data.


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

F-statistic: tests the overall significance of the equation (joint significance)

F = [ Σi (Ŷi − Ȳ)² / K ] / [ Σi ei² / (n − K − 1) ]

Rule of thumb:

If F > 4: the coefficients are jointly significant

OR use the F-table to derive a critical value to compare the F-statistic to:

Reject H0 (where H0: β1 = β2 = … = βK = 0) if F > Fcrit
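Continuing the hypothetical statsmodels sketch above, the overall F-test and R² are reported on the same fitted model:

```python
# statsmodels computes the joint F-test for H0: beta_1 = ... = beta_K = 0.
print(res.fvalue)    # overall F-statistic
print(res.f_pvalue)  # reject H0 if p <= 0.05
print(res.rsquared)  # R-squared, the goodness-of-fit measure above
```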


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (1)

Tests for the existence of first-order serial correlation.

First-order serial correlation occurs when the et's are correlated with each other.

Rule of thumb: DW close to 2 indicates no serial correlation; DW near 0 indicates positive serial correlation; DW near 4 indicates negative serial correlation.


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (2)

Critical values indicating the upper (dU) and lower (dL) bounds for combinations of n (number of observations) and K' (number of independent variables excluding the constant) may be read off in the DW table.

6 possible results for the Durbin-Watson statistic:

VALUE OF DW                RESULT
(4 − dL) < DW < 4          Negative serial correlation
(4 − dU) < DW < (4 − dL)   Result undetermined
2 < DW < (4 − dU)          No serial correlation
dU < DW < 2                No serial correlation
dL < DW < dU               Result undetermined
0 < DW < dL                Positive serial correlation
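A minimal sketch of computing DW in Python, assuming `res` is the fitted OLS result from the earlier sketch; `durbin_watson` is statsmodels' implementation:

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(res.resid)   # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
print(dw)                       # ~2 suggests no first-order serial correlation
```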


STATIONARITY VERSUS NON-STATIONARITY

Stationary series: mean-reverting and constant variance

DGP (data-generating process): AR(1)   yt = ρyt−1 + vt,   vt ~ IID(0, σ²)

yt will be: Weakly stationary where |ρ| < 1

Non-stationary where ρ = 1

Explosive where ρ > 1

    Often we do not consider the latter a possible d.g.p.

    Stationarity


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated stationary series, y = 0.6·y(−1) + v]


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated non-stationary (random walk) series, y = y(−1) + v]


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated explosive series, y = 1.1·y(−1) + v]
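The three panels above can be reproduced with a short simulation; a sketch in Python/NumPy under assumed parameters (seed, sample size, standard-normal shocks):

```python
# Simulate y_t = rho * y_{t-1} + v_t for the three cases on the slides:
# stationary (rho = 0.6), random walk (rho = 1), explosive (rho = 1.1).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
T = 90

def ar1(rho, T, rng):
    """Simulate an AR(1) path with y_0 = 0 and v_t ~ N(0, 1)."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal()
    return y

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, rho in zip(axes, (0.6, 1.0, 1.1)):
    ax.plot(ar1(rho, T, rng))
    ax.set_title(f"rho = {rho}")
plt.show()   # only the rho = 0.6 series keeps returning to its mean
```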


STATIONARITY VERSUS NON-STATIONARITY

Proof:

yt = ρyt−1 + vt, starting from y0:

y1 = ρy0 + v1

y2 = ρy1 + v2 = ρ(ρy0 + v1) + v2 = ρ²y0 + ρv1 + v2

y3 = ρy2 + v3 = ρ(ρ²y0 + ρv1 + v2) + v3 = ρ³y0 + ρ²v1 + ρv2 + v3


STATIONARITY VERSUS NON-STATIONARITY

Therefore

y3 = ρ³y0 + Σ(i=0…2) ρⁱ v3−i

and, in general,

yt = ρᵗy0 + Σ(i=0…t−1) ρⁱ vt−i


STATIONARITY VERSUS NON-STATIONARITY

Conclusion: What are the properties of stationary data?

E(yt) = E(yt+h) = μ < ∞, i.e. constant finite mean over time

E(yt²) = E(yt+h²) = σ² < ∞, i.e. constant finite variance over time

E(yt yt−j) = E(yt+h yt+h−j) = γj < ∞, i.e. constant finite covariance over time


STATIONARITY VERSUS NON-STATIONARITY

Properties of stationary vs. non-stationary data

                                  Stationary          Non-stationary
Variance                          Finite              Unbounded, grows as tσ²
Memory                            Temporary           Permanent
Expected time between             Finite              Infinite
crossings of the mean             (mean reverting)    (mean is changing over time)


STATIONARITY VERSUS NON-STATIONARITY

How are data series transformed?

Stationary variables are obtained by differencing (not necessarily only once) the non-stationary series. Such a DGP is referred to as difference stationary.

yt = yt−1 + vt   (ρ = 1)

Δyt = yt − yt−1 = vt

yt ~ I(1)


General:

A variable is called I(p) if it must be differenced p times in order to render it stationary.

We can also say the series contains p unit roots.

For yt = ρyt−1 + ut with ut = white noise:

If |ρ| < 1, yt is already stationary: yt ~ I(0); if ρ = 1, yt contains a unit root and must be differenced once: yt ~ I(1)
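As an illustration of the order of integration (not from the slides; names are hypothetical), a minimal NumPy sketch: a random walk needs exactly one difference to become stationary.

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=200)       # white noise
y = np.cumsum(v)               # y_t = y_{t-1} + v_t: one unit root, y ~ I(1)
dy = np.diff(y)                # first difference: delta-y_t = v_t ~ I(0)
print(np.allclose(dy, v[1:]))  # True: differencing recovers the noise exactly
```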


TRANSFORMING A SERIES I(1)

[Figure: the simulated I(1) series y]


TRANSFORMING A SERIES I(2)

[Figure: CPI in levels, 1970–2000; D_LCPI (first difference of log CPI); and DD_LCPI (second difference of log CPI)]


    CONSEQUENCES OF NON-STATIONARITY

Non-stationary data series could result in spurious regression.

Example:

Two non-stationary data series (y and x)

By construction they have nothing in common

yt = α1 + yt−1 + vt,   vt ~ IID(0, σv²)

xt = α2 + xt−1 + ut,   ut ~ IID(0, σu²)
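A minimal sketch of this construction in Python (statsmodels, simulated data; the drift, seed and sample size of 95 — matching the EViews output below — are assumptions):

```python
# Two independent random walks with drift, then a spurious regression of y on x:
# R-squared is deceptively high and the DW statistic is far below 2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 95
y = np.cumsum(0.5 + rng.normal(size=n))   # random walk with drift
x = np.cumsum(0.5 + rng.normal(size=n))   # independent random walk with drift

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.rsquared)              # typically very high despite no true relationship
print(durbin_watson(res.resid))  # far below 2: residuals are non-stationary
```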


CONSEQUENCES OF NON-STATIONARITY

Dependent Variable: Y
Method: Least Squares
Sample: 1901 1995
Included observations: 95

Variable             Coefficient   Std. Error              t-Statistic   Prob.
X                    5.494891      0.091884                59.80278      0.0000

R-squared            0.895314      Mean dependent var      51.97351
Adjusted R-squared   0.895314      S.D. dependent var      29.73505
S.E. of regression   9.620824      Akaike info criterion   7.376207
Sum squared resid    8700.663      Schwarz criterion       7.40309
Log likelihood       -349.3698     Durbin-Watson stat      0.059368


CONSEQUENCES OF NON-STATIONARITY

[Figure: Residual, Actual and Fitted series from the spurious regression of Y on X]


    CONSEQUENCES OF NON-STATIONARITY

Test              Test Statistic   p-value    Conclusion
Jarque-Bera       2.4383           [0.2955]   Errors are normally distributed
ARCH LM           78.8189          [0.0000]   Heteroscedasticity
White             30.8526          [0.0000]   Heteroscedasticity
Breusch-Godfrey   89.5272          [0.0000]   Serial correlation
Ljung-Box         -0.35            [0.0000]   Serial correlation
Ramsey RESET      9.1089           [0.0105]   Misspecification


    CONSEQUENCES OF NON-STATIONARITY

This non-stationarity of the residuals provides us with important information concerning the relationship between yt and xt.

Effectively, where the residuals are:

Non-stationary: no causal relationship

Stationary: some evidence suggesting a causal relationship


    CONSEQUENCES OF NON-STATIONARITY

Given the problems associated with non-stationary data series, it is of the utmost importance to identify the univariate properties of the data being employed in regression analysis.


UNIT ROOT TESTING

Tests available in EViews:

Dickey-Fuller (DF)

Augmented Dickey-Fuller (ADF)

Phillips-Perron (PP)

Null hypothesis for all the above tests:

H0: ρ = 1 (the series contains a unit root)

Unit root testing
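A hedged sketch of an ADF test in Python, using statsmodels' `adfuller` on the simulated random walk `y` from the spurious-regression example above:

```python
from statsmodels.tsa.stattools import adfuller

adf_stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(adf_stat, pvalue)   # H0: series contains a unit root (rho = 1)
print(crit)               # critical values at 1%, 5%, 10%
# p > 0.05 here: cannot reject H0, consistent with y ~ I(1).
# Re-running on np.diff(y) should reject H0: the difference is stationary.
```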


UNIT ROOT TESTING

Testing for the presence of unit roots is not straightforward:

The underlying (unknown) d.g.p. may include a time trend

The d.g.p. may contain MA terms in addition to being a simple AR process

Small-sample bias: standard unit root tests are biased towards accepting the null of non-stationarity (low power of the test), i.e. concluding I(1) when the series is I(0)

An undetected structural break may cause under-rejection of the null (too often concluding I(1))

Quarterly data may in addition be tested for seasonal unit roots


Example: Consider a random walk process, yt = yt−1 + ut with ut = white noise:

It is difference stationary, since the difference is stationary:

Δyt = yt − yt−1 = (1 − L)yt = ut

It is also integrated, denoted I(d), where d is the order of integration.

Order of integration = number of unit roots contained in the series / number of differencing operations necessary to render the series stationary

yt ~ I(1) above

    Long-run models and short-run effects


DIFFERENCE-STATIONARITY vs. TREND-STATIONARITY

The presence of a stochastic trend (non-stationary) vs. a deterministic trend (stationary) complicates unit root testing.

Source: Harris 1995:18


COINTEGRATION - THE CONCEPT -

An econometric concept which mimics the existence of a long-run equilibrium among economic time series

A drunk and her dog


COINTEGRATION - THE CONCEPT -

Data series, although non-stationary, can be combined (via a linear combination) into a single series which is itself stationary


COINTEGRATION

Example

yt ~ I(1); xt ~ I(1), with yt = βxt + ut:

If ut ~ I(0), then the two series are cointegrated of order CI(1,1)


COINTEGRATION

Example: A cointegrated, stable (equilibrium) relationship possibly exists between mt and pt (β̂ = 1.1085; t = 41.92)

Source: Harris 1995:22



    COINTEGRATION

Consider, for example, a three-variable case:

yt = β1x1t + β2x2t + et

Now it is possible for the variables to be integrated of different orders and for the error term to be stationary (et ~ I(0)).

Suppose that yt ~ I(0), x1t ~ I(1) and x2t ~ I(1)

One may suspect that et ~ I(1)

However, there could exist a cointegrating vector [β1, β2] such that (β1x1t + β2x2t) ~ I(0)

If this is the case, et will be stationary, since yt ~ I(0) and also (β1x1t + β2x2t) ~ I(0).


COINTEGRATION

For example:

wt is I(1); xt is I(1); yt is I(2); zt is I(2)

A linear combination w̃t = a1yt + a2zt may be I(1)

Then the linear combination εt = β1wt + β2xt + β3(a1yt + a2zt) may be I(0)


Regression Model: Yt = α + βXt + εt

Case 1: Both Yt and Xt are stationary: classical OLS is valid

Case 2: Yt and Xt are integrated of different orders: regression is meaningless

Case 3: Yt and Xt are integrated of the same order and the residuals are non-stationary: SPURIOUS REGRESSION PROBLEM

Case 4: Yt and Xt are integrated of the same order and the residuals are stationary: COINTEGRATION
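A minimal sketch of checking case 3 vs. case 4 in Python, using statsmodels' Engle-Granger-style `coint` test on the simulated `y` and `x` from the spurious-regression example:

```python
from statsmodels.tsa.stattools import coint

t_stat, pvalue, crit = coint(y, x)   # H0: no cointegration
print(t_stat, pvalue)
# p <= 0.05: residuals of the y-on-x regression look stationary (case 4,
# cointegration); otherwise the regression is likely spurious (case 3).
```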


COINTEGRATION: ECONOMIC INTERPRETATION

Cointegration implies:

Two or more series are linked to form an equilibrium relationship spanning the long run (even if the series contain stochastic trends / are non-stationary).

The series move closely together over time

The difference between them is stable (stationary)

Cointegration mimics the existence of a long-run equilibrium towards which an economic system converges over time

ut may be called the disequilibrium error (the distance that the system is away from equilibrium at time t)


    DEMONSTRATION: COINTEGRATION

Estimate a demand function for skilled labour in SA:

    Long-run equation: LNS = f(LGDPFAC, LWSCPR)



    DEMONSTRATION: COINTEGRATION

    Cointegration test:

[Figure: RES_COINT — residuals from the long-run equation, 1970–2000]


RESIDUAL TESTS - NORMALITY -

JARQUE-BERA test statistic:

Tests whether a variable is normally distributed

Measures the difference of the skewness and kurtosis of a series from those of a normal distribution

JB = ((N − k) / 6) · [ S² + (K − 3)² / 4 ] ~ χ²(2)

with S = skewness, K = kurtosis, k = # estimated coefficients

H0: residuals are normally distributed

Reject H0 if JB > χ²(2) = 5.991 or if p ≤ 0.05
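A minimal sketch of the JB test in Python, assuming `res` is a fitted statsmodels OLS result from the earlier sketches:

```python
from scipy import stats

jb_stat, jb_pvalue = stats.jarque_bera(res.resid)
print(jb_stat, jb_pvalue)
# Reject H0 (normal residuals) if jb_stat > 5.991 (chi-squared(2) at 5%)
# or equivalently if jb_pvalue <= 0.05.
```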


RESIDUAL TESTS - SERIAL CORRELATION -
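The test output on this slide is not recoverable, but the diagnostics table earlier lists Breusch-Godfrey as the serial-correlation test; a hedged sketch with statsmodels, reusing the fitted result `res` from the earlier sketches (the lag order of 2 is an assumption):

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pvalue)   # H0: no serial correlation up to lag 2
# Reject H0 if lm_pvalue <= 0.05.
```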


RESIDUAL TESTS - HETEROSKEDASTICITY -

ENGLE'S ARCH LM-test:

Belongs to the class of asymptotic (large-sample) Lagrange Multiplier (LM) tests

This specification of heteroskedasticity is motivated by the observation that for many financial time series, the magnitude of the residuals appears to be related to the magnitude of recent residuals

Test statistic based on the auxiliary regression:

e²t = α0 + α1e²t−1 + α2e²t−2 + … + αq e²t−q + vt,   with nR² ~ χ²(q)

H0: No ARCH up to order q in the residuals

Reject H0 if nR² > χ²(q) or if p ≤ 0.05
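A minimal sketch of the ARCH LM test in Python, with q = 4 an assumed lag order and `res` the fitted OLS result from the earlier sketches:

```python
from statsmodels.stats.diagnostic import het_arch

lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(res.resid, nlags=4)
print(lm_stat, lm_pvalue)   # H0: no ARCH up to order 4
# Reject H0 if lm_pvalue <= 0.05.
```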




RESIDUAL TESTS - HETEROSKEDASTICITY -

WHITE'S HETEROSKEDASTICITY LM-test:

Belongs to the class of asymptotic (large-sample) Lagrange Multiplier (LM) tests

Test statistic based on an auxiliary regression (of, say, yt = b1 + b2xt + b3zt + ut):

e²t = a0 + a1xt + a2zt + a3x²t + a4z²t + a5xt zt + vt,   with nR² ~ χ²(# slope coefficients in the test regression)

H0: No heteroskedasticity

Reject H0 if nR² > χ²crit or if p ≤ 0.05
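A minimal sketch of White's test in Python; statsmodels' `het_white` builds the squares and cross-products of the regressors automatically, and `res` is again the fitted result from the earlier sketches:

```python
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
print(lm_stat, lm_pvalue)   # H0: homoskedastic errors
# Reject H0 if lm_pvalue <= 0.05.
```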


STABILITY TESTS - RAMSEY RESET -

RAMSEY'S RESET (Regression Specification Error Test):

General test for the following types of misspecification:

Inclusion of irrelevant variables

Exclusion of relevant variables

Test based on the augmented regression y = Xβ + Zγ + u

H0: the equation is correctly specified

The LR and F-tests both test H0

Reject H0 if p ≤ 0.05
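A minimal sketch of a RESET-style test in Python; statsmodels' `linear_reset` augments the regression with powers of the fitted values (here up to the square, an assumed choice) and tests their joint significance:

```python
from statsmodels.stats.diagnostic import linear_reset

reset = linear_reset(res, power=2, use_f=True)
print(reset.fvalue, reset.pvalue)   # H0: equation is correctly specified
# Reject H0 (suspect misspecification) if pvalue <= 0.05.
```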
