REGRESSION ANALYSIS - lecture


REGRESSION ANALYSIS
- MULTIPLE REGRESSION MODEL -

Basics of Regression Analysis

Multiple linear regression model with K explanatory variables:

Yt = β0 + β1X1t + β2X2t + … + βKXKt + εt   (t = 1, 2, …, n)

Deterministic component: β0 + β1X1t + … + βKXKt, built from the K explanatory variables

εt: stochastic / random component


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Standard errors give an indication of the significance of the parameters in the function.

Each different data set yields different estimates of the βs.

How do the estimated coefficients vary depending on the sample? The measure of variability is the standard error of the estimated coefficients: an estimate of the square root of the variance of the distribution of β̂.

The larger the standard error, the more the estimates will vary.

The standard error of the slope coefficient β̂ is given by:

SE(β̂) = SEE / √( Σt (Xt − X̄)² )   (t = 1, 2, …, n)


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

t-statistic (much easier to evaluate):

tk = β̂k / SE(β̂k)   (k = 1, 2, …, K)

Rule of thumb:

If |t-value| of coefficient βk > 1.96: variable k is significant

If |t-value| of coefficient βk < 1.96: drop variable k

OR use the t-table to derive a critical value to compare the t-statistic to:

Reject H0 (where H0: βk = 0) if |tk| > tcrit
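As an illustration (not from the original slides; data and names are hypothetical), a minimal Python sketch with statsmodels of reading coefficient t-statistics off a fitted multiple regression:

```python
# A minimal sketch: OLS on simulated data, then the t-statistics t_k = beta-hat_k / SE(beta-hat_k).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                            # two explanatory variables
y = 1.0 + 2.0 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.params)    # estimated coefficients (beta-hats)
print(res.bse)       # their standard errors SE(beta-hat)
print(res.tvalues)   # t_k = beta-hat_k / SE(beta-hat_k)
# Rule of thumb: keep x1 (|t| > 1.96); consider dropping x2 (|t| < 1.96).
```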


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

R² close to one: much of the variation in the dependent variable is explained.

Rule of thumb:

R² > 0.8: acceptable for time-series data; a lower value is acceptable for cross-section data.


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

F-statistic: tests the overall significance of the equation (joint significance)

F = [ Σi (Ŷi − Ȳ)² / K ] / [ Σi ei² / (n − K − 1) ]

Rule of thumb:

If F > 4: the coefficients are jointly significant

OR use the F-table to derive a critical value to compare the F-statistic to:

Reject H0 (where H0: β1 = β2 = … = βK = 0) if F > Fcrit
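Continuing the hypothetical statsmodels sketch above, the overall F-test and R² are reported on the same fitted model:

```python
# statsmodels computes the joint F-test for H0: beta_1 = ... = beta_K = 0.
print(res.fvalue)    # overall F-statistic
print(res.f_pvalue)  # reject H0 if p <= 0.05
print(res.rsquared)  # R-squared, the goodness-of-fit measure above
```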


DEMONSTRATION: SALES FUNCTION

    REGRESSION OUTPUT (Dependent variable: Sales Volume)


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (1)

Tests for the existence of first-order serial correlation.

First-order serial correlation occurs when the et's are correlated with each other.

Rule of thumb: DW close to 2 indicates no serial correlation; DW near 0 indicates positive serial correlation; DW near 4 indicates negative serial correlation.


EVALUATING THE GOODNESS OF FIT - STATISTICAL CRITERIA -

Durbin-Watson statistic (DW) (2)

Critical values indicating the upper (dU) and lower (dL) bounds for combinations of n (number of observations) and K' (number of independent variables excluding the constant) may be read off in the DW table.

6 possible results for the Durbin-Watson statistic:

VALUE OF DW                RESULT
(4 − dL) < DW < 4          Negative serial correlation
(4 − dU) < DW < (4 − dL)   Result undetermined
2 < DW < (4 − dU)          No serial correlation
dU < DW < 2                No serial correlation
dL < DW < dU               Result undetermined
0 < DW < dL                Positive serial correlation
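A minimal sketch of computing DW in Python, assuming `res` is the fitted OLS result from the earlier sketch; `durbin_watson` is statsmodels' implementation:

```python
from statsmodels.stats.stattools import durbin_watson

dw = durbin_watson(res.resid)   # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
print(dw)                       # ~2 suggests no first-order serial correlation
```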


STATIONARITY VERSUS NON-STATIONARITY

Stationary series: mean-reverting and constant variance

DGP (data-generating process): AR(1)   yt = ρyt−1 + vt,   vt ~ IID(0, σ²)

yt will be: Weakly stationary where |ρ| < 1

Non-stationary where ρ = 1

Explosive where ρ > 1

    Often we do not consider the latter a possible d.g.p.

    Stationarity


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated stationary series, y = 0.6·y(−1) + v]


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated non-stationary (random walk) series, y = y(−1) + v]


STATIONARITY VERSUS NON-STATIONARITY

[Figure: simulated explosive series, y = 1.1·y(−1) + v]
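The three panels above can be reproduced with a short simulation; a sketch in Python/NumPy under assumed parameters (seed, sample size, standard-normal shocks):

```python
# Simulate y_t = rho * y_{t-1} + v_t for the three cases on the slides:
# stationary (rho = 0.6), random walk (rho = 1), explosive (rho = 1.1).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
T = 90

def ar1(rho, T, rng):
    """Simulate an AR(1) path with y_0 = 0 and v_t ~ N(0, 1)."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal()
    return y

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, rho in zip(axes, (0.6, 1.0, 1.1)):
    ax.plot(ar1(rho, T, rng))
    ax.set_title(f"rho = {rho}")
plt.show()   # only the rho = 0.6 series keeps returning to its mean
```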


STATIONARITY VERSUS NON-STATIONARITY

Proof:

yt = ρyt−1 + vt, starting from y0:

y1 = ρy0 + v1

y2 = ρy1 + v2 = ρ(ρy0 + v1) + v2 = ρ²y0 + ρv1 + v2

y3 = ρy2 + v3 = ρ(ρ²y0 + ρv1 + v2) + v3 = ρ³y0 + ρ²v1 + ρv2 + v3


STATIONARITY VERSUS NON-STATIONARITY

Therefore

y3 = ρ³y0 + Σ(i=0…2) ρⁱ v3−i

and, in general,

yt = ρᵗy0 + Σ(i=0…t−1) ρⁱ vt−i


STATIONARITY VERSUS NON-STATIONARITY

Conclusion: What are the properties of stationary data?

E(yt) = E(yt+h) = μ < ∞, i.e. constant finite mean over time

E(yt²) = E(yt+h²) = σ² < ∞, i.e. constant finite variance over time

E(yt yt−j) = E(yt+h yt+h−j) = γj < ∞, i.e. constant finite covariance over time


STATIONARITY VERSUS NON-STATIONARITY

Properties of stationary vs. non-stationary data

                                  Stationary          Non-stationary
Variance                          Finite              Unbounded, grows as tσ²
Memory                            Temporary           Permanent
Expected time between             Finite              Infinite
crossings of the mean             (mean reverting)    (mean is changing over time)


STATIONARITY VERSUS NON-STATIONARITY

How are data series transformed?

Stationary variables are obtained by differencing (not necessarily only once) the non-stationary series. Such a DGP is referred to as difference stationary.

yt = yt−1 + vt   (ρ = 1)

Δyt = yt − yt−1 = vt

yt ~ I(1)


General:

A variable is called I(p) if it must be differenced p times in order to render it stationary.

We can also say the series contains p unit roots.

For yt = ρyt−1 + ut with ut = white noise:

If |ρ| < 1, yt is already stationary: yt ~ I(0); if ρ = 1, yt contains a unit root and must be differenced once: yt ~ I(1)
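As an illustration of the order of integration (not from the slides; names are hypothetical), a minimal NumPy sketch: a random walk needs exactly one difference to become stationary.

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(size=200)       # white noise
y = np.cumsum(v)               # y_t = y_{t-1} + v_t: one unit root, y ~ I(1)
dy = np.diff(y)                # first difference: delta-y_t = v_t ~ I(0)
print(np.allclose(dy, v[1:]))  # True: differencing recovers the noise exactly
```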


TRANSFORMING A SERIES I(1)

[Figure: the simulated I(1) series y]


TRANSFORMING A SERIES I(2)

[Figure: CPI in levels, 1970–2000; D_LCPI (first difference of log CPI); and DD_LCPI (second difference of log CPI)]


    CONSEQUENCES OF NON-STATIONARITY

Non-stationary data series could result in spurious regression.

Example:

Two non-stationary data series (y and x)

By construction they have nothing in common

yt = α1 + yt−1 + vt,   vt ~ IID(0, σv²)

xt = α2 + xt−1 + ut,   ut ~ IID(0, σu²)
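A minimal sketch of this construction in Python (statsmodels, simulated data; the drift, seed and sample size of 95 — matching the EViews output below — are assumptions):

```python
# Two independent random walks with drift, then a spurious regression of y on x:
# R-squared is deceptively high and the DW statistic is far below 2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 95
y = np.cumsum(0.5 + rng.normal(size=n))   # random walk with drift
x = np.cumsum(0.5 + rng.normal(size=n))   # independent random walk with drift

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.rsquared)              # typically very high despite no true relationship
print(durbin_watson(res.resid))  # far below 2: residuals are non-stationary
```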


CONSEQUENCES OF NON-STATIONARITY

Dependent Variable: Y
Method: Least Squares
Sample: 1901 1995
Included observations: 95

Variable             Coefficient   Std. Error              t-Statistic   Prob.
X                    5.494891      0.091884                59.80278      0.0000

R-squared            0.895314      Mean dependent var      51.97351
Adjusted R-squared   0.895314      S.D. dependent var      29.73505
S.E. of regression   9.620824      Akaike info criterion   7.376207
Sum squared resid    8700.663      Schwarz criterion       7.40309
Log likelihood       -349.3698     Durbin-Watson stat      0.059368


CONSEQUENCES OF NON-STATIONARITY

[Figure: Residual, Actual and Fitted series from the spurious regression of Y on X]


    CONSEQUENCES OF NON-STATIONARITY

Test              Test Statistic   p-value    Conclusion
Jarque-Bera       2.4383           [0.2955]   Errors are normally distributed
ARCH LM           78.8189          [0.0000]   Heteroscedasticity
White             30.8526          [0.0000]   Heteroscedasticity
Breusch-Godfrey   89.5272          [0.0000]   Serial correlation
Ljung-Box         -0.35            [0.0000]   Serial correlation
Ramsey RESET      9.1089           [0.0105]   Misspecification


    CONSEQUENCES OF NON-STATIONARITY

This non-stationarity of the residuals provides us with important information concerning the relationship between yt and xt.

Effectively, where the residuals are:

Non-stationary: no causal relationship

Stationary: some evidence suggesting a causal relationship


    CONSEQUENCES OF NON-STATIONARITY

Given the problems associated with non-stationary data series, it is of the utmost importance to identify the univariate properties of the data being employed in regression analysis.


UNIT ROOT TESTING

Tests available in EViews:

Dickey-Fuller (DF)

Augmented Dickey-Fuller (ADF)

Phillips-Perron (PP)

Null hypothesis for all the above tests:

H0: ρ = 1 (the series contains a unit root)

Unit root testing
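A hedged sketch of an ADF test in Python, using statsmodels' `adfuller` on the simulated random walk `y` from the spurious-regression example above:

```python
from statsmodels.tsa.stattools import adfuller

adf_stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(adf_stat, pvalue)   # H0: series contains a unit root (rho = 1)
print(crit)               # critical values at 1%, 5%, 10%
# p > 0.05 here: cannot reject H0, consistent with y ~ I(1).
# Re-running on np.diff(y) should reject H0: the difference is stationary.
```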


UNIT ROOT TESTING

Testing for the presence of unit roots is not straightforward:

The underlying (unknown) d.g.p. may include a time trend

The d.g.p. may contain MA terms in addition to being a simple AR process

Small-sample bias: standard unit root tests are biased towards accepting the null of non-stationarity (low power of the test), i.e. concluding I(1) when the series is I(0)

An undetected structural break may cause under-rejection of the null (too often concluding I(1))

Quarterly data may in addition be tested for seasonal unit roots


Example: Consider a random walk process, yt = yt−1 + ut with ut = white noise:

It is difference stationary, since the difference is stationary:

Δyt = yt − yt−1 = (1 − L)yt = ut

It is also integrated, denoted I(d), where d is the order of integration.

Order of integration = number of unit roots contained in the series / number of differencing operations necessary to render the series stationary

yt ~ I(1) above

    Long-run models and short-run effects


DIFFERENCE-STATIONARITY vs. TREND-STATIONARITY

The presence of a stochastic trend (non-stationary) vs. a deterministic trend (stationary) complicates unit root testing.

Source: Harris 1995:18


COINTEGRATION - THE CONCEPT -

An econometric concept which mimics the existence of a long-run equilibrium among economic time series

A drunk and her dog


COINTEGRATION - THE CONCEPT -

Data series, although non-stationary, can be combined (via a linear combination) into a single series which is itself stationary


COINTEGRATION

Example

yt ~ I(1); xt ~ I(1), with yt = βxt + ut:

If ut ~ I(0), then the two series are cointegrated of order CI(1,1)


COINTEGRATION

Example: A cointegrated, stable (equilibrium) relationship possibly exists between mt and pt (β̂ = 1.1085; t = 41.92)

Source: Harris 1995:22



    COINTEGRATION

Consider, for example, a three-variable case:

yt = β1x1t + β2x2t + et

Now it is possible for the variables to be integrated of different orders and for the error term to be stationary (et ~ I(0)).

Suppose that yt ~ I(0), x1t ~ I(1) and x2t ~ I(1)

One may suspect that et ~ I(1)

However, there could exist a cointegrating vector [β1, β2] such that (β1x1t + β2x2t) ~ I(0)

If this is the case, et will be stationary, since yt ~ I(0) and also (β1x1t + β2x2t) ~ I(0).


COINTEGRATION

For example:

wt is I(1); xt is I(1); yt is I(2); zt is I(2)

A linear combination w̃t = a1yt + a2zt may be I(1)

Then the linear combination εt = β1wt + β2xt + β3(a1yt + a2zt) may be I(0)


Regression Model: Yt = α + βXt + εt

Case 1: Both Yt and Xt are stationary: classical OLS is valid

Case 2: Yt and Xt are integrated of different orders: regression is meaningless

Case 3: Yt and Xt are integrated of the same order and the residuals are non-stationary: SPURIOUS REGRESSION PROBLEM

Case 4: Yt and Xt are integrated of the same order and the residuals are stationary: COINTEGRATION
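A minimal sketch of checking case 3 vs. case 4 in Python, using statsmodels' Engle-Granger-style `coint` test on the simulated `y` and `x` from the spurious-regression example:

```python
from statsmodels.tsa.stattools import coint

t_stat, pvalue, crit = coint(y, x)   # H0: no cointegration
print(t_stat, pvalue)
# p <= 0.05: residuals of the y-on-x regression look stationary (case 4,
# cointegration); otherwise the regression is likely spurious (case 3).
```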


COINTEGRATION: ECONOMIC INTERPRETATION

Cointegration implies:

Two or more series are linked to form an equilibrium relationship spanning the long run (even if the series contain stochastic trends / are non-stationary).

The series move closely together over time

The difference between them is stable (stationary)

Cointegration mimics the existence of a long-run equilibrium towards which an economic system converges over time

ut may be called the disequilibrium error (the distance that the system is away from equilibrium at time t)


    DEMONSTRATION: COINTEGRATION

Estimate a demand function for skilled labour in SA:

    Long-run equation: LNS = f(LGDPFAC, LWSCPR)



    DEMONSTRATION: COINTEGRATION

    Cointegration test:

[Figure: RES_COINT — residuals from the long-run equation, 1970–2000]


RESIDUAL TESTS - NORMALITY -

JARQUE-BERA test statistic:

Tests whether a variable is normally distributed

Measures the difference of the skewness and kurtosis of a series from those of a normal distribution

JB = ((N − k) / 6) · [ S² + (K − 3)² / 4 ] ~ χ²(2)

with S = skewness, K = kurtosis, k = # estimated coefficients

H0: residuals are normally distributed

Reject H0 if JB > χ²(2) = 5.991 or if p ≤ 0.05
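A minimal sketch of the JB test in Python, assuming `res` is a fitted statsmodels OLS result from the earlier sketches:

```python
from scipy import stats

jb_stat, jb_pvalue = stats.jarque_bera(res.resid)
print(jb_stat, jb_pvalue)
# Reject H0 (normal residuals) if jb_stat > 5.991 (chi-squared(2) at 5%)
# or equivalently if jb_pvalue <= 0.05.
```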


RESIDUAL TESTS - SERIAL CORRELATION -
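The test output on this slide is not recoverable, but the diagnostics table earlier lists Breusch-Godfrey as the serial-correlation test; a hedged sketch with statsmodels, reusing the fitted result `res` from the earlier sketches (the lag order of 2 is an assumption):

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pvalue)   # H0: no serial correlation up to lag 2
# Reject H0 if lm_pvalue <= 0.05.
```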


RESIDUAL TESTS - HETEROSKEDASTICITY -

ENGLE'S ARCH LM-test:

Belongs to the class of asymptotic (large-sample) Lagrange Multiplier (LM) tests

This specification of heteroskedasticity is motivated by the observation that for many financial time series, the magnitude of the residuals appears to be related to the magnitude of recent residuals

Test statistic based on the auxiliary regression:

e²t = α0 + α1e²t−1 + α2e²t−2 + … + αq e²t−q + vt,   with nR² ~ χ²(q)

H0: No ARCH up to order q in the residuals

Reject H0 if nR² > χ²(q) or if p ≤ 0.05
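A minimal sketch of the ARCH LM test in Python, with q = 4 an assumed lag order and `res` the fitted OLS result from the earlier sketches:

```python
from statsmodels.stats.diagnostic import het_arch

lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(res.resid, nlags=4)
print(lm_stat, lm_pvalue)   # H0: no ARCH up to order 4
# Reject H0 if lm_pvalue <= 0.05.
```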




RESIDUAL TESTS - HETEROSKEDASTICITY -

WHITE'S HETEROSKEDASTICITY LM-test:

Belongs to the class of asymptotic (large-sample) Lagrange Multiplier (LM) tests

Test statistic based on an auxiliary regression (of, say, yt = b1 + b2xt + b3zt + ut):

e²t = a0 + a1xt + a2zt + a3x²t + a4z²t + a5xt zt + vt,   with nR² ~ χ²(# slope coefficients in the test regression)

H0: No heteroskedasticity

Reject H0 if nR² > χ²crit or if p ≤ 0.05
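A minimal sketch of White's test in Python; statsmodels' `het_white` builds the squares and cross-products of the regressors automatically, and `res` is again the fitted result from the earlier sketches:

```python
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
print(lm_stat, lm_pvalue)   # H0: homoskedastic errors
# Reject H0 if lm_pvalue <= 0.05.
```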


STABILITY TESTS - RAMSEY RESET -

RAMSEY'S RESET (Regression Specification Error Test):

General test for the following types of misspecification:

Inclusion of irrelevant variables

Exclusion of relevant variables

Test based on the augmented regression y = Xβ + Zγ + u

H0: the equation is correctly specified

The LR and F-tests both test H0

Reject H0 if p ≤ 0.05
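A minimal sketch of a RESET-style test in Python; statsmodels' `linear_reset` augments the regression with powers of the fitted values (here up to the square, an assumed choice) and tests their joint significance:

```python
from statsmodels.stats.diagnostic import linear_reset

reset = linear_reset(res, power=2, use_f=True)
print(reset.fvalue, reset.pvalue)   # H0: equation is correctly specified
# Reject H0 (suspect misspecification) if pvalue <= 0.05.
```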
