DADM-Correlation and Regression


ACE INSTITUTE OF MANAGEMENT
MASTER OF BUSINESS ADMINISTRATION (MBAE)

Semester: III

Credits: 2

Course Name: DATA ANALYSIS AND DECISION MODELING

Effective Date: April 2011

Class Schedule: Wednesday and Thursday

Time: 6:00-9:00 P.M.


    CORRELATION ANALYSIS

PURPOSE OF CORRELATION ANALYSIS

The population correlation coefficient ρ (rho) is used to measure the strength of the association between the variables.

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

CORRELATION

Correlation is the mutual relationship between two or more variables.

Variables under consideration are said to be correlated if a change in one variable tends to be accompanied by a change in another variable.

Examples: height and weight of persons; weight and blood pressure; price and supply; demand for a commodity; sales of a company and the earnings per share or price-earnings ratio of its stock; income and house value.

We are interested in what kind of relationship exists and in the degree (strength) of the relationship between the variables.

TYPES OF CORRELATION

Positive and negative

Simple correlation

Partial correlation

Multiple correlation

Linear and non-linear

MEASUREMENT OF CORRELATION

SCATTER DIAGRAM METHOD: a scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.

KARL PEARSON'S COEFFICIENT OF CORRELATION

RANK METHOD: used for finding the coefficient of correlation for qualitative characteristics such as beauty, intelligence, honesty, etc.

SCATTER DIAGRAM METHOD

The scatter shows the joint variation among the pairs of values and gives an idea of the degree and direction of the relationship between the variables x and y.

The greater the scatter of the points over the graph, the weaker the relationship between the variables.

If all the points lie on a straight line, there is either perfect positive or perfect negative correlation.

The nearer the points are to a straight line, the higher the degree of correlation; the farther the points are from a straight line, the lower the degree of correlation.

If the points are widely scattered and no trend is revealed, the variables may be uncorrelated.

A scatter diagram does not provide an exact measure of the extent of the relationship between the variables.

GRAPHICAL EXPLORATION: SCATTER PLOT (the collection of dots corresponding to (xi, yi))

PERFECT POSITIVE CORRELATION

[Scatter plot: all (x, y) points lie on an upward-sloping straight line; r = 1]

PERFECT NEGATIVE CORRELATION

[Scatter plot: all (x, y) points lie on a downward-sloping straight line; r = -1]

EXAMPLES OF r VALUES:

EXAMPLE

The independent variable in this example is the number of hours studied. The grade the student receives is the dependent variable: the grade a student receives depends upon the number of hours he or she studies. Are these two variables related?

Student   Hours studied   % Grade
A         6               82
B         2               63
C         1               57
D         5               88
E         3               68
F         2               75

SCATTER PLOT

The independent variable is plotted on the horizontal x-axis. The dependent variable is plotted on the vertical y-axis.

[Scatter plot: Hours Studied (x-axis, 0 to 7) vs Grade % (y-axis, 0 to 100)]
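A minimal sketch of producing such a scatter plot in Python (not part of the original slides; it assumes numpy and matplotlib are available, and the hours-studied data above are reused):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hours studied (x) and % grade (y) for students A-F
hours = np.array([6, 2, 1, 5, 3, 2])
grade = np.array([82, 63, 57, 88, 68, 75])

# Independent variable on the x-axis, dependent variable on the y-axis
plt.scatter(hours, grade)
plt.xlabel("Hours Studied")
plt.ylabel("Grade (%)")
plt.title("Scatter Plot")
plt.show()
```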

RANGE OF THE CORRELATION COEFFICIENT

In case of an exact positive linear relationship the value of r is +1. In case of a strong positive linear relationship, the value of r will be close to +1.

[Scatter plot of dependent variable vs independent variable illustrating correlation = +1]

RANGE OF THE CORRELATION COEFFICIENT

In case of an exact negative linear relationship the value of r is -1. In case of a strong negative linear relationship, the value of r will be close to -1.

[Scatter plot of dependent variable vs independent variable illustrating correlation = -1]

RANGE OF THE CORRELATION COEFFICIENT

In case of a weak relationship the value of r will be close to 0, i.e. absence of a linear relationship.

A low or zero value of r means that the relationship is not linear, but there could be another type of relationship.

[Scatter plot of dependent variable vs independent variable illustrating correlation = 0]

For example, the pairs (x, y) = (1, 0), (0, 1), (-1, 0), (0, -1) lie on the circle x^2 + y^2 = 1, yet r = 0 for these points.

RANGE OF THE CORRELATION COEFFICIENT

In case of a nonlinear relationship the value of r will be close to 0.

[Scatter plot of dependent variable vs independent variable showing a nonlinear pattern; correlation = 0]

KARL PEARSON CORRELATION COEFFICIENT

ρ (rho) for population values; r for sample values.

Usually denoted by r(x, y), or r_xy, or simply r.

r is a numerical measure of the linear relationship between the variables.

(PEARSON PRODUCT-MOMENT) SAMPLE CORRELATION

r = Cov(x, y) / √( Var(x) · Var(y) ) = S_xy / √( S_xx · S_yy )
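A minimal sketch of this formula in Python (not from the slides; assumes numpy, and reuses the hours-studied data from the earlier example — names are illustrative):

```python
import numpy as np

x = np.array([6, 2, 1, 5, 3, 2])        # hours studied
y = np.array([82, 63, 57, 88, 68, 75])  # % grade

# Sums of squares and cross-products about the means
s_xy = np.sum((x - x.mean()) * (y - y.mean()))
s_xx = np.sum((x - x.mean()) ** 2)
s_yy = np.sum((y - y.mean()) ** 2)

# Pearson product-moment sample correlation
r = s_xy / np.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")  # agrees with np.corrcoef(x, y)[0, 1]
```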

FEATURES OF ρ AND r

Unit free

Range between -1 and 1

The closer to -1, the stronger the negative linear relationship

The closer to +1, the stronger the positive linear relationship

The closer to 0, the weaker the linear relationship

EXAMPLE:

Number of weeks (in the program)   Speed gain (words per minute)
3                                  86
5                                  118
2                                  49
8                                  193
6                                  164
9                                  232

r = 0.991

EXAMPLE: COMPUTE THE COEFFICIENT OF CORRELATION

X    Y
6    9
2    11
10   ?
4    8
8    7

The arithmetic means of the X and Y series are 6 and 8, respectively.

EXAMPLE: THE FOLLOWING DATA PERTAIN TO THE DEMAND FOR A PRODUCT (IN THOUSANDS OF UNITS) AND ITS PRICE (IN RS.) CHARGED IN FIVE DIFFERENT AREAS

Price x   Demand y
20        22
16        41
10        141
11        89
14        56

Draw a scatter diagram.

Calculate the coefficient of correlation.

EXAMPLE: THE ANNUAL LABOR WELFARE FUNDS (IN LAKHS OF RUPEES) AND THE CORRESPONDING ANNUAL PRODUCTION (IN CRORES OF RUPEES) FOR THE PAST 8 YEARS OF A COMPANY ARE PRESENTED BELOW

Year   Welfare funds x   Production y
1      8                 18
2      10                28
3      12                35
4      14                45
5      16                50
6      18                70
7      20                85
8      22                95

Draw a scatter diagram.

Calculate the coefficient of correlation between annual labor welfare funds and the corresponding annual production. Also test the significance of the correlation coefficient at a significance level of 0.05.

HYPOTHESIS TESTING

Null hypothesis: ρ = 0 (the two variables are not associated)

Alternative hypothesis: ρ ≠ 0 (the two variables are associated)

Level of significance: α = 0.05

Test statistic:

t = r √(n - 2) / √(1 - r^2), with n - 2 degrees of freedom

Decision: if the null hypothesis is rejected, there is a relationship between the two variables.

t-TEST FOR CORRELATION

Hypotheses

H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation)

Test statistic

t = r / √( (1 - r^2) / (n - 2) ), with n - 2 degrees of freedom

where

r = Σ(Xi - X̄)(Yi - Ȳ) / √( Σ(Xi - X̄)^2 · Σ(Yi - Ȳ)^2 )
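A sketch of this t-test in Python (not from the slides; assumes numpy and scipy, and the function name is illustrative — r and n would come from the sample at hand):

```python
import numpy as np
from scipy import stats

def correlation_t_test(r, n, alpha=0.05):
    """Two-sided t-test of H0: rho = 0 against H1: rho != 0."""
    t_stat = r / np.sqrt((1 - r**2) / (n - 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return t_stat, t_crit, p_value, abs(t_stat) > t_crit

# Illustration with the produce-store figures used later: r = 0.9706, n = 7
print(correlation_t_test(0.9706, 7))
```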

HYPOTHESIS TESTING (FISHER z TRANSFORMATION)

Null hypothesis: ρ = 0 (the two variables are not associated)

Alternative hypothesis: ρ ≠ 0 (the two variables are associated)

Level of significance: α = 0.05

Test statistic:

z = (1/2) ln( (1 + r) / (1 - r) ),  ζ = (1/2) ln( (1 + ρ) / (1 - ρ) )

Z = (z - ζ) / ( 1 / √(n - 3) ) ~ N(0, 1)

Decision: if the null hypothesis is rejected, there is a relationship between the two variables.
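A sketch of the Fisher z test in Python (not from the slides; assumes numpy and scipy, rho0 is the hypothesized population correlation, and names are illustrative):

```python
import numpy as np
from scipy import stats

def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Two-sided test of H0: rho = rho0 using the Fisher z transformation."""
    z_r = 0.5 * np.log((1 + r) / (1 - r))         # transform of the sample r
    zeta = 0.5 * np.log((1 + rho0) / (1 - rho0))  # transform of rho0
    z_stat = (z_r - zeta) * np.sqrt(n - 3)        # approximately N(0, 1)
    p_value = 2 * stats.norm.sf(abs(z_stat))
    return z_stat, p_value, p_value < alpha

# Illustration with the later example: r = 0.75, n = 24, H0: rho = 0.60
print(fisher_z_test(0.75, 24, rho0=0.60))
```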

EXAMPLE: A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 18 WAS COMPUTED TO BE 0.32. CAN WE CONCLUDE THAT THE CORRELATION IS SIGNIFICANT AT SIGNIFICANCE LEVELS OF A) 0.05 AND B) 0.01?

Null hypothesis: ρ = 0 (the two variables are not associated)

Alternative hypothesis: ρ > 0 (one-tailed test), or ρ ≠ 0 (two-tailed test)

EXAMPLE: A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 24 WAS COMPUTED TO BE 0.75. CAN WE CONCLUDE THAT ρ EXCEEDS 0.60 AT SIGNIFICANCE LEVELS OF A) 0.05 AND B) 0.01?

Null hypothesis: ρ = 0.60

Alternative hypothesis: ρ > 0.60 (one-tailed test), or ρ ≠ 0.60 (two-tailed test)

CONFIDENCE INTERVAL FOR ρ

z - z_{α/2} / √(n - 3)  ≤  ζ  ≤  z + z_{α/2} / √(n - 3)

where z = (1/2) ln( (1 + r) / (1 - r) ); the limits for ζ are then converted back to limits for ρ.

EXAMPLE: IF r = 0.7 FOR THE MATHEMATICS AND STATISTICS GRADES OF 30 STUDENTS, CONSTRUCT A 95% CONFIDENCE INTERVAL FOR THE POPULATION CORRELATION COEFFICIENT.

r = 0.70, n = 30, and z_0.025 = 1.96

The z value that corresponds to r = 0.70 (from the Fisher z table) is 0.867.

95% confidence interval for ζ: 0.867 - 1.96/√27  ≤  ζ  ≤  0.867 + 1.96/√27

Converting the limits back to the r scale gives the 95% confidence interval for the population correlation coefficient:

0.45 ≤ ρ ≤ 0.85
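A sketch of this interval in Python (not from the slides; assumes numpy and scipy — np.arctanh and np.tanh implement the Fisher transformation and its inverse, and the function name is illustrative):

```python
import numpy as np
from scipy import stats

def rho_confidence_interval(r, n, conf=0.95):
    """Confidence interval for the population correlation via Fisher z."""
    z_r = np.arctanh(r)                       # 0.5 * ln((1 + r) / (1 - r))
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)
    half_width = z_crit / np.sqrt(n - 3)
    lo, hi = z_r - half_width, z_r + half_width
    return np.tanh(lo), np.tanh(hi)           # back-transform to the r scale

# r = 0.70, n = 30 gives roughly (0.45, 0.85)
print(rho_confidence_interval(0.70, 30))
```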

Construct a 95% confidence interval for the population correlation coefficient when

a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42

Construct a 99% confidence interval for the population correlation coefficient when

a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42

STRENGTH VS. SIGNIFICANCE OF THE CORRELATION:

The significance, given by the p-value, depends on the statistical evidence. When the p-value is small, we conclude that the correlation exists.

The strength, given by the r value, is meaningful only if it is supported by statistical significance.

R^2 = 12.70%

means that the variables in the model explain about 12.70% of the total variation in the dependent variable (age, in that example).


SAMPLE OF OBSERVATIONS FROM VARIOUS r VALUES

[Five scatter plots of Y vs X illustrating r = -1, r = -.6, r = 0, r = .6 and r = 1]

EXAMPLE: PRODUCE STORES

Is there any evidence of a linear relationship between the Annual Sales of a store and its Square Footage, at the .05 level?

H0: ρ = 0 (no association)
H1: ρ ≠ 0 (association)

α = .05, df = 7 - 2 = 5

From the Excel printout (r = Multiple R):

Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

EXAMPLE: PRODUCE STORES SOLUTION

Critical values: ±2.5706 (t with 5 df, .025 in each tail)

Test statistic:

t = r / √( (1 - r^2) / (n - 2) ) = .9706 / √( (1 - .9420) / 5 ) = 9.0099

Decision: Reject H0.

Conclusion: There is evidence of a linear relationship at the 5% level of significance.

The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.

SIMPLE REGRESSION

TOPICS

Introduction

Types of Regression Models

Determining the Simple Linear Regression Equation

Interpretation of Regression Coefficients

INTRODUCTION

Decisions are based on forecasts: the relationship between variables, between what is known and what is to be estimated.

e.g. the relationship between annual sales and the size of a store

e.g. the relationship between annual profits and investment in R&D

Regression and correlation analyses determine the nature and strength of a relationship.

A simple regression model develops the relationship between a response variable and ONE explanatory (independent) variable.

Simple regression analysis determines the degree to which the variables are related and how well the model describes the relationship.

PURPOSE OF REGRESSION ANALYSIS

Regression analysis is used primarily to model causality and provide prediction:

Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable, e.g. predict annual sales based on expenditure on advertising.

Explain the effect of the independent variables on the dependent variable.

TYPES OF RELATIONSHIPS

Positive linear relationship

Negative linear relationship

Relationship NOT linear

No relationship

SIMPLE LINEAR REGRESSION MODEL

The relationship between the variables is described by a linear function.

The change of one variable causes the other variable to change.

A dependency of one variable on the other.

SIMPLE LINEAR REGRESSION MODEL (continued)

Population regression line (conditional mean): a straight line that describes the dependence of the average value (conditional mean) of one variable on the other.

Yi = β0 + β1 Xi + εi,   μ_{Y|X} = β0 + β1 X

where β0 is the population Y intercept, β1 is the population slope coefficient, εi is the random error, Y is the dependent (response) variable and X is the independent (explanatory) variable.

SIMPLE LINEAR REGRESSION MODEL (continued)

Yi = β0 + β1 Xi + εi,   where εi = random error

[Diagram: the observed value of Y at Xi equals the conditional mean μ_{Y|X} on the population regression line plus the random error εi]

LINEAR REGRESSION EQUATION

The sample regression line provides an estimate of the population regression line as well as a predicted value of Y.

Yi = b0 + b1 Xi + ei   (b0 = sample Y intercept, b1 = sample slope coefficient, ei = residual)

Ŷi = b0 + b1 Xi   simple regression equation (fitted regression line, predicted value)

LINEAR REGRESSION EQUATION (continued)

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals (the Least Squares Method):

Σ (Yi - Ŷi)^2 = Σ ei^2   (summed over i = 1, ..., n)

b0 provides an estimate of β0; b1 provides an estimate of β1.

LEAST SQUARES METHOD

b1 = ( Σxy - (Σx)(Σy)/n ) / ( Σx^2 - (Σx)^2/n )

b0 = Ȳ - b1 X̄

where Ȳ = Σy / n and X̄ = Σx / n
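A minimal sketch of these formulas in Python (not from the slides; assumes numpy, function and variable names are illustrative, and the produce-store data used later in the deck are borrowed for illustration):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / \
         (np.sum(x ** 2) - np.sum(x) ** 2 / n)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Square feet (x) and annual sales in $000 (y) for the 7 produce stores
x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
print(least_squares_fit(x, y))  # approximately (1636.4, 1.487)
```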

LINEAR REGRESSION EQUATION (continued)

[Diagram: observed value Yi = β0 + β1 Xi + εi, fitted value Ŷi = b0 + b1 Xi, and residual ei = Yi - Ŷi shown relative to the sample regression line at Xi]

INTERPRETATION OF THE SLOPE AND INTERCEPT

β0 = μ_{Y|X=0} is the average value of Y when the value of X is zero.

β1 = Δμ_{Y|X} / ΔX measures the change in the average value of Y as a result of a one-unit change in X.

LINEAR REGRESSION EQUATION: EXAMPLE

You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

SCATTER DIAGRAM: EXAMPLE

[Excel scatter plot: Square Feet (x-axis, 0 to 6,000) vs Annual Sales in $000 (y-axis, 0 to 12,000)]

SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE

Ŷi = b0 + b1 Xi = 1636.415 + 1.487 Xi

From the Excel printout:

Coefficients
Intercept      1636.414726
X Variable 1   1.486633657

GRAPH OF THE SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE

[Scatter plot of Annual Sales ($000) vs Square Feet with the fitted regression line]

INTERPRETATION OF RESULTS: EXAMPLE

Ŷi = 1636.415 + 1.487 Xi

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.

TOPICS

Measures of Variation

Coefficient of Determination

Coefficient of Correlation

MEASURES OF VARIATION: THE SUM OF SQUARES

SST = SSR + SSE

Total sample variability = Explained variability + Unexplained variability

This decomposition is used to examine the ability of the independent variable to predict the dependent variable.

MEASURES OF VARIATION: THE SUM OF SQUARES (continued)

SST = Total Sum of Squares: measures the variation of the Yi values around their mean Ȳ.

SSR = Regression Sum of Squares: explained variation attributable to the relationship between X and Y, i.e. between the predicted values and the mean value.

SSE = Error Sum of Squares: variation attributable to factors other than the relationship between X and Y, i.e. between the observed values and the predicted values.

MEASURES OF VARIATION: THE SUM OF SQUARES (continued)

SST = Σ(Yi - Ȳ)^2 = ΣYi^2 - (ΣYi)^2 / n

SSR = Σ(Ŷi - Ȳ)^2 = b0 ΣYi + b1 ΣXiYi - (ΣYi)^2 / n

SSE = Σ(Yi - Ŷi)^2 = ΣYi^2 - b0 ΣYi - b1 ΣXiYi
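A sketch of this decomposition in Python (not from the slides; assumes numpy and reuses the produce-store data with the rounded coefficients b0 = 1636.415, b1 = 1.487, so the results match the ANOVA output shown later only up to rounding):

```python
import numpy as np

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
b0, b1 = 1636.415, 1.487
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

r_squared = ssr / sst                  # coefficient of determination
s_yx = np.sqrt(sse / (len(y) - 2))     # standard error of estimate
print(sst, ssr, sse, r_squared, s_yx)
```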

MEASURES OF VARIATION: THE SUM OF SQUARES (continued)

[Diagram at a given Xi: SST = Σ(Yi - Ȳ)^2, SSE = Σ(Yi - Ŷi)^2 and SSR = Σ(Ŷi - Ȳ)^2 shown as vertical distances between the data point, the fitted line and the mean of Y]

MEASURES OF VARIATION, THE SUM OF SQUARES: EXAMPLE

Excel output for produce stores:

ANOVA        df   SS            MS           F            Significance F
Regression   1    30380456.12   30380456.1   81.1790902   0.000281201
Residual     5    1871199.595   374239.919
Total        6    32251655.71

Regression (explained) df = 1, error (residual) df = 5, total df = 6; SSR = 30,380,456, SSE = 1,871,200, SST = 32,251,656.

THE COEFFICIENT OF DETERMINATION

r^2 = SSR / SST = Regression Sum of Squares / Total Sum of Squares

It measures the proportion of variation in Y that is explained by the independent variable X in the regression model.

COEFFICIENTS OF DETERMINATION (r^2) AND CORRELATION (r)

[Four panels of Y vs X with the fitted line Ŷi = b0 + b1 Xi: r^2 = 1 with r = +1; r^2 = 1 with r = -1; r^2 = .81 with r = +0.9; r^2 = 0 with r = 0]

TOPICS

Standard Error of Estimate

Assumptions of the Simple Linear Regression Model

Residual Analysis

STANDARD ERROR OF ESTIMATE

The standard deviation of the variation of the observations around the regression equation:

S_YX = √( SSE / (n - 2) ) = √( Σ(Yi - Ŷi)^2 / (n - 2) )

FINDING THE STANDARD ERROR OF ESTIMATE

S_YX = √( (ΣY^2 - b0 ΣY - b1 ΣXY) / (n - 2) )

where
X = values of the independent variable
Y = values of the dependent variable
b0 = Y-intercept
b1 = slope of the estimating equation
n = number of data points

INFERENCE ABOUT THE SLOPE: t-TEST

t test for a population slope: is there a linear dependency of Y on X?

Null and alternative hypotheses

H0: β1 = 0 (no linear dependency)
H1: β1 ≠ 0 (linear dependency)

Test statistic

t = (b1 - β1) / S_b1,   where S_b1 = S_YX / √( Σ(Xi - X̄)^2 )

d.f. = n - 2
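A sketch of this slope test in Python (not from the slides; assumes numpy and scipy, and reuses the produce-store data and rounded coefficients for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
b0, b1 = 1636.415, 1.487
n = len(x)

s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # std. error of estimate
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))          # std. error of the slope

t_stat = (b1 - 0) / s_b1              # test of H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)
print(t_stat, t_crit)                 # roughly 9.01 vs 2.5706 -> reject H0
```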

EXAMPLE: PRODUCE STORE

Data for 7 stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated regression equation: Ŷi = 1636.415 + 1.487 Xi

The slope of this model is 1.487. Does square footage affect annual sales?

INFERENCES ABOUT THE SLOPE: t TEST EXAMPLE

H0: β1 = 0
H1: β1 ≠ 0

α = .05, df = 7 - 2 = 5

Critical values: ±2.5706 (.025 in each tail)

Test statistic: t = 9.0099 (as computed earlier)

Decision: Reject H0.

Conclusion: There is evidence that square footage affects annual sales.

INFERENCES ABOUT THE SLOPE: F TEST

F test for a population slope: is there a linear dependency of Y on X?

Null and alternative hypotheses

H0: β1 = 0 (no linear dependency)
H1: β1 ≠ 0 (linear dependency)

Test statistic

F = (SSR / 1) / ( SSE / (n - 2) )

Numerator d.f. = 1, denominator d.f. = n - 2

INFERENCES ABOUT THE SLOPE: CONFIDENCE INTERVAL EXAMPLE

Confidence interval estimate of the slope: b1 ± t_{n-2} S_b1

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911). It does not include 0.

Conclusion: There is a significant linear dependency of annual sales on the size of the store.

ESTIMATION OF MEAN VALUES

Confidence interval estimate for μ_{Y|X=Xi}, the mean of Y given a particular Xi:

Ŷi ± t_{n-2} S_YX √( 1/n + (Xi - X̄)^2 / Σ(Xi - X̄)^2 )

The t value comes from the table with df = n - 2; S_YX is the standard error of the estimate. The size of the interval varies according to the distance of Xi from the mean X̄.

PREDICTION OF INDIVIDUAL VALUES

Prediction interval for an individual response Yi at a particular Xi:

Ŷi ± t_{n-2} S_YX √( 1 + 1/n + (Xi - X̄)^2 / Σ(Xi - X̄)^2 )

The addition of 1 under the square root increases the width of the interval compared with the interval for the mean of Y.
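A sketch of both intervals in Python (not from the slides; assumes numpy and scipy, reuses the produce-store data, and takes Xi = 2,000 square feet to match the worked example that follows):

```python
import numpy as np
from scipy import stats

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
b0, b1, n = 1636.415, 1.487, len(x)

x_i = 2000.0
y_hat = b0 + b1 * x_i                                    # predicted sales, about 4610.4
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)                    # 2.5706
leverage = 1 / n + (x_i - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

ci_half = t_crit * s_yx * np.sqrt(leverage)              # for the mean of Y, ~612.7
pi_half = t_crit * s_yx * np.sqrt(1 + leverage)          # for an individual Y, ~1687.7
print(y_hat - ci_half, y_hat + ci_half)
print(y_hat - pi_half, y_hat + pi_half)
```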

EXAMPLE: PRODUCE STORES

Data for 7 stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression equation obtained: Ŷi = 1636.415 + 1.487 Xi

Consider a store with 2,000 square feet.

ESTIMATION OF MEAN VALUES: EXAMPLE

Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.

Predicted sales: Ŷi = 1636.415 + 1.487 (2000) = 4610.45 ($000)

X̄ = 2350.29, S_YX = 611.75, t_{n-2} = t_5 = 2.5706

Confidence interval estimate for μ_{Y|X=Xi}:

Ŷi ± t_{n-2} S_YX √( 1/n + (Xi - X̄)^2 / Σ(Xi - X̄)^2 ) = 4610.45 ± 612.66

PREDICTION INTERVAL FOR Y: EXAMPLE

Find the 95% prediction interval for the annual sales of one particular store of 2,000 square feet.

Predicted sales: Ŷi = 1636.415 + 1.487 (2000) = 4610.45 ($000)

X̄ = 2350.29, S_YX = 611.75, t_{n-2} = t_5 = 2.5706

Prediction interval for an individual Y at Xi:

Ŷi ± t_{n-2} S_YX √( 1 + 1/n + (Xi - X̄)^2 / Σ(Xi - X̄)^2 ) = 4610.45 ± 1687.68

MULTIPLE REGRESSION

TOPICS

The Multiple Regression Model

Residual Analysis

Coefficient of Multiple Determination

THE MULTIPLE REGRESSION MODEL

The relationship between one dependent variable and two or more independent variables is a linear function:

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + εi

where β0 is the population Y-intercept, β1, ..., βk are the population slopes, εi is the random error, Y is the dependent (response) variable and the X's are the independent (explanatory) variables.

MULTIPLE REGRESSION EQUATION

The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

Ŷi = b0 + b1 X1i + b2 X2i + ... + bk Xki

where Ŷi is the estimated (predicted) value of Y, b0 is the estimated intercept and b1, ..., bk are the estimated slope coefficients.

MULTIPLE REGRESSION EQUATION

Example with two independent variables: Ŷ = b0 + b1 X1 + b2 X2

[Diagram: the fitted equation forms a plane in the three-dimensional (X1, X2, Y) space]

INTERPRETATION OF ESTIMATED COEFFICIENTS

Slope (bi): the average value of Y is estimated to change by bi for each 1-unit increase in Xi, holding all other variables constant.

Example: if b1 = -2, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature (X1), given the inches of insulation (X2).

Y-intercept (b0): the estimated average value of Y when all Xi = 0.

MULTIPLE REGRESSION MODEL: EXAMPLE

Develop a model for estimating the heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

MULTIPLE REGRESSION EQUATION: EXAMPLE

Ŷi = 562.151 - 5.437 X1i - 20.012 X2i

Excel output:

Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067

For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.

For each increase of one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.
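A sketch of fitting this two-variable model in Python (not from the slides; assumes numpy and statsmodels are installed, and reuses the heating-oil data above):

```python
import numpy as np
import statsmodels.api as sm

oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10], dtype=float)

X = sm.add_constant(np.column_stack([temp, insul]))  # intercept + X1 + X2
model = sm.OLS(oil, X).fit()
print(model.params)                 # roughly [562.151, -5.437, -20.012]
print(model.rsquared, model.rsquared_adj)
```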

STANDARD ERROR OF ESTIMATE FOR MULTIPLE REGRESSION

The standard error of estimate of the dependent variable Y on the independent variables:

s_e = √( Σ(Yi - Ŷi)^2 / (n - k - 1) )

COEFFICIENT OF MULTIPLE DETERMINATION

r^2_{Y.12...k} = SSR / SST = Explained Variation / Total Variation

It is the proportion of the total variation in Y explained by all the X variables taken together. It never decreases when a new X variable is added to the model.

COEFFICIENT OF MULTIPLE DETERMINATION: EXAMPLE (PIE SALES)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA         df   SS          MS          F         Significance F
Regression    2    29460.027   14730.013   6.53861   0.01201
Residual      12   27033.306   2252.776
Total         14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

r^2 = SSR / SST = 29460.0 / 56493.3 = .52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.

ADJUSTED COEFFICIENT OF MULTIPLE DETERMINATION

Adding additional variables will necessarily reduce SSE and increase r^2. To account for this, the adjusted coefficient of determination is given by

r^2_adj = 1 - (1 - r^2_{Y.12...k}) (n - 1) / (n - k - 1)

It is the proportion of variation in Y explained by all the X variables, adjusted for the number of X variables used and the sample size. It penalizes excessive use of independent variables, is smaller than r^2_{Y.12...k}, and is useful in comparing models with different explanatory variables.
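A minimal sketch of this adjustment in Python (not from the slides; plain arithmetic, with the pie-sales figures above used as the example):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted coefficient of determination for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Pie-sales example: r^2 = 0.52148, n = 15 observations, k = 2 predictors
print(adjusted_r_squared(0.52148, 15, 2))  # about 0.44172
```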

ADJUSTED r^2: EXAMPLE (PIE SALES)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

r^2_adj = .44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and the number of independent variables.

COEFFICIENT OF MULTIPLE DETERMINATION: EXAMPLE (HEATING OIL)

Excel output:

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

r^2_{Y.12} = SSR / SST

Adjusted r^2 reflects the number of explanatory variables and the sample size, and is smaller than r^2.

INTERPRETATION OF THE COEFFICIENT OF MULTIPLE DETERMINATION

r^2_{Y.12} = SSR / SST = .9656: 96.56% of the total variation in heating oil use can be explained by temperature and amount of insulation.

r^2_adj = .9599: 95.99% of the total variation in heating oil use can be explained by temperature and amount of insulation, after adjusting for the number of explanatory variables and the sample size.

USING THE REGRESSION EQUATION TO MAKE PREDICTIONS

Predict the amount of heating oil used for a home if the average temperature is 30° and the insulation is 6 inches.

Ŷi = 562.151 - 5.437 X1i - 20.012 X2i = 562.151 - 5.437 (30) - 20.012 (6) = 278.969

The predicted heating oil used is 278.97 gallons.

RESIDUAL PLOTS

Residuals vs Ŷ

Residuals vs X1

Residuals vs X2

Residuals vs time (may reveal autocorrelation)

RESIDUAL PLOTS: EXAMPLE

[Insulation residual plot: residuals vs insulation (0 to 12 inches) — no discernible pattern]

[Temperature residual plot: residuals (-60 to 60) vs temperature (0 to 80) — maybe some non-linear relationship]

TESTING FOR OVERALL SIGNIFICANCE

Shows whether there is a linear relationship between all of the X variables taken together and Y.

Use the F test statistic.

Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)

The null hypothesis is a very strong statement and is almost always rejected.

TESTING FOR OVERALL SIGNIFICANCE (continued)

Test statistic:

F = (SSR / k) / MSE = MSR / MSE

where F has k numerator and (n - k - 1) denominator degrees of freedom.

EXCEL OUTPUT: EXAMPLE

ANOVA         df   SS         MS         F          Significance F
Regression    2    228014.6   114007.3   168.4712   1.65411E-09
Residual      12   8120.603   676.7169
Total         14   236135.2

k = 2 (the number of explanatory variables); total df = n - 1 = 14; Significance F is the p-value; the test statistic is F = MSR / MSE.
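A sketch of this F test in Python (not from the slides; assumes scipy, with the ANOVA figures from the heating-oil output above):

```python
from scipy import stats

# ANOVA figures: SSR, SSE, k predictors, n observations
ssr, sse, k, n = 228014.6, 8120.603, 2, 15

msr = ssr / k                  # mean square due to regression
mse = sse / (n - k - 1)        # mean square error
f_stat = msr / mse             # about 168.47
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f_stat, p_value)         # p ~ 1.65e-09 -> reject H0 at alpha = 0.05
```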

EXAMPLE SOLUTION

H0: β1 = β2 = ... = βk = 0
H1: at least one βi ≠ 0

α = .05, df = 2 and 12

Critical value: F = 3.89

Test statistic: F = 168.47 (from the Excel output)

Decision: Reject H0 at α = 0.05.

Conclusion: There is evidence that at least one independent variable affects Y.

TEST FOR SIGNIFICANCE: INDIVIDUAL VARIABLES

Shows whether there is a linear relationship between the variable Xi and Y.

Use the t test statistic.

Hypotheses:
H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (linear relationship between Xi and Y)

t TEST STATISTIC OUTPUT: EXAMPLE

              Coefficients    Standard Error   t Stat
Intercept     562.1510092     21.09310433      26.65093769
X Variable 1  -5.436580588    0.336216167      -16.16989642
X Variable 2  -20.01232067    2.342505227      -8.543127434

t = bi / S_bi: the t test statistic for X1 (temperature) is -16.170 and for X2 (insulation) is -8.543.

t TEST: EXAMPLE SOLUTION

Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.

H0: β1 = 0
H1: β1 ≠ 0

df = 12; critical values: ±2.1788 (.025 in each tail)

Test statistic: t = -16.1699

Decision: Reject H0 at α = 0.05.

Conclusion: There is evidence of a significant effect of temperature on oil consumption.

CONFIDENCE INTERVAL ESTIMATE FOR THE SLOPE

Confidence interval for the population slope βi: bi ± t_{n-k-1} S_bi

              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509      10.83213
Advertising   74.13096       25.96732

Here t has (n - k - 1) = (15 - 2 - 1) = 12 d.f.

Example: form a 95% confidence interval for the effect of changes in price (X1) on pie sales, holding constant the effects of advertising:

-24.975 ± (2.1788)(10.832), so the interval is (-48.576, -1.374).

ASSUMPTIONS OF REGRESSION (L.I.N.E.)

Linearity: the relationship between X and Y is linear.

Independence of errors: error values are statistically independent.

Normality of error: error values are normally distributed for any given value of X.

Equal variance (also called homoscedasticity): the probability distribution of the errors has constant variance.

VARIATION OF ERRORS AROUND THE REGRESSION LINE

Y values are normally distributed around the regression line.

For each X value, the spread or variance around the regression line is the same.

[Diagram: normal error distributions f(ε) of equal spread centered on the sample regression line at X1 and X2]

PURPOSES OF RESIDUAL ANALYSIS

Examine the linearity assumption

Examine constant variance for all levels of X

Evaluate the normal distribution assumption

GRAPHICAL ANALYSIS OF RESIDUALS

Can plot residuals vs. X

Can create a histogram of the residuals to check for normality

RESIDUAL ANALYSIS

The residual for observation i, ei = Yi - Ŷi, is the difference between its observed and predicted value.

Check the assumptions of regression by examining the residuals:

Examine the linearity assumption

Evaluate the independence assumption

Evaluate the normal distribution assumption

Examine the equal variance assumption for all levels of X

Graphical analysis of residuals: can plot residuals vs. X
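A sketch of a residual plot in Python (not from the slides; assumes numpy and matplotlib, and reuses the simple produce-store fit from earlier):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)
b0, b1 = 1636.415, 1.487

residuals = y - (b0 + b1 * x)   # e_i = Y_i - Y-hat_i

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Square Feet (X)")
plt.ylabel("Residuals")
plt.show()
```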

RESIDUAL ANALYSIS FOR LINEARITY

[Panels of Y vs X and residuals vs X: a curved residual pattern indicates the relationship is not linear; a patternless band around zero indicates it is linear]

RESIDUAL ANALYSIS FOR INDEPENDENCE

[Panels of residuals vs X: a systematic pattern indicates the errors are not independent; a random scatter indicates independence]

CHECKING FOR NORMALITY

Examine the stem-and-leaf display of the residuals

Examine the box-and-whisker plot of the residuals

Examine the histogram of the residuals

Construct a normal probability plot of the residuals

RESIDUAL ANALYSIS FOR EQUAL VARIANCE

[Panels of residuals vs X: a residual spread that changes with X indicates unequal variance; a constant-width band indicates equal variance]

LINEAR REGRESSION EXAMPLE: EXCEL RESIDUAL OUTPUT

[House Price Model residual plot: residuals (-60 to 80) vs Square Feet (0 to 3000)]

RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.66740               64.33264
10            284.85348               -29.85348

The plot does not appear to violate any of the regression assumptions.

AUTOCORRELATION

One of the assumptions of the regression model is that the errors εi and εj, associated with the ith and jth observations, are uncorrelated.

Autocorrelation is correlation of the errors (residuals) over time.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

The standard test statistic for autocorrelation of the AR(1) type is the Durbin-Watson d statistic, computed from the residuals as

d = Σ_{t=2..T} (e_t - e_{t-1})^2 / Σ_{t=1..T} e_t^2

Most regression applications calculate it automatically and present it as one of the standard regression diagnostics.
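A sketch of computing d in Python (not from the slides; assumes numpy — statsmodels also provides statsmodels.stats.stattools.durbin_watson for the same quantity, and the residual values below are purely illustrative):

```python
import numpy as np

def durbin_watson_d(residuals):
    """Durbin-Watson statistic: squared successive differences over the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# d near 2 suggests no AR(1) autocorrelation, near 0 strong positive, near 4 strong negative
print(durbin_watson_d([0.5, 0.6, 0.4, -0.2, -0.5, -0.3, 0.1, 0.4]))
```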

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

In large samples

d = Σ_{t=2..T} (e_t - e_{t-1})^2 / Σ_{t=1..T} e_t^2  →  2 - 2ρ

It can be shown that in large samples d tends to 2 - 2ρ, where ρ is the parameter in the AR(1) relationship u_t = ρ u_{t-1} + ε_t.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

In large samples, d → 2 - 2ρ.

No autocorrelation: if there is no autocorrelation, ρ is 0 and d should be distributed randomly around 2.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

In large samples, d → 2 - 2ρ. No autocorrelation: d ≈ 2.

Severe positive autocorrelation: if there is severe positive autocorrelation, ρ will be near 1 and d will be near 0.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

In large samples, d → 2 - 2ρ. No autocorrelation: d ≈ 2. Severe positive autocorrelation: d ≈ 0.

Severe negative autocorrelation: likewise, if there is severe negative autocorrelation, ρ will be near -1 and d will be near 4.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

Thus d behaves as illustrated: d ≈ 0 under severe positive autocorrelation, d ≈ 2 under no autocorrelation, and d ≈ 4 under severe negative autocorrelation.

[Number line from 0 to 4 with regions labeled positive autocorrelation (near 0), no autocorrelation (near 2) and negative autocorrelation (near 4)]

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

To perform the Durbin-Watson test, we define critical values of d. The null hypothesis is H0: ρ = 0 (no autocorrelation). If d lies between these critical values, we do not reject the null hypothesis.

[Number line from 0 to 4 with d_crit marked on either side of 2, separating the positive-autocorrelation, no-autocorrelation and negative-autocorrelation regions]

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

The critical values, at any significance level, depend on the number of observations in the sample and the number of explanatory variables.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

Unfortunately, the critical values also depend on the actual data for the explanatory variables in the sample, and thus vary from sample to sample.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

However, Durbin and Watson determined upper and lower bounds, dU and dL, for the critical values, and these are presented in standard tables.

[Number line from 0 to 4 with dL and dU marked below 2, at the boundary between the positive-autocorrelation and no-autocorrelation regions]

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If d is less than dL, it must also be less than the critical value of d for positive autocorrelation, and so we would reject the null hypothesis and conclude that there is positive autocorrelation.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If d is above dU, it must also be above the critical value of d, and so we would not reject the null hypothesis. (Of course, if it were above 2, we should consider testing for negative autocorrelation instead.)

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If d lies between dL and dU, we cannot tell whether it is above or below the critical value and so the test is indeterminate.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

Here are dL and dU for 45 observations and two explanatory variables, at the 5% significance level:

dL = 1.43, dU = 1.62   (n = 45, k = 3, 5% level)

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

There are similar bounds for the critical value in the case of negative autocorrelation. They are not given in the standard tables because negative autocorrelation is uncommon, but it is easy to calculate them because they are located symmetrically to the right of 2:

4 - dU = 2.38, 4 - dL = 2.57   (n = 45, k = 3, 5% level)

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

So if d < 1.43, we reject the null hypothesis and conclude that there is positive autocorrelation.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If 1.43 < d < 1.62, the test is indeterminate and we do not come to any conclusion.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If 1.62 < d < 2.38, we do not reject the null hypothesis of no autocorrelation.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If 2.38 < d < 2.57, we do not come to any conclusion.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

If d > 2.57, we conclude that there is significant negative autocorrelation.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

Here are the bounds for the critical values for the 1% test, again with 45 observations and two explanatory variables:

dL = 1.24, dU = 1.42, 4 - dU = 2.58, 4 - dL = 2.76   (n = 45, k = 3, 1% level)

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

Here is a plot of the residuals from a logarithmic regression of expenditure on housing services on income and the relative price of housing services. The residuals exhibit strong positive autocorrelation.

[Time-series plot of the residuals (roughly -0.04 to 0.03) against year, 1959 to 2003]

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

============================================================
Dependent Variable: LGHOUS
Method: Least Squares
Sample: 1959 2003
Included observations: 45
============================================================
Variable       Coefficient   Std. Error   t-Statistic   Prob.
============================================================
C              0.005625      0.167903     0.033501      0.9734
LGDPI          1.031918      0.006649     155.1976      0.0000
LGPRHOUS       -0.483421     0.041780     -11.57056     0.0000
============================================================
R-squared            0.998583   Mean dependent var     6.359334
Adjusted R-squared   0.998515   S.D. dependent var     0.437527
S.E. of regression   0.016859   Akaike info criterion  -5.263574
Sum squared resid    0.011937   Schwarz criterion      -5.143130
Log likelihood       121.4304   F-statistic            14797.05
Durbin-Watson stat   0.633113   Prob(F-statistic)      0.000000
============================================================

With dL = 1.24 and dU = 1.42 (n = 45, k = 3, 1% level), the Durbin-Watson statistic of 0.63 lies below dL, so we conclude that there is positive autocorrelation.