8 Identifying Relationships


  • 7/28/2019 8 Identifying Relationships

    1/50

    Identifying relationships

    Dr James Abdey

    Overview

    Relationship between two variables

    Correlation

    Regression

    The simple linear regression model

    Parameter estimation

    Interpretation of correlation coefficient

    Coefficient of determination, $R^2$

    Prediction

    Regression diagnostics

    Worked example

    Multiple linear regression

    Applied Marketing (Market Research Methods)

    Topic 8: Identifying relationships

    Dr James Abdey

    Overview

    We consider regression analysis, which is used for explaining variation in market share, sales, brand preference, etc.

    This may use explanatory variables such as advertising, price, distribution and product quality

    Starting with correlation, we proceed to the simple linear model, followed by multiple linear regression

    Relationship between two variables

    We now investigate the relationship between two variables

    When we have data on two variables (X and Y), we have bivariate data

    We will consider how to:

    measure the strength of the relationship

    model the relationship

    predict the value of one variable on the basis of the other

    Relationship between two variables

    The first thing to do with data is to provide a graphical representation

    For one variable this might be a histogram, stem-and-leaf diagram, etc.

    For two variables we produce a scatter diagram

    This must include a title, axis labels and units, and be accurate!

    Relationship between two variables

    Assume that we have some data in paired form: $(x_i, y_i)$, $i = 1, 2, \ldots, n$

    An example might be unemployment and crime figures for 12 areas of a city, of interest to insurers in setting policy premia for people insuring against theft

    Unemp., x      2614   1160   1055   1199   2157   2305
    Offences, y    6200   4610   5336   5411   5808   6004

    Unemp., x      1687   1287   1869   2283   1162   1201
    Offences, y    5420   5588   5719   6336   5103   5268

    Relationship between two variables

    We plot X on the horizontal axis, and Y on the vertical axis

    This emphasises any relationship between the variables

    [Figure: Scatter plot of Crime against Unemployment; horizontal axis: Unemployment, vertical axis: Number of offences]

    Relationship between two variables

    A positive, linear relationship is apparent

    X and Y increase together, roughly linearly

    Hence the implied linear relationship is not exact

    The points do not lie exactly on a straight line

    Such an upward shape is termed positive correlation

    We will see later how to quantify correlation

    Relationship between two variables

    Other examples of scatter plots include:

    LHS: Negative correlation (Y decreases as X increases)

    RHS: Uncorrelated data (no obvious (linear) relationship between X and Y)

    [Figure: two scatter plots of y against x; negative correlation on the left, uncorrelated data on the right]

    Correlation

    Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale

    Positive correlation: the two variables tend to vary in the same direction

    Negative correlation: the two variables tend to vary in the opposite direction

    Perfect correlation: the two variables have points which all lie exactly on a straight line

    Correlation

    If there exists a perfect linear relationship between X and Y, we can represent them using an equation of the form

    $Y = \alpha + \beta X$

    $\alpha$ represents the intercept of the line

    $\beta$ represents the slope or gradient of the line

    Examples of anticipated correlation:

    Variables                                   Correlation
    Height & weight                             Positive
    Rainfall & sunshine hours                   Negative
    Ice cream sales & sun cream sales           Positive
    Hours of study & exam mark                  Positive
    Car's petrol consumption & goals scored     Zero

    Correlation

    Positive correlation: large X with large Y; small X with small Y

    Negative correlation: large X with small Y; small X with large Y

    However, since X and Y may have widely different numerical values, we need to take this into account

    We do this by considering how far away from the means the two scores are

    Correlation

    So, we are interested in the degree to which variations in variable values are related to each other

    Our basis for the measurement of correlation is

    $$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

    Unfortunately, this measure is extremely sensitive to the units in which the variables are measured

    We would prefer a measure of correlation to remain the same regardless of the units of measurement (e.g. days, hours, minutes or seconds)
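The unit sensitivity described above can be checked numerically. The following is a minimal sketch, assuming the unemployment/crime figures from the earlier table; the function names and the rescaling of unemployment to thousands are illustrative choices, not part of the slides.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def cross_product_sum(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), the basis for measuring correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))


def cross_product_sum_shortcut(xs, ys):
    """Equivalent computational form: sum(x_i * y_i) - n * x_bar * y_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum(a * b for a, b in zip(xs, ys)) - n * x_bar * y_bar


s_xy = cross_product_sum(x, y)
s_xy_shortcut = cross_product_sum_shortcut(x, y)

# Hypothetical change of units: record unemployment in thousands of people.
x_thousands = [a / 1000 for a in x]
s_xy_rescaled = cross_product_sum(x_thousands, y)

print(f"Deviation form:  {s_xy:.2f}")
print(f"Shortcut form:   {s_xy_shortcut:.2f}")
print(f"x in thousands:  {s_xy_rescaled:.2f}")
```

The two forms agree, while rescaling x by a factor of 1000 shrinks the sum by the same factor even though the underlying relationship is unchanged.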

    Correlation

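    So, we use the following to measure the correlation for (sample) data

    $$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$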

    Correlation

    Returning to the unemployment/crime dataset:

    $\sum x_i = 19979$, $\sum x_i^2 = 36695129$, $\sum y_i = 66803$, $\sum y_i^2 = 374471231$, $\sum x_i y_i = 113784494$

    Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$

    Hence the (sample) correlation coefficient, r, is

    $r = 0.861$

    Of course, in practice we can use software like SPSS to calculate r for us!
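As an alternative to SPSS, the same calculation can be sketched in a few lines of Python. This is only an illustration using the summation form of r; the function name `correlation` is an assumption introduced here, and the rescaled series is a hypothetical change of units.

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def correlation(xs, ys):
    """Sample correlation coefficient r, using the summation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(a * b for a, b in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in xs) - n * x_bar ** 2
    s_yy = sum(b * b for b in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(x, y)

# Hypothetical change of units: unemployment in thousands of people.
r_rescaled = correlation([a / 1000 for a in x], y)

print(f"r = {r:.3f}")
print(f"r with x in thousands = {r_rescaled:.3f}")
```

Unlike the raw sum of cross-products, r is unaffected by the change of units, which is the property the slides ask for.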

    Correlation

    The (sample) correlation coefficient, r, takes values between $-1$ and $1$, i.e.

    $-1 \le r \le 1$

    $r > 0$ indicates positive correlation, with $r = 1$ indicating perfect positive correlation

    $r < 0$ indicates negative correlation, with $r = -1$ indicating perfect negative correlation

    The closer $|r|$ is to 1, the stronger the linear relationship is

    Regression

    Here we introduce simple linear regression

    Only part of a very large topic in statistical analysis

    In the simple model, we have two variables Y and X:

    Y is the dependent (or response) variable: that which we are trying to explain using:

    X, the independent (or explanatory) variable: the factor we think influences Y

    The simple linear regression model

    Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:

    $y = \alpha + \beta x$

    $\alpha$ and $\beta$ are fixed, but unknown, population parameters

    $\alpha$ is the y-intercept

    $\beta$ is the slope of the line

    We seek to estimate $\alpha$ and $\beta$ using (paired) sample data $(x_i, y_i)$, $i = 1, \ldots, n$

    The simple linear regression model

    Particularly in business, we would not expect a perfect linear relationship between the two variables

    Hence we modify this basic model to

    $y = \alpha + \beta x + \varepsilon$

    $\varepsilon$ is some random perturbation from the initial approximate line

    In other words, each y observation almost lies on the postulated line, but jumps off the line according to the random variable $\varepsilon$

    $\varepsilon$ is often referred to as the error term

    Parameter estimation: The least squares method

    For given sample data we could produce a scatter plot

    Any linear relationship would be visible

    This would suggest performing a (simple) linear regression

    We estimate the population regression line

    This estimated line is often termed the line of best fit

    Parameter estimation: The least squares method

    How do we choose the line of best fit?

    We require a formal criterion for determining the line of best fit

    Estimation of $\alpha$ and $\beta$ will be by least squares estimation

    Specifically, we seek to minimise the sum of the squared residuals, where a residual is the difference between the observed y value and its predicted (fitted) value
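For readers who want to see where the least squares estimators come from, the following is a standard calculus sketch rather than something taken from the slides. The label $S(\alpha, \beta)$ for the sum of squared residuals is introduced here for convenience.

```latex
\begin{align*}
S(\alpha, \beta) &= \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 \\
\frac{\partial S}{\partial \alpha} &= -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0
  \quad\Rightarrow\quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \\
\frac{\partial S}{\partial \beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i) = 0
  \quad\Rightarrow\quad \hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}
\end{align*}
```

The second result follows by substituting $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ into the second equation and collecting the terms in $\hat{\beta}$.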

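    Parameter estimation: The least squares method

    The least squares estimator for $\beta$ is

    $$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$

    The least squares estimator for $\alpha$ is

    $$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

    Hence the line of best fit has equation:

    $$\hat{y} = \hat{\alpha} + \hat{\beta}x$$

    Again, this is routinely calculated in SPSS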

    Example

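    Returning to the unemployment/crime dataset:

    $\sum x_i = 19979$, $\sum x_i^2 = 36695129$, $\sum y_i = 66803$, $\sum y_i^2 = 374471231$, $\sum x_i y_i = 113784494$

    Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$, hence

    $$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{113784494 - (12 \times 1664.92 \times 5566.92)}{36695129 - (12 \times 1664.92^2)} = 0.7468$$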

    Example

    We estimate the intercept to be

    $$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 5566.92 - 0.7468 \times 1664.92 = 4323.6$$

    Hence the least squares regression line is

    $$\hat{y} = 4323.6 + 0.7468x$$

    Note the $\hat{y}$ notation, where the hat denotes an estimated value
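The hand calculation above can be reproduced with a short script. This is a minimal sketch of the least squares formulas applied to the example data; the function name `least_squares` is an assumption made for illustration.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: (sum of x*y - n*x_bar*y_bar) / (sum of x^2 - n*x_bar^2)
    numerator = sum(a * b for a, b in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(a * a for a in xs) - n * x_bar ** 2
    beta_hat = numerator / denominator
    # Intercept: the fitted line passes through the point of means.
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = least_squares(x, y)
print(f"slope     = {beta_hat:.4f}")
print(f"intercept = {alpha_hat:.1f}")
```

Because the script keeps full precision throughout, its intercept can differ slightly from the hand calculation, which uses the slope rounded to four decimal places.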

    Interpretation of correlation coefficient

    In the case of perfect correlation between X and Y, we can predict Y directly and exactly from X

    In the case of zero correlation between X and Y, knowledge of X tells us nothing about Y

    Here we consider measuring the extent to which the values of one variable can be used to predict the values of another where the correlation is neither $-1$, nor 0, nor 1

    Interpretation of correlation coefficient

    Our overall objective is to explain the response variable Y, which is a random variable

    We try to explain the variation in Y

    Using simple linear regression, we attempt this using a single explanatory variable, X

    The total variation in the response variable sample data is simply

    $$\sum_{i=1}^{n} (y_i - \bar{y})^2$$

    We term this the total sum of squares (TSS)

    Interpretation of correlation coefficient

    We can decompose TSS into two components:

    the amount we are able to explain using the model, called the explained sum of squares (ESS);

    and the remaining variation that we are unable to explain with the model, called the residual sum of squares (RSS)

    Hence,

    $$\text{TSS} = \text{ESS} + \text{RSS}$$
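The decomposition can be motivated with a short algebraic sketch. This is a standard argument rather than one given in the slides, and it assumes the fitted values $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$ come from a least squares fit that includes an intercept.

```latex
\begin{align*}
\text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n} \big[(\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)\big]^2 \\
  &= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{ESS}}
   + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{RSS}}
   + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
\end{align*}
```

The cross-product term is zero because $\hat{y}_i - \bar{y} = \hat{\beta}(x_i - \bar{x})$, and the least squares residuals satisfy $\sum (y_i - \hat{y}_i) = 0$ and $\sum x_i (y_i - \hat{y}_i) = 0$.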

    Coefficient of determination, $R^2$

    We can assess the overall fit of a model using $R^2$

    This measures the proportion of the total variability in the response variable explained by the model

    This statistic is known as the coefficient of determination and is denoted $R^2$ and defined as

    $$R^2 = \frac{\text{ESS}}{\text{TSS}}$$

    $0 \le R^2 \le 1$

    The closer $R^2$ is to 1, the better the explanatory power of the model

    Note that $R^2 = r^2$ for a simple linear model, so we can also compute it from r (correlation coefficient)

    Coefficient of determination, $R^2$

    Returning to the crime/unemployment dataset, let's assign Y and X as follows:

    Y = number of offences

    X = unemployment

    The least squares regression line was

    $\hat{y} = 4323.6 + 0.7468x$

    The correlation coefficient was 0.861, therefore

    $R^2 = 0.861^2 = 0.7413$

    This means we can explain 74.13% of the variation in number of offences using unemployment
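The sums of squares behind this figure can be computed directly from the data. The sketch below is illustrative only: it refits the line, builds TSS, ESS and RSS from their definitions, and compares ESS/TSS with the squared correlation. All variable names are assumptions introduced here.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit, using sums of deviations from the means.
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * a for a in x]

# Sums of squares from their definitions.
tss = sum((b - y_bar) ** 2 for b in y)               # total
ess = sum((f - y_bar) ** 2 for f in fitted)          # explained
rss = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residual

r_squared = ess / tss
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"R^2 = ESS / TSS = {r_squared:.4f}")
print(f"r^2             = {r ** 2:.4f}")
```

The value computed at full precision is slightly below 0.7413, because the slide squares the correlation coefficient after rounding it to 0.861.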

    Prediction

    One of the purposes in calculating the line of best fit is prediction

    Specifically, for some value of x, we can provide a prediction for y

    So, returning to the example, how many offences would you predict if there were 2000 unemployed people in a city area?

    Answer: just substitute the desired value of x into the least squares regression line:

    $\hat{y} = 4323.6 + 0.7468 \times 2000 = 5817$

    Prediction

    Provided we are predicting y for an x value that is within the range of the available x data, then we can be fairly confident in the prediction

    This is what we call interpolation

    However, if we base our prediction on an x value outside the available x data, then we should view the prediction with caution

    This would be an example of extrapolation, which is risky since the relationship between x and y may change for such values of x
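One way to make the interpolation/extrapolation distinction concrete is to have the prediction routine report which case applies. The sketch below uses the fitted line from the example and the smallest and largest unemployment figures in the data; the function `predict` and the value 5000 are hypothetical illustrations.

```python
# Fitted line from the example: y_hat = 4323.6 + 0.7468 * x
ALPHA_HAT = 4323.6
BETA_HAT = 0.7468

# Smallest and largest unemployment figures among the 12 areas.
X_MIN, X_MAX = 1055, 2614


def predict(x_new):
    """Return the predicted number of offences and the kind of prediction."""
    y_hat = ALPHA_HAT + BETA_HAT * x_new
    if X_MIN <= x_new <= X_MAX:
        kind = "interpolation"
    else:
        kind = "extrapolation"  # outside the observed x range: treat with caution
    return y_hat, kind


for x_new in (2000, 5000):
    y_hat, kind = predict(x_new)
    print(f"x = {x_new}: predicted offences = {y_hat:.0f} ({kind})")
```

A prediction flagged as extrapolation is still computed, but the flag is a reminder that the fitted relationship has not been observed at that value of x.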

    Regression diagnostics

    The usefulness of a fitted regression model rests on a basic assumption:

    $E(y) = \alpha + \beta x$

    Furthermore, inference such as hypothesis tests, confidence intervals and predictive intervals only makes sense if the error terms are (approximately) independent and normal with constant variance $\sigma^2$

    Therefore it is important to check that these conditions are met in practice; this task is called regression diagnostics

    Basic idea: look into the residuals $\hat{\varepsilon}_i$ or the normalised residuals $\hat{\varepsilon}_i / \hat{\sigma}$

    Regression diagnostics

    What to look for?

    Do the residuals manifest IID normal behaviour?

    Is the scatter plot of $\hat{\varepsilon}_i$ versus $x_i$ patternless?

    Is the scatter plot of $\hat{\varepsilon}_i$ versus $\hat{y}_i$ patternless?

    Is the scatter plot of $\hat{\varepsilon}_i$ versus $i$ patternless?

    If you see trends, periodic patterns or increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is violated!
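The quantities needed for these plots are easy to tabulate. The following is a minimal sketch for the unemployment/crime example; it assumes that $\sigma$ is estimated by $\sqrt{\text{RSS}/(n-2)}$, a common choice that the slides do not state, and it prints a table rather than drawing the scatter plots.

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit.
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values and residuals (observed minus fitted).
fitted = [alpha_hat + beta_hat * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

# Assumption: estimate sigma by sqrt(RSS / (n - 2)).
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
normalised = [e / sigma_hat for e in residuals]

# Columns to plot against each other: residual vs x_i, vs fitted value, vs index i.
print(" i      x_i    fitted   residual  normalised")
for i, (a, f, e, z) in enumerate(zip(x, fitted, residuals, normalised), start=1):
    print(f"{i:2d} {a:8d} {f:9.1f} {e:10.1f} {z:11.2f}")
```

Plotting the residual column against each of the other columns gives the three scatter plots listed above. With only 12 observations, any apparent pattern should be read cautiously.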

    Regression diagnostics

    Two other issues in regression diagnostics: outliers and influential observations

    Outlier: An unusually small or unusually large $y_i$ which lies outside of the majority of observations

    An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis

    If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this outlier. For example, stock returns often exhibit extreme values and they often cannot be modelled satisfactorily by a normal regression model

    Regression diagnostics

    Influential observation: An $x_i$ which is far away from other $x_i$s

    Such an observation may have a large influence on the fitted regression line
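To illustrate how much a single far-away $x_i$ can matter, the sketch below refits the unemployment/crime line after adding one made-up observation. The extra point (6000 unemployed, 5000 offences) is purely hypothetical and is not part of the dataset; the function name `fit_line` is likewise an assumption.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def fit_line(xs, ys):
    """Least squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


alpha_hat, beta_hat = fit_line(x, y)

# Hypothetical 13th area whose x value lies far beyond the observed range.
x_extra = x + [6000]
y_extra = y + [5000]
alpha_extra, beta_extra = fit_line(x_extra, y_extra)

print(f"Original 12 areas:       y_hat = {alpha_hat:.1f} + {beta_hat:.4f} x")
print(f"With the far-away point: y_hat = {alpha_extra:.1f} + {beta_extra:.4f} x")
```

In this made-up case the single added observation is enough to flatten the slope almost completely, even though the other 12 points are unchanged.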

    Regression: Worked example

    We apply the simple linear regression method to study the relationship between two series of financial returns: a regression of Cisco Systems stock returns, y, on S&P500 Index returns, x

    This regression model is an example of the CAPM (Capital Asset Pricing Model)

    Stock returns:

    $$\text{Return} = \frac{\text{Current price} - \text{Previous price}}{\text{Previous price}} \approx \log\left(\frac{\text{Current price}}{\text{Previous price}}\right)$$

    when the difference between the two prices is small
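The quality of the log approximation can be checked with a couple of made-up prices. The sketch below is illustrative only: the prices are hypothetical, and the function names are assumptions introduced here.

```python
import math


def simple_return(current, previous):
    """(current price - previous price) / previous price."""
    return (current - previous) / previous


def log_return(current, previous):
    """Natural log of the price ratio."""
    return math.log(current / previous)


# Hypothetical prices: a small move (100 -> 101) and a large move (100 -> 150).
small_simple = simple_return(101.0, 100.0)
small_log = log_return(101.0, 100.0)
large_simple = simple_return(150.0, 100.0)
large_log = log_return(150.0, 100.0)

print(f"Small move: simple = {small_simple:.5f}, log = {small_log:.5f}")
print(f"Large move: simple = {large_simple:.5f}, log = {large_log:.5f}")
```

For the small price change the two definitions agree to about four decimal places, while for the large change they differ noticeably, which is why the approximation is stated only for small price differences.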

    Remark: Daily prices are definitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables.

    For the S&P500, the average daily return is -0.04%, the maximum daily return is 4.65%, the minimum daily return is -6.01%, and the standard deviation is 1.40%.

    For Cisco, the average daily return is -0.13%, the maximum daily return is 15.42%, the minimum daily return is -13.44%, and the standard deviation is 4.23%.

    Descriptive Statistics (daily returns, %)

                          N    Range   Minimum   Maximum     Mean   Std. Deviation   Variance
    SP500               252    10.66     -6.00      4.65   -.0424          1.40017      1.960
    Cisco               252    28.85    -13.44     15.42   -.1336          4.23419     17.928
    Valid N (listwise)  252


    Remark: Cisco is much more volatile than the S&P500.

    There is clear synchronisation between the movements of the two series of returns.


    We fit a regression model:

    $$\text{Cisco} = \alpha + \beta \cdot \text{S\&P500} + \varepsilon$$

    Rationale: Part of the fluctuation in Cisco returns was driven by the fluctuation of the S&P500 returns.


    Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients                      95.0% Confidence Interval for B
    Model            B        Std. Error           Beta                        t         Sig.     Lower Bound    Upper Bound
    1  (Constant)    -.012    .064                                             -.188     .851     -.139          .114
       Cisco          .227    .015                 .687                        14.943    .000      .197          .257

    a. Dependent Variable: SP500

    Model Summary (b)

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .687 (a)   .472       .470                1.01964

    a. Predictors: (Constant), Cisco
    b. Dependent Variable: SP500


    When testing the statistical significance of regression coefficients, we just need to look at the p-value.

    The smaller the p-value, the more significant the result, i.e. the stronger the evidence that the true parameter value is different from zero.

    In practice, we treat p-values smaller than 0.05 as being statistically significant (at the 5% significance level).
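    As a rough illustration of where such p-values come from, the sketch below divides each estimate in the coefficients table by its standard error and converts the result into a two-sided p-value. It assumes a normal approximation to the t distribution, which is reasonable here only because the sample is large (N = 252). Software such as SPSS uses the t distribution itself, so small differences are expected. The function name `two_sided_p_value` is hypothetical.

```python
import math

def two_sided_p_value(estimate, std_error):
    """Return (t statistic, approximate two-sided p-value) for H0: parameter = 0.

    Assumption: the t statistic is treated as standard normal (large sample).
    """
    t_stat = estimate / std_error
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value

# Rounded values of B and Std. Error from the coefficients table.
t_slope, p_slope = two_sided_p_value(0.227, 0.015)
t_const, p_const = two_sided_p_value(-0.012, 0.064)

print(f"slope:    t = {t_slope:.2f}, p = {p_slope:.3g}, significant at 5%: {p_slope < 0.05}")
print(f"constant: t = {t_const:.2f}, p = {p_const:.3f}, significant at 5%: {p_const < 0.05}")
```

    Because the inputs are rounded to three decimals, the t statistics differ slightly from the tabulated -.188 and 14.943, but the conclusions are the same.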


    The estimated slope: $\hat{\beta} = 2.077$. The null hypothesis $H_0: \beta = 0$ is rejected with p-value 0.000: extremely significant.

    Attempted interpretation: When the market index goes up by 1%, Cisco stock goes up by 2.077%, on average. However, the error term in the model is large with an estimated $\hat{\sigma} = 3.08\%$.

    The p-value for testing $H_0: \alpha = 0$ is 0.815, so we cannot reject the hypothesis that $\alpha = 0$.

    Recall $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and both $\bar{y}$ and $\bar{x}$ are very close to 0.
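    The coefficients table reproduced earlier lists SP500 as the dependent variable, so its B = .227 is the slope of the reverse regression (S&P500 on Cisco). The sketch below is an illustrative check, not part of the original analysis. It recovers the slope of Cisco on S&P500 from the reported summary statistics using the standard identities $\hat{\beta} = r\,s_y/s_x$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$. All inputs are rounded values from the slides, so the results are approximate.

```python
import math

# Rounded summary statistics reported in the slides (daily returns, %).
n = 252
mean_x, sd_x = -0.0424, 1.40017   # S&P500 (x)
mean_y, sd_y = -0.1336, 4.23419   # Cisco (y)
r = 0.687                         # sample correlation from the model summary

# Slope and intercept of the regression of y (Cisco) on x (S&P500).
beta_hat = r * sd_y / sd_x
alpha_hat = mean_y - beta_hat * mean_x

# Slope of the reverse regression of x on y, as shown in the coefficients table.
reverse_slope = r * sd_x / sd_y

# Residual standard error: sqrt(RSS / (n - 2)) with RSS = (1 - r^2) * (n - 1) * sd_y^2.
sigma_hat = sd_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))

print(f"beta_hat      = {beta_hat:.3f}")
print(f"alpha_hat     = {alpha_hat:.3f}")
print(f"reverse slope = {reverse_slope:.3f}")
print(f"sigma_hat     = {sigma_hat:.2f}")
```

    The slope and residual standard error come out close to the quoted 2.077 and 3.08%, and the small intercept is consistent with both sample means being near zero.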


    $R^2 = 47.2\%$: 47.2% of the variation of Cisco stock may be explained by the variation of the S&P500 index, or in other words 47.2% of the risk in Cisco stock is the market-related risk (see CAPM below).

    CAPM: A simple asset pricing model in finance:

    $$y_i = \alpha + \beta x_i + \varepsilon_i$$

    where $y_i$ is a stock return and $x_i$ is a market return at time $i$.


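    Total risk of the stock:

    $$\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

    Market-related (or systematic) risk:

    $$\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \frac{1}{n}\hat{\beta}^2\sum_{i=1}^{n}(x_i - \bar{x})^2$$

    Firm-specific risk:

    $$\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$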
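    The decomposition can be verified numerically. The following is a minimal sketch on hypothetical returns (the numbers are invented for illustration). It fits the line by least squares and checks that total risk equals market-related risk plus firm-specific risk, with $R^2$ being the market-related share.

```python
# Hypothetical daily returns (%) for a market index (x) and a stock (y).
x = [0.5, -1.2, 0.8, 1.5, -0.3, -2.0, 0.1, 1.1]
y = [1.4, -2.9, 0.9, 3.8, -1.5, -3.6, 1.0, 1.7]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates for y_i = alpha + beta * x_i + eps_i.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]

# The three risk measures, each divided by n.
total_risk = sum((yi - y_bar) ** 2 for yi in y) / n
market_risk = sum((fi - y_bar) ** 2 for fi in fitted) / n
firm_risk = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n

r_squared = market_risk / total_risk

print(f"total risk         = {total_risk:.4f}")
print(f"market + firm risk = {market_risk + firm_risk:.4f}")
print(f"beta^2 * Sxx / n   = {beta_hat ** 2 * sxx / n:.4f}")
print(f"R^2                = {r_squared:.3f}")
```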


    $\beta$ measures the market-related (or systematic) risk of the stock.

    Market-related risk is unavoidable, while firm-specific risk may be diversified away through hedging.

    Variance is a simple measure (and one of the most frequently used) for risk in finance.


    Multiple linear regression


    Previously we saw simple linear regression

    That had one explanatory variable

    Often one explanatory variable is not enough to

    explain variation in the response variable

    So we add more linear explanatory variables

    Multiple linear regression examples

    Absenteeism in the workforce could be due to: hours worked, flexibility in work practice, salary paid, ...

    Salary for managers could be related to: qualifications, experience, hours worked, performance, ...


    Remember the aim of statistics is prediction and decision making.

    In order to make the best predictions and decisions we need to use the best models.

    This often means making more complex models, adding more explanation.

    But not too complex (Occam's razor).


    The multiple linear model


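    Suppose $y$ is the manager's salary.

    $x_1$ = qualifications, $x_2$ = experience, $x_3$ = hours, $x_4$ = performance

    $$y = \beta_0 + \beta_{\text{qual}} x_1 + \beta_{\text{exp}} x_2 + \beta_{\text{hrs}} x_3 + \beta_{\text{per}} x_4 + \varepsilon$$

    We can visualise the model in up to $n = 3$ dimensions.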


    Multiple linear regression uses least squares estimation like simple linear regression.

    That is, we minimise the sum of the squared residuals in all dimensions.

    Sounds tricky, but fortunately software (SPSS etc.) takes care of that for us.
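    To give a feel for what the software is doing, here is a minimal sketch of least squares with several explanatory variables. It solves the normal equations $(X^\top X)\,b = X^\top y$ in plain Python. This is one textbook way to obtain the estimates and is not a description of how SPSS works internally. The data are hypothetical: salaries are generated exactly from two explanatory variables (qualifications and experience) with made-up coefficients, so the fit should recover them. The function names `solve` and `least_squares` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef

def least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    xty = [sum(d[j] * yi for d, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)

# Hypothetical managers: (qualifications score, years of experience).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (2, 7)]
# Salaries built from assumed coefficients: 20 + 3 * qualifications + 1.5 * experience.
salary = [20 + 3 * q + 1.5 * e for q, e in rows]

b0, b_qual, b_exp = least_squares(rows, salary)
print(f"intercept = {b0:.3f}, qualifications = {b_qual:.3f}, experience = {b_exp:.3f}")
```

    With real, noisy data the estimates would not reproduce any "true" coefficients exactly, and statistical packages use more numerically robust methods than the normal equations.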
