Regression for class teaching

Page 1: Regression for class teaching
Page 2: Regression for class teaching

Regression

Understanding

Example

Diagnostics

Regression With SPSS

Transformation

Page 3: Regression for class teaching

Regression

What is Regression: A statistical technique that is used to relate two or more variables.

Objective: Use the independent variable(s) to predict the value of the dependent variable.

Example

For a given value of advertisement expenditure, how much sales will be generated?

With a given diet plan, how much weight will an individual be able to lose?

With a unit increase in greenhouse gases, how much will the temperature rise?

Page 4: Regression for class teaching

Regression Understanding

A Layman Question: Suppose we want to find out how much the age of a car helps to determine its price.

A Layman Answer: The older the car, the ______ will be the price.

Regression in Simple Words: As the age of the car increases by one year, the price of the car is estimated to decrease by a certain amount.

Regression in Statistical Terms: Y(Estimated) = b0 + b1 X

Page 5: Regression for class teaching

Regression Understanding

Data Set: Age & Price of the Cars

Age     1   2   1   2   3   4   3   4   3
Price  90  85  93  84  80  74  81  76  79

What Relation Do You See? A negative relationship.

A Convenient Way to Look (What is this tool called?)

[Scatter plot: Price (70 to 90) against Age (1 to 4)]

Page 6: Regression for class teaching

[Scatter plot: Price (70 to 90) against Age (1 to 4)]

How to Show it Statistically

Y(E) = b0 + b1 X

Y(E) = 97 – 5 X

Y = 97 – 5 X + E

Term    What it is!
Y(E)    Dependent variable whose behavior is to be determined
X       Independent variable whose effect is to be determined
b0      Intercept: value of Y(E) when X = 0
b1      Estimated change in Y in response to a unit change in X
E       Difference between the actual and estimated values of Y
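The coefficients b0 and b1 can be estimated from the age and price data by ordinary least squares. The sketch below is a minimal illustration in plain Python; the helper name `fit_line` is an assumption made for this example and is not part of the slides. On this data it gives values close to the slide's rounded line Y(E) = 97 – 5 X.

```python
# Car data from the slides: age of the car and its price.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]


def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # estimated change in Y per unit change in X
    b0 = y_mean - b1 * x_mean    # value of Y(E) when X = 0
    return b0, b1


b0, b1 = fit_line(age, price)
print(f"Y(E) = {b0:.2f} + ({b1:.2f}) X")
```

The slope is negative, which matches the negative relationship seen in the scatter plot.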

Page 7: Regression for class teaching

Assessing the Goodness of Fit: Graphical Way

Goodness of Fit Means: How well the model fits the actual data. Smaller residuals mean a good fit; larger residuals mean a bad fit.

[Figure panels: Bad Fit, Good Fit, Perfect Fit]

Page 8: Regression for class teaching

Assessing the Goodness of Fit: Statistical Way

[Figure: Actual Y, Estimated Y (on the fitted line), and Expected Y (the mean of Y)]

Page 9: Regression for class teaching

SSR

SSR = Σ (Estimated – Expected)^2

Page 10: Regression for class teaching

SST

SST = Σ (Actual – Expected)^2

Page 11: Regression for class teaching

SSE

SSE = Σ (Actual – Estimated)^2

Page 12: Regression for class teaching

Assessing the Goodness of Fit: Statistical Way (R^2)

SST = Σ (Actual – Expected)^2

SSR = Σ (Estimated – Expected)^2

SSE = Σ (Actual – Estimated)^2

SST = SSR + SSE

R^2 = SSR/SST = 1 - SSE/SST

A good model is the one in which SSE is the lowest. A perfect fit has SSE = 0.
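As a worked check of these identities, the sketch below computes the three sums of squares for the car data, taking "Expected" to be the mean of Y and "Estimated" to be the value on the least-squares line. The variable names are chosen for this example only.

```python
# Car data from the slides.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n = len(age)
x_mean = sum(age) / n
y_mean = sum(price) / n          # the "Expected" value of Y

# Least-squares slope and intercept.
sxx = sum((x - x_mean) ** 2 for x in age)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(age, price))
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean

estimated = [b0 + b1 * x for x in age]

sst = sum((y - y_mean) ** 2 for y in price)                   # Actual - Expected
ssr = sum((yh - y_mean) ** 2 for yh in estimated)             # Estimated - Expected
sse = sum((y - yh) ** 2 for y, yh in zip(price, estimated))   # Actual - Estimated

r2 = ssr / sst
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  R^2={r2:.3f}")
```

Both forms of R^2 agree because SST = SSR + SSE holds for a least-squares line with an intercept.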

Page 13: Regression for class teaching

Residual Analysis

Why: The purpose of modeling is to predict (interpolate). The interpolation can be correct only when the assumptions about the behavior of the data hold true.

Assumptions: The response variable

is independent

is normally distributed

has constant variance

has a straight-line relation with the IV

Page 14: Regression for class teaching

Residual Analysis

Assumption           In Terms of Response Variable               In Terms of Residual
                     Response variable                           Random errors
Independence         is independent                              are independent
Normality            is normally distributed                     are normally distributed
Constant Variance    has constant variance                       have constant variance
Linearity            has straight-line relation with IV          have straight-line relation with IV

Page 15: Regression for class teaching

Inferring About the Population

Assumptions                    What it means            How to Check it
Expected value of residual     E(ei) = 0                No apparent pattern in residual plot
Variance of residual           σe1 = σe2 = … = σei      Residual plot has consistent spread
Distribution of residual       Normal                   Histogram is symmetric or normal (histogram & probability plot of residual)
Dependency of residuals        Independent
Relationship b/w IndV & DV     Linear                   Linear scatter plot

Page 16: Regression for class teaching

The Three Conditions Shown Together

As the distribution is symmetric, the mean of the error term will be zero.

The error term is shown to be normally distributed.

The variance of the error term appears to be the same for different values of x.

Page 17: Regression for class teaching

Residual Analysis

Types of Residuals

Normal or Raw Residual (RESID): Y – Y(Estimated)

Standardized Residual (ZRESID): {Y – Y(Estimated)} / Standard Error of Residual

Studentized Residual (SRESID): {Y – Y(Estimated)} / Varying Standard Error of Residual
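The three residual types can be sketched on the car data as follows. This example assumes the usual definitions for simple regression: the standard error of the residual is sqrt(SSE/(n-2)), and the "varying" standard error for case i is that value multiplied by sqrt(1 - h_i), where h_i is the leverage of the case. These formulas are assumptions of the sketch, since the slides do not spell them out.

```python
import math

# Car data from the slides.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n = len(age)
x_mean = sum(age) / n
y_mean = sum(price) / n
sxx = sum((x - x_mean) ** 2 for x in age)
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(age, price)) / sxx
b0 = y_mean - b1 * x_mean

# Raw residual: Y - Y(Estimated).
raw = [y - (b0 + b1 * x) for x, y in zip(age, price)]

# One common standard error for all residuals (n - 2 degrees of freedom).
s = math.sqrt(sum(e ** 2 for e in raw) / (n - 2))

# Leverage of each case in simple regression.
leverage = [1 / n + (x - x_mean) ** 2 / sxx for x in age]

standardized = [e / s for e in raw]
studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]

for x, e, z, t in zip(age, raw, standardized, studentized):
    print(f"age={x}  RESID={e:6.2f}  ZRESID={z:6.2f}  SRESID={t:6.2f}")
```

Because sqrt(1 - h_i) is below 1, a studentized residual is never smaller in magnitude than the standardized residual of the same case.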

Page 18: Regression for class teaching

Influential Observation

Outliers: Observations with large error.

Leverage Points: Distinct from other values on the basis of independent values.

Influential Observation: A value the inclusion of which can affect the coefficients of the regression line.

Any value can be an influential observation.

Page 19: Regression for class teaching

Outliers With Residuals

Unstandardized Residuals: Cannot tell how big a residual must be to be considered big.

Standardized Residuals (SR): Using the properties of the normal distribution (ND) helps us in making a rule for deciding large or small.

Model is Unacceptable When

Rule of 3.28: any SR > 3.28

Rule of 2.58: 1% or more of the SR > 2.58

Rule of 1.96: 5% or more of the SR > 1.96
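The three rules can be applied mechanically to a list of standardized residuals. The sketch below is one possible reading of the slide: it compares absolute values against the cut-offs, and the function name `model_unacceptable` and the sample residuals are hypothetical.

```python
def model_unacceptable(std_residuals):
    """Apply the rules of 3.28, 2.58 and 1.96 to standardized residuals."""
    n = len(std_residuals)
    abs_sr = [abs(r) for r in std_residuals]

    any_above_328 = any(r > 3.28 for r in abs_sr)
    count_above_258 = sum(r > 2.58 for r in abs_sr)
    count_above_196 = sum(r > 1.96 for r in abs_sr)

    # Integer comparisons avoid floating-point issues at the exact 1% and 5% marks.
    one_percent_or_more = count_above_258 * 100 >= n
    five_percent_or_more = count_above_196 * 100 >= 5 * n

    return any_above_328 or one_percent_or_more or five_percent_or_more


# Hypothetical residuals: 96 small values and 4 values just above 1.96.
sample = [0.1] * 96 + [2.0] * 4
print(model_unacceptable(sample))
```

With very small samples a single residual above 1.96 already exceeds 5% of the cases, so the percentage rules are more informative on larger data sets.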

Page 20: Regression for class teaching

Identifying Influential Cases

I Will Look at the World Without You: The regression is run with a particular case removed, and the value of that case is then predicted.

How it Looks: If this adjusted predicted value is similar to the predicted value, then the case is not an influential observation.

Page 21: Regression for class teaching

Identifying Influential Cases

Adjusted Predicted Value: The predicted value of a case without including that case for predicting it.

DFFit: Original Predicted – Adjusted Predicted

Deleted Residual: Original Observed – Adjusted Predicted

Studentized Deleted Residual: Deleted Residual / Standard Deviation
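These case-deletion quantities can be computed directly by refitting the line once per case. The sketch below does this for the car data; `fit_line` and the `rows` list are names introduced for this example only.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


# Car data from the slides.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]

b0, b1 = fit_line(age, price)   # model with every case included

rows = []
for i in range(len(age)):
    # Refit without case i, then predict case i from that reduced model.
    a0, a1 = fit_line(age[:i] + age[i + 1:], price[:i] + price[i + 1:])
    predicted = b0 + b1 * age[i]
    adjusted = a0 + a1 * age[i]
    dffit = predicted - adjusted
    deleted_residual = price[i] - adjusted
    rows.append((age[i], price[i], predicted, adjusted, dffit, deleted_residual))

for x, y, pred, adj, dffit, dres in rows:
    print(f"age={x} price={y} predicted={pred:.2f} adjusted={adj:.2f} "
          f"DFFit={dffit:.2f} deleted residual={dres:.2f}")
```

A case whose DFFit is close to zero barely moves its own prediction when it is left out, which is the "not influential" situation described above.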

Page 22: Regression for class teaching

Influential Cases

Compare the coefficient with (xa, ya) included and the coefficient with (xa, ya) not included.

Large change in coefficient: Influential observation

Not large change in coefficient: Not an influential observation

Page 23: Regression for class teaching

Influential Cases

Predicted Value (PV) and Adjusted Predicted Value (APV)

DFFit = Difference = PV - APV

Large difference: Influential observation

Small difference: Not an influential observation

Page 24: Regression for class teaching

Influential Cases

Original Value (OV) and Adjusted Predicted Value (APV)

Deleted Residual (DR) = OV - APV

Studentized Deleted Residual (SDR) = DR/SE

SDR can be compared for different regression models.

Page 25: Regression for class teaching

Identifying Influential Cases

Cook’s Distance (CD)
What is it? The measure of overall influence of the case on the model.
Observation is influential if: CD > 1

Leverage
What is it? Influence of the observed value on the predicted value.
Average Leverage (AL) = (k+1)/n
Observation is influential if: Leverage > 2(k+1)/n or Leverage > 3(k+1)/n

Mahalanobis Distance
What is it? Distance of cases from the mean of the predictor variables.
Observation is influential if: Use the Barnett & Lewis table
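Leverage and Cook's distance can be sketched for the car data with one predictor (k = 1). The example assumes the standard textbook formulas, which the slides do not state: leverage h_i = 1/n + (x_i - mean)^2 / Sxx, average leverage (k+1)/n, and Cook's distance D_i = e_i^2 h_i / ((k+1) MSE (1 - h_i)^2).

```python
# Car data from the slides.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n, k = len(age), 1   # k = number of predictors
x_mean = sum(age) / n
y_mean = sum(price) / n
sxx = sum((x - x_mean) ** 2 for x in age)
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(age, price)) / sxx
b0 = y_mean - b1 * x_mean

residuals = [y - (b0 + b1 * x) for x, y in zip(age, price)]
mse = sum(e ** 2 for e in residuals) / (n - k - 1)

# Leverage: how far each case sits from the mean of the predictor.
leverage = [1 / n + (x - x_mean) ** 2 / sxx for x in age]
average_leverage = (k + 1) / n

# Cook's distance: overall influence of each case on the fitted model.
cooks = [
    e ** 2 * h / ((k + 1) * mse * (1 - h) ** 2)
    for e, h in zip(residuals, leverage)
]

for x, h, d in zip(age, leverage, cooks):
    flags = []
    if h > 2 * average_leverage:
        flags.append("high leverage")
    if d > 1:
        flags.append("CD > 1")
    print(f"age={x}  leverage={h:.3f}  CD={d:.3f}  {', '.join(flags)}")
```

The leverages always sum to k + 1, which is why their average is (k+1)/n.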

Page 26: Regression for class teaching

Identifying Influential Cases

DfBeta
What is it? Difference between the parameter with & without the case.
Scale sensitive, therefore does not provide a good CV.

Standardized DfBeta
What is it? DfBeta / Standard Error
Observation is influential if: > 1 or > 2

Covariance Ratio (CVR)
What is it? It measures whether the case affects the variance of the regression parameter.
Delete case if CVR < 1 - 3(k+1)/n
Don’t delete case if CVR > 1 + 3(k+1)/n
k = number of predictors

Page 27: Regression for class teaching
Page 28: Regression for class teaching

Heteroscedasticity

What is it? Changing variance at different levels of the predictor.

Measure

[Residual plot: Residual against y. The spread increases with y.]

Page 29: Regression for class teaching

Multicollinearity

What is it? Strong correlation between the predictor variables.

Effects

Untrustworthy bs: inflated standard error, not significant bs, varying bs from sample to sample.

Restricted R2: When a new variable that is strongly correlated with the first one is included, R2 will hardly increase.

Difficulty in picking the right variable.

Page 30: Regression for class teaching

Multicollinearity: Measure

VIF: Variance Inflation Factor
What is it? VIF = 1/(1-R2), where R2 comes from regressing that predictor on the other predictors.
Interpretation: The lower the value the better; VIF < 10.

Durbin Watson
Measures the correlation between adjacent residuals.
Range of value is between 0 & 4.
0 = Positive correlation
2 = No correlation
4 = Negative correlation
Desired value is 2 or near.
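The Durbin Watson statistic is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The sketch below is a minimal plain-Python version; the two residual series are hypothetical and chosen only to show the two ends of the 0 to 4 range. The residuals must be in their original (for example time) order.

```python
def durbin_watson(residuals):
    """Durbin Watson statistic for residuals given in their original order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly: neighbours look alike,
# so the statistic falls towards 0.
drifting = [-3, -2, -1, 0, 1, 2, 3]

# Hypothetical residuals that flip sign every step: neighbours are opposite,
# so the statistic rises towards 4.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(durbin_watson(drifting))
print(durbin_watson(alternating))
```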

Page 31: Regression for class teaching

Measures of Multicollinearity

Measure                      Desired Behavior         Critical Value
Variance Inflation Factor    The lower the better     VIF > 10
Tolerance                    Higher the better        T < 0.1
Eigen Value                  The lower the better     The lower the better
Variance Proportion          Higher the better        Each dimension should be related with a separate variable
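With exactly two predictors, the R2 inside the VIF formula is the squared correlation between them, so VIF and tolerance can be sketched in a few lines. The data values below are hypothetical (the slides do not list the data), tolerance is taken as 1/VIF, and `pearson_r` is a helper written for this example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    sab = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    saa = sum((x - a_mean) ** 2 for x in a)
    sbb = sum((y - b_mean) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical values for two predictors.
study_time = [2, 4, 5, 7, 8, 10]
interest = [1, 3, 2, 5, 4, 6]

r = pearson_r(study_time, interest)
vif = 1 / (1 - r ** 2)      # VIF = 1/(1 - R2)
tolerance = 1 / vif         # equivalently 1 - R2

print(f"r={r:.3f}  VIF={vif:.2f}  Tolerance={tolerance:.3f}")
print("Problem" if vif > 10 or tolerance < 0.1 else "Acceptable")
```

With more than two predictors, each predictor's R2 would come from a regression of that predictor on all the others rather than from a single correlation.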

Page 32: Regression for class teaching

Checking Assumptions Through Plots

Normality
P-P Plot: Standardized Residual
Q-Q Plot: Standardized Residual

Heteroscedasticity & Outliers
Scatter Plot: Standardized Residual / Standardized Predicted Value
Scatter Plot: Residual / Standardized Predicted Value

Page 33: Regression for class teaching

Transformation of a Variable

Reason: A nonlinear relation is translated into a linear one, because the methods of explanation for a linear relation are known.

How Justified: Theoretically, or through diagnostic plots.

Transform: X, Y, or both.

Page 34: Regression for class teaching

Transformation of a Variable

Function                       Transform                     Linear Form
Exponential: Y = α e^(βx)      Y’ = ln(Y)                    Y’ = ln α + β x
Power: Y = α x^β               Y’ = log(Y), X’ = log(X)      Y’ = log α + β x’
Log: Y = α + β log x           X’ = log(X)                   Y = α + β x’
Reciprocal: Y = α + β/x        X’ = 1/x                      Y = α + β x’
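The exponential row can be illustrated with a small sketch: take the natural log of Y, fit a straight line to (x, ln Y), and convert the intercept back to α. The data are generated from hypothetical values α = 2 and β = 0.5 with no noise, purely to show that the transformation recovers them.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical exponential relation Y = alpha * e^(beta * x).
alpha_true, beta_true = 2.0, 0.5
x = [0, 1, 2, 3, 4]
y = [alpha_true * math.exp(beta_true * xi) for xi in x]

# Transform: Y' = ln(Y), which gives the linear form Y' = ln(alpha) + beta * x.
y_prime = [math.log(yi) for yi in y]
intercept, slope = fit_line(x, y_prime)

alpha_hat = math.exp(intercept)   # undo the log on the intercept
beta_hat = slope

print(f"alpha={alpha_hat:.3f}  beta={beta_hat:.3f}")
```

The other rows work the same way: transform the indicated variable(s), fit a straight line, and read α and β from the intercept and slope.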

Page 35: Regression for class teaching

Regression through SPSS

Coefficients: b0 & b1, tested with t

Model Fit: SST = SSR + SSE, tested with F = MSR/MSE

Assumption

e is independent

e is normally distributed

e has constant variance

e has straight-line relation with IV

Multicollinearity
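Outside SPSS, the F and t statistics can be sketched by hand for the car data. The example assumes the usual simple-regression formulas, which the slides do not spell out: MSR = SSR/k, MSE = SSE/(n-k-1), and standard error of b1 = sqrt(MSE/Sxx).

```python
import math

# Car data from the slides.
age = [1, 2, 1, 2, 3, 4, 3, 4, 3]
price = [90, 85, 93, 84, 80, 74, 81, 76, 79]

n, k = len(age), 1   # k = number of predictors
x_mean = sum(age) / n
y_mean = sum(price) / n
sxx = sum((x - x_mean) ** 2 for x in age)
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(age, price)) / sxx
b0 = y_mean - b1 * x_mean

estimated = [b0 + b1 * x for x in age]
ssr = sum((yh - y_mean) ** 2 for yh in estimated)
sse = sum((y - yh) ** 2 for y, yh in zip(price, estimated))

msr = ssr / k              # mean square due to regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse         # F = MSR/MSE

se_b1 = math.sqrt(mse / sxx)
t_b1 = b1 / se_b1          # t statistic for the slope

print(f"F={f_stat:.2f}  t(b1)={t_b1:.2f}")
```

With a single predictor, F equals the square of the slope's t statistic, so the two tests agree.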

Page 36: Regression for class teaching

Data Set

Variables

Study Time

Interest

Marks

Page 37: Regression for class teaching
Page 38: Regression for class teaching
Page 39: Regression for class teaching
Page 40: Regression for class teaching

Standardized Predicted

Standardized Residual

Deleted Residual

Adjusted Predicted

Studentized Residual

Studentized Deleted Residual

Page 41: Regression for class teaching
Page 42: Regression for class teaching
Page 43: Regression for class teaching

Assumption

e is independent

e is normally distributed

e has constant variance

e has straight-line relation with IV

Multicollinearity

Page 44: Regression for class teaching

Normality

Normal Probability Plot of the Standardized Residual

Histogram of the Standardized Residual

SK and Shapiro Test

Page 45: Regression for class teaching

Normality

Getting the Residual & Standardized Residual

Page 46: Regression for class teaching


Page 47: Regression for class teaching
Page 48: Regression for class teaching


Page 49: Regression for class teaching
Page 50: Regression for class teaching


Page 51: Regression for class teaching
Page 52: Regression for class teaching


[Scatter plot: Z Residual against Z Predicted, both axes from -3 to 3]

Page 53: Regression for class teaching
Page 54: Regression for class teaching
Page 55: Regression for class teaching


Page 56: Regression for class teaching
Page 57: Regression for class teaching
Page 58: Regression for class teaching