
Linear Modelling: Simple Regression
10th of May 2018
R. Nicholls / D.-L. Couturier / M. Fernandes

Introduction:

ANOVA
•  Used for testing hypotheses regarding differences between groups
•  Considers the variation within and between groups

Regression
•  Used for revealing and investigating relationships between input and output variables
•  Model data, and extrapolate as much information as possible

Correlation:

How to measure the strength of a linear relationship between variables?

[Scatter plots of y vs x: positively correlated, negatively correlated, and uncorrelated examples]

Pearson's product-moment correlation coefficient:

$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$

Coefficient of Determination (R² value): $R^2 = r^2$

Correlation:

[Example scatter plots with their correlation coefficients:]
•  r = 0.931, R² = 0.866
•  r = −0.949, R² = 0.901
•  r = −0.060, R² = 0.004
•  r = 0.106, R² = 0.011
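These quantities are easy to verify in R. A minimal sketch, assuming simulated data rather than the slides' actual values:

set.seed(1)
x = runif(50, 0, 50)
y = x + rnorm(50, sd = 5)                # positively correlated example

# Pearson's r computed from the definition...
r = sum((x - mean(x)) * (y - mean(y))) /
    sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r
cor(x, y)                                # ...matches the built-in
cor(x, y)^2                              # coefficient of determination, R^2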

Correlation:

Can I say whether my data are correlated? Is an observed correlation significant? Use cor.test(). For strongly correlated data:

data:  x and y
t = 17.613, df = 48, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8802556 0.9602168
sample estimates:
      cor
0.9305923

For weakly correlated data, the test fails to reject zero correlation:

data:  x and y
t = 1.5609, df = 48, p-value = 0.1251
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06238066 0.46941403
sample estimates:
      cor
0.2197833
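A minimal sketch of running the test, reusing the simulated x and y from the earlier sketch:

cor.test(x, y)   # reports t, df, p-value, a 95% CI for the true correlation, and r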

Simple Regression:

Aims:
•  To investigate linear correlation between two variables in more detail
•  Be able to predict the response given a knowledge of the independent variable

Predictor variable = Independent variable (x)
Response variable = Dependent variable (y)

[Scatter plot: y vs x with fitted regression line]

For the i-th observation:

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where the $\varepsilon_i$ are the errors (residuals).
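To make the model concrete, here is a minimal sketch that simulates data from it (the parameter values 5 and 0.8 are arbitrary choices for illustration):

set.seed(2)
n   = 50
x   = runif(n, 0, 50)
eps = rnorm(n, sd = 5)          # Gaussian errors
y   = 5 + 0.8 * x + eps         # y_i = beta0 + beta1 * x_i + eps_i
plot(x, y)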

Simple Regression:

So how do we fit the regression line? Suppose we know parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.

Observations: $y_i$
Fitted values: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Residuals: $\hat{\varepsilon}_i = y_i - \hat{y}_i$

Assuming a Gaussian error model, $\varepsilon_i \sim N(0, \sigma^2)$, each observation has density

$f(y_i \mid x_i;\, \beta_0, \beta_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)$

[Scatter plot: observation $y_i$, fitted value $\hat{y}_i$, and residual $\varepsilon_i$ at $x_i$]

Simple Regression:

So how do we fit the regression line in practice? Obtain estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ by maximising the likelihood of the parameters given the data:

$L(\beta_0, \beta_1 \mid x, y) = \prod_i f(y_i \mid x_i;\, \beta_0, \beta_1) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)$

Taking logs:

$\ln L(\beta_0, \beta_1 \mid x, y) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (y_i - \hat{y}_i)^2$

The estimates are $(\hat{\beta}_0, \hat{\beta}_1) = \arg\max_{\beta_0, \beta_1} L(\beta_0, \beta_1 \mid x, y)$. Since only the residual sum of squares depends on the parameters, maximising the likelihood is the same as minimising $\sum_i (y_i - \hat{y}_i)^2$: Maximum Likelihood and Least Squares estimates are equivalent (for the Gaussian error model).
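This equivalence can be checked numerically. A minimal sketch, assuming simulated data; negll is an illustrative helper, not part of the slides:

set.seed(3)
x = runif(50, 0, 50)
y = 2 + 0.9 * x + rnorm(50, sd = 5)

# negative log-likelihood under the Gaussian error model
negll = function(par) {
  beta0 = par[1]; beta1 = par[2]; sigma = exp(par[3])   # exp() keeps sigma > 0
  -sum(dnorm(y, mean = beta0 + beta1 * x, sd = sigma, log = TRUE))
}

ml = optim(c(0, 0, 0), negll)   # maximum likelihood
ml$par[1:2]
coef(lm(y ~ x))                 # least squares: same beta0 and beta1 (up to tolerance)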

Simple Regression:

Minimising the sum of squared residuals gives the final answer:

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

[Scatter plot: data with the fitted least squares line]
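The closed-form estimates agree with lm(). A sketch, reusing the simulated x and y from the previous sketch:

b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 = mean(y) - b1 * mean(x)
c(b0, b1)
coef(lm(y ~ x))   # identical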

Simple Regression:

Example: Predicting timber volume of felled black cherry trees

Response: y = Volume
Predictor: x = Girth

[Scatter plot: Volume vs Girth]

> cor(trees$Volume,trees$Girth)
[1] 0.9671194

> m1 = lm(Volume~Girth,data=trees)
> summary(m1)

Call:
lm(formula = Volume ~ Girth, data = trees)

Residuals:
   Min     1Q Median     3Q    Max
-8.065 -3.107  0.152  3.495  9.587

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
Girth         5.0659     0.2474   20.48  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.252 on 29 degrees of freedom
Multiple R-squared: 0.9353, Adjusted R-squared: 0.9331
F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16

The residuals look approximately Normal:

[Histogram of the residuals: roughly symmetric and centred on zero]

$\hat{\sigma} = 4.252, \qquad \hat{\sigma}^2 = 18.1$

Approximately 95% of observations lie within ±8.5 (about $2\hat{\sigma}$) of the fitted line.
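The fitted model can then be used for prediction. A minimal sketch using R's built-in trees data; the 95% prediction interval is the formal analogue of the ±8.5 rule of thumb above:

m1 = lm(Volume ~ Girth, data = trees)
# predicted volume for a tree of girth 15, with a 95% prediction interval
predict(m1, newdata = data.frame(Girth = 15), interval = "prediction")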

Linear Regression:

Assumptions:
1.  Model is linear in parameters.
2.  Gaussian error model.
3.  Additive error model.
4.  Independence of errors: no autocorrelation, which occurs when one observation depends on the last.
5.  Homoscedasticity: homogeneity/stability of the variance of the residuals.

Testing Assumptions: diagnostic plots

1.  Residuals vs Fitted Values
[Plot: residuals vs fitted values for lm(Volume ~ Girth)]
•  Residuals and fitted values should not be related
•  No visible pattern
•  Mean residual = zero
•  Constant variance

2.  Normal Quantile-Quantile Plot
[Plot: standardised residuals vs theoretical quantiles]
•  Visual test for Normality
•  No strong trends/departures

3.  Scale-Location Plot
[Plot: standardised residuals vs fitted values]
•  Test for homoscedasticity
•  Should be constant, ≈ 1
•  No trend

4.  Index Plot of Cook's Distance (Residuals vs Leverage)
[Plot: standardised residuals vs leverage, with Cook's distance contours]
•  Measures the influence of a particular observation
•  Extreme x-values: high leverage
•  May inform outlier rejection
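All four plots come from R's plot method for fitted lm objects. A minimal sketch, with m1 the trees fit from earlier:

m1 = lm(Volume ~ Girth, data = trees)
par(mfrow = c(2, 2))    # 2x2 grid
plot(m1)                # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
plot(m1, which = 4)     # index plot of Cook's distance
cooks.distance(m1)      # the underlying values, if needed individually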

Modelling Non-Linear Relationships

Linear models can be used to describe non-linear relationships...

Applying transformations to the response and/or predictor variables can be useful to:
•  Linearise the data, i.e. make the relationship between variables more linear.
•  Stabilise the variance of the residuals, so that σ² doesn't depend on the independent variable.
•  Normalise the distribution of the residuals.

Modelling Non-Linear Relationships

Example: Stopping distance of cars versus speed (mph)

Response: y = distance
Predictor: x = speed

[Scatter plots and residuals-vs-fitted plots for three candidate models:]
•  lm(dist ~ speed): R² = 0.651
•  lm(sqrt(dist) ~ speed): R² = 0.709
•  lm(log(dist) ~ log(speed)): R² = 0.733
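The three candidate models can be fitted and compared directly. A minimal sketch using R's built-in cars data:

m_raw  = lm(dist ~ speed, data = cars)
m_sqrt = lm(sqrt(dist) ~ speed, data = cars)
m_log  = lm(log(dist) ~ log(speed), data = cars)
sapply(list(m_raw, m_sqrt, m_log), function(m) summary(m)$r.squared)
# 0.651, 0.709, 0.733 (the responses are on different scales,
# so these R^2 values are only a rough guide)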

Call:
lm(formula = log(dist) ~ log(speed), data = cars)

Residuals:
     Min       1Q   Median       3Q      Max
-1.00215 -0.24578 -0.02898  0.20717  0.88289

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7297     0.3758  -1.941   0.0581 .
log(speed)    1.6024     0.1395  11.484 2.26e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4053 on 48 degrees of freedom
Multiple R-squared: 0.7331, Adjusted R-squared: 0.7276
F-statistic: 131.9 on 1 and 48 DF,  p-value: 2.259e-15

Modelling Non-Linear Relationships

Can you use simple regression to fit this model?

$y = e^{\beta_0} x^{\beta_1} \varepsilon$ (non-linear, with a multiplicative error model)

Yes, so long as the error model is log-Normal: taking logs gives $\log y = \beta_0 + \beta_1 \log x + \log\varepsilon$, which is linear in the parameters with additive Gaussian errors.

[Scatter plot: dist vs speed with the fitted curve]

Modelling Non-Linear Relationships

Back-transforming the fitted log-log model gives the curve on the original scale:

$\hat{y} = e^{\hat{\beta}_0} x^{\hat{\beta}_1}$

[Scatter plot: dist vs speed with the back-transformed fitted curve]
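Predictions on the original scale are obtained by exponentiating. A minimal sketch (note that under log-Normal errors, exp() of the log-scale prediction estimates the median rather than the mean):

m_log = lm(log(dist) ~ log(speed), data = cars)
# predicted stopping distance at 20 mph, back-transformed
exp(predict(m_log, newdata = data.frame(speed = 20)))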

Simple Regression in R:

Correlation coefficients. R functions:

plot(x,y)
cor(x,y)
cor.test(x,y)

[Scatter plot: y vs x]

data:  x and y
t = 17.613, df = 48, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8802556 0.9602168
sample estimates:
      cor
0.9305923

Simple Regression in R:

R functions:

plot(x,y)
m1 = lm(y~x)
abline(m1)
summary(m1)

[summary(m1) output for lm(Volume ~ Girth, data = trees), as shown earlier]

Simple Regression in R:

R functions:

plot(x,y)
m1 = lm(y~x)
abline(m1)
summary(m1)
r1 = residuals(m1)
hist(r1)

Simple Regression in R:

R functions:

plot(x,y)
m1 = lm(y~x)
abline(m1)
summary(m1)
r1 = residuals(m1)
hist(r1)
plot(m1)

[The four diagnostic plots produced by plot(m1): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance]
