This Week Continue with linear regression Begin multiple regression Le 8.2 C & S 9:A-E Handout: Class examples and assignment 3

This Week

• Continue with linear regression
• Begin multiple regression
  – Le 8.2
  – C & S 9:A-E
• Handout: Class examples and assignment 3


Linear Regression

• Investigate the relationship between two variables
• Dependent variable
  – The variable that is being predicted or explained
• Independent variable
  – The variable that is doing the predicting or explaining
• Think of data in pairs $(x_i, y_i)$


Linear Regression - Purpose

• Is there an association between the two variables?
  – Is BP change related to weight change?
• Estimation of impact
  – How much BP change occurs per pound of weight change?
• Prediction
  – If a person loses 10 pounds, how much of a drop in blood pressure can be expected?


Assumptions for Linear Regression

• For each value of X there is a population of Y's that are normally distributed
• The population means form a straight line
• Each population has the same variance $\sigma^2$
• Note: The X's do not need to be normally distributed; in fact, the researcher can select them prior to data collection


Simple Linear Regression Equation

The simple linear regression equation is:

$\mu_y = \beta_0 + \beta_1 x$

• $\beta_0$ is the mean when x = 0
• The mean increases by $\beta_1$ for each increase of x by 1


Simple Linear Regression Model

The equation that describes how individual y values relate to x and an error term is called the regression model.

$y = \beta_0 + \beta_1 x + \varepsilon$

$\varepsilon$ reflects how individuals deviate from others with the same value of x


Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

$\hat{y} = b_0 + b_1 x$

• $b_0$ is the estimate for $\beta_0$
• $b_1$ is the estimate for $\beta_1$
• $\hat{y}$ is the estimated (predicted) value of y for a given x value. It is the estimated mean for that x.


Least Squares Method

Least Squares Criterion: Choose $\beta_0$ and $\beta_1$ to minimize

$S = \sum_i (y_i - \beta_0 - \beta_1 x_i)^2$

Of all possible lines, pick the one that minimizes the sum of the squared distances of each point from that line


The Least Squares Estimates

Slope:

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Intercept:

$b_0 = \bar{y} - b_1 \bar{x}$
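The slope and intercept formulas can be sketched in a few lines of Python (used here only for illustration; the class itself uses SAS). The ten (x, y) pairs are an assumption, not data given in these slides; they were chosen because they reproduce the estimates in the PROC REG output shown later (b0 = 60, b1 = 5).

```python
# Least squares estimates for simple linear regression.
# ASSUMPTION: these ten (x, y) pairs are illustrative, not taken from the slides.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def least_squares(x, y):
    """Return (b0, b1) from the slope and intercept formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```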


Estimating the Variance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$s^2 = \text{MSE} = \text{SSE}/(n-2)$

where:

$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

If points are close to the regression line then SSE will be small

If points are far from the regression line then SSE will be large


Estimating $\sigma$

An Estimate of $\sigma$

To estimate $\sigma$ we take the square root of the estimate of $\sigma^2$ (that is, of $s^2$ = MSE). The resulting s is called the root mean square error.

$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n-2}}$
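As a quick check of these formulas, the Python sketch below (for illustration only, not the SAS used in class) plugs in SSE = 1530 and n = 10. These are the values behind the PROC REG output later in these slides, where the Error line shows 8 = n - 2 degrees of freedom.

```python
import math

sse = 1530.0  # error sum of squares, from the PROC REG output
n = 10        # sample size, so that n - 2 = 8 error degrees of freedom

mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # root mean square error, the estimate of sigma

print(f"MSE = {mse:.2f}, root MSE = {s:.5f}")
```

The two results should agree with the Mean Square for Error and the Root MSE lines of the SAS listing.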


Hypothesis Testing for $\beta_1$

$H_0$: $\beta_1 = 0$ (no relation between x and y)

$H_a$: $\beta_1 \neq 0$ (relation between x and y)

Test Statistic: t = b1/SE(b1)

SE(b1) depends on
• Sample size
• How well the estimated line fits the points
• How spread out the x values are


Testing for Significance: t Test

Rejection Rule

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom
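The test can be sketched numerically with b1 = 5 and SE(b1) = 0.58027, taken from the Parameter Estimates output later in these slides. This is a minimal Python illustration, not the SAS used in class. The critical value 2.306 is an assumption in the sense that it is entered by hand from a standard t table (alpha = 0.05, two-sided, 8 df) rather than computed.

```python
b1 = 5.0         # slope estimate, from the PROC REG output
se_b1 = 0.58027  # standard error of b1, from the PROC REG output
t_crit = 2.306   # t-table value for alpha/2 = 0.025 with n - 2 = 8 df (entered by hand)

t_stat = b1 / se_b1

# Two-sided rejection rule: reject H0 if t < -t_crit or t > t_crit
reject_h0 = t_stat < -t_crit or t_stat > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```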


Confidence Interval for $\beta_1$

$b_1 \pm t_{\alpha/2} \, \text{SE}(b_1)$

$t_{\alpha/2}$ is the cutoff value from the t distribution with n - 2 df

CLB option in SAS on model statement
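A worked version of the interval, as a minimal Python sketch (illustration only; in class the interval comes from SAS). The estimate and standard error are from the Parameter Estimates output later in these slides. The cutoff 2.306 is a hand-entered t-table value for a 95% interval with 8 df, which is an assumption not stated in the slides.

```python
b1 = 5.0         # slope estimate, from the PROC REG output
se_b1 = 0.58027  # standard error of b1, from the PROC REG output
t_crit = 2.306   # t-table cutoff for 95% confidence with n - 2 = 8 df (entered by hand)

margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")
```

Because the interval excludes 0, it leads to the same conclusion as the two-sided t test of $\beta_1 = 0$ at the 5% level.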


Estimating the Mean for a Particular X

Simply plug your value of x into the estimated regression equation

Want to estimate the mean BP for persons aged 50

Suppose $b_0 = 100$ and $b_1 = 0.80$

Estimate = 100 + 0.80*50 = 140 mmHg

Can compute 95% CI for the estimate using SAS

CLM option on model statement


The Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

where:
  SST = total sum of squares
  SSR = sum of squares due to regression
  SSE = sum of squares due to error


The Coefficient of Determination

The coefficient of determination is:

$r^2 = \text{SSR}/\text{SST}$

where:
  SST = total sum of squares
  SSR = sum of squares due to regression

$r^2$ = proportion of variability explained by X (must be between 0 and 1)
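The decomposition and the ratio can be checked with the sums of squares from the PROC REG output later in these slides. This is a minimal Python sketch for illustration, not the SAS used in class.

```python
ssr = 14200.0  # sum of squares due to regression (Model line)
sse = 1530.0   # sum of squares due to error (Error line)
sst = 15730.0  # total sum of squares (Corrected Total line)

# SST = SSR + SSE
decomposition_holds = (ssr + sse == sst)

# Proportion of variability explained by X
r_squared = ssr / sst

print(f"SST = SSR + SSE: {decomposition_holds}")
print(f"r^2 = {r_squared:.4f}")
```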


Residuals

How far off (distance) an individual point is from the estimated regression line

residual = observed value – predicted value
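Residuals can be computed point by point, as in the Python sketch below (illustration only, not the SAS used in class). The ten (x, y) pairs are an assumption, not data given in these slides; they were chosen because they are consistent with the PROC REG output shown later. The estimates b0 = 60 and b1 = 5 are from that output.

```python
# ASSUMPTION: these ten (x, y) pairs are illustrative, not taken from the slides.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = 60.0, 5.0  # estimates from the PROC REG output

# Predicted value on the estimated regression line for each x
predicted = [b0 + b1 * xi for xi in x]

# Residual = observed value minus predicted value
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# SSE is the sum of the squared residuals
sse = sum(e ** 2 for e in residuals)

print("residuals:", residuals)
print("sum of residuals:", sum(residuals))
print("SSE:", sse)
```

With least squares estimates and an intercept in the model, the residuals sum to zero, so positive and negative deviations balance out.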


SAS CODE FOR REGRESSION;

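PROC REG DATA=datasetname SIMPLE;   /* SIMPLE prints descriptive statistics */
  MODEL depvar = indvar(s);         /* dependent variable = independent variable(s) */
  PLOT depvar * indvar;             /* scatter plot of y against x */
RUN;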

Several options on model and plot statements.


OPTIONS ON MODEL STATEMENT;

MODEL depvar = indvar(s) / options;

Option   What it does
clb      95% CI for $\beta_1$
p        Predicted values
r        Residuals
clm      95% CI for the mean at each value of x


OUTPUT FROM PROC REG

Dependent Variable: quarsales

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1       14200       14200      74.25    <.0001
Error              8        1530   191.25000
Corrected Total    9       15730

Root MSE          13.82932    R-Square     0.9027
Dependent Mean   130.00000    Coeff Var   10.63794

Annotations:
• SSR = 14200 (Model), SSE = 1530 (Error), SST = 15730 (Corrected Total)
• MSE = 191.25 (Mean Square for Error)
• Coefficient of determination: R-Square = 14200/15730


Parameter Estimates

                    Parameter    Standard
Variable      DF     Estimate       Error    t Value    Pr > |t|
Intercept      1     60.00000     9.22603       6.50      0.0002
studentpop     1      5.00000     0.58027       8.62      <.0001

Annotations: in the studentpop row, the Parameter Estimate is b1 and the Standard Error is SE(b1)

REGRESSION EQUATION:

Y = 60.0 + 5.0*X

QUARSALES = 60 + 5*STUDENTPOP
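Once the equation is estimated, a prediction is just a plug-in calculation. A minimal Python sketch follows (illustration only, not the SAS used in class). The function name `predict_quarsales` and the input value 10 are hypothetical, not from the slides.

```python
b0 = 60.0  # intercept estimate, from the Parameter Estimates output
b1 = 5.0   # slope estimate for studentpop, from the same output


def predict_quarsales(studentpop):
    """Estimated mean of quarsales at a given studentpop (hypothetical helper)."""
    return b0 + b1 * studentpop


# Hypothetical input value, chosen only to show the calculation
estimate = predict_quarsales(10)
print(f"Estimated quarsales at studentpop = 10: {estimate}")
```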