
Page 1: Simple linear regression Linear regression with one predictor variable

Simple linear regression

Linear regression with one predictor variable

Page 2

What is simple linear regression?

• A way of evaluating the relationship between two continuous variables.

• One variable is regarded as the predictor, explanatory, or independent variable (x).

• The other variable is regarded as the response, outcome, or dependent variable (y).

Page 3

A deterministic (or functional) relationship

[Scatterplot: Fahrenheit versus Celsius temperature — an exact straight-line relationship.]

Page 4

Other deterministic relationships

• Circumference = π × diameter
• Hooke’s Law: Y = α + βX, where Y = amount of stretch in spring, and X = applied weight.
• Ohm’s Law: I = V/r, where V = voltage applied, r = resistance, and I = current.
• Boyle’s Law: For a constant temperature, P = α/V, where P = pressure, α = constant for each gas, and V = volume of gas.

Page 5

A statistical relationship

A relationship with some “trend”, but also with some “scatter.”

[Scatterplot: Skin cancer mortality (deaths per 10 million) versus latitude (at center of state).]

Page 6

Other statistical relationships

• Height and weight

• Alcohol consumed and blood alcohol content

• Vital lung capacity and pack-years of smoking

• Driving speed and gas mileage

Page 7

Which is the “best fitting line”?

[Scatterplot: weight versus height, with two candidate lines: w = -266.5 + 6.1h and w = -331.2 + 7.1h.]

Page 8

Notation

yᵢ is the observed response for the ith experimental unit.

xᵢ is the predictor value for the ith experimental unit.

ŷᵢ is the predicted response (or fitted value) for the ith experimental unit.

Equation of best fitting line: ŷᵢ = b₀ + b₁xᵢ

Page 9

[Scatterplot: weight versus height, with the line w = -266.5 + 6.1h.]

i    xᵢ    yᵢ    ŷᵢ
1    64   121   126.3
2    73   181   181.5
3    71   156   169.2
4    69   162   157.0
5    66   142   138.5
6    69   157   157.0
7    75   208   193.8
8    71   169   169.2
9    63   127   120.1
10   72   165   175.4

Page 10

Prediction error (or residual error)

In using ŷᵢ to predict the actual response yᵢ, we make a prediction error (or residual error) of size eᵢ = yᵢ − ŷᵢ.

A line that fits the data well will be one for which the n prediction errors are as small as possible in some overall sense.

Page 11

The “least squares criterion”

Choose the values b₀ and b₁ that minimize the sum of the squared prediction errors.

Equation of best fitting line: ŷᵢ = b₀ + b₁xᵢ

That is, find b₀ and b₁ that minimize:

Q = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

Page 12

Which is the “best fitting line”?

[Scatterplot: weight versus height, with two candidate lines: w = -266.5 + 6.1h and w = -331.2 + 7.1h.]

Page 13

w = -331.2 + 7.1h

i    xᵢ    yᵢ    ŷᵢ      yᵢ − ŷᵢ   (yᵢ − ŷᵢ)²
1    64   121   123.2     -2.2       4.84
2    73   181   187.1     -6.1      37.21
3    71   156   172.9    -16.9     285.61
4    69   162   158.7      3.3      10.89
5    66   142   137.4      4.6      21.16
6    69   157   158.7     -1.7       2.89
7    75   208   201.3      6.7      44.89
8    71   169   172.9     -3.9      15.21
9    63   127   116.1     10.9     118.81
10   72   165   180.0    -15.0     225.00
                                   ------
                                   766.51

Page 14

w = -266.5 + 6.1h

i    xᵢ    yᵢ    ŷᵢ        yᵢ − ŷᵢ   (yᵢ − ŷᵢ)²
1    64   121   126.271    -5.3      28.09
2    73   181   181.509    -0.5       0.25
3    71   156   169.234   -13.2     174.24
4    69   162   156.959     5.0      25.00
5    66   142   138.546     3.5      12.25
6    69   157   156.959     0.0       0.00
7    75   208   193.784    14.2     201.64
8    71   169   169.234    -0.2       0.04
9    63   127   120.133     6.9      47.61
10   72   165   175.371   -10.4     108.16
                                    ------
                                    597.28
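The two SSE totals can be checked with a few lines of Python (a sketch; the data are the ten height/weight pairs from the slides, and the least squares coefficients are used at full precision where the slides round them):

```python
# Height (in) and weight (lb) data from the slides (n = 10).
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

def sse(b0, b1, xs, ys):
    """Sum of squared prediction errors for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Least squares line (full precision) versus the alternative line.
sse_ls = sse(-266.534, 6.13758, heights, weights)   # about 597.4
sse_alt = sse(-331.2, 7.1, heights, weights)        # 766.51
```

The least squares line wins: its sum of squared errors is the smaller of the two.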

Page 15

The least squares regression line

Using calculus, minimize (take derivative with respect to b₀ and b₁, set to 0, and solve for b₀ and b₁):

Q = Σᵢ₌₁ⁿ (yᵢ − (b₀ + b₁xᵢ))²

and get the least squares estimates b₀ and b₁:

b₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²

b₀ = ȳ − b₁x̄
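As a quick check, the two closed-form estimates can be evaluated directly in Python (a sketch using the height/weight data from the earlier slides):

```python
# Least squares estimates from the closed-form formulas above.
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n

b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights))
      / sum((x - xbar) ** 2 for x in heights))
b0 = ybar - b1 * xbar

print(f"weight = {b0:.3f} + {b1:.5f} height")  # matches Minitab's fitted line
```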

Page 16

Fitted line plot in Minitab

Regression Plot

[Scatterplot: weight versus height with the fitted least squares line.]

weight = -266.534 + 6.13758 height

S = 8.64137   R-Sq = 89.7 %   R-Sq(adj) = 88.4 %

Page 17

Regression analysis in Minitab

The regression equation is
weight = - 267 + 6.14 height

Predictor      Coef  SE Coef      T      P
Constant    -266.53    51.03  -5.22  0.001
height       6.1376   0.7353   8.35  0.000

S = 8.641 R-Sq = 89.7% R-Sq(adj) = 88.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  5202.2  5202.2  69.67  0.000
Residual Error   8   597.4    74.7
Total            9  5799.6
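The ANOVA entries follow from the decomposition SSTO = SSR + SSE; a sketch reproducing them from the raw height/weight data:

```python
# Reproduce the ANOVA quantities for the height/weight regression.
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights))
      / sum((x - xbar) ** 2 for x in heights))
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * x for x in heights]
ssto = sum((y - ybar) ** 2 for y in weights)              # total SS: 5799.6
sse = sum((y - f) ** 2 for y, f in zip(weights, fitted))  # error SS: ~597.4
ssr = ssto - sse                                          # regression SS: ~5202.2
r_sq = ssr / ssto                                         # R-Sq: ~89.7%
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)             # R-Sq(adj): ~88.4%
```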

Page 18

Prediction of future responses

A common use of the estimated regression line.

ŷ_wt = −267 + 6.14 x_ht

Predict the mean weight of 66-inch-tall people:

ŷ_wt = −267 + 6.14(66) = 138.24

Predict the mean weight of 67-inch-tall people:

ŷ_wt = −267 + 6.14(67) = 144.38
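As code (a sketch; the coefficients are the slides' rounded values, so results match the slides rather than the full-precision fit):

```python
# Predicted mean weight (lb) at a given height (in), per the slides'
# rounded equation: weight-hat = -267 + 6.14 * height.
def predict_weight(height_in):
    return -267 + 6.14 * height_in

print(round(predict_weight(66), 2))  # 138.24
print(round(predict_weight(67), 2))  # 144.38
```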

Page 19

What do the “estimated regression coefficients” b0 and b1 tell us?

• We can expect the mean response to increase or decrease by b1 units for every unit increase in x.

• If the “scope of the model” includes x = 0, then b0 is the predicted mean response when x = 0. Otherwise, b0 is not meaningful.

Page 20

So, the estimated regression coefficients b₀ and b₁ tell us…

• We predict the mean weight to increase by 6.14 pounds for each additional inch of height.

• It is not meaningful to have a height of 0 inches. That is, the scope of the model does not include x = 0, so here the intercept b₀ is not meaningful.

Page 21

What do b₀ and b₁ estimate?

[Scatterplot: college entrance test score versus high school GPA.]

E(Y) = β₀ + β₁x

Yᵢ = β₀ + β₁xᵢ + εᵢ

Page 22

What do b₀ and b₁ estimate?

[Scatterplot: college entrance test score versus high school GPA.]

E(Y) = β₀ + β₁x

ŷ = b₀ + b₁x

Page 23

The simple linear regression model

[Scatterplot: college entrance test score versus high school GPA.]

E(Y) = β₀ + β₁x

Yᵢ = β₀ + β₁xᵢ + εᵢ

Page 24

The simple linear regression model

Page 25

The simple linear regression model

• The mean of the responses, E(Yᵢ), is a linear function of the xᵢ.

• The errors, εᵢ, and hence the responses Yᵢ, are independent.

• The errors, εᵢ, and hence the responses Yᵢ, are normally distributed.

• The errors, εᵢ, and hence the responses Yᵢ, have equal variances (σ²) for all x values.

Page 26

What about (unknown) σ²?

[Scatterplot: college entrance test score versus high school GPA, with responses scattered about the regression line.]

It quantifies how much the responses (y) vary around the (unknown) mean regression line E(Y) = β₀ + β₁x.

Yᵢ = β₀ + β₁xᵢ + εᵢ

Page 27

Will this thermometer yield more precise future predictions …?

Regression Plot

[Scatterplot: Fahrenheit versus Celsius with fitted line; points lie tightly around the line.]

fahrenheit = 34.1233 + 1.61538 celsius

S = 4.76923   R-Sq = 96.1 %   R-Sq(adj) = 95.5 %

Page 28

… or this one?

Regression Plot

[Scatterplot: Fahrenheit versus Celsius with fitted line; points widely scattered around the line.]

fahrenheit = 17.0709 + 2.30583 celsius

S = 21.7918   R-Sq = 70.6 %   R-Sq(adj) = 66.4 %

Page 29

Recall the “sample variance”

[Plot: probability density of IQ scores.]

The sample variance

s² = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² / (n − 1)

estimates σ², the variance of the one population.

Page 30

Estimating σ² in the regression setting

The mean square error

MSE = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² / (n − 2)

estimates σ², the common variance of the many populations.
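A sketch computing MSE (and its square root, which Minitab reports as S) for the height/weight data:

```python
import math

# MSE = SSE / (n - 2) estimates sigma^2; sqrt(MSE) is Minitab's "S".
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights))
      / sum((x - xbar) ** 2 for x in heights))
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, weights))
mse = sse / (n - 2)     # ~74.7
s = math.sqrt(mse)      # ~8.64
```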

Page 31

Estimating σ² from Minitab’s fitted line plot

Regression Plot

[Scatterplot: weight versus height with the fitted least squares line.]

weight = -266.534 + 6.13758 height

S = 8.64137   R-Sq = 89.7 %   R-Sq(adj) = 88.4 %

Page 32

Estimating σ² from Minitab’s regression analysis

The regression equation is
weight = - 267 + 6.14 height

Predictor      Coef  SE Coef      T      P
Constant    -266.53    51.03  -5.22  0.001
height       6.1376   0.7353   8.35  0.000

S = 8.641   R-Sq = 89.7%   R-Sq(adj) = 88.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  5202.2  5202.2  69.67  0.000
Residual Error   8   597.4    74.7
Total            9  5799.6

Page 33

Inference for (or drawing conclusions about) β0 and β1

Confidence intervals and hypothesis tests

Page 34

Relationship between state latitude and skin cancer mortality?

• Mortality rate of white males due to malignant skin melanoma from 1950-1959.

• LAT = degrees (north) latitude of center of state

• MORT = mortality rate due to malignant skin melanoma per 10 million people

#    State        LAT   MORT
1    Alabama      33.0   219
2    Arizona      34.5   160
3    Arkansas     35.0   170
4    California   37.5   182
5    Colorado     39.0   149
…
49   Wyoming      43.0   134

Page 35

(1-α)100% t-interval for slope parameter β₁

Formula in words:

Sample estimate ± (t-multiplier × standard error)

Formula in notation:

b₁ ± t(α/2, n−2) × √( MSE / Σᵢ₌₁ⁿ (xᵢ − x̄)² )
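A sketch of this interval for the height/weight data; the multiplier t(0.025, 8) = 2.306 is hard-coded from a t-table (scipy.stats.t.ppf would compute it):

```python
import math

# 95% t-interval for the slope, beta_1, on the height/weight data.
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n
sxx = sum((x - xbar) ** 2 for x in heights)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights)) / sxx
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, weights))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)   # standard error of b1: ~0.735

t_mult = 2.306                 # t(0.025, df = 8), from a t-table
ci = (b1 - t_mult * se_b1, b1 + t_mult * se_b1)  # roughly (4.44, 7.83)
```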

Page 36

Hypothesis test for slope parameter β₁

Null hypothesis H₀: β₁ = some number β
Alternative hypothesis H_A: β₁ ≠ some number β

Test statistic:

t* = (b₁ − β) / se(b₁),  where se(b₁) = √( MSE / Σᵢ₌₁ⁿ (xᵢ − x̄)² )

P-value = How likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis is true?

The P-value is determined by referring to a t-distribution with n-2 degrees of freedom.
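For the usual test of H₀: β₁ = 0, the statistic reduces to b₁ / se(b₁); a sketch for the height/weight data (Minitab reports T = 8.35 for this slope):

```python
import math

# t-statistic for H0: beta_1 = 0 on the height/weight data.
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n
sxx = sum((x - xbar) ** 2 for x in heights)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights)) / sxx
b0 = ybar - b1 * xbar

mse = sum((y - (b0 + b1 * x)) ** 2
          for x, y in zip(heights, weights)) / (n - 2)
se_b1 = math.sqrt(mse / sxx)

t_star = (b1 - 0) / se_b1   # ~8.35, referred to a t-distribution with n-2 df
```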

Page 37

Inference for slope parameter β₁ in Minitab

The regression equation is
Mort = 389 - 5.98 Lat

Predictor     Coef  SE Coef      T      P
Constant    389.19    23.81  16.34  0.000
Lat        -5.9776   0.5984  -9.99  0.000

S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance

Source          DF     SS     MS      F      P
Regression       1  36464  36464  99.80  0.000
Residual Error  47  17173    365
Total           48  53637

Page 38

(1-α)100% t-interval for intercept parameter β₀

Formula in words:

Sample estimate ± (t-multiplier × standard error)

Formula in notation:

b₀ ± t(α/2, n−2) × √( MSE × ( 1/n + x̄² / Σᵢ₌₁ⁿ (xᵢ − x̄)² ) )

Page 39

Hypothesis test for intercept parameter β₀

Null hypothesis H₀: β₀ = some number β
Alternative hypothesis H_A: β₀ ≠ some number β

Test statistic:

t* = (b₀ − β) / se(b₀),  where se(b₀) = √( MSE × ( 1/n + x̄² / Σᵢ₌₁ⁿ (xᵢ − x̄)² ) )

P-value = How likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis is true?

The P-value is determined by referring to a t-distribution with n-2 degrees of freedom.
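A sketch of se(b₀) and the test of H₀: β₀ = 0 for the height/weight data (Minitab reports SE Coef = 51.03 and T = -5.22 for the constant):

```python
import math

# t-statistic for H0: beta_0 = 0 on the height/weight data.
heights = [64, 73, 71, 69, 66, 69, 75, 71, 63, 72]
weights = [121, 181, 156, 162, 142, 157, 208, 169, 127, 165]

n = len(heights)
xbar = sum(heights) / n
ybar = sum(weights) / n
sxx = sum((x - xbar) ** 2 for x in heights)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights)) / sxx
b0 = ybar - b1 * xbar

mse = sum((y - (b0 + b1 * x)) ** 2
          for x, y in zip(heights, weights)) / (n - 2)
se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))  # ~51.03

t_star = (b0 - 0) / se_b0   # ~-5.22, referred to a t-distribution with n-2 df
```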

Page 40

Inference for intercept parameter β₀ in Minitab

The regression equation is
Mort = 389 - 5.98 Lat

Predictor     Coef  SE Coef      T      P
Constant    389.19    23.81  16.34  0.000
Lat        -5.9776   0.5984  -9.99  0.000

S = 19.12   R-Sq = 68.0%   R-Sq(adj) = 67.3%

Analysis of Variance

Source          DF     SS     MS      F      P
Regression       1  36464  36464  99.80  0.000
Residual Error  47  17173    365
Total           48  53637

Page 41

What assumptions?

• The intervals and tests depend on the assumption that the error terms (and thus responses) follow a normal distribution.

• Not a big deal if the error terms (and thus responses) are only approximately normal.

• If we have a large sample, the error terms can even deviate far from normality without invalidating the results.

Page 42

Basic regression analysis output in Minitab

• Select Stat.

• Select Regression.

• Select Regression …

• Specify Response (y) and Predictor (x).

• Click OK.