
Chapter 1: Linear Regression with One Predictor Variable



also known as: Simple Linear Regression

Bivariate Linear Regression

Introduction:

· Functional relation between two variables:

𝑌 = 𝑓(𝑋)

Value of X ⇒ Value of Y

Example: °F = 32° + (9/5)°C

is a deterministic relationship

1 value of X ⇒ 1 unique value of Y

· Statistical relation between two variables:

1 value of X ⇒ a distribution of values of Y

Y = Dependent / Response / Outcome Variable

X = Independent / Explanatory / Predictor Variable


Linear Equation: General equation for a straight line

𝑌 = 𝑏0 + 𝑏1𝑋

𝑏0: Intercept = value of Y when X=0

𝑏1: Slope = change in Y per unit change in X

$$b_1 = \frac{\text{change in } Y\text{-value}}{\text{change in } X\text{-value}} = \frac{\text{“rise”}}{\text{“run”}}$$

What if X increases by 1 unit?

𝑌 = 𝑏0 + 𝑏1(𝑋 + 1) = {𝑏0 + 𝑏1𝑋} + 𝑏1

Y increases by 𝑏1 units
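For example, in the temperature line above, °F = 32 + (9/5)°C: the intercept b0 = 32 is the Fahrenheit value when °C = 0, and the slope b1 = 9/5 means each 1°C increase raises the temperature by 1.8°F.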

{ Other than linear: e.g. curvilinear $Y = b_0 + b_1 X + b_2 X^2$ }

Regression of Y on X

· Observe data points {(𝑋1, 𝑌1), . . . , (𝑋𝑛, 𝑌𝑛)}

· At each point 𝑋𝑖 there is a distribution of 𝑌𝑖’s


Example: Y = head circumference (cm), X = gestational age (wks)

in a sample of 100 low birth weight infants

Qs: Does average head circumference change with gestational age?

What is the form of the relationship? (linear? curvilinear?)

How to estimate the relationship, given the data?

How to make predictions for new observations?

[Figure: scatterplot of Y = Head Circumference (cm) versus X = Gestational Age (weeks), 100 low birth weight infants]


Descriptive Data:

· Scatterplot

· Correlation Coefficients

Some examples: [example scatterplots not reproduced in this text version]


Population Correlation Coefficient:

Random variables X and Y with parameters $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$:

$$\rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}, \qquad -1 \le \rho \le +1$$

𝜌 measures the direction and strength of linear association

between X and Y

Maximum likelihood estimator of 𝜌 is

the Pearson Correlation Coefficient:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_X s_Y}$$

Inference on 𝜌:

If X and Y are both Normal,

𝐻0: 𝜌 = 0 vs. 𝐻𝑎: 𝜌 ≠ 0

$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \;\sim\; t_{(n-2)} \quad \text{under } H_0: \rho = 0$$

critical value = $\pm t_{(n-2,\,\alpha/2)}$
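As a concrete illustration (not from the textbook), a minimal Python sketch computing r from the formula above and the t statistic for $H_0: \rho = 0$; the paired arrays x and y are hypothetical stand-ins, and the hand computation is checked against scipy.stats.pearsonr:

import numpy as np
from scipy import stats

# Hypothetical paired sample (stand-ins for, e.g., gestational age and head circumference)
x = np.array([24, 26, 27, 29, 30, 32, 33, 35], dtype=float)
y = np.array([22, 24, 24, 26, 27, 29, 30, 31], dtype=float)
n = len(x)

# Pearson r from the definition: cross-deviations over the product of deviation norms
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)

# t statistic with n - 2 degrees of freedom, and the two-sided p-value
T = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(T), df=n - 2)

print(r, T, p)
print(stats.pearsonr(x, y))  # should agree with the hand computation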


Spearman Rank Correlation Coefficient:

For X or Y non-Normal

$R_{X_i}$ = rank of $X_i$

$R_{Y_i}$ = rank of $Y_i$

$\bar{R}_X = \bar{R}_Y = \frac{n+1}{2}$  (means of the ranks $R_{X_i}$ and $R_{Y_i}$)

Spearman rank correlation coefficient is

$$r_s = \frac{\sum (R_{X_i} - \bar{R}_X)(R_{Y_i} - \bar{R}_Y)}{\sqrt{\sum (R_{X_i} - \bar{R}_X)^2 \sum (R_{Y_i} - \bar{R}_Y)^2}}, \qquad -1 \le r_s \le +1$$

𝐻0: There is no association between X and Y

𝐻𝑎: There is association between X and Y

$$T = \frac{r_s \sqrt{n-2}}{\sqrt{1-r_s^2}} \;\sim\; t_{(n-2)} \quad \text{under } H_0$$

critical value = $\pm t_{(n-2,\,\alpha/2)}$
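A matching sketch for the rank-based version (same hypothetical arrays as in the Pearson sketch); $r_s$ is just Pearson's r applied to the ranks, and scipy.stats.spearmanr handles ranking and ties:

import numpy as np
from scipy import stats

x = np.array([24, 26, 27, 29, 30, 32, 33, 35], dtype=float)
y = np.array([22, 24, 24, 26, 27, 29, 30, 31], dtype=float)

# Spearman's r_s: Pearson correlation of the ranks (ties get average ranks)
rx, ry = stats.rankdata(x), stats.rankdata(y)
r_s = np.corrcoef(rx, ry)[0, 1]

rho, p = stats.spearmanr(x, y)  # library version, with its p-value
print(r_s, rho, p)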


The Simple Linear Regression Model

𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝜀𝑖 , i = 1, . . . ,n observations

𝑌𝑖 value of response for ith observation

𝑋𝑖 value of predictor for ith observation

Population parameters (unknown):

𝛽0 Population intercept

𝛽1 Population regression coefficient

𝜀𝑖 is ith random error term

Mean: 𝐸(𝜀𝑖) = 0

Variance: 𝑉𝑎𝑟(𝜀𝑖) = 𝜎2

Independence: 𝜀𝑖 and 𝜀𝑗 are uncorrelated for 𝑖 ≠ 𝑗

Normality: 𝜀𝑖~𝑁(0, 𝜎2) 𝑖. 𝑖. 𝑑. for all i

$$\Rightarrow\; Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{constant}} \;+\; \underbrace{\varepsilon_i}_{\text{random, i.i.d. } N(0,\,\sigma^2)}$$

$$\Rightarrow\; E(Y_i) = E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \beta_0 + \beta_1 X_i$$

$$Var(Y_i) = Var(\beta_0 + \beta_1 X_i + \varepsilon_i) = Var(\varepsilon_i) = \sigma^2$$

$$\Rightarrow\; Y_i \sim N(\mu_i,\, \sigma^2), \text{ independent, where } \mu_i = \beta_0 + \beta_1 X_i$$
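To make "one value of X ⇒ a distribution of values of Y" concrete, a minimal simulation sketch of this model; all parameter values below are made up for illustration:

import numpy as np

rng = np.random.default_rng(1)

# Made-up parameter values, loosely mimicking the head-circumference example
beta0, beta1, sigma = 4.0, 0.8, 1.6

# At each fixed X_i, the response Y_i is one draw from N(beta0 + beta1*X_i, sigma^2)
X = rng.uniform(22, 36, size=100)
eps = rng.normal(0.0, sigma, size=100)  # i.i.d. N(0, sigma^2) errors
Y = beta0 + beta1 * X + eps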


How to obtain $\hat\beta_0$ and $\hat\beta_1$, the estimates for $\beta_0$ and $\beta_1$?

Least Squares Estimators (LSE):

LSEs minimize the sum of squared deviations of 𝑌𝑖 from 𝐸(𝑌𝑖)

Least Squares Criterion:

$$Q = \sum_{i=1}^{n} \left[Y_i - E(Y_i)\right]^2 = \sum_{i=1}^{n} \left[Y_i - (\beta_0 + \beta_1 X_i)\right]^2$$

Minimize Q: set first derivatives w.r.t. each parameter = 0

First derivatives are:

$$\frac{\partial Q}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i) \qquad (1)$$

$$\frac{\partial Q}{\partial \beta_1} = -2 \sum X_i (Y_i - \beta_0 - \beta_1 X_i) \qquad (2)$$

Normal Equations: set (1) = 0 and (2) = 0; call the solutions $\hat\beta_0$ and $\hat\beta_1$

$$-2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0$$

$$-2 \sum X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0$$

$$\sum Y_i = n \hat\beta_0 + \hat\beta_1 \sum X_i$$

$$\sum X_i Y_i = \hat\beta_0 \sum X_i + \hat\beta_1 \sum X_i^2$$


Solution to Normal Equations:

Least Squares Estimators (LSE):

$$\hat\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$
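A sketch implementing these two closed-form estimators directly (reusing the simulated data from the model sketch above) and checking against NumPy's built-in fit:

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(22, 36, size=100)                    # simulated design, as above
Y = 4.0 + 0.8 * X + rng.normal(0.0, 1.6, size=100)   # made-up true model

def lse(x, y):
    # Closed-form least squares estimates for simple linear regression
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0_hat, b1_hat = lse(X, Y)
print(b0_hat, b1_hat)
print(np.polyfit(X, Y, deg=1))  # returns [slope, intercept]; should match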

Properties of LSE:

Unbiased estimators (accuracy)

$E(\hat\beta_0) = \beta_0$, $E(\hat\beta_1) = \beta_1$

Minimum variance (precision) among all linear unbiased estimators (the Gauss–Markov property)

Robust against the Normality assumption

Note:

functions are called “estimators”

calculated values from data are called “estimates”

Interpretation:

Intercept ($\hat\beta_0$): value of Y when X = 0

(not always meaningful!)

Slope ($\hat\beta_1$): average change in Y per unit increase in X

the “effect” of X on Y; the “regression coefficient”


Example: X = gestational age (wks), Y = head circumference (cm)

The fitted least squares regression line is:

$$\hat{Y} = 3.91 + 0.78X$$

Intercept: not meaningful! (extrapolation to X = 0 weeks)

Slope: For every increase of one week gestational age,

there is an increase of about 0.78 cm head circumference.
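For example, at X = 28 weeks the fitted line predicts $\hat{Y} = 3.91 + 0.78(28) \approx 25.8$ cm.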

[Figure: scatterplot with fitted least squares line, Y = Head Circumference (cm) versus X = Gestational Age (weeks)]


Another approach:

Method of Maximum Likelihood

The MLE maximizes the likelihood function

(the likelihood of the observed data, given the model parameters)

Q. Under which parameter values is the sample data

most likely to occur?

[see explanation of MLE on p.27-29]

For simple linear regression:

𝜀𝑖 = 𝑌𝑖 − 𝛽0 − 𝛽1𝑋𝑖 ~ 𝑁(0, 𝜎2)

$$\Rightarrow\; f(\varepsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}\,(y_i - \beta_0 - \beta_1 x_i)^2 \right\}$$

$$\Rightarrow\; \text{likelihood} = L = \prod_{i=1}^{n} f(\varepsilon_i)$$

$$\Rightarrow\; \log_e L = \ln\left\{ \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right] \right\}$$

$$= -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

$$\frac{\partial \log_e L}{\partial \beta_0} = 0, \qquad \frac{\partial \log_e L}{\partial \beta_1} = 0, \qquad \frac{\partial \log_e L}{\partial \sigma} = 0$$

⇒ same solutions for $\beta_0$ and $\beta_1$ as the LSE (please prove this for yourself!)

same nice properties (note: the MLE of $\sigma^2$ divides SSE by $n$ rather than $n-2$, so it is slightly biased)
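As a numerical check of this equivalence (a sketch, not the textbook's derivation): maximize the log-likelihood above with a generic optimizer and compare with the closed-form LSE, using the same simulated data as earlier:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(22, 36, size=100)                    # simulated data, as in earlier sketches
Y = 4.0 + 0.8 * X + rng.normal(0.0, 1.6, size=100)

def neg_loglik(theta, x, y):
    # Negative Normal log-likelihood; theta = (b0, b1, log_sigma),
    # with sigma on the log scale so it stays positive
    b0, b1, log_sigma = theta
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - b0 - b1 * x
    return 0.5 * len(y) * np.log(2.0 * np.pi * sigma2) + np.sum(resid ** 2) / (2.0 * sigma2)

start = [Y.mean(), 0.0, np.log(Y.std())]             # crude but sensible starting values
fit = minimize(neg_loglik, x0=start, args=(X, Y), method="Nelder-Mead")
b0_mle, b1_mle, log_s = fit.x
print(b0_mle, b1_mle)       # should agree with the least squares estimates
print(np.exp(2.0 * log_s))  # MLE of sigma^2 = SSE/n, slightly below MSE = SSE/(n-2)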


After calculating the fitted regression line:

Fitted value $\hat{Y}_i$:  $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

(the point on the fitted line at the value $X_i$)

Fitted Y-values are estimates of the Mean Response Function:

$\hat{Y}_i$ is an unbiased estimator of the mean response at $X_i$

The fitted line is an unbiased estimator of the mean response function

Note: the point $(\bar{X}, \bar{Y})$ is ALWAYS on the fitted regression line, i.e.,

$$\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$$


The ith residual $e_i$:

$$e_i = Y_i - \hat{Y}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$$

· it is the vertical distance between $(X_i, Y_i)$ and $(X_i, \hat{Y}_i)$

· it is the estimate of the ith error term: $e_i = \hat{\varepsilon}_i$

· $\sum_{i=1}^{n} e_i = 0$

Proof:

$$\sum e_i = \sum\left[Y_i - (\hat\beta_0 + \hat\beta_1 X_i)\right] = \sum Y_i - n\hat\beta_0 - \hat\beta_1 \sum X_i = 0 \quad \text{(by normal equation 1)}$$
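A quick numerical confirmation of $\sum e_i = 0$, as a sketch with the simulated data used throughout:

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(22, 36, size=100)
Y = 4.0 + 0.8 * X + rng.normal(0.0, 1.6, size=100)

# Least squares fit via the closed-form estimators
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0_hat = Y.mean() - b1_hat * X.mean()

e = Y - (b0_hat + b1_hat * X)  # residuals e_i = Y_i - Yhat_i
print(np.sum(e))               # ~0, up to floating-point error
print(np.isclose(np.sum(e), 0.0, atol=1e-8))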


Error Sum of Squares:

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$

SSE is at its minimum when the residuals come from the LSE (equivalently, the MLE) fit.

associated degrees of freedom 𝑑𝑓 = 𝑛 − 2

(generally 𝑑𝑓 = 𝑛 − 𝑝 where p = # of parameters in the model)

Mean Squared Error: unbiased estimator of $\sigma^2$

$$MSE = \frac{SSE}{df} = \frac{SSE}{n-2}, \qquad E(MSE) = \sigma^2$$
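A small simulation sketch of the unbiasedness claim $E(MSE) = \sigma^2$: across many replicated datasets drawn from the model, the average MSE should approach $\sigma^2$ (parameter values made up, as before):

import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma, n = 4.0, 0.8, 1.6, 100
x = rng.uniform(22, 36, size=n)  # fixed design, reused across replicates

mses = []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - b0 - b1 * x) ** 2)
    mses.append(sse / (n - 2))   # MSE with df = n - 2

print(np.mean(mses), sigma ** 2)  # average MSE should be close to sigma^2 = 2.56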


Example: X = gestational age and Y = head circumference

100 observations

scatterplot, fitted line, fitted values

[Figure: scatterplot with fitted line and fitted values, Y = Head Circumference (cm) versus X = Gestational Age (weeks)]


EXCEL: SUMMARY OUTPUT

Regression Statistics
Multiple R           0.780691936
R Square             0.609479899
Adjusted R Square    0.605495
Standard Error       1.590413353
Observations         100

ANOVA
               df    SS            MS           F           Significance F
Regression      1    386.8673658   386.8674     152.9474    1.00121E-21
Residual       98    247.8826342     2.529415
Total          99    634.75

               Coefficients   Standard Error   t Stat     P-value
Intercept      3.914264144    1.82914689        2.13994   0.034842
X Variable 1   0.780053162    0.063074406      12.36719   1E-21


SAS output:

The REG Procedure

Model: MODEL1

Dependent Variable: headcirc

Number of Observations Read 100

Number of Observations Used 100

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        386.86737     386.86737    152.95   <.0001
Error              98        247.88263       2.52941
Corrected Total    99        634.75000

Root MSE           1.59041    R-Square   0.6095
Dependent Mean    26.45000    Adj R-Sq   0.6055
Coeff Var          6.01290

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              3.91426          1.82915      2.14     0.0348
gestage      1              0.78005          0.06307     12.37     <.0001
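For comparison, a sketch of the same fit in Python's statsmodels; the data below are hypothetical stand-ins (a real analysis would use the 100 observed (gestage, headcirc) pairs behind the output above):

import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in data, shaped like the example
rng = np.random.default_rng(3)
gestage = rng.uniform(22, 36, size=100)
headcirc = 3.91 + 0.78 * gestage + rng.normal(0.0, 1.59, size=100)

X = sm.add_constant(gestage)     # adds the intercept column
model = sm.OLS(headcirc, X).fit()
print(model.summary())           # ANOVA table, R-Square, coefficient t tests
print(model.params)              # [intercept, slope]
print(np.sqrt(model.mse_resid))  # analogue of the SAS "Root MSE"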