Medical Statistics (full English class)



Ji-Qian Fang

School of Public Health

Sun Yat-Sen University

Chapter 12 Linear Correlation and Linear Regression

Vocabulary for Chapter 12-2

univariate 单变量

multivariate 多变量

phenomenon 现象

least square 最小二乘

intercept 截距

slope 斜率

regression coefficient 回归系数

population regression coefficient 总体回归系数

sample regression coefficient 样本回归系数

residual 剩余、残差

standard deviation of regression coefficient 回归系数的标准差

standard deviation of residual 剩余标准差

sum of squared residuals 残差平方和

coefficient of determination 决定系数、确定系数

subject matter 专业

Initial meaning of "regression": Galton noted that if a father is tall, his son will be relatively tall, and if a father is short, his son will be relatively short. But if a father is very tall, his son usually will not be taller than his father; if a father is very short, his son usually will not be shorter than his father.

Otherwise, heights would become ever more extreme from one generation to the next. Galton called this phenomenon "regression to the mean".

12.3 Linear regression

What is regression in statistics?

To find the track of the means: how the mean of Y changes as X changes.

[Figure: scatter plot of son's height (cm) against father's height (cm)]

[Figure: scatter plot of vital capacity (ml) against chest circumference (cm)]

Given the value of chest circumference (X), the vital capacity (Y) varies around a center, $\mu_{Y|X}$.

All the centers lie on a line, the regression line. The relationship between the center $\mu_{Y|X}$ and X is the regression equation:

$$\mu_{Y|X}=\alpha+\beta X$$

1. Linear regression equation

We try to estimate $\alpha$ and $\beta$ from a sample, getting

$$\hat Y=a+bX$$

where

a -- estimate of $\alpha$, the intercept

b -- estimate of $\beta$, the slope

$\hat Y$ -- estimate of $\mu_{Y|X}$

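Least squares method

Find a and b such that

$$\sum_{i=1}^{n}(Y_i-\hat Y_i)^2=\text{minimum}$$

By calculus,

$$b=\frac{l_{XY}}{l_{XX}},\qquad a=\bar Y-b\bar X$$

where

$$l_{XX}=\sum(X-\bar X)^2=\sum X^2-\frac{(\sum X)^2}{n}$$

$$l_{XY}=\sum(X-\bar X)(Y-\bar Y)=\sum XY-\frac{(\sum X)(\sum Y)}{n}$$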
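The least squares solution above can be sketched in a few lines of Python. This is a minimal illustration, assuming the paired observations are given as two equal-length lists of numbers; the function name `least_squares_fit` is hypothetical, not something from the slides. It uses the deviation forms of $l_{XX}$ and $l_{XY}$, which are algebraically equal to the computational forms.

```python
from statistics import fmean


def least_squares_fit(xs, ys):
    """Return (a, b) for Y-hat = a + b*X minimising sum((Y - Y-hat)^2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = fmean(xs)
    y_bar = fmean(ys)
    # l_XX = sum (X - X-bar)^2 and l_XY = sum (X - X-bar)(Y - Y-bar)
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if l_xx == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = l_xy / l_xx          # slope b = l_XY / l_XX
    a = y_bar - b * x_bar    # intercept: the line passes through (X-bar, Y-bar)
    return a, b
```

Points that already lie exactly on a straight line give back that line's intercept and slope, which is a quick sanity check for any implementation.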

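Example 12-3 (n = 15):

$$l_{XX}=\sum X^2-\frac{(\sum X)^2}{n}=171858-\frac{(1036)^2}{15}=304.9333$$

$$l_{XY}=\sum XY-\frac{(\sum X)(\sum Y)}{n}=2441450-\frac{(1036)(35150)}{15}=13756.667$$

Slope b

$$b=\frac{l_{XY}}{l_{XX}}=\frac{13756.667}{304.9333}=45.1137$$

Intercept a, with $\bar X=69.0667$ and $\bar Y=2343.3333$

$$a=\bar Y-b\bar X=2343.3333-45.1137\times 69.0667=-772.5211$$

Regression equation

$$\hat Y=-772.5211+45.1137X$$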

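2. t test for regression coefficient

b is the sample regression coefficient; it changes from sample to sample.

There is a population regression coefficient, denoted by $\beta$.

Question: is $\beta=0$ or not? H0: $\beta=0$, H1: $\beta\neq0$, $\alpha=0.05$.

Statistic

$$t=\frac{b-0}{s_b},\qquad \nu=n-2$$

Standard deviation of regression coefficient

$$s_b=\frac{s_{Y\cdot X}}{\sqrt{\sum(X-\bar X)^2}}=\frac{s_{Y\cdot X}}{\sqrt{l_{XX}}}$$

Standard deviation of residual

$$s_{Y\cdot X}=\sqrt{\frac{\sum(Y-\hat Y)^2}{n-2}}$$

Sum of squared residuals, where $l_{YY}=\sum(Y-\bar Y)^2$

$$\sum_{i=1}^{n}(Y_i-\hat Y_i)^2=l_{YY}-b\,l_{XY}$$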
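The chain of formulas for the t test (sum of squared residuals, then $s_{Y\cdot X}$, then $s_b$, then t) can be sketched as one function working from the l-quantities. The name `slope_t_test` is hypothetical; the usage line plugs in the Example 12-3 values as printed on the slides.

```python
import math


def slope_t_test(n, l_xx, l_xy, l_yy):
    """t test of H0: beta = 0 for a straight-line regression of Y on X.

    Works from the l-quantities l_XX, l_XY, l_YY and returns
    (b, ss_residual, s_yx, s_b, t, df).
    """
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    b = l_xy / l_xx
    ss_residual = l_yy - b * l_xy          # sum of squared residuals
    df = n - 2                             # degrees of freedom, nu = n - 2
    s_yx = math.sqrt(ss_residual / df)     # standard deviation of residual
    s_b = s_yx / math.sqrt(l_xx)           # standard deviation of b
    t = (b - 0) / s_b                      # value of beta under H0 is 0
    return b, ss_residual, s_yx, s_b, t, df


# Example 12-3: l-quantities as printed on the slides.
b, ss_res, s_yx, s_b, t, df = slope_t_test(
    n=15, l_xx=304.9333, l_xy=13756.667, l_yy=1199333.33
)
print(f"b={b:.4f} s_yx={s_yx:.2f} s_b={s_b:.4f} t={t:.4f} df={df}")
```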

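Back to Example 12-3

Sum of squared residuals, with $l_{YY}=1199333.33$, $l_{XY}=13756.667$, $b=45.1137$

$$\sum_{i=1}^{n}(Y_i-\hat Y_i)^2=l_{YY}-b\,l_{XY}=1199333.33-45.1137\times13756.667=578719.291$$

Standard deviation of residual

$$s_{Y\cdot X}=\sqrt{\frac{\sum(Y-\hat Y)^2}{n-2}}=\sqrt{\frac{578719.291}{15-2}}=210.99$$

Standard deviation of regression coefficient

$$s_b=\frac{s_{Y\cdot X}}{\sqrt{\sum(X-\bar X)^2}}=\frac{210.99}{\sqrt{304.9333}}=12.0826$$

Statistic

$$t=\frac{b-0}{s_b}=\frac{45.1137-0}{12.0826}=3.7338,\qquad \nu=15-2=13$$

Since $t=3.7338>t_{0.05/2,\,13}=2.160$, H0 is rejected at $\alpha=0.05$: the population regression coefficient $\beta$ is not zero.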
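The slides compare t with a critical value. If a P value is wanted instead and no statistics library is at hand, the two-sided P value can be approximated by numerically integrating Student's t density. This is a sketch assuming Simpson's rule with a fixed number of steps is accurate enough, which holds for moderate |t|; the names `t_pdf` and `two_sided_p` are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P value, 1 - 2 * integral of t_pdf from 0 to |t|."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even count
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 < T < |t|) by Simpson's rule
    return max(0.0, 1 - 2 * area)


p = two_sided_p(3.7338, 13)             # Example 12-3: t = 3.7338, nu = 13
```

For the example, `p` falls between 0.001 and 0.01, consistent with rejecting H0 at $\alpha=0.05$.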

3. Application of regression

1) To describe how the value of Y depends on X.

2) To estimate or predict the value of Y from a known value of X, based on the regression of Y on X.

3) To control the value of X through a known value of Y:
-- if X is not a random variable, based on the regression of Y on X;
-- if X is also a random variable, based on the regression of X on Y.
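Applications 2) and 3) can be sketched with the Example 12-3 equation $\hat Y=-772.5211+45.1137X$. The function names are hypothetical. The control step assumes X is not a random variable, so the regression of Y on X is simply solved for X; if X were random, the regression of X on Y would have to be fitted instead, which this sketch does not do.

```python
# Regression equation of Example 12-3, coefficients as given on the slides.
INTERCEPT_A = -772.5211   # ml
SLOPE_B = 45.1137         # ml per cm of chest circumference


def predict_y(x):
    """Application 2: estimated mean vital capacity (ml) at chest size x (cm)."""
    return INTERCEPT_A + SLOPE_B * x


def control_x(target_y):
    """Application 3, X not random: the X at which Y-hat equals target_y."""
    return (target_y - INTERCEPT_A) / SLOPE_B


estimate = predict_y(70)          # chest circumference of 70 cm
x_needed = control_x(2500)        # X whose predicted Y is 2500 ml
```

Here `predict_y(70)` gives about 2385 ml. Either use is only meaningful for X values inside the range of the observed data.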

12.4 The relationship between Regression and Correlation

1. Distinction and connection

Distinction:

Correlation: both X and Y are random.

Regression: Y is random;

X is not random -- Type I regression

X is also random -- Type II regression

Connection: when both X and Y are random,

1) the correlation coefficient r and the regression coefficient b have the same sign;

2) the t tests are equivalent:

$$t_r=t_b$$
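The equivalence $t_r=t_b$ can be checked numerically on the Example 12-3 l-quantities. This sketch assumes the usual statistic for H0: $\rho=0$, $t_r=r\sqrt{n-2}/\sqrt{1-r^2}$ with $r=l_{XY}/\sqrt{l_{XX}l_{YY}}$, which is not restated in this section.

```python
import math

# Example 12-3 l-quantities, as printed on the slides.
n, l_xx, l_xy, l_yy = 15, 304.9333, 13756.667, 1199333.33

# Correlation coefficient and its t statistic for H0: rho = 0.
r = l_xy / math.sqrt(l_xx * l_yy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Regression coefficient and its t statistic for H0: beta = 0.
b = l_xy / l_xx
s_yx = math.sqrt((l_yy - b * l_xy) / (n - 2))
t_b = b / (s_yx / math.sqrt(l_xx))

print(f"r={r:.4f}, b={b:.4f}, t_r={t_r:.4f}, t_b={t_b:.4f}")
```

Both statistics come out at about 3.7338, and r and b share the sign of $l_{XY}$, which is why the signs always agree.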

3) Coefficient of determination

Without regression, given the value of $X_i$ we can only predict $\bar Y$; the sum of squared residuals is

$$SS_{\text{Total}}=\sum_{i=1}^{n}(Y_i-\bar Y)^2$$

After regression, given the value of $X_i$ we can predict $\hat Y_i=a+bX_i$; the sum of squared residuals is

$$SS_{\text{Residual}}=\sum_{i=1}^{n}(Y_i-\hat Y_i)^2$$

Contribution of regression

$$SS_{\text{Regression}}=SS_{\text{Total}}-SS_{\text{Residual}}$$

$$\text{Coefficient of determination}=\frac{SS_{\text{Regression}}}{SS_{\text{Total}}}$$

It can be proved that

$$\text{Coefficient of determination}=r^2$$
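The decomposition can be sketched with the Example 12-3 l-quantities: $SS_{\text{Total}}$ is $l_{YY}$ by definition and $SS_{\text{Residual}}=l_{YY}-b\,l_{XY}$, so the ratio should match $r^2$.

```python
# Example 12-3 l-quantities, as printed on the slides.
l_xx, l_xy, l_yy = 304.9333, 13756.667, 1199333.33

b = l_xy / l_xx
ss_total = l_yy                           # sum (Y - Y-bar)^2, no regression
ss_residual = l_yy - b * l_xy             # sum (Y - Y-hat)^2, after regression
ss_regression = ss_total - ss_residual    # contribution of regression
determination = ss_regression / ss_total

r = l_xy / (l_xx * l_yy) ** 0.5
print(f"SS_reg/SS_total = {determination:.4f}, r^2 = {r * r:.4f}")
```

Both values are about 0.5175: roughly half of the variation in vital capacity is accounted for by its linear regression on chest circumference.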

2. Cautions for regression and correlation

1) Don't put just any two variables together for correlation and regression; they must have some relation in subject matter.

2) Correlation does not necessarily mean causality; sometimes the relation is indirect, or there is no real relation at all.

3) A large value of r does not necessarily mean a large regression coefficient b.

4) Rejecting H0: ρ = 0 does not necessarily mean the correlation is strong; it only indicates ρ ≠ 0.

5) A scatter diagram is useful before working with linear correlation and linear regression.

6) The regression equation should not be applied beyond the range of the data set.
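Caution 3) can be illustrated with the Example 12-3 quantities by changing only the unit of Y. This is a sketch assuming vital capacity is re-expressed in litres instead of millilitres: every Y is divided by 1000, so $l_{XY}$ scales by 1/1000 and $l_{YY}$ by $1/1000^2$, while $l_{XX}$ is unchanged. The helper name `r_and_b` is hypothetical.

```python
import math


def r_and_b(l_xx, l_xy, l_yy):
    """Return (r, b) computed from the l-quantities."""
    return l_xy / math.sqrt(l_xx * l_yy), l_xy / l_xx


l_xx, l_xy, l_yy = 304.9333, 13756.667, 1199333.33    # Y in ml (slides)
r_ml, b_ml = r_and_b(l_xx, l_xy, l_yy)

# Same data with Y in litres: only the scale of Y changes.
r_l, b_l = r_and_b(l_xx, l_xy / 1000, l_yy / 1000 ** 2)
```

The correlation coefficient stays at about 0.72 in both cases, while b drops from about 45.1 ml/cm to about 0.0451 L/cm. The size of b reflects the units of measurement; the strength of the linear relation is described by r.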
