Chapter 11. Linear Regression and Correlation
Linear Regression and Correlation
• Explanatory and Response Variables are Numeric
• Relationship between the mean of the response variable and the level of the explanatory variable assumed to be approximately linear (straight line)
• Model: $Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma)$
• $\beta_1 > 0 \Rightarrow$ Positive Association
• $\beta_1 < 0 \Rightarrow$ Negative Association
• $\beta_1 = 0 \Rightarrow$ No Association
Least Squares Estimation of $\beta_0$, $\beta_1$
• $\beta_0$ = Mean response when x=0 (y-intercept)
• $\beta_1$ = Change in mean response when x increases by 1 unit (slope)
• $\beta_0$, $\beta_1$ are unknown parameters (like $\mu$)
• $\beta_0 + \beta_1 x$ = Mean response when explanatory variable takes on the value x
• Goal: Choose values (estimates) that minimize the sum of squared errors (SSE) of observed values to the straight-line:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2$
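The least squares goal can be illustrated with a minimal Python sketch. The four data points below are hypothetical (they are not from the slides) and were picked so that the least squares line works out to y = 0 + 1.9x; the function name `sse` is likewise an illustrative choice.

```python
# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

def sse(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# For these four points the least squares estimates are b0 = 0 and b1 = 1.9.
best = sse(0.0, 1.9)

# Nudging either coefficient away from the least squares values raises the SSE.
candidates = [(0.0, 2.0), (0.5, 1.9), (-0.5, 1.9), (0.0, 1.8)]
for b0, b1 in candidates:
    print(f"b0={b0:+.1f}, b1={b1:.1f}: SSE={sse(b0, b1):.3f} (least squares: {best:.3f})")
```

Any other line through the same points gives a larger SSE, which is what "least squares" means.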
Example - Pharmacodynamics of LSD
• Response (y) - Math score (mean among 5 volunteers)
• Predictor (x) - LSD tissue concentration (mean of 5 volunteers)
• Raw Data and scatterplot of Score vs LSD concentration:

Score (y)   LSD Conc (x)
78.93       1.17
58.20       2.97
67.47       3.26
37.47       4.69
45.65       5.83
32.92       6.00
29.97       6.41

[Figure: scatterplot of SCORE (vertical axis, 20 to 80) vs LSD_CONC (horizontal axis, 1 to 7)]
Source: Wagner, et al (1968)
Least Squares Computations
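Summary Calculations:
$S_{xx} = \sum (x - \bar{x})^2$
$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$
$S_{yy} = \sum (y - \bar{y})^2$
$SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$

Parameter Estimates:
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$s_e^2 = \frac{\sum (y - \hat{y})^2}{n-2} = \frac{SSE}{n-2}$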
Example - Pharmacodynamics of LSD
$\bar{x} = \frac{30.33}{7} = 4.333 \qquad \bar{y} = \frac{350.61}{7} = 50.087$
$\hat{\beta}_1 = \frac{-202.4872}{22.4749} = -9.01$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 50.09 - (-9.01)(4.33) = 89.10$
$\hat{y} = 89.10 - 9.01x \qquad s_e^2 = 50.72$

Score (y)   LSD Conc (x)   x-xbar    y-ybar     (x-xbar)^2   (x-xbar)(y-ybar)   (y-ybar)^2
78.93       1.17           -3.163    28.843     10.004569    -91.230409         831.918649
58.20       2.97           -1.363    8.113      1.857769     -11.058019         65.820769
67.47       3.26           -1.073    17.383     1.151329     -18.651959         302.168689
37.47       4.69           0.357     -12.617    0.127449     -4.504269          159.188689
45.65       5.83           1.497     -4.437     2.241009     -6.642189          19.686969
32.92       6.00           1.667     -17.167    2.778889     -28.617389         294.705889
29.97       6.41           2.077     -20.117    4.313929     -41.783009         404.693689
350.61      30.33          -0.001    0.001      22.474943    -202.487243        2078.183343

(Column totals given in bottom row of table; the totals of the last three columns are Sxx, Sxy and Syy)
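As a check on the hand calculations, here is a minimal Python sketch that applies the least squares formulas to the seven (x, y) pairs from the LSD example. The variable names are illustrative choices. Because it keeps full precision instead of rounding the means, its results can differ from the slide values in the second decimal place.

```python
# LSD example data from the slides: tissue concentration (x) and math score (y).
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least squares estimates of slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Error sum of squares and the estimate of the error variance.
sse = s_yy - s_xy ** 2 / s_xx
s2_e = sse / (n - 2)

print(f"Sxx={s_xx:.4f}  Sxy={s_xy:.4f}  Syy={s_yy:.3f}")
print(f"b1={b1:.3f}  b0={b0:.3f}  SSE={sse:.2f}  s_e^2={s2_e:.2f}")
```

The slope and intercept agree with the SPSS coefficients reported later in the deck (-9.009 and 89.124) to three decimals.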
SPSS Output and Plot of Equation

Coefficients(a)
Model 1       B (Unstandardized)   Std. Error   Beta (Standardized)   t         Sig.
(Constant)    89.124               7.048                              12.646    .000
LSD_CONC      -9.009               1.503        -.937                 -5.994    .002
a. Dependent Variable: SCORE

[Figure: Math Score vs LSD Concentration (SPSS). Scatterplot of score vs lsd_conc with the fitted line score = 89.12 + -9.01 * lsd_conc, R-Square = 0.88]
Inference Concerning the Slope ($\beta_1$)
• Parameter: Slope in the population model ($\beta_1$)
• Estimator: Least squares estimate: $\hat{\beta}_1$
• Estimated standard error: $SE_{\hat{\beta}_1} = s_e / \sqrt{S_{xx}}$
• Methods of making inference regarding population:
  – Hypothesis tests (2-sided or 1-sided)
  – Confidence Intervals
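Hypothesis Test for $\beta_1$
• 2-Sided Test
  – $H_0: \beta_1 = 0$
  – $H_A: \beta_1 \neq 0$
  – T.S.: $t_{obs} = \frac{\hat{\beta}_1}{s_e / \sqrt{S_{xx}}}$
  – R.R.: $|t_{obs}| \geq t_{\alpha/2,\,n-2}$
  – P-value: $2P(t \geq |t_{obs}|)$
• 1-sided Test
  – $H_0: \beta_1 = 0$
  – $H_A^+: \beta_1 > 0$ or $H_A^-: \beta_1 < 0$
  – T.S.: $t_{obs} = \frac{\hat{\beta}_1}{s_e / \sqrt{S_{xx}}}$
  – R.R.: $t_{obs} \geq t_{\alpha,\,n-2}$ (for $H_A^+$) or $t_{obs} \leq -t_{\alpha,\,n-2}$ (for $H_A^-$)
  – P-value: $P(t \geq t_{obs})$ (for $H_A^+$) or $P(t \leq t_{obs})$ (for $H_A^-$)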
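$(1-\alpha)100\%$ Confidence Interval for $\beta_1$
$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\, SE_{\hat{\beta}_1} \equiv \hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\, \frac{s_e}{\sqrt{S_{xx}}}$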
• Conclude positive association if entire interval above 0
• Conclude negative association if entire interval below 0
• Cannot conclude an association if interval contains 0
• Conclusion based on interval is same as 2-sided hypothesis test
Example - Pharmacodynamics of LSD
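$n = 7 \qquad \hat{\beta}_1 = -9.01 \qquad s_e = \sqrt{50.72} = 7.12 \qquad S_{xx} = 22.475$
$SE_{\hat{\beta}_1} = \frac{7.12}{\sqrt{22.475}} = 1.50$
• Testing $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \neq 0$
T.S.: $t_{obs} = \frac{-9.01}{1.50} = -6.01 \qquad$ R.R.: $|t_{obs}| \geq t_{.025,5} = 2.571$
• 95% Confidence Interval for $\beta_1$:
$-9.01 \pm 2.571(1.50) \equiv -9.01 \pm 3.86 \equiv (-12.87, -5.15)$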
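The slope test and confidence interval can be reproduced with a short Python sketch, given here as an illustration rather than a definitive implementation. The critical value 2.571 is copied from the slide (t with 5 degrees of freedom, upper tail area .025) instead of being computed, and the variable names are assumptions made for the example.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b1 = s_xy / s_xx                                   # slope estimate
s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))  # residual standard deviation
se_b1 = s_e / math.sqrt(s_xx)                      # estimated standard error of the slope

# Test statistic for H0: beta1 = 0.
t_obs = b1 / se_b1

# Critical value t_{.025,5} as quoted on the slide (not computed here).
t_crit = 2.571
reject_h0 = abs(t_obs) >= t_crit

# 95% confidence interval for the slope.
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(f"SE(b1)={se_b1:.3f}  t_obs={t_obs:.3f}  reject H0: {reject_h0}")
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f})")
```

With unrounded inputs the test statistic matches the SPSS value of -5.994; the slide's -6.01 comes from dividing the rounded values -9.01 and 1.50.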
Confidence Interval for Mean When x=x*
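• Mean Response at a specific level x* is $E(y \mid x^*) = \mu_y = \beta_0 + \beta_1 x^*$
• Estimated Mean response and standard error (replacing unknown $\beta_0$ and $\beta_1$ with estimates):
$\hat{\mu}_y = \hat{\beta}_0 + \hat{\beta}_1 x^* \qquad SE_{\hat{\mu}} = s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$
• Confidence Interval for Mean Response:
$\hat{\mu}_y \pm t_{\alpha/2,\,n-2}\, SE_{\hat{\mu}} \equiv \hat{\mu}_y \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$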
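A minimal Python sketch of the confidence interval for the mean response, applied to the LSD data. The level x* = 5.0 is a hypothetical value chosen for illustration (the slides do not work an example here), the function name `mean_response_ci` is an assumption, and the critical value 2.571 is the t_{.025,5} figure quoted in the slope example.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

def mean_response_ci(x_star, t_crit=2.571):
    """Return (estimate, standard error, lower, upper) for the mean response at x_star.

    t_crit defaults to t_{.025,5} from the slides, which gives a 95% interval for n = 7.
    """
    y_hat = b0 + b1 * x_star
    se = s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)
    return y_hat, se, y_hat - t_crit * se, y_hat + t_crit * se

# Hypothetical level of the predictor, chosen only for illustration.
y_hat, se_mean, lower, upper = mean_response_ci(5.0)
print(f"Estimated mean score at x*=5.0: {y_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The standard error is smallest at x* equal to the sample mean of x, where it reduces to s_e/sqrt(n), and grows as x* moves away from the mean.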
Prediction Interval of Future Response @ x=x*
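• Response at a specific level x* is $y_{x^*} = \beta_0 + \beta_1 x^* + \varepsilon$
• Estimated response and standard error (replacing unknown $\beta_0$ and $\beta_1$ with estimates):
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x^* \qquad SE_{\hat{y}} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$
• Prediction Interval for Future Response:
$\hat{y} \pm t_{\alpha/2,\,n-2}\, SE_{\hat{y}} \equiv \hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$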
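The prediction interval differs from the interval for the mean only by the extra "1 +" under the square root, which accounts for the variability of a single new observation. The Python sketch below shows this on the LSD data. As before, x* = 5.0 is a hypothetical level, the function name `prediction_interval` is an illustrative choice, and 2.571 is the t_{.025,5} value quoted in the slope example.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

def prediction_interval(x_star, t_crit=2.571):
    """Return (prediction, standard error, lower, upper) for a future response at x_star."""
    y_hat = b0 + b1 * x_star
    se = s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / s_xx)
    return y_hat, se, y_hat - t_crit * se, y_hat + t_crit * se

# Hypothetical level of the predictor, chosen only for illustration.
x_star = 5.0
y_hat, se_pred, lower, upper = prediction_interval(x_star)

# Standard error of the estimated mean at the same level, for comparison.
se_mean = s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)

print(f"Predicted score at x*={x_star}: {y_hat:.2f}, 95% PI ({lower:.2f}, {upper:.2f})")
print(f"SE for a new response: {se_pred:.2f}; SE for the mean: {se_mean:.2f}")
```

The squared standard error for a new response equals s_e squared plus the squared standard error of the mean, so the prediction interval is always the wider of the two.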
Correlation Coefficient
• Measures the strength of the linear association between two variables
• Takes on the same sign as the slope estimate from the linear regression
• Not affected by linear transformations of y or x
• Does not distinguish between dependent and independent variable (e.g. height and weight)
• Population Parameter: $\rho_{yx}$
• Pearson’s Correlation Coefficient:
$r_{yx} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \qquad -1 \leq r_{yx} \leq 1$
Correlation Coefficient
• Values close to 1 in absolute value imply a strong linear association, positive or negative according to the sign
• Values close to 0 imply little or no linear association
• If data contain outliers (are non-normal), Spearman’s coefficient of correlation can be computed based on the ranks of the x and y values
• Test of $H_0: \rho_{yx} = 0$ is equivalent to test of $H_0: \beta_1 = 0$
• Coefficient of Determination ($r_{yx}^2$) - Proportion of variation in y “explained” by the regression on x:
$r_{yx}^2 = (r_{yx})^2 = \frac{S_{yy} - SSE}{S_{yy}} = \frac{SS(\text{Total}) - SS(\text{Residual})}{SS(\text{Total})} \qquad 0 \leq r_{yx}^2 \leq 1$
Example - Pharmacodynamics of LSD
$S_{xx} = 22.475 \qquad S_{xy} = -202.487 \qquad S_{yy} = 2078.183 \qquad SSE = 253.89$
$r_{yx} = \frac{-202.487}{\sqrt{(22.475)(2078.183)}} = -0.94$
$r_{yx}^2 = \frac{2078.183 - 253.89}{2078.183} = 0.88 = (-0.94)^2$

[Figure: two scatterplots of score vs lsd_conc. Panel "Mean": horizontal line at Mean = 50.09, with squared deviations summing to Syy. Panel "Linear Regression": fitted line score = 89.12 + -9.01 * lsd_conc, R-Square = 0.88, with squared deviations summing to SSE]
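The two routes to the coefficient of determination can be compared in a short Python sketch on the LSD data. Here SSE is computed directly from the residuals of the fitted line, so the agreement between the squared correlation and (Syy - SSE)/Syy is a genuine check and not an algebraic identity built into the code. Variable names are illustrative.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson's correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Fit the line, then compute SSE from the residuals themselves.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of the variation in y "explained" by the regression on x.
r_squared = (s_yy - sse) / s_yy

print(f"r = {r:.3f}")
print(f"r^2 from (Syy - SSE)/Syy = {r_squared:.3f}; r squared directly = {r ** 2:.3f}")
```

The sign of r matches the sign of the slope estimate, as stated in the properties of the correlation coefficient.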
Example - SPSS Output
Pearson’s and Spearman’s Measures

Correlations (Pearson)
                                   SCORE      LSD_CONC
SCORE      Pearson Correlation     1          -.937**
           Sig. (2-tailed)         .          .002
           N                       7          7
LSD_CONC   Pearson Correlation     -.937**    1
           Sig. (2-tailed)         .002       .
           N                       7          7
**. Correlation is significant at the 0.01 level (2-tailed).

Correlations (Spearman's rho)
                                       SCORE      LSD_CONC
SCORE      Correlation Coefficient     1.000      -.929**
           Sig. (2-tailed)             .          .003
           N                           7          7
LSD_CONC   Correlation Coefficient     -.929**    1.000
           Sig. (2-tailed)             .003       .
           N                           7          7
**. Correlation is significant at the 0.01 level (2-tailed).
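Spearman's coefficient is Pearson's coefficient applied to the ranks of x and y. The Python sketch below illustrates this on the LSD data. The helper names `ranks` and `pearson` are illustrative, and the ranking helper assumes there are no tied values, which holds for this data set but not in general.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties (true for this data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def pearson(a, b):
    """Pearson correlation: S_ab / sqrt(S_aa * S_bb)."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

pearson_r = pearson(x, y)
spearman_rho = pearson(ranks(x), ranks(y))

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Both values agree with the SPSS output above (-.937 and -.929). Handling ties would require averaging the tied ranks, which this sketch leaves out.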
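Hypothesis Test for $\rho_{yx}$
• 2-Sided Test
  – $H_0: \rho_{yx} = 0$
  – $H_A: \rho_{yx} \neq 0$
  – T.S.: $t_{obs} = r_{yx} \sqrt{\frac{n-2}{1 - r_{yx}^2}}$
  – R.R.: $|t_{obs}| \geq t_{\alpha/2,\,n-2}$
  – P-value: $2P(t \geq |t_{obs}|)$
• 1-sided Test
  – $H_0: \rho_{yx} = 0$
  – $H_A^+: \rho_{yx} > 0$ or $H_A^-: \rho_{yx} < 0$
  – T.S.: $t_{obs} = r_{yx} \sqrt{\frac{n-2}{1 - r_{yx}^2}}$
  – R.R.: $t_{obs} \geq t_{\alpha,\,n-2}$ (for $H_A^+$) or $t_{obs} \leq -t_{\alpha,\,n-2}$ (for $H_A^-$)
  – P-value: $P(t \geq t_{obs})$ (for $H_A^+$) or $P(t \leq t_{obs})$ (for $H_A^-$)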
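The equivalence between the test of zero correlation and the test of zero slope can be checked numerically. This Python sketch computes the test statistic both ways on the LSD data. The variable names are illustrative, and the critical value 2.571 is the t_{.025,5} figure quoted in the slope example, not a computed quantile.

```python
import math

# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Test statistic based on the sample correlation.
r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# Test statistic based on the slope estimate and its standard error.
b1 = s_xy / s_xx
s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
t_from_slope = b1 / (s_e / math.sqrt(s_xx))

# Two-sided rejection rule at alpha = .05 with n - 2 = 5 degrees of freedom.
t_crit = 2.571
reject_h0 = abs(t_from_r) >= t_crit

print(f"t from r = {t_from_r:.3f}, t from slope = {t_from_slope:.3f}, reject H0: {reject_h0}")
```

The two statistics coincide up to floating-point error, so the two tests always reach the same conclusion.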
Analysis of Variance in Regression
• Goal: Partition the total variation in y into variation “explained” by x and random variation
$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$
$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$
• These three sums of squares and degrees of freedom are:
• Total (TSS) DFT = n-1
• Error (SSE) DFE = n-2
• Model (SSR) DFR = 1
Analysis of Variance for Regression
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square        F
Model                 SSR              1                    MSR = SSR/1        F = MSR/MSE
Error                 SSE              n-2                  MSE = SSE/(n-2)
Total                 TSS              n-1

• Analysis of Variance - F-test
• $H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$
T.S.: $F_{obs} = \frac{MSR}{MSE}$
R.R.: $F_{obs} \geq F_{\alpha,\,1,\,n-2}$
P-value: $P(F \geq F_{obs})$
Example - Pharmacodynamics of LSD
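• Total Sum of squares:
$TSS = \sum (y_i - \bar{y})^2 = 2078.183 \qquad DF_T = 7 - 1 = 6$
• Error Sum of squares:
$SSE = \sum (y_i - \hat{y}_i)^2 = 253.890 \qquad DF_E = 7 - 2 = 5$
• Model Sum of Squares:
$SSR = \sum (\hat{y}_i - \bar{y})^2 = 2078.183 - 253.890 = 1824.293 \qquad DF_R = 1$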
Example - Pharmacodynamics of LSD

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Model                 1824.293         1                    1824.293      35.93
Error                 253.890          5                    50.778
Total                 2078.183         6

• Analysis of Variance - F-test
• $H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$
T.S.: $F_{obs} = \frac{MSR}{MSE} = 35.93$
R.R.: $F_{obs} \geq F_{.05,1,5} = 6.61$
P-value: $P(F \geq 35.93)$
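The ANOVA table can be rebuilt from the raw LSD data with a few lines of Python, shown here as a sketch under stated assumptions. Each sum of squares is computed from its own definition, so the partition TSS = SSE + SSR is verified, not assumed. The critical value 6.61 is the F_{.05,1,5} figure from the slide, and the variable names are illustrative.

```python
# LSD example data from the slides.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Fitted values from the least squares line.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares, each from its own definition.
tss = sum((yi - y_bar) ** 2 for yi in y)                  # total
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # error
ssr = sum((fi - y_bar) ** 2 for fi in fitted)             # model

# Mean squares and the F statistic.
df_model, df_error = 1, n - 2
msr = ssr / df_model
mse = sse / df_error
f_obs = msr / mse

# Critical value F_{.05,1,5} as quoted on the slide (not computed here).
f_crit = 6.61
reject_h0 = f_obs >= f_crit

print(f"Model: SS={ssr:.3f} df={df_model} MS={msr:.3f} F={f_obs:.2f}")
print(f"Error: SS={sse:.3f} df={df_error} MS={mse:.3f}")
print(f"Total: SS={tss:.3f} df={n - 1}")
print(f"Reject H0: {reject_h0}")
```

In simple linear regression the F statistic is the square of the t statistic for the slope, so this test and the 2-sided t-test for the slope agree.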