Session 13: Correlation (Zar, Chapter 19)


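(1) Regression vs. correlation

Regression:

$$\hat{y}_i = \hat{a} + \hat{b}\,x_i, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}, \qquad \hat{b} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

$$R^2 = \frac{SS_R}{SS_T}, \qquad F = \frac{MS_R}{s^2_{y \cdot x}}$$

or

$$R^2 = \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}$$

$R^2$ is the proportion of the variability of y that the model explains.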

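Correlation: Relationship (x & y both random)

Linear relationship measured as the correlation coefficient:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} = \frac{\frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n-1}\sum_i (x_i - \bar{x})^2 \cdot \frac{1}{n-1}\sum_i (y_i - \bar{y})^2}} = \frac{S_{xy}}{S_x S_y}$$

$$r^2 = R^2$$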

“Pearson Correlation Coefficient”

“Pearson Linear Correlation Coefficient”

The correlation coefficient can be negative, depending on the sign of the numerator $S_{xy}$. So

$$0 \le r^2 = R^2 \le 1, \qquad -1 \le r \le 1$$

r = -1: Perfect Negative
r = +1: Perfect Positive
r = 0: No Linear Relationship

[Figure: two scatter-plot panels, each labelled "No Linear Relationship".]

(2) Hypotheses about r

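$$H_0: \rho = 0 \qquad H_A: \rho \ne 0$$

$$t = \frac{r}{s_r} \quad \text{where} \quad s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

If $|t| \le t_{\alpha(2),\,n-2}$, accept $H_0$.

or

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}; \quad \text{if } |t| \le t_{\alpha(2),\,n-2}, \text{ accept } H_0$$

Problem! Show

$$\frac{r}{s_r} = \frac{\hat{b}}{s_{\hat{b}}}$$

Table B.17 can be used directly to perform the test.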

Example 19.1a

Wing Length (cm)    Tail Length (cm)
10.4                7.4
10.8                7.6
11.1                7.9
10.2                7.2
10.3                7.4
10.2                7.1
10.7                7.4
10.5                7.2
10.8                7.8
11.2                7.7
10.6                7.8
11.4                8.3
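As a rough check on the formulas for r and for the t test of zero correlation, the following sketch recomputes both from the Example 19.1a data using only the standard library. The function names `pearson_r` and `t_for_zero_correlation` are illustrative choices, not part of the original notes.

```python
import math

# Wing and tail lengths (cm) from Example 19.1a.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the two sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_zero_correlation(r, n):
    """t = r / s_r with s_r = sqrt((1 - r^2) / (n - 2)), on n - 2 degrees of freedom."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r


r = pearson_r(wing, tail)
t = t_for_zero_correlation(r, len(wing))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(wing) - 2}")
```

The resulting t would then be compared with the critical value $t_{\alpha(2),\,n-2}$ from a t table, or r compared directly with Table B.17.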

Correlations

                               WING      TAIL
WING   Pearson Correlation     1.000     .870**
       Sig. (2-tailed)         .         .000
       N                       12        12
TAIL   Pearson Correlation     .870**    1.000
       Sig. (2-tailed)         .000      .
       N                       12        12

**. Correlation is significant at the 0.01 level (2-tailed).

Model Summary (b)

Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .870a    .758        .733                 .2040

a. Predictors: (Constant), TAIL
b. Dependent Variable: WING

ANOVA (b)

Model 1       Sum of Squares    df    Mean Square    F         Sig.
Regression    1.300             1     1.300          31.240    .000a
Residual      .416              10    4.163E-02
Total         1.717             11

a. Predictors: (Constant), TAIL
b. Dependent Variable: WING

Coefficients (a)

              Unstandardized Coefficients    Standardized Coefficients
Model 1       B        Std. Error            Beta                         t        Sig.
(Constant)    3.248    1.332                                              2.439    .035
TAIL          .983     .176                  .870                         5.589    .000

a. Dependent Variable: WING
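The regression output above can be reproduced by hand, and doing so also gives a numerical check of the identity $r/s_r = \hat{b}/s_{\hat{b}}$ posed as a problem earlier. The sketch below is one way to do it in plain Python; `simple_regression` and the dictionary keys are hypothetical names chosen for this illustration.

```python
import math

# Example 19.1a data: wing length is the dependent variable, tail length the predictor.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]


def simple_regression(x, y):
    """Least-squares fit of y on x, returning the quantities shown in the SPSS tables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)          # total sum of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_reg = slope * sxy                              # regression sum of squares
    ms_res = (syy - ss_reg) / (n - 2)                 # residual mean square
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": ss_reg / syy,
        "F": ss_reg / ms_res,
        "se_slope": math.sqrt(ms_res / sxx),
    }


fit = simple_regression(tail, wing)
n = len(wing)

# t statistic for the slope, as in the Coefficients table.
t_slope = fit["slope"] / fit["se_slope"]

# t statistic for the correlation: r / s_r, with r taking the sign of the slope.
r = math.copysign(math.sqrt(fit["r2"]), fit["slope"])
t_corr = r / math.sqrt((1 - fit["r2"]) / (n - 2))

print({k: round(v, 3) for k, v in fit.items()})
print(round(t_slope, 3), round(t_corr, 3))
```

The two t statistics agree to rounding error, and both equal the square root of F, which is what the identity predicts for a single predictor.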

[Figure: scatter plot of wing length (vertical axis, 10.0 to 11.6) against Tail Length (horizontal axis, 7.0 to 8.4).]

$$H_0: \rho = \rho_0 \ \text{ where } \rho_0 \ne 0 \qquad H_A: \rho \ne \rho_0$$

Cannot just replace 0 with $\rho_0$ like we did for $b$, $a$, $\bar{x}$, etc.

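Fisher developed the following transformation:

$$z = 0.5 \ln\left(\frac{1 + r}{1 - r}\right) \quad \text{and} \quad \zeta_0 = 0.5 \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)$$

and

$$Z = \frac{z - \zeta_0}{\sigma_z} \quad \text{where} \quad \sigma_z = \sqrt{\frac{1}{n - 3}}$$

$Z$ is $N(0, 1)$, so that if $|Z| \ge K_{\alpha(2)}$ (Table B.2), reject $H_0$.

Can use $t_{\alpha(2),\,\infty}$.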

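For the 1-sided tests:

$$H_{A1}: \rho > \rho_0; \quad \text{if } Z > K_{\alpha(1)}, \text{ reject } H_0$$

$$H_{A2}: \rho < \rho_0; \quad \text{if } Z < -K_{\alpha(1)}, \text{ reject } H_0$$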

F.N. David showed that Fisher’s z is good if n ≥ 20.

Note: $K_{\alpha(2)}$ and $K_{\alpha(1)}$ are from Table B.2. Table B.3 can be used (with $\nu = \infty$).

Note: Table B.18 calculates the transform.
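For readers without Table B.18 at hand, the transformation and the resulting Z statistic can be sketched in a few lines of Python. The function names are illustrative, and the null value `rho0 = 0.75` is a made-up number used only to exercise the code; it does not come from the notes. With n = 12 the example also falls below the n ≥ 20 guideline quoted above, so it shows the arithmetic only.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher's transformation z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_z_test(r, n, rho0):
    """Return (Z, two-sided p) for H0: rho = rho0 using sigma_z = sqrt(1 / (n - 3))."""
    sigma_z = math.sqrt(1 / (n - 3))
    z_stat = (fisher_z(r) - fisher_z(rho0)) / sigma_z
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_two_sided


# r and n from Example 19.1a; rho0 is a hypothetical null value (assumption).
z_stat, p_value = fisher_z_test(r=0.870, n=12, rho0=0.75)
print(f"Z = {z_stat:.3f}, two-sided p = {p_value:.3f}")
```

For a one-sided alternative, Z itself (not its absolute value) would be compared with $K_{\alpha(1)}$ or $-K_{\alpha(1)}$.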

(3) Spearman rank correlation coefficient

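$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}$$

or, if there are ties,

$$r_s = \frac{(n^3 - n)/6 - \sum_i d_i^2 - T_x - T_y}{\sqrt{\left[(n^3 - n)/6 - 2T_x\right]\left[(n^3 - n)/6 - 2T_y\right]}}$$

where

$$T_x = \frac{\sum_i \left(t_{x_i}^3 - t_{x_i}\right)}{12}, \qquad T_y = \frac{\sum_i \left(t_{y_i}^3 - t_{y_i}\right)}{12}$$

and $t_{x_i}$ ($t_{y_i}$) is the number of tied values in the i-th group of ties among the x (y) values.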

$$H_0: \rho_s = 0 \qquad H_A: \rho_s \ne 0$$

If $|r_s| < (r_s)_{\alpha(2),\,n}$ (Table B.20), accept $H_0$.

If $n > 100$, use $r_s$ for $r$ and use t-tables.

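$$H_0: \rho_s = \rho_0 \qquad H_A: \rho_s \ne \rho_0$$

If $n \ge 10$ and $r_s \le 0.9$, use $r_s$ to get (from Fisher's z-trans)

$$z_s = 0.5 \ln\left(\frac{1 + r_s}{1 - r_s}\right) \quad \text{with} \quad \sigma_{z_s} = \sqrt{\frac{1.060}{n - 3}}$$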

The Spearman rank correlation coefficient, computed for the data of Example 19.1.

X       Rank of X    Y      Rank of Y    d_i     d_i^2
10.4    4            7.4    5            -1      1
10.8    8.5          7.6    7            1.5     2.25
11.1    10           7.9    11           -1      1
10.2    1.5          7.2    2.5          -1      1
10.3    3            7.4    5            -2      4
10.2    1.5          7.1    1            0.5     0.25
10.7    7            7.4    5            2       4
10.5    5            7.2    2.5          2.5     6.25
10.8    8.5          7.8    9.5          -1      1
11.2    11           7.7    8            3       9
10.6    6            7.8    9.5          -3.5    12.25
11.4    12           8.3    12           0       0
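The ranking and the two versions of the $r_s$ formula (without and with the tie correction) can be traced with a short script. This is a sketch in plain Python; `average_ranks`, `tie_term`, and `spearman` are names invented for this example.

```python
import math
from collections import Counter

# Example 19.1 data: X = wing length, Y = tail length (cm).
x = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
y = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]


def average_ranks(values):
    """Rank from smallest to largest, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def tie_term(values):
    """T = sum(t^3 - t) / 12 over groups of tied values (untied values contribute 0)."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


def spearman(x, y):
    """Return (sum of d^2, r_s without tie correction, r_s with tie correction)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    uncorrected = 1 - 6 * sum_d2 / (n ** 3 - n)
    t_x, t_y = tie_term(x), tie_term(y)
    base = (n ** 3 - n) / 6
    corrected = (base - sum_d2 - t_x - t_y) / math.sqrt((base - 2 * t_x) * (base - 2 * t_y))
    return sum_d2, uncorrected, corrected


sum_d2, rs_plain, rs_ties = spearman(x, y)
print(sum_d2, round(rs_plain, 3), round(rs_ties, 3))
```

Because both variables contain ties, the tie-corrected value is the one to compare with the SPSS "Spearman's rho" output.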

Correlations (Spearman's rho)

                                   WING      TAIL
WING   Correlation Coefficient     1.000     .851**
       Sig. (2-tailed)             .         .000
       N                           12        12
TAIL   Correlation Coefficient     .851**    1.000
       Sig. (2-tailed)             .000      .
       N                           12        12

**. Correlation is significant at the .01 level (2-tailed).

(4) Correlation in Tables: Cohen’s kappa

Recall the coded agreement tables & preference tables of Chapter 9:

                          Method "A"
Method "B"    Benign    Possible    Probable    Cancer    Total
Benign        f11       f12         f13         f14       R1
Possible      f21       f22         f23         f24       R2
Probable      f31       f32         f33         f34       R3
Cancer        f41       f42         f43         f44       R4
Total         C1        C2          C3          C4        n

Define Pij = fij/n:

              Benign    Poss    Prob    Cancer    Total
Benign        P11       P12     P13     P14       P1+
Poss          P21       P22     P23     P24       P2+
Prob          P31       P32     P33     P34       P3+
Cancer        P41       P42     P43     P44       P4+
Total         P+1       P+2     P+3     P+4       1

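Define

$$\theta_1 = \sum_{i=1}^{k} P_{ii} \quad \text{(sum of observed probability on main diagonal)}$$

$$\theta_2 = \sum_{i=1}^{k} P_{i+} P_{+i} \quad \text{(sum of main diagonal estimates)}$$

$$\hat{\kappa} = \frac{\theta_1 - \theta_2}{1 - \theta_2} = \frac{\sum_i P_{ii} - \sum_i P_{i+} P_{+i}}{1 - \sum_i P_{i+} P_{+i}} = \frac{N \sum_i f_{ii} - \sum_i R_i C_i}{N^2 - \sum_i R_i C_i}$$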

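$$H_0: \kappa = 0 \qquad H_A: \kappa \ne 0, \quad \text{usually } H_A: \kappa > 0 \text{ (positive association)}$$

Test 2-sided:

$$\text{If } |Z| = \frac{|\hat{\kappa} - \kappa_0|}{\hat{\sigma}_{\hat{\kappa}}} \le K_{\alpha(2)} \text{ (from Table B.2, or B.3 with } \nu = \infty\text{), then accept } H_0$$

Test 1-sided:

$$H_A: \kappa > \kappa_0; \quad \text{if } \frac{\hat{\kappa} - \kappa_0}{\hat{\sigma}_{\hat{\kappa}}} \ge K_{\alpha(1)} \text{ (Table B.2, or B.3 with } \nu = \infty\text{), reject } H_0$$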

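where

$$\hat{\sigma}^2_{\hat{\kappa}} = \frac{1}{N}\left[\frac{\theta_1 (1 - \theta_1)}{(1 - \theta_2)^2} + \frac{2 (1 - \theta_1)(2\theta_1\theta_2 - \theta_3)}{(1 - \theta_2)^3} + \frac{(1 - \theta_1)^2 (\theta_4 - 4\theta_2^2)}{(1 - \theta_2)^4}\right]$$

and

$$\theta_3 = \sum_i P_{ii}\,(P_{i+} + P_{+i}), \qquad \theta_4 = \sum_i \sum_j P_{ij}\,(P_{j+} + P_{+i})^2$$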

Or we can use the following if testing versus 0:

$$H_0: \text{"A" independent of "B"}$$

$$\hat{\sigma}^2_I = \frac{\theta_2 + \theta_2^2 - \sum_i P_{i+} P_{+i}\,(P_{i+} + P_{+i})}{n\,(1 - \theta_2)^2}$$

Remember the Dental Mildness Study:

                   Exam 1
              0       1      2
Exam 2   0    2288    121    37
         1    84      32     9
         2    44      6      1

THETA1 = 0.8852    THETA2 = 0.8628    THETA3 = 1.6194
KAPPA = 0.1631
THETA4 = 3.104    A = 5.40025    B = -8.17327    C = 4.69052
THETA5 = 1.5747948
VAR1 = 0.0007313 (Large Sample)
VAR2 = 0.0006585 (Null Model of Independence)

$$z = \frac{0.1631}{\sqrt{0.0006585}} = 6.36$$
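The point estimate for the Dental Mildness Study can be recomputed from the table with a small script. The sketch below covers only THETA1, THETA2, and KAPPA; the null-model variance is not recomputed but copied from the VAR2 line of the output above, and `cohen_kappa` is a hypothetical helper name.

```python
import math

# Dental Mildness Study counts: rows are Exam 2 scores (0, 1, 2),
# columns are Exam 1 scores (0, 1, 2).
dental = [
    [2288, 121, 37],
    [84, 32, 9],
    [44, 6, 1],
]


def cohen_kappa(freq):
    """Return (theta1, theta2, kappa) for a square table of counts."""
    k = len(freq)
    n = sum(sum(row) for row in freq)
    row_totals = [sum(row) for row in freq]
    col_totals = [sum(freq[i][j] for i in range(k)) for j in range(k)]
    theta1 = sum(freq[i][i] for i in range(k)) / n                         # observed agreement
    theta2 = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (theta1 - theta2) / (1 - theta2)
    return theta1, theta2, kappa


theta1, theta2, kappa = cohen_kappa(dental)

# Variance under the null model of independence, taken from the output above
# (VAR2), not recomputed here.
var_null = 0.0006585
z = kappa / math.sqrt(var_null)

print(f"theta1 = {theta1:.4f}, theta2 = {theta2:.4f}, kappa = {kappa:.4f}, z = {z:.2f}")
```

The z value would then be compared with $K_{\alpha(1)}$ from Table B.2 for the usual one-sided alternative of positive association.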