Correlation

Correlation. Comparisons So far, our inferential statistics have focused on comparing populations: Single-sample t- and Z-tests Whether a sample comes

Correlation

Comparisons
• So far, our inferential statistics have focused on comparing populations:
  • Single-sample t- and Z-tests
    • Whether a sample comes from a known population
  • Independent- and dependent-samples t-tests
    • Whether two samples are from the same population
  • ANOVAs
    • Whether 3+ samples are from the same population
• What about relationships within a single population?
  • Correlation
  • Regression

Correlation
• Correlations represent the systematic relationship between two variables

Correlation
• Two main properties of a correlation:
  • Directionality
    • The nature of the relationship between the variables
    • Negative: the variables go in opposite directions
    • Positive: the variables go in the same direction
[Diagrams: Variable X and Variable Y shown moving in opposite directions (negative) and in the same direction (positive)]

Correlation
• Two main properties of a correlation:
  • Strength
    • The consistency in the relationship between the variables
    • Strong: variables are consistently related; extreme values in X are paired with extreme values in Y
    • Weak: variables are inconsistently related

Correlation
• These relationships are quantified by a correlation coefficient
• We are going to use Pearson’s correlation coefficient: r
  • r can take any value from -1.00 to +1.00
  • The sign of r (- / +) indicates the direction of the relationship
  • The absolute value (magnitude) of r indicates the strength of the relationship
    • Small: .10
    • Medium: .30
    • Large: .50
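As a rough illustration of reading direction and strength off a value of r, here is a minimal Python sketch. The function name `describe_r` is hypothetical, and two choices are assumptions rather than anything the slides state: the benchmarks (.10, .30, .50) are treated as lower cutoffs, and anything below .10 is labeled "negligible".

```python
def describe_r(r: float) -> str:
    """Label the direction and strength of a Pearson correlation.

    Assumptions (not from the slides): the benchmarks .10 / .30 / .50 are
    used as lower cutoffs, and |r| below .10 is called "negligible".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")

    magnitude = abs(r)  # strength comes from the absolute value
    if magnitude >= 0.50:
        strength = "large"
    elif magnitude >= 0.30:
        strength = "medium"
    elif magnitude >= 0.10:
        strength = "small"
    else:
        return "negligible"

    direction = "positive" if r > 0 else "negative"  # direction comes from the sign
    return f"{strength} {direction}"


print(describe_r(0.93))   # large positive
print(describe_r(-0.24))  # small negative
print(describe_r(0.02))   # negligible
```

The three inputs are the r values used in the deck's scatterplot examples (bacon and heart disease, TV and reading, height and commute).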

Correlation
• Correlations can be visualized using a scatterplot

Correlation is not causation!

Correlation
• Limitations
  • Causal direction
    • Both directions of causality are possible
[Diagram: Chocolate consumption ↔ Nobel Prizes]

Correlation
• Limitations
  • Third variable
    • Some other variable is related to both our variables and accounts for their (illusory) relationship
[Diagram: Distance from chocolate exporters is linked to both Chocolate consumption and Nobel Prizes]

Correlation
• What sort of correlation would be expected between bacon consumption and incidence of heart disease?
  • Large positive, r = .93
[Scatterplot: bacon consumption (x-axis, 0–10) vs. incidence of heart disease (y-axis, 0–10)]

Correlation
• What sort of correlation would be expected between the amount of leisure time spent reading and the amount of time spent watching TV?
  • Small negative, r = -.24
[Scatterplot: time spent watching TV (x-axis, 0–10) vs. time spent reading (y-axis, 0–10)]

Correlation
• What sort of correlation would be expected between a person’s height and the length of their daily commute?
  • No correlation, r = .02
[Scatterplot: length of commute (x-axis, 0–10) vs. height (y-axis, 0–10)]

Sheldon’s example

Sheldon is convinced that there is a positive correlation between superhero height and the year they debuted. He thinks that since humans have been getting taller over the years, so too have their superheroes. He’s collected the following random sample of 10 superheroes to assess his claim.

Superhero          Debut   Height (in.)
Gambit             1990    6’2” (74)
The Hulk           1962    7’ (84)
Aquaman            1941    6’1” (73)
Jetstream          1984    5’7” (67)
Daredevil          1940    6’ (72)
Silver Surfer      1966    6’4” (76)
Captain Britain    1976    6’6” (78)
Wolverine          1974    5’3” (63)
Superman           1938    6’3” (75)
Green Lantern      1959    6’ (72)

Steps of hypothesis-testing

1. Select test.

2. State the null and research hypotheses.

3. Describe the distribution of the null hypothesis.

4. Determine the critical values.

5. Calculate the test statistic.

6. Make a decision.

Steps of hypothesis-testing
1. Select test.
• Are you trying to test how one variable changes when another variable changes?
  • Correlation – r
• Assumptions
  • Normal population
  • Each variable shows similar variability
  • No “outliers”

Assumptions
• Each variable shows similar variability
  • Is the “spread” in one variable about the same at each level of the other?

Assumptions
• No extreme outliers
  • A single extreme outlier can greatly affect the correlation, even though the relationship it suggests is spurious
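To make the outlier assumption concrete, the sketch below computes r for a small made-up data set with and without one extreme point. The data are hypothetical (invented for illustration, not from the slides), and `pearson_r` is a hand-rolled helper that uses the sum-of-products formula the deck develops under "Calculating r".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r as SP / sqrt(SS_X * SS_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of squares for X
    ss_y = sum((y - my) ** 2 for y in ys)                   # sum of squares for Y
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores: five ordinary cases with a weak relationship...
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# ...plus one extreme case that is far out on both variables.
r_with = pearson_r(x + [15], y + [15])

print(round(r_without, 2))  # 0.14
print(round(r_with, 2))     # 0.95
```

One added point moves r from small to very large, which is why the scatterplot is worth checking before trusting the coefficient.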

Sheldon’s example

[Scatterplot: year of debut (x-axis, 1930–2000) vs. height in inches (y-axis, 55–90) for the 10 superheroes]

Steps of hypothesis-testing
2. State the null and research hypotheses.
• Describe the two mutually exclusive possibilities in words and symbolically
  • Hypotheses about the relationship between two variables across the population
  • Correlation coefficient = the amount of variance in one variable predicted by variance in the other variable

$$r = \frac{\text{Amount of common variance}}{\text{Amount of total variance}}$$

Sheldon’s example

Research hypothesis (H1): There is a correlation between superhero height and the year of debut. H1: ρ ≠ 0

Null hypothesis (H0): There is no correlation between superhero height and the year of debut. H0: ρ = 0

Note: Our book only allows non-directional tests of correlations. In reality, you could do a directional test.

Steps of hypothesis-testing
3. Describe the distribution of the null hypothesis.
• Well, not really a unique step here for correlations…
• …So let’s just use this step to make sure we know our df

Steps of hypothesis-testing
3. Describe the distribution of the null hypothesis.
• Behind the scenes, r’s are tested with t’s, so the null distribution is a family of distributions based on df
• Because we will be computing variance for each variable, we lose two degrees of freedom
  • df = n – 2
[Figure: null distributions for df = 5, df = 25, and df = 50; larger df = narrower distribution]
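The link between r and t can be sketched with the standard conversion t = r·√(n − 2) / √(1 − r²). The slides do not show this formula, so treat it as outside background, and `t_from_r` is a hypothetical helper name. The example input uses r = -.18 and n = 10, the values from Sheldon's example later in the deck.

```python
from math import sqrt


def t_from_r(r: float, n: int):
    """Convert a Pearson r from n pairs into a t statistic with df = n - 2."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| is 1 (or r is out of range)")
    df = n - 2                         # two degrees of freedom are lost
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


t, df = t_from_r(-0.18, 10)
print(round(t, 2), df)  # -0.52 8
```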

Sheldon’s example

Distribution: r-distribution with 8 dfs

$$df = n - 2 = 10 - 2 = 8$$

Steps of hypothesis-testing
4. Determine the critical values.
• Consult Table H
  • As always, use the smaller df if what you need isn’t there

Sheldon’s example

Critical value(s): rcrit = ±0.6319
• Look up in Table H with df = 8, α = .05
• Remember you must include ±
• Tests are non-directional
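As a sanity check on the table lookup, a critical r can be recovered from a critical t with r_crit = t_crit / √(t_crit² + df). This is a sketch resting on outside assumptions: the conversion formula is not in the slides, `critical_r` is a hypothetical helper, and 2.306 is the two-tailed critical t for df = 8 at α = .05 taken from a standard t table.

```python
from math import sqrt


def critical_r(t_crit: float, df: int) -> float:
    """Convert a two-tailed critical t into the matching critical r."""
    return t_crit / sqrt(t_crit ** 2 + df)


# Assumption: 2.306 is the two-tailed critical t for df = 8, alpha = .05,
# read from a standard t table (it is not given in the slides).
T_CRIT_DF8 = 2.306

r_crit = critical_r(T_CRIT_DF8, df=8)
print(round(r_crit, 4))  # 0.6319
```

The result agrees with the ±0.6319 read from Table H.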

Steps of hypothesis-testing
5. Calculate the test statistic.
• Common variance: what is “shared” between the two variables
  • What makes them “go together” (if they do)
• Total variance: all the variance in our variables

$$r = \frac{\text{Amount of common variance}}{\text{Amount of total variance}}$$

Calculating r
• Our old friend the deviation!
  • X – M
• But now we’ve got two deviations per raw score
  • X – MX
  • Y – MY

Person   X          Y
A        1          2
B        2          3
C        5          4
D        6          7
E        7          6
         M = 4.2    M = 4.4

Calculating r
• Visualizing joint deviations in a positive correlation
[Scatterplot: X (0–8) vs. Y (0–8), with two points annotated]
  • Both deviations are positive: (7-4.2) & (6-4.4)
  • Both deviations are negative: (2-4.2) & (3-4.4)
• For positive correlations:
  • For a given raw score both deviations will (usually) be negative OR both deviations will (usually) be positive

Calculating r
• Visualizing joint deviations in a negative correlation
[Scatterplot: X (0–8) vs. Y (0–8), with two points annotated]
  • One deviation pos, one deviation neg: (2-4.2) & (7-4.4)
  • One deviation pos, one deviation neg: (7-4.2) & (2-4.4)
• For negative correlations:
  • For a given raw score the two deviations will (usually) have opposite signs

Calculating r
• Numerator of r is the Sum of Products – SP
  • Created by multiplying the two deviations and then summing
  • Positive correlations will have positive products
    • + times + is +
    • - times - is +
  • Negative correlations will have negative products
    • + times - is -

$$SP = \sum (X - M_X)(Y - M_Y)$$

$$r = \frac{\text{Amount of common variance}}{\text{Amount of total variance}}$$
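The sum of products can be traced by hand with the five-person X/Y table from the "Calculating r" slide (persons A to E). The sketch below is illustrative only; the variable names are my own.

```python
# X and Y scores for persons A-E from the "Calculating r" table.
x = [1, 2, 5, 6, 7]
y = [2, 3, 4, 7, 6]

m_x = sum(x) / len(x)  # 4.2
m_y = sum(y) / len(y)  # 4.4

# One product of deviations per person: (X - M_X) * (Y - M_Y).
products = [(xi - m_x) * (yi - m_y) for xi, yi in zip(x, y)]

# SP is the sum of those products.
sp = sum(products)

for person, p in zip("ABCDE", products):
    print(person, round(p, 2))
print("SP =", round(sp, 1))  # SP = 19.6
```

Four of the five products are positive (both deviations share a sign), so SP comes out positive, matching the upward trend in the table.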

Calculating r
• Products also keep track of strength
[Scatterplot: X (0–8) vs. Y (0–8)]
• For strong correlations:
  • When one deviation is big, the other is big

Calculating r
• Products also keep track of strength
[Scatterplot: X (0–8) vs. Y (0–8)]
• For weak correlations:
  • Deviations tend to cancel each other out

Calculating r
• Now we need to compare this to the overall (expected) variability
• We start with SS for each variable
  • Takes into account sample size
• Then we take the square root of the product
  • Deals with the squared deviations
• This measure of overall variability is our denominator:

$$\sqrt{SS_X \, SS_Y}$$

$$r = \frac{\text{Amount of common variance}}{\text{Amount of total variance}}$$

Calculating r
• Therefore, the correlation coefficient is just the standardized form of SP
  • Standardized so that it ranges from -1 to 1

$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{\sum (X - M_X)(Y - M_Y)}{\sqrt{SS_X \, SS_Y}}$$

Sheldon’s example

Calculating the numerator (SP)

Debut (X)   Height (Y)   (X-MX)     (Y-MY)     (X-MX)(Y-MY)
1990        74.000        27.000      0.600       16.200
1962        84.000        -1.000     10.600      -10.600
1941        73.000       -22.000     -0.400        8.800
1984        67.000        21.000     -6.400     -134.400
1940        72.000       -23.000     -1.400       32.200
1966        76.000         3.000      2.600        7.800
1976        78.000        13.000      4.600       59.800
1974        63.000        11.000    -10.400     -114.400
1938        75.000       -25.000      1.600      -40.000
1959        72.000        -4.000     -1.400        5.600

MX = 1963   MY = 73.400

SP = -169.000

Sheldon’s example

Calculating the denominator

Debut (X)   Height (Y)   (X-MX)     (Y-MY)     (X-MX)²     (Y-MY)²
1990        74.000        27.000      0.600     729.000       0.360
1962        84.000        -1.000     10.600       1.000     112.360
1941        73.000       -22.000     -0.400     484.000       0.160
1984        67.000        21.000     -6.400     441.000      40.960
1940        72.000       -23.000     -1.400     529.000       1.960
1966        76.000         3.000      2.600       9.000       6.760
1976        78.000        13.000      4.600     169.000      21.160
1974        63.000        11.000    -10.400     121.000     108.160
1938        75.000       -25.000      1.600     625.000       2.560
1959        72.000        -4.000     -1.400      16.000       1.960

MX = 1963   MY = 73.400

SSX = 3124.000   SSY = 296.400

Sheldon’s example

SP = -169.000   SSX = 3124.000   SSY = 296.400

$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{-169.000}{\sqrt{(3124.000)(296.400)}} = \frac{-169.000}{\sqrt{925953.600}} = \frac{-169.000}{962.265} = -0.18$$
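The hand calculation for Sheldon's sample can be reproduced in a few lines. This is a minimal sketch using only the debut years and heights from the superhero table; the variable names are my own.

```python
from math import sqrt

# Debut year (X) and height in inches (Y) for the 10 sampled superheroes.
debut = [1990, 1962, 1941, 1984, 1940, 1966, 1976, 1974, 1938, 1959]
height = [74, 84, 73, 67, 72, 76, 78, 63, 75, 72]

m_x = sum(debut) / len(debut)    # 1963.0
m_y = sum(height) / len(height)  # 73.4

# Numerator: sum of products of deviations.
sp = sum((x - m_x) * (y - m_y) for x, y in zip(debut, height))

# Denominator pieces: sum of squared deviations for each variable.
ss_x = sum((x - m_x) ** 2 for x in debut)
ss_y = sum((y - m_y) ** 2 for y in height)

r = sp / sqrt(ss_x * ss_y)

print(round(sp, 3), round(ss_x, 3), round(ss_y, 3))  # -169.0 3124.0 296.4
print(round(r, 2))                                   # -0.18
```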

Steps of hypothesis-testing
6. Make a decision.
• Compare your computed r to the critical r from step 4
  • r computed = -.18 does not go beyond r crit = ±.6319 (-.6319 < -.18 < .6319)
• Reject or fail to reject the null hypothesis
  • In this example, we retain the null hypothesis, p > .05
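The comparison in this step amounts to checking whether the computed r falls beyond either critical value. A minimal sketch, where `decide` is a hypothetical helper and treating a computed r exactly equal to the critical value as a rejection is my assumption:

```python
def decide(r: float, r_crit: float) -> str:
    """Two-tailed decision: reject H0 when r is at or beyond +/- r_crit.

    Assumption: a computed r exactly equal to the critical value counts
    as a rejection.
    """
    return "reject H0" if abs(r) >= r_crit else "retain H0"


print(decide(-0.18, 0.6319))  # retain H0
print(decide(0.65, 0.6319))   # reject H0
```

Comparing absolute values handles both tails at once, which avoids sign mistakes when the computed r is negative.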

Sheldon’s example

I could not reject the null hypothesis; these superheroes’ heights were not significantly correlated with their year of debut, r(8) = -0.18, p > .05.

Partial correlation
• Do taller people have deeper voices?
• What if you found r = .6?
  • Is there a THIRD VARIABLE that is really driving this effect?
  • Sex?
    • Men taller on average
    • Men deeper voices on average
    • Could be sex, not height itself, driving the effect

Partial correlation
• When we create a partial correlation we:
  • Control for, remove, partial out…
  • …variance from one variable that potentially obscures or changes the correlation between two other variables
• Once we partial out sex from the correlation between height and pitch, is there any correlation left?
• Partial correlations allow us to “decontaminate” correlations
  • Often useful when we do not have complete experimental control

Visualizing partial correlations

• Genuine: correlation holds within both groups
• Not genuine: correlation holds in one group, not the other
• Not genuine: no correlation in either group; mean difference in groups creates appearance of correlation
• Not genuine: within groups, correlation goes in the opposite direction; mean difference in groups creates appearance of correlation

Partial correlation
• All three variables can be continuous, too
• What if there was a correlation between age and salary?
  • Do people just give more money to older workers?
  • Or is it simply that older workers have more years of experience?
  • Partial out years of experience to see if there is still a correlation between age and salary
• Foreshadowing: What if there are independent contributions of age and years of experience?

Computing partial correlation

• rxy is the correlation of interest
• Z is the variable to partial out
• rxy.z is the correlation of interest once Z is partialed out
• rxz and ryz are the correlations of X and Y with Z
• So, you will need to compute three separate correlations before you can actually compute rxy.z

$$r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ} \, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$
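The partial-correlation formula translates directly into code once the three ordinary correlations are in hand. In this sketch `partial_r` is a hypothetical helper name, and the input correlations are invented for illustration (only the r = .6 between height and voice echoes the earlier slide; the two correlations with sex are made up).

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between X and Y with Z partialed out."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical values: height-voice r = .60, and both height and voice
# correlate .70 with sex (these two numbers are made up for illustration).
r_xy_z = partial_r(r_xy=0.60, r_xz=0.70, r_yz=0.70)
print(round(r_xy_z, 2))  # 0.22

# df for testing a partial correlation is N - 3, e.g. N = 20 gives df = 17.
n = 20
print(n - 3)  # 17
```

With these made-up inputs, most of the original .60 disappears once the third variable is controlled for, which is the "decontamination" the deck describes.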

Hypothesis testing for partial correlations

• Some changes in how to describe your hypotheses and how to describe your results
  • See cheat sheet at end
• Now have to compute 3 correlations before computing partial correlation
• df = N – 3 for partial correlations
  • We have to estimate another variance for the third variable

What to report in homework
• All the steps of hypothesis testing
  • Make sure to include both English and symbolic hypotheses
  • Make sure to explicitly compare computed r and critical r
• All your work and equations computing r, including SP and the two SS’s
• Description of what you found, see following page
  • Include your correlation and df in your description

Hypotheses - standard
• H1: There is a correlation between height and voice pitch
  • ρ ≠ 0
• H0: There is no correlation between height and voice pitch
  • ρ = 0
• Remember, our book doesn’t give the option for one-tailed correlations

Hypotheses - partial
• H1: There is a correlation between height and voice pitch when controlling for sex
  • ρ ≠ 0
• H0: There is no correlation between height and voice pitch when controlling for sex
  • ρ = 0
• Remember, our book doesn’t give the option for one-tailed correlations

Description - standard
• “We can reject the null hypothesis and conclude that there was a statistically significant positive correlation between height and deepness of voice such that taller people had deeper voices, r(12) = .65, p < .05.”
• “We must retain the null and conclude there is no correlation between income and charitable giving, r(7) = -.03, p > .05.”
• For all descriptions, describe the variables.
• For significant correlations also:
  • Describe the sign, pos or neg
  • Describe how that played out in the variables.
• In r(#), the # is your df – don’t forget to put the right one

Description - partial
• “We can reject the null hypothesis and conclude that there was a statistically significant positive correlation between height and deepness of voice when controlling for sex such that taller people had deeper voices, r(12) = .65, p < .05.”
• “We must retain the null and conclude there is no correlation between income and charitable giving when controlling for political ideology, r(7) = -.03, p > .05.”
• For all descriptions, describe the variables.
• For significant correlations also:
  • Describe the sign, pos or neg
  • Describe how that played out in the variables.
• In r(#), the # is your df – don’t forget to put the right one

Degrees of freedom
• df = N – 2 for standard correlations
• df = N – 3 for partial correlations
