mitchell-burke
Correlation
Comparisons
• So far, our inferential statistics have focused on comparing populations:
• Single-sample t- and Z-tests
  • Whether a sample comes from a known population
• Independent- and dependent-samples t-tests
  • Whether two samples are from the same population
• ANOVAs
  • Whether 3+ samples are from the same population
• What about relationships within a single population?
Correlation Regression
Correlation
• Correlations represent the systematic relationship between two variables
Correlation
• Two main properties of a correlation:
• Directionality
  • The nature of the relationship between the variables
  • Negative: the variables go in opposite directions
  • Positive: the variables go in the same direction
Correlation
• Two main properties of a correlation:
• Strength
  • The consistency in the relationship between the variables
  • Strong: variables are consistently related; extreme values in X are paired with extreme values in Y
  • Weak: variables are inconsistently related
Correlation
• These relationships are quantified by a correlation coefficient
• We are going to use Pearson’s correlation coefficient: r
  • r can take any value from -1.00 to +1.00
  • The sign of r (- / +) indicates the direction of the relationship
  • The absolute value (magnitude) of r indicates the strength of the relationship
    • Small: .10
    • Medium: .30
    • Large: .50
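The sign and magnitude rules above can be sketched as a small helper. This is only an illustration: the function name `describe_r` is hypothetical, and the "negligible" label for values below the .10 benchmark is an assumption, since the slide only names small, medium, and large.

```python
def describe_r(r):
    """Return (direction, strength) labels for a Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")

    # The sign of r gives the direction of the relationship.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    # The absolute value of r gives the strength (benchmarks .10 / .30 / .50).
    size = abs(r)
    if size >= 0.50:
        strength = "large"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "small"
    else:
        strength = "negligible"  # assumption: label for values below .10

    return direction, strength


print(describe_r(0.93))
print(describe_r(-0.24))
```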
Correlation
• Correlations can be visualized using a scatterplot
Correlation is not causation!
Correlation
• Limitations
• Causal direction
  • Both directions of causality are possible
  • Example: chocolate consumption and Nobel Prizes
Correlation
• Limitations
• Third variable
  • Some other variable is related to both our variables and accounts for their (illusory) relationship
  • Example: chocolate consumption and Nobel Prizes, with distance from chocolate exporters as a possible third variable
Correlation
• What sort of correlation would be expected between bacon consumption and incidence of heart disease?
Large positive
[Scatterplot: incidence of heart disease (Y) against bacon consumption (X), r = .93]
Correlation
• What sort of correlation would be expected between amount of leisure time spent reading and the amount of time watching TV?
Small negative
[Scatterplot: time spent reading (Y) against time spent watching TV (X), r = -.24]
Correlation
• What sort of correlation would be expected between a person’s height and the length of their daily commute?
No correlation
[Scatterplot: height (Y) against length of commute (X), r = .02]
Sheldon’s example
Sheldon is convinced that there is a positive correlation between superhero height and the year they debuted. He thinks that since humans have been getting taller over the years, so too have their superheroes. He’s collected the following random sample of 10 superheroes to assess his claim.
Superhero Debut Height (in.)
Gambit 1990 6’2” (74)
The Hulk 1962 7’ (84)
Aquaman 1941 6’1” (73)
Jetstream 1984 5’7” (67)
Daredevil 1940 6’ (72)
Silver Surfer 1966 6’4” (76)
Captain Britain 1976 6’6” (78)
Wolverine 1974 5’3” (63)
Superman 1938 6’3” (75)
Green Lantern 1959 6’ (72)
Steps of hypothesis-testing
1. Select test.
2. State the null and research hypotheses
3. Describe the distribution of the null hypothesis.
4. Determine the critical values.
5. Calculate the test statistic.
6. Make a decision.
Steps of hypothesis-testing
1. Select test.
• Are you trying to test how one variable changes when another variable changes?
  • Correlation: r
• Assumptions
  • Normal population
  • Each variable shows similar variability
  • No “outliers”
Assumptions
• Each variable shows similar variability
  • Is the “spread” in one variable about the same at each level of the other?
Assumptions
• No extreme outliers
  • A single outlier can greatly affect the correlation, producing a relationship that is spurious
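To see how strongly one outlier can move r, here is a minimal sketch. The data are made up purely for illustration (they are not from the slides), and `pearson_r` is a hypothetical helper that applies the SP and SS formula used later in this deck.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed as SP / sqrt(SS_X * SS_Y)."""
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    sp = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - m_x) ** 2 for xi in x)
    ss_y = sum((yi - m_y) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical scores with essentially no relationship between X and Y.
x = [1, 2, 3, 4, 5]
y = [2, 3, 1, 3, 2]
r_without = pearson_r(x, y)

# Add one extreme point that is far from the rest on both variables.
r_with = pearson_r(x + [15], y + [15])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

In this made-up sample the five original points are unrelated, yet the single added point is enough to produce what looks like a large positive correlation.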
Sheldon’s example
[Scatterplot: height in inches (Y) against year of debut (X) for the 10 superheroes]
Steps of hypothesis-testing
2. State the null and research hypotheses
• Describe the two mutually exclusive possibilities in words and symbolically
• Hypotheses about the relationship between two variables across the population
• Correlation coefficients = the amount of variance in one variable predicted by variance in the other variable
r = (Amount of common variance) / (Amount of total variance)
Sheldon’s example
Research hypothesis (H1): There is a correlation between superhero height and the year of debut. H1: ρ ≠ 0
Null hypothesis (H0): There is no correlation between superhero height and the year of debut. H0: ρ = 0
Note: Our book only allows non-directional tests of correlations. In reality, one could run a directional test.
Steps of hypothesis-testing
3. Describe the distribution of the null hypothesis.
• Well, not really a unique step here for correlations…
• …So let’s just use this step to make sure we know our df
Steps of hypothesis-testing
3. Describe the distribution of the null hypothesis.
• Behind the scenes, r’s are tested with t’s, so the r distribution is a family of distributions based on df
• Because we will be computing variance for each variable, we lose two degrees of freedom
  • df = n - 2
[Figure: distributions for df = 5, df = 25, and df = 50; larger df = narrower distribution]
Sheldon’s example
Distribution: r-distribution with 8 dfs
$df = n - 2 = 10 - 2 = 8$
Steps of hypothesis-testing
4. Determine the critical values.
• Consult Table H
  • As always, use the smaller df if what you need isn’t there
Sheldon’s example
Critical value(s): rcrit = ±0.6319
• Look up in Table H with df = 8 and α = .05
• Remember you must include ± because tests are non-directional
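Because r’s are tested with t’s, a critical r can be recovered from a critical t. The sketch below assumes the standard relationship t = r·sqrt(df / (1 − r²)), rearranged to r = t / sqrt(t² + df), and a two-tailed critical t of 2.306 for df = 8 at α = .05 taken from a standard t table; neither the rearrangement nor that t value appears on the slides, and `r_critical` is a hypothetical helper name.

```python
import math


def r_critical(t_crit, df):
    """Convert a critical t value into the matching critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 10            # sample size in Sheldon's example
df = n - 2        # degrees of freedom for a standard correlation
t_crit = 2.306    # assumption: two-tailed t critical value, df = 8, alpha = .05

r_crit = r_critical(t_crit, df)
print(f"df = {df}, critical r = +/-{r_crit:.4f}")
```

The result should land very close to the ±0.6319 read from Table H, which is a useful sanity check when a table is not at hand.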
Steps of hypothesis-testing
5. Calculate the test statistic.
• Common variance: What is “shared” between the two variables
  • What makes them “go together” (if they do)
• Total variance: All the variance in our variables
r = (Amount of common variance) / (Amount of total variance)
Calculating r
• Our old friend the deviation!
  • X - M
• But now we’ve got two deviations per raw score
  • X - MX
  • Y - MY
Person X Y
A 1 2
B 2 3
C 5 4
D 6 7
E 7 6
M = 4.2 M = 4.4
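The two deviations per raw score can be sketched for persons A through E from the table above. The variable names are illustrative only.

```python
# Scores for persons A-E from the table above.
x = [1, 2, 5, 6, 7]
y = [2, 3, 4, 7, 6]

# Means of each variable.
m_x = sum(x) / len(x)
m_y = sum(y) / len(y)

# Each raw score now has two deviations: one from M_X and one from M_Y.
dev_x = [xi - m_x for xi in x]
dev_y = [yi - m_y for yi in y]

for person, dx, dy in zip("ABCDE", dev_x, dev_y):
    print(f"{person}: X - MX = {dx:+.1f}, Y - MY = {dy:+.1f}")
```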
Calculating r
• Visualizing joint deviations in a positive correlation
[Scatterplot: Y against X, positive correlation]
• Both deviations are positive: (7-4.2) & (6-4.4)
• Both deviations are negative: (2-4.2) & (3-4.4)
For positive correlations:
For a given raw score both deviations will (usually) be negative
OR both deviations will (usually) be positive
Calculating r
• Visualizing joint deviations in a negative correlation
[Scatterplot: Y against X, negative correlation]
• One deviation pos, one deviation neg: (2-4.2) & (7-4.4)
• One deviation pos, one deviation neg: (7-4.2) & (2-4.4)
For negative correlations:
For a given raw score the two deviations will (usually)
have opposite signs
Calculating r
• Numerator of r is the Sum of Products (SP)
  • Created by multiplying the two deviations and then summing
• Positive correlations will have positive products
  • + times + is +
  • - times - is +
• Negative correlations will have negative products
  • + times - is -
$SP = \sum (X - M_X)(Y - M_Y)$
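A minimal sketch of the sum of products, using the scores for persons A through E from the earlier table. The function name `sum_of_products` is hypothetical.

```python
def sum_of_products(x, y):
    """SP: multiply the two deviations for each raw score, then sum."""
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    return sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))


# Scores for persons A-E (M_X = 4.2, M_Y = 4.4).
x = [1, 2, 5, 6, 7]
y = [2, 3, 4, 7, 6]

sp = sum_of_products(x, y)
print(f"SP = {sp:.1f}")  # positive, because most products are + times + or - times -
```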
Calculating r
• Products also keep track of strength
[Scatterplot: Y against X, strong correlation]
For strong correlations:
When one deviation is big, the other is big
Calculating r
• Products also keep track of strength
[Scatterplot: Y against X, weak correlation]
For weak correlations:
Deviations tend to cancel each other out
Calculating r
• Now we need to compare this to the overall (expected) variability
• We start with SS for each variable
  • Takes into account sample size
• Then we take the square root of the product
  • Deals with the squared deviations
• This measure of overall variability is our denominator:
$\sqrt{SS_X \, SS_Y}$
Calculating r
• Therefore, the correlation coefficient is just the standardized form of SP
  • Standardized so that it ranges from -1 to 1
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{\sum (X - M_X)(Y - M_Y)}{\sqrt{SS_X \, SS_Y}}$
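Putting the numerator and denominator together, the formula can be sketched as a single function. This is an illustrative implementation of the SP over sqrt(SS_X · SS_Y) definition, not a replacement for the hand calculation expected in homework; `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(x, y):
    """Pearson's r as the standardized sum of products."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of scores")
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    sp = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))   # numerator
    ss_x = sum((xi - m_x) ** 2 for xi in x)                      # SS for X
    ss_y = sum((yi - m_y) ** 2 for yi in y)                      # SS for Y
    return sp / math.sqrt(ss_x * ss_y)


# Scores for persons A-E from the earlier table.
r = pearson_r([1, 2, 5, 6, 7], [2, 3, 4, 7, 6])
print(f"r = {r:.2f}")
```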
Sheldon’s example
Calculating the numerator (SP)
Debut (X)   Height (Y)   (X-MX)    (Y-MY)    (X-MX)(Y-MY)
1990        74.000       27.000    0.600     16.200
1962        84.000       -1.000    10.600    -10.600
1941        73.000       -22.000   -0.400    8.800
1984        67.000       21.000    -6.400    -134.400
1940        72.000       -23.000   -1.400    32.200
1966        76.000       3.000     2.600     7.800
1976        78.000       13.000    4.600     59.800
1974        63.000       11.000    -10.400   -114.400
1938        75.000       -25.000   1.600     -40.000
1959        72.000       -4.000    -1.400    5.600
MX = 1963   MY = 73.400
SP = -169.000
Sheldon’s example
Calculating the denominator
Debut (X)   Height (Y)   (X-MX)    (Y-MY)    (X-MX)2    (Y-MY)2
1990        74.000       27.000    0.600     729.000    0.360
1962        84.000       -1.000    10.600    1.000      112.360
1941        73.000       -22.000   -0.400    484.000    0.160
1984        67.000       21.000    -6.400    441.000    40.960
1940        72.000       -23.000   -1.400    529.000    1.960
1966        76.000       3.000     2.600     9.000      6.760
1976        78.000       13.000    4.600     169.000    21.160
1974        63.000       11.000    -10.400   121.000    108.160
1938        75.000       -25.000   1.600     625.000    2.560
1959        72.000       -4.000    -1.400    16.000     1.960
MX = 1963   MY = 73.400
SSX = 3124.000   SSY = 296.400
Sheldon’s example
SP = -169.000   SSX = 3124.000   SSY = 296.400
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{-169.000}{\sqrt{(3124.000)(296.400)}} = \dfrac{-169.000}{\sqrt{925953.600}} = \dfrac{-169.000}{962.265} = -0.18$
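The hand calculation for Sheldon’s sample can be checked with a short script. The debut years and heights are taken from the table in the example; the variable names are illustrative.

```python
import math

# Debut year (X) and height in inches (Y) for the 10 superheroes.
debut = [1990, 1962, 1941, 1984, 1940, 1966, 1976, 1974, 1938, 1959]
height = [74, 84, 73, 67, 72, 76, 78, 63, 75, 72]

m_x = sum(debut) / len(debut)      # mean debut year
m_y = sum(height) / len(height)    # mean height

# Numerator: sum of products of the paired deviations.
sp = sum((x - m_x) * (y - m_y) for x, y in zip(debut, height))

# Denominator pieces: sum of squares for each variable.
ss_x = sum((x - m_x) ** 2 for x in debut)
ss_y = sum((y - m_y) ** 2 for y in height)

r = sp / math.sqrt(ss_x * ss_y)

print(f"MX = {m_x:.0f}, MY = {m_y:.3f}")
print(f"SP = {sp:.3f}, SSX = {ss_x:.3f}, SSY = {ss_y:.3f}")
print(f"r = {r:.2f}")
```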
Steps of hypothesis-testing
6. Make a decision.
• Compare your computed r to the critical r from step 4
  • Computed r = -.18 does not fall beyond rcrit = ±.6319 (that is, |-.18| < .6319)
• Reject or fail to reject the null hypothesis
  • In this example, we retain the null, p > .05
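The decision rule for a non-directional test can be sketched as a comparison of absolute values. The helper name `decide` and its return strings are hypothetical; the numbers come from Sheldon’s example and from the sample write-up later in the deck.

```python
def decide(r_computed, r_crit):
    """Non-directional decision: reject only if |r| exceeds the critical r."""
    if abs(r_computed) > abs(r_crit):
        return "reject the null"
    return "retain the null"


# Sheldon's example: r(8) = -.18 against a critical r of +/-.6319.
print(decide(-0.18, 0.6319))
```

Comparing absolute values avoids the sign confusion that comes from lining up a negative computed r against the negative critical value.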
Sheldon’s example
We must retain the null hypothesis and conclude there is no correlation between superheroes’ heights and their year of debut, r(8) = -.18, p > .05.
Partial correlation
• Do taller people have deeper voices?
• What if you found r = .6?
  • Is there a THIRD VARIABLE that is really driving this effect?
• Sex?
  • Men taller on average
  • Men deeper voices on average
  • Could be sex, not height itself, driving the effect
Partial correlation
• When we create a partial correlation we:
  • Control for, remove, partial out…
  • …variance from one variable that potentially obscures or changes the correlation between two other variables
• Once we partial out sex from the correlation between height and pitch, is there any correlation left?
• Partial correlations allow us to “decontaminate” correlations
  • Often useful when we do not have complete experimental control
Visualizing partial correlations
• Genuine: correlation holds within both groups
• Not genuine: correlation holds in one group, not the other
• Not genuine: no correlation in either group; mean difference in groups creates appearance of correlation
• Not genuine: within groups, correlation goes in the opposite direction; mean difference in groups creates appearance of correlation
Partial correlation
• All three variables can be continuous, too
• What if there was a correlation between age and salary?
  • Do people just give more money to older workers?
  • Or is it simply that older workers have more years of experience?
  • Partial out years of experience to see if there is still a correlation between age and salary
• Foreshadowing: What if there are independent contributions of age and years of experience?
Computing partial correlation
• rxy is the correlation of interest
• Z is the variable to partial out
• rxy.z is the correlation of interest once Z is partialed out
• rxz and ryz are the correlations of X and Y with Z
• So, you will need to compute three separate correlations before you can actually compute rxy.z
$r_{XY.Z} = \dfrac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{1 - r_{XZ}^2}\,\sqrt{1 - r_{YZ}^2}}$
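The partial correlation formula can be sketched as a function of the three pairwise correlations. The name `partial_r` is hypothetical, and the values for rxz and ryz below are invented for illustration; only r = .6 for height and voice comes from the slides.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y once Z is partialed out."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return numerator / denominator


r_xy = 0.6   # height and voice depth (from the slides)
r_xz = 0.7   # height and sex: hypothetical value
r_yz = 0.8   # voice depth and sex: hypothetical value

r_xy_z = partial_r(r_xy, r_xz, r_yz)
print(f"r_xy.z = {r_xy_z:.2f}")
```

With these made-up inputs most of the original correlation disappears once Z is controlled for. When Z is unrelated to both variables (rxz = ryz = 0), the formula returns rxy unchanged.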
Hypothesis testing for partial correlations
• Some changes in how to describe your hypotheses and how to describe your results
  • See cheat sheet at end
• Now have to compute 3 correlations before computing partial correlation
• df = N - 3 for partial correlations
  • We have to estimate another variance for the third variable
What to report in homework
• All the steps of hypothesis testing
  • Make sure to give both English and symbolic hypotheses
  • Make sure to explicitly compare computed r and critical r
• All your work and equations computing r, including SP and the two SS’s
• Description of what you found, see following page
  • Include your correlation and df in your description
Hypotheses - standard
• H1: There is a correlation between height and voice pitch
  • ρ ≠ 0
• H0: There is no correlation between height and voice pitch
  • ρ = 0
• Remember, our book doesn’t give the option for one-tailed correlations
Hypotheses - partial
• H1: There is a correlation between height and voice pitch when controlling for sex
  • ρ ≠ 0
• H0: There is no correlation between height and voice pitch when controlling for sex
  • ρ = 0
• Remember, our book doesn’t give the option for one-tailed correlations
Description - standard
• “We can reject the null hypothesis and conclude that there was a statistically significant positive correlation between height and deepness of voice such that taller people had deeper voices, r(12) = .65, p < .05.”
• “We must retain the null and conclude there is no correlation between income and charitable giving, r(7) = -.03, p > .05.”
• For all descriptions, describe the variables.
• For significant correlations also:
  • Describe the sign, pos or neg
  • Describe how that played out in the variables.
• In r(#), the # is your df; don’t forget to put the right one
Description - partial
• “We can reject the null hypothesis and conclude that there was a statistically significant positive correlation between height and deepness of voice when controlling for sex such that taller people had deeper voices, r(12) = .65, p < .05.”
• “We must retain the null and conclude there is no correlation between income and charitable giving when controlling for political ideology, r(7) = -.03, p > .05.”
• For all descriptions, describe the variables.
• For significant correlations also:
  • Describe the sign, pos or neg
  • Describe how that played out in the variables.
• In r(#), the # is your df; don’t forget to put the right one
Degrees of freedom
• df = N - 2 for standard correlations
• df = N - 3 for partial correlations