helen-wheeler
Chapter 4 - Scatterplots and Correlation
• Dealing with several variables within a group vs. the same variable for different groups.
• Response Variable: measures the outcome of a study.
• Explanatory Variable: attempts to explain the observed outcomes.
• ex: Body Temp vs. Alcohol (Mice)
• ex: Predicting SAT Math if you know SAT Verbal
WARNING! WARNING! EXPLANATORY VARIABLES DO NOT NECESSARILY CAUSE CHANGES IN RESPONSE VARIABLES!!!
Scatter Plot: The most effective way to display the relationship between two quantitative variables measured on the same individuals.
(2.1 cont’d)
Horizontal Axis (x) = explanatory variable (if there is one)
Vertical Axis (y) = response variable (if there is one)
If there is no explanatory/response distinction, either variable can go on either axis.
Interpreting Scatter Plots
• Look for overall pattern
• Direction / Form / Strength
• Direction = “Positive” or “Negative” Association:
• Positive Association: Above average values of one variable tend to accompany above average values of the other variable.
• Negative Association: Above average values of one variable tend to accompany below average values of the other variable.
• Form - can be linear / curved / clustered
• Strength: Stronger = less scatter, points closer to a straight line. Weaker = more scatter, not as linear.
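The direction rule can be checked numerically. A minimal sketch, assuming made-up study-hours and exam-score values (hypothetical data, not from the chapter) and a hypothetical helper named `direction`: it counts how many points fall on the same side of both averages versus on opposite sides.

```python
from statistics import mean

def direction(xs, ys):
    """Label the association by comparing each point with the two averages."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Same side of both averages: above/above or below/below.
    same = sum(1 for x, y in zip(xs, ys) if (x - x_bar) * (y - y_bar) > 0)
    # Opposite sides: above one average, below the other.
    opposite = sum(1 for x, y in zip(xs, ys) if (x - x_bar) * (y - y_bar) < 0)
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"

# Hypothetical illustration data (not from the chapter).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 58, 71, 75, 83]

print(direction(hours, scores))        # scores rise with hours
print(direction(hours, scores[::-1]))  # same scores in reverse order
```

This simple count only captures direction; it says nothing about form or strength.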
Direction = Positive / Form = Linear / Strength = Fairly Strong
Direction = Positive / Form = Scattered / Strength = Weak
Direction = Negative / Form = Scattered / Strength = Weak
Direction = Negative / Form = Curved / Clustered / Strength = Weak
Calculator Steps for Scatter plot
1) Enter data into list 1 & 2
ex 2.5 pg 99:
2) 2nd Y=
3) Enter
4) Select Type; set Xlist / Frequency
5) Turn On
6) Set Window to match the range of the data
7) Graph
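Step 6 amounts to choosing axis limits that just enclose the data. A rough sketch of that idea, assuming a hypothetical helper `window_for`, a 10% margin (an assumption, not a calculator setting taken from the slides), and made-up list values standing in for lists 1 and 2:

```python
def window_for(values, margin=0.10):
    """Return (low, high) axis limits that enclose the data with some padding."""
    lo, hi = min(values), max(values)
    # Fall back to a padding of 1.0 if all values are identical.
    pad = (hi - lo) * margin or 1.0
    return lo - pad, hi + pad

# Hypothetical list contents standing in for list 1 (x) and list 2 (y).
list1 = [10, 20, 30, 40, 50]
list2 = [3.0, 4.5, 4.0, 6.5, 7.0]

xmin, xmax = window_for(list1)
ymin, ymax = window_for(list2)
print(f"Xmin={xmin}, Xmax={xmax}, Ymin={ymin}, Ymax={ymax}")
```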
Adding categorical variables to scatter plots
• Use different colors or symbols to indicate a categorical variable or duplicate values…
[Figure: Cell Minutes per Week (x-axis, 25 to 250) vs. GPA (y-axis, 1.0 to 4.0). Different symbols mark Seniors, Juniors, and Sophomores; "++" marks duplicate values.]
Correlation
• Correlation = a numerical measure of how strong a linear relationship is.
• Visually, correlation is hard to judge. Our eyes can be fooled by white space around a scatterplot and the plotting scales.
**Same Data – Different Scales**
ex:
Formula:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
ex: Correlation between height and weight: height is x / weight is y.
The correlation r is an average (dividing by n - 1) of the sum of the products of the standardized heights and the standardized weights for each measurement.
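The verbal description of the formula can be followed step by step in code. A minimal sketch, assuming made-up height and weight values (hypothetical, for illustration only) and a hypothetical function name `correlation`: standardize each variable, multiply the pairs, add the products, and divide by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]  # standardized x values
    z_y = [(y - y_bar) / s_y for y in ys]  # standardized y values
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical heights (inches) and weights (pounds).
heights = [62, 65, 68, 70, 73]
weights = [120, 140, 155, 165, 190]

r = correlation(heights, weights)
print(round(r, 3))
```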
Calculator Procedure
ex: Fossil Data
Femur   Humerus
38      41
56      63
59      70
64      72
74      84
Step 1) Insert Data into lists
** Set DiagnosticOn ** (one-time step)
Step 2) Run Stat Calc LinReg
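As a cross-check on the calculator output, the same quantities can be computed directly. A sketch using the femur and humerus lengths from the fossil example; the variable names are illustrative, and the slope and intercept formulas are the standard least-squares ones rather than anything quoted from the slides.

```python
from math import sqrt

femur = [38, 56, 59, 64, 74]     # list 1 (x)
humerus = [41, 63, 70, 72, 84]   # list 2 (y)

n = len(femur)
x_bar = sum(femur) / n
y_bar = sum(humerus) / n

# Sums of squared deviations and of cross products.
sxx = sum((x - x_bar) ** 2 for x in femur)
syy = sum((y - y_bar) ** 2 for y in humerus)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(femur, humerus))

slope = sxy / sxx                  # least-squares slope
intercept = y_bar - slope * x_bar  # least-squares intercept
r = sxy / sqrt(sxx * syy)          # correlation

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

For these five specimens r comes out close to 0.994, a very strong positive linear relationship.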
Correlation Facts
• Correlation (r) always falls between -1 and 1.
• The closer r is to 0, the weaker the linear relationship; the closer r is to -1 or 1, the stronger.
• Positive r = positive association / negative r = negative association.
• Because r uses standardized values, r has no units.
• Correlation measures the strength of only LINEAR relationships. It cannot be used to describe curved relationships no matter how strong they are.
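Two of these facts are easy to verify numerically. A sketch under stated assumptions: the height and weight values are made up, `pearson_r` is a hypothetical helper name, and 0.4536 kg per pound is a rounded conversion factor. It checks that changing units leaves r unchanged, and that a perfect curved relationship (y = x squared on values symmetric about 0) gives r of essentially 0.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Average product of standardized values (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds).
inches = [62, 65, 68, 70, 73]
pounds = [120, 140, 155, 165, 190]
centimeters = [h * 2.54 for h in inches]
kilograms = [w * 0.4536 for w in pounds]

r_us = pearson_r(inches, pounds)
r_metric = pearson_r(centimeters, kilograms)

# A perfect curved relationship that a straight line cannot describe.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)

print(f"r (in, lb) = {r_us:.4f}")
print(f"r (cm, kg) = {r_metric:.4f}")
print(f"|r| for y = x^2: {abs(r_curve):.4f}")
```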
WARNING! WARNING! CORRELATION IS STRONGLY AFFECTED BY OUTLIERS!!
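The outlier warning can be made concrete with the fossil data. A sketch, assuming one invented outlying specimen (femur 70, humerus 40; purely hypothetical, not part of the data set) and a hypothetical helper `pearson_r`:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Average product of standardized values (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
r_before = pearson_r(femur, humerus)

# Add one hypothetical outlier (invented for illustration only).
r_after = pearson_r(femur + [70], humerus + [40])

print(f"without outlier: r = {r_before:.3f}")
print(f"with outlier:    r = {r_after:.3f}")
```

A single point far from the pattern pulls r from about 0.99 down to roughly 0.5.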
WARNING! WARNING! CORRELATION IS NOT A COMPLETE DESCRIPTION OF 2-VARIABLE DATA!!
The image above shows scatterplots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe. The four y variables have the same mean (7.5) and variance (about 4.12), and each pair has the same correlation (about 0.82), yet the four scatterplots look very different.
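The summary statistics of the quartet can be recomputed directly. A sketch, assuming the standard published Anscombe values (they do not appear in the slides and are typed in here) and a hypothetical helper `pearson_r`:

```python
from statistics import mean, stdev, variance

def pearson_r(xs, ys):
    """Average product of standardized values (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Data sets I-III share the same x values; data set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (mean(ys), variance(ys), pearson_r(xs, ys))
    y_mean, y_var, r = summary[name]
    print(f"{name:>3}: mean y = {y_mean:.2f}, var y = {y_var:.2f}, r = {r:.3f}")
```

The numbers agree to about two decimal places across all four sets, which is why a plot is needed alongside r.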