carol-king
Is there a relationship between the lengths of body parts?
The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.
--Stephen Jay Gould, The Mismeasure of Man
Linear Correlation & Regression
Essentials: Correlation
Correlation – potential relationships, not causality.
Know the steps one might employ before obtaining a correlation.
Know the characteristics of the Pearson Product Moment Correlation Coefficient (referred to here simply as the correlation).
Be able to calculate a correlation and determine if it is statistically significant.
Be able to create a scatter plot of the paired data being studied.
Be able to determine the directionality of a correlation and its strength via formula and observation of plotted data.
Correlation
Correlation – A correlation exists between two variables when one of them is related to the other in some way.
Paired Data – A measurement on two variables for each unit in a population or sample.
Scatterplot – a graph in which the paired (x,y) data are plotted with a horizontal x-axis (independent variable) and a vertical y-axis (dependent variable). Each individual pair is plotted as a single point.
ANATOMY OF A SCATTER PLOT
A scatterplot graphs the relationship between paired (x, y) quantitative data values. If it is believed that there is a causal relationship, the independent variable (x) is placed on the x-axis, while the dependent variable (y) is placed on the y-axis.
The data presented in this scatterplot represent the time and distance of eight balsa wood airplane flights. Making the assumption that time in air might affect overall distance, the time variable was placed on the x-axis. The distance variable is presented on the y-axis. Each dot on the graph corresponds to one (x,y) pair from the data set.
[Figure: scatterplot "Time and Distance Relationship for Straight Flights of Starfire Balsa Wood Airplanes". Data collected: Fall 2004. X-axis: Time (sec.), 1 to 9; y-axis: Distance (cm), 0 to 5000.]

Time (sec.)   Distance (cm)
2.75          626
8.59          2770
6.42          4580
5.22          2933
2.9           1503
3.02          1973
4.31          2235
1.68          1250
Building a Scatterplot:
1) Identify two quantitative variables that appear to have a relationship. If there appears to be a causal relationship, the values of the independent variable (x) are recorded on the x-axis and the values of the dependent variable (y) are recorded on the y-axis.
2) Create a graph with the x-axis containing a scale appropriate to the x variable and a label, which identifies the measurement scale, e.g. seconds. On the y-axis place the scale for the y variable and include a label.
3) Obtain a listing of the paired data values. (The data for this scatterplot are listed with the figure above.)
4) Using the (x,y) coordinates, place a mark on the graph for each set of paired values.
5) Add a title and other useful information.
Parts of the scatterplot labeled in the figure:
Title.
Y-axis variable and measurement scale.
X-axis variable and measurement scale.
Data points for the paired variables, e.g. (8.59, 2770).
Data used for this scatterplot.
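The five steps can be sketched in code. The example below is a minimal, standard-library-only illustration that draws a rough text scatterplot of the eight balsa wood airplane flights. The function name text_scatter and the grid size are assumptions made for this sketch, and the data values are as read from the flattened data listing; in practice a plotting library would normally be used instead.

```python
def text_scatter(points, x_max, y_max, width=37, height=11):
    """Return a rough text scatterplot with one '*' per (x, y) pair.

    Assumes 0 <= x <= x_max and 0 <= y <= y_max for every pair.
    """
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round(x / x_max * (width - 1))   # position along the x-axis
        row = round(y / y_max * (height - 1))  # position along the y-axis
        grid[height - 1 - row][col] = "*"      # grid row 0 is the top of the plot
    body = ["|" + "".join(cells) for cells in grid]
    return "\n".join(body + ["+" + "-" * width])


# Step 3: paired data, (time in seconds, distance in cm), one pair per flight.
flights = [
    (2.75, 626), (8.59, 2770), (6.42, 4580), (5.22, 2933),
    (2.9, 1503), (3.02, 1973), (4.31, 2235), (1.68, 1250),
]

# Steps 2 and 4: scales chosen to match the axes of the figure.
plot = text_scatter(flights, x_max=9, y_max=5000)

# Step 5: title and axis labels.
print("Time and Distance Relationship for Straight Flights")
print("y-axis: Distance (cm), 0 to 5000")
print(plot)
print("x-axis: Time (sec.), 0 to 9")
```

Each flight lands in its own grid cell at this resolution; a coarser grid could merge nearby points into a single mark.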
[Figure: scatter plot "Tar and Nicotine Amounts in 29 Brands of Cigarettes". X-axis: NICOTINE, 0.0 to 1.6; y-axis: TAR, 0 to 20.]
[Figure: scatter plot "Paired Data For Six Dining Parties".]
[Figure: example scatter plots of linear correlation]
Positive Linear Correlation: (a) Positive, (b) Strong positive, (c) Perfect positive
Negative Linear Correlation: (d) Negative, (e) Strong negative, (f) Perfect negative
No Linear Correlation: (g) No correlation, (h) Nonlinear correlation
The Linear Correlation Coefficient
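Denoted r when considering a sample, and ρ (rho) when considering a population.
The Linear Correlation Coefficient is a measure of the direction and strength of the linear association between the paired x and y values in a sample. Its value is obtained using the following formula, where n is the number of (x, y) pairs:

\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]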
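As a worked illustration, the formula can be evaluated directly from the five sums it needs. The sketch below uses only the Python standard library; the function name pearson_r is an assumption made for this example, and the data are the eight balsa wood airplane flights as read from the data listing.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r from the computational formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Paired data: time in the air (sec.) and distance flown (cm) for each flight.
time_sec = [2.75, 8.59, 6.42, 5.22, 2.9, 3.02, 4.31, 1.68]
distance_cm = [626, 2770, 4580, 2933, 1503, 1973, 2235, 1250]

r = pearson_r(time_sec, distance_cm)
print(f"r = {r:.3f}")
```

The result is positive, which agrees with the upward trend in the scatterplot of the same data.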
Facts About r
The value of r is always between –1 and 1.
The sign (-/+) of r reflects the direction of the correlation.
If r is negative, then there exists a negative association between the two variables. That is, as one increases, the other decreases.
If r is positive, then there exists a positive relationship between the two variables. That is, as one increases, the other increases.
The magnitude of the correlation indicates the strength of the association. Values closer to –1 and 1 signify a stronger association.
A value of –1 is a perfect negative correlation.
A value of 1 is a perfect positive correlation.
Facts About r (cont.)
The value of r does not change if all values of either variable are converted to a different scale.
The value of r is not affected by the choice of x and y. That is, if x and y are interchanged, the value of r will not change.
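These facts can be checked numerically. The sketch below is an illustration only: the paired values are hypothetical, invented for this example, and pearson_r is an assumed helper name that applies the formula for r. The rescaling uses a positive conversion factor, such as inches to centimeters.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Hypothetical paired data, invented for this illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                                # x and y interchanged
r_rescaled = pearson_r(x, [2.54 * v for v in y])      # y converted to another scale
r_perfect_pos = pearson_r(x, [2 * v + 1 for v in x])  # points on a rising line
r_perfect_neg = pearson_r(x, [10 - 3 * v for v in x]) # points on a falling line

print(r_xy, r_yx, r_rescaled)
print(r_perfect_pos, r_perfect_neg)
```

The first three values agree up to floating-point rounding, and the last two sit at the ends of the allowed range for r.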
Does a Correlation Actually Exist?
The answer to this can be somewhat subjective. How strong does a correlation need to be? Start by asking the following:
Does it make sense to look at this relationship?
Does a scatter plot present a relationship (either positive or negative)?
If yes to both, calculate r.
We Begin With a Hypothesis
In linear correlation, the null hypothesis states that no linear correlation exists. In other words, the population correlation coefficient is zero.
In notation: H0: ρ = 0
The alternative hypothesis states that a linear correlation does exist. In other words, the population correlation coefficient is not zero.
In notation: H1: ρ ≠ 0
We Test The Hypothesis
Based on the sample data, a value for r is obtained. This is called the test statistic.
The absolute value of the test statistic is then compared to the appropriate value in a table of critical values of r.
Table of Critical Values for r

n      α = .05   α = .01
4      .950      .999
5      .878      .959
6      .811      .917
7      .754      .875
8      .707      .834
9      .666      .798
10     .632      .765
11     .602      .735
12     .576      .708
13     .553      .684
14     .532      .661
15     .514      .641
16     .497      .623
17     .482      .606
18     .468      .590
19     .456      .575
20     .444      .561
25     .396      .505
30     .361      .463
35     .335      .430
40     .312      .402
45     .294      .378
50     .279      .361
60     .254      .330
70     .236      .305
80     .220      .286
90     .207      .269
100    .196      .256
Conclusion
If the absolute value of r exceeds the table value, we reject the null hypothesis and conclude that a significant linear correlation exists.
If the absolute value of r does not exceed the table value, we fail to reject the null hypothesis; there is not sufficient evidence of a linear correlation.
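The decision rule can be written as a small lookup. This is a sketch under stated assumptions: the critical values are copied from the table of critical values of r, the names CRITICAL_R and is_significant are made up for this example, and the r value tested is hypothetical. Sample sizes that are not listed in the table are rejected rather than interpolated.

```python
# Critical values of r: n -> (value for alpha = .05, value for alpha = .01)
CRITICAL_R = {
    4: (.950, .999), 5: (.878, .959), 6: (.811, .917), 7: (.754, .875),
    8: (.707, .834), 9: (.666, .798), 10: (.632, .765), 11: (.602, .735),
    12: (.576, .708), 13: (.553, .684), 14: (.532, .661), 15: (.514, .641),
    16: (.497, .623), 17: (.482, .606), 18: (.468, .590), 19: (.456, .575),
    20: (.444, .561), 25: (.396, .505), 30: (.361, .463), 35: (.335, .430),
    40: (.312, .402), 45: (.294, .378), 50: (.279, .361), 60: (.254, .330),
    70: (.236, .305), 80: (.220, .286), 90: (.207, .269), 100: (.196, .256),
}
ALPHA_COLUMN = {0.05: 0, 0.01: 1}


def is_significant(r, n, alpha=0.05):
    """Return True if |r| exceeds the table value, i.e. H0 is rejected."""
    if n not in CRITICAL_R:
        raise ValueError(f"n = {n} is not listed in the table")
    if alpha not in ALPHA_COLUMN:
        raise ValueError("alpha must be 0.05 or 0.01")
    critical_value = CRITICAL_R[n][ALPHA_COLUMN[alpha]]
    return abs(r) > critical_value


# Hypothetical test statistic: r = 0.75 obtained from n = 8 pairs.
reject_at_05 = is_significant(0.75, 8, alpha=0.05)
reject_at_01 = is_significant(0.75, 8, alpha=0.01)
print(reject_at_05, reject_at_01)
```

With these hypothetical numbers the null hypothesis is rejected at the .05 level but not at the .01 level, because .75 lies between the two table values for n = 8.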
Recall
Linear Correlation
Association between 2 quantitative variables.
Paired data (bivariate data).
Scatter plot.
Positive/Negative.
Correlation coefficient, r.