10/03/2003 [email protected]
Introduction
We are often asked to describe the relationship between two or more variables:
Is there a relationship between points in the Leaving Cert and QCA?
Is there a relationship between parents' IQ and children's IQ?
What are Scatter Plots?
A two-dimensional plot showing the (X, Y) value for each observation
Used to determine whether there is any pronounced relationship and, if so, whether the relationship may be treated as approximately linear
Y is usually the response (dependent) variable
X is usually the explanatory (independent) variable
The response variable is the variable whose variation we wish to explain
An explanatory variable is a variable used to explain variation in the response variable
Positive Linear Relationship [scatter plot]
Negative Linear Relationship [scatter plot]
No Linear Relationship [scatter plot]
No Relationship [scatter plot]
Example 1
Two sets of exam results for 11 students: Maths and Physics
Are they related?
Does a good performance in Maths go with a good performance in Physics?
Let the Maths mark be X
Let the Physics mark be Y
Table of Results
X 41 37 38 39 49 47 42 34 36 48 29
Y 36 20 31 24 37 35 42 26 27 29 23
• X total is 440
• X mean is 40
• Y total is 330
• Y mean is 30
Maths vs Physics [scatter plot of the Physics mark Y against the Maths mark X]
What does the Graph tell us?
The means divide the graph into four quadrants
Most of the data lies in the bottom left or top right quadrants
Only two points fall outside these quadrants
This indicates a probable relationship between X and Y for a particular student
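The quadrant count can be checked directly from the table of results. The sketch below is only an illustration in Python (the slides themselves contain no code), and the helper name `quadrant_counts` is an assumption introduced here.

```python
# Example 1 marks, copied from the table of results
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]    # X
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]  # Y


def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by x = mean(x) and y = mean(y).

    Hypothetical helper for illustration; points lying exactly on a mean
    line are counted separately under "on a mean line".
    """
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    counts = {"top right": 0, "bottom left": 0,
              "top left": 0, "bottom right": 0, "on a mean line": 0}
    for x, y in zip(xs, ys):
        if x == x_mean or y == y_mean:
            counts["on a mean line"] += 1
        elif x > x_mean and y > y_mean:
            counts["top right"] += 1
        elif x < x_mean and y < y_mean:
            counts["bottom left"] += 1
        elif x < x_mean:
            counts["top left"] += 1      # below-average X, above-average Y
        else:
            counts["bottom right"] += 1  # above-average X, below-average Y
    return counts


counts = quadrant_counts(maths, physics)
print(counts)
```

With these marks, 9 of the 11 points land in the top right or bottom left quadrants, which agrees with the statement that only two fall outside them.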
Correlation Coefficient
From a diagram we get a general idea of the relationship
For precision we need a numerical measure
We need to measure the strength of the relationship
The most common measure is the Pearson Product Moment Correlation Coefficient, usually known as the correlation coefficient
We will usually be dealing with samples drawn from a population
The sample correlation coefficient is called r
Properties of r
r can take values from -1 to +1
r = +1 or r = -1 represents a perfect linear correlation, i.e. a perfect linear relationship between the variables
r = 0 indicates little or no linear relationship, i.e. as X increases there is no definite tendency for the values of Y to increase or decrease in a straight line
r close to +1 indicates a large positive correlation, i.e. Y tends to increase as X increases
r close to -1 indicates a large negative correlation, i.e. Y tends to decrease as X increases
The further r is from 0, the stronger the linear relationship
The sign of r indicates the direction of the relationship
Examples of various r values
[Six scatter plot panels, labelled r = +1, r = -1, r = -0.54, r = 0.70, r = 0 and r = 0]
The formula for Calculating r
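$$ r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} $$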
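As a minimal sketch of how this formula translates into a calculation, the Python function below computes the three sums of squares and products and then r. The function name `pearson_r` and the demonstration data are assumptions made for illustration; they do not come from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy).

    Raises ZeroDivisionError if either variable is constant
    (Sxx or Syy equal to 0), since r is undefined in that case.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Illustrative data (not from the slides): an exact line with negative slope
r_demo = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_demo)
```

Because the demonstration points lie exactly on a downward-sloping line, the result is -1, the perfect negative correlation described under the properties of r.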
Example 2
Find the correlation coefficient r between Y and X
Subject A B C D E F G
X 1 3 5 7 9 11 13
Y 7 4 13 16 10 22 19
Create a table
Subject  Xi  Yi  XiYi  Xi^2  Yi^2
A 1 7 7 1 49
B 3 4 12 9 16
C 5 13 65 25 169
D 7 16 112 49 256
E 9 10 90 81 100
F 11 22 242 121 484
G 13 19 247 169 361
Total 49 91 775 455 1435
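The column totals, and the value of r they lead to, can be reproduced with a short calculation. This is a sketch in Python using only the Example 2 data; the variable names are chosen here for illustration.

```python
from math import sqrt

# Example 2 data, copied from the slide
x = [1, 3, 5, 7, 9, 11, 13]
y = [7, 4, 13, 16, 10, 22, 19]
n = len(x)

# Column totals of the table
sum_x = sum(x)                                  # 49
sum_y = sum(y)                                  # 91
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 775
sum_x2 = sum(xi ** 2 for xi in x)               # 455
sum_y2 = sum(yi ** 2 for yi in y)               # 1435

# Corrected sums of squares and products
s_xy = sum_xy - sum_x * sum_y / n   # 138
s_xx = sum_x2 - sum_x ** 2 / n      # 112
s_yy = sum_y2 - sum_y ** 2 / n      # 252

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 2))  # 0.82
```

Here the product of Sxx and Syy is 28224, whose square root is exactly 168, so r = 138/168, which rounds to 0.82.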
Significance Test
H0: no linear relationship exists (population correlation equal to 0)
HA: there is a linear relationship (population correlation not equal to 0)
Choose a confidence level, say 90%, 95%, 99% etc.
This means alpha = 0.1, 0.05, 0.01 etc.
Use Table 10: Percentage Points of the Correlation Coefficient
In the left-hand column choose v = n - 2 (n = sample size)
Find the critical value
If |r| > critical value then reject H0
Conclusion
r = 0.82
Let alpha = 0.05
v = n - 2, giving v = 5
From tables the critical point is 0.7545
0.82 > 0.7545
We reject H0 and conclude, at the 5% significance level, that there is a linear relationship between X and Y
Example 3
Is there an obvious relationship between X and Y?
Y = X + 2
This is a perfect relationship
What will r be?
r will be equal to 1
X 3 4 5 6 7 8
Y 5 6 7 8 9 10
Set up the data table
Subject  Y  X  XY  X^2  Y^2
A 5 3 15 9 25
B 6 4 24 16 36
C 7 5 35 25 49
D 8 6 48 36 64
E 9 7 63 49 81
F 10 8 80 64 100
Total 45 33 265 199 355
Calculate r
$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{17.5}{\sqrt{17.5 \times 17.5}} = \frac{17.5}{17.5} = 1 $$
Perfect Positive Linear Relationship
Back to Example 1
In our original example with the student results we drew a scatter plot.
From the diagram it looked as if there was a probable positive linear relationship
To be sure we need to calculate r
Using a significance level of alpha = 0.05 we will test the claim that there is no linear correlation between Maths results and Physics results
Create a data table
Student  X  Y  XY  X^2  Y^2
A 41 36 1476 1681 1296
B 37 20 740 1369 400
C 38 31 1178 1444 961
D 39 24 936 1521 576
E 49 37 1813 2401 1369
F 47 35 1645 2209 1225
G 42 42 1764 1764 1764
H 34 26 884 1156 676
I 36 27 972 1296 729
J 48 29 1392 2304 841
K 29 23 667 841 529
Total 440 330 13467 17986 10366
Apply the formulae
$$ S_{xy} = 13467 - \frac{(440)(330)}{11} = 267 $$
$$ S_{yy} = 10366 - \frac{330^2}{11} = 466 $$
$$ S_{xx} = 17986 - \frac{440^2}{11} = 386 $$
Correlation Coefficient is
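$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{267}{\sqrt{386 \times 466}} = \frac{267}{424.12} = 0.63 $$
$$ v = n - 2 = 9, \qquad \alpha = 0.05 $$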
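The whole Example 1 calculation, from the raw marks to r and the degrees of freedom, can be retraced in a few lines. The following Python sketch is an illustration only; the variable names are assumptions introduced here.

```python
from math import sqrt

# Example 1 marks, copied from the data table
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]    # X
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]  # Y
n = len(maths)

# Corrected sums of squares and products
s_xy = sum(x * y for x, y in zip(maths, physics)) - sum(maths) * sum(physics) / n
s_xx = sum(x * x for x in maths) - sum(maths) ** 2 / n
s_yy = sum(y * y for y in physics) - sum(physics) ** 2 / n

r = s_xy / sqrt(s_xx * s_yy)
degrees_of_freedom = n - 2

print(s_xy, s_xx, s_yy)    # 267.0 386.0 466.0
print(round(r, 2))         # 0.63
print(degrees_of_freedom)  # 9
```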
Conclusion
From the tables the critical point is 0.6021
r = 0.63
0.63 > 0.6021
We reject the claim and conclude that there is a positive linear relationship between results in Maths and results in Physics
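The decision rule used in both conclusions can be written as a small lookup-and-compare step. The sketch below assumes a two-sided test at alpha = 0.05 and stores only the two critical values quoted in the slides (v = 5 and v = 9); the dictionary and the function name `reject_h0` are hypothetical names introduced for illustration, not part of any statistical table or library.

```python
# Critical values of r quoted on the slides for alpha = 0.05,
# keyed by degrees of freedom v = n - 2. Only the two values that appear
# in the slides are included; other rows of Table 10 are deliberately omitted.
CRITICAL_R_ALPHA_05 = {5: 0.7545, 9: 0.6021}


def reject_h0(r, n, critical_values=CRITICAL_R_ALPHA_05):
    """Return True if H0 (no linear relationship) is rejected.

    Hypothetical helper: compares |r| with the stored critical value
    for v = n - 2 degrees of freedom.
    """
    v = n - 2
    if v not in critical_values:
        raise KeyError(f"no critical value stored for v = {v}")
    return abs(r) > critical_values[v]


print(reject_h0(0.82, n=7))    # Example 2: True
print(reject_h0(0.63, n=11))   # Example 1: True
```

Comparing the absolute value of r with the critical value means a strong negative correlation is treated the same way as a strong positive one, which is what the two-sided alternative hypothesis requires.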
What is Regression?
Regression analysis is used for prediction
It allows us to predict the value of one variable given the value of another variable
It gives us an equation that uses one variable to help explain variation in another
In this course we deal with Simple Linear Regression
Simple Linear Regression
First step in determining a relationship was drawing a scatter plot
If a possible relationship was shown we found the strength of the relationship by calculating the correlation coefficient r
The next stage is to calculate the equation of the line which best describes the relationship between the two variables
This line is called the Regression Line
What is the ‘best fit’ line?
Example 1
‘Least Squares’ best fit line
We can have several lines of the form
$$ y_i = a + b x_i $$
We want the ‘best’ line: the one with the least sum of squared residuals
$$ \hat{y}_i = \hat{a} + \hat{b} x_i $$
Least Squares estimates
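$\hat{a}, \hat{b}$ are the least squares estimates of $a, b$
$$ \hat{b} = \frac{S_{xy}}{S_{xx}} $$
Closely related to r, since $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ has the same numerator
$$ \hat{a} = \bar{y} - \hat{b}\,\bar{x} $$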
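A minimal Python sketch of these two estimates is given below. The function name `least_squares_line` is an assumption made for illustration; the check uses the Example 3 data, where Y = X + 2 exactly, so the fitted intercept and slope should come out as 2 and 1.

```python
def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the fitted line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b_hat = s_xy / s_xx            # slope
    a_hat = y_bar - b_hat * x_bar  # intercept
    return a_hat, b_hat


# Example 3 data from the slides, where Y = X + 2 exactly
a_hat, b_hat = least_squares_line([3, 4, 5, 6, 7, 8], [5, 6, 7, 8, 9, 10])
print(a_hat, b_hat)
```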
Example 2
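$$ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 775 - \frac{(49)(91)}{7} = 138 $$
$$ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 455 - \frac{49^2}{7} = 112 $$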
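The extracted slide stops at Sxy and Sxx for Example 2. Carrying the least squares formulas one step further gives the slope and intercept; the values printed by this Python sketch are computed here from the Example 2 totals and are not quoted from the slides.

```python
# Totals from the Example 2 table
n = 7
sum_x, sum_y = 49, 91
sum_xy, sum_x2 = 775, 455

s_xy = sum_xy - sum_x * sum_y / n   # 138
s_xx = sum_x2 - sum_x ** 2 / n      # 112

b_hat = s_xy / s_xx                    # slope
a_hat = sum_y / n - b_hat * sum_x / n  # intercept = y_bar - b_hat * x_bar

print(round(b_hat, 3), round(a_hat, 3))
```

Under these formulas the slope is 138/112, about 1.232, and the intercept is 13 - 1.232 x 7, about 4.375.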
Verifying the equation is correct
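$$ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 265 - \frac{(33)(45)}{6} = 17.5 $$
$$ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 199 - \frac{33^2}{6} = 17.5 $$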
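The totals used in this check (265, 33, 45 and 199, with n = 6) are those of Example 3, where Y = X + 2 by construction. As a sketch of how the verification completes, using only the least squares formulas given earlier:

```latex
\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{17.5}{17.5} = 1, \qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{45}{6} - 1 \times \frac{33}{6} = 7.5 - 5.5 = 2
```

The fitted line is therefore $\hat{y}_i = 2 + x_i$, which recovers the known relationship Y = X + 2.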
Regression line
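$$ \hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{267}{386} = 0.692 $$
$$ \hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{330}{11} - 0.692 \times \frac{440}{11} = 2.32 $$
$$ \hat{y}_i = 2.32 + 0.692\, x_i $$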
Example 1 continued
If a student received a grade of 53 in Maths, what would the expected grade be in Physics?
We use the Regression line in order to predict the Physics result
$$ \hat{y}_i = 2.32 + 0.692\, x_i $$
$$ \hat{y} = 2.32 + 0.692 \times 53 = 38.996 $$
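The prediction step is a single substitution into the fitted line. The Python sketch below uses the rounded Example 1 estimates from the slides as defaults and, for comparison, repeats the prediction with the unrounded values 267/386 and 30 - (267/386) x 40. The function name `predict_physics` is an assumption introduced for illustration.

```python
def predict_physics(maths_mark, a_hat=2.32, b_hat=0.692):
    """Predicted Physics mark from the fitted line y_hat = a_hat + b_hat * x.

    The defaults are the rounded Example 1 estimates quoted on the slides.
    """
    return a_hat + b_hat * maths_mark


rounded_prediction = predict_physics(53)

# Same prediction with unrounded estimates, b = 267/386 and a = 30 - b * 40
b_exact = 267 / 386
a_exact = 30 - b_exact * 40
exact_prediction = predict_physics(53, a_exact, b_exact)

print(round(rounded_prediction, 3))  # 38.996
print(round(exact_prediction, 3))    # 38.992
```

Either way the predicted Physics result is about 39; the small difference comes only from rounding the slope and intercept before substituting.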
Graphing The Regression Line