Exploring Relationships:
Correlations & Multiple Linear Regression
Dr James Betts
Developing Study Skills and Research Methods (HL20107)
Lecture Outline:
•Correlation Coefficients
•Coefficient of Determination
•Prediction & Regression
•Multiple Linear Regression
•Assessment Details.
Statistics
• Descriptive: organising, summarising & describing data
• Correlational: relationships
• Inferential: generalising; significance
Correlation
• A measure of the relationship (correlation) between interval/ratio LOM variables taken from the same set of subjects
• A ratio which indicates the amount of concomitant variation between two sets of scores
• This ratio is expressed as a correlation coefficient (r):
r = -1: perfect negative relationship
r = 0: no relationship
r = +1: perfect positive relationship
Strength markers (in either direction): ±0.1 weak, ±0.3 moderate, ±0.7 strong
Correlation Coefficient & Scatterplots: Direction
[Scatterplots: Variable X (e.g. VO2max) against Variable Y (e.g. 10 km run time); Variable X (e.g. VO2max) against Variable Y (e.g. Exercise Capacity)]
Correlation Coefficient & Scatterplots: Form
[Scatterplots: Variable X (e.g. VO2max) against Variable Y (e.g. Exercise Capacity); Variable X (e.g. Age) against Variable Y (e.g. Strength)]
Correlation Coefficient & Scatterplots: Significance
[Scatterplots: Variable X (e.g. VO2max) against Variable Y (e.g. Exercise Capacity); Variable X (e.g. VO2max) against Variable Y (e.g. 100 m sprint time)]
Methods of Calculating r
• Any method of calculating r requires:
– Homoscedasticity (i.e. equal scattering)
– Linear data (curvilinear data requires eta, η)
• Parametric data (i.e. raw data above ordinal LOM and either a normal distribution or a large sample) permits the use of ‘Pearson’s Product-Moment Correlation’
• If the raw data violate these assumptions then use ‘Spearman’s Rank Order Correlation’ instead.
Pearson’s Product-Moment Correlation

X = Alcohol Units   Y = Skill Score   X²    Y²    XY
15                  4                 225   16    60
14                  6                 196   36    84
10                  4                 100   16    40
9                   8                 81    64    72
8                   7                 64    49    56
8                   8                 64    64    64
7                   10                49    100   70
6                   9                 36    81    54
4                   14                16    196   56
2                   12                4     144   24
Totals =

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
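As a check on the hand calculation, the formula can be applied directly to the ten pairs of scores in the table. The following is a minimal Python sketch; the function name `pearson_r` is purely illustrative and is not part of the lecture material.

```python
import math

# Data from the table: X = alcohol units, Y = skill score
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]

def pearson_r(x, y):
    """Pearson's r from the raw-score (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)             # total of the X² column
    sum_y2 = sum(v * v for v in y)             # total of the Y² column
    sum_xy = sum(a * b for a, b in zip(x, y))  # total of the XY column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # r = -0.860
```

The column totals that the function works from are ΣX = 83, ΣY = 82, ΣX² = 835, ΣY² = 766 and ΣXY = 580.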
Spearman’s Rank-Order Correlation

X = Alcohol Units   Y = Skill Score   Rank X   Rank Y   |D|    D²
15                  4                 10       1.5      8.5    72.25
14                  6                 9        3        6      36
10                  4                 8        1.5      6.5    42.25
9                   8                 7        5.5      1.5    2.25
8                   7                 5.5      4        1.5    2.25
8                   8                 5.5      5.5      0      0
7                   10                4        8        4      16
6                   9                 3        7        4      16
4                   14                2        10       8      64
2                   12                1        9        8      64
Total =

where D is the difference between Rank X and Rank Y for each subject (tied scores share the average of the ranks they occupy).

$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
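The ranking and the shortcut formula can be sketched in Python as follows. The helper names are illustrative only. Because this data set contains tied scores, the sketch also computes Pearson's r on the ranks, which is the usual tie-aware definition of Spearman's coefficient; the two results differ slightly.

```python
import math

# Same data as the table: X = alcohol units, Y = skill score
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]

def average_ranks(values):
    """Rank from lowest (1) to highest (n); tied values share the mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # 1-based position of the first tied value
        count = ordered.count(v)               # how many values are tied at v
        ranks.append(first + (count - 1) / 2)  # mean of the positions occupied
    return ranks

def pearson_r(a, b):
    """Pearson's r from deviations about the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sp = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return sp / math.sqrt(ss_a * ss_b)

rank_x = average_ranks(x)
rank_y = average_ranks(y)
n = len(x)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # formula from the slide
r_on_ranks = pearson_r(rank_x, rank_y)            # tie-aware alternative

print(f"Sum of D squared = {sum_d2}")      # Sum of D squared = 315.0
print(f"Shortcut formula: {r_shortcut:.3f}")  # Shortcut formula: -0.909
print(f"Pearson on ranks: {r_on_ranks:.3f}")  # Pearson on ranks: -0.927
```

The tie-aware value of -0.927 is the one that agrees with the Spearman's rho reported in the SPSS output for these data, which suggests SPSS handles the ties in this way; the shortcut formula is exact only when there are no tied ranks.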
SPSS Correlation Outputs

Correlations
                                   VAR00001   VAR00002
VAR00001   Pearson Correlation     1          -.860**
           Sig. (2-tailed)                    .001
           N                       10         10
VAR00002   Pearson Correlation     -.860**    1
           Sig. (2-tailed)         .001
           N                       10         10
**. Correlation is significant at the 0.01 level (2-tailed).

Correlations (Spearman's rho)
                                       VAR00001   VAR00002
VAR00001   Correlation Coefficient     1.000      -.927**
           Sig. (2-tailed)             .          .000
           N                           10         10
VAR00002   Correlation Coefficient     -.927**    1.000
           Sig. (2-tailed)             .000       .
           N                           10         10
**. Correlation is significant at the 0.01 level (2-tailed).
Coefficient of Determination (r² x 100)
• AKA ‘variance explained’, this figure denotes how much of the variance in Y can be explained/predicted by X
e.g. to predict long jump distance (Y) from maximum sprint speed (X):
r = 0.8
r² = 0.64, i.e. 64% of the variance explained
[Diagram: Y and X]
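The arithmetic is simple enough to sketch in a couple of lines of Python. The function name `variance_explained` is illustrative; the second call uses the Pearson value of -0.860 from the SPSS correlation output for the alcohol/skill data.

```python
def variance_explained(r):
    """Coefficient of determination as a percentage: r squared x 100."""
    return r ** 2 * 100

print(f"{variance_explained(0.8):.0f}%")     # 64%
print(f"{variance_explained(-0.860):.0f}%")  # 74%
```

Note that the sign of r is lost on squaring: a strong negative correlation explains just as much variance as an equally strong positive one.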
Correlation versus Regression
• By attempting to predict one variable using another, we are now moving away from simple correlation and moving into the concept of regression
Correlation =
Regression =
Linear Regression
• The equation for a linear relationship can be expressed as:
Y = a + bX, where a = the Y intercept and b = the gradient
[Scatterplot: Variable X (e.g. VO2max) against Variable Y (e.g. Exercise Capacity)]
SPSS Regression Output
[Scatterplot titled "Linear Regression": AlcoholUnits (x-axis, 5.00 to 15.00) against SkillScore (y-axis, 5.00 to 12.50), with fitted line]
SkillScore = 13.92 + -0.69 * AlcoholUnits
R-Square = 0.74
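The intercept and gradient in this output can be recovered from the alcohol/skill data with the standard least-squares formulas, b = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²] and a = mean(Y) - b x mean(X). A minimal Python sketch, with `fit_line` as an illustrative name:

```python
# Data from the correlation example: X = alcohol units, Y = skill score
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]

def fit_line(x, y):
    """Least-squares intercept (a) and gradient (b) for Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # the line passes through (mean X, mean Y)
    return a, b

a, b = fit_line(x, y)
print(f"SkillScore = {a:.2f} + {b:.2f} * AlcoholUnits")
# SkillScore = 13.92 + -0.69 * AlcoholUnits
```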
Extrapolation versus Interpolation
[Scatterplot: Variable X (e.g. VO2max) against Variable Y (e.g. Exercise Capacity)]
Remember that the accuracy of your equation depends upon the linear relationship you observed.
Interpolation =
Extrapolation =
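One way to make the distinction concrete: interpolation predicts Y for a value of X inside the range that was actually observed, whereas extrapolation predicts Y for a value of X outside that range, where the linear relationship has not been checked. The sketch below uses the rounded coefficients from the SPSS regression output and the observed range of AlcoholUnits (2 to 15) from the example data; the function name and the test values 5 and 25 are illustrative assumptions.

```python
# Rounded coefficients from the SPSS regression output
A, B = 13.92, -0.69
# Smallest and largest AlcoholUnits values in the example data set
X_MIN, X_MAX = 2, 15

def predict_skill(alcohol_units):
    """Return the predicted SkillScore and whether the prediction is an
    interpolation (X inside the observed range) or an extrapolation."""
    inside = X_MIN <= alcohol_units <= X_MAX
    kind = "interpolation" if inside else "extrapolation"
    return A + B * alcohol_units, kind

for units in (5, 25):
    score, kind = predict_skill(units)
    print(f"{units} units -> {score:.2f} ({kind})")
# 5 units -> 10.47 (interpolation)
# 25 units -> -3.33 (extrapolation)
```

The second prediction is negative even though no observed skill score was below 4, which illustrates why predictions beyond the observed range should be treated with caution.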
Multiple Linear Regression
• We saw earlier how maximum sprint speed (X) can predict/explain 64% of the variance in long jump distance (Y)
r² = 64%
[Diagram: Y and X]
…but can Y be predicted any more effectively using more than one independent variable (i.e. X1, X2, X3, etc.)?
Multiple Linear Regression
• However, we can often predict Y effectively just using a specific subset of X variables (i.e. a reduced model)
[Diagram: Y, X1 and X2; label: Event Experience]
Multiple Linear Regression
• ‘Best Subset Selection Methods’ involve calculation of r for every possible combination of IVs
• Stepwise regression methods involve gradually either adding or removing variables and monitoring the impact of each action on r.
– Standard methods add and remove variables
– Forward selection methods begin with 1 IV and add more
– Backward elimination methods begin with all IVs and remove
• The order in which IVs are added/removed is critical, as the variance explained solely by any one IV depends upon which others are present.
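To illustrate the idea of forward selection, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the predictor names x1, x2 and x3, the made-up values (constructed so that y = 2·x1 + x2 exactly and x3 is unrelated to y), and the stopping rule, which is a plain minimum gain in R² rather than the significance-based entry criteria that statistics packages typically apply.

```python
def r_squared(columns, y):
    """R^2 from an ordinary least-squares fit of y on the given predictor columns."""
    n = len(y)
    # Each row of the design matrix: intercept term, then the predictor values
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X)b = X'y held as an augmented matrix
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda idx: abs(aug[idx][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for other in range(k):
            if other != c:
                factor = aug[other][c] / aug[c][c]
                aug[other] = [a - factor * b for a, b in zip(aug[other], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_selection(predictors, y, min_gain=0.01):
    """Start with no IVs; repeatedly add the IV that raises R^2 the most,
    stopping when the best available gain is below min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {
            name: r_squared([predictors[c] for c in chosen + [name]], y)
            for name in remaining
        }
        best_name = max(scores, key=scores.get)
        if scores[best_name] - best_r2 < min_gain:
            break
        chosen.append(best_name)
        remaining.remove(best_name)
        best_r2 = scores[best_name]
    return chosen, best_r2

# Made-up illustrative data (NOT from the lecture): y = 2*x1 + x2, x3 unrelated
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, -1, 1, -1, 1],
    "x3": [2, 1, 1, 2, 2, 1, 1, 2],
}
y = [3, 3, 7, 7, 9, 13, 13, 17]

chosen, r2 = forward_selection(predictors, y)
print(chosen, round(r2, 3))  # ['x1', 'x2'] 1.0
```

With these values x1 enters first because it explains the most variance on its own, x2 then accounts for what remains, and x3 is left out of the reduced model because it adds nothing.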
SPSS Multiple Linear Regression Output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .860a   .740       .708                1.74391
a. Predictors: (Constant), AlcoholUnits

Excluded Variables(b)
Model 1    Beta In   t       Sig.   Partial Correlation   Tolerance (Collinearity Statistics)
BodyMass   .072a     .374    .720   .140                  .994
Age        .208a     1.150   .288   .399                  .950
a. Predictors in the Model: (Constant), AlcoholUnits
b. Dependent Variable: SkillScore
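The Model Summary figures can be reproduced from the alcohol/skill data used earlier, since the only predictor in the model is AlcoholUnits. The sketch below assumes the standard formulas: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), and standard error of the estimate = √[SS residual / (n - p - 1)], where p is the number of predictors. Variable names are illustrative.

```python
import math

# X = AlcoholUnits, Y = SkillScore (data from the correlation example)
x = [15, 14, 10, 9, 8, 8, 7, 6, 4, 2]
y = [4, 6, 4, 8, 7, 8, 10, 9, 14, 12]

n = len(x)
p = 1  # one predictor in the model (AlcoholUnits)

mean_x, mean_y = sum(x) / n, sum(y) / n
ss_x = sum((v - mean_x) ** 2 for v in x)                      # variation in X
ss_y = sum((v - mean_y) ** 2 for v in y)                      # total variation in Y
sp_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))

r = sp_xy / math.sqrt(ss_x * ss_y)
r2 = r ** 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
ss_residual = ss_y * (1 - r2)                                 # unexplained variation
std_error = math.sqrt(ss_residual / (n - p - 1))

print(f"R = {abs(r):.3f}")
print(f"R Square = {r2:.3f}")
print(f"Adjusted R Square = {adj_r2:.3f}")
print(f"Std. Error of the Estimate = {std_error:.5f}")
```

Rounded, these agree with the .860, .740, .708 and 1.74391 shown in the Model Summary. SPSS reports R as a positive value even though the underlying correlation here is negative.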
Summary: Exploring Relationships
•The relationship between two variables can be expressed as a correlation coefficient (r)
•The coefficient of determination (r²) denotes the % of the variance in one variable that is explained by another
•Linear regression can provide an equation with which to predict one variable from another
•Multiple linear regression can potentially improve this prediction using multiple predictor variables.
Coursework Project (40% of overall grade)
• Your coursework will require you to address 2 out of 3 research scenarios that are available on the unit webpage
• For each of the 2 scenarios you will need to:
– Perform a literature search in order to provide a comprehensive introduction to the research area
– Identify the variables of interest and evaluate the research design which was adopted
– Formulate and state appropriate hypotheses…
• Cont’d…
– Summarise descriptive statistics in an appropriate and well presented manner
– Select the most appropriate statistical test with justification for your decision
– Transfer the output of your inferential statistics into your Word document
– Interpret your results and discuss the validity and reliability of the study
– Draw a meaningful conclusion (state whether hypotheses are accepted or rejected).
Coursework Details (see unit outline)
• 2000 words maximum (i.e. 1000 for each)
• Any supporting SPSS data/outputs to be appended
• To be submitted on Thursday 6th May
Assessment Weighting
Evaluation & Analysis (30 %)
Reading & Research (20 %)
Communication & Presentation (20 %)
Knowledge (30 %)
Coursework Details
• All information relating to your coursework (including the relevant data files) is accessible via the module web page:
http://people.bath.ac.uk/jb335/Y2%20Research%20Skills%20(FH200107).html
Web address also referenced on shared area
Electronic copy to be included with submission.
Any further questions/problems can be raised in the CW revision lecture/labs after Easter
Timed Practical Computing Exercise (20 % overall grade)
• This test will involve analysis/interpretation of the resultant data assessed via short answer questions
• Practice session Wednesday 14th April
• Duration = 80 min (2 groups)
• I will email specific details after Easter.
Start Here
• Looking for differences between categories/frequencies? (i.e. nominal LOM)
– 1 observed frequency: Goodness of Fit χ²
– >1 observed frequency: Contingency χ²
• Looking for differences within the same group of subjects? (i.e. paired data)
– 2 observations: Paired t-test (non-parametric: Wilcoxon test)
– >2 observations: 1-way paired ANOVA (non-parametric: Friedman’s test)
• Looking for differences between 2 separate groups of subjects? (i.e. unpaired data)
– 2 groups: Independent t-test (non-parametric: Mann-Whitney test)
– >2 groups: 1-way unpaired ANOVA (non-parametric: Kruskal-Wallis test)
• Looking for relationships?
– 2 variables: Pearson’s r (non-parametric: Spearman’s r)
– >2 variables: Multiple Linear Regression
• Looking for differences with >1 independent variable?
– Both IVs paired: 2-way paired ANOVA
– Both IVs unpaired: 2-way unpaired ANOVA
– 1 IV paired, 1 IV unpaired: 2-way mixed model ANOVA
• Post-Hoc Tests
• If multiple DVs are involved then use MANOVA