Available at http://planet.uwc.ac.za/nisl
Chapter 13: Multivariate Analysis
BCB 702: Biostatistics
Image: http://hei.unige.ch/~elkhou99/imageSC7.JPG
What is Multivariate Analysis?
Usually involves situations where there are two or more dependent (response) variables
Examines the relationships or interactions among these variables
Takes into account the fact that variables may not be independent of each other, and that performing multiple comparisons increases the risk of making a Type I error
Simply performing a series of separate univariate tests would therefore not be appropriate: it ignores the correlations among the response variables and inflates the overall Type I error rate
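The effect of multiple comparisons on the Type I error rate can be illustrated with a short calculation. This is a minimal sketch that assumes the k tests are independent, in which case the chance of at least one false positive is 1 - (1 - alpha)^k. The function name is hypothetical, and correlated response variables (the usual multivariate situation) will not follow this formula exactly.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Error rate grows quickly as more univariate tests are run at alpha = 0.05
for k in (1, 3, 5, 10):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>2} tests -> familywise error rate = {rate:.3f}")
```

With three response variables tested separately at alpha = 0.05, the chance of at least one false positive is already about 14%.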
Types of Multivariate Tests
These include:
Multivariate Analysis of Variance (MANOVA)
Discriminant Function Analysis (DFA)
Principal Components Analysis (PCA)
Factor Analysis
Cluster Analysis
Canonical Correlation Analysis
Multidimensional Scaling
MANOVA
An extension of the ANOVA
Examines two or more response variables
Combines the response variables into a single new composite variable that maximises the differences between the treatment group means
Produces a multivariate test statistic, most commonly Wilks’ lambda (a value between 0 and 1, where smaller values indicate stronger separation between groups), which is converted to an approximate F value
If the overall test is significant, we can then go on to examine which of the individual variables contributed to the significant effect
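Wilks' lambda for a one-way design can be computed as det(E) / det(E + H), where E is the within-group (error) and H the between-group (hypothesis) sums-of-squares-and-cross-products matrix. The following is a minimal sketch using NumPy; the function name and the simulated data are hypothetical and are not the lizard data from the example.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda = det(E) / det(E + H) for a one-way MANOVA.

    X: array of shape (n_samples, n_variables); groups: one label per row.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    E = np.zeros((p, p))  # within-group (error) SSCP matrix
    H = np.zeros((p, p))  # between-group (hypothesis) SSCP matrix
    for g in np.unique(groups):
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        E += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += len(Xg) * (d @ d.T)
    return np.linalg.det(E) / np.linalg.det(E + H)


# Simulated data (assumption): 3 groups of 10, 3 response variables
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 10)
no_effect = rng.normal(size=(30, 3))
shift = np.repeat([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0]], 10, axis=0)
with_effect = no_effect + shift

lam_null = wilks_lambda(no_effect, groups)
lam_effect = wilks_lambda(with_effect, groups)
print(f"no group effect: {lam_null:.3f}, strong group effect: {lam_effect:.3f}")
```

Values near 1 indicate little separation between groups; values near 0 indicate that the group means differ strongly relative to the within-group variation.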
MANOVA: Example
A researcher has collected a certain species of lizard from three different island populations. Each island represents a different eco-zone. He wishes to test whether lizards from different islands differ in their morphology and abilities, so he collects 10 lizards from each island and measures their body length, limb length and running speed.
Independent variable: Island of origin
Dependent variables: Body length, limb length, running speed
Image: http://www.flickr.com/photos/wyscan/14739853/
MANOVA: Example
From the analysis, we get:
Wilks’ lambda   F        df (num)   df (den)   p
0.1732          11.689   6          50         <0.001
The model shows a significant difference in lizards from the three islands (p <0.001)
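The F value and degrees of freedom reported above can be reproduced from Wilks' lambda using Rao's F approximation. This is a sketch under stated assumptions: the slides do not say which conversion the software used, and the error degrees of freedom (27 = 30 lizards - 3 islands) and hypothesis degrees of freedom (2 = 3 islands - 1) are inferred from the study design. The function name is hypothetical.

```python
import math


def wilks_to_f(wilks, n_vars, df_hyp, df_err):
    """Convert Wilks' lambda to an F value using Rao's approximation.

    n_vars: number of response variables (p)
    df_hyp: hypothesis degrees of freedom (q), here groups - 1
    df_err: error degrees of freedom, here N - groups
    """
    p, q = n_vars, df_hyp
    denom = p**2 + q**2 - 5
    t = math.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    w = df_err + q - (p + q + 1) / 2
    df_num = p * q
    df_den = w * t - (p * q - 2) / 2
    root = wilks ** (1 / t)
    f_value = (1 - root) / root * df_den / df_num
    return f_value, df_num, df_den


# Values taken from the MANOVA table; df_hyp and df_err are inferred (assumption)
f_value, df_num, df_den = wilks_to_f(0.1732, n_vars=3, df_hyp=2, df_err=27)
print(f"F = {f_value:.3f}, df = ({df_num}, {df_den:.0f})")
```

The result agrees with the reported F of 11.689 on 6 and 50 degrees of freedom to within the rounding of lambda.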
MANOVA: Example
Source   Dependent variable   Sum of squares   df   Mean square   F        p
Island   Body length          10.467           2    5.233         0.988    0.385
         Limb length          62.600           2    31.300        8.237    0.002
         Running speed        8.261E-04        2    4.130E-04     11.004   <0.001
Error    Body length          143.000          27   5.296
         Limb length          102.600          27   3.800
         Running speed        1.013E-03        27   3.753E-05
Limb length (p = 0.002) and running speed (p <0.001) differ significantly between lizards from different islands. There is no significant difference in body length (p = 0.385)
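Each univariate F value in the table is the ratio of the island mean square to the error mean square, where a mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes the F values from the sums of squares in the table; small differences in the last digit are expected because the table entries are rounded. The variable and function names are hypothetical.

```python
# (sum of squares, df) for each response variable, copied from the table
island = {
    "Body length": (10.467, 2),
    "Limb length": (62.600, 2),
    "Running speed": (8.261e-04, 2),
}
error = {
    "Body length": (143.000, 27),
    "Limb length": (102.600, 27),
    "Running speed": (1.013e-03, 27),
}


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


f_values = {name: f_ratio(*island[name], *error[name]) for name in island}
for name, f in f_values.items():
    print(f"{name}: F = {f:.3f}")
```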
Discriminant Function Analysis
Discriminant Function Analysis (DFA) is used to determine which variables predict naturally occurring groups in data
Several independent variables and one non-metric (grouping) dependent variable
Can be thought of as MANOVA in reverse
DFA organises the original independent variables into a set of canonical discriminant functions, which are linear combinations of the original variables
Discriminant Function Analysis
The first canonical function accounts for the largest share of the variation between groups, the second canonical function accounts for the largest share of the variation that is left over, and so on
Three steps:
1. Look for an overall significant effect using a multivariate F test (Wilks’ lambda)
2. Examine the independent variables individually for differences in mean by group
3. Classification: use the discriminant functions to assign observations to groups
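The discriminant functions and the classification step can be sketched with NumPy. This is an illustrative outline, not the procedure of any particular statistics package: the functions are taken as eigenvectors of W^-1 B (within-group and between-group sums-of-squares matrices), and each observation is assigned to the group with the nearest centroid in discriminant space. The function names, site labels and simulated data are all hypothetical.

```python
import numpy as np


def discriminant_functions(X, groups):
    """Return (eigenvalues, weights) of the canonical discriminant functions."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    # Keep at most min(p, groups - 1) functions, largest eigenvalue first
    order = np.argsort(eigvals.real)[::-1][: min(p, len(labels) - 1)]
    weights = eigvecs.real[:, order]
    # Scale each function to unit pooled within-group variance
    weights = weights / np.sqrt(np.diag(weights.T @ W @ weights) / (n - len(labels)))
    return eigvals.real[order], weights


def classify(X, groups, weights):
    """Assign each row to the group with the nearest centroid in discriminant space."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    scores = (X - X.mean(axis=0)) @ weights
    centroids = np.array([scores[groups == g].mean(axis=0) for g in labels])
    distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return labels[distances.argmin(axis=1)]


# Simulated data (assumption): 3 well-separated groups, 2 predictors
rng = np.random.default_rng(1)
groups = np.repeat(["site1", "site2", "site3"], 20)
means = np.repeat([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]], 20, axis=0)
X = means + rng.normal(size=(60, 2))

eigvals, weights = discriminant_functions(X, groups)
predicted = classify(X, groups, weights)
accuracy = (predicted == groups).mean()
print(f"eigenvalues: {eigvals}, resubstitution accuracy: {accuracy:.2f}")
```

Accuracy measured on the same data used to build the functions is optimistic; a held-out sample or cross-validation would give a fairer estimate.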
DFA: Example
Populations of a sunflower species grow at four sites (two in riparian habitat and two in serpentine habitat) that differ in soil chemistry and water availability. Various measures of soil chemistry were taken in order to determine which of these variables can be used to distinguish among sites. (Sambatti & Rice, 2006)
Independent variables: Ca, Mg, P, organic matter (OM), pH
Dependent variable: Site
Image: http://en.wikipedia.org/wiki/Image:Sunflowers.jpg
DFA: Example
The overall model was significant (p <0.001), meaning that sites differ in soil nutrients
First canonical axis: The riparian habitats (particularly R1) have more OM and a lower pH
Second canonical axis: The two serpentine habitats (S1 and S2) have lower levels of Ca and P and slightly higher levels of Mg than riparian sites
[Figure: canonical centroid plot]
Principal Components Analysis
The goal of PCA is to reduce complex data sets containing a large number of variables to a lower dimension in order to see the relationships among variables more clearly
It computes a new set of composite variables called principal components (PCs)
Each PC explains a certain proportion of the variation in the data set, with PC1 explaining the largest share of the variation, PC2 the next largest share, and so on
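The ordering of the principal components by explained variation can be seen in a small NumPy sketch. It assumes PCA on the covariance matrix (variables measured on very different scales are often standardised first, which corresponds to using the correlation matrix instead). The function name and the simulated data are hypothetical.

```python
import numpy as np


def pca(X):
    """PCA via eigendecomposition of the covariance matrix.

    Returns (explained variance proportions, loadings, scores), ordered
    from the component with the largest variance to the smallest.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = centered @ eigvecs
    return explained, eigvecs, scores


# Simulated data (assumption): two strongly correlated variables plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200), rng.normal(size=200)])

explained, loadings, scores = pca(X)
print("proportion of variation explained by PC1, PC2, PC3:", np.round(explained, 3))
```

Because the first two variables are nearly collinear, PC1 captures most of the variation and the last component captures almost none.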
Factor Analysis
Similar to Principal Components Analysis
Used to uncover underlying trends and relationships in large and complex data sets
Works on a correlation matrix of variables
Combines original variables into a smaller set of factors
Variables are correlated with each other due to their correlation with a common factor
Cluster Analysis
Cluster analysis encompasses a number of different methods
Used to organize or group data according to similarities
There is no real dependent variable: cluster analysis does not attempt to explain why groups (clusters) exist
Often used in species taxonomy
[Figure: example grouping of objects A, B, C, D and E]
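One of the many clustering methods, single-linkage agglomerative clustering, can be sketched with the Python standard library alone. It starts with every object in its own cluster and repeatedly merges the two closest clusters. The choice of method, the function name and the coordinates of objects A to E are assumptions made for illustration.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain.

    points: dict mapping object name -> coordinate tuple.
    Distance between clusters is the smallest distance between their members.
    """
    clusters = [[name] for name in points]

    def distance(c1, c2):
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)


# Hypothetical measurements for five objects
points = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.9),
    "C": (5.0, 5.1),
    "D": (5.2, 4.9),
    "E": (4.8, 5.3),
}
result = single_linkage(points, n_clusters=2)
print(result)  # [['A', 'B'], ['C', 'D', 'E']]
```

The method groups objects purely by similarity; it says nothing about why A and B differ from C, D and E.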
Canonical Correlation Analysis
Used when variables fall naturally into two groups (a group of dependent variables and a group of independent variables)
Tries to determine if there are linear relationships between the two sets of variables
It creates functions for each group, such that the correlation between the functions of each group is maximised
In this way, a combination of variables from the first group predicts a combination of variables from the second group
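The canonical correlations between two sets of variables can be computed with a QR decomposition of each centred data matrix followed by a singular value decomposition. This is one standard numerical route, shown as a sketch; it assumes both matrices have full column rank and more observations than variables. The function name and the simulated data are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y.

    Assumes both matrices have full column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the X variables
    Qy, _ = np.linalg.qr(Yc)  # orthonormal basis for the Y variables
    singular_values = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(singular_values, 0.0, 1.0)


# Simulated data (assumption): first Y variable is an exact combination of X
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = np.column_stack([X[:, 0] + X[:, 1], rng.normal(size=50)])

r = canonical_correlations(X, Y)
print("canonical correlations:", np.round(r, 3))
```

With a single variable in each set, the first canonical correlation reduces to the absolute value of the ordinary Pearson correlation.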
Multidimensional Scaling
Analyses pairwise similarities between variables
Only applicable to continuous data
Plots variables graphically to provide a visual representation of the pattern of proximity of a set of variables (objects)
Objects plotted close together are relatively similar to each other, while objects plotted far apart are relatively dissimilar
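How a matrix of pairwise distances becomes a plot can be sketched with classical (metric) multidimensional scaling. The slides do not say which variant of MDS is meant, so the choice of the classical method is an assumption; the function name and the example points are hypothetical.

```python
import numpy as np


def classical_mds(D, n_dims=2):
    """Classical MDS: coordinates whose Euclidean distances approximate D.

    D: symmetric (n, n) matrix of pairwise distances.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (D**2) @ J  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are truncated to zero
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


# Hypothetical objects: corners of a 3 x 4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(D, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.round(recovered, 3))
```

The recovered coordinates may be rotated or reflected relative to the original points, but the distances between objects, and therefore the pattern of proximity in the plot, are preserved.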