Available at http://planet.uwc.ac.za/nisl Chapter 13 Multivariate Analysis BCB 702: Biostatistics http://hei.unige.ch/~elkhou99/imageSC7.JPG

Page 1: Chapter 13: Multivariate Analysis

BCB 702: Biostatistics

Available at http://planet.uwc.ac.za/nisl

Image: http://hei.unige.ch/~elkhou99/imageSC7.JPG

Page 2: What is Multivariate Analysis?

Usually involves situations where there are two or more dependent (response) variables

Examines the relationships or interactions among these variables

Takes into account the facts that variables may not be independent of each other, and that performing multiple comparisons increases the risk of making a Type I error

Simply performing a series of separate univariate tests is therefore not appropriate: it ignores the correlations among the variables and inflates the overall Type I error rate, so the results can be misleading
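The inflation of the Type I error rate can be illustrated with a short calculation. This is only a sketch: it assumes the tests are independent (which, as noted above, response variables often are not), and the function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Each test is run at alpha = 0.05; the overall error rate grows with the number of tests.
for k in (1, 3, 5, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 3 0.143
# 5 0.226
# 10 0.401
```

With three separate univariate tests, the chance of at least one false positive is already about 14% rather than 5%.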

Page 3: Types of Multivariate Tests

Include:

Multivariate Analysis of Variance (MANOVA)

Discriminant Function Analysis (DFA)

Principal Components Analysis (PCA)

Factor Analysis

Cluster Analysis

Canonical Correlation Analysis

Multidimensional Scaling

Page 4: MANOVA

Extension of the ANOVA

Examines two or more response variables

Combines multiple response variables into a single new variable to maximise the differences between the treatment group means

Obtain a multivariate test statistic, most commonly Wilks’ lambda (a value between 0 and 1, where smaller values indicate greater separation between groups), which is converted to an approximate F value

If the overall test is significant, we can then go on to examine which of the individual variables contributed to the significant effect
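As a rough illustration of the test statistic, Wilks' lambda for a one-way design can be computed as det(W) / det(T), where W is the within-group and T the total sums-of-squares-and-cross-products matrix. The sketch below uses NumPy and randomly generated, hypothetical data; the function name `wilks_lambda` is an assumption of this example, not part of any package.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T) for a one-way multivariate design."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                 # total SSCP matrix
    W = np.zeros_like(T)                      # within-group SSCP matrix
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
    return float(np.linalg.det(W) / np.linalg.det(T))


# Hypothetical data: 3 groups of 10 observations, 3 response variables.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 10)
X_same = rng.normal(size=(30, 3))             # no true group differences
shift = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [4.0, 1.0, 2.0]])
X_diff = X_same + shift[groups]               # group means pulled apart

lam_same = wilks_lambda(X_same, groups)
lam_diff = wilks_lambda(X_diff, groups)
print(round(lam_same, 3), round(lam_diff, 3))
```

The data with separated group means should give a lambda much closer to 0 than the data without group differences.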

Page 5: MANOVA: Example

A researcher has collected a certain species of lizard from three different island populations. Each island represents a different eco-zone. He wishes to test whether lizards from different islands differ in their morphology and abilities, so he collects 10 lizards from each island and measures their body length, limb length and running speed.

Independent variable: Island of origin

Dependent variables: Body length, limb length, running speed

Image: http://www.flickr.com/photos/wyscan/14739853/

Page 6: MANOVA: Example

From the analysis, we get:

Wilks’ lambda    F         df (num)    df (den)    p
0.1732           11.689    6           50          <0.001

The overall model shows a significant difference among lizards from the three islands (p < 0.001)
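The F value and degrees of freedom in this table can be checked by hand. The sketch below assumes the standard exact F transformation of Wilks' lambda for the special case of three groups; the slides do not say which transformation the software used, and the function name is hypothetical.

```python
import math


def wilks_to_f_three_groups(lam, n_total, n_vars):
    """F transformation of Wilks' lambda that is exact when there are three groups."""
    root = math.sqrt(lam)
    df_num = 2 * n_vars
    df_den = 2 * (n_total - n_vars - 2)
    f_value = ((1.0 - root) / root) * ((n_total - n_vars - 2) / n_vars)
    return f_value, df_num, df_den


# Lizard example: 30 lizards in total, 3 dependent variables, lambda = 0.1732.
f_value, df_num, df_den = wilks_to_f_three_groups(0.1732, n_total=30, n_vars=3)
print(round(f_value, 2), df_num, df_den)   # 11.69 6 50
```

The small gap between 11.69 and the reported 11.689 comes from using lambda rounded to four decimal places.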

Page 7: MANOVA: Example

Source    Dependent variable    Sum of squares    df    Mean square    F         p
Island    Body length           10.467            2     5.233          0.988     0.385
Island    Limb length           62.600            2     31.300         8.237     0.002
Island    Running speed         8.261E-04         2     4.130E-04      11.004    <0.001
Error     Body length           143.000           27    5.296
Error     Limb length           102.600           27    3.800
Error     Running speed         1.013E-03         27    3.753E-05

Limb length (p = 0.002) and running speed (p < 0.001) differ significantly between lizards from different islands. There is no significant difference in body length (p = 0.385)
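Each univariate F value is the mean square for Island divided by the mean square for Error, where a mean square is a sum of squares divided by its degrees of freedom. A minimal check using the sums of squares reported for the lizard example (the variable and function names are made up for this sketch):

```python
# (sum of squares for Island, sum of squares for Error), as reported for each variable
table = {
    "Body length": (10.467, 143.000),
    "Limb length": (62.600, 102.600),
    "Running speed": (8.261e-04, 1.013e-03),
}
DF_EFFECT, DF_ERROR = 2, 27   # 3 islands - 1, and 30 lizards - 3 islands


def f_ratio(ss_effect, ss_error, df_effect=DF_EFFECT, df_error=DF_ERROR):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


f_values = {name: f_ratio(*ss) for name, ss in table.items()}
for name, f in f_values.items():
    print(name, round(f, 3))
```

The recomputed values agree with the reported F values up to the rounding of the reported sums of squares.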

Page 8: Discriminant Function Analysis

Discriminant Function Analysis (DFA) is used to determine which variables predict naturally occurring groups in data

Uses several independent variables and one non-metric (grouping) dependent variable

Can be thought of as MANOVA in reverse: the grouping variable is the response and the measured variables are the predictors

DFA organises the original independent variables into a set of canonical (discriminant) functions, which are linear combinations of the original variables

Page 9: Discriminant Function Analysis

The first canonical function accounts for the most between-group variation in the data set, the second accounts for the most of the variation that is left over, and so on

Three steps:

1. Look for an overall significant effect using a multivariate F test (Wilks’ lambda)

2. Examine the independent variables individually for differences in mean by group

3. Classification
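The steps above can be sketched with NumPy. This is a simplified illustration on hypothetical, randomly generated data, not the procedure of any particular statistics package: it computes Wilks' lambda (step 1), derives the canonical functions from the eigenvectors of W⁻¹B, and classifies each observation to the nearest group centroid in canonical space (step 3). Step 2, the per-variable tests, is omitted. The names `fit_dfa` and `classify` are assumptions of this example.

```python
import numpy as np


def fit_dfa(X, groups):
    """Canonical functions from the within (W) and between (B) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        dev = Xg - mean_g
        W += dev.T @ dev
        diff = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (diff @ diff.T)
    # Eigenvectors of W^-1 B give the weights of the canonical functions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    keep = np.argsort(eigvals.real)[::-1][: min(p, len(labels) - 1)]
    eigvals = eigvals.real[keep]
    coefs = eigvecs.real[:, keep]
    # Scale each function so its pooled within-group variance is 1.
    for j in range(coefs.shape[1]):
        within_var = coefs[:, j] @ W @ coefs[:, j] / (len(X) - len(labels))
        coefs[:, j] /= np.sqrt(within_var)
    lam = float(np.linalg.det(W) / np.linalg.det(W + B))   # Wilks' lambda
    scores = (X - grand_mean) @ coefs
    centroids = np.array([scores[groups == g].mean(axis=0) for g in labels])
    return labels, grand_mean, coefs, eigvals, lam, centroids


def classify(X_new, labels, grand_mean, coefs, centroids):
    """Assign each row to the group with the nearest centroid in canonical space."""
    scores = (np.asarray(X_new, dtype=float) - grand_mean) @ coefs
    dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]


# Hypothetical data: 3 groups of 20 observations on 4 variables.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 20)
means = np.array([[0.0, 0, 0, 0], [5.0, 0, 1, 0], [0.0, 5, 0, 1]])
X = rng.normal(size=(60, 4)) + means[groups]

labels, grand_mean, coefs, eigvals, lam, centroids = fit_dfa(X, groups)
predicted = classify(X, labels, grand_mean, coefs, centroids)
accuracy = float(np.mean(predicted == groups))
print("share of between-group variation:", np.round(eigvals / eigvals.sum(), 3))
print("Wilks' lambda:", round(lam, 4), "accuracy:", accuracy)
```

Classifying the same observations used to fit the functions gives an optimistic accuracy; in practice a hold-out or cross-validated classification is more informative.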

Page 10: DFA: Example

Populations of a sunflower species grow at four sites (two in riparian habitat and two in serpentine habitat) that differ in soil chemistry and water availability. Various measures of soil chemistry were taken in order to determine which of these variables can be used to distinguish among sites. (Sambatti & Rice, 2006)

Independent variables: Ca, Mg, P, organic matter (OM), pH

Dependent variable: Site

Image: http://en.wikipedia.org/wiki/Image:Sunflowers.jpg

Page 11: DFA: Example

The overall model was significant (p < 0.001), meaning that sites differ in soil nutrients

First canonical axis: The riparian habitats (particularly R1) have more OM and a lower pH

Second canonical axis: The two serpentine habitats (S1 and S2) have lower levels of Ca and P and slightly higher levels of Mg than riparian sites

[Figure: canonical centroid plot]

Page 12: Principal Components Analysis

The goal of PCA is to reduce complex data sets containing a large number of variables to a lower dimension in order to see the relationships among variables more clearly

It computes a new set of composite variables called principal components (PCs), each a linear combination of the original variables

Each PC explains a certain proportion of the variation in the data set, with PC1 explaining the most variation, PC2 the next most, and so on
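A minimal PCA sketch with NumPy, assuming the components are taken from the eigenvectors of the covariance matrix (when variables are measured on very different scales, the correlation matrix is the usual alternative). The data are hypothetical and the function name `pca` is chosen for this example.

```python
import numpy as np


def pca(X):
    """Return PC scores, component weights and the proportion of variance per PC."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so PC1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = centered @ eigvecs
    return scores, eigvecs, explained


# Hypothetical data: four noisy measurements of one underlying quantity.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(100, 1)) for _ in range(4)])

scores, weights, explained = pca(X)
print(np.round(explained, 3))
```

Because the four variables here are almost copies of one another, PC1 should account for nearly all of the variation, which is the situation in which reducing to one or two dimensions loses little information.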

Page 13: Factor Analysis

Similar to Principal Components Analysis

Used to uncover underlying trends and relationships in large and complex data sets

Works on a correlation matrix of variables

Combines original variables into a smaller set of factors

Assumes that variables are correlated with each other because each is correlated with a common underlying factor
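The last point can be made concrete with a one-factor model. The sketch below does not fit a factor analysis; it only shows that, if standardised variables share a single common factor, the correlation between any two of them equals the product of their loadings. The loadings are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical loadings of four standardised variables on one common factor.
loadings = np.array([0.9, 0.8, 0.7, 0.6])
uniqueness = 1.0 - loadings ** 2                 # variance not shared with the factor

# Correlation matrix implied by the model: loadings x loadings' + unique variances.
implied_corr = np.outer(loadings, loadings) + np.diag(uniqueness)

# Simulate data from the same model and compare with the implied correlations.
rng = np.random.default_rng(0)
n = 20000
factor = rng.normal(size=(n, 1))
noise = rng.normal(size=(n, 4)) * np.sqrt(uniqueness)
X = factor * loadings + noise
sample_corr = np.corrcoef(X, rowvar=False)
max_gap = float(np.abs(sample_corr - implied_corr).max())

print(np.round(implied_corr, 2))
print("largest gap between simulated and implied correlations:", round(max_gap, 3))
```

Factor analysis works in the opposite direction: it starts from the observed correlation matrix and estimates loadings that reproduce it.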

Page 14: Cluster Analysis

Cluster analysis encompasses a number of different methods

Used to organize or group data according to similarities

There is no real dependent variable: cluster analysis does not attempt to explain why groups (clusters) exist

Often used in species taxonomy

[Figure: items labelled A to E]
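One of the many clustering methods is agglomerative (hierarchical) clustering, which repeatedly merges the two closest clusters. The sketch below uses single linkage (distance between clusters = distance between their closest members) on five hypothetical points; it is one possible method, not necessarily the one behind the figure, and the function name is made up for this example.

```python
import numpy as np


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = [[i] for i in range(len(points))]   # start with one cluster per object
    merges = []                                    # history of merges, closest first
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: smallest distance between any two members
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters], merges


# Hypothetical objects described by two measurements each.
points = [[0.0, 0.0], [0.3, 0.0], [0.0, 1.0], [5.0, 5.0], [5.5, 5.0]]
clusters, merges = single_linkage(points, n_clusters=2)
print(clusters)   # [[0, 1, 2], [3, 4]]
for left, right, height in merges:
    print(left, "+", right, "at distance", round(height, 2))
```

The merge history is the information a dendrogram displays: which objects join first and at what distance.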

Page 15: Canonical Correlation Analysis

Used when variables fall naturally into two groups (a group of dependent variables and a group of independent variables)

Tries to determine if there are linear relationships between the two sets of variables

It creates a linear function of the variables in each group, such that the correlation between the two functions is maximised

In this way, a combination of variables from the first group predicts a combination of variables from the second group
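A compact way to obtain the canonical correlations is to whiten each set of variables and take the singular values of the whitened cross-covariance matrix. The NumPy sketch below follows that approach on hypothetical data; the function names are assumptions of this example.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def canonical_correlations(X, Y):
    """Return canonical correlations and the weights for each set of variables."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = len(X)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    return s, Kx @ U, Ky @ Vt.T      # correlations, weights for X, weights for Y


# Hypothetical data: the first Y variable depends on a combination of X variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([
    X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

corrs, a, b = canonical_correlations(X, Y)
u = (X - X.mean(axis=0)) @ a[:, 0]    # first canonical function of the X set
v = (Y - Y.mean(axis=0)) @ b[:, 0]    # first canonical function of the Y set
check = float(np.corrcoef(u, v)[0, 1])
print(np.round(corrs, 3), round(check, 3))
```

The correlation between the first pair of functions, computed directly, should match the first canonical correlation.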

Page 16: Multidimensional Scaling

Analyses pairwise similarities (or dissimilarities) between variables

Classical (metric) scaling is only applicable to continuous data; non-metric variants use only the rank order of the dissimilarities

Plots variables graphically to provide a visual representation of the pattern of proximity of a set of variables (objects)

Objects plotted close together are relatively similar to each other, while objects plotted far apart are relatively dissimilar
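Classical (metric) scaling can be sketched in a few lines of NumPy: the squared distance matrix is double-centred and the leading eigenvectors give the plotting coordinates. The objects below are hypothetical, and `classical_mds` is a name chosen for this example.

```python
import numpy as np


def classical_mds(D, n_dims=2):
    """Coordinates whose pairwise distances approximate the distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest eigenvalues
    vals = np.clip(eigvals[order], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical objects: the four corners of a 3 x 1 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(D)
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.round(coords, 3))
print("distances preserved:", bool(np.allclose(D, D_hat)))
```

The recovered coordinates may be rotated or reflected relative to the original points; only the pattern of distances between objects is meaningful.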