Step three: statistical analyses to test biological hypotheses
General protocol continued
Biological hypotheses and statistical tests
Hypotheses driven by biology
Statistics depend on data and hypotheses
NO NEW STATISTICAL TOOLS ARE NEEDED FOR MORPHOMETRICS!!
Exploratory hypotheses: relative position of specimens in data space; relationships among specimens in data space
Confirmatory hypotheses: compare groups, associate shape with other variables, etc.
Some hypotheses (shape related)
How do populations and species differ?
Does the observed variation generate a predictable pattern?
Are there additional factors (ecological, evolutionary) correlated with variation?
How does shared evolutionary history affect the observed patterns?
Hypotheses as statistical tests
Do populations differ? (MANOVA, CVA)
Is there a predictable pattern? (PCA, UPGMA)
Correlated factors? (Regression, 2B-PLS)
Effect of phylogeny? (Comparative method)
Exploratory data analysis
Investigate data using only the Y-matrix of shape variables (PWScores + U1, U2)
Specimens are points in a high-dimensional data space
Look for patterns and distributions of points
Generate a summary plot of data space (ordination)
Look for relationships of points (clustering)
Ordination and dimension reduction
Visualize high-dimensional data space as succinctly as possible
Describe variation in original data with a new set of variables (typically orthogonal vectors)
Order new variables by variation explained (most to least)
Plot first few dimensions to summarize data
Principal Components Analysis (PCA) is one approach (others include PCoA, MDS, CA, etc.)
PCA: what does it do?
Rotates data so that the main axis of variation (PC1) is horizontal
Subsequent PC axes are orthogonal to PC1, and are ordered to explain sequentially less variation
The goal is to explain more variation in fewer dimensions
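The rotation described on this slide can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a small simulated data matrix; the function name `pca` and the example data are hypothetical and not part of the original slides.

```python
import numpy as np

def pca(Y):
    """Rotate the rows of Y (specimens) onto principal component axes."""
    Yc = Y - Y.mean(axis=0)                  # center each variable
    cov = np.cov(Yc, rowvar=False)           # variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh is meant for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # order axes from most to least variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Yc @ eigvecs                    # specimen coordinates on the PC axes
    return scores, eigvals, eigvecs

# Hypothetical data: 40 specimens, 2 correlated variables.
rng = np.random.default_rng(0)
Y = rng.normal(size=(40, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

scores, eigvals, eigvecs = pca(Y)
explained = eigvals / eigvals.sum()          # proportion of variance per PC
print(explained)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the ordination plot, with PC1 along the horizontal axis.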
PCA: interpretations
Eigenvectors are linear combinations of original variables (interpreted by PC loadings of each variable)
PCA PRESERVES EUCLIDEAN DISTANCES among objects
PCA does NOTHING to the data, except rotate it to axes expressing the most variation; it loses NO INFORMATION (if all PC vectors retained)
If the original variables are uncorrelated, PCA is not helpful in reducing dimensionality of data
PCA does not find a particular factor (e.g., group differences, allometry): it identifies the direction of most variation, which may be interpretable as a factor (but may not)
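The claim that PCA preserves Euclidean distances when all PC vectors are retained can be checked numerically. The sketch below assumes NumPy and random example data; `pairwise_distances` is a hypothetical helper written for this illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical data: 15 specimens, 6 variables.
rng = np.random.default_rng(1)
Y = rng.normal(size=(15, 6))
Yc = Y - Y.mean(axis=0)

# The right singular vectors of the centered data are the PC axes.
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = Yc @ Vt.T

d_original = pairwise_distances(Y)
d_all_pcs = pairwise_distances(scores)          # all 6 PCs retained
d_two_pcs = pairwise_distances(scores[:, :2])   # only PC1 and PC2 retained

same_all = np.allclose(d_original, d_all_pcs)   # pure rotation: distances unchanged
same_two = np.allclose(d_original, d_two_pcs)   # dropping PCs discards information
print(same_all, same_two)
```

Distances computed from a truncated set of PCs can only be equal to or shorter than the originals, which is why a 2-D ordination plot is a summary rather than an exact picture.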
Example: leatherside chub
Clustering
Data are dots in a high-dimensional space (Y-matrix)
Can we connect the dots into groupings, where clusters represent groups of similar specimens?
Cluster methods generate a 1-dimensional view of relationships, based on some criterion
Clustering requires a distance (or similarity) between points
MANY different criteria
Clustering is algorithmic, not algebraic (i.e., it is a procedure, or set of rules for connecting data)
Clustering: UPGMA
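As an illustration of clustering as a procedure, here is a minimal sketch of UPGMA (average linkage), assuming NumPy and a small made-up distance matrix. The function `upgma`, the labels, and the distances are hypothetical; real analyses would normally use an established clustering library.

```python
import numpy as np

def upgma(D, labels):
    """Cluster by UPGMA; return a nested-tuple tree and the merge distances."""
    D = np.asarray(D, dtype=float)
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}   # id -> (subtree, size)
    dist = {(i, j): D[i, j] for i in range(n) for j in range(i + 1, n)}
    next_id, merges = n, []
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        merges.append(float(d))
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted mean of the two old distances (the UPGMA rule).
        new_dist = {}
        for c in clusters:
            dac = dist[(min(a, c), max(a, c))]
            dbc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merges

# Hypothetical distances among four specimens.
D = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
tree, merges = upgma(D, ["A", "B", "C", "D"])
print(tree, merges)
```

Each step joins the two closest clusters, so the result depends entirely on the distance measure and the joining rule chosen.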
Conclusions: exploratory methods
Useful tools for summarizing shape variation
Help you understand your data through visualizing variation (both ordination plots and cluster diagrams)
Help describe relationships among specimens in terms of overall similarity
Confirmatory data analysis
Investigate data using shape variables (Y-matrix) and other (independent) variables (X-matrix)
Test for patterns of shape variation
Independent variables determine type of statistical test
Types of independent variables
Categorical: variables delineating groups of specimens (e.g., male/female, species, etc.)
Continuous: variables on a continuous scale (e.g., size, moisture, age, etc.)
Different statistical methods for each
Some statistical tests
Categorical: shape differences among groups (MANOVA)
Continuous: relationship of variables and shape (Mult. Regression)
Continuous: association of variables and shape (2B-PLS, 2-Block Partial Least Squares)
MANOVA and multivariate regression are both GLM statistics (General Linear Models)
Group differences: MANOVA
Is there a difference in shape between groups?
Multivariate generalization of ANOVA
Compares variation within groups to variation between groups
Significant MANOVA: group means are different in shape
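The comparison of within-group to total variation can be illustrated with Wilks' Lambda, the statistic reported in the tables that follow. This is a minimal sketch assuming NumPy and simulated data; `wilks_lambda` and the example groups are hypothetical, and the conversion of Lambda to an F value and probability is left out.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' Lambda = det(W) / det(T) for a one-way design."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    dev_total = Y - Y.mean(axis=0)
    T = dev_total.T @ dev_total              # total sums of squares and cross-products
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Yg = Y[groups == g]
        dev = Yg - Yg.mean(axis=0)
        W += dev.T @ dev                     # within-group sums of squares and cross-products
    return np.linalg.det(W) / np.linalg.det(T)

# Hypothetical data: 2 groups of 20 specimens, 3 shape variables.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1], 20)
Y_same = rng.normal(size=(40, 3))            # no built-in group difference
Y_shift = Y_same.copy()
Y_shift[groups == 1] += 2.0                  # shift the mean of group 1

lam_same = wilks_lambda(Y_same, groups)
lam_shift = wilks_lambda(Y_shift, groups)
print(lam_same, lam_shift)
```

Lambda near 1 means the groups account for little of the total variation; values near 0 mean the group means are well separated relative to within-group scatter.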
RW1-RW30, Utah chub: MANOVA
Source: Sex, Loc, Sex X loc, IL/SL, Size
Wilks' Lambda  0.61907356  1.83  30   89      0.0159
Wilks' Lambda  0.75516916  0.96  30   89      0.5318
Wilks' Lambda  0.10138762  1.40  180  533.33  0.0020
Wilks' Lambda  0.00308619  3.26  240  706.35
MANOVA: post hoc tests
Pairwise comparisons using Generalized Mahalanobis Distance (D² or D)
Convert D² → T² → F to test
For experiment-wise error rate, adjust using Bonferroni: α_exp = α / # comparisons
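The chain from Mahalanobis D² to Hotelling's T² to F can be written out for a single pair of groups. The sketch below assumes NumPy, the standard two-sample formulas with a covariance matrix pooled over the two groups being compared, and simulated data; the function names are hypothetical. In a full MANOVA post hoc test the covariance would typically be pooled over all groups, which changes the degrees of freedom.

```python
import numpy as np

def mahalanobis_to_f(Y1, Y2):
    """Two-group D^2 -> T^2 -> F, pooling covariance over the two groups."""
    n1, p = Y1.shape
    n2 = Y2.shape[0]
    diff = Y1.mean(axis=0) - Y2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(Y1, rowvar=False))
    S2 = np.atleast_2d(np.cov(Y2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    D2 = float(diff @ np.linalg.solve(S, diff))           # Mahalanobis D^2
    T2 = (n1 * n2) / (n1 + n2) * D2                       # Hotelling's T^2
    df1, df2 = p, n1 + n2 - p - 1
    F = df2 / (p * (n1 + n2 - 2)) * T2                    # F with (df1, df2)
    return D2, T2, F, (df1, df2)

def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha when all pairs of groups are compared."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons

# Hypothetical data: two groups, 4 shape variables.
rng = np.random.default_rng(3)
Y1 = rng.normal(size=(25, 4))
Y2 = rng.normal(size=(30, 4)) + 0.8
D2, T2, F, df = mahalanobis_to_f(Y1, Y2)
print(D2, T2, F, df, bonferroni_alpha(0.05, 4))
```

With four groups there are six pairwise comparisons, so each would be tested at 0.05 / 6 to hold the experiment-wise rate at 0.05.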
Discriminant analysis: CVA & DFA
Combination of MANOVA and PCA
Tests for group differences (MANOVA)
PCA of among-group variation relative to within-group variation
Suggests which groups differ on which variables
Can classify specimens to groups
Special case: 2 groups = discriminant function analysis (DFA)
DFA/CVA: post-hoc tests
For DFA/CVA, compare difference among groups using Generalized Mahalanobis Distance (D²)
Mahalanobis D² is the logical choice because CVA/DFA is MANOVA, and the PCA is relative to within-group variability (i.e., VCV standardized)
Convert D² → T² → F to perform statistical test
Experiment-wise error rate adjusted as before (i.e., adjusted α)
Continuous variation: regression
Is there a relationship between shape and some other variable?
Multivariate regression of shape on continuous variable
Significant regression implies shape changes as a function of other variable (e.g., size)
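A multivariate regression of shape on one continuous variable can be sketched with ordinary least squares applied to all shape variables at once. The example assumes NumPy and simulated data; `multivariate_regression`, the "size" variable, and the slopes are hypothetical, and no significance test is included.

```python
import numpy as np

def multivariate_regression(Y, x):
    """Regress every column of Y (shape variables) on x (e.g., size)."""
    X = np.column_stack([np.ones(len(x)), x])      # intercept plus predictor
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # B[0] intercepts, B[1] slopes
    fitted = X @ B
    resid = Y - fitted
    Yc = Y - Y.mean(axis=0)
    r2 = 1 - (resid ** 2).sum() / (Yc ** 2).sum()  # share of total shape variance explained
    return B, fitted, r2

# Hypothetical data: 50 specimens, 3 shape variables that change with size.
rng = np.random.default_rng(4)
size = rng.uniform(1.0, 5.0, 50)
true_slope = np.array([0.5, -0.2, 0.1])
Y = np.outer(size, true_slope) + rng.normal(scale=0.05, size=(50, 3))

B, fitted, r2 = multivariate_regression(Y, size)
print(B[1], r2)
```

The row of slopes `B[1]` is the vector of shape change per unit of the predictor, which is what gets visualized as a deformation along the size axis.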
Example of shape on size in mountain sucker
Multivariate tests of significance:
Statistic                Value       Fs      df1  df2    Prob
Wilks' Lambda            0.34356565  22.822  36   430.0  3.580E-078
Pillai's trace           0.65643435  22.822  36   430.0  3.580E-078
Hotelling-Lawley trace   1.91065190  22.822  36   430.0  3.580E-078
Roy's maximum root       1.91065190  22.822  36   430.0  3.580E-078

Test that kth root and those that follow are zero:
k  U           Fs      df1  df2    Prob
1  0.34356565  22.822  36   430.0  3.580E-078
Continuous variation: association (2B-PLS)
Is there an association between shape and some other set of variables (not causal)?
Find pairs of linear combinations for X & Y that maximize the covariation between data sets
Linear combinations are constrained to be orthogonal within each set (like PC axes) but NOT between data sets
Calculations less complicated for 2B-PLS (because fewer mathematical constraints)
Analogous to multivariate correlation
2B-PLS is called SINGULAR WARPS when shape is one or more of the data sets (Bookstein et al., 2003: J. of Hum. Evol.)
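Finding paired linear combinations that maximize covariation is commonly done with a singular value decomposition of the cross-covariance matrix between the two blocks. The sketch below takes that approach, assuming NumPy and simulated data; `two_block_pls` and the variable blocks are hypothetical.

```python
import numpy as np

def two_block_pls(X, Y):
    """2B-PLS via SVD of the cross-covariance matrix between blocks X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R12 = Xc.T @ Yc / (len(X) - 1)                 # covariances between the two blocks
    U, s, Vt = np.linalg.svd(R12, full_matrices=False)
    x_scores = Xc @ U                              # specimen scores for block X
    y_scores = Yc @ Vt.T                           # specimen scores for block Y
    return U, s, Vt.T, x_scores, y_scores

# Hypothetical data: 3 environmental variables (X) and 5 shape variables (Y).
rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
Y = X @ rng.normal(size=(3, 5)) + rng.normal(scale=0.5, size=(60, 5))

U, s, V, x_scores, y_scores = two_block_pls(X, Y)
share = s ** 2 / (s ** 2).sum()                    # share of squared covariance per pair
print(s, share)
```

Each singular value equals the covariance between the corresponding pair of score vectors, and the columns of `U` and `V` are orthogonal within their own block only.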
Resampling methods
Methods that take many samples from the original data set in some specified way and evaluate the significance of the original based on these samples
Resampling approaches are nonparametric, because they do not depend on theoretical distributions for significance testing (they generate a distribution from the data)
Are very flexible, and can allow for complicated designs
Very useful in morphometrics, and can be used for:
Testing standard designs
Testing non-standard designs
Testing when sample sizes are small relative to # of variables
Randomization (permutation)
Proposed by Fisher (1935) for assessing significance of a 2-sample comparison (Fisher's exact test)
Fisher's exact test: a total enumeration of possible pairings of data
Randomization can be used to assess the significance of almost any test statistic
Protocol
Calculate observed statistic (e.g., T-statistic): Eobs
Reorder data set (i.e., randomly shuffle data) and recalculate statistic: Erand
Repeat many times to generate distribution of statistic
Percentage of Erand more extreme than Eobs is significance level
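The protocol above can be sketched for a two-group comparison of shape. This is a minimal illustration assuming NumPy and simulated data; the statistic used here (Euclidean distance between group mean shapes), the function names, and the convention of counting the observed value as one of the permutations are choices made for this example, not taken from the slides.

```python
import numpy as np

def mean_difference(Y, groups):
    """Euclidean distance between the mean shapes of two groups."""
    a, b = np.unique(groups)
    return np.linalg.norm(Y[groups == a].mean(axis=0) - Y[groups == b].mean(axis=0))

def permutation_test(Y, groups, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    e_obs = mean_difference(Y, groups)                    # step 1: observed statistic
    e_rand = np.array([
        mean_difference(Y, rng.permutation(groups))       # step 2: shuffle Y on groups
        for _ in range(n_perm)                            # step 3: repeat many times
    ])
    # Step 4: proportion of values at least as extreme as the observed one,
    # with the observed value included as one member of the distribution.
    p = (np.sum(e_rand >= e_obs) + 1) / (n_perm + 1)
    return e_obs, p

# Hypothetical data: 2 groups of 15 specimens, 2 shape variables.
rng = np.random.default_rng(6)
groups = np.repeat([0, 1], 15)
Y = rng.normal(size=(30, 2))
Y[groups == 1] += 3.0                                     # strong group difference

e_obs, p = permutation_test(Y, groups)
print(e_obs, p)
```

For regression or correlation designs the same loop applies, with the rows of Y shuffled relative to X instead of shuffling group labels.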
Randomization: comments
Randomization is an EXTREMELY useful and flexible technique
How and what to resample depends upon data and hypothesis
Regression and correlation: shuffle Y vs. X
Group comparison (e.g., ANOVA): shuffle Y on groups
Some tests (e.g., t-test) may depend on direction (1-tailed vs. 2-tailed)
Also useful when no theoretical distribution exists for the statistic, or when the design is non-standard
This is frequently the case in E&E studies
Step four: Graphical depiction of results
Strength of landmark-based TPS approach
Can view deformation of TPS grid among groups or with continuous variable
Superimposition
Effect of relative intestinal length: measure of trophic level
Long IL/SL: 3.0
Short IL/SL: 0.72
Effect of gradient on shape in mountain sucker
Low
High