Cajo J. F. ter Braak
Biometris, Wageningen University and Research Centre
www.biometris.wur.nl/UK/Staff/Cajo+ter+Braak
History of canonical correspondence analysis in ecology
correspondence analysis without chi-square and
the barycentric principle as ecological niche model
Biometris: quantitative methods in life and earth sciences
25 years, 1986-2011
History of canonical correspondence analysis in ecology
• Introduction: papers 1986-88 and citations; example data, eigen equation and diagram
• History of CA in ecology: theory, WA method; from WA to CA, algorithm; transition formulas, but why optimal?
• History of CCA in ecology: my prehistory (absences don’t count, 1-5); my history (1-3)
• Theory of CCA: from WA to CA to CCA, algorithm; eigen equation and its origin; my CCA derivation (1-2); duality diagram; CCA, triplets and biplots
• Ordination diagram: description, data and example; CCA in R
• CCA, RC model and ideal point discriminant analysis; later landmarks; summary of CA-related methods in ecology; open questions
Canonical Correspondence Analysis (CCA) 25 years
• New multivariate technique to relate community composition to known variation in the environment
• Extends correspondence analysis (CA)
• Two steps (1. CA, 2. interpret axes) → one step (CCA)
• Ordination diagram: the pattern of community variation that can be explained best by the known environment
CCA = (M)CA with external linear restrictions on the row points
from the abstract:
1986
duality diagram of CCA (ACC)
1987
barycentric principle
Citations, Canoco and usage
3000+ citations to ter Braak (1986, 1987, 1995) on CCA
[Figure: citations per year (0-300), 1996-2008, by field: vegetation, plant ecology, animal ecology, freshwater ecology, marine ecology, soil ecology, mycology, other]
Spider abundance Y (n × m) and environment Z (n × p)
n = 28 sites (pitfalls), m = 12 spider species, p = 6 environmental variables
Data: CtB, Ecology 1986 (discretized data; row/column order from CCA)
[Tables: Y^T (species × sites) and Z^T (env. var. × sites)]
Canonical Correspondence Analysis (CCA)? Eigenvector method to relate two data matrices Y and Z (Y ~ Z)
• Y with non-negative values $\{y_{ik}\}$, $n \times m$ (e.g. abundances of m species in n sites)
• Z quantitative and/or nominal predictor matrix $\{z_{ij}\}$, $n \times p$ (on p environmental variables)
Eigen equation (in the appendix of CtB, Ecology 1986):
$(Z^T Y D_c^{-1} Y^T Z - \lambda Z^T D_r Z)\, b = 0$
with $D_c = \mathrm{diag}(y_{+1}, \ldots, y_{+m})$, $D_r = \mathrm{diag}(y_{1+}, \ldots, y_{n+})$
Looks like canonical correlation analysis, but with $D_c$ and $D_r$ ...?
Special case: $Z = I_{n \times n}$ or an indicator matrix → correspondence analysis
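The special case can be checked in one step. The following is a sketch that uses only the symbols defined on this slide: with Z = I the coefficient vector b coincides with the site scores x, and the eigen equation reduces to

```latex
(Y D_c^{-1} Y^T - \lambda D_r)\, x = 0
\quad\Longleftrightarrow\quad
\lambda\, x = D_r^{-1} Y \, D_c^{-1} Y^T x .
```

Reading the right-hand side from right to left, $D_c^{-1} Y^T x$ is the vector of weighted averages of the site scores per species, and $D_r^{-1} Y$ then takes weighted averages of those species scores per site. One round trip of two-way weighted averaging reproduces x up to the factor λ, which is the eigen equation of CA.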
CCA ordination diagram (factorial plane), CtB, Ecology 1986
• Projection of the species points on the BARE SAND arrow shows the approximate ranking of the species centroids along this variable
• Weighted least squares biplot {u, c*} of $D_c^{-1} Y^T Z$ = {weighted averages of species with respect to the environmental variables}
• Joint plot {x*, u} of Y as in CA
[Figure legend: $u_k$ species, $x_i^*$ sites, c* environmental variables]
History of CA in ecology builds on (1)
Theory:
• “A plant species does not grow when it is either too wet or too dry”
• Liebig’s law: a species requires a minimum amount of a resource (e.g. N); from agriculture
• Shelford’s law of tolerance (1919): but it also does not tolerate more than a certain maximum
• Ecological niche: region where the species actually grows
• Niches vary among species
[Figure: preference model, $y_{ik}$ against resource]
History of CA in ecology builds on (2)
Method of weighted averaging (WA):
• to obtain the preference or indicator value of a species with respect to a physical gradient, e.g. moisture: WA of the moisture values at the sites where the species occurs
• to estimate the moisture value at a site: WA of the indicator values of the species that occur there
Gause (1930), Ellenberg (1948), Whittaker (1948)
Idea: iterate to replace a manifest gradient by the best latent one
→ Reciprocal averaging (Hill 1973, 1974) == CA
“Le principe barycentrique” both ways
Independently: Roux & Roux, RSA, 1967
From weighted averaging (WA) to CA (1)
• WA: $u_k = \sum_i y_{ik}\,\mathrm{moisture}_i / \sum_i y_{ik}$, with $y_{ik} \ge 0$ (i indexes sites, k species)
$u_k$ = centroid, the weighted average with respect to moisture, the “optimum”
For presence/absence data $y_{ik}$ is 1 or 0.
Example: if species k is found at sites with moisture values 20, 30 and 40 and nowhere else, then
$u_k = (1 \cdot 20 + 1 \cdot 30 + 1 \cdot 40)/(1+1+1) = 90/3 = 30$
[Figure: abundance $y_{ik}$ against moisture]
Reverse (calibration): weighted averaging of species optima
From training data or the literature, the ‘optima’ $\{u_k\}$ of 200 species with respect to a moisture scale [0-100] are known.
Then an estimate of the moisture value at a site is obtained from the species composition at the site by the weighted average
$x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$ (the site score is (proportional to) the WA of the species scores),
where the $y_{ik} \ge 0$ are the abundances (amounts) of these species.
Example: let site i contain the species with optima 75, 80 and 85. Then its moisture value is estimated as $x_i = (1 \cdot 75 + 1 \cdot 80 + 1 \cdot 85)/(1+1+1) = 80$ (the average of the optima of the species present).
If the abundances (amounts) of these species are 4, 1 and 1, then
$x_i = (4 \cdot 75 + 1 \cdot 80 + 1 \cdot 85)/(4+1+1) = 465/6 = 77.5$ (the weighted average of the optima of the species present).
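The two worked examples (species optimum and site calibration) are the same operation applied in opposite directions. A minimal sketch, in which the function name `weighted_average` is only an illustrative choice:

```python
def weighted_average(values, weights):
    """Return sum(w * v) / sum(w); entries with weight 0 (absences) do not count."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one positive weight is required")
    return sum(w * v for v, w in zip(values, weights)) / total


# Species optimum: moisture values at the sites where species k is present.
u_k = weighted_average([20, 30, 40], [1, 1, 1])

# Calibration: species optima at site i, first unweighted, then with abundances 4, 1, 1.
x_presence = weighted_average([75, 80, 85], [1, 1, 1])
x_abundance = weighted_average([75, 80, 85], [4, 1, 1])

print(u_k, x_presence, x_abundance)  # 30.0 80.0 77.5
```

Sites (or species) with zero abundance drop out of both sums, which is the sense in which absences do not count in weighted averaging.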
Algorithm CA (reciprocal averaging; Hill 1973)
• Start with arbitrary site scores $\{x_i\}$ with zero mean
• Calculate new species scores = WA of site scores: $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$
• Calculate new site scores = WA of species scores: $x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$
• Remove arbitrariness in scaling by standardizing the site scores
• Stop on convergence, i.e. when the site scores stay the same after a cycle
WA = weighted averaging; RA = reciprocal averaging
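The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the DECORANA or CANOCO code: the function name, the starting scores 0, 1, 2, ... and the tolerance are all choices made here, and the table must have no empty rows or columns.

```python
import math


def reciprocal_averaging(Y, n_iter=1000, tol=1e-12):
    """Two-way weighted averaging on a table Y (sites x species, y_ik >= 0).

    Returns (site scores x, species scores u, eigenvalue estimate lam).
    """
    n, m = len(Y), len(Y[0])
    row = [sum(Y[i]) for i in range(n)]                       # y_i+
    col = [sum(Y[i][k] for i in range(n)) for k in range(m)]  # y_+k
    tot = sum(row)                                            # y_++

    def standardize(x):
        # Centre and scale with site weights y_i+ / y_++; also return the old spread.
        mean = sum(r * v for r, v in zip(row, x)) / tot
        x = [v - mean for v in x]
        sd = math.sqrt(sum(r * v * v for r, v in zip(row, x)) / tot)
        return [v / sd for v in x], sd

    x, _ = standardize([float(i) for i in range(n)])          # arbitrary start
    lam = 0.0
    for _ in range(n_iter):
        # Species scores = WA of site scores; site scores = WA of species scores.
        u = [sum(Y[i][k] * x[i] for i in range(n)) / col[k] for k in range(m)]
        x_new = [sum(Y[i][k] * u[k] for k in range(m)) / row[i] for i in range(n)]
        # The shrinkage removed by standardizing estimates the eigenvalue.
        x_new, lam = standardize(x_new)
        done = max(abs(a - b) for a, b in zip(x, x_new)) < tol
        x = x_new
        if done:
            break
    u = [sum(Y[i][k] * x[i] for i in range(n)) / col[k] for k in range(m)]
    return x, u, lam


Y = [[3, 1, 0],
     [1, 2, 1],
     [0, 1, 3]]
x, u, lam = reciprocal_averaging(Y)
print(round(lam, 4))  # 0.5625 for this small symmetric table
```

The site scores are standardized with weights $y_{i+}/y_{++}$, so the amount by which one averaging cycle shrinks them is the eigenvalue of the first axis. Later axes would need an extra orthogonalization step that is not shown.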
Correspondence analysis == two-way weighted averaging (Hill 1973)
→ Transition formulae:
$\lambda u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$ (the species score is (proportional to) the WA of the site scores)
$x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$ (the site score is (proportional to) the WA of the species scores)
λ = eigenvalue (between 0 and 1) = scaling of the axis
But what does it optimize?
For comparison, principal components (PCA) == two-way weighted summation:
$\lambda b_k = \sum_i y_{ik} x_i$
$x_i = \sum_k y_{ik} b_k$
eigenvalue > 0 (proportional to the variance explained)
PCA as reciprocal linear regression:
$b_k = \sum_i y_{ik} x_i / \sum_i x_i^2$
$x_i = \sum_k y_{ik} b_k / \sum_k b_k^2$
But what does it optimize?
• Seriation: recovery of a Petrie matrix (consecutive 1’s)
• Relation with canonical correlation: maximum correlation of row and column quantifications
• Finds block structures in two-way data → cluster analysis (TWINSPAN), spectral clustering
• Also bad features: sensitivity to rare species; arch effect (Guttman effect)
Hill 1974, Heiser 1981
[Figure: truth versus CA]
DECORANA, 1979; Hill & Gauch 1980
How was CCA derived? My prehistory (1979-1984)
• Benzécri’s 1973 L’analyse des données I and II, with Hans
• MSc Newcastle (UK, ’79-80): learned ML equations from R.L. Plackett
• Contacts with Colin Prentice (NMDS) and Mark Hill (CA/RA/DECORANA)
• Mark did not like my PCA biplot and diversity paper (Ecology ’83): ecology is not linear... Differences in niche location create diversity! See DECORANA
• Second Gifi course, Leiden (’81): J. de Leeuw / W. Heiser
• From ’81, for ecologists, I studied properties of
  - weighted averaging vs ML in ecological niche models (Gaussian model) (ter Braak, Vegetatio ’86; Math. Biosc. ’86)
  - reciprocal averaging (CA) vs ML (ter Braak, Biometrics ’85)
• When might WA be (nearly) as good as ML?
Absences don’t count in... weighted averaging
“Telephone poles have an optimum of pH of 5.5”
h(x | y>0) = probability density function of x given species presence
f(y>0 | x) = occurrence probability given x
g(x) = (prior) pdf of x
Bayes’ theorem: $h(x \mid y>0) \propto g(x)\, f(y>0 \mid x)$
g(x) is the same for all species, so perhaps the distinction does not matter in MVA...
The response function in logistic regression uses f(y>0 | x); WA/CA uses h(x | y>0)
Greig-Smith 1983
Absences don’t count in... estimating optima
Regression and WA (ter Braak, Vegetatio 1986):
a) response functions for species A and B
b) x ~ uniform or even, g(x) ~ 1
c) but for this g(x): reversal of optima
So be warned, use GLM ...
[Figure: panels a), b) and c); WA for g(x)]
Absences don’t count in... estimating site conditions
Calibration and WA (ter Braak, Math. Biosc. 1986):
Find a model such that the weighted average $x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$ is as efficient as maximum likelihood (ML): the species packing model
$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2 / t_k^2\}$; $y_{ik} \sim \mathrm{Poisson}(\mu_{ik})$
• $u_k$ uniform over a large interval A
• $c_k$ constant or independent of $u_k$
• $t_k$ constant
Absences don’t count: CA vs Gaussian ordination (ter Braak, Biometrics 1985)
The transition formulas approximate the ML equations of the equi-width Gaussian response model (unimodal model) with latent predictor x:
$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2 / t_k^2\}$; $y_{ik} \sim \mathrm{Poisson}(\mu_{ik})$
• $u_k$ uniform over a large interval A
• $c_k$ constant or independent of $u_k$
• $x_i$ uniform over a large interval B in A
• $t_k$ constant
How was CCA derived? My history (1), 1984-5
• Preparing, with Colin Prentice in Uppsala (1984), a Theory of Gradient Analysis: search for “something like canonical correlation for niche models”
• Idea (Oct 1984): put linear constraints on the scores in Gaussian ordination and approximate the ML equations, as I did for CA
• 1985: the linear combination of Z that best separates the species niches
• Niche: region in x-space in which a species occurs ($y_{ik} > 0$)
How was CCA derived? My history (2), 1985 & Canoco
• Quick first draft → report (1985) sent around. Jan de Leeuw commented quickly:
  - CCA is a relatively simple special case of HOMALS (or, if you like, CANALS). [YES!]
  - This suggests directly which geometry Gifi would use. Nothing bilinear, nothing biplot, just centres of gravity [YES, for nominal predictors]
• Extending DECORANA (Hill, 1979) to include CCA → CANOCO v1.0 ’85, and later PCA & redundancy analysis
• Help from Onno van Tongeren and Petr Šmilauer for CANOCO v2.0 1985; v3.0 1990 (permutation testing); v4.0 1998 (1st Windows version); v4.5 2002 (2nd Windows version)
• Ambassadors: Colin Prentice, John Birks, Paul van den Brink
How was CCA derived? My history (3), 1986-France
• Partial CCA at the 1st IFCS in Aachen (Germany): Y. Escoufier stood up and ..... disliked CCA and ... invited me to Montpellier to work with J.D. Lebreton, D. Chessel and P. Sabatier, who independently invented CCA just a few months after I did.
• I tried to bridge the gap: lectured in English with only French references, learned duality diagrams
• Later found: R.H. Green, Ecology 1971, 1974: multi-group discriminant analysis, a precursor of CCA (lacking environmental arrows)
• Interest was lost at the same time that CA got popular via Mark Hill (1973-81)!
• Let’s celebrate now 40 years of CCA!
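From weighted averaging (WA) to CA (2)
• WA: $u_k = \sum_i y_{ik}\,\mathrm{moisture}_i / \sum_i y_{ik}$, with $y_{ik} \ge 0$ (i indexes sites, k species)
$u_k$ = centroid, weighted average with respect to the environmental variable, the “optimum”
How well does moisture explain the data?
δ = dispersion of the species scores: $\delta = \sum_k y_{+k} u_k^2 / y_{++}$, with respect to the standardized environmental variable (δ = B/T)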
From weighted averaging (WA) to CA (3)
• WA: $u_k = \sum_i y_{ik}\,\mathrm{moisture}_i / \sum_i y_{ik}$
• CA: $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$, with $y_{ik} \ge 0$ (i indexes sites, k species)
$\{x_i\}$ = site scores that best separate the species curves, as measured by the dispersion of the $\{u_k\}$
$u_k$ = species score, centroid, weighted average, “optimum”
Arch effect: folded first axis has also high δ2
From weighted averaging (WA) to CCA (4)
• WA: $u_k = \sum_i y_{ik}\,\mathrm{moisture}_i / \sum_i y_{ik}$
• CCA: $x_i = \sum_j c_j z_{ij}$ and $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$, with $y_{ik} \ge 0$ (i indexes sites, k species)
$\{x_i\}$ = site scores that are the linear combination of the environmental variables $\{z_j\}$ that best separates the species curves, as measured by the dispersion of the $\{u_k\}$
$u_k$ = species score, centroid, weighted average, “optimum”
In multigroup discriminant analysis: max B/T
*Site weights = $\{y_{i+}\}$, but why?
Algorithm (C)CA
• Start with arbitrary site scores with zero mean
• Calculate new species scores = WA of site scores
• Calculate new site scores = WA of species scores
• CCA only: constrain the site scores by a weighted* multiple regression on the environmental variables; take the fitted values as new scores
• Remove arbitrariness in scaling by standardizing the site scores
• Stop on convergence, i.e. when the site scores stay the same after a cycle
WA = weighted averaging; (C)CA = (canonical) correspondence analysis
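The extra regression step can be sketched as follows for the first axis only. This is a hedged illustration, not the CANOCO implementation: the helper names `solve` and `cca_first_axis` are introduced here, the environmental variables are centred with site weights $y_{i+}$ so that no intercept is needed, and Z is assumed to have linearly independent, non-constant columns.

```python
import math


def solve(A, rhs):
    """Solve the small linear system A b = rhs (Gaussian elimination, partial pivoting)."""
    p = len(A)
    M = [A[r][:] + [rhs[r]] for r in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (M[r][p] - sum(M[r][j] * b[j] for j in range(r + 1, p))) / M[r][r]
    return b


def cca_first_axis(Y, Z, n_iter=1000, tol=1e-12):
    """Reciprocal averaging with a weighted regression step (first CCA axis).

    Y: sites x species table (y_ik >= 0); Z: sites x p environmental variables.
    Returns (x = Zb, x_star = WA of species scores, u, b, lam).
    """
    n, m, p = len(Y), len(Y[0]), len(Z[0])
    row = [sum(Y[i]) for i in range(n)]                       # site weights y_i+
    col = [sum(Y[i][k] for i in range(n)) for k in range(m)]  # y_+k
    tot = sum(row)
    # Centre each environmental variable with the site weights.
    means = [sum(row[i] * Z[i][j] for i in range(n)) / tot for j in range(p)]
    Zc = [[Z[i][j] - means[j] for j in range(p)] for i in range(n)]
    ZtRZ = [[sum(row[i] * Zc[i][a] * Zc[i][c] for i in range(n)) for c in range(p)]
            for a in range(p)]

    def scale(x):
        sd = math.sqrt(sum(r * v * v for r, v in zip(row, x)) / tot)
        return [v / sd for v in x], sd

    x, _ = scale([Zc[i][0] for i in range(n)])   # arbitrary start inside span(Z)
    for _ in range(n_iter):
        u = [sum(Y[i][k] * x[i] for i in range(n)) / col[k] for k in range(m)]
        x_star = [sum(Y[i][k] * u[k] for k in range(m)) / row[i] for i in range(n)]
        # Weighted regression of x_star on Z: b = (Z'RZ)^-1 Z'R x_star.
        ZtRx = [sum(row[i] * Zc[i][j] * x_star[i] for i in range(n)) for j in range(p)]
        b = solve(ZtRZ, ZtRx)
        fitted = [sum(Zc[i][j] * b[j] for j in range(p)) for i in range(n)]
        x_new, lam = scale(fitted)               # shrinkage estimates the eigenvalue
        b = [v / lam for v in b]                 # rescale so that x_new = Zc b
        done = max(abs(s - t) for s, t in zip(x, x_new)) < tol
        x = x_new
        if done:
            break
    return x, x_star, u, b, lam


Y = [[3, 1, 0],
     [1, 2, 1],
     [0, 1, 3]]
Z = [[1.0], [2.0], [4.0]]
x, x_star, u, b, lam = cca_first_axis(Y, Z)
print(round(lam, 4))  # 0.5446 for this toy example
```

With a single environmental variable the site scores are fixed up to scaling, so the loop stops after one cycle and the eigenvalue is simply the dispersion of the species centroids along that variable. It cannot exceed the first CA eigenvalue of the same table.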
Canonical Correspondence Analysis (CCA)? Eigenvector method to relate two data matrices Y and Z (Y ~ Z)
• Y with non-negative values $\{y_{ik}\}$, $n \times m$ (e.g. abundances of m species in n sites)
• Z quantitative and/or nominal predictor matrix $\{z_{ij}\}$, $n \times p$ (on p environmental variables)
Eigen equation (in the appendix of CtB, Ecology 1986):
$(Z^T Y D_c^{-1} Y^T Z - \lambda Z^T D_r Z)\, b = 0$
with $D_c = \mathrm{diag}(y_{+1}, \ldots, y_{+m})$, $D_r = \mathrm{diag}(y_{1+}, \ldots, y_{n+})$
Looks like canonical correlation analysis, but with $D_c$ and $D_r$ ...?
Special case: $Z = I_{n \times n}$ or an indicator matrix → correspondence analysis
Origin of CCA: why is this equation appealing? (ter Braak 1986, 1987)
The transition formulas approximate the ML equations of the equi-width Gaussian response model with latent predictor x = Zb (b unknown):
$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2\}$; $y_{ik} \sim \mathrm{Poisson}(\mu_{ik})$
• $u_k$ uniform over a large interval A
• $c_k$ constant or independent of $u_k$
• $x_i$ uniform over a large interval B in A
As in CA, but now with linear restrictions
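CCA derivation (1): ML equations of the equi-width Gaussian response model with latent predictor (CtB, Ecology 1986; CtB, Biometrics 1985)
$x_i = \sum_j z_{ij} b_j$, with $\{b_j\}$ to be estimated
$u_k = \sum_i y_{ik} x_i / y_{+k} - \big[\sum_i (x_i - u_k)\mu_{ik} / y_{+k}\big]$ (A.1)
$\sum_i z_{ij} \big[\sum_k y_{ik}(x_i - u_k)\big] = \sum_i \big[\sum_k (x_i - u_k)\mu_{ik}\big] z_{ij}$ (A.2)
Under the conditions
$\sum_k (x_i - u_k)\mu_{ik} \approx 0$ and $\sum_i (x_i - u_k)\mu_{ik} \approx -\lambda^* u_k y_{+k}$
we obtain
$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$ (with $\lambda = 1 - \lambda^*$)
$\sum_i z_{ij} \big[\sum_k y_{ik}(x_i - u_k)\big] = 0$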
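CCA derivation (2): transition formulas of CCA (CtB, Ecology 1986)
From:
$x_i = \sum_j z_{ij} b_j$
$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$
$\sum_i z_{ij} \big[\sum_k y_{ik}(x_i - u_k)\big] = 0$
to the transition formulas:
$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$
$x_i^* = \sum_k y_{ik} u_k / y_{i+}$
$b = (Z^T R Z)^{-1} Z^T R x^*$ (with $R = D_r$)
$x = Zb$
In matrix notation:
$\lambda u = D_c^{-1} Y^T x$
$x^* = D_r^{-1} Y u$
$b = (Z^T D_r Z)^{-1} Z^T D_r x^*$
$x = Zb$
so that
$b = (Z^T D_r Z)^{-1} Z^T D_r x^* = (Z^T D_r Z)^{-1} Z^T Y u = (Z^T D_r Z)^{-1} Z^T Y D_c^{-1} Y^T x\, \lambda^{-1} = (Z^T D_r Z)^{-1} Z^T Y D_c^{-1} Y^T Z b\, \lambda^{-1}$
that is, $(Z^T Y D_c^{-1} Y^T Z - \lambda Z^T D_r Z)\, b = 0$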
Duality diagrams and transition formulas
[Figures: duality diagrams]
CCA, PCA-triplets and biplots: fitted contingency ratios; fitted table of WA’s; low rank regression coefficients
Ordination diagram (factorial plane), CtB, Ecology 1986
• Two sets of site (row) scores:
  - x, derived from Z (environmental data)
  - x*, derived from Y (species data) [chosen, to be closer to Y]
• Plot x*, u and the interset correlations c* = cor(Z, x*) (instead of b)
• CA type of joint plot {x*, u} of Y
• WLS biplot {u, c*} of $D_c^{-1} Y^T Z$ = {weighted averages of species with respect to the environmental variables}
• For nominal predictors: class point at the centroid of x*; CA type plot, but no example given except in the Canoco manual
Dune meadow data (CtB, Vegetatio 1987)
n = 20 sites (meadows), m = 30 plant species, p = 8 environmental variables
• Predictors: mix of quantitative (3) and nominal (1 with 4 classes: BF, SF, NM, HF) → class centroids (squares)
• pCCA uses all the power of regression: factors, interactions between predictors
[Tables: Y^T and Z^T; discretized data, row/column order from CCA]
CCA of the dune meadow data: biplot of species’ centroids (WA) with respect to Manure
[Figure: ordination diagram, both axes from -1.0 to +1.0, showing species $u_k$, sites $x_i^*$ (numbered 1-20), environmental variables c* (Moisture, A1 hor, Manure, Use) and class centroids as squares (NM, BF, HF, SF)]
Barycentric principle everywhere: the projection points of the species are the centroids of the site projections on the same environmental variable
u = WA(x), λx = WA(u)
CCA in R: packages anacor, vegan, ade4
# e.g.:
library(anacor)   # Jan de Leeuw & Patrick Mair
# CCA
res = anacor(dune, row.covariates = dune.env)
plot(res)

library(vegan)
cca(Y ~ Z)        # also with a formula interface
# e.g. factors A, B and C
cca(Y ~ A*B + Condition(C))
[Figure: joint plot of sites and species from the R output, Dimension 1 against Dimension 2]
(C)CA, RC model and ideal point discriminant analysis
Goodman’s RC model (bilinear model) $E y_{ik} = r_i c_k \exp(x_i^T u_k)$ is equivalent to the distance model $E y_{ik} = r_i^* c_k^* \exp\{-\tfrac12 d^2(x_i, u_k)\}$ with
$d^2(x_i, u_k) = (x_i - u_k)^T (x_i - u_k)$,
which is Ihm’s model B and, with x = Zb, identical to ideal point discriminant analysis.
(C)CA is an approximation to this model (ter Braak 1988, Takane ...)
Goodman showed this for small λ, ter Braak for large λ
See de Rooij, 2007 JCGS for plot scalings
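The equivalence of the bilinear and the distance form rests on expanding the squared distance. In the sketch below the starred row and column terms $r_i^*$ and $c_k^*$ are notation introduced here to absorb the squared norms:

```latex
x_i^T u_k = -\tfrac12\, d^2(x_i, u_k) + \tfrac12\, x_i^T x_i + \tfrac12\, u_k^T u_k
\quad\Longrightarrow\quad
r_i c_k \exp(x_i^T u_k) = r_i^* c_k^* \exp\{-\tfrac12\, d^2(x_i, u_k)\},
\qquad
r_i^* = r_i \exp(\tfrac12\, x_i^T x_i),\quad
c_k^* = c_k \exp(\tfrac12\, u_k^T u_k).
```

The two forms therefore describe the same expected abundances; only the row and column main effects are redefined.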
Later landmarks
• Co-inertia analysis (Dolédec & Chessel, 1994): linear restrictions, norm on the b coefficients
• CCA-PLS (ter Braak & Verdonschot, 1995). Key: 1. pre-transform; 2. apply PLS; 3. post-transform!
• Variance partitioning (Borcard, Legendre & Drapeau, 1992)
• RLQ (Dolédec et al. 1996); doubly constrained CA (Lavorel ...)
• All (?) CA methods are obtainable from standard methods by inflating the data matrices (super indicator matrices; in ecology: unit = the individual instead of the site), or by pre- and post-transformation
  → Regularized generalized canonical correlation, Tenenhaus²
• and forward selection ...
Summary: CA-related methods in ecology
Method                        Abbr.     Response vars     Predictors
Correspondence analysis       CA        Community data    -
Canonical CA                  CCA       Community data    Environmental variables
CCA partial least squares     CCA-PLS   Community data    Many env. vars
Weighted averaging            WA        Env. var.         Community data
WA-PLS                        WA-PLS    Env. var(s)       Community data
Co-correspondence analysis    CO-CA     Community data    Community data
Co-inertia analysis, RLQ, ...
Open questions
• (C)CA is a chameleon: it sometimes shows up as a
  - linear method: reconstitution formula & biplot
  - unimodal method: relation with CVA & r-c distance plot
• In the RC model and logratio analysis: clearly both
• With arch effect: 1-d reconstitution bad.... Can Greenacre’s (2009) power transformations shed light?
• Sparse CA? Sparse CCA?
• Apply principal response curves (PRC) in a CCA context; PRC: good for display of interactions
Unfolding criterion and ideal point model (cf. Heiser 1987, Takane et al. 1991)
Min $\delta^* = \sum_{i,k} y_{ik} (x_i - u_k)^2 / y_{++}$
subject to $\sum_i y_{i+} x_i = 0$, $\sum_i (y_{i+}/y_{++})\, x_i^2 = 1$ and $x = Zb$
• the squared distance $(x_i - u_k)^2$ has meaning (unfolding)
• the data are a kind of weights and are not approximated in any formal way
See also Nishisato’s (1980) quantification of rows and columns in one-way ANOVA: his $D^2 = 1/\delta^* = B/W$
As in multigroup discriminant analysis: min W/B
Best separation of species niches, max dispersion δ (ter Braak 1987)
CA: Max! $\delta = \sum_k y_{+k} u_k^2 / y_{++}$ with
$u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$, the weighted average of the site scores $\{x_i\}$, which are $D_r$-standardized:
$\sum_i y_{i+} x_i = 0$ and $\sum_i (y_{i+}/y_{++})\, x_i^2 = 1$.
CCA: extra constraint x = Zb
δ = dispersion = weighted variance = inertia
In multigroup discriminant analysis: B/W instead of δ = B/T
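The dispersion criterion and the unfolding criterion δ* are two views of one decomposition. A short worked step, assuming $u_k$ is the weighted average of the $D_r$-standardized site scores as defined above, and using T, B and W for the total, between-species and within-species dispersion:

```latex
\underbrace{\sum_i \frac{y_{i+}}{y_{++}}\, x_i^2}_{T \,=\, 1}
\;=\;
\underbrace{\sum_k \frac{y_{+k}}{y_{++}}\, u_k^2}_{B \,=\, \delta}
\;+\;
\underbrace{\sum_{i,k} \frac{y_{ik}}{y_{++}}\, (x_i - u_k)^2}_{W \,=\, \delta^*}
```

The cross term vanishes because $\sum_i y_{ik}(x_i - u_k) = 0$ for every species k. Hence δ* = 1 - δ, so maximizing the dispersion of the species scores is the same as minimizing the abundance-weighted squared distances between sites and species.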
Gaussian regression → ordination
• Gaussian regression: $\log(E y_{ik}) = c_k - (\mathrm{moisture}_i - u_k)^2 / (2 t_k^2)$
  $\{u_k\}$ = species scores or optima
• Gaussian ordination: $\log(E y_{ik}) = c_k - (x_i - u_k)^2 / (2 t_k^2)$
  $\{x_i\}$ = site scores that best explain the species data by Gaussian curves
[Figure: abundance $y_{ik}$ against moisture; i indexes sites, k species, $y_{ik} \ge 0$]
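On the log scale the Gaussian curve is a quadratic in the gradient, so the optimum and tolerance can be recovered from the coefficients of a log-linear regression with a squared term. A minimal sketch; the function names and the coefficient names b0, b1, b2 are assumptions made for this illustration:

```python
import math


def gaussian_from_quadratic(b0, b1, b2):
    """Convert log(mu) = b0 + b1*x + b2*x**2 (b2 < 0) to (c, u, t).

    c = log of the maximum, u = optimum, t = tolerance, so that
    log(mu) = c - (x - u)**2 / (2 * t**2).
    """
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal (Gaussian) curve")
    u = -b1 / (2.0 * b2)
    t = math.sqrt(-1.0 / (2.0 * b2))
    c = b0 - b1 * b1 / (4.0 * b2)
    return c, u, t


def log_expected_abundance(x, c, u, t):
    """Gaussian response curve on the log scale."""
    return c - (x - u) ** 2 / (2.0 * t ** 2)


# Quadratic coefficients that correspond to c = 2, optimum u = 30, tolerance t = 10.
c, u, t = gaussian_from_quadratic(-2.5, 0.3, -0.005)
```

Unlike the weighted average, such a regression also uses the sites where the species is absent, which is why it does not depend on how the sites are distributed along the gradient.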
Gaussian ordination?
• Axes cannot be extracted one after the other; a joint fit is needed
  - difficult to fit more than 1 axis, but see R{VGAM}
• Axes are not nested:
  - the solution for axis 1 changes when fitting 2 axes jointly
  - this problem is common outside the eigenvalue techniques such as PCA - CCA
• Ecologists invented a simpler approximate method: reciprocal averaging alias correspondence analysis (CA)