
Cajo J. F. ter Braak

Biometris, Wageningen University and Research Centre

[email protected]

www.biometris.wur.nl/UK/Staff/Cajo+ter+Braak

History of canonical correspondence analysis in ecology

correspondence analysis without chi-square and

the barycentric principle as ecological niche model

Biometris: quantitative methods in life and earth sciences

25 years, 1986-2011


History of canonical correspondence analysis in ecology

• Introduction: papers 1986-88 and citations; example data, eigen equation and diagram

• History of CA in ecology: theory, WA method; from WA to CA, algorithm; transition formulas, but why optimal?

• History of CCA in ecology: my prehistory (absences don’t count, 1-5); my history (1-3)

• Theory of CCA: from WA to CA to CCA, algorithm; eigen equation and its origin; my CCA derivation (1-2); duality diagram; CCA, triplets and biplots

• Ordination diagram: description, data and example; CCA in R

• CCA, RC model and ideal point discriminant analysis; later landmarks; summary of CA-related methods in ecology; open questions

Canonical Correspondence Analysis (CCA) 25 years

New multivariate technique to relate community composition to known variation in the environment

• Extends Correspondence Analysis (CA)

• Two steps (1. CA, 2. interpret axes) → one step (CCA)

• Ordination diagram: the pattern of community variation that can be explained best by the known environment

CCA = (M)CA with external linear restrictions on the row points

from the abstract:

1986

duality diagram of CCA (ACC)

1987

barycentric principle

[Figure legend: vegetation, plant ecology, animal ecology, freshwater ecology, marine ecology, soil ecology, mycology, other]

Citations, Canoco and usage

3000+ citations to ter Braak (86,87,95) on CCA

[Figure: citations per year; x-axis 1996-2008, y-axis citations, 0-300]

Spider abundance $Y_{n \times m}$ and environment $Z_{n \times p}$

n = 28 sites (pitfalls), m = 12 spider species, p = 6 environmental variables

DATA: CtB, Ecology 1986

[Table: discretized data, $Y^T$ (species by sites) and $Z^T$ (env. var. by sites); row/column order from CCA]

Eigenvector method to relate two data matrices Y and Z

• $Y = \{y_{ik}\}$, $n \times m$, with non-negative values (e.g. abundances of m species in n sites)

• $Z = \{z_{ij}\}$, $n \times p$, quantitative and/or nominal predictor matrix (on p environmental variables)

Y ~ Z, eigen equation (in the appendix):

$(Z^T Y D_c^{-1} Y^T Z - \lambda Z^T D_r Z)\, b = 0$

with $D_c = \mathrm{diag}(y_{+1}, \ldots, y_{+m})$, $D_r = \mathrm{diag}(y_{1+}, \ldots, y_{n+})$

Looks like canonical correlation analysis, but $D_c$, $D_r$ ...?

Special case: $Z = I_{n \times n}$ or an indicator matrix: correspondence analysis

Canonical Correspondence Analysis (CCA)?

CtB, Ecology 1986

CCA ordination diagram (factorial plane)

CtB, Ecology 1986

Projection of the species points on the BARE SAND arrow shows the approximate ranking of the species centroids along this variable

• biplot {u, c*} of $D_c^{-1} Y^T Z$ = {weighted averages of species with respect to the environmental variables}: a weighted least squares biplot

• joint plot {x*, u} of Y as in CA

Legend: $u_k$ species, $x_i^*$ sites, $c^*$ env. var.

History of CA in ecology builds on (1)

Theory:

“a plant species does not grow when it is either too wet or too dry”

Liebig’s law: a species requires a minimum amount of a resource (e.g. N): agriculture

Shelford’s law of tolerance (1919): but it also does not tolerate more than a certain maximum

[Figure: response $y_{ik}$ against resource]

Ecological niche: region where species actually grows

Niches vary among species

preference model

History of CA in ecology builds on (2)

Method of weighted averaging (WA):

• to obtain the preference or indicator value of a species with respect to a physical gradient, e.g. moisture: WA of the moisture values at the sites where the species occurs

• to estimate the moisture value at a site: WA of the indicator values of the species that occur there

Gause (1930), Ellenberg (1948), Whittaker (1948)

Idea: iterate to replace a manifest gradient by the best latent one

→ Reciprocal averaging (Hill, 1973, 1974) == CA

“le principe barycentrique” both ways

Independently: Roux & Roux, RSA, 1967

[Figure: species response against moisture; $u_k$ = centroid, weighted average with respect to moisture, “optimum”]

From weighted averaging (WA) to CA (1)

• WA: $u_k = \sum_i y_{ik}\,\mathrm{moisture}_i / \sum_i y_{ik}$

with i = sites, k = species, $y_{ik} \ge 0$

[Figure: abundance $y_{ik}$ against moisture; $u_k$ = centroid, weighted average with respect to moisture, “optimum”]



For presence/absence data yik is 1 or 0.

Example: species k found at sites with moisture values 20, 30 and 40 and nowhere else, then

uk = (1*20+1*30+1*40)/(1+1+1)= 90/3 = 30
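The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_average` and the two extra sites with moisture 10 and 50 (where the species is absent) are assumptions added here, not part of the slides.

```python
def weighted_average(weights, values):
    """Weighted average of `values` with non-negative `weights` (abundances)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("species is absent everywhere: no weighted average")
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical gradient with five sites; the species occurs at 20, 30 and 40 only.
moisture = [10, 20, 30, 40, 50]
presence = [0, 1, 1, 1, 0]   # absences get weight 0, so they do not count

u_k = weighted_average(presence, moisture)
print(u_k)
```

The sites where the species is absent drop out of both sums, which is the "absences don't count" property discussed later in the talk.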


Reverse (calibration): weighted averaging of species optima

From training data or literature, the ‘optima’ $\{u_k\}$ of 200 species with respect to a moisture scale [0-100] are known.

Then an estimate of the moisture value at a site is obtained from the species composition at the site by the weighted average:

$x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$ (site score is (prop. to) the WA of the species scores)

where $y_{ik} \ge 0$ are the abundances (amounts) of these species.

Example: let site i contain the species with optima 75, 80 and 85. Then its moisture value is estimated as $x_i$ = (1*75+1*80+1*85)/(1+1+1) = 80 (the average of the optima of the species present).

If the abundances (amounts) of these species are 4, 1, 1 then

$x_i$ = (4*75+1*80+1*85)/(4+1+1) = 465/6 = 77.5 (the weighted average of the optima of the species present)
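The calibration step can be checked the same way. A minimal sketch, assuming a hypothetical helper `calibrate` that takes the abundances at one site and the known optima of the same species:

```python
def calibrate(abundances, optima):
    """Estimate a site's gradient value as the abundance-weighted mean of species optima."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("no species present at this site")
    return sum(y * u for y, u in zip(abundances, optima)) / total

optima = [75, 80, 85]                        # known optima of the species at the site
x_presence = calibrate([1, 1, 1], optima)    # presence/absence data
x_abundance = calibrate([4, 1, 1], optima)   # abundances 4, 1, 1
print(x_presence, x_abundance)
```

With abundances the estimate is pulled towards the optimum of the dominant species (75).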

Algorithm CA

• Start with arbitrary site scores $\{x_i\}$ with zero mean

• Calculate new species scores = WA of site scores: $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$

• Calculate new site scores = WA of species scores: $x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$

• Remove arbitrariness in scaling by standardizing the site scores

• Stop on convergence, i.e. when the site scores stay the same after a cycle

WA = Weighted Averaging; RA = Reciprocal Averaging (Hill 1973)
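The steps above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the DECORANA or Canoco code: the 5 × 4 table `Y` is made up, the site scores are centred and scaled with site weights $y_{i+}/y_{++}$, and the shrinkage of the scores in one cycle is used as the estimate of the eigenvalue.

```python
def reciprocal_averaging(Y, n_iter=1000, tol=1e-12):
    """First non-trivial CA axis of a non-negative table Y (rows = sites)."""
    n, m = len(Y), len(Y[0])
    row_tot = [sum(row) for row in Y]                              # y_i+
    col_tot = [sum(Y[i][k] for i in range(n)) for k in range(m)]   # y_+k
    grand = sum(row_tot)                                           # y_++

    def standardize(x):
        # centre and scale with site weights y_i+ / y_++
        mean = sum(w * v for w, v in zip(row_tot, x)) / grand
        x = [v - mean for v in x]
        sd = (sum(w * v * v for w, v in zip(row_tot, x)) / grand) ** 0.5
        return [v / sd for v in x], sd

    def species_scores(x):
        return [sum(Y[i][k] * x[i] for i in range(n)) / col_tot[k] for k in range(m)]

    x, _ = standardize([float(i) for i in range(n)])   # arbitrary start
    lam = 0.0
    for _ in range(n_iter):
        u = species_scores(x)                           # species = WA of sites
        x_new = [sum(Y[i][k] * u[k] for k in range(m)) / row_tot[i]
                 for i in range(n)]                     # sites = WA of species
        x_new, lam = standardize(x_new)                 # shrinkage estimates the eigenvalue
        converged = max(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x, species_scores(x), lam

# Hypothetical abundance table: 5 sites (rows) by 4 species (columns).
Y = [[5, 3, 0, 0],
     [3, 5, 1, 0],
     [1, 4, 4, 1],
     [0, 1, 5, 3],
     [0, 0, 3, 5]]
x, u, lam = reciprocal_averaging(Y)
print(round(lam, 4))
```

Centring in every cycle removes the trivial solution (constant scores, eigenvalue 1), so the iteration settles on the first real axis.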

== Two-way weighted summation

$\lambda^{1-\alpha} b_k = \sum_i y_{ik} x_i$

$\lambda^{\alpha} x_i = \sum_k y_{ik} b_k$

λ: eigenvalue > 0 (prop. to variance explained)

→ Transition formulae

Correspondence analysis == two-way weighted averaging

$\lambda^{1-\alpha} u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$ (species score is (prop. to) the WA of the site scores)

$\lambda^{\alpha} x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$ (site score is (prop. to) the WA of the species scores)

λ: eigenvalue (between 0 and 1) = scaling of the axis

But what does it optimize?

(Hill 1973)

For comparison

Principal components (PCA)

Reciprocal linear regression

$b_k = \sum_i y_{ik} x_i / \sum_i x_i^2$

$x_i = \sum_k y_{ik} b_k / \sum_k b_k^2$

But what does it optimize?

• Seriation: recovery of a Petrie matrix, consecutive 1’s

• Relation with canonical correlation: max. correlation of row and column quantification

• Finds block structures in two-way data → cluster analysis (TWINSPAN), spectral clustering

Also: bad features

• sensitivity to rare species

• arch effect (Guttman effect)

Hill 1974, Heiser 1981

[Figure: truth versus CA]

DECORANA, 1979; Hill & Gauch 1980

How was CCA derived? My prehistory (1979-1984)

• Benzécri’s 1973 L’analyse des données I and II, with Hans

• MSc Newcastle (UK, ’79-80), learned ML eqs from RL Plackett

• Contacts with Colin Prentice (NMDS) and Mark Hill (CA/RA/DECORANA). Mark did not like my PCA biplot and diversity paper (Ecology ’83): ecology is not linear... Differences in niche location create diversity! See DECORANA

• Second Gifi course, Leiden (’81): J de Leeuw / W Heiser

• From ’81, for ecologists, I studied properties of

  - Weighted Averaging vs ML in ecological niche models (Gaussian model) (ter Braak Vegetatio ’86, Math. Biosc. ’86)

  - Reciprocal Averaging (CA) vs ML (ter Braak Biometrics ’85)

• When might WA be (nearly) as good as ML?

Absences don’t count in...weighted averaging

“telephone poles have an optimum of pH of 5.5”

h(x|y>0) = probability density function of x given species presence

f(y>0|x) = occurrence probability given x

g(x) = (prior) pdf of x

Bayes’ theorem: $h(x \mid y>0) \propto g(x)\, f(y>0 \mid x)$

g(x) is the same for all species, so perhaps the distinction does not matter in MVA...

• the response function in logistic regression uses f(y>0|x)

• WA/CA uses h(x|y>0)

Greig-Smith 1983

Absences don’t count in... estimating optima

Regression and WA:

a) response functions for species A and B

b) x uniformly or evenly distributed, g(x) ∝ 1

c) but for this g(x): reversal of optima

So be warned, use GLM ...

ter Braak, Vegetatio 1986
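A small numerical sketch of such a reversal follows. The specific numbers are assumptions chosen for illustration, not the example from the Vegetatio paper: species A has optimum 4 and a wide tolerance, species B has optimum 5 and a narrow tolerance, the occurrence probability is Gaussian-shaped, and the "skewed" site distribution is g(x) ∝ exp(0.5 x). The expected WA estimate is the mean of x under h(x|y>0) ∝ g(x) f(y>0|x), computed on a fine grid.

```python
import math

def wa_optimum(u, t, g, grid):
    """Mean of x under h(x|y>0), which is proportional to g(x) * f(y>0|x)."""
    f = [math.exp(-0.5 * ((x - u) / t) ** 2) for x in grid]   # occurrence probability
    w = [g(x) * fx for x, fx in zip(grid, f)]
    return sum(x * wx for x, wx in zip(grid, w)) / sum(w)

def uniform(x):
    return 1.0

def skewed(x):
    return math.exp(0.5 * x)   # hypothetical: far more sites at high x

grid = [i * 0.01 for i in range(-2000, 4001)]   # x from -20 to 40

# Species A: optimum 4, tolerance 3. Species B: optimum 5, tolerance 0.5.
a_uniform = wa_optimum(4.0, 3.0, uniform, grid)
b_uniform = wa_optimum(5.0, 0.5, uniform, grid)
a_skewed = wa_optimum(4.0, 3.0, skewed, grid)
b_skewed = wa_optimum(5.0, 0.5, skewed, grid)

print(a_uniform < b_uniform)   # WA recovers the order of the optima
print(a_skewed > b_skewed)     # order reversed under the skewed g(x)
```

With evenly spread sites the weighted averages sit at the true optima; with the skewed g(x) the wide-tolerance species is dragged much further along the gradient than the narrow one, so the ranking flips.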

[Figure panels a), b), c): response curves and WA for g(x)]

Absences don’t count in... estimating site conditions

Calibration and WA:

Find a model such that the weighted average

$x_i = \sum_k y_{ik} u_k / \sum_k y_{ik}$

is as efficient as maximum likelihood (ML): the species packing model

$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2 / t_k^2\}$; $y_{ik} \sim \mathrm{Poi}(\mu_{ik})$

WA for:

- $u_k$ uniform over a large interval A

- $c_k$ constant or independent of $u_k$

- $t_k$ constant

ter Braak, Math. Biosc. 1986

Absences don’t count: CA vs Gaussian ordination

Transition formulas approximate the ML equations of the equi-width Gaussian response model (unimodal model) with latent predictor x

$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2 / t_k^2\}$; $y_{ik} \sim \mathrm{Poi}(\mu_{ik})$

- $x_i$ uniform over a large interval B in A

- $t_k$ constant

ter Braak Biometrics 1985

How was CCA derived? My history (1), 1984-5

• Preparing with Colin Prentice in Uppsala (1984) a Theory of Gradient Analysis: search for “something like canonical correlation for niche models”

• Idea: linear constraints on the scores in Gaussian ordination, and approximate the ML equations, as I did for CA (Oct 1984)

• 1985: linear combination of Z that best separates the species niches

• Niche: region in x-space where the species occurs ($y_{ik} > 0$)

How was CCA derived? My history (2): 1985 & Canoco

• Quick first draft → report (1985) sent around. Jan de Leeuw commented quickly:

  - CCA is a relatively simple special case of HOMALS (or, if you like, CANALS). [YES!]

  - this suggests directly which geometry Gifi would use. Nothing bilinear, nothing biplot, just centres of gravity [YES, for nominal predictors]

• Extending DECORANA (Hill, 1979) to include CCA → CANOCO v1.0 ’85, and later PCA & redundancy analysis

• Help from Onno van Tongeren and Petr Smilauer for CANOCO v2.0 1985; v3.0 1990 (permutation testing); v4.0 1998 (1st Windows version); v4.5 2002 (2nd Windows version)

• Ambassadors: Colin Prentice, John Birks, Paul van den Brink

How was CCA derived? My history (3): 1986, France

• Partial CCA at the 1st IFCS in Aachen (Germany): Y. Escoufier stood up and ..... disliked CCA and ... invited me to Montpellier to work with JD Lebreton, D Chessel and P Sabatier, who independently invented CCA just a few months after I did.

• I tried to bridge the gap: lectured in English with only French references, learned duality diagrams

• Later found: RH Green, Ecology 1971, 1974: multi-group discriminant analysis, a precursor of CCA (lacking environmental arrows)

• Interest was lost at the same time that CA got popular via Mark Hill (1973-81)!

• Let’s celebrate now 40 years of CCA!


How well does moisture explain the data?


δ = dispersion of the species scores

$\delta = \sum_k y_{+k} u_k^2 / y_{++}$ with respect to the standardized env. var.; δ = B/T

$u_k$ = centroid, weighted average with respect to the env. var., “optimum”

with i = sites, k = species, $y_{ik} \ge 0$
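As a rough sketch of how δ could be computed for one environmental variable: the function below standardizes the variable with site weights $y_{i+}/y_{++}$ (an assumption taken from the $D_r$-standardization used elsewhere in the talk), takes the species centroids, and returns their dispersion. The table `Y` and the `moisture` values are invented for illustration.

```python
def dispersion(Y, z):
    """delta = sum_k y_+k u_k^2 / y_++ for variable z, standardized with weights y_i+."""
    n, m = len(Y), len(Y[0])
    row_tot = [sum(row) for row in Y]
    col_tot = [sum(Y[i][k] for i in range(n)) for k in range(m)]
    grand = sum(row_tot)
    mean = sum(w * v for w, v in zip(row_tot, z)) / grand
    var = sum(w * (v - mean) ** 2 for w, v in zip(row_tot, z)) / grand
    x = [(v - mean) / var ** 0.5 for v in z]                  # standardized env. var.
    u = [sum(Y[i][k] * x[i] for i in range(n)) / col_tot[k]   # species centroids
         for k in range(m)]
    delta = sum(c * uk * uk for c, uk in zip(col_tot, u)) / grand
    return delta, u

# Hypothetical data: 5 sites by 4 species, and one moisture value per site.
Y = [[5, 3, 0, 0],
     [3, 5, 1, 0],
     [1, 4, 4, 1],
     [0, 1, 5, 3],
     [0, 0, 3, 5]]
moisture = [1.0, 2.0, 3.0, 4.0, 5.0]
delta, u = dispersion(Y, moisture)
print(round(delta, 3))
```

δ lies between 0 (all species centroids coincide) and 1 (every species is confined to a single value of the variable).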



$\{x_i\}$ = site scores that best separate the species curves, as measured by the dispersion of the $\{u_k\}$

• CA: $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$

$u_k$ = species score, centroid, weighted average, “optimum”

with i = sites, k = species, $y_{ik} \ge 0$

Arch effect: folded first axis has also high δ2

From weighted averaging (WA) to CCA (4)


$\{x_i\}$ = site scores that are the linear combination of the environmental variables $\{z_j\}$ that best separates the species curves, as measured by the dispersion of the $\{u_k\}$

• CCA: $x_i = \sum_j c_j z_{ij}$, $u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$

$u_k$ = species score, centroid, weighted average, “optimum”

with i = sites, k = species, $y_{ik} \ge 0$

In multigroup discriminant analysis: max B/T

*Site weights = $\{y_{i+}\}$, but why?

Algorithm (C)CA

• Start with arbitrary site scores with zero mean

• Calculate new species scores = WA of site scores

• Calculate new site scores = WA of species scores

• CCA only: constrain the site scores by a weighted* multiple regression on the env. variables; take the fitted values as the new scores

• Remove arbitrariness in scaling by standardizing the site scores

• Stop on convergence, i.e. when the site scores stay the same after a cycle

WA = Weighted Averaging; (C)CA = (Canonical) Correspondence Analysis
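The CCA variant of the iteration might look as follows in plain Python. This is a sketch under stated assumptions rather than the Canoco implementation: the data are invented, the environmental variables are centred with site weights $y_{i+}$ so that the regression needs no intercept, and the small helper `solve` (Gaussian elimination) stands in for a proper linear algebra routine.

```python
def solve(A, rhs):
    """Solve the small linear system A c = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    coef = [0.0] * n
    for r in reversed(range(n)):
        coef[r] = (M[r][n] - sum(M[r][j] * coef[j] for j in range(r + 1, n))) / M[r][r]
    return coef

def cca_first_axis(Y, Z, n_iter=1000, tol=1e-12):
    """First CCA axis: reciprocal averaging plus weighted regression on Z."""
    n, m, p = len(Y), len(Y[0]), len(Z[0])
    r = [sum(row) for row in Y]                                # site weights y_i+
    c = [sum(Y[i][k] for i in range(n)) for k in range(m)]     # y_+k
    tot = sum(r)
    means = [sum(r[i] * Z[i][j] for i in range(n)) / tot for j in range(p)]
    Zc = [[Z[i][j] - means[j] for j in range(p)] for i in range(n)]
    ZtRZ = [[sum(r[i] * Zc[i][a] * Zc[i][d] for i in range(n)) for d in range(p)]
            for a in range(p)]

    def scale(x):
        sd = (sum(r[i] * x[i] ** 2 for i in range(n)) / tot) ** 0.5
        return [v / sd for v in x], sd

    x, _ = scale([row[0] for row in Zc])                       # start inside span(Zc)
    for _ in range(n_iter):
        u = [sum(Y[i][k] * x[i] for i in range(n)) / c[k] for k in range(m)]
        xs = [sum(Y[i][k] * u[k] for k in range(m)) / r[i] for i in range(n)]   # x*
        b = solve(ZtRZ, [sum(r[i] * Zc[i][j] * xs[i] for i in range(n))
                         for j in range(p)])                   # weighted regression
        fitted = [sum(Zc[i][j] * b[j] for j in range(p)) for i in range(n)]
        x_new, lam = scale(fitted)                             # shrinkage = eigenvalue
        converged = max(abs(new - old) for new, old in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x, u, b, lam, Zc

# Hypothetical data: 5 sites, 4 species, 2 environmental variables.
Y = [[5, 3, 0, 0], [3, 5, 1, 0], [1, 4, 4, 1], [0, 1, 5, 3], [0, 0, 3, 5]]
Z = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 5.0], [5.0, 4.0]]
x, u, b, lam, Zc = cca_first_axis(Y, Z)
print(round(lam, 4))
```

The only change relative to plain reciprocal averaging is the regression step; at convergence the coefficients `b` satisfy the eigen equation of CCA for the centred predictors.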

Eigenvector method to relate two data matrices Y and Z

• $Y = \{y_{ik}\}$, $n \times m$, with non-negative values (e.g. abundances of m species in n sites)

• $Z = \{z_{ij}\}$, $n \times p$, quantitative and/or nominal predictor matrix (on p environmental variables)

Y ~ Z, eigen equation (in the appendix):

$(Z^T Y D_c^{-1} Y^T Z - \lambda Z^T D_r Z)\, b = 0$

with $D_c = \mathrm{diag}(y_{+1}, \ldots, y_{+m})$, $D_r = \mathrm{diag}(y_{1+}, \ldots, y_{n+})$

Looks like canonical correlation analysis, but $D_c$, $D_r$ ...?

Special case: $Z = I_{n \times n}$ or an indicator matrix: correspondence analysis

Canonical Correspondence Analysis (CCA)?

CtB, Ecology 1986

Origin of CCA - Why is this equation appealing?

Transition formulas approximate the ML equations of the equi-width Gaussian response model with latent predictor x = Zb (b unknown)

$\mu_{ik} = c_k \exp\{-\tfrac12 (x_i - u_k)^2\}$; $y_{ik} \sim \mathrm{Poi}(\mu_{ik})$

- $x_i$ uniform over a large interval B in A

As in CA, but now with linear restrictions

ter Braak 1986, 1987

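CCA derivation (1): ML eqs of the equi-width Gaussian response model with latent predictor

$x_i = \sum_j z_{ij} b_j$, with $\{b_j\}$ to be estimated

$u_k = \sum_i y_{ik} x_i / y_{+k} - \left[\sum_i (x_i - u_k)\, \mu_{ik} / y_{+k}\right]$ (A.1)

$\sum_i z_{ij} \left[\sum_k y_{ik} (x_i - u_k)\right] = \sum_i \left[\sum_k (x_i - u_k)\, \mu_{ik}\right] z_{ij}$ (A.2)

Under the conditions

$\sum_k (x_i - u_k)\, \mu_{ik} \approx 0$ and $\sum_i (x_i - u_k)\, \mu_{ik} \approx -\lambda^* u_k\, y_{+k}$

we obtain

$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$ (with $\lambda = 1 - \lambda^*$)

$\sum_i z_{ij} \left[\sum_k y_{ik} (x_i - u_k)\right] = 0$

CtB, Ecology 1986 (CtB, Biometrics 1985)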

CCA derivation (2) : Transition formulas of CCA

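From:

$x_i = \sum_j z_{ij} b_j$

$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$

$\sum_i z_{ij} \left[\sum_k y_{ik} (x_i - u_k)\right] = 0$

to the transition formulas:

$\lambda u_k = \sum_i y_{ik} x_i / y_{+k}$

$x_i^* = \sum_k y_{ik} u_k / y_{i+}$

$b = (Z^T R Z)^{-1} Z^T R x^*$, with $R = D_r$

$x = Zb$

In matrix notation:

$\lambda u = D_c^{-1} Y^T x$

$x^* = D_r^{-1} Y u$

$b = (Z^T D_r Z)^{-1} Z^T D_r x^*$

$x = Zb$

so that

$b = (Z^T D_r Z)^{-1} Z^T D_r x^*$

$= (Z^T D_r Z)^{-1} Z^T Y u$

$= (Z^T D_r Z)^{-1} Z^T Y D_c^{-1} Y^T x\, \lambda^{-1}$

$= (Z^T D_r Z)^{-1} Z^T Y D_c^{-1} Y^T Z b\, \lambda^{-1}$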


CtB, Ecology 1986

Duality diagrams and transition formulas


CCA, PCA-triplets and biplots

fitted contingency ratios

fitted table of WA’s

low rank regression coefs

Ordination diagram (factorial plane)

Two sets of site (row) scores:

• x derived from Z (environmental data)

• x* derived from Y (species data) [chosen to be closer to Y]

Plot x*, u and the interset correlations c* = cor(Z, x*) (instead of b):

• CA type of joint plot {x*, u} of Y

• WLS-biplot {u, c*} of $D_c^{-1} Y^T Z$ = {weighted averages of species with respect to the environmental variables}

For nominal predictors: class point at the centroid of x*; CA type plot, but no example given except in the Canoco manual

CtB, Ecology 1986

Dune meadow data

n = 20 sites (meadows), m = 30 plant species, p = 8 environmental variables

Predictors: mix of quantitative (3) and nominal (1 with 4 classes: BF, SF, NM, HF) → class centroids (squares)

pCCA uses all power of regression: factors, interactions between predictors

CtB, Vegetatio 1987

[Table: discretized data, $Y^T$ and $Z^T$; row/column order from CCA]

CCA Dune meadow biplot of species’ centroids (WA) w.r.t. Manure

[Figure: CCA ordination diagram of the dune meadow data, both axes from -1.0 to +1.0, with species points $u_k$ (abbreviated names such as Pot pal, Cal cus, Ele pal), site points $x_i^*$ (sites 1-20), and environmental variables $c^*$ and class centroids (squares) labelled Moisture, A1 hor, Manure, Use, NM, BF, HF, SF]

Barycentric principle everywhere: the projection points of the species are the centroids of the site projections on the same environmental variable

u = WA(x), λx = WA(u)

CCA in R: packages anacor, vegan, ade4

# e.g.:
library(anacor)  # Jan de Leeuw & Patrick Mair
# CCA
res = anacor(dune, row.covariates = dune.env)
plot(res)

library(vegan)
cca(Y ~ Z)  # also with a formula interface
# e.g. factors A, B and C
cca(Y ~ A*B + Condition(C))

[Figure: joint plot, Dimension 1 against Dimension 2 (both from about -4 to 1), of the 20 sites and the species of the dune meadow data]

(C)CA & RC-model, ideal point discr. anal.

Goodman’s RC model $E y_{ik} = r_i c_k \exp(x_i^T u_k)$ (bilinear model) is equivalent to

$E y_{ik} = r_i^* c_k^* \exp(-\tfrac12 d^2(x_i, u_k))$ (distance model), with

$d^2(x_i, u_k) = (x_i - u_k)^T (x_i - u_k)$ and rescaled row and column terms $r_i^*$, $c_k^*$,

which is Ihm’s model B, and

with x = Zb identical to ideal point discriminant analysis.

(C)CA is an approximation to this model (ter Braak 1988, Takane ...)

Goodman showed this for small λ, ter Braak for large λ

See de Rooij, 2007 JCGS for plot scalings
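The step from the bilinear to the distance form is a one-line expansion of the squared distance. The worked equation below spells it out; the starred symbols $r_i^*$ and $c_k^*$ are notation introduced here for the rescaled row and column terms and do not appear on the slide.

```latex
\begin{align*}
-\tfrac12\, d^2(x_i,u_k)
  &= -\tfrac12 (x_i-u_k)^T(x_i-u_k)
   = x_i^T u_k - \tfrac12 x_i^T x_i - \tfrac12 u_k^T u_k ,\\
r_i c_k \exp\!\big(x_i^T u_k\big)
  &= \underbrace{r_i\, e^{\frac12 x_i^T x_i}}_{r_i^*}\;
     \underbrace{c_k\, e^{\frac12 u_k^T u_k}}_{c_k^*}\;
     \exp\!\big(-\tfrac12 d^2(x_i,u_k)\big).
\end{align*}
```

Because the row and column terms are free parameters, absorbing the squared norms into them leaves the fitted values unchanged, so the two forms describe the same model.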

Later landmarks

• Co-inertia analysis (Dolédec & Chessel, 1994)

• Linear restrictions, norm on the b coefficients

• CCA-PLS (ter Braak & Verdonschot, 1995). Key: 1. pre-transform, 2. apply PLS, 3. post-transform!

• Variance partitioning (Borcard, Legendre, Drapeau, 1992)

• RLQ (Dolédec et al. 1996); doubly constrained CA (Lavorel ...)

• All (?) CA methods are obtainable from standard methods by inflating the data matrices (super indicator matrices; in ecology: unit = the individual instead of the site), by pre- and post-transformation

→ Regularized generalized canonical correlation (Tenenhaus & Tenenhaus) and forward selection ...

Summary: CA-related methods in ecology

Method                       Abbr      Response vars    Predictors
Correspondence analysis      CA        Community data   -
Canonical CA                 CCA       Community data   Environmental variables
CCA Partial Least Squares    CCA-PLS   Community data   Many env. vars
Weighted averaging           WA        Env. var.        Community data
WA-PLS                       WA-PLS    Env. var(s)      Community data
Co-correspondence analysis   CO-CA     Community        Community
Co-inertia analysis, RLQ, ...

Open questions

• (C)CA is a chameleon: sometimes it shows up as a

  - linear method: reconstitution formula & biplot

  - unimodal method: relation with CVA & r-c distance plot

• In the RC model and logratio analysis: clearly both

• With arch effect: 1-d reconstitution bad.... Can Greenacre’s 2009 power transformations shed light?

• Sparse CA? Sparse CCA?

• Apply principal response curves (PRC) in the CCA context. PRC: good for display of interactions

Unfolding criterion (cf. Heiser 1987, Takane et al 1991)

Min $\delta^* = \sum_{i,k} y_{ik} (x_i - u_k)^2 / y_{++}$

subject to $\sum_i y_{i+} x_i = 0$, $\sum_i (y_{i+}/y_{++})\, x_i^2 = 1$ and x = Zb

• the squared distance $(x_i - u_k)^2$ has meaning (unfolding and ideal point model)

• the data are a kind of weights and are not approximated in any formal way

See also Nishisato’s (1980) quantification of rows and columns in one-way ANOVA: his D² = 1/δ* = B/W

As in multigroup discriminant analysis: min W/B

CA:

Max! $\delta = \sum_k y_{+k} u_k^2 / y_{++}$, with

$u_k = \sum_i y_{ik} x_i / \sum_i y_{ik}$, the weighted average of the site scores $\{x_i\}$, which are $D_r$-standardized:

$\sum_i y_{i+} x_i = 0$ and $\sum_i (y_{i+}/y_{++})\, x_i^2 = 1$.

CCA: extra constraint x = Zb

δ = dispersion = weighted variance = inertia

Best separation of species niches, max dispersion δ

ter Braak 1987

In multigroup discriminant analysis: B/W instead of δ = B/T
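The relation between the dispersion δ (between species, B) and the unfolding criterion δ* (within species, W) can be checked numerically: for $D_r$-standardized site scores the two add up to the total T = 1. The sketch below uses an invented table and arbitrary site scores; the function name `between_within` is an assumption for illustration.

```python
def between_within(Y, x):
    """Split the weighted variance of site scores into between- and within-species parts."""
    n, m = len(Y), len(Y[0])
    row_tot = [sum(row) for row in Y]
    col_tot = [sum(Y[i][k] for i in range(n)) for k in range(m)]
    grand = sum(row_tot)
    # D_r-standardize: weighted mean 0 and weighted variance 1
    mean = sum(w * v for w, v in zip(row_tot, x)) / grand
    sd = (sum(w * (v - mean) ** 2 for w, v in zip(row_tot, x)) / grand) ** 0.5
    x = [(v - mean) / sd for v in x]
    u = [sum(Y[i][k] * x[i] for i in range(n)) / col_tot[k] for k in range(m)]
    between = sum(col_tot[k] * u[k] ** 2 for k in range(m)) / grand      # delta
    within = sum(Y[i][k] * (x[i] - u[k]) ** 2
                 for i in range(n) for k in range(m)) / grand            # delta*
    return between, within

# Hypothetical table and arbitrary site scores.
Y = [[5, 3, 0, 0], [3, 5, 1, 0], [1, 4, 4, 1], [0, 1, 5, 3], [0, 0, 3, 5]]
between, within = between_within(Y, [0.3, 1.2, 2.0, 4.1, 4.4])
print(round(between + within, 12))
```

Since the sum is fixed, site scores with a large δ necessarily have a small δ*, which is why the dispersion and unfolding views lead to the same solution.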

Gaussian regression → ordination

• Gaussian regression: $\log(E y_{ik}) = c_k - (\mathrm{moisture}_i - u_k)^2 / (2 t_k^2)$

$\{u_k\}$ = species scores or optima

• Gaussian ordination: $\log(E y_{ik}) = c_k - (x_i - u_k)^2 / (2 t_k^2)$

$\{x_i\}$ = site scores that best explain the species data by Gaussian curves

with i = sites, k = species, $y_{ik} \ge 0$

[Figure: abundance $y_{ik}$ against moisture]
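To make the Gaussian ordination model concrete, the sketch below evaluates the response curve and the Poisson log-likelihood of a small table for given site scores. It only evaluates the likelihood and does not fit the model; the data, the parameter values and the function names are assumptions for illustration.

```python
import math

def gaussian_response(x, u, t, c):
    """Expected abundance with log(mu) = c - (x - u)^2 / (2 t^2)."""
    return math.exp(c - (x - u) ** 2 / (2.0 * t ** 2))

def poisson_loglik(Y, x, u, t, c):
    """Poisson log-likelihood of table Y (sites by species) under Gaussian ordination."""
    ll = 0.0
    for i, row in enumerate(Y):
        for k, y in enumerate(row):
            mu = gaussian_response(x[i], u[k], t[k], c[k])
            ll += y * math.log(mu) - mu - math.lgamma(y + 1)
    return ll

# Hypothetical example: 5 sites, 2 species with optima 1 and 3, tolerance 1, maximum 5.
u, t, c = [1.0, 3.0], [1.0, 1.0], [math.log(5.0), math.log(5.0)]
Y = [[3, 0], [5, 1], [3, 3], [1, 5], [0, 3]]   # rounded expected abundances
x_true = [0.0, 1.0, 2.0, 3.0, 4.0]
ll_true = poisson_loglik(Y, x_true, u, t, c)
ll_reversed = poisson_loglik(Y, x_true[::-1], u, t, c)
print(ll_true > ll_reversed)
```

A full Gaussian ordination would maximize this log-likelihood over the site scores and species parameters jointly, which is the fitting problem that reciprocal averaging approximates.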

Gaussian ordination?

• Axes cannot be extracted one after the other; a joint fit is needed

  - difficult to fit more than 1 axis, but see R{VGAM}

• Axes are not nested:

  - the solution for axis 1 changes when fitting 2 axes jointly

  - this problem is common outside the eigenvalue techniques such as PCA - CCA

• Ecologists invented a simpler approximate method: reciprocal averaging alias correspondence analysis (CA)
