A peer-reviewed version of this preprint was published in PeerJ on 9 February 2017. View the peer-reviewed version (peerj.com/articles/2969), which is the preferred citable publication unless you specifically need to cite this preprint. Washburne AD, Silverman JD, Leff JW, Bennett DJ, Darcy JL, Mukherjee S, Fierer N, David LA. 2017. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ 5:e2969 https://doi.org/10.7717/peerj.2969


Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets

Alex D Washburne (Corresp., 1), Justin D Silverman (2, 3, 4, 5), Jonathan W Leff (6), Dominic J Bennett (7, 8), John L Darcy (9), Sayan Mukherjee (2, 10), Noah Fierer (6), Lawrence A David (2, 4, 5)

1 Nicholas School of the Environment, Duke University, Durham, North Carolina, United States

2 Program for Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States

3 Medical Scientist Training Program, Duke University, Durham, North Carolina, United States

4 Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States

5 Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States

6 Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, United States

7 Department of Earth Science and Engineering, Imperial College London, London, United Kingdom

8 Institute of Zoology, Zoological Society of London, London, United Kingdom

9 Department of Ecology and Evolution, University of Colorado Boulder, Boulder, Colorado, United States

10 Department of Statistical Science, Mathematics, and Computer Science, Duke University, Durham, North Carolina, United States

Corresponding Author: Alex D Washburne

Email address: [email protected]

Marker gene sequencing of microbial communities has generated big datasets of microbial relative abundances varying across environmental conditions, sample sites and treatments. These data often come with putative phylogenies, providing unique opportunities to investigate how shared evolutionary history affects microbial abundance patterns. Here, we present a method to identify the phylogenetic factors driving patterns in microbial community composition. We use the method, "phylofactorization", to re-analyze datasets from the human body and soil microbial communities, demonstrating how phylofactorization is a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2685v1 | CC BY 4.0 Open Access | rec: 3 Jan 2017, publ: 3 Jan 2017


December 27, 2016


Corresponding author contact: alex.d.washburne@gmail.com


Introduction

Microbial communities play important roles in human [6], livestock [18] and plant [3] health, biogeochemical cycles [2, 12], the maintenance of ecosystem productivity [40], and bioremediation [26]. Given the importance of microbial communities and the vast number of uncultured and undescribed microbes associated with animal and plant hosts and in natural and engineered systems, understanding the factors determining microbial community structure and function is a major challenge for modern biology.

Marker gene sequencing (e.g. 16S rRNA gene sequencing to assess bacterial and archaeal diversity and 18S markers for eukaryotic diversity) is now one of the most commonly used approaches for describing microbial communities, quantifying the relative abundances of individual microbial taxa, and characterizing how microbial communities change across space, time, or in response to known biotic or abiotic gradients.

Analyzing these data is challenging due to the peculiar noise structure of sequence-count data [38], the inherently compositional nature of the data [16], the need to decide the taxonomic scale of investigation [7, 8, 39], and the high dimensionality of species-rich microbial communities [14]. There is a great need and opportunity to develop tools to more efficiently analyze these datasets and leverage information on the phylogenetic relationships among taxa to better identify which clades are driving differences in microbial community composition across sample categories or measured biotic or abiotic gradients [30].

Many of these challenges can be resolved by performing regression on clades identified in the phylogeny. In this paper, we take on these challenges by developing a means to perform regression of biotic/abiotic gradients on variables corresponding to branches in the phylogenetic tree, thereby allowing dimensionality reduction that has a clear phylogenetic interpretation and is consistent with the compositional nature of the data.

Consider a study on the effect of oxazolidinones, which affect gram-positive bacteria, on microbial community composition. Rather than regression of antibiotic treatment on abundance at numerous taxonomic levels, statistical analysis of bacterial communities treated with an oxazolidinone should instantly identify the split between gram-positive and gram-negative bacteria as the most important phylogenetic factor determining response to oxazolidinones. Subsequent factors should then be identified by comparing bacteria within the previously-identified groups: identify clades within gram-positives which may be more resistant or susceptible than the remaining gram-positives. Splitting the phylogeny at each inference and making comparisons within the split groups ensures that subsequent inferences are independent of the gram-positive vs. gram-negative split which we have already obtained. All of this analysis must be done consistent with the compositional nature of sequence-count data.

Here, we provide a method to analyze phylogenetically-structured compositional data. The algorithm, referred to as "phylofactorization", iteratively identifies the most important clades driving variation in the data through their associations with independent variables. Clades are chosen based on some metric of the strength or importance of their regressions with meta-data, and subsequent clades are chosen by comparison of sub-clades within the previously-identified bins of phylogenetic groups. Each "factor" identified corresponds to an edge in the phylogeny, and phylofactorization builds on literature from compositional data analysis to construct a set of orthogonal axes corresponding to those edges; the output orthonormal basis allows the projection of sequence-count relative abundances onto these phylogenetic axes for dimensionality reduction, visualization, and standard multivariate statistical analyses. The visualizations and inferences drawn from phylofactorization can be tied back to splits in a given phylogenetic tree and thereby allow researchers to annotate the microbial phylogeny from the results of microbiome datasets.


We show with simulations that phylofactor is able to correctly identify affected clades. We then phylofactor a dataset of human oral and fecal microbiomes to determine the phylogenetic factors driving variation in human body site [4], and a dataset of soil microbes using a multiple regression of pH, carbon concentration and nitrogen concentration [36]. In the human microbiome dataset, we find three splits in the phylogeny that together capture 17.6% of the variation in community composition across two body sites. We use phylofactorization to find that the dominant features driving body-site variation are invisible to taxonomy-based analyses, including splits between unclassified OTUs, monophyletic clades that span several taxonomic groups, and a spectrum of phylogenetic scales for binning OTUs based on habitat preferences. In the soil microbiome dataset, we use phylofactor-based dimensionality reduction to visualize and quantitatively confirm that pH drives most of the variation in the soil dataset, but we emphasize that the axes from phylofactor ordination-visualization plots correspond to identifiable edges on the phylogeny that have clear biological interpretations and can be used and tested across studies. User-friendly code for implementing, summarizing and visualizing phylofactorization is provided in an R package, 'phylofactor', and a tutorial is available online.

Materials & Methods

Phylogenetically-structured compositional data

Microbiome datasets are "phylogenetically-structured compositional data", compositions of parts linked together by a phylogeny for which only inferences on relative abundances can be drawn. The phylogeny is the scaffolding for the evolution of vertically-transmitted traits, and vertically-transmitted traits may underlie an organism's functional ecology and response to perturbations or environmental gradients. Performing inference on the edges in a phylogeny driving variation in the data can be useful for identifying clades with putative traits causing related taxa to respond similarly to treatments, but such inferences must account for the compositional nature of the sequence-count data.

A standard analysis of microbiome datasets uses only the distal edges of the tree (the OTUs) and a few edges within the tree separating Linnaean taxonomic groups. However, a phylogeny of $D$ taxa and no polytomies is composed of $2D - 3$ edges, each connecting two disjoint sets of taxa in the tree, with no guarantee that splits in Linnaean taxonomy correspond to phylogenetic splits driving variation in our dataset. Thus, instead of analyzing just the tips and a series of Linnaean splits in the tree, a more robust analysis of phylogenetically-structured compositional data should analyze all of the edges in the tree. To do that, we draw on the isometric log-ratio transform from compositional data analysis, which has been used to search for a taxonomic signature of obesity in the human gut flora [15] and incorporated into packages for downstream principal components analysis [24]. However, to the best of our knowledge, the previous literature using the isometric log-ratio transform in microbiome datasets has used random or standard sequential binary partitions, and has not explicitly incorporated the phylogeny as the sequential binary partition.
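The edge count above can be illustrated with a small sketch. It assumes a hypothetical five-taxon tree written as nested Python tuples (the names `clade_tip_sets` and `edge_splits` are illustrative and are not part of the phylofactor package), and it enumerates the distinct two-way splits of taxa that the edges induce. The two edges attached to the root of a rooted binary tree merge into one edge once the tree is unrooted, which is why they yield the same split.

```python
def clade_tip_sets(tree):
    """Return the tip set below every node of a nested-tuple tree (root last)."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, here = [], frozenset()
    for child in tree:
        below = clade_tip_sets(child)
        found.extend(below)
        here |= below[-1]  # the last entry is the child's own tip set
    found.append(here)
    return found


def edge_splits(tree):
    """Return the distinct two-way splits of the tips induced by the edges."""
    clades = clade_tip_sets(tree)
    all_tips = clades[-1]
    splits = set()
    for clade in clades:
        rest = all_tips - clade
        if rest:  # the root node itself does not sit below an edge
            splits.add(frozenset([clade, rest]))
    return splits


tree = ((("a", "b"), "c"), ("d", "e"))  # hypothetical tree with D = 5 taxa
splits = edge_splits(tree)
print(len(splits))  # 2 * 5 - 3 = 7
```

Five of the seven splits separate a single tip from the rest; the other two, {a, b} vs. {c, d, e} and {a, b, c} vs. {d, e}, correspond to internal edges.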

The Isometric Log-Ratio Transform of a rooted phylogeny

The isometric log-ratio (ILR) transform was developed as a way to transform compositional data from the simplex into real space where standard statistical tools can be applied [11, 10]. A sequential binary partition is used to construct a new set of coordinates, and the phylogeny is a natural choice for the sequential binary partition in microbiome datasets. Instead of analyzing relative abundances, $y_i$, of $D$ different OTUs, the ILR transform produces $D-1$ coordinates, $x^*_i$ (called "balances"). Each balance corresponds to a single internal node of the tree and represents the averaged difference in relative abundance between the taxa in the two sister clades descending from that node (the difference being appropriately measured as a log-ratio due to the compositional nature of the data; see SI for a more detailed description of the ILR transform). For an arbitrary node indicating the split of a group, $R$, with $r$ elements from the group, $S$, with $s$ elements, the ILR balance can be written as

$$x^*_{\{R,S\}} = \sqrt{\frac{rs}{r+s}} \, \log\left(\frac{g(y_R)}{g(y_S)}\right) \qquad (1)$$

where $g(y_R)$ is the geometric mean of all $y_i$ for $i \in R$.
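As a worked illustration of equation (1), the following minimal sketch computes one balance for a single sample. It assumes the standard square-root normalization of the ILR balance; the function names and the four-part composition are hypothetical and chosen only for the example.

```python
import math


def geometric_mean(values):
    """Geometric mean computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ilr_balance(y, group_r, group_s):
    """ILR balance between the parts indexed by group_r and group_s."""
    r, s = len(group_r), len(group_s)
    g_r = geometric_mean([y[i] for i in group_r])
    g_s = geometric_mean([y[i] for i in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(g_r / g_s)


y = [0.4, 0.4, 0.1, 0.1]                  # hypothetical relative abundances
balance = ilr_balance(y, [0, 1], [2, 3])  # r = s = 2, so the scale factor is 1
rescaled = ilr_balance([40, 40, 10, 10], [0, 1], [2, 3])
```

Here the balance equals log(4), and rescaling the whole sample leaves it unchanged, which is the property that makes balances suitable for compositional data.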

We refer to the ILR transform corresponding to a rooted phylogeny as the "rooted ILR". The rooted ILR creates a set of ILR coordinates, $\{x^*_i\}$, where each coordinate corresponds to the "balance" between sister clades at each split in the phylogenetic tree. The balances in a rooted ILR transform in equation (1) can be intuited as the average difference between taxa in two groups, and splits in the tree which meaningfully differentiate taxa will be those splits in which the average difference between taxa in two groups changes predictably with an independent variable. Inferences on ILR coordinates, then, map to inferences on lineages in the phylogenetic tree.

The rooted ILR coordinates provide a natural way to analyze microbiota data, as they measure the difference in the relative abundances of sister clades and may be useful in identifying effects contained within clades, such as zero-sum competition of close relatives or the substitution of one relative for another across environments. However, if we desire to link the effect of a design variable or an external covariate (e.g. antibiotics vs. no antibiotic treatment) to clades within the phylogeny, the best comparison may not be between sister clades, but instead between a clade and all other clades, controlling for any other phylogenetic splits or factors we may know of (e.g. we may compare a lineage within gram-positives with all other gram-positives, once we've identified the gram-positive vs. gram-negative split as an important factor for antibiotic susceptibility). We refer to this unrooted approach as 'phylofactorization'.

For the task of linking an external covariate to individual clades in the phylogeny, we examine three features of the rooted ILR that can be improved on by phylofactorization. Consider a treatment that decreases the abundance of one and only one clade, $B$, whose closest relative is clade $A$. First, regression on the rooted ILR coordinates may identify the balance $x^*_{\{A,B\}}$ corresponding to the most recent common ancestor of clades $A$ and $B$ as having the strongest response to the treatment, but regression on this coordinate will suggest that clade $B$ decreases relative to $A$, leading to structured residuals in the original dataset due to an inability to account for the change in clade $B$ relative to the rest of the OTUs in the data (Fig. 1a). Second, all partitions between the affected clade and the root will be affected. If each balance is tested independently, the rooted ILR may identify many clades that are affected by antibiotics; the correlations between coordinates can yield a high false-positive rate if just one clade is affected (Fig. 1b). Finally, the ILR transformation does not work with the polytomies common in real, unresolved phylogenies. Any polytomy will produce a split in the phylogeny between three or more taxa, and there is no general way to describe the balance of relative abundances of three or more parts using only one coordinate.

Nonetheless, the simplicity and theoretical foundations underlying the ILR, and the instant appeal of applying it to the sequential binary partition of the phylogeny, motivate the rooted ILR as a simple tool for analysis of the phylogenetic structure in compositional data. For that reason, we use the rooted ILR as a baseline for comparison of our more complicated method of phylogenetic factorization.

Phylofactorization

The shortcomings of the rooted ILR can be remedied by modifying the ILR transform to apply not to the nodes or splits in a phylogeny, but to the edges in an unrooted phylogeny. While ILR coordinates of nodes allow a comparison of sister clades, ILR coordinates along edges allow comparison of taxa with putative traits that arose along the edge against all taxa without those putative traits. Traits arise along edges of the phylogeny and so, for annotation of phylogenies, effects in a clade are best mapped to a chain of edges in the phylogeny.

However, the ILR transform requires a sequential binary partition, and the edges don't immediately provide a clear candidate for a sequential binary partition. In what we refer to as "phylofactorization", one can iteratively construct a sequential binary partition from the unrooted phylogeny using a greedy algorithm that sequentially chooses the edges which maximize a researcher's objective function. Phylofactorization consists of 3 steps (Fig. 2): (1) Consider the set of possible primary ILR basis elements corresponding to a partition along any edge in the tree (including the tips). (2) Choose the edge whose corresponding ILR basis element maximizes some objective function, such as the test-statistic from regression or the percent of variation explained in the original dataset; the groups of taxa split by that edge form the first partition. (3) Repeat steps 1 and 2, constructing subsequent ILR basis elements corresponding to remaining edges in the phylogeny and made orthogonal to all previous partitions by limiting the comparisons to taxa within the groups of taxa un-split by previous partitions.
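The three steps can be sketched as a short greedy loop. This is a simplified illustration under stated assumptions and not the implementation in the phylofactor R package: the tree is a hypothetical nested tuple, each sample is a dict of abundances, the only covariate is a single numeric variable, and the objective is the explained sum of squares of an ordinary least-squares fit (which equals null deviance minus model deviance for a Gaussian model). All function names are illustrative.

```python
import math


def clade_tip_sets(tree):
    """Tip set below every node of a nested-tuple tree (root last)."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, here = [], frozenset()
    for child in tree:
        below = clade_tip_sets(child)
        found.extend(below)
        here |= below[-1]
    found.append(here)
    return found


def balance(sample, group_r, group_s):
    """ILR balance of one sample (dict: tip -> abundance) between two groups."""
    r, s = len(group_r), len(group_s)
    mean_log_r = sum(math.log(sample[t]) for t in group_r) / r
    mean_log_s = sum(math.log(sample[t]) for t in group_s) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_r - mean_log_s)


def explained_ss(x, z):
    """Null deviance minus model deviance for a Gaussian linear fit z ~ x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    return sxz ** 2 / sxx


def phylofactor_sketch(tree, samples, x, n_factors):
    clades = clade_tip_sets(tree)
    bins = [clades[-1]]  # start with a single bin holding every tip
    factors = []
    for _ in range(n_factors):
        best = None
        for b in bins:
            seen = set()
            for clade in clades:
                # Step 1: restrict each edge's split to the taxa in this bin.
                inside, outside = clade & b, b - clade
                key = frozenset([inside, outside])
                if not inside or not outside or key in seen:
                    continue
                seen.add(key)
                z = [balance(s, inside, outside) for s in samples]
                score = explained_ss(x, z)
                # Step 2: keep the candidate that maximizes the objective.
                if best is None or score > best[0]:
                    best = (score, b, inside, outside)
        if best is None:
            break
        _, b, inside, outside = best
        # Step 3: split the chosen bin; later contrasts stay within bins.
        bins.remove(b)
        bins.extend([inside, outside])
        factors.append((inside, outside))
    return factors, bins


tree = ((("a", "b"), "c"), ("d", "e"))   # hypothetical 5-taxon tree
x = [0, 0, 0, 1, 1, 1]                   # binary environment variable
samples = []
for env in x:
    s = {t: 10.0 for t in "abcde"}
    if env == 1:                         # clade (d, e) is 4-fold more abundant
        s["d"] *= 4
        s["e"] *= 4
    samples.append(s)

factors, bins = phylofactor_sketch(tree, samples, x, n_factors=1)
```

In this toy example the first factor separates {d, e} from {a, b, c}, the edge along which the simulated effect was placed. Balances are computed from raw abundances here because the log-ratio is unchanged when a sample is rescaled to relative abundances.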

Explicitly, the first iteration of phylofactorization considers a set of candidate ILR coordinates, $\{x^*_e\}$, corresponding to the two groups of taxa split by each edge, $e$. Then, regression is performed on each of the ILR coordinates, $x^*_e \sim f(X)$, for an appropriate function, $f$, and a set of independent variables, $X$. The edge, $e^+_1$, which maximizes the objective function is chosen as the first phylogenetic factor. In this paper, our objective function is the difference between the null deviance of the ILR coordinate and the deviance of the generalized linear model explaining that ILR coordinate as a function of the independent variables. We use this objective function as a measure of the amount of variance explained by regression on each edge because the total variance in a compositional dataset is constant and equal to the sum of the variances of all ILR coordinates corresponding to any sequential binary partition. Consequently, at each iteration there is a fixed amount of the total variance remaining in the dataset, and so the candidate ILR coordinate which captures the greatest fraction of the total variance in the dataset is the one with the greatest amount of variance explained by the regression. After identifying $e^+_1$, we cut the tree into two sub-trees along the edge $e^+_1$.
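The statement that the total variance does not depend on the choice of sequential binary partition can be checked numerically. The sketch below uses a hypothetical three-part dataset and two different partitions, and assumes the square-root-normalized balance of equation (1); the names `partition_1`, `partition_2` and `total_variance` are illustrative.

```python
import math
from statistics import pvariance


def balance(sample, group_r, group_s):
    """ILR balance of one sample (dict: part -> abundance) between two groups."""
    r, s = len(group_r), len(group_s)
    mean_log_r = sum(math.log(sample[t]) for t in group_r) / r
    mean_log_s = sum(math.log(sample[t]) for t in group_s) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_r - mean_log_s)


# Hypothetical compositions of three parts.
samples = [
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.2, "b": 0.5, "c": 0.3},
    {"a": 0.1, "b": 0.2, "c": 0.7},
    {"a": 0.6, "b": 0.1, "c": 0.3},
]

# Two different sequential binary partitions of {a, b, c}.
partition_1 = [(["a"], ["b", "c"]), (["b"], ["c"])]
partition_2 = [(["a", "b"], ["c"]), (["a"], ["b"])]


def total_variance(partition):
    """Sum of the variances of all balances in one partition."""
    return sum(
        pvariance([balance(s, r, g) for s in samples]) for r, g in partition
    )


total_1 = total_variance(partition_1)
total_2 = total_variance(partition_2)
```

The two totals agree up to floating-point error because each partition defines an orthonormal basis of the same space, so a coordinate that explains more variance necessarily leaves less for the remaining coordinates.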

For the second iteration, another set of candidate ILR coordinates is constructed such that their underlying balancing elements are orthogonal to the first ILR coordinate. Orthogonality is ensured by constructing ILR coordinates contrasting the abundances of taxa along each edge, restricting the contrast to all taxa within the sub-tree in which the edge is found. A new edge, $e^+_2$, which maximizes the objective function is chosen as the second factor, the sub-tree containing this edge is cut along this edge to produce two sub-trees, and the process is repeated until a desired number of factors is reached or until a stopping criterion is met. More details on the algorithm, along with a discussion of objective functions, are contained in the SI.

While one could use other methods of amalgamating abundances along edges, the conceptual importance of using the ILR transform is twofold: the ILR transform has proven asymptotic normality properties for compositional data to allow the application of standard multivariate methods [11], and the ILR transform serves as a measure of contrast between two groups. The log-ratio used in phylofactor is an averaged ratio of abundances of taxa on two sides of an edge (see supplement for more detail); thus phylofactorization searches the tree for the edge which has the most predictable difference between taxa on each side of the edge, or, put differently, the edge which best differentiates taxa on each side. Thus, each edge that differentiates taxa and their responses to independent variables is considered a phylogenetic "factor" driving variation in the data.

The output of phylofactorization is a set of orthogonal, sequentially "less important" ILR basis elements, their predicted balances, and all other information obtained from regression. After the first iteration of phylofactorization, we are left with an ILR basis element corresponding to the edge which maximized our objective function and split the dataset into two disjoint sub-trees, or sets of OTUs that we henceforth refer to as "bins", and we have an estimated ILR balancing element, $\hat{x}^*_1(X)$, where $X$ is our set of independent variables. Subsequent factors will split the bins from previous steps, and after $n$ iterations one has $n$ factors that can be mapped to the phylogeny, $n + 1$ bins for binning taxa based on their phylogenetic factors, $n$ estimates of ILR balancing elements, and an orthonormal ILR basis that can be used to project the data onto a lower-dimensional space. The sequential splitting of bins in phylofactorization ensures sequentially independent inferences: having already identified group $B$ as hyper-abundant relative to group $A$ in the example illustrated in Fig. 2, downstream factors must analyze sub-compositions entirely within $B$ and within $A$.

Computational Tools

Phylofactorization was done using the R package "phylofactor", available at https://github.com/reptalex/phylofactor. The R package contains detailed help files that demo the use of the package, and the exact code used in analyses and visualization in this paper is available in the supplementary materials. The rooted ILR transform was performed as described in [10], where the sequential binary partition was the rooted phylogeny.

Power Analysis of Rooted ILR and Phylofactorization

To compare the ability of phylofactorization and the rooted ILR to identify clades of OTUs with shared associations with independent variables, we simulated random communities of $D = 50$ OTUs and $p = 40$ samples by simulating random absolute abundances, $N_{i,j}$, such that $\log N_{i,j}$ were i.i.d. Gaussian random variables with mean $\mu = 8$ and standard deviation $\sigma = 0.5$. The OTUs were connected by a random tree (the tree remained constant across all simulations), and then either 1 or 3 clades were randomly chosen to have associations with a binary "environment" independent variable with $p = 20$ samples for each of its two values to represent an equal sampling of microbial communities across two environments.

For simulations with one significant clade, the abundances of all the OTUs within that clade increased by a factor $a$ in the second environment, where $a \in \{1.5, 3, 6\}$. For simulations with three significant clades, the three clades were drawn at random and randomly assigned a fold-change from the set {πb, 0.5b, exp(−b)} in a randomly chosen environment, where $b \in \{1, 2, 5\}$. For each fold-change, 500 replicates were run to compare the power of the rooted ILR and phylofactorization in correctly identifying the affected clades.
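The one-clade simulation design can be sketched as follows. This is an assumption-laden illustration rather than the code used in the paper: the affected clade is passed in as a list of OTU indices instead of being drawn from a random tree, and the function name, seed handling and defaults are hypothetical.

```python
import math
import random


def simulate_community(n_otus=50, n_samples=40, affected=(), fold_change=1.0,
                       mu=8.0, sigma=0.5, seed=0):
    """Simulate log-normal absolute abundances in two equally sampled environments."""
    rng = random.Random(seed)
    half = n_samples // 2
    environment = [0] * half + [1] * (n_samples - half)
    counts = []
    for env in environment:
        # log N_ij are i.i.d. Gaussian with mean mu and standard deviation sigma.
        sample = [math.exp(rng.gauss(mu, sigma)) for _ in range(n_otus)]
        if env == 1:
            for i in affected:  # multiply the affected OTUs by the fold change
                sample[i] *= fold_change
        counts.append(sample)
    relative = []
    for sample in counts:
        total = sum(sample)
        relative.append([v / total for v in sample])
    return environment, counts, relative


# Hypothetical affected clade made of the first five OTUs, with a = 3.
environment, counts, relative = simulate_community(
    affected=[0, 1, 2, 3, 4], fold_change=3.0
)
```

Each row of `relative` sums to one, so any downstream analysis sees only compositional information, as it would with sequence-count data.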

Regression of rooted ILR coordinates was performed and the coordinates were ranked by the difference between their null deviance and the model deviance. The ability of a rooted ILR coordinate to identify the correct 1 clade or 3 clades was measured by the percent of its top 1 or 3 ILR coordinates, respectively, which corresponded to the node on the tree from which the affected clade(s) originated. The ability of phylofactor to identify the correct 1 clade or 3 clades was measured by the percent of the factors that correctly split an affected clade from the rest (e.g. the percent of factors corresponding to edges along which a trait arose).

For the 3 clade simulations, we also compared the amount of variance explained by 3 factors in phylofactorization with the amount of variance explained by the top 3 ILR coordinates in the rooted ILR. The amount of variance explained was measured as the difference between the null deviance and the model deviance, summed across all three factors or the top 3 ILR coordinates.

KS-based Stopping Function for PhyloFactor

While a researcher can iterate through phylofactorization until a full basis of $D - 1$ ILR coordinates is constructed, the researcher may be interested in stopping the iteration before the full basis is constructed and focusing their analysis and interpretations on a conservative subset of the true number of phylogenetic factors. We implemented a stopping function based on a Kolmogorov-Smirnov (KS) test of the distribution of P-values from analyses of variance of the regressions on candidate ILR coordinates. If there is no phylogenetic signal, we anticipate the true distribution of P-values to be uniform (albeit with some dependence among the P-values due to overlap in the OTUs used in the ILR coordinates). Thus, we tested the ability of phylofactor to correctly identify the number of clades if phylofactorization is stopped when a KS test of the P-values produces its own P-value $P_{KS} > 0.05$.
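A minimal sketch of such a stopping rule is given below. It assumes a one-sample KS test against the uniform distribution on (0, 1) and uses the asymptotic Kolmogorov series with a common small-sample correction to approximate the P-value; this approximation, the function names and the example P-values are illustrative and are not taken from the phylofactor package.

```python
import math


def ks_uniform_pvalue(p_values, n_terms=200):
    """Approximate P-value of a one-sample KS test against Uniform(0, 1)."""
    p = sorted(p_values)
    n = len(p)
    # KS statistic: largest gap between the empirical CDF and F(x) = x.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(p))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    # Asymptotic Kolmogorov series (an approximation, not an exact test).
    q = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, n_terms + 1)
    )
    return min(1.0, max(0.0, q))


def should_stop(p_values, alpha=0.05):
    """Stop factoring when the candidate P-values look uniform (P_KS > alpha)."""
    return ks_uniform_pvalue(p_values) > alpha


uniform_like = [(i + 0.5) / 50 for i in range(50)]  # evenly spread P-values
skewed = [0.001 * (i + 1) for i in range(50)]       # P-values piled up near zero
```

With these inputs, `should_stop(uniform_like)` is true and `should_stop(skewed)` is false: an excess of small P-values among the candidate edges is taken as evidence that further phylogenetic factors remain.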

We simulated 300 replicate communities with $M$ clades for each $M \in \{1, ..., 10\}$. For simulations with $M$ clades, $D = 50$ and $p = 40$ communities were simulated as above and fold changes, $c$, were drawn as log-normal random variables where $\log(c_k)$ were i.i.d. Gaussian random variables with $\mu = 0$ and $\sigma = 3$ for $k = 1, ..., M$. The number of clades identified by phylofactor for a given true number of clades, $K_{M,r}$, was tallied for $r = 1, ..., 300$. We calculate the mean $\bar{K}_M$ across all replicates and, for visualization purposes, interpolate the $\alpha = 0.025$ and $\alpha = 0.975$ quantiles by finding the best fit of a logistic function to the cumulative distribution of $\{K_{M,r}\}_{r=1}^{300}$ for each $M$.

Analysis of Fecal/Oral microbiome data

16S amplicon sequencing data from Caporaso et al. (2011) [4] were downloaded from the MG-RAST database (http://metagenomics.anl.gov/) along with associated metadata. QIIME [5] was used to trim primers from these data, and to cluster OTUs with the Greengenes reference database (May 2013 version; http://greengenes.lbl.gov). Longer sequence lengths in the Greengenes database (~1400 bp) compared to the original Illumina sequences (~123 bp) allow more informative base pairs for phylogenetic tree construction. We used the phylogenetic tree that is included with the Greengenes database for all analyses. The resulting OTU table was rarefied to 6000 sequences per sample.

Ten time points were randomly drawn from each of the male tongue, female tongue, male feces and female feces datasets, giving a total of $n = 20$ samples at each site. Taxa present in fewer than 30 of the 40 samples were discarded, and phylofactorization was done by adding pseudo-counts of 0.65 to all 0 entries in the dataset [1], converting counts in each sample to relative abundances, and then regressing the ILR coordinates against body site. The complete R script is available in the file “Data Analysis pipeline of the FT microbiome”.
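
The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R script: the function names are hypothetical, the OTU table is assumed to be a list of per-OTU rows of counts, and the ILR coordinate is written in the standard balance form $y = \sqrt{rs/(r+s)}\,\log(g(x_R)/g(x_S))$, where $g$ is the geometric mean and $r$, $s$ are the sizes of the two groups.

```python
import math


def filter_prevalent(table, min_samples):
    """Keep OTU rows that are non-zero in at least `min_samples` samples."""
    return [row for row in table
            if sum(1 for c in row if c > 0) >= min_samples]


def close(counts, pseudocount=0.65):
    """Replace zeros with a pseudo-count, then convert to relative abundances."""
    filled = [c if c > 0 else pseudocount for c in counts]
    total = sum(filled)
    return [c / total for c in filled]


def ilr_balance(composition, group_r, group_s):
    """ILR coordinate contrasting the OTUs in group_r with those in group_s."""
    r, s = len(group_r), len(group_s)
    mean_log_r = sum(math.log(composition[i]) for i in group_r) / r
    mean_log_s = sum(math.log(composition[i]) for i in group_s) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_r - mean_log_s)
```

For the oral-fecal data the prevalence filter would be called with `min_samples=30`, and `close` would be applied to each sample's column of counts before any coordinate is computed. Because the balance depends only on log-ratios, it is unchanged by the rescaling to relative abundances; the pseudo-count matters only because $\log(0)$ is undefined.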

Complete phylofactorization of this dataset was performed by stopping the algorithm when a KS-test on the uniformity of P-values from analyses of variance of regression on candidate ILR-coordinates yielded $P_{KS} > 0.05$. These results were compared with a standard, multiple hypothesis-testing analysis of CLR-transformed data. The summary of the taxonomic detail at the first three factors is provided in the results section, and a full list of the taxa factored at each step is available in the supplement and can be further explored using the R pipeline provided.
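
For readers unfamiliar with the comparison method, the sketch below shows the CLR transform and a false-discovery-rate correction. The text does not say which FDR procedure was used; Benjamini-Hochberg is assumed here, and the function names are illustrative.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def benjamini_hochberg(pvalues, fdr=0.01):
    """Return booleans marking hypotheses rejected at the given FDR
    (Benjamini-Hochberg step-up procedure, assumed for this sketch)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In the per-OTU analysis, each OTU's CLR-transformed abundance would be regressed on body site and the resulting P-values passed to `benjamini_hochberg`.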

Analysis of Soil microbiome data

The soil microbiome dataset from [36] was included to illustrate the ability of phylofactor to work on bigger microbiome datasets with continuous independent variables and multiple regression. Details on sample collection, sequencing, meta-data measurements and OTU clustering are available in [36]. The phylogeny was constructed by aligning representative sequences using SINA [34], trimming bases that represented gaps in ≥20% of sequences, and using fasttree [33].

The complete dataset contained 123,851 OTUs and 580 samples. Data were filtered to include all OTUs with on average 2 or more sequences counted across all samples, shrinking the dataset to $D = 3{,}379$ OTUs. The data were further trimmed to include only those samples with available pH, C and N meta-data, reducing the sample size to $n = 551$.

Phylofactorization was done by adding pseudo-counts of 0.65 to all 0 entries in the dataset [1], converting counts in each sample to relative abundances, and performing multiple regression of the ILR coordinates on pH, C and N. The first three factors are used for ordination-visualization. To determine the relative importance of each abiotic variable in driving phylogenetic patterns of microbial community composition, we used the lmg method from the R package 'relaimpo' [19], which averages the sequential sums of squares over all orderings of regressors to obtain a measure of the relative importance of each regressor in the multivariate model.
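
The lmg idea, averaging each regressor's sequential contribution to $R^2$ over all orderings, can be illustrated with a small self-contained sketch. It is not the 'relaimpo' implementation: the function names are hypothetical, the least-squares fit uses a plain Gauss-Jordan solve suited only to a handful of well-conditioned regressors, and the data in the usage note are made up.

```python
from itertools import permutations


def r_squared(y, columns):
    """R^2 of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    sst = sum(v * v for v in yc)
    if not columns:
        return 0.0
    # Centering removes the intercept from the normal equations.
    xc = [[v - sum(col) / n for v in col] for col in columns]
    k = len(xc)
    # Augmented normal equations [X'X | X'y] on the centered data.
    a = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(xc[i], yc))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(i, k), key=lambda row: abs(a[row][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for row in range(k):
            if row != i:
                factor = a[row][i] / a[i][i]
                a[row] = [u - factor * v for u, v in zip(a[row], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * col[t] for b, col in zip(beta, xc)) for t in range(n)]
    return sum(f * f for f in fitted) / sst


def lmg(y, predictors):
    """Average each predictor's sequential gain in R^2 over all orderings."""
    names = list(predictors)
    shares = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        included, previous = [], 0.0
        for name in order:
            included.append(name)
            current = r_squared(y, [predictors[v] for v in included])
            shares[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in shares.items()}
```

For example, with a response `y` holding one ILR coordinate per sample, `lmg(y, {"pH": ph, "C": c, "N": n})` returns shares that sum to the full-model $R^2$; dividing each share by that total gives the fraction of explained variance attributable to each variable.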

Results

We find three main results. First, we find that our algorithm outperforms a standard tool for analyzing compositions of parts related by a tree - what we refer to as the “rooted ILR” transform - and that we can obtain a conservative estimate of the number of phylogenetic factors in simulated datasets with a known number of affected clades. Second, we phylofactor a dataset of the human oral and fecal microbiomes and find that the three dominant edges in the phylogeny account for 17.6% of the variation in microbial communities across these sample sites, edges which are not assigned a unique taxonomic label and are thus difficult or impossible to obtain from taxonomic-based analyses. Third, we show that phylofactorization can be combined with multiple regression to reveal that pH drives the main phylogenetic patterns of community composition in soil microbiomes, and show that in four factors we split the Acidobacteria three times - including one split that identifies a monophyletic clade of Acidobacteria that consists of alkaliphiles. Finally, using the soil dataset, we demonstrate how phylofactorization yields two complementary methods for dimensionality reduction and ordination-visualization that tell a simplified story of how the major phylogenetic groups of OTUs change with pH. Throughout our results, we emphasize that the phylogenetic inferences made by phylofactorization could be invisible to taxonomic-based analyses, and we conceptually compare the dimensionality-reductions of phylofactorization to the less-interpretable output from standard ordination-visualization tools.

Power Analysis and Conservative Stopping of Phylofactorization

Phylofactorization remedies the structured residuals from the rooted ILR regression on data with fold-changes in abundances within clades. Phylofactorization also remedies the problem of high false-positive rates arising from the nested-dependence and correlated coordinates of the rooted ILR transform, as sequential inferences in phylofactorization are independent. Phylofactorization outperforms the rooted ILR in identifying the correct clades with a given fold-change in abundance (Figs. 3a and 3b), and can be paired with other algorithms assessing residual structure to stop factorization when there is no residual structure and thus accurately identify the number of affected clades (Fig. 3c). Finally, by focusing the inferences on edges instead of nodes in the phylogeny, this algorithm works on trees with polytomies and does not require a forced resolution of polytomies to construct a sequential binary partition of the OTUs.

Oral-Fecal Microbiome

Regression of centered log-ratio (CLR) transformed OTU abundances against sample site, with 290 OTUs and 40 samples, yielded 236 significant OTUs at a false-discovery rate of 1%; the phylogenetic signal of these OTUs from CLR-based inferences may be difficult to parse out. Phylofactorization of the oral-fecal microbiome dataset yielded 142 factors, the top three of which explain 17.6% of the variation in the dataset. These factors correspond to clearly visible blocks in phylogenetic heatmaps of the OTU table (Fig. 4). The factors span a range of taxonomic scales and all of them would be invisible to taxonomic-based analyses. Below, we summarize the factors - the P-values from regression, the taxa split at each factor, the body site associations predicted by generalized linear modeling of the ILR coordinate against body site, and finer detail about the taxonomic identities and known ecology of monophyletic taxa being split. Phylofactorization of these data indicates that a few clades explain a large fraction of the variation in the data, and many more clades can be identified as containing the same intricate detail as the phylogenetic factors presented below. The biology of microbial human-body-site association can focus on these dominant factors - which traits and evolutionary history drive these monophyletic groups' strong, common association with body sites?

The first factor ($P = 4.90 \times 10^{-30}$) split Actinobacteria and Alpha-, Beta-, Gamma-, and Delta-proteobacteria from Epsilonproteobacteria and the rest (Fig. S4). The underlying generalized linear model predicts the Actinobacteria and non-Epsilon-proteobacteria to be 0.4x as abundant as the rest in the gut and 3.6x as abundant as the rest in the tongue. The Actinobacteria identified as more abundant in the tongue include four members of the plaque-associated family Actinomycetaceae, one unclassified species of Corynebacterium, three members of the mouth-associated genus Rothia [23], and one unclassified species of the vaginal-associated genus Atopobium [9]. With a standard multivariate analysis of the CLR-transformed data, all nine of these Actinobacteria were identified as significantly more abundant in the tongue from regression of the individual OTUs when using either a 1% false-discovery rate or a Bonferroni correction - these monophyletic taxa all individually show a strong preference for the same body site, and their basal branch was identified as our first phylogenetic factor. The remaining Alpha-, Beta-, Gamma- and Delta-proteobacteria grouped with the Actinobacteria consisted of 31 OTUs, and the Epsilonproteobacteria split from the rest were three unclassified species of the genus Campylobacter. The grouping of Actinobacteria with the non-Epsilon Proteobacteria motivates the need for accurate phylogenies in phylofactorization, but also illustrates the promise of identifying clades of interest where the phylogeny is correct and the taxonomy is not.
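
Statements such as "3.6x as abundant as the rest" can be related to ILR coordinates through the balance formula. The sketch below is one plausible reading and not a description of how the reported values were computed: it assumes the standard balance $y = \sqrt{rs/(r+s)}\,\log(g_R/g_S)$ and interprets the fold difference as the ratio of geometric mean abundances $g_R/g_S$. The group sizes in the usage note (40 and 250) are inferred from counts reported in the text (9 Actinobacteria plus 31 Proteobacteria OTUs, out of 290 OTUs in total).

```python
import math


def ratio_to_balance(ratio, r, s):
    """ILR balance implied by a ratio g_R / g_S of geometric mean abundances."""
    return math.sqrt(r * s / (r + s)) * math.log(ratio)


def balance_to_ratio(y, r, s):
    """Invert the balance formula to recover g_R / g_S from a coordinate y."""
    return math.exp(y * math.sqrt((r + s) / (r * s)))
```

Under these assumptions, `ratio_to_balance(3.6, 40, 250)` gives the tongue-side coordinate for the first factor, and `balance_to_ratio` converts a model-predicted coordinate back into a fold difference.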

The second factor ($P = 1.15 \times 10^{-31}$) splits 16 Firmicutes of the class Bacilli from the obligately anaerobic Firmicutes class Clostridia and the remaining paraphyletic group containing Epsilonproteobacteria and the rest. The Bacilli are, on average, 0.3x as abundant in the gut as the paraphyletic remaining OTUs and 3.9x as abundant in the tongue. The 16 Bacilli OTUs factored here contain 12 unclassified species of the genus Streptococcus, well known for its association with the mouth [20], one member of the genus Lactococcus, one unclassified species of the mucosal-associated genus Gemella, and two members of the family Carnobacteriaceae often associated with fish and meat products [25].

The third factor ($P = 1.37 \times 10^{-28}$) separated 15 members of the Bacteroidetes family Prevotellaceae from all other Bacteroidetes and the remaining paraphyletic group of OTUs not split by previous factors. The Prevotellaceae split in the third factor were all of the genus Prevotella, including the species Prevotella melaninogenica and Prevotella nanceiensis, and were found to be 0.3x as abundant in the gut and 4.0x as abundant in the tongue relative to the other taxa from which they were split.

These first three factors capture major blocks visible in the dataset and can be used as a dimensionality reduction tool with a phylogenetic interpretation (Fig. 4). While traditional ordination-visualization tools may capture larger fractions of variation of the data, phylogenetic factorization yields a few variables - ratios of clades - which capture large blocks of variation in the data and can be traced to single edges in the phylogeny corresponding to meaningful splits between taxa, edges where traits likely arose which govern the differential abundances across sample sites and environmental gradients or responses to treatments (Fig. 4b, supplemental Figs. S4-S8).

Using the KS-test stopping criterion, phylofactorization was terminated at 142 factors, each corresponding to a branch in the phylogenetic tree separating two groups of OTUs based on their differential abundances in the tongue and feces. These 142 factors define 143 groups, or what we call 'bins', of taxa which remain unsplit by the phylofactorization. The bins vary in size; 112 bins contained only single OTUs, whereas 8 were monophyletic clades and the rest are paraphyletic groups of OTUs, the result of taxa within a monophyletic group being factored, yielding one monophyletic group and one paraphyletic group. Of the 112 single-OTU bins extracted from phylofactorization, 78 were also identified as significant at a false-discovery rate of 1%. Some monophyletic bins included groups of unclassified genera that would not be grouped at the genus level under standard taxonomy-based analyses. For instance, two monophyletic clades of the Firmicutes family Lachnospiraceae were identified as having different preferred body sites, yet both clades were unclassified at the genus level. Taxonomic-based analyses would either omit these unclassified genera, or group them together and make it difficult to observe a signal due to the two sub-groups having different responses to body site.
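
The bookkeeping behind "142 factors define 143 bins" can be made concrete with a small sketch. It assumes a simplified representation in which each factor is given as the set of OTUs on one side of the chosen edge within a single existing bin; the function names are illustrative, and summing relative abundances within bins is an assumed reading of how bins are amalgamated.

```python
def apply_factor(bins, clade):
    """Split the one bin containing `clade` into the clade and its remainder."""
    clade = frozenset(clade)
    for i, b in enumerate(bins):
        if clade < b:  # proper subset: the edge lies inside this bin
            return bins[:i] + [clade, b - clade] + bins[i + 1:]
    raise ValueError("clade must be a proper subset of an existing bin")


def bins_after(otus, clades):
    """Start from one bin holding every OTU and apply the factors in order."""
    bins = [frozenset(otus)]
    for clade in clades:
        bins = apply_factor(bins, clade)
    return bins


def amalgamate(composition, bins):
    """Sum relative abundances within each bin (a lower-dimensional composition)."""
    return [sum(composition[i] for i in b) for b in bins]
```

Each factor replaces exactly one bin with two, so $n$ factors always leave $n + 1$ bins. For instance, `bins_after(range(10), [{0, 1, 2, 3}, {0, 1}, {7}])` yields four bins, one of which is the paraphyletic remainder `{4, 5, 6, 8, 9}`.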


Soil Microbiome

The soil microbiome dataset was much larger - 3,379 OTUs and 580 samples - and a much smaller fraction of the variation could be explained by the dominant factors resulting from phylofactorization. Phylofactorization confirmed that the pH of the environment plays a dominant role in the microbial community composition, consistent with previous analyses based on Mantel tests [36]. Dominance analysis of the generalized linear models associated with each factor determined pH to account for approximately 92.87%, 89.78%, and 92.94% of the explained variance in the first, second, and third factor, respectively. C and N were relatively unimportant, and the dominance of pH in the first three factors can be visualized by ordination-visualization plots of the ILR coordinates of the first three factors (Fig. 5a).

The first factor splits a group of 206 OTUs in two classes of Acidobacteria from all other bacteria: class Acidobacteriia and class DA052 are shown to decrease in relative abundance with increasing pH. The second factor split 31 OTUs in the order Actinomycetales (some from the family Thermomonosporaceae and the rest unclassified at the family level) from the remainder of all other bacteria, and these monophyletic Actinomycetales also decrease in relative abundance with increasing pH. The third factor identified another clade within the phylum Acidobacteria to decrease with pH: 115 bacteria from the classes Solibacteres and TM1.

The fourth factor identifies a monophyletic clade of 193 OTUs in the remainder of phylum Acidobacteria (i.e. those Acidobacteria not mentioned above in factors 1 and 3) as having relative abundances that increase with pH (dominance analysis: 94.79% of explained variance attributable to pH). Unlike the clades in the previous three factors, which were acidophiles, this monophyletic group of Acidobacteria consists of alkaliphiles, and includes the classes Acidobacteria-6, Chloracidobacteria, S035 and three OTUs unclassified at the class level.

The first four factors define 5 bins of OTUs that we refer to as “binned phylogenetic units” or BPUs: a monophyletic group of Acidobacteria (classes Chloracidobacteria, Acidobacteria-6, and S035), another monophyletic group of Acidobacteria (classes Solibacteres and TM1), a monophyletic group of several families of the order Actinomycetales, a monophyletic group of Acidobacteria (classes Acidobacteriia and DA052), and a paraphyletic amalgamation of the remaining taxa. Binning the OTUs based on these BPUs tells a simplified story of how pH drives microbial community composition (Fig. 5b).


Discussion

Overview

We have introduced a simple and generalizable exploratory data analysis algorithm, phylofactorization, to identify clades driving variation in microbiome datasets. Phylofactorization integrates both the compositional and phylogenetic structure of microbiome datasets and produces outputs that contain biological information: effects of independent variables on edges in the phylogeny, including the tips of the tree traditionally analyzed. The output of phylofactorization contains a sequence of “factors”, or splits in the tree identifying sub-groups of taxa which respond differently to treatment relative to one another. The splits identified in phylofactorization need not be splits in the Linnaean taxonomy but can identify strong responses in clades of unclassified taxa. The researcher does not need to choose a taxonomic level at which to perform analysis - those taxonomic levels are output based on whichever clades maximize the objective function, and so researchers will be able to identify multiple taxonomic scales of importance.

Phylofactorization outputs an isometric log-ratio transform of the data with known asymptotic normality properties, coordinates that can be analyzed with standard multivariate methods [32]. The resulting coordinates correspond to particular edges between clearly identifiable nodes in the tree of life, allowing researchers to annotate a given phylogenetic tree with correlations between clades and various environmental meta-data, sample categories, or experimental treatments.

The phylogenetic inferences obtained by phylofactor would be difficult to obtain with other pre-existing methods. Three iterations of phylofactorization on the oral/fecal dataset yielded the three major splits in the phylogeny, all of which are consistent with known distributions of taxa, none of which would be revealed directly from a taxonomic-based analysis. Algorithms such as phylosignal [22], which track P-values up the tree, yield inferences with nested dependence and identify clades with common significance, yet not necessarily clades with common signal or direction of trend - it is a common signal such as a shared habitat association, not a common significance such as the existence of a habitat association, which better indicates a putative trait driving shared responses in microbes. In the 142 factors above, phylofactor identified numerous clades with common significance yet different signals. Phylogenetic kernel-based methods [27, 35, 31] use the phylogeny to modify distances/dissimilarities between samples and can easily differentiate between oral and fecal body sites, but such nominally similar methods classify and differentiate sites, which is conceptually different from classifying and predicting OTU abundances given their place in the phylogeny. Similarly, phylogenetic comparative methods (PCMs) [13, 21] such as phylogenetic generalized least squares [17, 29] are nominally similar but conceptually distinct from phylofactor; while PCMs aim to correct for the dependence of observations of evolutionarily related taxa, our goal is not to correct for such dependence but to infer precisely where on the tree such dependence appears to arise. While many tools exist for the nominally similar task of “using the phylogeny to analyze microbiome data”, the utility of phylofactorization lies in its ability to construct variables corresponding to edges in the phylogeny along which putative functional ecological traits may have arisen, constructed so as to avoid the nested dependence and overlapping comparisons that frequent the analysis of hierarchically structured data, and constructed in a manner appropriate for compositional data.

Future Work

The generality of phylofactorization opens the door to future work employing phylofactorization with other objective functions. As we showed with the human oral/fecal microbiomes, phylofactorization is not restricted to basal clades and includes the tips as possible clades of interest. However, the objective function we used minimized residual variance in the whole community and thereby may prioritize deeply rooted edges or abundant taxa with weaker effects over individual OTUs with stronger effects. Other objective functions could be constructed to meet the needs of the researcher. If researchers are interested in identifying basal lineages, their objective function can weight edges based on distance from the tips. A researcher interested in fine-tuning the evolutionary assumptions in phylofactorization can define an objective function that increases explicitly with the length of the edge being considered to reflect an assumption that the probability of a trait arising increases with the amount of time elapsed.

Each edge identified in phylofactorization corresponds to two bins of taxa, one on each side of the edge, and consequently phylofactorization brings in two complementary perspectives for analyzing the data: factor-based analysis and bin-based analysis. Factor-based analysis looks at each factor as an inference on an edge in the phylogeny, conditioned on the previous inferences already made, and indicating that taxa on one side of an edge respond differently to the independent variable compared to taxa on the other side of the edge. Bin-based analysis, on the other hand, looks at the set of clades resulting from a certain number of factors - what we call a “binned phylogenetic unit” (BPU). These bins will create a lower-dimensional, compositional dataset and can be freed from the underlying ILR coordinates for different analyses on these amalgamated clades. BPU-based analysis can inform sequence binning in future research aimed at controlling for previously-identified phylogenetic causes of variation, and combine the effects of multiple up-stream factors for predictions of OTU abundance. See the supplementary text for a more detailed discussion of factor-based and bin-based analyses.

One conceptual challenge with phylofactorization cross-validation and BPU-based analysis is the treatment of paraphyletic “remainder” clades which occur after factorization of edges within groups that split a monophyletic clade from a “remainder”, paraphyletic clade or BPU. These paraphyletic BPUs are informative as they are the necessary contrasts with the monophyletic group, conditioned on previously identified phylogenetic factors, and thus serve as a means of identifying the presence or absence of a trait in the monophyletic group. For instance, phylofactorization of vertebrates based on their relative abundance in the air should identify two monophyletic groups: birds and bats. The monophyletic groups are defined by the presence of a specialized adaptation - wings - and the paraphyletic remainder is defined by the absence of wings; any future treatment that differentially affects winged vs. non-winged organisms will differentially affect the abundances of these monophyletic groups relative to the paraphyletic remainder. Conversely, “legs” in the order Squamata disappeared along edges corresponding to legless lizards and snakes; in this case, the monophyletic clades are defined by the absence of a specialized adaptation, legs, and the paraphyletic remainder is defined by the presence of legs. Whether a trait can be flexibly defined as either the presence or absence of a specialized adaptation is an important theoretical consideration, but considering monophyletic BPUs and their complementary paraphyletic BPUs as equally likely to have the presence or absence of a trait and as the necessary comparison for cross-validation is an important empirical consideration. Phylofactorization pinpoints the contrasts to be made for microbial genomic and physiological studies aiming to identify the causes of differential abundances or responses to treatments across OTUs. Genomic/physiological investigations and cross-validation of phylofactorization must understand and grapple with the choice of appropriate comparisons of taxa, and that may require a fair contrast of a monophyletic clade to a paraphyletic one.

Phylofactorization will benefit from community discussion and further research overcoming general statistical challenges common to greedy algorithms and analysis of phylogenetically-structured compositional data. First, the log-ratio transform at the heart of phylofactorization requires researchers to deal with zeros in compositional datasets. While there are many methods for dealing with zeros [1, 28, 32], it is unclear which method is most robust for downstream phylofactorization of sparse OTU tables. Second, phylofactorization as presented here only sequentially infers ILR elements and does not allow for simultaneous inference of ILR basis elements - the set of factors identified after $n$ iterations may explain less variation combined than an alternative set of factors that did not maximize the explained variance at each iteration. This limitation may be overcome by running many replicates of a stochastic greedy algorithm and choosing that which maximizes the explained variance after $n$ factors. Third, the researcher must choose an objective function which matches her question, and future research can map out which objective functions are appropriate for which questions in microbial ecology. Fourth, like any method performing inference based on phylogenetic structure, phylofactorization assumes an accurate phylogeny. Accurate statistical statements about a researcher's confidence in phylofactors must incorporate the uncertainty in our constructed phylogeny. Fifth, phylogenetic-based methods may be sensitive to the binning and filtering methods for sequence-count data. Our filtering methods here were chosen to allow somewhat standard and simplified analysis of common OTUs present in many of the samples, but these filtering protocols might not be optimal for researchers with other research questions and objective functions. Future work can investigate the sensitivity of phylofactorization to myriad binning and filtering methods given the objectives and objective function of the researcher. Finally, future research can investigate the unique kinds of errors in phylofactorization: in addition to the multiple-hypothesis testing of edges, phylofactorization may propagate errors in the greedy algorithm, and, even when taxa are correctly factored into the appropriate functional bins, the presence of multiple factors in the same region of the tree can lead to uncertainty about the exact edge along which a putative trait arose (see supplement for more discussion on the uncertainty of which edge to annotate).

Incorporating phylogenetic structure into the analysis of microbiome datasets has been a major challenge [30], and now phylofactorization provides a general framework for rigorous exploration of phylogenetically-structured compositional datasets. The soil dataset analyzed above, for instance, contains 3,379 OTUs and 580 samples, and phylofactorization of the clades affected by pH in the soil dataset yielded not just the three dominant factors used for ordination-visualization, but 2,430 factors in all, each with an intricate phylogenetic story. Many Acidobacteria are acidophiles, but some - Chloracidobacteria, Acidobacteria-6, S035, and some undescribed classes of bacteria factored here - appear to be alkaliphiles. By incorporating the phylogenetic structure of microbiome datasets, the big data of the modern sequence-count boom just got bigger, and future research will need to consider how to organize, analyze and visualize the large amounts of phylogenetic detail that can now be obtained from the analysis of microbiome datasets.

Conclusions

Phylofactorization is a robust tool for analyzing marker gene sequence-count datasets for phylogenetic patterns underlying microbial community responses to independent variables. Phylofactorization accounts for the compositional nature of the data and the underlying phylogeny and produces inferences that are independent and more powerful than application of the ILR transform to the rooted phylogeny. The R package 'phylofactor' has built-in parallelization that can be used to analyze large microbiome datasets, and allows generalized linear modeling to identify clades which respond to treatments or multiple environmental gradients.

Phylofactorization can connect the pipeline of microbiome studies to focused studies of microbial physiology. As researchers identify lineages with putative functional ecological responses, taxa within those lineages - even if they are not the same OTUs - can be cultivated and their genomes screened to uncover the physiological mechanisms underlying the lineages' shared response.

Phylofactorization improves the pipeline for analyzing microbiome datasets by allowing researchers to objectively determine the appropriate phylogenetic scales for analyzing microbiome datasets - a family here, an unclassified split there - instead of performing multiple comparisons at each taxonomic level. Instead of principal components analysis or principal coordinates analysis, phylofactorization can be used as an exploratory data analysis and dimensionality reduction tool in which the “components” are identifiable clades in the tree of life, a far more intuitive and informative component for biological variation than multi-species loadings.

Phylofactorization can allow researchers to annotate online databases of the microbial tree of life, permitting predictions about the physiology of unclassified and uncharacterized life forms based on previous phylogenetic inferences in sequence-count data. By allowing researchers to make inferences on the same tree and potentially annotate an online tree of life, phylofactorization may bring on a new era of characterizing high-throughput phylogenetic annotations, filling in the gaps in the microbial tree of life.

An R package for phylofactorization with user-friendly parallelization is now available online at https://github.com/reptalex/phylofactor.

Acknowledgments

ADW would like to acknowledge L. Ma for his feedback and help incorporating this method into the statistical literature. This paper is published with support from, and in loving memory of, D. Nemergut.

References

[1] John Aitchison. The statistical analysis of compositional data. 1986.

[2] Richard D Bardgett, Chris Freeman, and Nicholas J Ostle. Microbial contributions to climate change through carbon cycle feedbacks. The ISME Journal, 2(8):805–814, 2008.

[3] Roeland L Berendsen, Corne MJ Pieterse, and Peter AHM Bakker. The rhizosphere microbiome and plant health. Trends in Plant Science, 17(8):478–486, 2012.

[4] J Gregory Caporaso, Christian L Lauber, Elizabeth K Costello, Donna Berg-Lyons, Antonio Gonzalez, Jesse Stombaugh, Dan Knights, Pawel Gajer, Jacques Ravel, Noah Fierer, et al. Moving pictures of the human microbiome. Genome Biol, 12(5):R50, 2011.

[5] J Gregory Caporaso, Christian L Lauber, William A Walters, Donna Berg-Lyons, James Huntley, Noah Fierer, Sarah M Owens, Jason Betley, Louise Fraser, Markus Bauer, et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. The ISME Journal, 6(8):1621–1624, 2012.

[6] Human Microbiome Project Consortium et al. Structure, function and diversity of the healthy human microbiome. Nature, 486(7402):207–214, 2012.

[7] Joel Cracraft. Species concepts and speciation analysis. In Current Ornithology, pages 159–187. Springer, 1983.

[8] Joel Cracraft. Species concepts in theoretical and applied biology: a systematic debate with consequences. Species concepts and phylogenetic theory: A debate, pages 30–43, 2000.

[9] Tao Ding and Patrick D Schloss. Dynamics and associations of microbial community types across the human body. Nature, 509(7500):357, 2014.

[10] Juan José Egozcue and Vera Pawlowsky-Glahn. Groups of parts and their balances in compositional data analysis. Mathematical Geology, 37(7):795–828, 2005.

[11] Juan José Egozcue, Vera Pawlowsky-Glahn, Glòria Mateu-Figueras, and Carles Barcelo-Vidal. Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3):279–300, 2003.

[12] Paul G Falkowski, Tom Fenchel, and Edward F Delong. The microbial engines that drive Earth's biogeochemical cycles. Science, 320(5879):1034–1039, 2008.

[13] Joseph Felsenstein. Phylogenies and the comparative method. American Naturalist, pages 1–15, 1985.

[14] Noah Fierer and Robert B Jackson. The diversity and biogeography of soil bacterial communities. Proceedings of the National Academy of Sciences of the United States of America, 103(3):626–631, 2006.


[15] Mariel M Finucane, Thomas J Sharpton, Timothy J Laurent, and Katherine S Pollard. A taxonomic signature of obesity in the microbiome? Getting to the guts of the matter. PLoS ONE, 9(1):e84689, 2014.

[16] Jonathan Friedman and Eric J Alm. Inferring correlation networks from genomic survey data. PLoS Comput Biol, 8(9):e1002687, 2012.

[17] Alan Grafen. The phylogenetic regression. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 326(1233):119–157, 1989.

[18] Keith Gregg. Engineering gut flora of ruminant livestock to reduce forage toxicity: progress and problems. Trends in Biotechnology, 13(10):418–421, 1995.

[19] Ulrike Grömping. Relative importance for linear regression in R: the package relaimpo. Journal of Statistical Software, 17(1):1–27, 2006.

[20] B Guggenheim. Streptococci of dental plaques. Caries Research, 2(2):147–163, 1968.

[21] Paul H Harvey and Mark D Pagel. The comparative method in evolutionary biology, volume 239. Oxford University Press, Oxford, 1991.

[22] François Keck, Frédéric Rimet, Agnes Bouchez, and Alain Franc. phylosignal: an R package to measure, test, and explore the phylogenetic signal. Ecology and Evolution, 6(9):2774–2780, 2016.

[23] Omry Koren, Aymé Spor, Jenny Felin, Frida Fåk, Jesse Stombaugh, Valentina Tremaroli, Carl Johan Behre, Rob Knight, Björn Fagerberg, Ruth E Ley, et al. Human oral, gut, and plaque microbiota in patients with atherosclerosis. Proceedings of the National Academy of Sciences, 108(Supplement 1):4592–4598, 2011.

[24] Kim-Anh Le Cao, Mary-Ellen Costello, Vanessa Anne Lakis, Francois Bartolo, Xin-Yi Chua, Remi Brazeilles, and Pascale Rondeau. mixMC: a multivariate statistical framework to gain insight into microbial communities. bioRxiv, page 044206, 2016.

[25] Jørgen J Leisner, Birgit Groth Laursen, Hervé Prévost, Djamel Drider, and Paw Dalgaard. Carnobacterium: positive and negative effects in the environment and in foods. FEMS Microbiology Reviews, 31(5):592–613, 2007.


[26] G Li, W Huang, DN Lerner, and X Zhang. Enrichment of degrading microbes and bioremediation of petrochemical contaminants in polluted soil. Water Research, 34(15):3845–3853, 2000.

[27] Catherine Lozupone and Rob Knight. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology, 71(12):8228–8235, 2005.

[28] Josep A Martín-Fernández, Carles Barceló-Vidal, and Vera Pawlowsky-Glahn. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology, 35(3):253–278, 2003.

[29] Emilia P Martins and Thomas F Hansen. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, pages 646–667, 1997.

[30] Jennifer BH Martiny, Stuart E Jones, Jay T Lennon, and Adam C Martiny. Microbiomes in light of traits: A phylogenetic perspective. Science, 350(6261):aac9323, 2015.

[31] Jie Ning and Robert G Beiko. Phylogenetic approaches to microbial community classification. Microbiome, 3(1):1, 2015.

[32] Vera Pawlowsky-Glahn and Antonella Buccianti. Compositional data analysis: Theory and applications. John Wiley & Sons, 2011.

[33] Morgan N Price, Paramvir S Dehal, and Adam P Arkin. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3):e9490, 2010.

[34] Elmar Pruesse, Jörg Peplies, and Frank Oliver Glöckner. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics, 28(14):1823–1829, 2012.

[35] Elizabeth Purdom. Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree. The Annals of Applied Statistics, pages 2326–2358, 2011.

[36] Kelly S Ramirez, Jonathan W Leff, Albert Barberán, Scott Thomas Bates, Jason Betley, Thomas W Crowther, Eugene F Kelly, Emily E Oldfield, E Ashley Shaw, Christopher Steenbock, et al. Biogeographic patterns in below-ground diversity in New York City's Central Park are similar to those observed globally. In Proc. R. Soc. B, volume 281, page 20141988. The Royal Society, 2014.

[37] Liam J Revell. phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3(2):217–223, 2012.

[38] Mark D Robinson, Davis J McCarthy, and Gordon K Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–140, 2010.

[39] Mikhail Tikhonov, Robert W Leach, and Ned S Wingreen. Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution. The ISME Journal, 9(1):68–80, 2015.

[40] Marcel GA Van Der Heijden, Richard D Bardgett, and Nico M Van Straalen. The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems. Ecology Letters, 11(3):296–310, 2008.

Figures






Figure 1

Shortcomings of Rooted ILR

(a) The isometric log-ratio transform corresponding to a phylogeny rooted at the common

ancestor is inaccurate for geometric changes within clades. Here, absolute abundances of 50

taxa in 30 samples per site were simulated across two sites. An affected clade, B, is up-

represented in the second site. Regression on the rooted ILR coordinates, x_{i}^{*}, against

the sample site indicated that the partition separating clades A and B, referred to as

x_{\{A,B\}}^{*}, had the highest test-statistic, but the rooted ILR predicts fold changes in B

relative to A, not fold changes in B relative to the rest of the taxa. (b) Consequently, when

one clade increases in abundance while the rest remain unaffected, partitions between the

affected clade and the root will also have a signal leading to a correlation in the coordinates

along the path from B to the root. The correlation plotted here is the absolute value of the

correlation coefficient, and the baseline correlation was estimated as the average absolute

value of the correlation coefficient between ILR coordinates not along the root-path of the

affected clade.
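
The balance underlying panel (a) can be sketched in a few lines. The snippet below is a minimal illustration, assuming the standard ILR balance between two groups of taxa (a scaled log-ratio of their geometric means); the function name `ilr_balance` and the toy abundances are hypothetical and are not the simulation used for this figure. It shows how a change confined to clade B also produces a shift in the coordinate separating {A, B} from the rest, which is the effect described in panel (b).

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ilr_balance(sample, group_r, group_s):
    """ILR balance of one sample between two disjoint groups of taxa.

    sample  : dict mapping taxon name -> strictly positive abundance
    group_r : taxa on one side of the partition (numerator)
    group_s : taxa on the other side (denominator)
    """
    r, s = len(group_r), len(group_s)
    gm_r = geometric_mean([sample[t] for t in group_r])
    gm_s = geometric_mean([sample[t] for t in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(gm_r / gm_s)

# Toy community (hypothetical): clade B doubles at the second site,
# every other taxon is unchanged.
site1 = {"a1": 10.0, "a2": 10.0, "b1": 10.0, "b2": 10.0, "c1": 10.0, "c2": 10.0}
site2 = dict(site1, b1=20.0, b2=20.0)
clade_a, clade_b, rest = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]

# Rooted-ILR style coordinate: B against its sister clade A only.
b_vs_a = ilr_balance(site2, clade_b, clade_a) - ilr_balance(site1, clade_b, clade_a)

# Edge-style coordinate: B against all remaining taxa.
b_vs_all = (ilr_balance(site2, clade_b, clade_a + rest)
            - ilr_balance(site1, clade_b, clade_a + rest))

# Coordinate nearer the root: {A, B} against the rest also shifts,
# even though only B changed.
ab_vs_rest = (ilr_balance(site2, clade_a + clade_b, rest)
              - ilr_balance(site1, clade_a + clade_b, rest))

print(b_vs_a, b_vs_all, ab_vs_rest)
```

All three differences are nonzero in this toy case, so a coordinate on the path to the root carries part of the signal that originates in B alone.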


Figure 2

Phylofactorization

(a) Phylofactorization changes variables from tips of the phylogeny (OTUs used in analysis of

microbiome datasets) to edges of the phylogeny with the largest predictable differences

between taxa on each side of the edge. To illustrate this method, we consider the treatment

of a bacterial community with an oxazolidinone. Oxazolidinones target gram-positive bacteria

and will likely lead to a decrease in the relative abundances of gram-positive bacteria

(antibiotic susceptible clade, A, having the antibiotic target). Among the antibiotic

susceptible bacteria, phylofactor can identify monophyletic clades that are resistant relative

to other antibiotic-susceptible bacteria due to a vertically-transmitted trait (B) such as the

loss of the antibiotic target or enzymes that break down the antibiotic. (b) The two

phylogenetic factors produce three meaningful bins of taxa - those susceptible to antibiotics

(A), those within the susceptible clade that are resistant to antibiotics (A+B), and a

potentially paraphyletic remainder. (c) Phylofactorization is a greedy algorithm to extract the

edges which capture the most predictable differences in the response of relative abundances

among taxa on the two sides of each edge. (c, top row) For the first iteration, all edges are

considered - an ILR coordinate is created for each edge using equation (1) and the ILR

coordinate is regressed against the independent variable. The edge which maximizes the

objective function is chosen. Depicted above, the first factor corresponds to the edge

separating antibiotic susceptible bacteria from the rest. Then, the tree is split - all

subsequent comparisons along edges will be contained within the sub-trees. The conceptual

justification for limiting comparisons within sub-trees is to prevent overlapping comparisons:

once we identify the antibiotic susceptible clade, we want to look at which taxa within that

clade behave differently from other taxa within that clade. (c, bottom row) For the second

iteration, the remaining edges are considered and ILR coordinates within sub-trees are

constructed. The edge maximizing the objective function is selected and the tree is split at


that edge. For more details, see the section “PhyloFactor” in the supplemental info.
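
The greedy procedure in panel (c) can be sketched as follows. This is a simplified illustration and not the implementation in the R package 'phylofactor': the tree is a nested tuple, the objective is assumed to be the between-group sum of squares of each ILR coordinate for a two-level treatment label, and all names (`phylofactor_sketch`, `explained_ss`, the toy taxa and abundances) are hypothetical.

```python
import math

def tips(node):
    """All tip names below a node of a nested-tuple tree."""
    return [node] if isinstance(node, str) else [t for child in node for t in tips(child)]

def clades(node):
    """Tip sets of every node in the tree (each non-root node marks one edge)."""
    out = [frozenset(tips(node))]
    if not isinstance(node, str):
        for child in node:
            out.extend(clades(child))
    return out

def balance(sample, group_r, group_s):
    """ILR balance between two disjoint groups of taxa in one sample."""
    r, s = len(group_r), len(group_s)
    mean_log_r = sum(math.log(sample[t]) for t in group_r) / r
    mean_log_s = sum(math.log(sample[t]) for t in group_s) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_r - mean_log_s)

def explained_ss(values, labels):
    """Between-group sum of squares for a two-level label (0 or 1)."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    diff = sum(g1) / len(g1) - sum(g0) / len(g0)
    return len(g0) * len(g1) / len(values) * diff ** 2

def phylofactor_sketch(tree, samples, labels, n_factors):
    """Greedily pick edges; after each pick, split the bin that contained it."""
    all_clades = clades(tree)
    bins = [frozenset(tips(tree))]
    factors = []
    for _ in range(n_factors):
        best = None
        for b in bins:
            # Candidate edges inside this bin: clades restricted to the bin.
            candidates = {c & b for c in all_clades if 0 < len(c & b) < len(b)}
            for group in candidates:
                other = b - group
                coords = [balance(s, sorted(group), sorted(other)) for s in samples]
                score = explained_ss(coords, labels)
                if best is None or score > best[0]:
                    best = (score, b, group, other)
        if best is None:
            break
        _, b, group, other = best
        bins.remove(b)
        bins.extend([group, other])
        factors.append((group, other))
    return factors, bins

# Toy data (hypothetical): clade {a1, a2, b1, b2} is susceptible to treatment,
# and the sub-clade {b1, b2} declines less than {a1, a2}.
tree = ((("a1", "a2"), ("b1", "b2")), (("c1", "c2"), ("c3", "c4")))
control = {t: 10.0 for t in tips(tree)}
treated = dict(control, a1=1.0, a2=1.0, b1=2.0, b2=2.0)
samples = [control, control, treated, treated]
labels = [0, 0, 1, 1]

factors, bins = phylofactor_sketch(tree, samples, labels, n_factors=2)
for group, other in factors:
    print(sorted(group), "|", sorted(other))
```

In this toy community the first factor separates the susceptible clade from the rest and the second separates the two sub-clades within it, leaving three bins, which mirrors panels (a) and (b).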


Figure 3

Phylofactorization can correctly identify affected clades and be stopped at a

conservative number of phylogenetic factors.

(a) Power Analysis - 1 Clade. The rooted ILR transform that minimizes residual variance when

regressed against sample site is less able to identify the correct clade compared to

phylofactorization for a variety of effect sizes, a, and sample sizes. (b) Three Significant

Clades: When three significant clades are chosen and given a set of effects increasing in

intensity with the parameter b, choosing the top rooted ILR coordinates underperforms

phylofactorization in correctly identifying the affected clades. Phylofactorization also explains

more variation in the data: across effect sizes, phylofactorization explains 2 orders of

magnitude more of the variance in the dataset than the sequential rooted ILR. (c) Stopping

Phylofactorization: Plots of the true number of affected clades in simulated datasets against

the number of clades identified by the R package 'phylofactor'. One can terminate

phylofactorization when the true number of affected clades is unknown by choosing a

stopping function aimed at stopping when there is no evidence of a remaining signal. By

stopping the iteration when the distribution of P-values from analyses of variance of

regression on candidate ILR basis elements is uniform (specifically, stopping when a KS test

against a uniform distribution yields P>0.05), we obtain a conservative estimate of the

number of phylogenetic factors in the data.
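
The stopping rule in panel (c) can be sketched with a one-sample Kolmogorov-Smirnov test of the candidate P-values against a uniform distribution. The code below is an assumption-laden illustration: it uses an asymptotic approximation to the KS P-value with a common small-sample correction rather than an exact test, and the names `ks_uniform_pvalue` and `should_stop` are hypothetical; only the 0.05 threshold comes from the caption.

```python
import math

def ks_uniform_pvalue(pvalues):
    """One-sample KS test of values in [0, 1] against Uniform(0, 1).

    Returns (D, approximate P-value). The P-value uses the asymptotic
    Kolmogorov series with a small-sample correction, so it is an
    approximation rather than an exact result.
    """
    x = sorted(pvalues)
    n = len(x)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(x))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        # The series converges slowly here and the P-value is essentially 1.
        return d, 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * total))

def should_stop(pvalues, alpha=0.05):
    """Stop factoring when the candidate P-values look uniform (KS P > alpha)."""
    return ks_uniform_pvalue(pvalues)[1] > alpha

# Hypothetical P-values from regressions on candidate ILR coordinates.
no_signal = [(i + 0.5) / 20 for i in range(20)]   # evenly spread over (0, 1)
signal = [0.001 * (i + 1) for i in range(20)]     # piled up near zero

print(should_stop(no_signal), should_stop(signal))
```

Evenly spread P-values lead to stopping, while P-values concentrated near zero indicate remaining signal and the iteration would continue.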


Figure 4

Phylofactorization of human feces/tongue dataset identifies clades differentiating body

sites

(a) Phylogenetic structure is visible as blocks using a phylogenetic heatmap from the R

package 'phytools' (Revell, 2012). The first factor separates Actinobacteria and some

Proteobacteria from the rest, the second factor separates the class Bacilli from the remaining

non-Proteobacteria and non-Actinobacteria, the third factor pulls out the genus Prevotella

from Bacteroidetes and indicates that it, unlike many other taxa in Bacteroidetes, is

unrepresented in the tongue. Each factor captures a major block of variation in the data, and

the orthogonality of the ILR coordinates from each factor allows multiple factors to be

combined easily for estimates of community composition. (b) These three factors split the

phylogeny into four bins. Three of those bins are monophyletic and the final bin is a

“remainder” bin, containing taxa split off by the previous monophyletic bins. The three

factors are identifiable edges between nodes that can be mapped to an online database

containing those nodes. The taxonomic assignment used here is the set of all shortest-

unique-prefixes that separate the taxa in each bin.
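
One plausible reading of the shortest-unique-prefix labelling in panel (b) is sketched below. It is an interpretation for illustration only, not the package's code: each lineage is assumed to be a list of ranks, and for every OTU in a bin the label is the shortest leading run of ranks that no OTU outside the bin shares. The function name and the example lineages are hypothetical.

```python
def shortest_unique_prefixes(bin_lineages, other_lineages):
    """Shortest rank prefixes of the bin's lineages that no outside lineage shares.

    Each lineage is a non-empty list of ranks, e.g. ["Bacteria", "Firmicutes"].
    If a lineage cannot be separated from the outside taxa, its full
    lineage is used as the label.
    """
    labels = set()
    for lineage in bin_lineages:
        for depth in range(1, len(lineage) + 1):
            prefix = tuple(lineage[:depth])
            if all(tuple(o[:depth]) != prefix for o in other_lineages):
                break
        labels.add("; ".join(prefix))
    return sorted(labels)

# Hypothetical lineages for a bin and for the taxa outside it.
bacilli_bin = [
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"],
    ["Bacteria", "Firmicutes", "Bacilli", "Bacillales"],
]
outside = [
    ["Bacteria", "Firmicutes", "Clostridia"],
    ["Bacteria", "Bacteroidetes", "Bacteroidia"],
]

bin_labels = shortest_unique_prefixes(bacilli_bin, outside)
print(bin_labels)
```

Both OTUs in the example bin collapse to a single class-level label because that is the first rank at which they differ from every taxon outside the bin.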


Figure 5

Dimensionality Reduction and Ordination-Visualization of soil microbiome dataset

Phylofactor presents two complementary methods for projecting and visualizing the high-

dimensional phylogenetically-structured compositional data. (a) The ILR coordinates have

asymptotic normality properties and provide biologically informative ordination-visualization

plots. Here, we see that pH is a much better predictor than N of the major phylogenetic

factors in Central Park soils. Dominance analysis indicated that pH accounts for

approximately 92.87%, 89.78%, and 92.94% of the explained variance in the first, second,

and third factor, respectively, consistent with previous results based on Bray-Curtis distances

and Mantel tests showing the dominance of pH in structuring soil microbiomes (Ramirez,

2014). (b) Every edge separates one group of taxa into two, and the disjoint groups of taxa

defined by a common phylofactorization - what we refer to as bins - can be used to

amalgamate OTUs and construct a lower-dimensional, compositional dataset of “binned

phylogenetic units” (BPUs). Plotting the observed relative abundances of BPUs and the

relative abundances of BPUs predicted by phylofactorization yields a simple, phylogenetic

story of microbial associations with pH. While many Acidobacteria thrive at low pH, there

is a monophyletic group of 'acidophobic' Acidobacteria that thrives at higher pH and

includes the classes Chloracidobacteria, Acidobacteria-6, and S035, as well as two OTUs

unclassified at the class level. The acidophobic Acidobacteria

increase in relative abundance with pH. Also, a monophyletic clade of Actinomycetales is

highly acidophilic and consists of 31 OTUs, 28 of which are unclassified at the family level;

the remaining Actinomycetales are in the remainder bin and show a very weak affinity for low pH

soils. None of these bins of OTUs corresponds to a single taxonomic grouping, but all of them

are characterized and classified by their locations on one side or the other of four edges

within the phylogeny.
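
The construction of binned phylogenetic units in panel (b) can be illustrated with a short sketch. It assumes that amalgamation means summing the abundances of the OTUs in each bin and renormalizing to relative abundances; the function name, OTU names, bin names, and counts are hypothetical.

```python
def amalgamate(sample, bins):
    """Sum OTU abundances within each bin and renormalize to relative abundances.

    sample : dict mapping OTU name -> non-negative abundance
    bins   : dict mapping bin name -> list of OTU names (bins must be disjoint)
    """
    totals = {name: sum(sample[otu] for otu in otus) for name, otus in bins.items()}
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}

# Hypothetical counts for one soil sample and a two-bin factorization.
sample = {"otu1": 30, "otu2": 10, "otu3": 40, "otu4": 20}
bins = {
    "acidophobic Acidobacteria": ["otu1", "otu2"],
    "remainder": ["otu3", "otu4"],
}

bpu = amalgamate(sample, bins)
print(bpu)
```

Applying the same function to every sample would give a lower-dimensional compositional table with one column per bin, which can then be plotted against pH.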
