Phylogenetic Workflows

Phylogenetic Workflows: Tree Building and Post-tree Analyses. Given at the Dept for Ecology and Evolutionary Biology, University of Arizona in 2011. A phylogenetic workflow example showcasing iPlant Cyberinfrastructure.

Phylogenetic Workflows: Tree Building and Post-tree Analyses

Naim Matasci
The iPlant Collaborative

Plant Biology 2011
August 6-10, 2011

Why is the tree of life important?

“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”

Nothing in biology makes sense except in the light of evolution.

T. G. Dobzhansky

Scalability

Ackerly, 2009; J. Felsenstein, ca. 1980; Ranger Cluster at TACC

iPlant Tree of Life Grand Challenge

Large phylogenetic inference
Building a tree of life for up to 500,000 green plants

Tree Visualization
Scalable visualization for small to large trees

Data Assembly and Integration
Acquiring, organizing, and processing the data

Taxonomic Intelligence
Sorting out different names for the same species

Tree Reconciliation
Resolving discordant gene and species trees

Trait Evolution
Using trees to understand how traits evolved

Ancestral state of Hawaiian lobelioids

Lobelia niihauensis (Image: David Eickhoff)

Cyanea leptostegia (Image: Karl Magnacca)

(Schluter et al. 1997, Paradis 2004)

Continuous Ancestral Character Estimation

?

Obtain sequences
• GetSeq

Align sequences
• Muscle

Build Tree
• FastTree (aML)
• Ninja (NJ)
• PHYLIP (MP, NJ, ML)
• RAxML (ML)

Visualize Tree
• iPlant Tree Viewer

Integrate Data
• Lopper
• TNRS

Run Analysis
• CACE
• DACE
• Contrast
• OUch
• Picante
• Penalized likelihood

>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTGCTCAGCCTGGAGTTCCACCCGAAGAAGCGGGGGCCGCGGTAGCTGCGGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAACCTTGATCGTTACAAAGGGCGATGCTACAACATAGAGCCCGTTGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTATCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTACTGCGTATTGTAAAACTTTCCAAGGACCGCCTCATGGGATCCAAGTTGAAAGAGATAAATTGAACAAGTATGGTCGTCCCTTGCTGGGATGTACTATTAAACCTAAATTGGGGTTATCGGCTAAAAACTACGGTAGAGCAGTTTATGAATGTCTACGCGGTGGGCTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGACCGTTTCGTATTTTGTGCCGAAGCAATTTTTAAAGCACAGTCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCTACTGCAGGTACATGTGAAGAAATGATGAAAAGGGCTATATTT

>gi|1835227|emb|Z83136.1| S.foetidissimum rbcL gene
AAGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTGACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCAGGGGCCGCGGTAGCTGCCGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTNGCTGGAGAAGAAAATCAATATATTGCTTATGTAGCTTATCCTTTAGACCTYTTTGAAGAAGGTTCTGTTACTAATATGTKNACTTCCATTGTGGGGAATGTATTTGGGTTCAAAGCCCTGCGTGCTTTACGTCTGGAAGATCTGCGAATCCCTCCTGCGTATTCTAAAACTTTCCAAGGACCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAACAAGTACGGTCGTCCCCTGTTGGGATGTACTATTAAACCTAAATTGGGGTTATCTGCTAAAAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTGGACTTGATTTTACCAAAGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGAGATCGTTTCTTATTTTGTGCCGAAGCACTTTATAAAGCACAGGCTGAAACAGGTGAAATCAAAGGGCATTACTTGAATGCT

>gi|1834456|emb|Z83132.1| G.urceolata rbcL gene
AACTAAAGCGGGTGTTGGATTCAAAGCGGGTGTTAAAGATTACAAATTAACTTATTATACTCCTGACTATGAAACCAAAGATACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCACCTGAAGAAGCGGGGGCCGCCGTAGCTGCCGAATCCTCCACTGGTACATGGACAACTGTGTGGACCGACGGACTTACTAGCCTTGATCGTTACAAAGGGCGATGCTACCACATCGAGCCCGTGGCTGGAGAAGAAAATCAATTTATTGCTTATGTAGCTTACCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGGTTCAAAGCCCTGCGCGCTCTACGTCTGGAAGATCTGCGAATCCCTGTTGCGTATGCTAAAACTTTCCAAGGGCCGCCTCATGGCATCCAAGTTGAAAGAGATAAATTGAATAAGTATGGTCGTCCCCTG

Get Sequences

• Retrieves nucleotide and amino acid sequences from NCBI's GenBank

• Automatically includes species name and taxon ID
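
Sequences retrieved from GenBank arrive as FASTA records like the rbcL examples shown earlier. The sketch below is not part of GetSeq; it only illustrates, under the assumption that headers follow the `gi|<id>|<db>|<accession>| <description>` layout seen on those slides, how such a record can be read into fields. The function names `parse_fasta` and `split_genbank_header` are hypothetical, and the sequence is truncated to its first 50 bases for brevity.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_genbank_header(header):
    """Split a 'gi|<id>|<db>|<accession>| <description>' header into fields."""
    parts = header.split("|", 4)
    if len(parts) != 5 or parts[0] != "gi":
        raise ValueError(f"unexpected header layout: {header!r}")
    _, gi, db, accession, description = parts
    return {
        "gi": gi,
        "db": db,
        "accession": accession,
        "description": description.strip(),
    }


# Header taken from the slide; sequence truncated to the first 50 bases.
example = """>gi|1835233|emb|Z83147.1| S.nepaulensis rbcL gene
TTATTATACTCCTGAATAYGAAACC
AAAGATACTGATATCTTGGCAGCAT
"""

records = parse_fasta(example)
fields = split_genbank_header(records[0][0])
print(fields["accession"], fields["description"], len(records[0][1]))
```

Sequence lines are concatenated, so a record wrapped over several lines and a record on a single line produce the same result.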

Get sequences DEMO

Muscle DEMO

Improved Tree Building Tools

NINJA/WINDJAMMER (Travis Wheeler)

Neighbor-Joining implementation that can analyze > 200K species

Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set

Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set
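
To make the neighbor-joining step concrete, here is a minimal sketch of how the textbook algorithm chooses which two taxa to join first and how long their branches are. It is the plain quadratic scan over all pairs, not NINJA's optimized implementation. The five-taxon distance matrix and the function names `nj_pick_pair` and `nj_branch_lengths` are illustrative assumptions, not data or code from the talk.

```python
def nj_pick_pair(names, dist):
    """Return (name_i, name_j, q) for the pair with the smallest Q value.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best[1], best[2], best[0]


def nj_branch_lengths(dist, i, j):
    """Branch lengths from taxa i and j to the new node that joins them."""
    n = len(dist)
    row_i, row_j = sum(dist[i]), sum(dist[j])
    length_i = 0.5 * dist[i][j] + (row_i - row_j) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return length_i, length_j


# Toy symmetric distance matrix (illustrative values only).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair = nj_pick_pair(names, dist)
lengths = nj_branch_lengths(dist, 0, 1)
print(pair, lengths)
```

A full neighbor-joining run repeats this step, replacing the joined pair with a new node each time, until only three nodes remain. Scanning every pair at every step is what becomes costly at the scale of hundreds of thousands of taxa.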

RAxML-Light (Alexandros Stamatakis)

Large Scale Maximum Likelihood implementation

55K Tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)

RAxML DEMO

Tree Visualization

• > 500K Taxa
• Fast
• Web-based, platform independent
• Semantic zooming
• Metadata-driven display of information

iPlant Tree Viewer

http://portnoy.iplantcollaborative.org/

Live tree view demo

Obstacles

Lopper DEMO

Lobelia kauaensis
Lobelia villosa
Galeatella gloria-montis
Trematolobelia kauaiensis
Trematolobelia macrostachys
Lobelia hypoleuca
Neowimmeria yuccoides
Lobelia niihauensis
Brighamia insignis
Brighamia rockii
Delissea rhytidosperma
Delissea subcordata
Cyanea acuminata
Cyanea hirtella
Cyanea coriacea
Delissea leptostegia
Clermontia kakeana
Clermontia parviflora
Clermontia arborescens
Clermontia fauriei

The TNRS: A Taxonomic Name Resolution Service for Plants

Tonight from 5:30 - 7:30 in Exhibit Hall A. Poster number P21011.

CACE DEMO
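
As a rough illustration of continuous ancestral character estimation, the sketch below computes the root state of a small tree under a Brownian motion model, using the weighted-average recursion from Felsenstein's independent contrasts. This is one common approach and is not claimed to be what the CACE tool runs internally. The tree shape, branch lengths, trait values, and the function name `root_estimate` are all hypothetical.

```python
def root_estimate(node, traits):
    """Return (estimate, extra_variance) for a node under Brownian motion.

    A node is either a tip name (str) or a pair of children,
    ((left, left_length), (right, right_length)).
    """
    if isinstance(node, str):
        # Tip values are observed, so they carry no extra uncertainty.
        return traits[node], 0.0
    (left, left_len), (right, right_len) = node
    x1, extra1 = root_estimate(left, traits)
    x2, extra2 = root_estimate(right, traits)
    # Lengthen each branch by the uncertainty of the estimate below it.
    v1 = left_len + extra1
    v2 = right_len + extra2
    # Children on shorter branches get more weight.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    return estimate, v1 * v2 / (v1 + v2)


# Hypothetical tree ((A:1,B:1):1,C:2) with made-up trait values.
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
traits = {"A": 1.0, "B": 3.0, "C": 5.0}

estimate, _ = root_estimate(tree, traits)
print(round(estimate, 4))
```

Only the value returned for the root is the model's estimate of the ancestral state. Values computed at other internal nodes along the way use the tips below them only, so they are intermediate quantities rather than final estimates for those nodes.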
