Christian M Zmasek, PhD
Burnham Institute for Medical Research
Bioinformatics and Systems Biology
www.phylosoft.org
www.phyloxml.org

Phylogenomics
- Original definition: the application of phylogenetic information for gene function analysis (Eisen, 1998)
- Recent usage:
  - species evolution based on whole genome analyses (for example, Dunn et al., 2008)
  - various types of studies at the intersection of genomics and phylogenetics

[Figure: gene trees relating RAT, MOUSE, HUMAN, and CIONA sequences, with sequences labeled X, Y, and Z. Legend: query sequence; orthologous to query; most similar to query; gene duplication.]

What information do we need for a phylogenomic analysis (sequence function analysis type)?
In phylogenomic analyses, tree nodes might be annotated with:
- Sequence name
- Species name
- Duplication: true/false
Branches might be annotated with:
- Branch lengths
- Support values (bootstrap, probability, …)

What information might we need for other types of phylogenomic analyses?
- Support values (possibly multiple)
- Taxonomy information (possibly detailed)
- Geographic information
- Host/parasite data (relation between tree nodes)
- Gene expression values
- Genomic location
- Mutations, variation, disease…

How is this information processed and stored?
- Tree topologies are described by hierarchical parentheses: ((A,B),C)
- Unique tree node labels mapped to text files, spreadsheets, databases
- Manual processing of text files with text editors
- Macros, shell scripts, Perl scripts
- New Hampshire eXtended (NHX) format
  - Adds tags for different fields:
    - Species: S=
    - Bootstrap support: B=
  - Example: ADH2:0.1[&&NHX:S=human:B=90]
  - http://www.phylosoft.org/forester/NHX.html
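
To illustrate how the NHX tags in the example above are laid out, here is a minimal sketch that splits a single NHX-annotated node into its label, branch length, and tag dictionary. The function name `parse_nhx_node` and the regular expression are assumptions made for this illustration. It handles only one node at a time and assumes tag values contain no colons, so it is not a full NHX parser.

```python
import re
from typing import Dict, Optional, Tuple

# label, optional ":length", optional "[&&NHX:key=value:key=value]"
NHX_NODE = re.compile(
    r"^(?P<name>[^:\[\]]*)"
    r"(?::(?P<length>[^\[\]]+))?"
    r"(?:\[&&NHX:(?P<tags>[^\]]*)\])?$"
)


def parse_nhx_node(text: str) -> Tuple[str, Optional[float], Dict[str, str]]:
    """Split one NHX node string into (label, branch length, tags)."""
    match = NHX_NODE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable NHX node: {text!r}")
    length = match.group("length")
    tags: Dict[str, str] = {}
    if match.group("tags"):
        # Tags are colon-separated key=value pairs, e.g. S=human:B=90
        for item in match.group("tags").split(":"):
            key, _, value = item.partition("=")
            tags[key] = value
    return match.group("name"), float(length) if length is not None else None, tags


name, branch_length, tags = parse_nhx_node("ADH2:0.1[&&NHX:S=human:B=90]")
print(name, branch_length, tags)
```

All tag values come back as strings. Deciding that `B` is numeric and `S` is a species name is left to the caller, which is the kind of ad hoc convention that later led to incompatible NHX dialects.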

How is this information published?
- Mostly as images of phylogenetic trees in journals
  - not suitable as input for further studies!
- Submission to (publicly accessible) databases is rare

Problems with this approach
- Tedious
- Error prone
- Published images are difficult to use as input for further studies
- Meta-analyses are hard
- Different, and incompatible, “dialects” of NHX appeared
- Limited expressiveness

phyloXML by example
<phylogeny rooted="true">
  <name>example from Prof. Joe Felsenstein's book "Inferring Phylogenies"</name>
  <clade>
    <clade>
      <branch_length>0.06</branch_length>
      <clade>
        <name>A</name>
        <branch_length>0.102</branch_length>
      </clade>
      <clade>
        <name>B</name>
        <branch_length>0.23</branch_length>
      </clade>
    </clade>
    <clade>
      <name>C</name>
      <branch_length>0.4</branch_length>
    </clade>
  </clade>
</phylogeny>
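
Because the tree is plain XML, a general-purpose XML library is enough to read it. The sketch below uses Python's standard `xml.etree.ElementTree` to walk the example and sum the branch lengths from the root to each leaf. The helper name `leaf_distances` is an assumption for illustration. The code also assumes the namespace-free fragment shown on the slide; complete phyloXML files normally declare an XML namespace, which would require namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

PHYLOXML = """<phylogeny rooted="true">
  <name>example from Prof. Joe Felsenstein's book "Inferring Phylogenies"</name>
  <clade>
    <clade>
      <branch_length>0.06</branch_length>
      <clade><name>A</name><branch_length>0.102</branch_length></clade>
      <clade><name>B</name><branch_length>0.23</branch_length></clade>
    </clade>
    <clade><name>C</name><branch_length>0.4</branch_length></clade>
  </clade>
</phylogeny>"""


def leaf_distances(clade, depth=0.0):
    """Yield (name, root-to-leaf distance) for every leaf below this clade."""
    # A clade without <branch_length> (such as the root) contributes 0.
    total = depth + float(clade.findtext("branch_length", default="0"))
    children = clade.findall("clade")  # direct child clades only
    if not children:
        yield clade.findtext("name"), total
    for child in children:
        yield from leaf_distances(child, total)


root_clade = ET.fromstring(PHYLOXML).find("clade")
result = {name: round(dist, 3) for name, dist in leaf_distances(root_clade)}
print(result)
```

The nesting of `<clade>` elements carries the topology, so the recursion mirrors the tree directly and no parenthesis parsing is needed.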

phyloXML
Important elements:
- Taxonomy
- Sequence
- Confidence
- Events (duplication, speciation)
- Property (“custom data”)
- Typed relations (between clades, sequences)
XSD schema, examples, description, applications: http://www.phyloxml.org/
Current version: 1.0

Important clade level elements
<taxonomy>
  <id source="">
  <scientific_name>
  <common_name>
  <rank>
  <uri>
<sequence>
  <symbol>
  <accession source="">
  <name>
  <uri>
<confidence type="">
<distribution>
  <desc>
  <point geodetic_datum="">
    <lat>
    <long>
    <alt>
<property ref="" unit="" datatype="">
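
As a rough illustration of how these elements combine, the sketch below rebuilds the earlier NHX example (ADH2:0.1[&&NHX:S=human:B=90]) as an annotated clade using Python's standard `xml.etree.ElementTree`. The function name `annotated_clade` is hypothetical, storing "human" in `<common_name>` and labeling the support as `type="bootstrap"` are choices made for this sketch, and the child element order has not been validated against the phyloXML XSD.

```python
import xml.etree.ElementTree as ET


def annotated_clade(symbol, species, branch_length, bootstrap):
    """Build one <clade> carrying sequence, taxonomy, and support data."""
    clade = ET.Element("clade")
    ET.SubElement(clade, "branch_length").text = str(branch_length)
    # The type attribute says what kind of support value this is.
    confidence = ET.SubElement(clade, "confidence", {"type": "bootstrap"})
    confidence.text = str(bootstrap)
    taxonomy = ET.SubElement(clade, "taxonomy")
    ET.SubElement(taxonomy, "common_name").text = species
    sequence = ET.SubElement(clade, "sequence")
    ET.SubElement(sequence, "symbol").text = symbol
    return clade


clade = annotated_clade("ADH2", "human", 0.1, 90)
xml_text = ET.tostring(clade, encoding="unicode")
print(xml_text)
```

Unlike the NHX tags, each value sits in a named element, so a reader does not need to know that `S=` means species or `B=` means bootstrap.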

phyloXML applications/implementations (examples)
- BioPerl: parser, writer
- ATV (A Tree Viewer): Java based tree display tool suitable for large (>10 000) and highly decorated phylogenetic/taxonomic trees
  - http://www.phylosoft.org/atv
- phyloxml_converter: command line tool to convert Newick (NH), NHX, and Nexus formatted trees to phyloXML
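
To give a feel for what such a conversion involves, here is a minimal sketch that turns a plain Newick string into nested `<clade>` elements. It is not the implementation of phyloxml_converter: the function name `newick_to_phyloxml` and the `rooted` parameter are assumptions for this illustration, and the sketch ignores quoted labels, comments, NHX tags, and Nexus input.

```python
import xml.etree.ElementTree as ET


def newick_to_phyloxml(newick: str, rooted: bool = True) -> ET.Element:
    """Convert a plain Newick string into a <phylogeny> element (sketch)."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_clade() -> ET.Element:
        nonlocal pos
        clade = ET.Element("clade")
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_clade())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_clade())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
        # The label and optional ":length" follow the child list (if any).
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name, _, length = text[start:pos].partition(":")
        if name.strip():
            ET.SubElement(clade, "name").text = name.strip()
        if length.strip():
            ET.SubElement(clade, "branch_length").text = str(float(length))
        clade.extend(children)
        return clade

    # Newick does not record rootedness, so the caller has to supply it.
    phylogeny = ET.Element("phylogeny", {"rooted": "true" if rooted else "false"})
    phylogeny.append(parse_clade())
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return phylogeny


tree = newick_to_phyloxml("((A:0.102,B:0.23):0.06,C:0.4);")
print(ET.tostring(tree, encoding="unicode"))
```

The input string encodes the same topology and branch lengths as the phyloXML example from Felsenstein's book, so the output can be compared against that slide by eye.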