The (Supertree) of Life: Procedures, Problems, and Prospects
Presented by Usman Roshan
Supertree Methods
• Input: Set of trees $\mathcal{T} = \{T_1, \ldots, T_k\}$
• Output: Tree $T$ leaf-labeled by $\bigcup_{i=1}^{k} L(T_i)$, where $L(T)$ is the set of leaves of $T$.
• Why supertree methods?
Motivation (1)
• Supertree methods are used as part of a divide-and-conquer method to solve NP-hard problems on large datasets
Motivation (2)
• Supertree methods are used when we have missing data
Types of supertree methods (1)
• Direct methods (e.g. strict consensus supertrees, MinCutSupertrees)
Types of supertree methods (2)
• Indirect methods (e.g. MRP, average consensus)
Types of supertree methods (3) (MRP)
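As a rough illustration of the matrix representation behind MRP, the sketch below uses the usual Baum/Ragan coding for rooted source trees: each non-root cluster becomes one binary character, scored 1 for taxa inside the cluster, 0 for other taxa of that source tree, and ? for taxa missing from it. The function name `mrp_matrix` and the cluster-set input format are assumptions made for this example, and the all-zero outgroup row often added in practice is left out.

```python
def mrp_matrix(source_cluster_sets):
    """Encode rooted source trees (given as sets of clusters) as an MRP matrix.

    Hypothetical helper: each source tree is a set of frozensets and is
    assumed to include its root cluster (all taxa of that tree).
    Returns a dict mapping each taxon to its row of '0', '1', '?' characters.
    """
    all_taxa = sorted(set().union(*(set().union(*cs) for cs in source_cluster_sets)))
    rows = {t: [] for t in all_taxa}
    for cs in source_cluster_sets:
        tree_taxa = set().union(*cs)
        # Fixed column order: smaller clusters first, then alphabetical.
        for c in sorted(cs, key=lambda c: (len(c), sorted(c))):
            if c == tree_taxa:
                continue  # the root cluster carries no grouping information
            for t in all_taxa:
                if t in c:
                    rows[t].append("1")
                elif t in tree_taxa:
                    rows[t].append("0")
                else:
                    rows[t].append("?")  # taxon absent from this source tree
    return {t: "".join(r) for t, r in rows.items()}


# Two small source trees: ((A,B),C) and ((B,D),C).
tree1 = {frozenset("AB"), frozenset("ABC")}
tree2 = {frozenset("BD"), frozenset("BCD")}
matrix = mrp_matrix([tree1, tree2])
for taxon, row in matrix.items():
    print(taxon, row)
```

The resulting matrix would then be handed to a parsimony program; that search step is the NP-hard part and is not sketched here.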
Definitions
• Contraction: $T_2 \le T_3$
• Restriction: $T_3|_{\{A,B,D,E\}} = T_2$
• If $T_3|_{L(T_2)} \ge T_2$ then $T_3$ contains $T_2$
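The two definitions can be made concrete for rooted trees written as sets of clusters. This is a minimal sketch under that assumption (the slide itself may intend unrooted trees): restriction intersects every cluster with the kept leaves, and one tree contains another when every cluster of the smaller tree survives in the restriction, which is the same as saying the smaller tree can be reached by contracting edges. The names `restrict` and `contains` are made up for this example, and singleton clusters are ignored.

```python
def restrict(cluster_set, leaves):
    """Restrict a rooted tree (set of frozenset clusters) to a subset of leaves."""
    leaves = frozenset(leaves)
    # Clusters that shrink below two leaves no longer describe a grouping.
    return {c & leaves for c in cluster_set if len(c & leaves) >= 2}


def contains(big, small):
    """True if `big`, restricted to the leaves of `small`, refines `small`."""
    small_leaves = frozenset().union(*small)
    return set(small) <= restrict(big, small_leaves)


# T3 = (((A,B),C),(D,E)), written as its clusters (root cluster included).
T3 = {frozenset("AB"), frozenset("ABC"), frozenset("DE"), frozenset("ABCDE")}
# T2 = ((A,B),D,E): an unresolved tree on four of the five leaves.
T2 = {frozenset("AB"), frozenset("ABDE")}
# A conflicting tree ((A,D),B,E).
T_bad = {frozenset("AD"), frozenset("ABDE")}

print(restrict(T3, "ABDE"))
print(contains(T3, T2), contains(T3, T_bad))
```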
Optimization problems
• Subtree Compatibility: Given a set of trees $\mathcal{T} = \{T_1, \ldots, T_k\}$, does there exist a tree $T$ such that $T|_{L(t)} \ge t$ for all $t \in \mathcal{T}$ (we say $T$ contains $\mathcal{T}$)?
• NP-hard (Steel 1992)
• Special cases are poly-time (rooted trees, DCM)
• MRP: also NP-hard
Limitations of supertree methods
Three desirable properties:
• P1: Method can be applied to any unordered set of input trees
• P2: Renaming the species does not change the constructed supertree
• P3: If the input trees are compatible, then the output tree is one of the “parent trees”.
There is no supertree method that can satisfy P1-P3 when the input trees are unrooted; however, for rooted trees an extension of BUILD satisfies P1-P3.
Rooted subtrees (BUILD) (Aho et al 1981)
• Input: Set of rooted trees $\mathcal{T}$
• Output: Tree $T$ that contains $\mathcal{T}$
BUILD (2) - Definitions
• Cluster: Set of taxa in a rooted subtree
• A different representation of rooted phylogenetic trees
• Let C(T) be the clusters of tree T. In this example C(T) = {{1,2}, {3,4}, {1,2,3,4},{1,2,3,4,5}}
• We write (IJ)K in T if I and J are both in some cluster of T that does not contain K; e.g. (12)3 and (34)5 are in T
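The cluster representation and the (IJ)K relation can be checked on the example tree with a short sketch. It assumes the tree is written as nested tuples, here `(((1, 2), (3, 4)), 5)`, and reads (IJ)K as "some cluster holds both I and J but not K"; the helper names `clusters` and `has_triple` are invented for this illustration.

```python
def clusters(tree):
    """Return the clusters of a rooted tree given as nested tuples.

    Leaves are any non-tuple values; singleton clusters are not recorded.
    """
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)  # one cluster per internal node
        return below

    leaves_below(tree)
    return found


def has_triple(cluster_set, i, j, k):
    """True if (ij)k holds: some cluster contains i and j but not k."""
    return any(i in c and j in c and k not in c for c in cluster_set)


T = (((1, 2), (3, 4)), 5)
C = clusters(T)
print(sorted(sorted(c) for c in C))
print(has_triple(C, 1, 2, 3), has_triple(C, 3, 4, 5), has_triple(C, 1, 3, 2))
```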
BUILD (3) - Algorithm
1. Initialize C as the set of input taxa
2. If |C| = 1 return C, else compute the graph $G = (V, E)$ with $V = \{\text{species}\}$ and $E = \{(i,j) : \exists\, k \in C,\ T \in \mathcal{T} \text{ with } (ij)k \in T\}$
3. Let C’ be the sets of taxa in the connected components of G. If |C’| = 1 then $\mathcal{T}$ is incompatible, else set $C = C \cup C’$, and repeat step (2) on each new cluster in C’.
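The recursion can be sketched in a few lines. This is an illustrative version, not the presenter's code: it assumes the source trees have already been reduced to triples `(i, j, k)` meaning (ij)k, that taxa are sortable values other than `None`, and it returns the supertree as nested tuples or `None` when the input is incompatible. The function name `build` and the union-find bookkeeping are choices made for this example.

```python
def build(taxa, triples):
    """BUILD-style recursion on a set of taxa and rooted triples (i, j, k)."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))

    # Union-find over the current taxa; joining i and j plays the role of edge (i, j).
    parent = {t: t for t in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, k in triples:
        # Only triples whose three taxa are all still present contribute an edge.
        if i in taxa and j in taxa and k in taxa:
            parent[find(i)] = find(j)

    components = {}
    for t in taxa:
        components.setdefault(find(t), set()).add(t)

    if len(components) == 1:
        return None  # a single component means the triples conflict

    subtrees = []
    for comp in sorted(components.values(), key=min):
        sub = build(comp, triples)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


# Triples taken from two source trees, ((1,2),3) and ((3,4),5).
print(build({1, 2, 3, 4, 5}, {(1, 2, 3), (3, 4, 5)}))
# Conflicting triples (12)3 and (13)2 give no tree.
print(build({1, 2, 3}, {(1, 2, 3), (1, 3, 2)}))
```

In the first call the top-level graph has components {1,2}, {3,4} and {5}, so the result is an unresolved tree with three children at the root.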
BUILD (4) - Algorithm
BUILD (5) - Algorithm
BUILD (6) - Algorithm
BUILD (7) - Algorithm
Compatible source trees
• For compatible source trees, MRP or BUILD can be used; however, the strict consensus of MRP trees (or the strict consensus supertree) may not be compatible with the input.
• BUILD has been extended to output all parent trees; it has also been shown that the source trees have a unique parent tree iff BUILD constructs a binary tree.
Incompatible source trees (1)
For incompatible source trees, there are two strategies:
• Resolve incompatibilities by using quartet methods or removing troublesome taxa.
• Use an appropriate algorithm such as MRP or MinCutSupertrees; the latter is an extension of BUILD so that it always outputs a tree.
Incompatible source trees (2)
Desirable property
• P1: If at least one source tree contains (IJ)K and no source tree contains (IK)J or (JK)I, then the output tree must contain (IJ)K
No method can satisfy P1; however, the weaker condition can be satisfied: if all source trees contain (IJ)K, then the output must contain (IJ)K.
Supertree criticism
• Do not take biomolecular sequences into account
• Dataset non-independence
• MRP: Favors larger source trees because they contribute more characters; may also favor unbalanced source trees
• Direct methods: Cannot incorporate support values in the source trees (except for MinCutSupertrees), and cannot compute support values in the supertree (unlike MRP)
Applications of supertrees
• Systematics – MRP is the standard method used by biologists
• Evolutionary models
• Rates of cladogenesis
• Evolutionary patterns
• Biodiversity and conservation
Bright future for supertree construction
• Despite the increase in phylogenetic data, species are poorly characterized at the molecular level, giving rise to problems from taxon sampling (non-random sampling), long branch attraction, and missing data
• ML analysis: Genes evolve under different models
• Non-molecular data