Ch.6 Phylogenetic Trees
2
Contents
Phylogenetic Trees
Character State Matrix
  Perfect Phylogeny
  Binary Character States
  Two Characters
Distance Matrix
  Additive Trees
  Ultrametric Trees
Agreement (Isomorphic) between Phylogenies
3
Phylogenetic Trees (Phylogenies)
Explain the evolutionary history of today’s species (Figure 6.1)
A hypothesis: we do not have enough data about the distant ancestors of present-day species
Characteristics
  Leaf: an object or a set of objects
  Interior node: a hypothetical ancestor object
  Unrooted tree
Input data for phylogeny reconstruction are classified into main categories
  Character state matrix
  Distance matrix
4
Character State Matrix
Characters have the following features
  Independent inheritance
  Homologous
Character state matrix
  A matrix M with n rows (objects) and m columns (characters)
  Mij denotes the state that object i has for character j
  Each row is the state vector of an object
5
Difficulties in creating a phylogeny from a character state matrix
Convergence or parallel evolution
  Breaks the assumption that objects sharing the same state are genetically closer than objects that do not
Reversal
  Gains and losses of the character
☞ Assume that convergence and reversal do not happen, or minimize the number of such events
Characters can be ordered or unordered; ordered characters can also be directed
6
Perfect Phylogeny Problem
For each state s of each character c, the set of all nodes u (leaves and interior nodes) for which the state is s with respect to c must form a subtree of T
Characters are compatible if the set of objects defined by a character state matrix admits a perfect phylogeny
7
Example
8
Perfect Phylogeny Problem
How many different trees can we build for n objects?
Consider only unrooted binary trees: there are $\prod_{i=3}^{n} (2i - 5)$ of them, a count that asymptotically grows at least as fast as $n!$
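The count of unrooted binary trees on n labeled leaves is the product of (2i - 5) for i = 3, ..., n. A minimal sketch that evaluates this product shows how quickly exhaustive enumeration becomes infeasible; the function name is chosen here for illustration only.

```python
def count_unrooted_binary_trees(n: int) -> int:
    """Number of unrooted binary trees with n labeled leaves (n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    count = 1
    # Adding the i-th leaf can subdivide any of the 2i - 5 existing edges.
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20):
        print(n, count_unrooted_binary_trees(n))
```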
9
Binary Character States
Two-phase algorithm (runs in time O(nm))
  Decide whether the input matrix M admits a perfect phylogeny
  Construct one possible phylogeny
Assume that state 0 is ancestral and state 1 is derived
10
Deciding perfect phylogeny
A rooted tree T is a perfect phylogeny for input matrix M if
  to every character in M there corresponds an edge in T, and this edge marks the transition from state 0 to state 1 for that character
  edges are labeled by their respective characters, and the root has character state vector (0, 0, …, 0)
11
Deciding perfect phylogeny
Definition 6.1 For each column j of M, let Oj be the set of objects whose state is 1 for j. Let Ōj be the set of objects whose state is 0 for j
Lemma 6.1 A binary matrix M admits a perfect phylogeny if and only if for each pair of characters i and j the sets Oi and Oj are disjoint or one of them contains the other
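Lemma 6.1 can be checked directly by comparing the sets Oj for every pair of columns. The sketch below does exactly that in O(nm²) set operations; the function name and the list-of-rows input format are assumptions made for illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny_pairwise(matrix):
    """Direct test of Lemma 6.1 on a binary matrix given as a list of rows."""
    m = len(matrix[0]) if matrix else 0
    # ones[j] plays the role of O_j: the objects whose state is 1 for column j.
    ones = [{i for i, row in enumerate(matrix) if row[j] == 1} for j in range(m)]
    for a, b in combinations(ones, 2):
        overlapping = bool(a & b)
        nested = a <= b or b <= a
        if overlapping and not nested:
            return False
    return True


if __name__ == "__main__":
    # Rows A..E, columns c1..c6, matching O1 = {B, D}, ..., O6 = {C}.
    table = [
        [0, 0, 0, 1, 1, 0],  # A
        [1, 1, 0, 0, 0, 0],  # B
        [0, 0, 0, 1, 1, 1],  # C
        [1, 0, 1, 0, 0, 0],  # D
        [0, 0, 0, 1, 0, 0],  # E
    ]
    print(admits_perfect_phylogeny_pairwise(table))
```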
12
Deciding perfect phylogeny
Example: Table 6.2
O1 = {B, D}, O2 = {B}, O3 = {D}
O4 = {A, C, E}, O5 = {A, C}, O6 = {C}
Checking Lemma 6.1 directly for the decision phase takes O(nm²)
Figure 6.5, Algorithm Perfect Binary Phylogeny Decision -> O(nm)
13
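Deciding perfect phylogeny
Columns of M are sorted by decreasing number of 1s. For every cell with Mij = 1, Lij is the index of the nearest column to the left of j that also has a 1 in row i, or -1 if there is none; all other cells of L are 0
if Lij ≠ Llj for some i, l and both Lij and Llj are nonzero then
  return FALSE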
M c4 c1 c5 c2 c3 c6
A 1 0 1 0 0 0
B 0 1 0 1 0 0
C 1 0 1 0 0 1
D 0 1 0 0 1 0
E 1 0 0 0 0 0
L c4 c1 c5 c2 c3 c6
A -1 0 1 0 0 0
B 0 -1 0 2 0 0
C -1 0 1 0 0 3
D 0 -1 0 0 2 0
E -1 0 0 0 0 0
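The column test on L can be sketched as follows. This is an illustrative reading of the decision step, not a transcription of Figure 6.5: columns are ordered by decreasing number of 1s, L records for each 1-cell the 1-based position of the previous 1 in the same row (or -1), and every column of L must carry a single nonzero value. The function name and return shape are assumptions; Python's built-in sort is used for brevity, so the sort itself is O(m log m) rather than linear.

```python
def decide_perfect_phylogeny(matrix):
    """Return (admits, column_order, L) for a binary matrix (list of rows)."""
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    # Stable sort of column indices by decreasing number of 1s.
    order = sorted(range(m), key=lambda j: -sum(row[j] for row in matrix))
    L = [[0] * m for _ in range(n)]
    for i in range(n):
        k = -1  # position of the previous 1 in this row, -1 if none yet
        for pos, j in enumerate(order, start=1):
            if matrix[i][j] == 1:
                L[i][pos - 1] = k
                k = pos
    # Every column of L may contain at most one distinct nonzero value.
    for pos in range(m):
        seen = 0
        for i in range(n):
            value = L[i][pos]
            if value != 0:
                if seen != 0 and value != seen:
                    return False, order, L
                seen = value
    return True, order, L


if __name__ == "__main__":
    table = [
        [0, 0, 0, 1, 1, 0],  # A
        [1, 1, 0, 0, 0, 0],  # B
        [0, 0, 0, 1, 1, 1],  # C
        [1, 0, 1, 0, 0, 0],  # D
        [0, 0, 0, 1, 0, 0],  # E
    ]
    admits, order, L = decide_perfect_phylogeny(table)
    print(admits, ["c%d" % (j + 1) for j in order])
    for row in L:
        print(row)
```

Positions are 1-based so that 0 can keep its meaning of "empty cell", as in the table for L.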
14
Constructing perfect phylogeny
Figure 6.6: Algorithm Perfect Binary Phylogeny Construction
Running time O(nm)
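One way to picture the construction phase is a trie-like insertion: with columns sorted by decreasing number of 1s, each object walks down from the root along the edges labeled by its derived characters, creating edges as needed. The sketch below follows that idea under the assumption that the matrix has already passed the decision phase; the nested-dictionary tree representation and the function name are illustrative choices, not the data structure of Figure 6.6.

```python
def build_perfect_phylogeny(matrix, names):
    """Build a rooted tree as nested dicts; edges are keyed by column index.

    Assumes the binary matrix admits a perfect phylogeny.
    """
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    order = sorted(range(m), key=lambda j: -sum(row[j] for row in matrix))
    root = {"children": {}, "objects": []}
    for i in range(n):
        node = root
        for j in order:
            if matrix[i][j] == 1:
                # Follow the edge labeled j, creating it on first use.
                node = node["children"].setdefault(
                    j, {"children": {}, "objects": []}
                )
        node["objects"].append(names[i])
    return root


if __name__ == "__main__":
    table = [
        [0, 0, 0, 1, 1, 0],  # A
        [1, 1, 0, 0, 0, 0],  # B
        [0, 0, 0, 1, 1, 1],  # C
        [1, 0, 1, 0, 0, 0],  # D
        [0, 0, 0, 1, 0, 0],  # E
    ]
    tree = build_perfect_phylogeny(table, ["A", "B", "C", "D", "E"])
    print(sorted(tree["children"]))
```

Objects stored at an interior node can afterwards be hung as leaves below that node.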
15
Unordered binary character
The majority state becomes 0 and the other becomes 1
If both states have equal frequency, choose either one to be 0 and the other to be 1
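The recoding rule for unordered binary characters is a per-column flip. A minimal sketch, with a hypothetical function name, that leaves tied columns unchanged (either choice is allowed by the rule):

```python
def recode_majority_as_zero(matrix):
    """Return a copy of a binary matrix where the majority state of each column is 0."""
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    recoded = [row[:] for row in matrix]
    for j in range(m):
        ones = sum(row[j] for row in matrix)
        if ones > n - ones:
            # State 1 is the strict majority, so swap the two labels.
            for row in recoded:
                row[j] = 1 - row[j]
    return recoded


if __name__ == "__main__":
    print(recode_majority_as_zero([[1, 0], [1, 0], [0, 1]]))
```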
16
Two characters
Allow characters to be unordered and to have an arbitrary number of states, but restrict the number of characters to at most two
Definition 6.2 A triangulated graph is an undirected graph in which any cycle with four or more vertices has a chord, that is, an edge joining two nonconsecutive vertices of the cycle
Theorem 6.1 To every collection of subtrees {T1, T2, …, Tl} of a tree T there corresponds a triangulated graph and vice versa
17
Two characters
Definition 6.3 An intersection graph for a collection C of sets is the graph G that we get by mapping each set in C to a vertex of G, and linking two vertices in G by an edge if the corresponding sets have a nonempty intersection
Definition 6.4 Given a graph G = (V, E) with a coloring c on V, we say that G can be c-triangulated if there exists a triangulated graph H = (V, E’), such that E ⊆ E’ and c is a valid coloring for H. In other words, any edge present in E’ but not in E must link two vertices with different colors
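Definition 6.3 translates almost word for word into code. The sketch below builds the edge set of an intersection graph from a list of Python sets; vertices are identified by list position, which is an illustrative convention rather than anything the definition prescribes.

```python
from itertools import combinations


def intersection_graph(sets):
    """Edges (i, j), i < j, between sets that share at least one element."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        if a & b:
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    collection = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"A", "D"}]
    print(sorted(intersection_graph(collection)))
```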
18
Two characters
Theorem 6.2 A character state matrix M, with a character set defining a coloring c, admits a perfect phylogeny if and only if its corresponding state intersection graph (SIG) can be c-triangulated
Theorem 6.3 A character state matrix M with only two characters admits a perfect phylogeny if and only if its corresponding SIG is acyclic
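Theorem 6.3 reduces the two-character case to a cycle test on the state intersection graph. In the sketch below each object is given as a pair (state for character 1, state for character 2); every distinct pair contributes one edge between the two state vertices, and a union-find structure detects cycles. Names and input format are assumptions, and union-find is a convenient substitute here, near-linear rather than strictly O(n).

```python
def sig_is_acyclic(rows):
    """rows: iterable of (state_char1, state_char2), one pair per object."""
    # One SIG edge per distinct pair of states that some object exhibits.
    edges = {(("c1", a), ("c2", b)) for a, b in rows}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v were already connected: this edge closes a cycle
        parent[root_u] = root_v
    return True


if __name__ == "__main__":
    # Objects A, B, C, D with character-1 states {A, B}, {C, D}
    # and character-2 states {B, C}, {A, D}: the SIG is a 4-cycle.
    cyclic = [("p", "s"), ("p", "r"), ("q", "r"), ("q", "s")]
    print(sig_is_acyclic(cyclic))
    print(sig_is_acyclic([(0, 0), (0, 1), (1, 1)]))
```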
19
Example
[Figure; surviving labels: vertices x1, y1, x2, y2, z2, x3, y3, z3 and object sets {A, B}, {B, C}, {C, D}, {A, D}, {A}, {B}, {C}, {D}]
20
Reconstruction algorithm for two characters
Running time O(n)
  Test for acyclicity -> O(n)
  Reconstruction of the perfect phylogeny -> O(n)
21
Parsimony and Compatibility
Real character state matrices are unlikely to admit perfect phylogenies
  Experimental data always carry errors
  The assumptions (no reversals and no convergence) are sometimes violated
Two approaches
  Parsimony criterion: allow reversal and convergence events, but try to minimize their occurrence
  Compatibility criterion: find a maximum set of characters that are compatible -> exclude the characters that cause such “problems”
22
Algorithms for Distance Matrices
Problem of reconstructing trees based on comparative numerical data between n objects, given as a distance matrix M
Consider two problems
  Reconstructing Additive Trees
  Reconstructing Ultrametric Trees
23
Reconstructing Additive Trees
Metric space
  A set of objects O such that every pair i, j ∈ O is associated with a nonnegative real number dij with the following properties:
  dij > 0 for i ≠ j,
  dij = 0 for i = j,
  dij = dji for all i and j,
  dij ≤ dik + dkj for all i, j, and k (the triangle inequality)
A matrix M and a tree T are additive when
  T has n leaves
  Leaves are the nodes with degree one; all other nodes have degree three
  All edges of T have nonnegative weight
  The weight of the path between any two leaves i and j equals Mij
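The four metric-space properties can be verified mechanically on a distance matrix. A minimal sketch, assuming the matrix is a square list of lists of numbers and using a small tolerance `tol` (an illustrative parameter) for floating-point input:

```python
def is_metric(d, tol=1e-9):
    """Check positivity, zero diagonal, symmetry and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False  # d_ii must be 0
        for j in range(n):
            if i != j and d[i][j] <= 0:
                return False  # d_ij > 0 for i != j
            if abs(d[i][j] - d[j][i]) > tol:
                return False  # symmetry
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    return False  # triangle inequality
    return True


if __name__ == "__main__":
    print(is_metric([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))
    print(is_metric([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))
```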
24
Reconstructing Additive Trees
Lemma 6.2 A metric space O is additive if and only if any four objects of O can be labeled i, j, k, and l such that dij + dkl = dik + djl ≥ dil + djk
If M is additive, T is unique (the algorithm runs in time O(n²))
Real-life distance matrices are rarely additive due to errors in the distance measurements
  Obtain a tree that is as close as possible to an additive tree
  Approach the problem through a formulation that is tractable
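The four-point condition of Lemma 6.2 says that, for every quadruple, the two largest of the three pairwise sums are equal. The brute-force sketch below checks this over all quadruples in O(n⁴) time; it only tests additivity and is not the O(n²) reconstruction algorithm. The function name and tolerance parameter are illustrative.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition on a symmetric distance matrix (list of lists)."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted(
            [d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]]
        )
        # The two largest sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


if __name__ == "__main__":
    # Leaves a, b | c, d: pendant edges of weight 1, internal edge of weight 1.
    tree_like = [
        [0, 2, 3, 3],
        [2, 0, 3, 3],
        [3, 3, 0, 2],
        [3, 3, 2, 0],
    ]
    print(is_additive(tree_like))
```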
25
Reconstructing Ultrametric Trees
Given two distance matrices, Ml and Mh, reconstruct an evolutionary tree such that the distances measured on the tree fit “between” these two input matrices (sandwich constraints, $M^l_{ij} \le d_{ij} \le M^h_{ij}$)
A tree is ultrametric when it is additive and can be rooted in such a way that the lengths of all leaf-root paths are equal -> the objects being studied have evolved at an equal rate from a common ancestor
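Both requirements on a candidate set of tree distances can be checked directly. The sketch below tests the sandwich constraints entry by entry and, separately, tests ultrametricity through the standard three-point condition (in every triple the two largest distances are equal), which is brought in here as a well-known equivalent and is not stated on the slide. Function names and the tolerance are illustrative.

```python
from itertools import combinations


def satisfies_sandwich(d, ml, mh):
    """True when ml[i][j] <= d[i][j] <= mh[i][j] for every pair i, j."""
    n = len(d)
    return all(
        ml[i][j] <= d[i][j] <= mh[i][j] for i in range(n) for j in range(n)
    )


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: the two largest distances of each triple are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(c - b) > tol:
            return False
    return True


if __name__ == "__main__":
    d = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
    ml = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
    mh = [[0, 3, 5], [3, 0, 5], [5, 5, 0]]
    print(is_ultrametric(d), satisfies_sandwich(d, ml, mh))
```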
26
Reconstructing Ultrametric Trees
Link of a and b in the MST T, written (a, b)max: the largest-weight edge in the unique path from a to b in T
Definition 6.5 The cut-weight of an edge e of the minimum spanning tree of Gh is given by
$CW(e) = \max\{ M^l_{ab} \mid e = (a, b)_{\max} \}$
27
Reconstructing Ultrametric Trees
Reconstruction algorithm -> runs in time O(n²)
  Compute an MST T of Gh
  Construct R
  Compute CW(e)
  Build the ultrametric tree U
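The first and third steps can be sketched together: build a minimum spanning tree of the complete graph weighted by the upper matrix, then, for every pair (a, b), find the heaviest edge on their tree path and record the largest lower-bound distance that edge has to account for. The sketch below uses Prim's algorithm and one traversal per start vertex, both O(n²); it stops before building R and U. Function names are hypothetical, vertices are 0-based indices, and ties between equal-weight edges are broken by traversal order, which is an assumption of this sketch.

```python
def prim_mst(mh):
    """Edges (u, v), u < v, of a minimum spanning tree of the complete graph on mh."""
    n = len(mh)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v] and mh[u][v] < best[v]:
                best[v] = mh[u][v]
                parent[v] = u
    return edges


def cut_weights(ml, mh):
    """Map each MST edge e to max{ ml[a][b] : e is the link (a, b)max }."""
    n = len(mh)
    edges = prim_mst(mh)
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    cw = {e: float("-inf") for e in edges}
    for a in range(n):
        # Walk the tree from a, carrying the heaviest edge seen on the path.
        stack = [(a, -1, None)]
        while stack:
            node, prev, heaviest = stack.pop()
            if heaviest is not None and node > a:  # visit each pair once
                cw[heaviest] = max(cw[heaviest], ml[a][node])
            for nxt in adj[node]:
                if nxt == prev:
                    continue
                e = (min(node, nxt), max(node, nxt))
                if heaviest is None or mh[e[0]][e[1]] > mh[heaviest[0]][heaviest[1]]:
                    stack.append((nxt, node, e))
                else:
                    stack.append((nxt, node, heaviest))
    return cw


if __name__ == "__main__":
    mh = [[0, 2, 5], [2, 0, 4], [5, 4, 0]]
    ml = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    print(cut_weights(ml, mh))
```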
28
Agreement between Phylogenies
In practice it occurs quite often that two different methods applied to the same data yield different trees (in the topological sense)
Definition 6.6 We say that a tree Tr refines another tree Ts whenever Tr can be transformed into Ts by contracting selected edges from Tr. Two trees T1 and T2 agree when there exists a tree T3 that refines both
29
Isomorphic
Two trees T1 and T2 are isomorphic when there is a one-to-one correspondence between their nodes such that for every pair u, v of corresponding nodes, u ∈ T1 and v ∈ T2, the objects contained in leaves below u are the same as the objects contained in leaves below v
Binary Tree Isomorphism: Figure 6.21, runs in time O(n)
General case (leaves contain several objects): Figure 6.22, runs in time O(n)
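For rooted trees whose leaves each hold one distinct object, the definition amounts to equality up to reordering of children. The sketch below checks that with an order-independent canonical form built from nested frozensets. It is a simple illustration of the definition, not the O(n) algorithms of Figures 6.21 and 6.22, and the nested-tuple tree encoding is an assumption.

```python
def canonical(tree):
    """Order-independent form of a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return tree  # a leaf holding a single object
    return frozenset(canonical(child) for child in tree)


def isomorphic(t1, t2):
    """True when the two trees differ only in the order of children."""
    return canonical(t1) == canonical(t2)


if __name__ == "__main__":
    t1 = (("A", "B"), "C")
    t2 = ("C", ("B", "A"))
    t3 = (("A", "C"), "B")
    print(isomorphic(t1, t2), isomorphic(t1, t3))
```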