1 Modified Mincut Supertrees Roderic Page University of Glasgow


Modified Mincut Supertrees

Roderic Page, University of Glasgow


Tree of Life

About 1.7 million species described.

What we have so far:

• TreeBASE database (15,000 taxa)

• Ribosomal Database Project (RDP II) (20,000 sequences)

• The Tree of Life Project (11,000 taxa)


Recent interest in the Tree of Life

Assembling the Tree of Life: Science, Relevance, and Challenges AMNH, New York, May 2002

$US 10 million “to construct a phylogeny for the 1.7 million described species of Life” announced February 15th 2002

NSF sponsored “Tree of Life” workshops (2000-2001)

European initiative (ATOL) under FP6


Problem: how to build the tree of life

Solutions:

• Find one or more “magic markers” that will allow us to recover the whole tree in one go (problems: combinability and complexity)

• Assemble big tree from many smaller trees derived from many kinds of data (supertrees)


Tree terminology

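[Figure: rooted tree on leaves a, b, c, d with clusters {a,b}, {a,b,c}, and {a,b,c,d} (the root). Labels point out a leaf, an internal node (cluster), an edge, and the root.]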


Nestings and triplets

[Figure: rooted tree on leaves a, b, c, d.]

Nestings:

{a,b} <T {a,b,c,d}

{b,c} <T {a,b,c,d}

Triplets:

(bc)d, also written bc|d
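A minimal sketch of how the triplets displayed by a rooted tree could be listed. The nested-tuple tree encoding, the function names, and the tree shape ((a,b),c),d are assumptions made for illustration; the slide itself only shows the triplet (bc)d.

```python
from itertools import combinations

def leaves(tree):
    """Leaves of a nested-tuple tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def triplets(tree):
    """Set of rooted triplets 'ab|c' displayed by the tree (hypothetical helper)."""
    if not isinstance(tree, tuple):
        return set()
    found = set()
    per_child = [leaves(child) for child in tree]
    for i, inside in enumerate(per_child):
        # Leaves hanging from the other children of this node.
        outside = [x for j, other in enumerate(per_child) if j != i for x in other]
        # a and b share a child subtree, c does not: a and b are closer, so ab|c.
        for a, b in combinations(sorted(inside), 2):
            for c in outside:
                found.add(f"{a}{b}|{c}")
    for child in tree:
        found |= triplets(child)
    return found

# Assumed tree shape: ((a,b),c),d
tree = ((("a", "b"), "c"), "d")
print(sorted(triplets(tree)))
```

Each triplet is recorded once, at the node that is the common ancestor of all three leaves.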


Supertree

[Figure: T1 (leaves a, b, c) + T2 (leaves b, c, d) = supertree (leaves a, b, c, d).]


Some desirable properties of a supertree method

(Steel et al., 2000)

• The supertree can be computed in polynomial time

• A grouping in one or more trees that is not contradicted by any other tree occurs in the supertree


MRP (Matrix Representation Parsimony)

                 1  2  3
Homo sapiens     1  1  1
Pan paniscus     1  1  1
Gorilla gorilla  1  1  0
Pongo pygmaeus   1  0  0
Hylobates        0  0  0

[Figure: tree with internal nodes labelled 1, 2, 3, matching the matrix columns.]

• NP-hard

• Can generate many solutions
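A sketch of the matrix-representation step only (the parsimony search is not shown). It assumes trees are nested tuples and that the ape tree has the shape implied by the matrix above; the function names are hypothetical. Taxa missing from a tree would be coded "?", the usual MRP convention.

```python
def leaf_set(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))

def clusters(tree, is_root=True):
    """Clusters of the non-root internal nodes, in preorder."""
    if not isinstance(tree, tuple):
        return []
    found = [] if is_root else [leaf_set(tree)]
    for child in tree:
        found.extend(clusters(child, is_root=False))
    return found

def mrp_matrix(trees):
    """One column per cluster: 1 = in cluster, 0 = in tree but not in cluster, ? = absent."""
    taxa = sorted(set().union(*(leaf_set(t) for t in trees)))
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        present = leaf_set(tree)
        for cluster in clusters(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon] += "?"
                else:
                    rows[taxon] += "1" if taxon in cluster else "0"
    return rows

# Assumed tree shape, chosen to be consistent with the matrix on the slide.
apes = (((("Homo sapiens", "Pan paniscus"), "Gorilla gorilla"),
         "Pongo pygmaeus"), "Hylobates")
for taxon, row in mrp_matrix([apes]).items():
    print(f"{taxon:16s} {row}")
```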


Aho et al.’s algorithm (OneTree)

Aho, A. V., Sagiv, Y., Szymanski, T. G., and Ullman, J. D. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10: 405-421.

Input: set of rooted trees

1. If set is compatible (i.e., will agree on a tree), output that tree.

2. If set is not compatible, stop!
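The two cases above can be sketched as a short recursion. This is an illustrative version that takes rooted triplets (a, b, c), meaning ab|c, rather than whole trees, and returns None where the slide says "stop". The function name, the triplet encoding, and the example inputs T1 = (a,b),c and T2 = (b,c),d are assumptions.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree displaying every triplet, or None if incompatible."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) == 2:
        return tuple(leaves)

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only triplets whose three leaves are all present still constrain the tree.
    leafset = set(leaves)
    relevant = [t for t in triplets if set(t) <= leafset]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # a and b must fall in the same subtree

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # graph is connected: the inputs are not compatible

    subtrees = []
    for members in groups.values():
        sub = build(members, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

# Assumed inputs: T1 = (a,b),c gives ab|c; T2 = (b,c),d gives bc|d.
print(build("abcd", [("a", "b", "c"), ("b", "c", "d")]))
```

Each connected component of the graph becomes one subtree below the current node, and the recursion repeats on each component.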


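Aho et al.’s OneTree algorithm

[Figure: OneTree applied to T1 (leaves a, b, c) and T2 (leaves b, c, d). Graphs are shown for the leaf sets {a, b, c, d}, {a, b, c}, and {a, b}, together with the resulting supertree.]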


Mincut supertrees

Semple, C., and Steel, M. 2000. A supertree method for rooted trees. Discrete Appl. Math. 105: 147-158.

• Modifies OneTree by cutting graph

• Requires rooted trees (no analogue of OneTree for unrooted trees)

• Recursive

• Polynomial time


[Figure: input trees T1 (leaves a, b, c, d, e) and T2 (leaves a, b, c, d), and the graph S_{T1,T2} on vertices a, b, c, d, e (Semple and Steel, 2000).]


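Collapsing the graph (Semple and Steel mincut algorithm)

[Figure: the graph S_{T1,T2} on vertices a, b, c, d, e, with edge weights of 1 and one edge of weight 2 marked "This edge has maximum weight". Contracting that edge merges a and b into a single vertex, giving S_{T1,T2}/E^max_{T1,T2}.]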


Cut the graph to get supertree

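[Figure: the contracted graph S_{T1,T2}/E^max_{T1,T2}, with vertices {a,b}, c, d, e and edge weights shown as 1, is cut to give the supertree on leaves a, b, c, d, e.]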
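A sketch of one level of the collapse-and-cut step, under several assumptions: each tree is given as a list of its proper clusters, T1 and T2 are hypothetical trees (the slide figures do not survive in the text), parallel edges created by contraction have their weights summed, and minimum cuts are found by brute force over bipartitions, which is only practical for tiny graphs. This is not the faster all-mincuts procedure mentioned in the talk.

```python
from itertools import combinations

def cluster_graph(trees):
    """Edge {x, y} if x and y share a proper cluster; weight = number of such trees."""
    weights = {}
    for clusters in trees:
        pairs = {frozenset(p) for cl in clusters for p in combinations(sorted(cl), 2)}
        for pair in pairs:
            weights[pair] = weights.get(pair, 0) + 1
    return weights

def components(vertices, edges):
    """Connected components by depth-first search."""
    adjacent = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, found = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adjacent[x] - comp)
        seen |= comp
        found.append(frozenset(comp))
    return found

def mincut_step(leaves, trees):
    """Leaf sets of the subtrees below the root for one level of the recursion."""
    weights = cluster_graph(trees)
    parts = components(leaves, weights)
    if len(parts) > 1:
        return set(parts)  # already disconnected: same as OneTree
    k = len(trees)
    # Contract maximum-weight edges (pairs grouped together in every tree).
    blocks = components(leaves, [e for e, w in weights.items() if w == k])
    owner = {leaf: block for block in blocks for leaf in block}
    contracted = {}
    for edge, w in weights.items():
        u, v = (owner[x] for x in edge)
        if u != v:
            key = frozenset((u, v))
            contracted[key] = contracted.get(key, 0) + w
    # Brute force: try every bipartition, keep edges lying in some minimum cut.
    best, cut_edges = None, set()
    first, rest = blocks[0], blocks[1:]
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            crossing = {e for e in contracted if len(e & side) == 1}
            weight = sum(contracted[e] for e in crossing)
            if best is None or weight < best:
                best, cut_edges = weight, set(crossing)
            elif weight == best:
                cut_edges |= crossing
    kept = [e for e in contracted if e not in cut_edges]
    return {frozenset().union(*comp) for comp in components(blocks, kept)}

# Hypothetical inputs: T1 = (((a,b),c),(d,e)) and T2 = ((a,b),(c,d)).
T1 = [{"a", "b"}, {"a", "b", "c"}, {"d", "e"}]
T2 = [{"a", "b"}, {"c", "d"}]
print(mincut_step(list("abcde"), [T1, T2]))
```

With these assumed inputs the a-b edge has weight 2 and is contracted, both weight-1 cuts are removed, and the root gets three children: {a, b, c}, d, and e. A full implementation would then restrict the input trees to each component and recurse.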


My mincut supertree implementation

darwin.zoology.gla.ac.uk/~rpage/supertree

• Written in C++

• Uses GTL (Graph Template Library) to handle graphs (formerly a free alternative to LEDA)

• Finds all mincuts of a graph faster than Semple and Steel’s algorithm


A counter example: two input trees...

[Figure: two input trees. One has leaves a, b, c, x1, x2, x3; the other has leaves c, b, a, y1, y2, y3, y4.]


Mincut gives this (strange) result

[Figure: mincut supertree with leaves in the order c, x1, x2, x3, b, a, y1, y2, y3, y4.]

• Disputed relationships among a, b, and c are resolved

• x1, x2, and x3 collapsed into polytomy


Problem: Cuts depend on connectivity (in this example it is a function of tree size)

[Figure: the graph S_{T1,T2} for the two input trees, on vertices a, b, c, x1, x2, x3, y1, y2, y3, y4.]


So, mincut doesn’t work

• But, Semple and Steel said it did

• My program seems to work

• Argh!!! What is happening….?


What mincut does… …and does not do

• Mincut supertree is guaranteed to include any nesting which occurs in all input trees

• Makes no claims about nestings which occur in only some of the trees

• “Does exactly what it says on the tin™”


Modifying mincut supertree

• Can we incorporate more of the information in the input trees?

• Three categories of information

  • Unanimous (all trees have that grouping)

  • Contradicted (trees explicitly disagree)

  • Uncontradicted (some trees have information that no other tree disagrees with)


Uncontradicted information

Assume we have k input trees.

c = number of trees in which a and b co-occur

n = number of trees in which a and b are nested

c - n = 0: uncontradicted (if c = k then unanimous)

c - n > 0: contradicted


Uncontradicted information

Assume we have k input trees.

c = number of trees in which a and b co-occur

n = number of trees in which a and b are nested

f = number of trees in which a and b are in a fan

c - n - f = 0: uncontradicted (if c = k then unanimous)

c - n - f > 0: contradicted
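The rule above can be written as a small function. This is a sketch that takes the counts c, n, and f as given (how they are computed from the trees is not shown) and follows the slide's wording literally; the function name and the returned labels are hypothetical.

```python
def classify_edge(c, n, f, k):
    """Classify the edge between leaves a and b from per-tree counts.

    c: trees in which a and b co-occur
    n: trees in which a and b are nested
    f: trees in which a and b are in a fan
    k: total number of input trees
    """
    if not (0 <= n and 0 <= f and n + f <= c <= k):
        raise ValueError("counts must satisfy 0 <= n + f <= c <= k")
    if c - n - f > 0:
        return "contradicted"
    # c - n - f == 0: no tree containing both leaves disagrees.
    return "unanimous" if c == k else "uncontradicted"

# Two input trees (k = 2):
print(classify_edge(c=2, n=2, f=0, k=2))
print(classify_edge(c=1, n=1, f=0, k=2))
print(classify_edge(c=2, n=1, f=0, k=2))
```

Setting f = 0 everywhere gives the simpler c - n rule from the previous slide.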


Classifying edges

[Figure: the two input trees (leaves a, b, c, x1, x2, x3 and a, b, c, y1, y2, y3, y4) and the graph S_{T1,T2}, with each edge classified as uncontradicted, uncontradicted but adjacent to contradicted, or contradicted.]


Modified mincut

• Species a, b, and c form a polytomy

• x1, x2, and x3 resolved as per the input tree

[Figure: modified mincut supertree with leaves in the order a, b, c, x1, x2, x3, y1, y2, y3, y4.]


[Figure: four input trees on leaves drawn from 1, 2, 3, 4, 5, labelled with the triplets (12)5, (45)1, (23)5, and (34)1.]

If no tree contradicts an item of information, is that information always in the supertree?


No! (Steel, Dress, & Böcker 2000)

[Figure: diagram on the labels 1, 2, 3, 4, 5.]

• The four trees display (12)5, (23)5, (34)1, and (45)1

• No tree displays (IK)J or (JK)I for any (IJ)K above

• Triplets are uncontradicted, but cannot form a tree
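A small check of the last bullet, assuming the OneTree-style test in which each triplet (IJ)K links I and J: if the linked graph on the leaves is connected, no tree can display all the triplets. The encoding and function names are hypothetical.

```python
from itertools import combinations

def linked_components(leaves, triplets):
    """Groups of leaves joined by the 'I and J belong together' links of (IJ)K."""
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in triplets:
        parent[find(i)] = find(j)
    groups = {}
    for x in leaves:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

def conflict(t1, t2):
    """True if two triplets resolve the same three leaves differently."""
    return set(t1) == set(t2) and set(t1[:2]) != set(t2[:2])

# (12)5, (23)5, (34)1, (45)1 written as (I, J, K)
triplets = [(1, 2, 5), (2, 3, 5), (3, 4, 1), (4, 5, 1)]

pairwise = [(s, t) for s, t in combinations(triplets, 2) if conflict(s, t)]
print("pairwise conflicts:", pairwise)
print("components:", linked_components([1, 2, 3, 4, 5], triplets))
```

No two triplets conflict directly, yet the links 1-2, 2-3, 3-4, 4-5 join all five leaves into one component, so there is no way to split the leaves at the root.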


Future directions

• Improve handling of uncontradicted information

• Add support for constraints

• Visualising very big trees

• Better integration into phylogeny databases (www.treebase.org)

darwin.zoology.gla.ac.uk/~rpage/supertree