Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination
Author: Dan Gusfield
Presentation by: C. Badri Narayanan


Page 1: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Author: Dan Gusfield

Presentation by: C. Badri Narayanan

Page 2: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Agenda

• Main Problem – Root-Unknown galled-tree problem

• Solving Optimal Root-Unknown Galled-Tree Problem

Page 3: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Root-Unknown Galled-Tree problem

Given a set of sequences (say, M), find a galled-tree for M with the minimum number of recombinations if one exists; otherwise, report that none exists

Let’s see the approach previously taken

Page 4: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Points Considered in Theorem(s)

• Only single-crossover recombinations are considered

• The algorithm will be extended to multiple crossover recombinations

Before seeing the approach let’s consider some definitions

Page 5: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Definition of Terms

• Trivial Component: A node of the incompatibility graph with no edges

• Component (a.k.a. Connected/Non-Trivial Component): A maximal set of nodes in which every pair of nodes is joined by at least one path

• Reduced galled-tree: A galled-tree in which no gall contains a character site from a trivial component
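The slides use the incompatibility graph without defining it. The sketch below assumes the usual four-gamete definition (two sites are incompatible when all of 00, 01, 10 and 11 occur in those two columns) and then splits the sites into trivial and non-trivial components. The function names and the example matrix are illustrative, not taken from the presentation.

```python
from itertools import combinations


def incompatible(M, i, j):
    """Four-gamete test (assumed definition): all of 00, 01, 10, 11 occur at sites i, j."""
    return len({(row[i], row[j]) for row in M}) == 4


def components(M):
    """Connected components of the incompatibility graph; nodes are site indices."""
    m = len(M[0])
    adj = {s: set() for s in range(m)}
    for i, j in combinations(range(m), 2):
        if incompatible(M, i, j):
            adj[i].add(j)
            adj[j].add(i)

    seen, comps = set(), []
    for start in range(m):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search from `start`
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


M = ["000", "010", "100", "110", "001"]  # binary sequences, one per row
comps = components(M)
trivial = [c for c in comps if len(c) == 1]
non_trivial = [c for c in comps if len(c) > 1]
```

In this small matrix, sites 0 and 1 show all four gametes and form one non-trivial component, while site 2 is a trivial component.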

Page 6: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Previous Approaches – A Roadmap

• To construct a galled-tree for M with known ancestral sequence (say, A):

– Focus on each non-trivial component of the incompatibility graph separately

– For each component in the incompatibility graph, determine the site arrangement on a gall

– Connect the galls in a tree structure

– Place the sites from the trivial components

Page 7: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Difficulties for Unknown Ancestral Sequence

• For two different choices of ancestral sequence S and S’ (in M), the conflict and incompatibility graphs may be different

• How do we know which (ancestral) sequence will allow a galled-tree?
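To illustrate why the choice of ancestral sequence matters, here is a minimal sketch. It assumes one common definition of conflict that the slides do not spell out: after relabeling each site so that the ancestral state is 0, two sites conflict when the pairs 01, 10 and 11 all occur. The function name and the example data are hypothetical.

```python
def conflict(M, ancestral, i, j):
    """Assumed definition: sites i and j conflict relative to `ancestral`
    if, with the ancestral state relabeled as 0, pairs 01, 10 and 11 all occur."""
    pairs = {
        (int(row[i]) ^ int(ancestral[i]), int(row[j]) ^ int(ancestral[j]))
        for row in M
    }
    return {(0, 1), (1, 0), (1, 1)} <= pairs


M = ["01", "10", "11"]
with_root_00 = conflict(M, "00", 0, 1)  # relabeling leaves M unchanged
with_root_11 = conflict(M, "11", 0, 1)  # relabeling flips every bit
```

With ancestral sequence 00 the two sites conflict, but with ancestral sequence 11 they do not, so the same data can yield different conflict structure under different roots.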

Page 8: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Optimal Galled-Tree

• A galled-tree that minimizes the number of recombinations over all galled-trees for a set of sequences (say, M) and over all choices of ancestral sequence is called an “Optimal Galled-Tree”

• The ancestral sequence of an optimal galled-tree is called an “optimal ancestral sequence”

Page 9: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Author’s Approach: Theorem on Galled Trees – Finding An Ancestral Sequence

If there is a galled-tree for M with some ancestral sequence, then there is an optimal galled-tree for M where the (optimal) ancestral sequence is one of the sequences in M

Page 10: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Proof for the Theorem

T – optimal galled-tree for M

A – ancestral sequence for T

Every gall must have at least three edges branching off of it

Page 11: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Proof continued….

Let P be a path in T from the root to some leaf z that doesn’t contain any recombination nodes

Zz – sequence labeling z, where Zz is in M

Make Zz the ancestral sequence and reverse the directions of all edges on path P

Page 12: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Main Problem contd..

• Each such reversal of edges changes the direction of mutation on edges

• The reversal of edges doesn’t change:

> Labels on edges in T

> The recombination node on a gall

• The modified tree T’ also derives M

Page 13: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Main Problem contd..

• Ancestral sequence of T’ is Zz which is a member of M

• T’ also contains the same number of galls, and hence T’ is also optimal

• Running time is O(n^2 m + n^4), where

n – number of sequences

m – length of each binary sequence
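Because the theorem restricts the candidate ancestral sequences to members of M, one way to read the result is as a loop over the distinct sequences in M, each tried as the root. The sketch below assumes a root-known procedure is available; `solve_rooted` is a hypothetical callback (not defined in the slides) that returns the number of recombinations for a given root, or None when no galled-tree exists for that root.

```python
def optimal_root_unknown(M, solve_rooted):
    """Try each distinct sequence of M as the ancestral sequence.

    solve_rooted(M, A) is a hypothetical root-known solver: it returns the
    number of recombinations of a galled-tree for M rooted at A, or None.
    Returns (best_ancestral_sequence, recombinations), or None if no root works.
    """
    best = None
    for A in dict.fromkeys(M):  # distinct rows, in first-seen order
        count = solve_rooted(M, A)
        if count is not None and (best is None or count < best[1]):
            best = (A, count)
    return best


# Usage with a stand-in solver that just looks up made-up answers.
fake_answers = {"00": 2, "01": 1, "11": None}
result = optimal_root_unknown(
    ["00", "01", "11", "01"], lambda M, A: fake_answers[A]
)
```

At most n roots are tried, so the stated bound would be consistent with n calls to a root-known procedure costing O(nm + n^3) each; that per-call cost is an assumption here, not something the slides state.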

Page 14: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Solving Optimal Root-Unknown Galled-Tree Problem

• M – can be derived on a galled-tree; T* - an optimal galled-tree for M

• A* - an optimal ancestral sequence

Page 15: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Connecting galls of T*

Assumptions

• Every node v on a gall Q in T* is incident with exactly one edge whose other end is off of Q (a.k.a. an “off-edge”)

• An off-edge may be directed into or out of a node (say, x)

Page 16: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Connecting Galls of T*

• Transform T* to T’ (conceptually) as follows

– Node 00100 (say, x) is incident with 2 edges

– A new node (say, y) is introduced and joined to x by a new edge

– The 2 original edges (that were initially out of x) are reconnected so that they leave from y

• T’ specifies how the galls of T* are connected to each other but does not show the internal arrangement of the sites on any gall

Page 17: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Connecting Galls of T*

If x is the root of T*, then create a new root and connect it with an edge to x

Contract each gall Q in T* to a single node (say, q) and make all edges undirected
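The contraction step can be sketched as follows. This is an illustration only: it assumes the network is given as a list of edges and that the galls are given as node-disjoint sets of node names; the representation and all names in the example are hypothetical.

```python
def contract_galls(edges, galls):
    """Contract each gall to one node and drop edge directions.

    edges: iterable of (u, v) pairs (direction is ignored in the result)
    galls: dict mapping a new node name to the set of nodes on that gall
           (assumed node-disjoint)
    Returns a set of undirected edges, each a frozenset of two node names.
    """
    representative = {v: name for name, nodes in galls.items() for v in nodes}
    contracted = set()
    for u, v in edges:
        a = representative.get(u, u)
        b = representative.get(v, v)
        if a != b:  # edges inside a gall disappear
            contracted.add(frozenset((a, b)))
    return contracted


edges = [
    ("root", "a"),                                  # edge into the gall
    ("a", "b"), ("a", "c"), ("b", "r"), ("c", "r"),  # the gall itself
    ("r", "leaf1"), ("b", "leaf2"), ("c", "leaf3"),  # off-edges
]
tree = contract_galls(edges, {"q": {"a", "b", "c", "r"}})
```

The result is an undirected tree in which node q stands for the whole gall and keeps only its off-edges.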

Page 18: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Algorithmic Construction of T’

• Find a family of splits SP(T)

• C1 & C2 are obtained from the incompatibility graph

• The leaf nodes for the tree (on the right side of the figure) are determined by the sites that have a unique combination of characters

Page 19: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Extensions to Complex Biological Phenomena & Structured Recombination

• Site-Arrangement algorithm for gall Q corresponding to component C

Let M(C) be the matrix M restricted to the sites in C

Page 20: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Extensions to Complex Biological Phenomena & Structured Recombination

For each distinct sequence X in M(C):

– Let M(C, X) be M(C) after removal of all rows with sequence X

– If there is an undirected perfect phylogeny T(C) for M(C, X) in which all sites of C are contained in one path whose end sequences can be recombined (with a single crossover) to create sequence X, then output the pair (X, T(C))
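The single-crossover check on the two end sequences of the path can be sketched as below. This assumes a single crossover means a non-empty prefix of one sequence followed by the matching non-empty suffix of the other; the function names are illustrative.

```python
def prefix_suffix_crossover(p, s, x):
    """True if x is a non-empty prefix of p followed by the rest of s."""
    if not (len(p) == len(s) == len(x)):
        return False
    return any(p[:k] + s[k:] == x for k in range(1, len(x)))


def single_crossover(a, b, x):
    """True if x arises from one crossover of a and b, in either order."""
    return prefix_suffix_crossover(a, b, x) or prefix_suffix_crossover(b, a, x)


ok = single_crossover("0011", "1100", "0000")   # "00" from a, then "00" from b
bad = single_crossover("0011", "1100", "1010")  # needs more than one crossover
```

The perfect phylogeny test itself is not shown here; this only covers the recombination check on the end sequences.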

Page 21: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Extensions to Complex Biological Phenomena & Structured Recombination

• Step 2 of the above algorithm is modified for multiple-crossover recombination

• Let Su(C) and Sy(C) denote two sequences

• The modified step determines whether X can be created by a multiple-crossover recombination of Su(C) and Sy(C), starting with Su(C)

Page 22: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Extensions to Complex Biological Phenomena & Structured Recombination

• Algorithm:

– i = 1; Z = Su(C)

– do {

• Find the longest substring of Z starting at position i that matches a substring of X starting at position i

• If there is none, return no

• Otherwise, set i to the position just past the right end of those matching substrings

• If Z = Su(C) then set Z = Sy(C), else set Z = Su(C)

} while i is at most the length of X

– Return yes
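The greedy scan above can be sketched in Python as follows. The sketch uses 0-based positions instead of the slide's 1-based ones and assumes all three sequences have the same length; the function name is illustrative.

```python
def multi_crossover(su, sy, x):
    """Greedy check: can x be built by alternating between su and sy,
    starting with su, always copying the longest matching stretch?"""
    if not (len(su) == len(sy) == len(x)):
        raise ValueError("sequences must have equal length")
    i = 0
    current, other = su, sy
    while i < len(x):
        j = i
        # Extend the match of `current` against x as far as possible.
        while j < len(x) and current[j] == x[j]:
            j += 1
        if j == i:  # no match at position i: x cannot be created
            return False
        i = j
        current, other = other, current  # cross over to the other sequence
    return True


yes = multi_crossover("0000", "1111", "0110")  # su, then sy, then su
no = multi_crossover("0000", "1111", "1000")   # cannot start with su
```

Taking the longest match at each step is safe because the current sequence mismatches X at the position where the match ends, so a crossover is forced there at the latest.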

Page 23: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Extensions to Complex Biological Phenomena & Structured Recombination

The above algorithm produces a multiple-crossover galled-tree for M

Page 24: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination

Thank You