
Common Intervals in Sequences, Trees, and Graphs

Steffen Heber and Jiangtian Li

Genome Comparison of Bacteria

[Kim et al., Nat. Biotechnol., 2004]

Gene Order & Function in Bacteria

• Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996]

• Some genes cluster together even in unrelated species.

• Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]


Formalization of Gene Clusters

Genomes: permutations π1, π2, …, πk

Genes: numbers 1,…,n

π1 = 1 2 3 4 5 6 7 8

π2 = 8 7 6 4 5 2 1 3

π3 = 3 1 2 5 8 7 6 4

π4 = 6 7 4 2 1 3 8 5

Intervals

• For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n.

• Any permutation of [n] has n(n-1)/2 intervals.

1 3 5 4 2 6 7
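The definition can be tried out on the example permutation. The following is a minimal sketch, not taken from the talk; the function name `intervals` is chosen here for illustration only.

```python
from itertools import combinations

def intervals(perm):
    """Return all intervals of a permutation as frozensets.

    An interval is the set of elements found at positions i..j with i < j.
    """
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i, j in combinations(range(n), 2)]

pi = [1, 3, 5, 4, 2, 6, 7]            # example permutation from the slide
ivs = intervals(pi)
n = len(pi)
print(len(ivs), n * (n - 1) // 2)     # 21 21: one interval per position pair i < j
print(frozenset({3, 4, 5}) in ivs)    # True: 3, 5, 4 sit next to each other
```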

Common Intervals

• For a family F = (π0, π1, …, πk-1) of permutations, a subset S ⊆ [n] is a common interval of F (= conserved gene cluster) iff S is an interval in every πi.

• We write S ∈ C_F.

π0 = 1 3 5 4 2 6 7

π1 = 2 4 5 1 3 7 6
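A simple way to enumerate common intervals is to walk over all intervals of the first permutation and test whether the same genes occupy consecutive positions in every other permutation (largest position minus smallest position equals interval length minus one). This is a brute-force O(k n^2) sketch for illustration, not the O(kn + K) algorithm mentioned later in the talk; the name `common_intervals` is an assumption.

```python
def common_intervals(perms):
    """Brute-force enumeration of all common intervals (size >= 2)."""
    first, others = perms[0], perms[1:]
    # Position of every gene in each of the remaining permutations.
    pos = [{gene: i for i, gene in enumerate(p)} for p in others]
    n = len(first)
    found = []
    for i in range(n):
        lo = [p[first[i]] for p in pos]   # smallest position seen so far
        hi = list(lo)                     # largest position seen so far
        for j in range(i + 1, n):
            gene = first[j]
            for t, p in enumerate(pos):
                lo[t] = min(lo[t], p[gene])
                hi[t] = max(hi[t], p[gene])
            # first[i..j] is contiguous in a permutation iff hi - lo == j - i
            if all(h - l == j - i for l, h in zip(lo, hi)):
                found.append(frozenset(first[i:j + 1]))
    return found

pi0 = [1, 3, 5, 4, 2, 6, 7]
pi1 = [2, 4, 5, 1, 3, 7, 6]
for c in common_intervals([pi0, pi1]):
    print(sorted(c))
```

For the two example permutations this yields nine common intervals, among them {4, 5}, {2, 4, 5} and {6, 7}.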


Lemma

Let F = (π0, π1, …, πk-1) and c, d ∈ C_F.

• If c ∩ d ≠ ∅ then c ∪ d ∈ C_F.

• If, in addition, neither c nor d contains the other, we call c ∪ d reducible.

π0 = 1 3 5 4 2 6 7

π1 = 2 4 5 1 3 7 6

[Figure: reducible and irreducible intervals marked in π0 and π1]
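To see which intervals of the example are reducible, one can test for each common interval whether smaller common intervals that are linked by overlaps cover it. The sketch below assumes this reading of "reducible"; the nine input sets are the common intervals of the two example permutations found by exhaustive enumeration, and `split_irreducible` is a name chosen for illustration.

```python
def split_irreducible(common):
    """Split common intervals into (irreducible, reducible) lists."""
    common = [frozenset(c) for c in common]
    irreducible, reducible = [], []
    for c in common:
        unseen = {d for d in common if d < c}    # proper sub-intervals of c
        generated = False
        while unseen and not generated:
            # Grow one connected group of intersecting sub-intervals.
            group_union = set(unseen.pop())
            grew = True
            while grew:
                grew = False
                for d in list(unseen):
                    if group_union & d:
                        group_union |= d
                        unseen.remove(d)
                        grew = True
            generated = group_union == c
        (reducible if generated else irreducible).append(c)
    return irreducible, reducible

# Common intervals of pi0 = 1 3 5 4 2 6 7 and pi1 = 2 4 5 1 3 7 6
common = [{1, 3}, {1, 3, 5}, {1, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6, 7},
          {4, 5}, {2, 4, 5}, {2, 4}, {6, 7}]
irreducible, reducible = split_irreducible(common)
print([sorted(c) for c in irreducible])
print([sorted(c) for c in reducible])
```

Here {2, 4, 5} = {4, 5} ∪ {2, 4} comes out as reducible, while the full set {1, …, 7} is irreducible because {6, 7} overlaps no other proper sub-interval.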

Analysis

• We have K ≤ n(n-1)/2 common intervals and I < n irreducible intervals.

• Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space

Common Intervals of Trees

Let T,T1,…,Tk be trees with vertex set [n].

Definition:

• S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1.

• S ⊆ [n] is a common interval of T1,…,Tk iff S is an interval in all trees.

• Tree intervals generalize intervals of permutations.
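One way to see the last point is to read a permutation as a path that visits the vertices in the given order: the connected vertex sets of that path are exactly the intervals of the permutation. A small sketch under that assumption; `is_tree_interval` and `path_tree` are illustrative names.

```python
from itertools import combinations

def is_tree_interval(adj, S):
    """True iff |S| > 1 and the subgraph induced by S is connected."""
    S = set(S)
    if len(S) < 2:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:                          # depth-first search restricted to S
        v = stack.pop()
        for w in adj[v]:
            if w in S and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S

def path_tree(perm):
    """Adjacency sets of the path visiting the vertices in the order of perm."""
    adj = {v: set() for v in perm}
    for a, b in zip(perm, perm[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj

perm = [1, 3, 5, 4, 2, 6, 7]
adj = path_tree(perm)
tree_ivs = {frozenset(S)
            for r in range(2, len(perm) + 1)
            for S in combinations(perm, r)
            if is_tree_interval(adj, S)}
perm_ivs = {frozenset(perm[i:j + 1])
            for i, j in combinations(range(len(perm)), 2)}
print(tree_ivs == perm_ivs)               # True
```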

Miscellaneous

Example:

common intervals of T1, T2: { [2], [3], [4], [5] }

• (Common) Intervals in trees are induced subtrees.

[Figure: example trees T1 and T2 on the vertex set {1, …, 5}]

Structure of Tree Intervals

• Tree intervals have the Helly property, i.e. for any family of tree intervals (Ti), i ∈ I, the assumption Tp ∩ Tq ≠ ∅ for every p, q ∈ I implies ⋂_{i∈I} Ti ≠ ∅.

Extreme Cases

n-vertex stars S_{n-1}

# non-trivial induced subtrees: 2^(n-1) - 1
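The count can be checked directly. A short sketch of the argument, assuming the star has one centre z and n-1 leaves and that "non-trivial" means more than one vertex: the leaves are pairwise non-adjacent, so a connected set S with |S| > 1 must contain z, and any non-empty set of leaves may be added to it.

```latex
\[
\#\{\, S \subseteq [n] : z \in S,\ |S| > 1 \,\}
  \;=\; \#\{\, L \subseteq \text{leaves} : L \neq \emptyset \,\}
  \;=\; 2^{\,n-1} - 1 .
\]
```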

The Common Interval Graph

• Given T = (T1,…,Tk) and the corresponding common intervals C_T, the common interval graph G_T = (V, E) is the graph with

V = C_T

E = {(c,d) | c, d ∈ C_T, c ≠ d, c ∩ d ≠ ∅}

Example

• V = [n], T = (P_n, S_{n-1})

• We have C_T = { [2], [3], …, [n] } and G_T = K(C_T), the complete graph on C_T.

[Figure: the common interval graph G_T on the vertices [2], [3], [4], …, [n]]
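The example can be reproduced in a few lines. The sketch assumes that edges of the common interval graph join distinct common intervals sharing at least one vertex, and it takes the common intervals [2], …, [n] from the slide as given; `common_interval_graph` is an illustrative name.

```python
from itertools import combinations

def common_interval_graph(common):
    """Vertices: the common intervals. Edges: pairs of distinct, intersecting intervals."""
    V = [frozenset(c) for c in common]
    E = {(c, d) for c, d in combinations(V, 2) if c != d and c & d}
    return V, E

n = 6
C_T = [range(1, m + 1) for m in range(2, n + 1)]    # [2], [3], ..., [n]
V, E = common_interval_graph(C_T)
# Every pair of common intervals contains vertex 1, so G_T is complete.
print(len(E) == len(V) * (len(V) - 1) // 2)         # True
```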

Common Interval Graphs cont’d

A graph is called chordal if it does not contain an induced cycle on more than 3 vertices.

Proposition: Common interval graphs of trees are chordal graphs.

Irreducible Common Intervals

For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c, iff

i. for each d ∈ V, d ⊊ c

ii. c = ∪ d, taken over all d ∈ V

iii. G_T[V] is connected.

If there is no such V then c is irreducible.

The irreducible intervals generate all common intervals.

[Figure: example on the vertices 1, …, 7]

Finding Irreducible Intervals

• We have K < 2^(n-1) common intervals and I < n irreducible intervals.

• Find all irreducible common intervals of k trees on n vertices: O(kn^2) time & O(kn) space

Finding Irreducible Intervals

• Irreducible intervals are minimal common intervals containing an adjacent vertex pair.

[Figure: two drawings with the vertices x, y, l, z, m]
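The minimal common interval containing a vertex pair can be computed by a closure: start with the pair and keep adding the vertices on tree paths between members of the set until the set is connected in every tree. The sketch below is a simple illustration, not the O(kn^2) algorithm of the talk; it assumes that "adjacent" refers to edges of the first tree, and `make_tree`, `tree_path` and `closure` are names chosen here.

```python
from collections import deque

def make_tree(edges):
    """Adjacency sets of an undirected tree given as an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def tree_path(adj, u, v):
    """Vertices on the unique u-v path of a tree."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in parent:
                parent[b] = a
                queue.append(b)
    path = []
    while v is not None:                  # walk back from v to u
        path.append(v)
        v = parent[v]
    return path

def closure(trees, x, y):
    """Smallest vertex set containing x and y that is connected in every tree."""
    S = {x, y}
    changed = True
    while changed:
        changed = False
        for adj in trees:
            root = next(iter(S))
            for s in list(S):
                new = set(tree_path(adj, root, s)) - S
                if new:
                    S |= new
                    changed = True
    return frozenset(S)

path = make_tree([(1, 2), (2, 3), (3, 4)])    # P_4
star = make_tree([(1, 2), (1, 3), (1, 4)])    # S_3 with centre 1
for x, y in [(1, 2), (2, 3), (3, 4)]:         # edges of the first tree
    print((x, y), sorted(closure([path, star], x, y)))
```

For the path and the star on four vertices the three edges of the path give [2], [3] and [4], which matches the common intervals of (P_n, S_{n-1}) stated earlier.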

Graph Intervals

Let G = (V, E) be an undirected, connected graph with V = [n].

S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S.

[Figure: two vertex sets in a graph on the vertices 1, 2, 3, 4; the first is convex, the second is NOT]
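Convexity can be tested with breadth-first distances: a vertex w lies on a shortest u-v path exactly when d(u,w) + d(w,v) = d(u,v). The sketch below uses a hypothetical 4-cycle as input, which is not necessarily the graph drawn on the slide; `bfs_dist` and `is_convex` are illustrative names.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, s):
    """Hop distances from s to every vertex of a connected graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def is_convex(adj, S):
    """True iff S contains every vertex on every shortest path between members of S."""
    S = set(S)
    dist = {s: bfs_dist(adj, s) for s in S}
    for u, v in combinations(S, 2):
        for w in adj:
            # w lies on a shortest u-v path iff the distances add up
            if w not in S and dist[u][w] + dist[v][w] == dist[u][v]:
                return False
    return True

# Hypothetical 4-cycle 1-2-4-3-1
cycle = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
print(is_convex(cycle, {1, 2}))       # True
print(is_convex(cycle, {2, 4}))       # True
print(is_convex(cycle, {1, 2, 4}))    # False: the shortest path 1-3-4 leaves the set
```

Including all shortest paths already forces G[S] to be connected, so no separate connectivity test is needed. In this 4-cycle the union of the convex sets {1, 2} and {2, 4} is not convex.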

Common Intervals of Graphs

Let G = (G1,…,Gk) be a family of connected, undirected graphs with vertex set [n].

Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs.

• Graph intervals generalize tree intervals.

[Figure: example graphs G0 and G1 on the vertices 1, 2, 3, 4]

Some Differences

• The union of intersecting convex sets is NOT always convex.

• The common convex hull of an adjacent vertex pair is NOT always irreducible.

[Figure: example graphs G1 and G2 on the vertices 1, 2, 3]

Finding Irreducible Graph Intervals

Sketch: Given G = (G0, G1, …, Gk-1)

For each edge (i,j) ∈ Ei* do

    S(i,j) := {i,j}

    For each (k,l) ∈ S(i,j)

        Add vertices ‘between’ k and l to S(i,j)

Remove reducible intervals
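The sketch can be turned into a small runnable illustration. Several points are assumptions made here and not stated on the slide: edges are taken from the union of all edge sets, "between" means lying on a shortest path, and a candidate is dropped as reducible when smaller candidates linked by overlaps cover it. The two input graphs are hypothetical, and all function names are chosen for illustration.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, s):
    """Hop distances from s in a connected graph."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def common_hull(graphs, dists, i, j):
    """Smallest set containing i and j that is convex in every graph."""
    S = {i, j}
    changed = True
    while changed:
        changed = False
        for adj, dist in zip(graphs, dists):
            for k, l in combinations(sorted(S), 2):
                # vertices 'between' k and l: those on a shortest k-l path
                between = {w for w in adj if dist[k][w] + dist[w][l] == dist[k][l]}
                if not between <= S:
                    S |= between
                    changed = True
    return frozenset(S)

def is_generated(c, candidates):
    """True iff proper sub-candidates of c that are linked by overlaps cover c."""
    parts = [set(d) for d in candidates if d < c]
    merged = True
    while merged:                          # merge intersecting parts until stable
        merged = False
        for a, b in combinations(parts, 2):
            if a & b:
                parts.remove(b)
                a |= b
                merged = True
                break
    return any(p == c for p in parts)

def irreducible_intervals(graphs):
    dists = [{v: bfs_dist(adj, v) for v in adj} for adj in graphs]
    edges = {frozenset((a, b)) for adj in graphs for a in adj for b in adj[a]}
    hulls = {common_hull(graphs, dists, *sorted(e)) for e in edges}
    return {c for c in hulls if not is_generated(c, hulls)}

triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}     # hypothetical first graph
path = {1: {3}, 3: {1, 2}, 2: {3}}               # hypothetical second graph: 1-3-2
print(sorted(sorted(c) for c in irreducible_intervals([triangle, path])))
```

In this input the common hull of the adjacent pair (1, 2) is {1, 2, 3}, which is the union of the overlapping common intervals {1, 3} and {2, 3} and is therefore removed in the last step.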

Extreme Cases

Permutations (identical permutations):

• C ≤ n(n-1)/2, I < n

Trees (identical star-trees):

• C < 2^(n-1), I < n

Graphs (complete graphs):

• C < 2^n, I ≤ n(n-1)/2

Example: InterDom

Database of protein domain interactions.

• Gene fusions

• Protein-protein interactions (DIP & BIND)

• Protein complexes (PDB)

Comparing Two Networks

Comparing Three Networks

G: Gene fusion, P: PDB, B: BIND, D: DIP

Irreducible Intervals

[Figure: plot with axis label "size of irreducible interval"]

Biologically Meaningful?

RAS family domain protein kinase

ankyrin repeat

PH domain

regulator of chromosome condensation

THANK YOU!!!