Gene Order & Function in Bacteria
• Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996]
• Some genes cluster together even in unrelated species.
• Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]
Formalization of Gene Clusters
Genomes: permutations π1, π2, …, πk
Genes: numbers 1, …, n
π1 = 1 2 3 4 5 6 7 8
π2 = 8 7 6 4 5 2 1 3
π3 = 3 1 2 5 8 7 6 4
π4 = 6 7 4 2 1 3 8 5
Intervals
• For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n.
• Any permutation of [n] has n(n-1)/2 intervals.
1 3 5 4 2 6 7
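The definition can be checked by direct enumeration. The following is a minimal Python sketch, not part of the original slides; the function name `intervals` is chosen here for illustration. It lists every interval of the example permutation and compares the count with n(n-1)/2.

```python
def intervals(perm):
    """Return every interval of `perm` as a frozenset of genes (size >= 2)."""
    n = len(perm)
    # One interval per index pair i < j: the genes at positions i..j.
    return [frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)]


perm = [1, 3, 5, 4, 2, 6, 7]        # example permutation from the slide
ivs = intervals(perm)
n = len(perm)
print(len(ivs), n * (n - 1) // 2)   # the two counts should agree
```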
Common Intervals
• For a family F = (π0, π1, …, πk-1) of permutations, a common interval of F (= conserved gene cluster) is a subset S ⊆ [n] that is an interval in every πi.
• We say S ∈ CF.
π0 = 1 3 5 4 2 6 7
π1 = 2 4 5 1 3 7 6
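As an illustration, the common intervals of the two example permutations can be found by intersecting their interval sets. This is a naive sketch with hypothetical helper names (`intervals`, `common_intervals`). It builds all O(n²) intervals per permutation and is not an efficient algorithm.

```python
def intervals(perm):
    """All intervals of `perm` (size >= 2) as a set of frozensets."""
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}


def common_intervals(perms):
    """Sets that are intervals in every permutation of the family."""
    common = intervals(perms[0])
    for perm in perms[1:]:
        common &= intervals(perm)
    return common


pi0 = [1, 3, 5, 4, 2, 6, 7]
pi1 = [2, 4, 5, 1, 3, 7, 6]
cf = common_intervals([pi0, pi1])
for s in sorted(cf, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

For this pair the sketch finds nine common intervals, among them {4, 5}, {2, 4, 5} and the full set [7].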
Lemma
Let F = (π0, π1, …, πk-1) and c, d ∈ CF.
• If c ∩ d ≠ ∅, then c ∪ d ∈ CF.
• We call c ∪ d reducible.
π0 = 1 3 5 4 2 6 7
π1 = 2 4 5 1 3 7 6
[Figure: reducible and irreducible intervals marked on π0 and π1]
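The lemma can be verified by brute force on the example pair π0, π1. The sketch below is an illustration under one stated assumption: a common interval counts as reducible only when it is the union of two overlapping common intervals that are both strictly smaller than it. The slide does not spell out this non-nesting condition. All function names are hypothetical.

```python
from itertools import combinations


def intervals(perm):
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}


def common_intervals(perms):
    common = intervals(perms[0])
    for perm in perms[1:]:
        common &= intervals(perm)
    return common


def reducible(common):
    """Unions of two overlapping common intervals, both strictly smaller (assumption)."""
    found = set()
    for c, d in combinations(common, 2):
        union = c | d
        if (c & d) and c < union and d < union and union in common:
            found.add(union)
    return found


cf = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])
# Lemma: overlapping common intervals have a common interval as their union.
lemma_holds = all((c | d) in cf for c, d in combinations(cf, 2) if c & d)
red = reducible(cf)
irr = cf - red
print(lemma_holds, len(red), len(irr))
```

Under this reading, {1, 3, 4, 5} = {1, 3, 5} ∪ {4, 5} is reducible, while {1, 3, 5} is not, because {3, 5} is not a common interval.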
Analysis
• We have K ≤ n(n-1)/2 common intervals, and I < n irreducible intervals.
• Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space
Common Intervals of Trees
Let T, T1, …, Tk be trees with vertex set [n].
Definition:
• S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1.
• S ⊆ [n] is a common interval of T1, …, Tk iff S is an interval in all trees.
• Tree intervals generalize intervals of permutations.
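A short sketch of the definition: a set is a tree interval when it has more than one vertex and induces a connected subgraph. The adjacency-dictionary representation and the names `is_tree_interval` and `path_tree` are assumptions made for this illustration. Viewing a permutation as a path shows how tree intervals generalize permutation intervals.

```python
from collections import deque


def is_tree_interval(adj, subset):
    """True iff |subset| > 1 and the induced subgraph T[subset] is connected."""
    s = set(subset)
    if len(s) < 2:
        return False
    start = next(iter(s))
    seen = {start}
    queue = deque([start])
    while queue:                      # BFS restricted to vertices of s
        u = queue.popleft()
        for v in adj[u]:
            if v in s and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == s


def path_tree(perm):
    """Path whose vertices appear in the order given by the permutation."""
    adj = {v: set() for v in perm}
    for a, b in zip(perm, perm[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


path = path_tree([1, 3, 5, 4, 2, 6, 7])
print(is_tree_interval(path, {5, 4, 2}))   # contiguous in the permutation
print(is_tree_interval(path, {1, 5}))      # vertex 3 is missing in between
```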
Miscellaneous
Example:
common intervals of T1, T2: { [2], [3], [4], [5] }
• (Common) Intervals in trees are induced subtrees.
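[Figure: tree T1 (vertices 4, 3, 2, 1 in a row, vertex 5 below) and tree T2 (vertices 5, 4, 1, 2 in a row, vertex 3 below)]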
Structure of Tree Intervals
• Tree intervals have the Helly property, i.e. for any family of tree intervals (Ti), i ∈ I, the assumption Tp ∩ Tq ≠ ∅ for every p, q ∈ I implies ⋂ Ti ≠ ∅ (intersection over all i ∈ I).
The Common Interval Graph
• Given T = (T1, …, Tk) and the corresponding common intervals CT, the common interval graph GT = (V, E) is the graph with
V = CT
E = {(c, d) | c, d ∈ CT, c ∩ d ≠ ∅, c ≠ d}
Example
• V = [n], T = (Pn, Sn-1), i.e. a path on n vertices and a star with n-1 leaves
• We have CT = { [2], [3], …, [n] }, GT = K(CT).
[Figure: the path 1, 2, 3, 4, the star, and the common interval graph GT on the vertices [2], [3], [4], …, [n]]
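The example can be reproduced for a small n by brute force over all vertex subsets. In the sketch below, the path 1-2-…-n and a star centred at vertex 1 are assumptions; the slide does not state which vertex is the centre. The function names are hypothetical.

```python
from itertools import combinations


def induces_connected(adj, s):
    """True iff the subgraph induced by the vertex set s is connected."""
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in s and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == s


def common_tree_intervals(trees, n):
    """All subsets of [n] with size > 1 that are connected in every tree."""
    found = []
    for size in range(2, n + 1):
        for combo in combinations(range(1, n + 1), size):
            s = frozenset(combo)
            if all(induces_connected(adj, s) for adj in trees):
                found.append(s)
    return found


def common_interval_graph(common):
    """Edges join distinct common intervals with non-empty intersection."""
    return {(c, d) for c, d in combinations(common, 2) if c & d}


n = 5
path = {v: {u for u in (v - 1, v + 1) if 1 <= u <= n} for v in range(1, n + 1)}
star = {1: set(range(2, n + 1)), **{v: {1} for v in range(2, n + 1)}}
ct = common_tree_intervals([path, star], n)
edges = common_interval_graph(ct)
print([sorted(c) for c in ct], len(edges))
```

Every common interval contains the centre, so all pairs intersect and the graph is complete.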
Common Interval Graphs cont’d
A graph is called chordal, if it does not contain an induced cycle Cn on n>3 vertices.
Proposition: Common interval graphs of trees are chordal graphs.
Irreducible Common Intervals
For a common interval c ∈ CT and a subset V ⊆ CT, we say that V generates c iff
i. d ⊊ c for each d ∈ V,
ii. c = ⋃ {d : d ∈ V},
iii. GT[V] is connected.
If there is no such V, then c is irreducible.
The irreducible intervals generate all common intervals.
[Figure: example on vertices 1, …, 7]
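One way to test the generation condition, sketched here as an illustration rather than the method of the slides: collect all common intervals strictly inside c, split them into connected components of the intersection graph, and check whether some component covers c. If any generating set exists, it lies inside one component, so this test is sufficient. Containment in condition i is read as proper containment, which is an assumption. The input family is hard-coded: it is the set of common intervals of the paths 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6.

```python
def is_irreducible(c, common):
    """True iff no connected family of proper sub-intervals of c has union c."""
    parts = [d for d in common if d < c]          # proper subsets only
    unvisited = set(range(len(parts)))
    while unvisited:
        stack = [unvisited.pop()]
        union = set()
        while stack:                              # explore one component
            i = stack.pop()
            union |= parts[i]
            linked = {j for j in unvisited if parts[i] & parts[j]}
            unvisited -= linked
            stack.extend(linked)
        if union == c:
            return False
    return True


common = [frozenset(s) for s in (
    {1, 3}, {4, 5}, {2, 4}, {6, 7}, {1, 3, 5}, {2, 4, 5},
    {1, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6, 7},
)]
irreducible = [c for c in common if is_irreducible(c, common)]
print([sorted(c) for c in irreducible])
```

Here the full vertex set is irreducible: its proper sub-intervals fall into two components with unions {1, …, 5} and {6, 7}, and neither covers all seven vertices.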
Finding Irreducible Intervals
• We have K < 2^(n-1) common intervals, and I < n irreducible intervals.
• Find all irreducible common intervals of k trees on n vertices: O(kn²) time & O(kn) space
Finding Irreducible Intervals
• Irreducible intervals are minimal common intervals containing an adjacent vertex pair.
[Figure: two drawings on the vertices x, y, l, z, m]
Graph Intervals
G=(V,E), undirected, connected graph, V=[n]
S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S.
[Figure: two vertex subsets of a graph on vertices 1, 2, 3, 4; one is convex, the other is not]
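Convexity can be tested with breadth-first distances: a vertex w lies on a shortest u-v path exactly when d(u, w) + d(w, v) = d(u, v). The sketch below uses that test. The 4-cycle is a hypothetical example graph, not necessarily the one drawn on the slide, and the function names are chosen for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_convex(adj, subset):
    """True iff every vertex on a shortest path between members lies in subset."""
    s = set(subset)
    dist = {v: bfs_distances(adj, v) for v in adj}
    for u, v in combinations(s, 2):
        for w in adj:
            if dist[u][w] + dist[w][v] == dist[u][v] and w not in s:
                return False
    # In a connected graph this closure also makes G[subset] connected.
    return True


cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}   # hypothetical 4-cycle
print(is_convex(cycle, {1, 2}), is_convex(cycle, {1, 2, 3}))
```

In the 4-cycle, {1, 2, 3} is connected but not convex, because the shortest path 1-4-3 leaves the set.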
Common Intervals of Graphs
Let G = (G1, …, Gk) be a family of connected undirected graphs with vertex set [n].
Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs.
• Graph intervals generalize tree intervals.
[Figure: two example graphs G0 and G1 on vertices 1, 2, 3, 4]
Some Differences
• The common convex hull of an adjacent vertex pair is NOT always irreducible.
[Figure: graphs G1 and G2 on vertices 1, 2, 3]
Finding Irreducible Graph Intervals
Sketch: Given G = (G0, G1, …, Gk-1)
For each edge (i,j) ∈ Ei* do
    S(i,j) := {i,j}
    For each pair (k,l) of vertices in S(i,j)
        Add vertices ‘between’ k and l to S(i,j)
Remove reducible intervals
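The closure step of the sketch can be illustrated as follows. Here ‘between’ is read as "lying on a shortest path in some graph of the family", which is an assumption, and the set is grown until nothing changes in any graph. The function `common_hull` and the two example paths are hypothetical. The final removal of reducible intervals is not shown.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def common_hull(graphs, seed):
    """Smallest superset of `seed` closed under shortest paths in every graph."""
    dists = [{v: bfs_distances(adj, v) for v in adj} for adj in graphs]
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for adj, dist in zip(graphs, dists):
            for u, v in combinations(sorted(s), 2):
                between = {w for w in adj
                           if dist[u][w] + dist[w][v] == dist[u][v]}
                if not between <= s:
                    s |= between          # add the vertices 'between' u and v
                    changed = True
    return frozenset(s)


g0 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}    # path 1-2-3-4
g1 = {1: {3}, 3: {1, 2}, 2: {3, 4}, 4: {2}}    # path 1-3-2-4
hulls = {edge: common_hull([g0, g1], edge) for edge in [(1, 2), (2, 3), (3, 4)]}
print({edge: sorted(h) for edge, h in hulls.items()})
```

Starting from the edges of g0, the edge (1, 2) grows to {1, 2, 3} because vertex 3 lies between 1 and 2 in g1, while (2, 3) is already closed in both graphs.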
Extreme Cases
Permutations (identical permutations):
• C ≤ n(n-1)/2, I < n
Trees (identical star-trees):
• C < 2^(n-1), I < n
Graphs (complete graphs):
• C < 2^n, I ≤ n(n-1)/2
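The bounds on C can be checked for a small n by counting subsets directly. This is an illustrative sketch under stated assumptions: the identical permutations are the identity, the star is centred at vertex 1, and graph intervals are required to have more than one vertex, as tree intervals are.

```python
from itertools import combinations


def subsets(n):
    """All subsets of [n] with at least two elements."""
    for size in range(2, n + 1):
        for combo in combinations(range(1, n + 1), size):
            yield set(combo)


n = 5
# Identity permutation: a set is an interval iff its elements are consecutive.
perm_count = sum(1 for s in subsets(n) if max(s) - min(s) + 1 == len(s))
# Star centred at 1: a set induces a connected subtree iff it contains the centre.
star_count = sum(1 for s in subsets(n) if 1 in s)
# Complete graph: all distances are 1, so every subset is convex.
complete_count = sum(1 for _ in subsets(n))

print(perm_count, n * (n - 1) // 2)
print(star_count, 2 ** (n - 1))
print(complete_count, 2 ** n)
```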
Example: InterDom
Database of protein domain interactions.
• Gene fusions
• Protein-protein interactions (DIP & BIND)
• Protein complexes (PDB)
Biologically Meaningful?
RAS family domain protein kinase
ankyrin repeat
PH domain
regulator of chromosome condensation