Upload
shaine-villarreal
View
27
Download
1
Embed Size (px)
DESCRIPTION
Homomorphism Mapping in Metabolic Pathways. Qiong Cheng, Robert Harrison , Alexander Zelikovsky Computer Science in Georgia State University. Oct. 17 2007 IEEE 7 th International Conference on BioInformatics & BioEngineering. Outline. Comparison of metabolic pathways - PowerPoint PPT Presentation
Citation preview
Qiong Cheng, Robert Harrison, Alexander ZelikovskyComputer Science in Georgia State University
Oct. 17 2007 IEEE 7th International Conference on BioInformatics & BioEngineering
Homomorphism Mapping in Metabolic Pathways
Outline Comparison of metabolic pathways Graph mappings: embeddings & homomorphisms Min cost homomorphism problem
Enzyme similarity Optimal DP algorithm for trees Searching metabolic networks for
pathway motifs pathway holes
Future work
Metabolic pathway & pathways model
Metabolic pathways model
A portion of pentose phosphate pathway
1.1.1.49
1.1.1.34
2.7.1.13
3.1.1.31 1.1.1.44
Metabolic pathway
Comparison of metabolic pathways
Enzyme Similarity
Pathway topology Similarity
Enzyme similarity and pathway topology together represent the similarity of pathway functionality.
Mismatch/Substitute
match
match
match
Related work
Linear topology
Tree topology
DCBA
D’X’
A’
(Forst & Schulten[1999], Chen & Hofestaedt[2004];)
D
CB
A
X’
B’
A’
(Pinter [2005] VGVTlogVGVGVTlogVT)
Arbitrary topology
Mapping : Linear pattern Graph (Kelly et al 2004) ( VTVG)Exhaustively search(Sharan et al 2005 ( VTVG) Yang et al 2007 ( VGVG)
Enzyme mapping cost
EC (Enzyme Commission) notation
Enzyme similarity score Δ
Measure Δ by tight reaction property Enzyme X = x1 . x2 . x3 . x4Enzyme Y = y1 . y2 . y3 . y4
= = = = Δ[X, Y ] = 1= = = Δ[X, Y ] = 10
Δ[X, Y ] = +∞
Measure Δ by the lowest common upper class distribution
Enzyme D = d1 . d2 . d3 . d4
Δ[X, Y ] = log2c(X, Y )
otherwise
Graph mappings: embeddings & homomorphisms
Isomorphism
T G
f
Homomorphism
Isomorphic embedding Homeomorphic embedding
Homomorphism f : T G: fv : VT VG
fe : ET paths of GEdge-to-path cost : (|fe(e)|-1)
We allow different enzymes to be mapped to the same enzyme.
Homomorphism coste in ET
(|fe(e)|-1)+
Δ(v, fv(v))v in T=
Min cost homomorphism problem
A multi-source tree is a directed graph, whose underlying undirected graph is a tree.
ignoring direction
Given a multisource tree T =<VT, ET> (Pattern) and an arbitrary graph G =<VG, EG> (Text), find min cost homomorphism f : T G
Preprocessing of text graph
Transitive closure of G is graph G*=(V, E*), where E*={(i,j): there is i-j-path in G}
Text G
A B
C D
E
FTransitive Closure of G : G*
A B
C
D
EF
1
1 1
1
1
1
2
23
2
2
3
Transitive closure
Pattern graph ordering
Pattern T
a b
c
d
Ordering
c bd a
Construct ordered pattern T’• DFS traversal • Processing order in opposite way
• Each edge ei in T’ is the unique edge connecting vi with the previous vertices in the order
Ordered pattern T’
DP table
u1 … uj … u|VG|
a
b
c
d
Text arbitrary order
DT[a, uj]
min cost homomorphism mapping from T’s subgraph induced by
previous vertices in the order into G*
Pattern T
a b
c
d
Filling DP table
is penalty for gaps
Δ(vi , uj) if vi is a leaf in T
Δ(vi, uj) + ∑l=1 to adj(vi)Minj’=1 to |VG|C(il, j’) if vi is a leaf in T
=DT[i, j]i<|VT|j<|VG|
Recursive function
h(j, jl) = #(hops between uj and ujl in G)
C[il, jl] = DT[il, jl] + (h(j, jl) - 1)
vil
vi
Pattern T G*
uj’
uj
h(j, j’)
Runtime Analysis
Transitive closure takes O(|VG||EG|).
The total runtime is O(|VG||EG|+|VG*||VT|).
Pattern graph ordering takes O(|VT| + |ET|) Dynamic programming
- Calculate min contribution of all child pairs of node pair (vi∈T,uj∈G) takes tij = degT (vi)degG*(uj)
- Filling DT takes
j=1 to |VG| i=1 to |VT|tij = j=1 to |VG| degG*(uj)i=1 to |VT|degT(vi)
= 2|EG*||ET|
Statistical significance
Randomized P-Value computation
Random degree-conserved graph generation: Reshuffle nodes a b
c d
a b
c d
Reshuffle edgesReshuffle edge
Experiments & applications
Identifying conserved pathways 24 pathways that are conserved across all 4
species 18 more pathways that are conserved
across at least three of these species
Resolving ambiguity
Discovering pathways holes
All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E.coli
Observation example 1 : Resolving Ambiguity
2.6.1.1
2.6.1.11.2.4.-1.2.4.2
2.3.1.-2.3.1.61
6.2.1.5
6.2.1.5
1.3.99.1
1.3.99.1
4.2.1.2
4.2.1.2
1.
1.1.1.82
1.1.1.82
Mapping of glutamate degradation VII pathways from B. subtilis to T. thermophilus (p < 0.01).
Future work
Approximation algorithm to handle with the comparison of general graphs
Mining protein interaction network
A web-oriented tool will be developed
Discovery of critical elements or modules based on graph comparison
Discovery of evolution relation of organisms by pathway comparison of different organisms at different time points
Integration with genome database