12. Lecture WS 2006/07
Bioinformatics III 1
V12 Transcriptional regulatory networks
RegulonDB: database with information on transcriptional regulation and operon
organization in E.coli; 105 regulators affecting 749 genes
7 regulatory proteins (CRP, FNR, IHF, FIS, ArcA, NarL and Lrp) are sufficient
to directly modulate the expression of more than half of all E.coli genes.
Out-going connectivity follows a power-law distribution; in-coming connectivity follows
an exponential distribution (Shen-Orr).
Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)
Frequency of co-regulation
Regulation by multiple TFs occurs in half of genes.
In most cases, a „global“ regulator (with > 10 interactions) works together with a
more specific local regulator.
Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)
Regulation of TFs and club co-regulation
However, in a process of decisions and
information flux, the number of controlled
or affected elements is not the only factor
to be considered.
A hierarchy of different levels of decision
is natural to our understanding of how
things get done.
In general, global regulators work
together with other global regulators.
The dynamics of decision-making is a
cooperative process in which different
subsets of the network are put into action
at certain moments.
Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)
Response to changes in environmental conditions
The second function of TFs is to sense changes in environmental conditions or
other internal signals encoding changes.
Martinez-Antonio, Collado-Vides, Curr Opin Microbiol 6, 482 (2003)
Global environmental growth conditions in which TFs regulate.
The number in brackets indicates how many additional TFs participate in the
same number of conditions.
Do we need to rely on experiments?
Determine homology between the domains and protein families
of TFs and regulated genes
and proteins of known 3D structure.
Determine uncharacterized E.coli proteins with
DNA-binding domains, thus identify large majority
of E.coli TFs.
Finding: 75% of all TFs are two-domain proteins.
Analysis of domain architecture shows that 75% of
the TFs have arisen by gene duplication.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Sarah Teichmann
MRC LMB Cambridge
Madan Babu,
PhD student at LMB
Flow chart of method to identify TFs in E.coli
SUPERFAMILY database (C. Chothia)
contains a library of HMM models based
on the sequences of proteins in SCOP for
predicted proteins of completely
sequenced genomes.
In addition to our set of 271 transcription
factors, there are eight transcription
factors without a DBD assignment that
have known regulatory information.
Remove all DNA-binding proteins involved
in replication/repair etc.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
3D structures of putative (and real) TFs in E.coli
The three-dimensional
structures of the 11 DBD
families seen in the 271
identified transcription
factors in E.coli. The
figure highlights the fact
that even though the
helix–turn–helix motif
occurs in all families
except the nucleic acid
binding family, the
scaffolds in which the
motif occurs are very
different.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Domain architectures of TFs
The 74 unique domain architectures of the 271
identified TFs. Each functional class is represented by
a different shape and each family within the functional
class is represented by a different colour.
The DBDs are represented as rectangles. The partner
domains are represented as hexagons (small
molecule-binding domain), triangles (enzyme
domains), circles (protein interaction domain),
diamonds (domains of unknown function) and the
receiver domain has a pentagonal shape.
The letters A, R, D and U denote activators,
repressors, dual regulators and TFs of unknown
function, and the number of TFs of each type is given
next to each domain architecture.
Architectures of known 3D structure are denoted by
asterisks, and ‘+’ are cases where the regulatory
function of a TF has been inferred by indirect methods,
so that the DNA-binding site is not known.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Evolution of TFs
10% 1-domain proteins
75% 2-domain proteins
12% 3-domain proteins
3% 4-domain proteins
TFs have evolved by extensive recombination of domains.
Proteins with the same sequential arrangement of domains are likely to be direct
duplicates of each other.
74 distinct domain architectures have duplicated to give rise to 271 TFs.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Organisation of transcriptional regulatory network
For 121 TFs, there is information on their regulated genes.
They can be divided into 10 general functional categories.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Regulatory cascades
The TF regulatory network in E.coli.
When more than one TF regulates a gene,
the order of their binding sites is as given in
the figure. An arrowhead is used to indicate
positive regulation when the position of the
binding site is known.
Horizontal bars indicate negative regulation
when the position of the binding site is
known. In cases where only the nature of
regulation is known, without binding site
information, + and – are used to indicate
positive and negative regulation.
The DBD families are indicated by circles of
different colours as given in the key. The
names of global regulators are in bold.
Babu, Teichmann, Nucl. Acid Res. 31, 1234 (2003)
Modelling genetic networks by Boolean networks
Methods to describe genetic networks:
(1) Boolean networks (today)
(2) clustering gene expression data
( Bioinformatics II lecture)
Clustering is a relatively easy way
to extract useful information out of
large-scale gene expression data sets.
However, it typically only tells us
which genes are co-regulated,
not what is regulating what.
Need to reverse engineer networks from their activity profiles!
JCell manual, U Tübingen
Intergenic interaction matrix M
Since the introduction of gene expression detection by microarrays,
a major problem has been the estimation of the intergenic interaction matrix M.
The matrix element mij of the interaction matrix M is
- positive if gene Gj activates gene Gi
- negative if gene Gj inhibits gene Gi
- equal to 0 if gene Gj and gene Gi have no interaction.
Gi = +1 if it is expressed, otherwise = 0.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
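The sign convention above can be made concrete with a minimal Python sketch. It assumes M is stored as a list of rows with M[i][j] = mij; the function name `interaction_edges` and the toy matrix are illustrative and not taken from the lecture. Each non-zero entry is listed as a signed edge from the regulator Gj to the target Gi.

```python
def interaction_edges(M):
    """Return (regulator j, target i, sign) for every non-zero m_ij."""
    edges = []
    for i, row in enumerate(M):
        for j, m in enumerate(row):
            if m > 0:
                edges.append((j, i, "+"))   # gene j activates gene i
            elif m < 0:
                edges.append((j, i, "-"))   # gene j inhibits gene i
    return edges


# Hypothetical 2-gene example: gene 1 inhibits gene 0, gene 0 activates gene 1.
M_toy = [[0, -1],
         [2, 0]]
print(interaction_edges(M_toy))
```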
simulating the dynamics of regulatory networks
Given the interaction matrix M, the change of state xi of gene Gi between t and t +1
obeys a threshold rule:
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
$$x_i(t+1) = H\Big(\sum_{k=1}^{n} m_{ik}\,x_k(t) - b_i\Big) \qquad \text{or} \qquad x(t+1) = H\big(M\,x(t) - b\big)$$
where H is the Heaviside function,
H(y) = 1 if y ≥ 0 and
H(y) = 0 if y < 0,
and the bi's are threshold values.
In the case of small regulatory genetic systems, the knowledge of such a matrix M
makes it possible to know all possible stationary behaviors of the organisms having
the corresponding genome.
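For a small system, the stationary behaviours mentioned above can be found by brute force. The following is a minimal sketch, assuming synchronous updates of all genes and the threshold rule with H(y) = 1 for y ≥ 0; the 3-gene matrix, the thresholds, and the function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def heaviside(y):
    """H(y) = 1 if y >= 0, else 0."""
    return 1 if y >= 0 else 0


def step(M, b, x):
    """One synchronous update: x_i(t+1) = H(sum_k m_ik * x_k(t) - b_i)."""
    return tuple(
        heaviside(sum(m * xk for m, xk in zip(row, x)) - bi)
        for row, bi in zip(M, b)
    )


def attractor(M, b, x0):
    """Iterate from x0 until a state repeats; return the states on the cycle."""
    seen, trajectory, x = {}, [], tuple(x0)
    while x not in seen:
        seen[x] = len(trajectory)
        trajectory.append(x)
        x = step(M, b, x)
    return trajectory[seen[x]:]


def fixed_points(M, b):
    """Enumerate all 2^n states and keep those mapped onto themselves."""
    return [x for x in product((0, 1), repeat=len(M)) if step(M, b, x) == x]


# Hypothetical toy network: genes 0 and 1 inhibit each other, gene 0 activates gene 2.
M = [[0, -1, 0],
     [-1, 0, 0],
     [1, 0, 0]]
b = [-0.5, -0.5, 0.5]

print(fixed_points(M, b))          # stationary states of the toy network
print(attractor(M, b, (0, 0, 0)))  # attractor reached from the all-off state
```

Enumerating all 2^n states is only feasible for small n, which is why the sketch is restricted to small regulatory systems.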
Example
Mendoza, Alvarez-Buylla, JCB, 1998
In the genetic regulatory network which
rules Arabidopsis thaliana flower
morphogenesis (right), the interaction
matrix is an (11,11) matrix with only 22
non-zero coefficients.
Below: A fixed configuration (attractor) of
its Boolean dynamics that is obtained
from propagating xi(t).
Interaction matrix - interaction graph
For each genetic regulatory network, we can define an interaction graph built from
the interaction matrix M by drawing an edge + (resp. -) between the vertices
representing the genes j and i, iff mij > 0 (resp. < 0).
To calculate the mij´s, we can either determine the s-directional correlation ρij(s)
between the state vector {xj(t – s)}t∈C of gene j at time t – s and the state vector
{xi(t)}t∈C of gene i at time t, with t varying over the cell cycle C of length K = | C |,
corresponding to the observation times of the bio-array images:
$$\rho_{ij}(s) = \frac{\sum_{t \in C} x_j(t-s)\,x_i(t) - \frac{1}{K}\sum_{t \in C} x_j(t-s)\,\sum_{t \in C} x_i(t)}{\sigma_j(s)\,\sigma_i(s)}$$
where
$$\sigma_j(s) = \left(\sum_{t \in C} x_j(t-s)^2 - \frac{1}{K}\Big(\sum_{t \in C} x_j(t-s)\Big)^2\right)^{1/2}$$
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
interaction matrix
and then take
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
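$$m_{ij} = \begin{cases} \operatorname{sign}\big(\rho_{ij}(s_m)\big) & \text{if } |\rho_{ij}(s_m)| = \sup_{s = 1,\dots,K-1} |\rho_{ij}(s)| > \eta \\ 0 & \text{if } |\rho_{ij}(s_m)| \le \eta \end{cases}$$
where η is a de-correlation threshold.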
Alternatively, one may identify the system with a Boolean neural network.
When it is impossible to obtain all the coefficients of M in this manner
(either from the literature or from such calculations),
it may be possible to complete M by applying a heuristic approach.
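The correlation-based estimate of a single coefficient mij can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the time index is cyclic over the cell cycle (so t – s wraps around), treats a series with zero variance as uncorrelated, and uses illustrative names (`rho`, `estimate_m`, `eta`).

```python
import math


def rho(xi, xj, s):
    """s-directional correlation between x_j(t - s) and x_i(t), t cyclic in C."""
    K = len(xi)
    shifted = [xj[(t - s) % K] for t in range(K)]          # x_j(t - s)
    sum_j, sum_i = sum(shifted), sum(xi)
    cov = sum(a * b for a, b in zip(shifted, xi)) - sum_j * sum_i / K
    var_j = sum(a * a for a in shifted) - sum_j ** 2 / K   # sigma_j(s)^2
    var_i = sum(a * a for a in xi) - sum_i ** 2 / K        # same for gene i (cyclic shift leaves it unchanged)
    if var_j <= 0 or var_i <= 0:
        return 0.0                                          # assumption: constant series carry no signal
    return cov / math.sqrt(var_j * var_i)


def estimate_m(xi, xj, eta):
    """Sign of the strongest correlation over s = 1..K-1, or 0 below threshold eta."""
    K = len(xi)
    best = max((rho(xi, xj, s) for s in range(1, K)), key=abs)
    if abs(best) > eta:
        return 1 if best > 0 else -1
    return 0


# Hypothetical Boolean time series over one cycle of length K = 8.
xj = [1, 0, 0, 0, 1, 1, 0, 0]
xi_act = [xj[(t - 1) % 8] for t in range(8)]   # follows gene j with delay 1
xi_inh = [1 - v for v in xi_act]               # opposite of gene j with delay 1

print(estimate_m(xi_act, xj, eta=0.8), estimate_m(xi_inh, xj, eta=0.8))
```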
estimation of interaction values
We may randomly choose the missing coefficients by considering
- the connectivity coefficient K(M) = I / N, the ratio between the number I of
interactions and the number N of genes, and
- the mean inhibition weight I(M) = R / I , the ratio between the number of inhibitions
R and I.
For many known operons and regulation networks, K(M) is between 1.5 and 3, and
I(M) between 1/3 and 2/3.
If M is structurally stable, then the random estimation of M can be used to obtain an
approximate estimation on the control mechanisms of the regulatory network.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
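The two ratios are easy to compute from a matrix, and they can also be used to draw a random matrix with prescribed values. The sketch below is an illustrative simplification: it draws a whole random matrix rather than completing only the missing coefficients of a partly known M, and the function names and the seed parameter are assumptions. For reference, an (11,11) matrix with 22 non-zero coefficients has K(M) = 22 / 11 = 2.

```python
import random


def connectivity(M):
    """K(M) = I / N: number of interactions per gene."""
    interactions = sum(1 for row in M for m in row if m != 0)
    return interactions / len(M)


def inhibition_weight(M):
    """I(M) = R / I: fraction of interactions that are inhibitions."""
    interactions = sum(1 for row in M for m in row if m != 0)
    inhibitions = sum(1 for row in M for m in row if m < 0)
    return inhibitions / interactions


def random_interaction_matrix(n, k, inhibition, seed=0):
    """Random n x n matrix with about k*n interactions, a fraction `inhibition` negative."""
    rng = random.Random(seed)
    n_edges = round(k * n)
    n_neg = round(inhibition * n_edges)
    cells = rng.sample([(i, j) for i in range(n) for j in range(n)], n_edges)
    M = [[0] * n for _ in range(n)]
    for idx, (i, j) in enumerate(cells):
        M[i][j] = -1 if idx < n_neg else 1
    return M


M_rand = random_interaction_matrix(10, k=2.0, inhibition=0.5, seed=1)
print(connectivity(M_rand), inhibition_weight(M_rand))
```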
Design principles of regulatory networks
Wiring diagrams of regulatory networks somewhat resemble electrical circuits.
Try to break down networks into basic building blocks.
Search for „network motifs“ as patterns of interconnections that recur in many
different parts of a network at frequencies much higher than those found in
randomized networks.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
Uri Alon
Weizmann Institute
Detection of motifs
Represent transcriptional network as a connectivity matrix M
such that Mij = 1 if operon j encodes a TF that transcriptionally regulates operon i
and Mij = 0 otherwise.
Scan all n × n submatrices of M generated by choosing n nodes that lie in a
connected graph, for n = 3 and n = 4.
Submatrices were enumerated efficiently by recursively searching for nonzero
elements.
Compute a P value for submatrices representing each type of connected subgraph
by comparing # of times they appear in real network vs. in random network.
For n = 3, the only significant motif is the feedforward loop.
For n = 4, only the overlapping regulation motif is significant.
SIMs and multi-input modules were identified by searching for identical rows of M.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
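The comparison between the real and randomized networks can be illustrated for the feedforward loop. The sketch below is not the enumeration procedure of the paper: it counts feedforward loops by checking all ordered node triples, randomizes the network by swapping edge targets (which keeps every in- and out-degree fixed but may create autoregulatory edges), and reports an empirical P value. The function names, the number of swaps, and the toy matrix are assumptions.

```python
import random
from itertools import permutations


def count_ffl(M):
    """Count feedforward loops X->Y, X->Z, Y->Z (M[i][j] = 1 means j regulates i)."""
    n = len(M)
    return sum(
        1
        for x, y, z in permutations(range(n), 3)
        if M[y][x] and M[z][x] and M[z][y]
    )


def randomize(M, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two random edges."""
    n = len(M)
    edge_set = {(j, i) for i in range(n) for j in range(n) if M[i][j]}
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        if (a, d) in edge_set or (c, b) in edge_set:
            continue                      # swap would duplicate an existing edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
    R = [[0] * n for _ in range(n)]
    for j, i in edge_set:
        R[i][j] = 1
    return R


def empirical_p(M, n_random=200, n_swaps=100, seed=0):
    """Fraction of randomized networks with at least as many FFLs as M."""
    rng = random.Random(seed)
    real = count_ffl(M)
    hits = sum(count_ffl(randomize(M, n_swaps, rng)) >= real for _ in range(n_random))
    return hits / n_random


# Hypothetical 5-operon network containing one feedforward loop (0 -> 1, 0 -> 2, 1 -> 2).
M5 = [[0, 0, 0, 0, 0],
      [1, 0, 0, 0, 0],
      [1, 1, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 1, 1, 0]]
print(count_ffl(M5), empirical_p(M5, n_random=50, n_swaps=20))
```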
DOR detection
Consider all operons regulated by ≥ 2 TFs.
Define (nonmetric) distance measure between operons k and j, based on the # of
TFs regulating both operons:
$$d(k,j) = \frac{1}{\left(1 + \sum_n f_n\, M_{k,n}\, M_{j,n}\right)^2}$$
where fn = 0.5 for global TFs and fn = 1 otherwise.
Cluster operons with average-linkage algorithm.
DORs correspond to clusters with more than 10 connections
with a ratio of connections to TFs > 2.
Shen-Orr et al. Nature Gen. 31, 64 (2002)
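The distance measure used for DOR detection can be written out directly. In this minimal sketch, M[k][n] = 1 if TF n regulates operon k; the function name `dor_distance`, the set `global_tfs`, and the toy matrix are illustrative assumptions. The resulting pairwise distances would then be passed to an average-linkage clustering routine, which is not shown here.

```python
def dor_distance(M, k, j, global_tfs):
    """d(k, j) = 1 / (1 + sum_n f_n * M[k][n] * M[j][n])^2, with f_n = 0.5 for global TFs."""
    shared = sum(
        (0.5 if n in global_tfs else 1.0) * M[k][n] * M[j][n]
        for n in range(len(M[k]))
    )
    return 1.0 / (1.0 + shared) ** 2


# Hypothetical example: 4 operons (rows), 3 TFs (columns), TF 0 is a global regulator.
M_ops = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 1],
         [1, 0, 1]]
global_tfs = {0}

print(dor_distance(M_ops, 0, 1, global_tfs))  # share one global and one local TF
print(dor_distance(M_ops, 0, 2, global_tfs))  # share no TF
```

Operons that share many regulators get a small distance, and shared global regulators count only half as much as shared local ones.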
Network motifs found in the E.coli transcriptional regulatory network
a, Feedforward loop: a TF X regulates a second TF Y, and both jointly regulate one or more operons Z1...Zn.
b, Example of a feedforward loop (L-arabinose utilization).
c, SIM (single-input module) motif: a single TF, X, regulates a set of operons Z1...Zn. X is usually autoregulatory. All regulations are of the same sign. No other transcription factor regulates the operons.
d, Example of a SIM system (arginine biosynthesis).
e, DOR (dense overlapping regulons) motif: a set of operons Z1...Zm are each regulated by a combination of a set of input transcription factors, X1...Xn. The DOR algorithm detects dense regions of connections, with a high ratio of connections to transcription factors.
f, Example of a DOR (stationary phase response).
Shen-Orr et al. Nature Gen. 31, 64 (2002)
Significance of motifs
Shen-Orr et al. Nature Gen. 31, 64 (2002)
Regulatory network
Shen-Orr et al. Nature Gen. 31, 64 (2002)
Each TF appears only in a single subgraph except for
global TFs that can appear in several subgraphs.
Analysis of complexome during cell cycle
Most research on biological networks has focused on static topological
properties, describing networks as collections of nodes and edges rather than as
dynamic structural entities.
This study focuses on the temporal aspects of networks, which allows us to
study the dynamics of protein complex assembly during the Saccharomyces
cerevisiae cell cycle.
The integrative approach combines protein-protein interactions with information on
the timing of the transcription of specific genes during the cell cycle, obtained from
the DNA microarray time series shown before:
a quality-controlled set of 600
periodically expressed genes,
each assigned to the point in the
cell cycle where its expression peaks.
Science 307, 724 (2005)
Ulrik Lichtenberg Peer Bork
Temporal protein interaction network in yeast cell cycle
Cell cycle proteins that are part
of complexes or other physical
interactions are shown within
the circle.
For the dynamic proteins, the
time of peak expression is
shown by the node color;
static proteins are represented
as white nodes.
Outside the circle, the dynamic
proteins without interactions
are positioned and colored
according to their peak time.
Science 307, 724 (2005)
Just-in-time synthesis vs. just-in-time-assembly
Transcription of cell cycle–regulated genes is generally thought to be turned on
when or just before their protein products are needed: often referred to as
just-in-time synthesis.
Contrary to the cell cycle in bacteria, however, just-in-time synthesis of entire
complexes is rarely observed in the network. The only large complex to be
synthesized in its entirety just in time is the nucleosome, all subunits of which are
expressed in S phase to produce nucleosomes during DNA replication.
Instead, the general design principle appears to be that only some subunits of
each complex are transcriptionally regulated in order to control the timing of
final assembly.
Science 307, 724 (2005)
Integrate transcriptional regulatory information and gene-expression data for
multiple conditions in Saccharomyces cerevisiae.
5 conditions:
- cell cycle
- sporulation
- diauxic shift
- DNA damage
- stress response
Something spectacular at the end
Luscombe, Babu, … Teichmann, Gerstein, Nature 431, 308 (2004)
Sarah Teichmann Mark Gerstein
SANDY: topological measures + network motifs
Luscombe et al. Nature 431, 308 (2004)
+ some post-analysis
Dynamic representation of the transcriptional regulatory network
a, Schematics and summary of properties for the endogenous and exogenous sub-networks.
b, Graphs of the static and condition-specific networks. Transcription factors and target genes are shown as nodes in the upper and lower sections of each graph respectively, and regulatory interactions are drawn as edges; they are coloured by the number of conditions in which they are active. Different conditions use distinct sections of the network.
c, Standard statistics (global topological measures and local network motifs) describing network structures. These vary between endogenous and exogenous conditions; those that are high compared with other conditions are shaded. (Note, the graph for the static state displays only sections that are active in at least one condition, but the table provides statistics for the entire network including inactive regions.)
Luscombe, Babu, … Teichmann, Gerstein, Nature 431, 308 (2004)
Luscombe et al. Nature 431, 308 (2004)
Interpretation
Half of the targets are uniquely expressed in only one condition; in contrast, most
TFs are used across multiple processes.
The active sub-networks maintain or rewire regulatory interactions; over half of
the active interactions are completely supplanted by new ones between conditions.
Only 66 interactions are retained across ≥ 4 conditions.
They are always „on“ and mostly regulate house-keeping functions.
The calculations divide the 5 condition-specific networks into 2 categories:
endogenous and exogenous.
Endogenous processes are multi-stage and operate with an internal transcriptional
program.
Exogenous processes are binary events that react to external stimuli with a
rapid turnover of expressed genes.
Figure 2 Newly derived 'follow-on' statistics for network structures.
a, TF hub usage in different cellular conditions. The cluster diagram shades cells by the normalized number of genes targeted by TF hubs in each condition. One cluster represents permanent hubs and the others condition-specific transient hubs. Genes are labelled with four-letter names when they have an obvious functional role in the condition, and seven-letter open reading frame names when there is no obvious role. Of the latter, gene names are red and italicised when functions are poorly characterized. Starred hubs show extreme interchange index values, I = 1.
b, Interaction interchange (I) of TF between conditions. A histogram of I for all active TFs shows a uni-modal distribution with two extremes. Pie charts show five example TFs with different proportions of interchanged interactions. We list the main functions of the distinct target genes regulated by each example transcription factor. Note how the TFs' regulatory functions change between conditions.
c, Overlap in TF usage between conditions. Venn diagrams show the numbers of individual TFs (large intersection) and pair-wise TF combinations (small intersection) that overlap between the two endogenous conditions.
Luscombe et al. Nature 431, 308 (2004)
Luscombe et al. Nature 431, 308 (2004)
Interpretation
Most hubs (78%) are transient = they are influential in one condition, but less
so in others.
Exogenous conditions have fewer transient hubs (different ).
„Transient hub“: capacity to change interactions between connections.
TF inter-regulation during the cell cycle time-course
a, The 70 TFs active in the cell cycle. The diagram shades each cell by the normalized number of genes targeted by each TF in a phase. Five clusters represent phase-specific TFs and one cluster is for ubiquitously active TFs. Both hub and non-hub TFs are included.
b, Serial inter-regulation between phase-specific TFs. Network diagrams show that TFs active in one phase regulate TFs in subsequent phases. In the late phases, TFs apparently regulate those in the next cycle.
c, Parallel inter-regulation between phase-specific and ubiquitous TFs in a two-tiered hierarchy. Serial and parallel inter-regulation operate in tandem to drive the cell cycle while balancing it with basic house-keeping processes.
Luscombe et al. Nature 431, 308 (2004)
Luscombe et al. Nature 431, 308 (2004)
Summary
Integrated analysis of transcriptional regulatory information and condition-specific
gene-expression data; post-analysis, e.g.
- identification of permanent and transient hubs
- interchange index
- overlap in TF usage across multiple conditions.
Large changes in underlying network architecture in response to diverse stimuli:
- TFs alter their interactions to varying degrees, thereby rewiring the network
- some TFs serve as permanent hubs, most act transiently
- environmental responses facilitate fast signal propagation
- cell cycle and sporulation proceed via multiple stages
Many of these concepts may also apply to other biological networks.
Luscombe et al. Nature 431, 308 (2004)
additional slides (not used)
Structural organization of transcriptional regulatory networks
Modules: regulatory networks are observed to be highly interconnected; very few
modules can be entirely separated from the rest of the network.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)
Evolution of the gene regulatory network
Larger genomes tend to have more TFs per gene.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)
Cross-organism comparison
Many TF families are specific to
individual phylogenetic groups or
greatly expanded in some genomes.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)
In contrast to the high level of conservation of other regulatory and signalling
systems across the crown group eukaryotes,
some of the TF families are dramatically different in the various lineages.
Regulatory interactions across organisms
Are regulatory interactions conserved among organisms? Apparently yes.
Orthologous TFs regulate orthologous target genes.
As expected, the conservation of genes and interactions is related to the
phylogenetic distance between organisms.
Above: Many interactions of (a) can be mapped to the pathogenic Pseudomonas
aeruginosa, which is related to E.coli (b).
Very few interactions can be mapped from (a) to (c).
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)
Regulatory interactions across organisms
Observation: there is no bias towards conservation of network motifs.
Regulatory interactions in motifs are lost or retained at the same rate as the other
interactions in the network.
The transcriptional network appears to evolve in a step-wise manner, with loss
and gain of individual interactions probably playing a greater role than loss and
gain of whole motifs or modules.
Observation: TFs are less conserved than target genes, which suggests that
regulation of genes evolves faster than the genes themselves.
Babu et al. Curr Opin Struct Biol. 14, 283 (2004)
Mathematical Aspects of the Inverse Problem
A network with two or more connected components, i.e. two or more sub-networks,
has as fixed configurations the combination (Cartesian product) of all fixed
configurations of each sub-network.
We say that the fixed configurations are factorizable.
Thus, the inverse problem consists of determining whether a set of fixed
configurations is factorizable.
In this way, we can obtain some information on the connectivity of the network.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
Factorization
Given S ⊆ {0,1}^n and a permutation function σ: {1,...,n} → {1,...,n},
we denote by σ(S), or simply S^σ, the set {s_σ(1) s_σ(2) ... s_σ(n) : s_1 s_2 ... s_n ∈ S}.
A set S ⊆ {0,1}^n is said to be factorizable if there exist sets of vectors
S_1 ⊆ {0,1}^j(1), S_2 ⊆ {0,1}^j(2), ..., S_k ⊆ {0,1}^j(k) and a permutation function
σ: {1,...,n} → {1,...,n} such that S can be written as S = (S_1 × S_2 × ... × S_k)^σ,
where the symbol „ד is the Cartesian product between sets.
If S is a factorizable set, then j(1) + j(2) + ... + j(k) = n.
The set defined by F = {S_1, S_2, ..., S_k} is called a factorization of S and each
S_j ∈ F a factor of S.
F is called a maximal factorization if no factor S_j ∈ F is factorizable.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
Examples
i) S = {0100, 0111, 1000, 1011} = {01, 10} × {00, 11}.
Here, the permutation function is the identity.
ii) S = {0010, 0111, 1000, 1101} =
({0100, 0111, 1000, 1011})^(2,3) = ({01, 10} × {00, 11})^(2,3),
where (2,3) is the function which permutes the second and third coordinates.
Given the sets I ⊆ {1, ..., n} and S ⊆ {0,1}^n, let P_I(S) be the projection set defined by
P_I(S) = {(s_j(1), s_j(2), ..., s_j(|I|)) : s ∈ S, j(k) ∈ I, k = 1, ..., | I |, and j(k) < j(l) for all k < l }.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
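The two examples and the projection can be checked with a short Python sketch. Vectors are represented as strings of '0' and '1', coordinates are 1-based as on the slide, and a permutation is given as the list of source coordinates for positions 1..n; the function names are illustrative.

```python
from itertools import product


def cartesian(*factors):
    """Cartesian product of sets of bit strings, concatenating the parts."""
    return {"".join(parts) for parts in product(*factors)}


def permute(S, perm):
    """Apply a permutation: position i of the result takes coordinate perm[i-1] (1-based)."""
    return {"".join(s[p - 1] for p in perm) for s in S}


def project(S, I):
    """Projection P_I(S): keep only the coordinates in I, in increasing order."""
    idx = sorted(I)
    return {"".join(s[i - 1] for i in idx) for s in S}


S1 = cartesian({"01", "10"}, {"00", "11"})   # example i), identity permutation
S2 = permute(S1, [1, 3, 2, 4])               # example ii), coordinates 2 and 3 swapped

print(sorted(S1))
print(sorted(S2))
print(sorted(project(S1, {1, 2})), sorted(project(S1, {3, 4})))
```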
Proposition 2
If a set S ⊆ {0,1}^n is factorizable, then the maximal factorization of S is unique.
Proof
Let F = {S_1, S_2, ..., S_k} and G = {T_1, T_2, ..., T_m} be two distinct maximal factorizations of S:
S = (S_1 × S_2 × ... × S_k)^σ1 = (T_1 × T_2 × ... × T_m)^σ2
Hence, the permutation σ = (σ1)^-1 ○ σ2 is such that
S_1 × S_2 × ... × S_k = (T_1 × T_2 × ... × T_m)^σ
Since F and G are distinct maximal factorizations, there is a factor of F not included in G,
which is supposed to be S_1 ⊆ {0,1}^q, q ∈ {1, ..., n}.
Let T = T_1 × T_2 × ... × T_m, so S_1 = P_{1,...,q}(T^σ).
Hence, if we denote by I(k) ⊆ {1, ..., n} the set of indices such that
P_I(k)(T) = T_k, for every k = 1, ..., m, and by J = { j ∈ {1,...,m} : I(j) ∩ {σ(1),...,σ(q)} ≠ ∅ },
then there exists a permutation function σ' such that
$$S_1 = \Big(P_{I(j_1) \cap \{\sigma(1),\dots,\sigma(q)\}}(T) \times \dots \times P_{I(j_s) \cap \{\sigma(1),\dots,\sigma(q)\}}(T)\Big)^{\sigma'}, \quad \text{where } J = \{j_1,\dots,j_s\}$$
Therefore, S_1 is factorizable, a contradiction.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
Algorithm
Let Δ: {0,1}^n × {0,1}^n → P({1,...,n}) be the function called the difference function,
where P({1,...,n}) is the set of subsets of {1,...,n}, defined by
Δ(x,y) = {i : x_i ≠ y_i}, where x,y ∈ {0,1}^n.
Given S ⊆ {0,1}^n, the idea of the Factorization algorithm is first to construct a matrix
with all the values of Δ(x,y) for every x,y ∈ S.
Next, for each row i of the matrix we construct a finite and undirected graph
Gi = (Vi,Ei), where the set of nodes Vi is equal to the set {1,...,n} and the set of arcs Ei
is determined by the values of each row of the matrix, according to the algorithm.
Finally, the connected components of the union of all graphs Gi determine the factors
of the maximal factorization of S.
In the case that S is not factorizable, the output of the algorithm will be a graph with a
unique connected component.
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
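The graph-based algorithm itself is not reproduced in the text, so the following sketch uses a different, brute-force technique to find the same maximal factorization. It relies on the observation that a split of the coordinates into two blocks is a factorization exactly when the product of the sizes of the two projections equals the size of the set, and it splits recursively. The running time is exponential in n, so this is only meant for small examples; the function names are assumptions.

```python
from itertools import combinations


def project(S, idx):
    """Projection of S onto the 1-based coordinates in idx (kept in the given order)."""
    return {"".join(s[i - 1] for i in idx) for s in S}


def factor_blocks(S, idx=None):
    """Return the coordinate blocks of the maximal factorization of S (brute force)."""
    S = set(S)
    if idx is None:
        idx = tuple(range(1, len(next(iter(S))) + 1))
    size = len(project(S, idx))
    first, rest = idx[0], idx[1:]
    # Try every proper block containing the first coordinate, smallest blocks first.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            block = (first,) + extra
            other = tuple(i for i in idx if i not in block)
            if len(project(S, block)) * len(project(S, other)) == size:
                return factor_blocks(S, block) + factor_blocks(S, other)
    return [idx]   # no valid split: this block is not factorizable


print(factor_blocks({"0100", "0111", "1000", "1011"}))
print(factor_blocks({"00", "11"}))
```

A set that is not factorizable is returned as a single block containing all coordinates, which mirrors the single connected component produced by the algorithm described above.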
Algorithm
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
Theorem 3
Given a set S ⊆ {0,1}^n, if I = { I(1), I(2), ..., I(k) } is the output of the Factorization
algorithm with input S,
then F = { P_I(l)(S) : l = 1, ..., k } is the maximal factorization of S
and the complexity of the algorithm is O(|S|^3 + n^2).
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
Example 2
Let S = { x1 = 000, x2 = 001, x3 = 100, x4 = 010, x5 = 011, x6 = 110}.
The difference matrix is
and the partial graphs and the
output graph of the algorithm are:
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
The output is I(1) = {1,3} and I(2) = {2}.
The maximal factorization of S is given
by
S = (P_I(1)(S) × P_I(2)(S))^(2,3)
= ({00,01,10} × {0,1})^(2,3)
where (2,3) is the permutation of the
second and third coordinates.
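The difference matrix of this example can be recomputed with a few lines of Python. The sketch assumes the difference function returns the set of 1-based coordinates at which two vectors differ, as defined for the Factorization algorithm; the function names are illustrative.

```python
def difference(x, y):
    """Set of 1-based coordinates at which the bit strings x and y differ."""
    return {i + 1 for i, (a, b) in enumerate(zip(x, y)) if a != b}


def difference_matrix(S):
    """Matrix of difference sets for every ordered pair of vectors in the list S."""
    return [[difference(x, y) for y in S] for x in S]


S_ex2 = ["000", "001", "100", "010", "011", "110"]   # x1 ... x6 of Example 2
D = difference_matrix(S_ex2)
for row in D:
    print([sorted(cell) for cell in row])
```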
Example 3
Aracena & Demongeot, Acta Biotheoretica 52, 391 (2004)
The following set of vectors corresponds to the observed fixed points of the A.thaliana
regulatory network, considering only genes whose activity is not constant.
Let S = { x1 = 0010000, x2 = 0011011, x3 = 0000100, x4 = 0001111, x5 = 1100000, x6 = 1101011}.
The difference matrix and the graph G of the algorithm are shown in the slide figure; the
connected components of G are
I(1) = {1,2,3,5} and I(2) = {4,6,7}.
The maximal factorization of S is given
by
S = (P_I(1)(S) × P_I(2)(S))^(4,5)
= ({0010,0001,1100} × {000,111})^(4,5)