Introduction to genetic network models
Roberto Serra
Centro Ricerche Ambientali Montecatini
why networks
basics of gene regulation
generic properties
self-organizing dynamical systems
the Kauffman model
continuous models
topology of complex networks
small world networks
linear cause-effect chain
unlimited growth
tree structure (no feedback)
feedback loops: modular circuits
web of interacting circuits
not only genes
[Figure: genes, proteins and chemicals (some supplied from outside), linked by synthesis, regulation, activation and catalysis]
control points
[Figure: control points along the pathway DNA -> primary RNA transcript -> mRNA -> mRNA-ribosome -> protein: transcriptional control, RNA processing, mRNA transport, mRNA degradation, translational control, protein activity control]
cis- and trans-acting control
control mechanisms
the most effective regulation acts at the transcription level
RNAp binds to a DNA region upstream of the coding region, the promoter
regulatory proteins can “recognize” certain sequences and bind to them
the interactions between proteins and segments of the DNA chain are highly specific
proteins recognize specific sequences of bases without the need for opening the DNA double helix
in eucaryotes the DNA molecule must be unbound in order for the regulatory proteins to operate
product inhibition
[Figure: bound RNAp; inactive repressor; repressor activated by tryptophan]
catabolite induced activation
[Figure: inactive CAP; CAP activated by cAMP]
collective regulatory mechanisms
groups of genes may be activated or inactivated simultaneously
sigma factors in bacteria
transcription factors in eucaryotes
these mechanisms introduce correlations among the expression patterns of different genes
certain kinds of packaging of genes in eucaryotes (e.g. heterochromatin) make genes in that region inaccessible to RNAp
modelling level
the choice of the modelling level is a crucial step
while there are detailed models of the protein synthesis process, in order to understand network properties it is advisable to use a simplified view of the synthesis
activation level of a given gene = either the concentration of the corresponding mRNA or the concentration of the corresponding protein
concentrations can be expressed either as continuous or as discrete variables
the latter when there are, say, only a few molecules per cell
a boolean approximation may often be appropriately employed
our “standard” choice:
activation = concentration of the corresponding protein
activation = continuous or boolean
asking specific questions
modelling specific control circuits
which genes, chemicals etc. directly affect the expression of my-gene? which affect it indirectly?
which are the control regions?
which interactions are there among the control molecules, and what is the logic of the control?
these are “classical” problems in biological research on genetic control
they provide detailed, specific information about specific circuits
which serves as a guide to guess the general principles of network “design”
a complementary approach
trying to understand the properties of large networks
if we knew all the details, we could write down the exact model of the overall network
but this is impossible so far
looking at general properties of networks “of the kind” which is present in cells
general properties means global structural features, types of possible dynamical behaviours, etc.
this analysis has very strong implications for the theory of biological evolution
the search for generic properties may also provide hints for the analysis of specific circuits
which questions to ask
which features to expect
generic properties of genetic networks
the strategy: analyze ensembles of networks
the ensemble is composed of networks which share some overall features (constraints)
nonconstrained features vary at random in the ensemble
characterize the statistical distribution
analyze the generic features
ensembles of networks
a technique from statistical physics
example: the Hopfield model of boolean neural networks
stored patterns are “memorized” in a set of weights W
wij: weight connecting nodes i and j
every set of stored patterns gives rise to a set of W values
to analyze the generic properties of these networks
suppose that the stored patterns are random
characterize the properties of W
analyze the interesting features, like storage capacity, crosstalk among patterns, etc.
ensembles of random networks (k=2)
generic questions
which kinds of dynamic behaviour can we expect in a certain type of network?
fixed points, limit cycles, strange attractors?
islands of activation spreading through the network?
how sensitive are these asymptotic states to perturbations?
either in inputs or in the network structure
what kind of topology shall we expect in genetic networks?
how does the information flow from one point to the rest of the network?
how far, how fast
reduced description
the activation of a gene depends upon proteins and chemicals
let us suppose that
the synthesis of regulatory proteins is “fast” with respect to the time constants of the regulatory processes
regulatory proteins decay with a time constant which is fast with respect to the time constants of the regulatory processes
the concentrations of regulatory chemicals are constant
then we may express the activation at time t+Δt as a function of the activations at time t
only one kind of variable is sufficient!
this holds true under both interpretations of “activation”: concentration of mRNA, concentration of protein
the important point is the loss of memory within Δt
activations only
Kauffman model
a generic model, meant to capture the features of large webs of interconnected genes
genes’ activations are boolean (1 or 0)
genes change their state at fixed time steps t, t+1, t+2 …
each gene activation at time t+1 is determined by the activation of a fixed set of input genes at time t
external chemicals are not explicitly taken into account
updating is synchronous
examples: C’ = A and B; C’ = A or B; C’ = A xor B
def: canalyzing functions are those boolean functions where there is at least one value of one of the inputs which uniquely determines the output, irrespective of the others
examples of canalyzing functions: or, and
examples of noncanalyzing functions: xor, parity
[Figure: genes A and B both feed gene C; C(t+1) depends upon A(t) and B(t)]
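The definition of a canalyzing function can be checked mechanically. The following is a minimal sketch, assuming a boolean function is represented as a Python callable on a tuple of bits; the names `is_canalyzing`, `AND`, `OR` and `XOR` are illustrative and do not come from the slides.

```python
from itertools import product

def is_canalyzing(f, k):
    """Return True if some value of some single input fixes the output of f.

    f maps a tuple of k bits to 0 or 1.
    """
    rows = list(product((0, 1), repeat=k))
    for i in range(k):                # candidate canalyzing input
        for v in (0, 1):              # candidate canalyzing value
            outputs = {f(x) for x in rows if x[i] == v}
            if len(outputs) == 1:     # same output whatever the other inputs are
                return True
    return False

AND = lambda x: x[0] & x[1]
OR = lambda x: x[0] | x[1]
XOR = lambda x: x[0] ^ x[1]

for name, f in (("and", AND), ("or", OR), ("xor", XOR)):
    print(name, is_canalyzing(f, 2))
```

Note that, under this literal reading of the definition, constant functions also count as canalyzing.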
the Kauffman model is a dynamical system
at time 0, an activation value is given to each gene
at each time step t=1, 2 ..., each gene takes an activation value xi(t) determined according to the previous laws
the global state of the system X = [x1, x2 ... xN] is the ordered set of activation values
X(t) determines X(t+1)
as time passes the system moves from state X(t) to X(t+1), X(t+2), etc., following a trajectory in an N-dimensional state space
allowed states are located on the corners of the unit hypercube
the state space
[Figure: the unit cube for N=3, with axes x, y, z and the eight states 000, 001, 010, 011, 100, 101, 110, 111 at its corners]
definitions
attractor: a set of states which is either approached in the limit t -> ∞, or is reached in a finite time and no longer abandoned by a dynamical system
random boolean networks with a finite number of nodes have a finite number of states, so the attractor is reached in finite time
attractors may be fixed points, cycles, or strange attractors
(not allowed in finite boolean systems)
the set of initial conditions which evolve towards a given attractor is its basin of attraction
attractors determine the key features of dynamical systems
after transients have died out
qualitative analysis of dynamical systems concentrates on attractors and their basins, the so-called “phase portrait”
basin of attraction
asymptotic dynamics of RBN
the state transition rule is such that X(t) determines X(t+1)
since the system has $2^N$ different states, it comes back to a previously visited state within a “Poincaré time” of at most $2^N$ time steps
therefore, after a transient of at most $2^N$ time steps, the system enters a cycle
all the system attractors are cycles; a particular case is that of fixed points, i.e. cycles of length = 1
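Because the update is deterministic and the state space is finite, an attractor can be found simply by iterating until a state repeats. Below is a small sketch of that idea; the three-gene update rule in `step` is a made-up example, not a network from the slides.

```python
def find_attractor(step, state):
    """Iterate a deterministic map until a state repeats.

    Returns (transient_length, cycle_length).
    """
    seen = {}                     # state -> time of first visit
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    return seen[state], t - seen[state]

# Hypothetical 3-gene network: A' = B and C, B' = A or C, C' = not A
def step(x):
    a, b, c = x
    return (b & c, a | c, 1 - a)

print(find_attractor(step, (1, 1, 1)))   # -> (0, 5)
print(find_attractor(step, (0, 0, 0)))   # -> (1, 5)
```

In this toy network the state 111 lies on a cycle of length 5, while 000 falls onto the same cycle after one step; a fixed point would show up as a cycle of length 1.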
ensemble properties
there are N genes
each node is influenced directly by k other genes
as we are looking for generic properties, for each node, the k input genes are chosen at random
for each node, the boolean function is chosen at random among the set of $2^{2^k}$ possible functions of k inputs (or among a subset)
input output
0000 1 0 0
0001 0 0 1
0010 1 0 1
0011 1 1 0
0100 0 1 0
0101 0 1 0
0110 0 1 0
0111 1 0 1
1000 1 1 0
1001 0 0 1
1010 1 1 1
1011 1 0 1
1100 0 0 0
1101 0 0 1
1110 0 1 0
1111 1 1 0
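The ensemble construction (random inputs, random truth tables, synchronous update) can be sketched in a few lines. This is an illustrative sketch only: the function names, the seed and the sizes n=12, k=2 are assumptions made for the example, and self-inputs are excluded because each node is said to be influenced by k other genes.

```python
import random

def random_network(n, k, rng):
    """Draw one network: for each node, k distinct input nodes and a random truth table."""
    inputs = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: every node reads its inputs at time t."""
    new_state = []
    for ins, table in zip(inputs, tables):
        index = 0
        for j in ins:                      # input bits read as a binary number
            index = 2 * index + state[j]
        new_state.append(table[index])
    return tuple(new_state)

def cycle_length(state, inputs, tables):
    """Iterate until a state repeats and return the length of the cycle reached."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return t - seen[state]

rng = random.Random(1)                     # seed chosen arbitrarily
n, k = 12, 2
print("boolean functions of k inputs:", 2 ** (2 ** k))
inputs, tables = random_network(n, k, rng)
start = tuple(rng.randint(0, 1) for _ in range(n))
print("cycle length from a random start:", cycle_length(start, inputs, tables))
```

Repeating the last three lines over many networks and many initial states is the kind of extensive simulation from which ensemble statistics are collected.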
studying the ensemble of networks
each network has its own dynamics
dynamical analysis relies upon extensive simulations, starting from random initial conditions
dynamical analysis is performed by varying connections and rules
the main features of the model (qualitative analysis), attractors and basins, are ruled by the degree of connectivity k
high connectivity
if k=N-1, the state at time t+1 is completely uncorrelated with the state at time t
the input to each node is the vector of values of all the other nodes
the output associated to each input set is random
therefore there is no correlation between outputs corresponding to two inputs which differ even by a single bit
there are relatively few cycles with respect to the total number of states
cycles are long (their period grows as $2^{bN}$)
systems are fragile with respect to small changes in initial conditions
nearby initial states go to different attractors
the boundaries of the basins of attraction are highly irregular
analogous to “chaotic behaviour” in continuous dynamical systems
fragility (sensitive dependence on initial conditions)
initial state 111111 -> cycle A
initial state 111110 -> cycle B
almost always, B ≠ A
low connectivity
if k=2, the number of cycles scales as $N^{1/2}$
cycle length grows as $N^{1/2}$
basins are regular: systems starting from two nearby initial states usually evolve to the same attractor
the behaviour is much more regular and ordered than in the k=N-1 case
a phase transition occurs at some k value
regular basins
connected clusters (high k, interaction with neighbours)
[Figure: oscillating genes and constant genes]
connected clusters (low k, interaction with neighbours)
[Figure: oscillating genes and constant genes]
phase transition
the networks display a phase transition
as the value of k is lowered, the transition takes place when the cluster of non-oscillating genes percolates through the network
the boundary between ordered and disordered regimes can be found at different k values, if the set of boolean functions is restricted somehow
e.g. by limiting to canalyzing functions, i.e. those where at least one of the inputs has one value which forces the output to take a specific value
order for free
scaling laws in the self-organized regime
number of cycles ~ $N^b$ (1/2 < b < 1)
length of cycles ~ $N^b$
the model is consistent with experimental observations over many different phyla
number of cellular types <-> number of different cycles
cell life <-> length of cycles
selection builds upon the network self-organizing properties
the selective advantages of “the edge of chaos”?
warning
the Kauffman model is a highly idealized representation of real genetic / metabolic nets which is based upon several approximations
no chemicals
proteins are fast with respect to the time step
synchronous activation may introduce “spurious cycles” in boolean dynamical systems (cfr. Hopfield nets)
fully random topology, constant k
but
the Kauffman model allows us to address issues which would otherwise be missed, and to develop an appropriate language in which we can frame some key questions
the very existence of self-organizing dynamics in nonlinear genetic networks
the importance of attractors in determining the properties of gene nets
robustness and basins of attraction
the importance of the average degree of connectivity
it also allows us to examine in a new way the interplay between selection and self-organization
the importance of studying ensembles of networks to gain information about their generic properties
continuous or boolean
the intermediate values of gene expression may be due to
intermediate values of the concentration of stimulating factors
time-dependent phenomena (transients, cycles)
the boolean approximation allows one to better elucidate the logic of control, but must be exercised with care
the boolean dynamics may be different from the continuous one
gene activation vs. concentration of activator
[Figure: four panels showing gene activation as a function of activator concentration (0 to 3): linear, sigmoid, clipped, boolean]
constant activation input
t<0: A=0
continuous, t>0: $dA/dt = s - kA$, so $A(t) = \frac{s}{k}\left(1 - e^{-kt}\right)$
boolean: A=0 for $t < (\ln 2)/k$; A=1 for $t > (\ln 2)/k$
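The switching time (ln 2)/k is the time at which the continuous solution reaches half of its asymptotic value s/k, which can be verified numerically. A short check, with s and k set to arbitrary illustrative values that are not taken from the slides:

```python
import math

def continuous_activation(t, s, k):
    """Solution of dA/dt = s - k*A with A(0) = 0."""
    return (s / k) * (1.0 - math.exp(-k * t))

def boolean_activation(t, k):
    """Boolean counterpart: off before t = ln(2)/k, on afterwards."""
    return 1 if t > math.log(2) / k else 0

s, k = 1.0, 1.0                              # illustrative values only
t_switch = math.log(2) / k
half_max = 0.5 * s / k
print(abs(continuous_activation(t_switch, s, k) - half_max) < 1e-12)  # True
print(boolean_activation(0.5 * t_switch, k), boolean_activation(2 * t_switch, k))
```

The boolean description therefore replaces the smooth rise by a step placed at the half-maximum crossing.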
[Figure: activation A versus time t for the continuous model and the boolean model]
generalizing Kauffman
it would then be desirable to have a model where
activations can take continuous values
the “logic of control” is explicit and flexible, as in Kauffman
there is an embarras de richesse in model development
we require that the models are true generalizations of the Kauffman RBN
they lead to the same dynamics if the initial activations are boolean
continuous model (Serra & Villani)
let Δt be larger than the time required for protein synthesis and degradation (as in Kauffman)
ai = activation of gene i (normalized to [0,1]), i.e. concentration of the corresponding product
$a = [a_1, a_2, \dots, a_N]$
$a_i(t+1) = \sigma[\mathrm{contr}_i(a(t))]$
where
$\forall x: \sigma(x) \ge 0$
$\forall x \le 0: \sigma(x) = 0$
$\forall x > 0, \varepsilon > 0: \sigma(x+\varepsilon) \ge \sigma(x)$
for simplicity: $\lim_{x \to +\infty} \sigma(x) = 1$
for the time being, chemicals are not explicitly considered, as in Kauffman
filtering functions
[Figure: filters; activation as a function of contr for the logistic filter and the piecewise linear filter]
summing over the paths
let us focus upon the interactions among genes, mediated of course by their synthesis products
i.e. consider $a_i(t+1) = \sigma(\mathrm{contr}_i(a(t)))$
the “digital logic” of the genetic switch must be translated into a continuous rule
the transition rule for the activation must take into account contributions from all the combinations of [0,1] values of its inputs
for example, if the rule is an OR, it may receive positive contributions from the combinations of input values (11), (10), (01) (which tend to turn it on) and negative from (00) (which tends to turn it off)
evolution laws
a “set of input values” (input set) to gene i, $Y_i = \{y_{i1}, y_{i2}\}$, is defined as a given combination of boolean values of its input genes (in our case, 11, 10, 01 or 00)
generalization to K inputs is trivial
the truth table assigns a boolean value (the activation of the gene at the next time step) to each input set
we must define a weight for each input set, and a rule to combine the weights of the different input sets
Q1i set of the input paths to gene “i” which correspond to an updated value “1”
Q0i set of the input paths to gene “i” which correspond to an updated value “0”
weighting an input set (model A)
the weight should be computed from the activations of the two input genes, i.e. from $a_1$ and $a_2$
the contribution of the input set (11) may be estimated to be limited by the gene with the smallest activation
$\varphi(11) = \min(a_1, a_2)$
the contribution of the input set (00) may be estimated to be limited by the gene with the highest activation
$\varphi(00) = 1 - \max(a_1, a_2) = \min(1-a_1, 1-a_2)$
the contributions of (10) and (01) are
$\varphi(10) = \min(a_1, 1-a_2)$
$\varphi(01) = \min(1-a_1, a_2)$
the equations of model A
if $y_{ij}=1$, $\varphi'(y_{ij}) = a_j$
if $y_{ij}=0$, $\varphi'(y_{ij}) = 1-a_j$
the contribution of the whole input set is $\varphi(Y_i) = \min_j \{\varphi'(y_{ij})\}$
the contribution to the activation at time t+1 is the weighted sum of those contributions which turn the gene on minus those which turn it off
$\mathrm{contr}_i(a(t)) = \sum_{Y_i \in Q_{1i}} \varphi(Y_i(t)) - \sum_{Y_i \in Q_{0i}} \varphi(Y_i(t))$
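A compact way to see how model A works is to code the contribution of one gene directly. The sketch below assumes a truth table stored as a dictionary from input tuples to 0/1 and a piecewise-linear filter clipped to [0,1]; the names `sigma`, `contr_model_a` and `OR` are illustrative choices, not notation from the slides.

```python
from itertools import product

def sigma(x):
    """Piecewise-linear filter: 0 below 0, identity on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))

def contr_model_a(a_inputs, truth_table):
    """Weights of input sets that switch the gene on, minus those that switch it off.

    The weight of an input set is the minimum, over the inputs, of a_j (if the
    set requires input j to be 1) or 1 - a_j (if it requires input j to be 0).
    """
    total = 0.0
    for y in product((0, 1), repeat=len(a_inputs)):
        weight = min(a if bit == 1 else 1.0 - a for a, bit in zip(a_inputs, y))
        total += weight if truth_table[y] == 1 else -weight
    return total

OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

# On the corners of the unit hypercube the rule reproduces the boolean OR
for bits in product((0, 1), repeat=2):
    print(bits, sigma(contr_model_a([float(b) for b in bits], OR)))
```

For boolean inputs exactly one input set has weight 1 and all others have weight 0, so the filtered contribution equals the truth-table entry.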
dynamical properties
let us start from a set of initial activations which all belong to {0,1}
for every gene, there is one input set which has contribution = 1, precisely the one which corresponds to the “right” 0’s and 1’s
all the other input sets provide a vanishing contribution, as there is at least one “1” corresponding to a real value of 0, or a “0” corresponding to a real value of 1, which gives $\varphi'(y_{ij}) = 0$
if the output corresponding to the only nonvanishing contribution is 1, then the next state is 1, otherwise it is 0
therefore the system always remains on the corners of the unit hypercube
and the rule for determining ai(t+1) is the same as that of the original Kauffman model
the model therefore represents a true generalization of the Kauffman model
towards the corners of the unit hypercube
it can be observed that starting from a set of intermediate values the system tends to reach the corners of the hypercube
at least in systems with few inputs per node
if the sigma function is piecewise linear, it exactly reaches the corners
if it is a logistic, it approaches the corners (provided that $\sigma(1) \approx 1$)
it then behaves much like the Kauffman boolean model
the reason can be understood by observing that
in some nodes there is an imbalance between the numbers of input pathways which turn the gene on or off
these nodes tend to reach their extreme values and to drive also the remaining genes to boolean extremes
the dynamics is therefore similar to that of random boolean networks
model B
the proposal here is to take into account all the contributions to a path, and to consider only those which switch the gene on
if $y_{ij}=1$, $\varphi'(y_{ij}) = a_j$
if $y_{ij}=0$, $\varphi'(y_{ij}) = 1-a_j$
the contribution of the whole input set is $\varphi(Y_i) = \prod_j \varphi'(y_{ij})$
the contribution to the activation at time t+1 is the sum of those contributions which turn the gene on
$\mathrm{contr}_i(a(t)) = \sum_{Y_i \in Q_{1i}} \varphi(Y_i(t))$
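Model B can be sketched in the same style. The sketch assumes that the weight of an input set is the product of the per-input terms (the combining operator is not legible in the slide text, so the product is an assumption, chosen because it matches the interpretation of model B as an average over an ensemble of independent boolean cells). The names `contr_model_b`, `OR` and `XOR` are illustrative.

```python
from itertools import product
from math import prod

def contr_model_b(a_inputs, truth_table):
    """Sum of the weights of the input sets that switch the gene on.

    Assumption: the weight of an input set is the product, over the inputs,
    of a_j (input required to be 1) or 1 - a_j (input required to be 0).
    """
    total = 0.0
    for y in product((0, 1), repeat=len(a_inputs)):
        if truth_table[y] == 1:
            total += prod(a if bit == 1 else 1.0 - a for a, bit in zip(a_inputs, y))
    return total

OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(contr_model_b([0.5, 0.5], OR))    # 0.75
print(contr_model_b([0.5, 0.5], XOR))   # 0.5
```

With product weights the contributions of all input sets sum to 1, so the result already lies in [0,1]; on boolean inputs it equals the truth-table entry, while intermediate inputs can give intermediate outputs.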
the features of model B
model B would describe the properties of an ensemble of Kauffman (i.e. boolean) cells which
all have the same topology and the same boolean functions for each node
evolve independently from each other starting from different initial conditions
if the different activations were independent
which is not the case, due to the non-ergodic evolution of the system
starting from a set of initial activations which all belong to {0,1}, the system always remains on the corners of the unit hypercube, and the rule for determining ai(t+1) is the same as that of the Kauffman model
the model therefore represents a true generalization of the Kauffman model
the behaviour of model B
starting from random initial conditions the system can
approach the corners of the unit hypercube
evolve towards fixed points with nodes taking intermediate values
evolve towards cycles with nodes taking intermediate values, which usually also have a non-oscillating part
therefore the continuous dynamics may differ from that of the original Kauffman model
yet features of self-organization are evident also in this case:
few attractors per network
short cycle length
model improvement
different kinds of model, either boolean or continuous, display features of dynamical self-organization
it is important to explicitly take into account also the action of chemicals
moreover, in order to describe processes such as biodegradation of organic compounds, tumor growth, etc., it is necessary to take into account the process of cell proliferation
still in search of the generic properties
number and characteristics of attractors
scaling with network size
influence of key parameters
robustness of results vs. model changes (and not only vs. parameter changes)
continuous model, general equations (time discrete)
let Δt be larger than the time required for protein synthesis and degradation (as in Kauffman)
ai = activation of gene i (normalized to [0,1]), i.e. concentration of the corresponding product
$a = [a_1, a_2, \dots, a_N]$
$c = [c_1, c_2, \dots, c_L]$ = external chemicals
$a_i(t+1) = \alpha_i a_i(t) + f_i(a(t), c(t))$
$c_m(t+1) = L\{\beta_m c_m(t) + g_m(a(t), c(t)) + \Phi_m(t)\}$
where $L(x) = 0$ if $x \le 0$, $L(x) = x$ if $x > 0$
$\Phi_m(t)$ = external flow
the consumption of a given chemical depends upon which genes are active
connections
the equation for a(t): $a_i(t+1) = \alpha_i a_i(t) + f_i(a(t), c(t))$
if Δt is “long”, $\alpha_i = 0$
the activation depends upon the chemicals as well as upon the activation of other genes
$f_i(a(t), c(t)) = \sigma[\mathrm{contr}_i(a(t)) + \chi_i(a(t), c(t))]$
where
$\forall x: \sigma(x) \ge 0$
$\forall x \le 0: \sigma(x) = 0$
$\forall x > 0, \varepsilon > 0: \sigma(x+\varepsilon) \ge \sigma(x)$
for simplicity: $\lim_{x \to +\infty} \sigma(x) = 1$
$\mathrm{contr}_i(a(t))$ depends upon the activation of the other genes
example
a constitutive gene is constantly expressed (activation a); there are N cells in a chemostat with constant flow rate
c(t+1) = L{c(t) - WaN + in + c} Pseudomonas stutzeri which degrades o-xylene
two operons, one for X->F->K, the other for F->K both controlled by the same promoter, activated by phenol both always expressed at a limited extent ignore differences in synthesis speed within a single operon
aT(t+1) = aT0+T(uTFcF) aP(t+1) = aP0+T(uPFcF) cX(t+1) = L{cX(t) - WXTN(t)aT(t)cX(t) + in -cX(t)} cF(t+1) = L{cF(t) + WFXN(t)aT(t)cX(t) - WFTN(t)aT(t)cF(t) -
WFPN(t)aP(t)cF(t) -cF(t)} + equations for N(t)
“energy”
to describe cell proliferation, only some genes are explicitly considered (the “green region”), while the effects of the cell’s standard genes (the “grey region”) are collectively described by a single variable
“energy” (i.e. excess resources) rules the reproduction rate, that is the cutoff on the activation values
energy decreases if there are no chemicals, increases due to gene-chemical interactions
let be the average “energy” per cell cell number decreases if energy is below its
“maintenance value” maint = (1-)/ allows a steady population (in the no flow
case)
the equation setthe equation set
if there is one chemical c which activates gene 1 whose product catalyzes a reaction whereby c is degraded
a1(t)=(contr1(a(t)) + uc(t)) ak(t)=(contrk(a(t))), k # 1 c(t+1) = L{c(t)-wN(t)a1(t)c(t) + in - c(t)} (t+1) = f(t) + Ea1(t)c(t) N(t+1) = L{N(t) + ( (t)N(t)) - N(t)}
lead to well known bacterial growth equations if energy is adiabatically eliminated
simulations
[Figure: diagram with nodes C and Energy connected by signed (+/-) links]
the behaviour of model A
the system tends to reach the corners of the unit hypercube
if u is such that the first gene is always active, N, c and the energy tend to reach constant values; activations oscillate as in the Kauffman model
if u is smaller, oscillations in c, N and the energy are observed
the cycles are slightly longer than in Kauffman’s original work
different attractors are observed in these networks
an example (chemostat)
network Bc1 (random boolean laws) has three attractors
a fixed point (N=170, A=15), with a basin of attraction which covers 3% of the initial conditions tested
a limit cycle (with N=constant, A cycles with period 4) with a very small basin
a limit cycle where N and A oscillate with period 16 (317<N<318, 9<A<17), which attracts 96% of the initial conditions - all the nodes oscillate
conclusions
continuous activation values
chemicals
the growth of cell population
therefore allowing one to model bacterial degradation of organic compounds, tumor growth, etc.
the continuous model provides results which are, in the case of model A, very similar to those of the Kauffman model
as far as the gene-gene interactions are concerned
also different models display features of dynamical self-organization
the model allows one to consider a more complicated set of interactions, preserving self-organization features similar to those of Kauffman
the topology of real networks
in our search for generic properties of genetic/metabolic networks we have so far assumed random connections
more precisely, in Kauffman models the number of connections per node, k, is fixed, the wiring is random with uniform probability distribution
an obvious generalization is that of allowing that also k may differ in different nodes
the theoretical model which better describes this topology is the random graph
random graphs (Erdős-Rényi)
N labelled nodes
undirected links
the probability $p_{ij}$ that node i and node j are connected is equal for all (i,j): $p_{ij} = p$
binomial distribution of the number of links per node
if p<<1, this gives rise to an approximate Poisson probability distribution for the number of connections per node, k
$p(k) = q^k e^{-q}/k!$ with $\langle k \rangle = q = pN$, $\sigma_k = \sqrt{q} = \sqrt{pN}$
let us compare families of graphs with different <k>=pN in the case $p \propto N^{-1}$
if <k> < 1, the graph is composed of isolated clusters
almost all clusters are trees or clusters with exactly one cycle
almost all nodes belong to trees
if <k> > 1 a giant cluster appears
a finite fraction of the nodes belongs to the giant cluster
as <k> increases the small clusters coalesce into the giant one
so when $p = p_c = 1/N$ the topology changes abruptly
the giant component percolates through the graph
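The appearance of the giant cluster around <k> = 1 can be observed with a few lines of simulation. The following sketch uses only the standard library; the graph size, the seed and the chosen mean degrees are arbitrary illustrative values, and the function names are not from the slides.

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected random graph: each pair of nodes is linked with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component(adj):
    """Size of the largest connected cluster, found by depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

rng = random.Random(0)           # seed chosen arbitrarily
n = 600
for mean_degree in (0.5, 1.0, 2.0, 4.0):
    g = erdos_renyi(n, mean_degree / n, rng)
    print(mean_degree, largest_component(g) / n)
```

The printed fraction of nodes in the largest cluster should stay small below <k> = 1 and become a finite fraction of the graph above it; exact values depend on the seed.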
path length
[Figure: examples of paths with path length = 1, 2 and 3]
aggregate variables: path length
let the length of a given path between nodes a and b be the number of links along that path
define the distance between two nodes a and b, $L_{ab}$, as the length of the shortest path between nodes a and b
let L be the average of $L_{ab}$ taken over all pairs of nodes: $L = \langle L_{ab} \rangle_{ab}$
a property of the network
L is called “characteristic path length”
the maximum value of $L_{ab}$ is sometimes called the diameter D of the network: $D = \max_{(a,b)} L_{ab}$
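Both aggregate quantities follow from breadth-first search started at every node. A minimal sketch, assuming the graph is stored as a dictionary of neighbour sets and that pairs with no connecting path are simply left out of the average (an assumption; the slides do not say how disconnected pairs are treated):

```python
from collections import deque

def distances_from(adj, source):
    """Shortest path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def path_statistics(adj):
    """Return (characteristic path length L, diameter D) over connected pairs."""
    total, pairs, diameter = 0, 0, 0
    for a in adj:
        for b, d in distances_from(adj, a).items():
            if b != a:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

# A chain of four nodes: 0 - 1 - 2 - 3
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(path_statistics(chain))    # L = 10/6, D = 3
```

For the four-node chain the six pair distances are 1, 1, 1, 2, 2 and 3, hence L = 10/6 and D = 3.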
clustering
[Figure: two nodes of degree 3, one with 1 connection among its neighbours and one with 3 connections among its neighbours]
aggregate variables: clustering coefficient
consider first a given node, v, with $k_v$ connections
let n(v) be the number of links which exist between the nodes which are directly connected to node v
the maximum number of links is $n_{max}(v) = k_v(k_v-1)/2$
let $C(v) = n(v)/n_{max}(v)$
C(v) is the clustering coefficient of node v
it measures how likely it is that the neighbours of v are also connected
let C be the average of C(v): $C = \langle C(v) \rangle_v$
C is called the clustering coefficient of the graph
it measures the average connectedness of the graph
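The node and graph clustering coefficients translate directly into code. In this sketch the graph is a dictionary of neighbour sets, and nodes with fewer than two neighbours are given C(v) = 0, which is a convention assumed here rather than stated in the slides.

```python
def clustering(adj, v):
    """C(v) = links among the neighbours of v / maximum possible number of such links."""
    neigh = sorted(adj[v])
    k = len(neigh)
    if k < 2:
        return 0.0               # assumed convention for degree 0 or 1
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if neigh[b] in adj[neigh[a]])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """C = average of C(v) over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)

# Node 0 has degree 3 in both graphs
one_link = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
three_links = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(clustering(one_link, 0))      # 1 of 3 possible links
print(clustering(three_links, 0))   # 3 of 3 possible links
```

The two example graphs correspond to a degree-3 node whose neighbours share 1 link and 3 links respectively, giving C(v) = 1/3 and C(v) = 1.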
results for random graphs
$L \approx \ln(N)/\ln(\langle k \rangle)$
random graphs have short characteristic path lengths, which scale with ln(N)
$C \approx p = \langle k \rangle / N$
random graphs have very small connectedness (if p = <k>/N << 1)
if <k> is held constant, the clustering coefficient -> 0 as the network size increases
two nodes chosen at random are linked by a short path
no obvious structure appears
comparison with regular lattices
consider a regular ring with connections to the K nearest neighbours
the characteristic path length grows linearly with N: $L \approx N/(2K)$
for a D-dimensional regular lattice, $L \propto N^{1/D}$
the clustering coefficient is C=3(K-2)/[4(K-1)]
C -> 3/4 in the limit of large K
two nodes chosen at random are connected by a long path
the regular structure induces a high clustering
strange properties of real networks: small worlds
many real networks display the small world phenomenon, i.e. they
are sparse: k<<N
the number of connections per node is much smaller than the number of nodes
have high clustering: $C \gg C_{random}$
have short characteristic path length: $L \approx L_{random}$
so they combine high clustering with short paths
neither random nor regular
models of small world networks: Watts & Strogatz
start from a regular graph
e.g. a ring with connections to the k nearest neighbours
each link is rewired with probability p (one node is held fixed, the other is changed)
double links between the same two vertices are forbidden
if p->0 then $L \approx N/(2k)$, $C \approx 3/4$
long pathways, high clustering
for a broad range of nonzero p values $L(p) \ll L(0)$, $C(p) \approx C(0) \gg C_{random}$
short paths, high clustering
if p->1, random graph: $L \approx \ln N/\ln k$, $C \approx k/N$
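The rewiring procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: k is even, each original link is visited once, the lower-numbered endpoint is the one held fixed, and a rewired link may go to any node that is not already a neighbour. Function names, sizes and the seed are made up for the example.

```python
import random

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """Rewire each original link with probability p, keeping node i fixed."""
    n = len(adj)
    new = {i: set(s) for i, s in adj.items()}
    edges = [(i, j) for i in adj for j in adj[i] if i < j]
    for i, j in edges:
        if rng.random() < p:
            # no self-links and no double links
            candidates = [m for m in range(n) if m != i and m not in new[i]]
            if candidates:
                m = rng.choice(candidates)
                new[i].remove(j)
                new[j].remove(i)
                new[i].add(m)
                new[m].add(i)
    return new

def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for v in adj:
        neigh = sorted(adj[v])
        k = len(neigh)
        if k >= 2:
            links = sum(1 for a in range(k) for b in range(a + 1, k)
                        if neigh[b] in adj[neigh[a]])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

ring = ring_lattice(200, 4)
print(average_clustering(ring))                  # 0.5 = 3(K-2)/[4(K-1)] for K=4
for p in (0.01, 0.1, 1.0):
    print(p, average_clustering(rewire(ring, p, random.Random(0))))
```

For the unrewired ring with 4 neighbours the clustering coefficient is exactly 0.5, in agreement with the regular-lattice formula; the values after rewiring depend on the seed.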
WS model
the WS model interpolates between regular and random graphs
the degree distribution p(k) is similar to the Poisson distribution of a random graph
almost all nodes have similar connectedness
the WS model shows that the introduction of some long range interactions (“shortcuts”) allows the shortening of the characteristic path lengths
so the small world phenomenon can take place in exponentially distributed networks - where all the nodes have a similar degree - with some long range connections
strange properties of real networks: scale-freedom
a further feature displayed by several real world nets is that, on a wide range, the distribution of node connectivities p(k) follows approximately a power law
$p(k) \propto k^{-\gamma}$
which has a major consequence, i.e. that the probability of finding some highly connected nodes is significant
these “hub” nodes may influence the network properties profoundly
power law distributions are termed scale-free as there is no clear cutoff beyond which they become vanishingly small (as e.g. in exponential functions)
[Figure: exponential and power law distribution (lin-lin); p(x) versus x]
[Figure: fat tails (lin-lin); p(x) versus x for the exponential and the power law]
[Figure: exponential vs. power law distribution (log-log); ln[p(x)] versus ln[x]]
hubs
how do networks come into existence?
random graphs and WS are both based on a fixed number of nodes and on drawing or rewiring connections
many real networks grow in time by addition of new nodes and links: Internet, genetic nets, telephone communications, etc.
moreover, in RG and WS the probability that a link is drawn to a node is independent from the number of existing links
in growing networks the probability that a new node is connected to an existing one may depend upon the connectivity of the latter (e.g. webpages, citations)
connectivity is a proxy for “importance”
models of scale-free networks: Barabási
start with a limited number $m_0$ of nodes
at each step t add a new node and introduce m ($m \le m_0$) edges which link the new node to m existing nodes
the probability that the new node is connected to the existing node i depends upon $k_i(t)$,
$\Pi(k_i) = k_i / \sum_j k_j$
after T steps the network is composed of $N = T + m_0 \approx T$ nodes and mT edges
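The growth rule can be sketched with a list in which every node appears once per link it holds, so that a uniform draw from the list is a draw proportional to degree. One detail is an assumption of this sketch: at the very first step all degrees are zero, so the first node is attached to m of the initial nodes chosen uniformly. Names, sizes and the seed are illustrative.

```python
import random
from collections import Counter

def barabasi_albert(m0, m, steps, rng):
    """Grow a network by preferential attachment; returns the list of edges."""
    degree_list = []                 # node i appears k_i times
    edges = []
    for t in range(steps):
        new = m0 + t                 # label of the node added at this step
        if not degree_list:
            # assumption: all degrees are zero at the first step, pick uniformly
            targets = set(rng.sample(range(m0), m))
        else:
            targets = set()
            while len(targets) < m:  # m distinct targets, chosen with probability ~ k_i
                targets.add(rng.choice(degree_list))
        for v in targets:
            edges.append((new, v))
            degree_list.extend((new, v))
    return edges

rng = random.Random(0)               # seed chosen arbitrarily
edges = barabasi_albert(m0=3, m=3, steps=2000, rng=rng)
degree = Counter(v for e in edges for v in e)
print("edges:", len(edges))          # m * T
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

The edge count equals mT by construction; the few largest degrees printed at the end give a rough impression of the hubs that preferential attachment produces.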
initial growthinitial growth
preferential attachmentpreferential attachment
properties of scale-free nets
simulations show that the probability distribution is scale free:
$p(k) \propto k^{-\gamma}$ ($\gamma \approx 3$, independent of m)
$L \sim \ln N$
short characteristic path length
the clustering coefficient is higher than in the random graph case, as the process introduces correlations among the nodes’ degrees
the clustering coefficient decreases as N increases, in contrast to WS
the SF networks display a different kind of small world phenomenon: their properties are influenced by the presence of a few hub nodes with a high degree
models and reality
Watts-Strogatz and scale-free networks are two theoretical small-world models which can be useful to interpret real networks of the “small world” type
the property derives either from shortcuts (in WS) or from hubs (in SF)
real world networks may present either behaviour
deciding which model better approximates a specific network is a matter of empirical testing
metabolic network of Escherichia coli
Wagner and Fell: aerobic growth on a minimal medium with glucose as sole carbon source
reactions concerning central routes of energy metabolism and synthesis of small molecules
glycolysis, pentose phosphate pathway, glycogen metabolism, TCA cycle, oxidative phosphorylation, amino acid and polyamine biosynthesis, nucleotide and nucleoside biosynthesis, glycerol 3-phosphate and membrane lipids, riboflavin, Co-A, NADP and others
287 substrates, 317 reactions
substrate graph: nodes represent substrates, and there is a link between two nodes if there is a reaction in which both substrates participate
Jeong et al: metabolic network analysis of 43 organisms (6 archaea, 32 bacteria, 5 eucaryotes)
biological implications
short diameters mean that information about a “perturbation” (i.e. removal of a substrate or of a reaction) can rapidly propagate through the network
some properties of scale free networks are highly robust; for example, random removal of substrates does not appreciably alter the characteristic path length
however, scale free networks are vulnerable to removal of hubs