Complex networks in nature
PHYSBIO 2007
Imre Derényi, Dept. of Biological Physics, Eötvös University, Budapest
Complex systems are often made of many non-identical elements connected by diverse interactions.
Such systems can be represented as networks (graphs).
Outline
Lectures 1-3: Graph theoretical basics, examples of real networks, basic models (Erdős-Rényi, small world, scale-free graphs) and their properties, examples.
Lecture 4: Dynamics on networks: error and attack tolerance, disease spreading, metabolic networks.
Lecture 5: Network motifs and communities.
Graph theory basics
A graph, usually denoted as G(V,E), consists of a set of vertices (or nodes) V together with a set of edges (or links) E. Every edge connects its two endvertices. The order of a graph (denoted by N) is the number of its vertices.
A graph is a simple graph if it has no multiple edges or loops. If not stated otherwise, a graph is usually assumed to be simple.
Two vertices are adjacent (or neighbors of each other) if there is an edge connecting them.
Every graph can be represented by its adjacency matrix A, which is an N×N symmetric binary matrix with elements $A_{ij} = A_{ji} = 1$ if vertex i is adjacent to vertex j and $A_{ij} = A_{ji} = 0$ otherwise.
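$$A = \begin{pmatrix}
0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}$$
The degree $k_i$ of vertex i is the number of its neighbors (or edges):
$$k_i = \sum_{j=1}^{N} A_{ij} = \sum_{j=1}^{N} A_{ji}$$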
The sum of the degrees of all the vertices is twice the number M of the edges of the graph:
$$2M = \sum_{i=1}^{N} k_i = \sum_{i,j=1}^{N} A_{ij}$$
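As a quick check of the two degree formulas, the following minimal sketch computes the degrees and the edge count of the 7-vertex example graph from its adjacency matrix. The function names `degrees` and `edge_count` are illustrative choices, not part of the lecture.

```python
# Adjacency matrix of the 7-vertex example graph shown above.
A = [
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
]

def degrees(adjacency):
    """Return k_i = sum_j A_ij for every vertex i."""
    return [sum(row) for row in adjacency]

def edge_count(adjacency):
    """Return M, using the identity sum_i k_i = 2M."""
    return sum(degrees(adjacency)) // 2

k = degrees(A)      # degree of each vertex
M = edge_count(A)   # number of edges
```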
A sequence of adjacent vertices is a walk. A walk is closed if its first and last vertices are the same, and open if they are different.
A walk in which no edge occurs more than once is known as a trail. A closed trail is called a tour or circuit.
A walk in which no vertex occurs more than once is known as a path. A cycle can be defined as a closed path.
Two vertices are reachable from each other if there exists a path between them.
A graph is connected if any of its vertices can be reached from any other.
A path or cycle is Hamiltonian if it uses all vertices exactly once.
A trail or circuit is Eulerian if it uses all edges precisely once.
A component of a graph is defined as a maximal connected subgraph.
A subgraph of a graph G is a graph whose vertices and edges are subsets of those of G.
A subgraph of G is a spanning subgraph, or factor, if it contains all the vertices of G.
k-cliques are complete subgraphs of order (size) k.
Cliques are maximal complete subgraphs.
A tree is an acyclic connected graph. It has N-1 edges.
The distance d(i, j) between two (not necessarily distinct) vertices i and j is the length of a shortest path between them.
The length l of a walk is the number of edges that it uses.
The eccentricity ε(i) of a vertex i is its maximum distance from any other vertex:
$$\varepsilon(i) = \max_j d(i,j)$$
The diameter D of a graph is its maximum eccentricity:
$$D = \max_i \varepsilon(i) = \max_{i,j} d(i,j)$$
The radius R of a graph is its minimum eccentricity:
$$R = \min_i \varepsilon(i) = \min_i \max_j d(i,j)$$
The characteristic path length (sometimes also called diameter) is defined as:
$$L = \frac{1}{N(N-1)/2} \sum_{i<j} d(i,j)$$
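The distance-based quantities above can be computed by breadth-first search. This is a minimal sketch for a connected, unweighted graph stored as an adjacency dictionary; the names `bfs_distances` and `path_statistics` and the 4-vertex path used as input are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances d(source, j) in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def path_statistics(adj):
    """Return (eccentricities, diameter D, radius R, characteristic path length L).

    Assumes the graph is connected and its vertex labels are comparable.
    """
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    ecc = {v: max(d.values()) for v, d in all_dist.items()}
    n = len(adj)
    total = sum(all_dist[i][j] for i in adj for j in adj if i < j)
    return ecc, max(ecc.values()), min(ecc.values()), total / (n * (n - 1) / 2)

# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ecc, D, R, L = path_statistics(path)
```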
Extensions
If weight or cost is assigned to each edge, then we get a weighted graph. In the calculation of lengths the weights are taken into account.
In a hypergraph more than two vertices can be connected by hyperedges.
If the edges are directed, then we have a directed graph or digraph. In-neighbors and out-neighbors, and in-degrees and out-degrees can be distinguished.
Random graphs
Graph theory was invented by Euler in the 18th century. The early work was concentrated on small graphs with a high degree of regularity.
Random-graph theory was introduced by Erdős and Rényi in the late 1950s. As complex networks often appear to be random, random-graph theory appears to be a useful tool in the study of large complex networks.
The Erdős-Rényi model
Pál Erdős (1913-1996)
Original model: Connect N nodes by M edges randomly.
Alternative model: Connect every pair of the N nodes with probability p.
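The two models (or ensembles) become equivalent in the thermodynamic limit
$$N \to \infty \quad \text{for} \quad p = \frac{M}{N(N-1)/2}$$
(Figure: example with p = 1/6.)
The average degree of a node is
$$\langle k \rangle = \frac{2M}{N} = p(N-1) \approx pN$$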
The Erdős-Rényi model
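Degree distribution:
$$P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \approx \frac{\langle k \rangle^k}{k!} e^{-\langle k \rangle}$$
(Poisson distribution)
The characteristic path length can be estimated from
$$\langle k \rangle^L \approx N$$
resulting in
$$L \approx \frac{\log N}{\log \langle k \rangle}$$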
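To see how close the binomial degree distribution is to its Poisson limit, and what the path-length estimate gives in practice, here is a small numerical sketch. The function names and the parameter values (N = 1000, ⟨k⟩ ≈ 6) are illustrative assumptions.

```python
import math

def binomial_degree_pmf(k, n_nodes, p):
    """Exact degree distribution of the Erdős-Rényi graph G(N, p)."""
    return math.comb(n_nodes - 1, k) * p**k * (1 - p) ** (n_nodes - 1 - k)

def poisson_pmf(k, mean_degree):
    """Poisson approximation with the same mean degree."""
    return mean_degree**k * math.exp(-mean_degree) / math.factorial(k)

def estimated_path_length(n_nodes, mean_degree):
    """L ~ log N / log <k>, obtained from <k>^L ~ N."""
    return math.log(n_nodes) / math.log(mean_degree)

N, p = 1000, 0.006
mean_k = p * (N - 1)
# Largest pointwise difference between the two distributions for k < 40
max_diff = max(abs(binomial_degree_pmf(k, N, p) - poisson_pmf(k, mean_k))
               for k in range(40))
```

For these values the two distributions differ by well under one percent at every k, which is why the Poisson form is normally used for large sparse graphs.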
The greatest discovery of Erdős and Rényi was that many network properties appear suddenly as p is increased.
As an example let us consider the occurrence of an arbitrary subgraph consisting of n vertices and m edges.
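Their number can be estimated as:
$$\binom{N}{n} \frac{n!}{a} p^m \approx \frac{N^n p^m}{a}$$
where a is the number of vertex permutations that map the subgraph onto itself.
Thus the critical probability of appearance is:
$$p_c(N) = c N^{-n/m}$$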
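The estimate and the resulting threshold exponent can be evaluated for a few familiar subgraphs. This is a sketch under the assumption that a counts the symmetries of the subgraph (a = 6 for a triangle); the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def expected_subgraph_count(n_nodes, n, m, a, p):
    """Estimated number of copies of a subgraph with n vertices, m edges, symmetry factor a."""
    return comb(n_nodes, n) * factorial(n) / a * p**m

def threshold_exponent(n, m):
    """Exponent z in p_c ~ N**z, i.e. z = -n/m."""
    return Fraction(-n, m)

triangles = expected_subgraph_count(100, 3, 3, 6, 0.01)  # triangle: n = m = 3, a = 6
z_tree = threshold_exponent(3, 2)      # tree of order 3: m = n - 1
z_cycle = threshold_exponent(3, 3)     # cycle: m = n
z_clique = threshold_exponent(4, 6)    # complete subgraph of order 4: m = n(n-1)/2
```

Trees appear first (most negative exponent), cycles at p ~ 1/N, and complete subgraphs only at much larger p.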
A giant (percolating) component also appears suddenly.
This can easily be understood with the help of a branching process:
1. Let us start to grow a component from a seed vertex by randomly selecting its neighbors from the remaining N-1 vertices with probability p.
2. Let us repeat this process with the newly selected vertices as seeds, over and over again.
3. The branching process stops when no new neighbor is selected.
If p < p_c = 1/N, then the expected number of new neighbors is smaller than the number of seeds, and the branching process quickly comes to a halt.
If, on the other hand, p > p_c = 1/N, then the component can easily grow to infinity.
In terms of the average degree, the threshold is at $\langle k \rangle \approx 1$.
The giant component has a tree-like structure.
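The sudden appearance of the giant component can be observed in a small simulation. The sketch below draws one G(N, p) sample with p = ⟨k⟩/N and measures the largest component with a union-find structure; the function name, the seed, and the sizes are illustrative assumptions, and a single sample only indicates the trend.

```python
import random

def largest_component_fraction(n_nodes, mean_degree, seed=0):
    """Fraction of vertices in the largest component of one G(N, p) sample."""
    rng = random.Random(seed)
    p = mean_degree / n_nodes
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Connect every pair of vertices independently with probability p
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = {}
    for v in range(n_nodes):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_nodes

below = largest_component_fraction(1000, 0.5)  # p < p_c: only small components
above = largest_component_fraction(1000, 2.0)  # p > p_c: giant component
```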
Are complex networks really random?
No! One big difference is that nodes are often clustered, i.e., neighbors of a node tend to be connected to each other.
Clustering coefficient:
$$C_i = \frac{\#\ \text{of links between the neighbors}}{k_i(k_i-1)/2}$$
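A direct implementation of the clustering coefficient on an adjacency dictionary might look as follows. The convention C_i = 0 for vertices with fewer than two neighbors is an assumption made here, since the formula is undefined in that case.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = (# of links between the neighbors of i) / (k_i (k_i - 1) / 2)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined case mapped to 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering_coefficient(graph, 0)  # one link (1-2) among three neighbors
c1 = clustering_coefficient(graph, 1)  # both neighbors are connected
```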
Small worlds: Networks are clustered [$C \gg C_{rand} = p$] but have a small characteristic path length L.
Network        C          C_rand    L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Metabolic      0.32       0.026     2.9        282
Foodweb        0.22       0.06      2.43       134
C. elegans     0.28       0.05      2.65       282
(C: probability that two neighbors of a node are connected.)
Watts-Strogatz model
[Watts and Strogatz, Nature 393, 440 (1998)]
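n nodes per block:
$$L(n) \approx n \frac{\log(N/n)}{\log(pn)}$$
Optimal n:
$$\frac{dL}{dn} = 0$$
$$\frac{\log(N/n)}{\log(pn)} - \frac{1}{\log(pn)} - \frac{\log(N/n)}{\log^2(pn)} = 0$$
$$\log(N/n) - 1 - \frac{\log(N/n)}{\log(pn)} = 0$$
$$\log(pn) \approx 1 \quad \Rightarrow \quad n \sim 1/p$$
$$L \sim \frac{\log(Np)}{p} \quad \text{if } N \gg 1/p$$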
World Wide Web
800 million documents (S. Lawrence, 1999)
ROBOT: collects all URLs found in a document and follows them recursively
Nodes: WWW documents Links: URL links
R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999)
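What can we expect for ER and WS networks?
$\langle k \rangle \sim 6$, $N_{WWW} \sim 10^9$
$P(k=500) \sim 10^{-99}$
$N(k=500) \sim 10^{-90}$
The results: scale-free network
$P_{out}(k) \sim k^{-\gamma_{out}}$, $\gamma_{out} = 2.45$
$P_{in}(k) \sim k^{-\gamma_{in}}$, $\gamma_{in} = 2.1$
$P(k=500) \sim 10^{-6}$
$N(k=500) \sim 10^{3}$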
INTERNET BACKBONE
(Faloutsos, Faloutsos and Faloutsos, 1999)
Nodes: computers, routers Links: physical lines
ACTOR CONNECTIVITIES
Nodes: actors Links: cast jointly
N = 212,250 actors, $\langle k \rangle = 28.78$
$P(k) \sim k^{-\gamma}$, $\gamma = 2.3$
SCIENCE CITATION INDEX
Nodes: papers Links: citations
(S. Redner, 1998)
$P(k) \sim k^{-\gamma}$ ($\gamma = 3$)
1736 PRL papers (1988)
SCIENCE COAUTHORSHIP
Nodes: scientists (authors) Links: joint publications
(Newman, 2000; Barabasi et al., 2001)
M: math; NS: neuroscience
Online communities
Nodes: online users Links: email contacts
Ebel, Mielsch, Bornholdt, PRE 2002.
Kiel University log files: 112 days, N = 59,912 nodes
Food Web
Nodes: trophic species Links: trophic interactions
R.J. Williams, N.D. Martinez, Nature (2000)
R. Sole (cond-mat/0011195)
Sex-web
Nodes: people (Females; Males)
Links: sexual relationships
Liljeros et al. Nature 2001
4781 Swedes; 18-74; 59% response rate.
Most real world networks have the same internal structure:
Scale-free networks
Why?
What does it mean?
SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed. Networks continuously expand by the addition of new nodes.
Examples:
WWW: addition of new documents
Citation: publication of new papers
(2) The attachment is NOT uniform. A node is linked with higher probability to a node that already has a large number of links.
Examples:
WWW: new documents link to well-known sites (CNN, Yahoo, New York Times, etc.)
Citation: well-cited papers are more likely to be cited again
Origins SF
Scale-free model
(1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the degree $k_i$ of that node:
$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$$
$$P(k) \sim k^{-3}$$
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
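The two ingredients, growth and preferential attachment, can be sketched in a few lines. The seed graph (a complete graph on m+1 nodes) and the rule of drawing m distinct targets per step are assumptions made for this illustration; the slides do not fix these details.

```python
import random

def barabasi_albert(n_nodes, m, seed=0):
    """Grow a graph by preferential attachment and return its edge list."""
    rng = random.Random(seed)
    # Assumed seed graph: complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects node i with probability k_i / sum_j k_j.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:          # m distinct, degree-biased targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degree_counts(edges):
    """Map each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

edges = barabasi_albert(2000, 3, seed=1)
deg = degree_counts(edges)
```

Even at this small size the oldest nodes become hubs with degrees far above the average of about 2m.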
Mean Field Theory
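$$\frac{dk_i}{dt} = \frac{m k_i}{\sum_j k_j} = \frac{m k_i}{2mt} = \frac{k_i}{2t}$$
$$k_i(t) = m \left(\frac{t}{t_i}\right)^{\beta}, \quad \text{with the initial condition: } k_i(t_i) = m$$
$$\hat{P}(k) \equiv P(k_i(t) < k) = P\left(t_i > \frac{m^{1/\beta}\, t}{k^{1/\beta}}\right) = 1 - \left(\frac{m}{k}\right)^{1/\beta}$$
$$P(k) = \frac{d\hat{P}}{dk} = \frac{1}{\beta} \frac{m^{1/\beta}}{k^{1/\beta + 1}} \sim k^{-(1 + 1/\beta)}$$
$$\beta = \frac{1}{2}, \quad \gamma = 1 + \frac{1}{\beta} = 3$$
A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)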
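The mean-field result can be checked numerically. The sketch below integrates dk/dt = k/(2t) with a simple Euler scheme and compares it with the closed form m (t/t_i)^{1/2}; it also evaluates P(k) = 2m²/k³, which follows from the formulas above for β = 1/2. Step size and parameter values are illustrative assumptions.

```python
def integrate_degree(m, t_i, t_end, dt=0.01):
    """Euler integration of dk/dt = k / (2t), starting from k(t_i) = m."""
    k, t = float(m), float(t_i)
    while t < t_end:
        k += dt * k / (2 * t)
        t += dt
    return k

def mean_field_degree(m, t_i, t):
    """Closed-form solution k_i(t) = m (t / t_i)^(1/2)."""
    return m * (t / t_i) ** 0.5

def mean_field_cdf(m, k):
    """P(k_i(t) < k) = 1 - (m/k)^2 for k >= m."""
    return 1 - (m / k) ** 2

def mean_field_pk(m, k):
    """Degree distribution P(k) = 2 m^2 / k^3 for k >= m."""
    return 2 * m**2 / k**3

numeric = integrate_degree(3, 10, 1000)
exact = mean_field_degree(3, 10, 1000)
```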
Growth without preferential attachment
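$$\frac{dk_i}{dt} = m \frac{1}{\sum_j 1} = \frac{m}{t}$$
$$k_i(t) = m \ln\frac{t}{t_i} \quad \text{(taking the initial degree as "0")}$$
$$\hat{P}(k) \equiv P(k_i(t) < k) = P\left(t_i > t\, e^{-k/m}\right) = 1 - e^{-k/m}$$
$$P(k) = \frac{d\hat{P}}{dk} = \frac{1}{m} e^{-k/m}$$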
Preferential Attachment
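(Figure panels: Citation network; Internet)
$$\frac{\partial k_i}{\partial t} \propto \Pi(k_i) \sim \frac{\Delta k_i}{\Delta t}$$
For given $\Delta t$: $\Delta k \propto \Pi(k)$
$$\kappa(k) = \sum_{k_i = 0}^{k} \Pi(k_i)$$
(Jeong, Neda, A.-L. B., cond-mat/0104131)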
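Extended Model
• prob. p: internal links
• prob. q: link deletion
• prob. 1-p-q: add node
$$P(k) \sim k^{-\gamma(p,q,m)}, \quad \gamma \in [1, \infty)$$
The exponent is not universal:
Network              γ
WWW (in)             2.1
Internet             2.5
Actor                2.3
Citation index       3
Sex web              3.5
Cellular network     2.1
Phone call network   2.1
Linguistics          2.8
$$\langle k \rangle = \int_{k_{min}}^{\infty} k P(k)\, dk \sim \int_{k_{min}}^{\infty} k^{1-\gamma}\, dk = \infty \quad \text{if } \gamma \le 2$$
$$\langle k^2 \rangle = \int_{k_{min}}^{\infty} k^2 P(k)\, dk \sim \int_{k_{min}}^{\infty} k^{2-\gamma}\, dk = \infty \quad \text{if } \gamma \le 3$$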
Other Models
Presence of a giant (percolating) component
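Branching process:
The probability that an edge leads to a vertex with degree k is:
$$Q(k) = \frac{k P(k)}{\sum_{k'} k' P(k')} = \frac{k P(k)}{\langle k \rangle}$$
The condition that the branching process prevails:
$$\sum_k (k-1)\, Q(k) = \sum_k \frac{(k-1)\, k P(k)}{\langle k \rangle} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} > 1 \quad \Rightarrow \quad \frac{\langle k^2 \rangle}{\langle k \rangle} > 2$$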
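The condition ⟨k²⟩/⟨k⟩ > 2 is easy to test on a degree sequence. This is a minimal sketch; the function name is illustrative, and the criterion applies to random graphs with a given degree distribution, not to an arbitrary wiring.

```python
def has_giant_component(degree_sequence):
    """Branching-process criterion: <k^2> / <k> > 2."""
    n = len(degree_sequence)
    mean_k = sum(degree_sequence) / n
    mean_k2 = sum(k * k for k in degree_sequence) / n
    return mean_k2 / mean_k > 2

regular_3 = has_giant_component([3] * 10)  # ratio 3: each edge leads to 2 new edges
regular_2 = has_giant_component([2] * 10)  # ratio 2: marginal case
regular_1 = has_giant_component([1] * 10)  # ratio 1: isolated pairs only
```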
Yeast protein network
Nodes: proteins
Links: physical interactions (binding)
P. Uetz, et al. Nature 403, 623-7 (2000).
C. Elegans
Li et al. Science 2004
Drosophila M.
Giot et al. Science 2003
Origin of the scale-free topology of PPI networks: gene duplication
Proteins with more interactions are more likely to obtain new links: Π(k) ~ k (preferential attachment)
Wagner 2001; Vazquez et al. 2003; Sole et al. 2001; Rzhetsky & Gomez 2001; Qian et al. 2001; Bhan et al. 2002.
Metabolic network
The metabolic networks of organisms from all three domains of life are scale-free!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)
Archaea Bacteria Eukaryotes
Nodes: chemicals (substrates)
Links: bio-chemical reactions
Characterizing the links
Metabolism: Flux Balance Analysis (Palsson)
Metabolic flux for each reaction
Edwards, J. S. & Palsson, B. O., PNAS 97, 5528 (2000).
Edwards, J. S., Ibarra, R. U. & Palsson, B. O., Nat Biotechnol 19, 125 (2001).
Ibarra, R. U., Edwards, J. S. & Palsson, B. O., Nature 420, 186 (2002).
$S \cdot v = 0$ (S: stoichiometric matrix, v: flux vector)
Maximize $c \cdot v$, where c is the unit vector in the direction of growth (biomass production).
Global flux organization in the E. coli metabolic network
E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, A.-L. B. Nature, 2004; Goh et al, PRL 2002.
SUCC: Succinate uptake
GLU: Glutamate uptake
Central Metabolism, Emmerling et al., J Bacteriol 184, 152 (2002)
Inhomogeneity in the local flux distribution
$\sim k^{-0.27}$
Mass flows along linear pathways
Robustness
Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns)
node failure
Robustness of scale-free networks
(Figure: S versus f, both ranging from 0 to 1, for attacks and failures; $f_c$ marks the critical fraction.)
Albert, Jeong, Barabasi, Nature 406 378 (2000)
Cohen, Erez, ben-Avraham, Havlin, PRL 85, 4626 (2000)
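After random removal of a fraction f of the vertices, a vertex of original degree $k_0$ keeps degree k with probability
$$\binom{k_0}{k} (1-f)^k f^{k_0-k}$$
The new degree distribution:
$$P(k) = \sum_{k_0=k}^{\infty} P_0(k_0) \binom{k_0}{k} (1-f)^k f^{k_0-k}$$
$$\langle k \rangle = \sum_k k P(k) = \langle k_0 \rangle (1-f)$$
$$\langle k(k-1) \rangle = \sum_k k(k-1) P(k) = \langle k_0(k_0-1) \rangle (1-f)^2$$
Percolation:
$$\frac{\langle k(k-1) \rangle}{\langle k \rangle} = \frac{\langle k_0(k_0-1) \rangle}{\langle k_0 \rangle} (1-f) > 1$$
Critical fraction:
$$f_c = 1 - \frac{\langle k_0 \rangle}{\langle k_0(k_0-1) \rangle}$$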
Absence of a critical percolation threshold for γ ≤ 3
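The critical fraction can be evaluated directly from an original degree sequence. In this sketch the function name and the two test sequences are illustrative assumptions; the heterogeneous sequence only mimics a broad distribution and is not drawn from a power law.

```python
def critical_fraction(degree_sequence):
    """f_c = 1 - <k0> / <k0 (k0 - 1)> for random vertex removal."""
    n = len(degree_sequence)
    mean_k = sum(degree_sequence) / n
    mean_kk1 = sum(k * (k - 1) for k in degree_sequence) / n
    return 1 - mean_k / mean_kk1

fc_regular = critical_fraction([3] * 10)             # homogeneous degrees
fc_hubs = critical_fraction([2] * 90 + [50] * 10)    # a few hubs raise <k0^2>
```

A few highly connected vertices push f_c close to 1, which is the finite-size counterpart of the missing threshold for γ ≤ 3.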
Achilles’ Heel of complex networks
Internet: failure vs. attack
R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000)
Yeast protein network: lethality and topological position
Highly connected proteins are more essential (lethal)...
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)
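Disease spreading in the susceptible-infected-susceptible (SIS) epidemic model
Rate of becoming infected by an infected neighbor: $\lambda$. Rate of recovery: 1.
Mean-field approx. for “exponential” networks, where $k \approx \langle k \rangle$ ($\rho$: density of infected vertices):
$$\frac{d\rho(t)}{dt} = -\rho(t) + \lambda \langle k \rangle \rho(t)\, [1 - \rho(t)]$$
Steady state solution:
$$\rho = 1 - \frac{1}{\lambda \langle k \rangle}$$
Epidemic threshold:
$$\lambda_c = \frac{1}{\langle k \rangle}$$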
Pastor-Satorras and Vespignani, PRE 65, 036104 (2002)
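The homogeneous mean-field equation can be integrated numerically to confirm the steady state and the threshold. This sketch assumes the form dρ/dt = -ρ + λ⟨k⟩ρ(1-ρ) with unit recovery rate; step size, initial density, and parameter values are illustrative.

```python
def sis_steady_state(lam, mean_k):
    """rho = 1 - 1 / (lam <k>) above the threshold, 0 below it."""
    return max(0.0, 1 - 1 / (lam * mean_k))

def integrate_sis(lam, mean_k, rho0=0.01, dt=0.01, steps=20000):
    """Euler integration of d(rho)/dt = -rho + lam <k> rho (1 - rho)."""
    rho = rho0
    for _ in range(steps):
        rho += dt * (-rho + lam * mean_k * rho * (1 - rho))
    return rho

endemic = integrate_sis(0.5, 4)    # lam <k> = 2 > 1: infection persists
extinct = integrate_sis(0.2, 4)    # lam <k> = 0.8 < 1: infection dies out
```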
SIS in complex networks
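Mean-field approximation:
$$\frac{d\rho_k(t)}{dt} = -\rho_k(t) + \lambda k\, \Theta(t)\, [1 - \rho_k(t)]$$
The probability that an edge leads to a vertex with degree k is:
$$Q(k) = \frac{k P(k)}{\sum_{k'} k' P(k')} = \frac{k P(k)}{\langle k \rangle}$$
The probability that a neighbor is infected:
$$\Theta(t) = \sum_k Q(k)\, \rho_k(t) = \frac{\sum_k k P(k)\, \rho_k(t)}{\langle k \rangle}$$
Steady state solution:
$$\rho_k = \frac{\lambda k \Theta}{1 + \lambda k \Theta}$$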
SIS in complex networks
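Substituting the steady-state solution into $\Theta$:
$$\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta}$$
This has a nontrivial solution when:
$$\frac{d}{d\Theta} \left[ \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta} \right]_{\Theta = 0} \ge 1$$
from which we get that the epidemic threshold is:
$$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle}$$
Uniform immunization with probability g, i.e. $\lambda \to \lambda(1-g)$, does not help in scale-free networks if γ ≤ 3.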
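For a concrete degree sequence, the threshold and the self-consistent value of Θ can be obtained numerically. The sketch below uses plain fixed-point iteration, which is an assumed solution method rather than one given in the lecture; function names are illustrative.

```python
def epidemic_threshold(degree_sequence):
    """lam_c = <k> / <k^2>."""
    n = len(degree_sequence)
    mean_k = sum(degree_sequence) / n
    mean_k2 = sum(k * k for k in degree_sequence) / n
    return mean_k / mean_k2

def solve_theta(lam, degree_sequence, iterations=2000):
    """Iterate Theta = (1/<k>) sum_k k P(k) lam k Theta / (1 + lam k Theta)."""
    total_degree = sum(degree_sequence)
    theta = 1.0
    for _ in range(iterations):
        theta = sum(k * lam * k * theta / (1 + lam * k * theta)
                    for k in degree_sequence) / total_degree
    return theta

regular = [4] * 10
lam_c = epidemic_threshold(regular)      # reduces to 1/<k> for equal degrees
theta_above = solve_theta(0.5, regular)  # lam > lam_c
theta_below = solve_theta(0.2, regular)  # lam < lam_c
```

For equal degrees the result agrees with the homogeneous model: Θ = 1 - 1/(λ⟨k⟩) above the threshold and Θ = 0 below it.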
Non-uniform immunization of complex networks
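If $\lambda k (1 - g_k) \equiv \tilde{\lambda} = \text{const}$, i.e. when $g_k = 1 - \frac{\tilde{\lambda}}{\lambda k}$, then
$$\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\tilde{\lambda} \Theta}{1 + \tilde{\lambda} \Theta} = \frac{\tilde{\lambda} \Theta}{1 + \tilde{\lambda} \Theta} \quad \Rightarrow \quad \Theta = 1 - \frac{1}{\tilde{\lambda}}$$
Thus, the epidemic threshold is reintroduced: $\tilde{\lambda}_c = 1$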
Motifs
Motifs: Subgraphs that have a significantly higher density in the real network than in the randomized version of the studied network.
Randomized networks: Ensemble of maximally random networks preserving the degree distribution of the original network.
Function is often carried out by subnetworks, rather than by single components.
R. Milo et al., Science 298, 824-827 (2002)
Three-node connected subgraphs
Why do we have motifs?
Hypothesis: they are dynamically desirable “building blocks”.
The Feed-Forward (FF) motif is a noise filter.
Communities: “densely connected subgraphs”
Traditional method: hierarchical clustering (agglomerative method)
All edges are removed, and then added back one by one in decreasing order of their “strengths”.
Communities are defined as the forming components.
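Dendrogram:
The strength of the relationship between any pair of vertices can, e.g., be defined as
$$S = \sum_{l=0}^{\infty} (\alpha A)^l = [I - \alpha A]^{-1}, \quad \text{where } \alpha < \frac{1}{\lambda_{max}}$$
($\lambda_{max}$: largest eigenvalue of A.)
The matrix $A^l$ contains the number of walks with length l between the vertex pairs.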
Girvan-Newman method (divisive method)
It also results in a dendrogram, by cutting the edges one by one. In each step the edge with the highest “betweenness centrality” (BC) is removed.
The BC of an edge is the number of shortest paths between all pairs of vertices that use this edge.
Girvan and Newman, PNAS 99, 7821 (2002)
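Edge betweenness can be computed with one breadth-first search per source vertex. The sketch below follows the usual convention that, when a pair of vertices has several shortest paths, each path carries an equal fraction of that pair's unit weight; this convention and the function name are assumptions, not details stated on the slide.

```python
from collections import deque, defaultdict

def edge_betweenness(adj):
    """Betweenness of every edge of an undirected, unweighted graph."""
    bc = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(int)   # number of shortest paths from s
        sigma[s] = 1
        preds = defaultdict(list)  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate from the farthest vertices back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bc[tuple(sorted((v, w)))] += share
                delta[v] += share
    # Every unordered pair was counted from both of its endpoints
    return {edge: value / 2 for edge, value in bc.items()}

# Two triangles joined by the bridge 2-3
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bc = edge_betweenness(adj)
first_cut = max(bc, key=bc.get)  # edge removed first by the divisive method
```

Removing `first_cut` splits this example into its two triangles, which is the first level of the dendrogram.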
Modularity
When should one stop with the agglomeration/division? At the maximal modularity:
$$Q = \sum_g \left( e_{gg} - a_g^2 \right)$$
$$e_{gh} = \frac{1}{2M}\, (\#\ \text{edges between groups } g \text{ and } h) \quad \text{if } g \ne h$$
$$e_{gg} = \frac{1}{M}\, (\#\ \text{edges in group } g)$$
$$a_g = \sum_h e_{gh} \quad \text{(fraction of edge ends being in group } g)$$
Q is the fraction of edges in the groups compared to that in the randomized network.
Newman and Girvan, PRE 69, 026113 (2004)
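The modularity of a given partition follows directly from the definitions of e_gg and a_g. This is a minimal sketch; the function name, the edge-list input, and the `group_of` mapping are illustrative assumptions.

```python
def modularity(edges, group_of):
    """Q = sum_g (e_gg - a_g^2) for an undirected edge list and a node-to-group map."""
    m = len(edges)
    e_in = {}  # e_gg: fraction of edges inside group g
    a = {}     # a_g: fraction of edge ends in group g
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu == gv:
            e_in[gu] = e_in.get(gu, 0.0) + 1 / m
        # each of the 2M edge ends contributes 1/(2M) to its own group
        a[gu] = a.get(gu, 0.0) + 1 / (2 * m)
        a[gv] = a.get(gv, 0.0) + 1 / (2 * m)
    return sum(e_in.get(g, 0.0) - a[g] ** 2 for g in a)

# Two triangles joined by one edge, split into the two triangles
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
q_single = modularity(edges, {v: "a" for v in range(6)})
```

Here e_gg = 3/7 and a_g = 1/2 for both groups, so Q = 2(3/7 - 1/4) = 5/14, while putting all vertices in one group gives Q = 0.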
Potts model
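Minimization of the Hamiltonian:
$$H = -J \sum_{(i,j) \in E} \delta_{\sigma_i, \sigma_j} + \gamma \sum_{s=1}^{q} \frac{n_s (n_s - 1)}{2} = -\sum_{i<j} \left( J A_{ij} - \gamma \right) \delta_{\sigma_i, \sigma_j}$$
where $\sigma_i$ is the group (spin state) of vertex i and $n_s$ is the number of vertices in group s.
Reichardt and Bornholdt, PRL 93, 218701 (2004)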
Clique percolation method (CPM)
Most real networks are characterized by overlapping and nested communities.
Divisive/agglomerative methods fail to identify the communities when overlaps are significant.
Derényi, Palla, and Vicsek, Phys. Rev. Lett. 94, 160202 (2005)
Palla, Derényi, Farkas, and Vicsek, Nature 435, 814-818 (2005)
Advantages of this method:
• local,
• allows overlaps,
• density (not distance) based,
• produces no cut-nodes, …
An example of overlapping k-clique communities for k=4:
k-cliques are complete subgraphs of size k:
k = 2 k = 3 k = 4 k = 5
We define a community as a k-clique percolation cluster.
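A brute-force sketch of the definition, suitable only for small graphs: it enumerates all k-cliques, treats two k-cliques as adjacent when they share k-1 vertices (the adjacency rule of the cited CPM papers), and returns the unions of connected k-cliques. The function name and the example graph are illustrative assumptions, and this is not how dedicated CPM software is implemented.

```python
from itertools import combinations

def k_clique_communities(adj, k):
    """Return k-clique percolation clusters as a list of vertex sets."""
    nodes = sorted(adj)
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge k-cliques that share k - 1 vertices
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    communities = {}
    for index, clique in enumerate(cliques):
        communities.setdefault(find(index), set()).update(clique)
    return sorted(communities.values(), key=sorted)

# Triangles 0-1-2 and 1-2-3 share an edge; triangle 3-4-5 shares only vertex 3
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = k_clique_communities(adj, 3)
```

Vertex 3 belongs to both resulting communities, which illustrates how the method allows overlaps.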
Studied systems:
• Co-authorship network: Los Alamos cond-mat archive, 30,739 nodes and 136,065 links
• Word association network: South Florida Free Association norms list, 10,617 nodes and 63,788 links
• Protein-protein interaction network: DIP core list of the yeast S. cerevisiae, 2,609 nodes and 6,355 links
Links are usually weighted ($w_{ij}$). For each value of k (typically k=3,4,5) a threshold weight can be introduced.
(Note that there is a critical threshold at which a giant cluster appears. Optimally the threshold weight should be chosen close to this critical value.)
Web of communities for the protein interaction network of yeast
links represent overlaps between the communities
Community statistics
community size distribution
community degree distribution
overlap size distr. membership number distr.
Clique percolation in an ER graph
Branching process:
$$(k-1)\, N\, p_c^{\,k-1} = 1$$
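Assuming the branching-process condition reads (k-1) N p_c^{k-1} = 1, the threshold probability can be evaluated as below. The function name is illustrative.

```python
def clique_percolation_threshold(n_nodes, k):
    """p_c(k) = [(k - 1) N]^(-1/(k - 1)), solved from (k - 1) N p_c^(k-1) = 1."""
    return ((k - 1) * n_nodes) ** (-1 / (k - 1))

pc_edges = clique_percolation_threshold(1000, 2)       # k = 2: ordinary percolation
pc_triangles = clique_percolation_threshold(20000, 3)  # k = 3: triangle percolation
```

For k = 2 the expression reduces to p_c = 1/N, the threshold of the giant component in the Erdős-Rényi graph.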
Dedicated web page for the CPM (software, papers, data):
http://www.cfinder.org/
Some review papers:
Albert and Barabasi, Rev. Mod. Phys. 74, 47 (2002).
Dorogovtsev and Mendes, Adv. Phys. 51, 1079 (2002).
Useful web page with papers, data, and ppt presentations:
http://www.nd.edu/~networks/
(Where many of the slides of this course have been “borrowed” from.)