Complex networks in nature (PHYSBIO 2007). Imre Derényi, Dept. of Biological Physics, Eötvös University, Budapest


Complex systems are often made of many non-identical elements connected by diverse interactions.

Such systems can be viewed as networks, which are described mathematically as graphs.

Outline

Lectures 1-3: Graph theoretical basics, examples of real networks, basic models (Erdős-Rényi, small world, scale-free graphs) and their properties, examples.

Lecture 4: Dynamics on networks: error and attack tolerance, disease spreading, metabolic networks.

Lecture 5: Network motifs and communities.

Graph theory basics

A graph, usually denoted as G(V,E), consists of a set of vertices (or nodes) V together with a set of edges (or links) E. Every edge connects its two endvertices. The order of a graph (denoted by N) is the number of its vertices.

A graph is a simple graph if it has no multiple edges or loops. If not stated otherwise, a graph is usually assumed to be simple.

Two vertices are adjacent (or neighbors of each other) if there is an edge connecting them.

Every graph can be represented by its adjacency matrix A, which is an N×N symmetric binary matrix with elements A_ij = A_ji = 1 if vertex i is adjacent to vertex j, and A_ij = A_ji = 0 otherwise. Example for a graph of order N = 7:

A =
    0 0 1 0 1 0 0
    0 0 1 0 0 0 0
    1 1 0 0 1 0 0
    0 0 0 0 1 0 1
    1 0 1 1 0 1 0
    0 0 0 0 1 0 1
    0 0 0 1 0 1 0

The degree k_i of vertex i is the number of its neighbors (or edges):

$k_i = \sum_{j=1}^{N} A_{ij} = \sum_{j=1}^{N} A_{ji}$

The sum of the degrees of all the vertices is twice the number M of the edges of the graph:

$\sum_{i=1}^{N} k_i = \sum_{i,j=1}^{N} A_{ij} = 2M$
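
The degree and edge-count relations can be checked directly on the 7-vertex example matrix. The following is a minimal sketch; the function names are illustrative and not part of the lecture.

```python
# Adjacency matrix of the 7-vertex example graph from the slide.
A = [
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
]


def degrees(adj):
    """Degree k_i = sum_j A_ij, i.e. the row sums of the adjacency matrix."""
    return [sum(row) for row in adj]


def edge_count(adj):
    """Number of edges M = (1/2) * sum_ij A_ij for a simple undirected graph."""
    return sum(sum(row) for row in adj) // 2


k = degrees(A)
M = edge_count(A)
print("degrees:", k)
print("edges:", M, "sum of degrees:", sum(k))
```

Each edge contributes two ones to the symmetric matrix, which is why the sum of the degrees equals 2M.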

A sequence of adjacent vertices is a walk. A walk is closed if its first and last vertices are the same, and open if they are different.

A walk in which no edge occurs more than once is known as a trail. A closed trail is called a tour or circuit.

A walk in which no vertex occurs more than once is known as a path. A cycle can be defined as a closed path.

Two vertices are reachable from each other, if there exists a path between them.

A graph is connected, if any of its vertices can be reached from any other.

A path or cycle is Hamiltonian if it uses all vertices exactly once.

A trail or circuit is Eulerian if it uses all edges precisely once.

A component of a graph is defined as a maximal connected subgraph.

A subgraph of a graph G is a graph whose vertices and edges are subsets of those of G.

A subgraph of G is a spanning subgraph, or factor, if it contains all the vertices of G.

k-cliques are complete subgraphs of order (size) k.

Cliques are maximal complete subgraphs.

A tree is an acyclic connected graph. It has N-1 edges.

The length l of a walk is the number of edges that it uses.

The distance d(i, j) between two (not necessarily distinct) vertices i and j is the length of a shortest path between them.

The eccentricity ε(i) of a vertex i is its maximum distance from any other vertex:

$\varepsilon(i) = \max_{j} d(i,j)$

The diameter D of a graph is its maximum eccentricity:

$D = \max_{i} \varepsilon(i) = \max_{i,j} d(i,j)$

The radius R of a graph is its minimum eccentricity:

$R = \min_{i} \varepsilon(i) = \min_{i} \max_{j} d(i,j)$

The characteristic path length L (sometimes also called diameter) is defined as the average distance over all vertex pairs:

$L = \frac{1}{N(N-1)/2} \sum_{i<j} d(i,j)$
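
For an unweighted graph these distance-based quantities can be obtained with a breadth-first search from every vertex. The sketch below assumes a connected graph and uses the edge list of the 7-vertex example (vertices renumbered 0..6); the helper names are illustrative.

```python
from collections import deque

# Edges of the 7-vertex example graph, vertices renumbered 0..6.
EDGES = [(0, 2), (0, 4), (1, 2), (2, 4), (3, 4), (3, 6), (4, 5), (5, 6)]


def adjacency_list(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_summary(n, edges):
    """Return (eccentricities, diameter, radius, characteristic path length)."""
    adj = adjacency_list(n, edges)
    dist = [bfs_distances(adj, i) for i in range(n)]
    ecc = [max(d.values()) for d in dist]
    total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return ecc, max(ecc), min(ecc), total / (n * (n - 1) / 2)


ecc, D, R, L = distance_summary(7, EDGES)
print(ecc, D, R, round(L, 3))
```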

Extensions

If weight or cost is assigned to each edge, then we get a weighted graph. In the calculation of lengths the weights are taken into account.

In a hypergraph more than two vertices can be connected by hyperedges.

If the edges are directed, then we have a directed graph or digraph. In-neighbors and out-neighbors, and in-degrees and out-degrees can be distinguished.

Random graphs

Graph theory was invented by Euler in the 18th century. The early work was concentrated on small graphs with a high degree of regularity.

Random-graph theory was introduced by Erdős and Rényi in the late 1950s. As complex networks often appear to be random, random-graph theory appears to be a useful tool in the study of large complex networks.

The Erdős-Rényi model

Pál Erdős (1913-1996)

Original model: Connect N nodes by M edges randomly.

Alternative model: Connect every pair of the N nodes with probability p.

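The two models (or ensembles) become equivalent in the thermodynamic limit $N \to \infty$ for

$p = \frac{M}{N(N-1)/2}$

[Figure: an example random graph with p = 1/6.]

The average degree of a node is

$\langle k \rangle = \frac{2M}{N} = p(N-1) \approx pN$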
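
The alternative (fixed p) model is easy to simulate. The sketch below is illustrative only: the function names, the seed, and the values N = 1000 and p = 0.01 are assumptions chosen for the example. It compares the measured average degree with p(N-1).

```python
import random


def erdos_renyi(n, p, seed=0):
    """Connect every pair of the n nodes independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def mean_degree(n, edges):
    """Average degree <k> = 2M / N."""
    return 2 * len(edges) / n


N, P = 1000, 0.01
edges = erdos_renyi(N, P)
print("measured <k>:", mean_degree(N, edges))
print("expected <k>:", P * (N - 1))
```

The measured value fluctuates around p(N-1) from one realization to the next; the relative fluctuation shrinks as N grows.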

The Erdős-Rényi model

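Degree distribution:

$P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \approx \frac{\langle k \rangle^k}{k!} e^{-\langle k \rangle}$ (Poisson distribution)

The characteristic path length can be estimated from

$\langle k \rangle^L \approx N$, resulting in $L \approx \frac{\log N}{\log \langle k \rangle}$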
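
As a rough numerical illustration, the binomial degree distribution can be compared with its Poisson approximation, and the path-length estimate evaluated for given N and ⟨k⟩. The parameter values below are assumptions chosen only for the example.

```python
import math


def binomial_degree_pmf(k, n, p):
    """Exact degree distribution of the fixed-p model: Binomial(n - 1, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pmf(k, mean):
    """Poisson approximation with the same mean degree."""
    return mean**k * math.exp(-mean) / math.factorial(k)


def path_length_estimate(n, mean_degree):
    """Estimate L from <k>^L ~ N, i.e. L ~ log N / log <k>."""
    return math.log(n) / math.log(mean_degree)


N, P = 10_000, 0.001
mean_k = P * (N - 1)
max_gap = max(
    abs(binomial_degree_pmf(k, N, P) - poisson_pmf(k, mean_k)) for k in range(40)
)
print("largest pointwise difference:", max_gap)
print("L estimate for N = 1000, <k> = 10:", path_length_estimate(1000, 10))
```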

The greatest discovery of Erdős and Rényi was that many network properties appear suddenly as p is increased.

As an example let us consider the occurrence of an arbitrary subgraph consisting of n vertices and m edges.

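Their number can be estimated as:

$\binom{N}{n} \frac{n!}{a} p^m \approx \frac{N^n p^m}{a}$

where a is the number of automorphisms of the subgraph (vertex relabelings that map it onto itself).

Thus the critical probability of appearance is: $p_c(N) = c N^{-n/m}$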

A giant (percolating) component also appears suddenly.

This can easily be understood with the help of a branching process:

1. Let us start to grow a component from a seed vertex by randomly selecting its neighbors from the remaining N-1 vertices with probability p.

2. Let us repeat this process with the newly selected vertices as seeds, over and over again.

3. The branching process stops when no new neighbor is selected.

If p < pc = 1/N then the expected number of new neighbors is smaller than the number of seeds, and the branching process quickly comes to a halt.

If, on the other hand, p > pc = 1/N (i.e., ⟨k⟩ > 1), then the component can easily grow to infinity.

The giant component has a tree-like structure.
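
The sudden appearance of the giant component can be seen in a small simulation. This is a sketch under stated assumptions: it uses a union-find structure instead of the branching process described above, and the function name, seed, and sizes are illustrative.

```python
import random


def largest_component_fraction(n, mean_degree, seed=0):
    """Relative size of the largest component of a G(n, p) graph with p = <k>/n."""
    rng = random.Random(seed)
    p = mean_degree / n
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Find the root of x, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri  # attach the smaller tree to the larger one
                    size[ri] += size[rj]
    return max(size[find(i)] for i in range(n)) / n


below = largest_component_fraction(1000, 0.5)  # p < 1/N
above = largest_component_fraction(1000, 2.0)  # p > 1/N
print("largest component fraction below threshold:", below)
print("largest component fraction above threshold:", above)
```

Below the threshold the largest component contains only a small number of vertices; above it a finite fraction of all vertices belongs to one component.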

Are complex networks really random?

No! One big difference is that nodes are often clustered, i.e., neighbors of a node tend to be connected to each other.

Clustering coefficient:

$C_i = \frac{\#\text{ of links between the neighbors of } i}{k_i (k_i - 1)/2}$
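
A minimal sketch of the clustering coefficient, evaluated on a hypothetical 4-vertex graph (a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2). Setting C_i = 0 for vertices with fewer than two neighbors is a convention assumed here; the formula itself is undefined in that case.

```python
# Hypothetical example: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def clustering_coefficients(adj):
    """C_i = (# links between the neighbors of i) / (k_i (k_i - 1) / 2)."""
    n = len(adj)
    result = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            result.append(0.0)  # convention: undefined case set to zero
            continue
        links = sum(adj[u][v] for a, u in enumerate(nbrs) for v in nbrs[a + 1:])
        result.append(links / (k * (k - 1) / 2))
    return result


C = clustering_coefficients(A)
print(C)
```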

Small worlds: Networks are clustered [C >> Crand = p] but have a small characteristic path length L.

Network        C          Crand     L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Metabolic      0.32       0.026     2.9        282
Foodweb        0.22       0.06      2.43       134
C. elegans     0.28       0.05      2.65       282

Crand = p: probability that the neighbors are connected.

Watts-Strogatz model

[Watts and Strogatz, Nature 393, 440 (1998)]

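Watts-Strogatz model

n nodes per block:

$L(n) \approx n \, \frac{\log(N/n)}{\log(pn)}$

Optimal n:

$\frac{dL}{dn} = 0$

$\frac{\log(N/n)}{\log(pn)} - \frac{1}{\log(pn)} - \frac{\log(N/n)}{\log^2(pn)} = 0$

$\log(N/n) - 1 - \frac{\log(N/n)}{\log(pn)} = 0$

$\log(pn) \approx 1 \;\Rightarrow\; n \sim 1/p$

$L \approx \frac{\log(pN)}{p}$ if $N \gg 1/p$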

World Wide Web

800 million documents (S. Lawrence, 1999)

ROBOT: collects all URLs found in a document and follows them recursively

Nodes: WWW documents
Links: URL links

R. Albert, H. Jeong, A.-L. Barabási, Nature 401, 130 (1999)

What can we expect for ER and WS networks?

$P(k=500) \sim 10^{-99}$

$N(k=500) \sim 10^{-90}$

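The results: Scale-free network

$P_{out}(k) \sim k^{-\gamma_{out}}$, $\gamma_{out} = 2.45$

$P_{in}(k) \sim k^{-\gamma_{in}}$, $\gamma_{in} = 2.1$

$P(k=500) \sim 10^{-6}$

$N(k=500) \sim 10^{3}$

$\langle k \rangle \sim 6$

$N_{WWW} \sim 10^{9}$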

INTERNET BACKBONE

(Faloutsos, Faloutsos and Faloutsos, 1999)

Nodes: computers, routers
Links: physical lines

ACTOR CONNECTIVITIES

Nodes: actors
Links: cast jointly

N = 212,250 actors, ⟨k⟩ = 28.78

$P(k) \sim k^{-\gamma}$, $\gamma = 2.3$

SCIENCE CITATION INDEX

Nodes: papers
Links: citations

1736 PRL papers (1988)

$P(k) \sim k^{-\gamma}$ ($\gamma = 3$)

(S. Redner, 1998)

SCIENCE COAUTHORSHIP

Nodes: scientists (authors)
Links: joint publications

M: math
NS: neuroscience

(Newman, 2000; Barabasi et al., 2001)

Online communities

Nodes: online users
Links: email contacts

Kiel University log files, 112 days, N = 59,912 nodes

Ebel, Mielsch, Bornholdt, PRE 2002.

Food Web

Nodes: trophic species
Links: trophic interactions

R.J. Williams, N.D. Martinez, Nature (2000); R. Sole (cond-mat/0011195)

http://online.sfsu.edu/~webhead/lrlbest.jpg

Sex-web

Nodes: people (females; males)
Links: sexual relationships

Liljeros et al. Nature 2001

4781 Swedes; 18-74; 59% response rate.

Most real world networks have the same internal structure:

Scale-free networks

Why?

What does it mean?

SCALE-FREE NETWORKS

(1) The number of nodes (N) is NOT fixed. Networks continuously expand by the addition of new nodes.

Examples:
WWW: addition of new documents
Citation: publication of new papers

(2) The attachment is NOT uniform. A node is linked with higher probability to a node that already has a large number of links.

Examples:
WWW: new documents link to well-known sites (CNN, Yahoo, New York Times, etc.)
Citation: well-cited papers are more likely to be cited again


Scale-free model

(1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system).

(2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the degree k_i of that node:

$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$

$P(k) \sim k^{-3}$

A.-L. Barabási, R. Albert, Science 286, 509 (1999)
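
The two ingredients, growth and preferential attachment, can be sketched in a few lines. Several choices below are assumptions made for the example and are not taken from the lecture: the complete graph on m + 1 nodes used as the seed, the rule that the m targets of a new node are distinct, and all names.

```python
import random


def barabasi_albert(n_nodes, m, seed=0):
    """Grow a network by preferential attachment; returns the edge list."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `ends` once per edge end, so a uniform draw from
    # this list selects node i with probability k_i / sum_j k_j.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # m distinct targets (an assumption)
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


def degree_sequence(n_nodes, edges):
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = barabasi_albert(2000, 3)
deg = degree_sequence(2000, edges)
print("edges:", len(edges), "max degree:", max(deg), "mean degree:", sum(deg) / 2000)
```

Keeping a list of edge ends avoids recomputing the normalization Σ_j k_j at every step. The largest degree ends up far above the mean, which is the signature of the hubs.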

Mean Field Theory

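$\frac{d k_i}{dt} = m \Pi(k_i) = m \frac{k_i}{\sum_j k_j} = \frac{m k_i}{2mt} = \frac{k_i}{2t}$

$k_i(t) = m \left( \frac{t}{t_i} \right)^{\beta}$, with $\beta = \frac{1}{2}$ and the initial condition $k_i(t_i) = m$

$\hat{P}(k_i(t) < k) = \hat{P}\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right) = 1 - \frac{m^{1/\beta}}{k^{1/\beta}}$

$P(k) = \frac{d \hat{P}(k_i(t) < k)}{dk} = \frac{1}{\beta} \frac{m^{1/\beta}}{k^{1/\beta + 1}} \sim k^{-\gamma}$, with $\gamma = 1 + \frac{1}{\beta} = 3$

A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)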

Growth without preferential attachment

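$\frac{d k_i}{dt} = m \Pi = m \frac{1}{\sum_j 1} = \frac{m}{t}$

$k_i(t) = m \ln \frac{t}{t_i}$ (up to an additive constant; formally β = "0")

$\hat{P}(k_i(t) < k) = \hat{P}\left( t_i > t \, e^{-k/m} \right) = 1 - e^{-k/m}$

$P(k) = \frac{d \hat{P}(k_i(t) < k)}{dk} = \frac{1}{m} e^{-k/m}$ (formally γ = "1 + 1/β" → ∞)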

Preferential Attachment

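[Figure panels: Citation network; Internet]

$\Pi(k_i) \sim \frac{\partial k_i}{\partial t} \approx \frac{\Delta k_i}{\Delta t}$

For given Δt: Δk ~ Π(k)

$\kappa(k) = \sum_{k_i = 0}^{k} \Pi(k_i)$

(Jeong, Neda, A.-L. Barabási, cond-mat/0104131)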

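Extended Model

• prob. p: internal links
• prob. q: link deletion
• prob. 1-p-q: add node

$P(k) \sim k^{-\gamma(p,q,m)}$, $\gamma \in [1, \infty)$: the exponent is not universal

Network              γ
WWW (in)             2.1
Internet             2.5
Actor                2.3
Citation index       3
Sex web              3.5
Cellular network     2.1
Phone call network   2.1
Linguistics          2.8

$\langle k \rangle = \int_{k_{min}}^{\infty} k P(k)\, dk \sim \int_{k_{min}}^{\infty} k^{1-\gamma}\, dk = \infty$ if $\gamma \le 2$

$\langle k^2 \rangle = \int_{k_{min}}^{\infty} k^2 P(k)\, dk \sim \int_{k_{min}}^{\infty} k^{2-\gamma}\, dk = \infty$ if $\gamma \le 3$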

Other Models

Presence of a giant (percolating) component

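Branching process:

The probability that an edge leads to a vertex with degree k is:

$Q(k) = \frac{k P(k)}{\sum_{k'} k' P(k')} = \frac{k P(k)}{\langle k \rangle}$

The condition that the branching process prevails:

$\sum_k (k-1) Q(k) = \frac{\sum_k (k-1) k P(k)}{\langle k \rangle} = \frac{\langle k(k-1) \rangle}{\langle k \rangle} > 1$, i.e., $\frac{\langle k^2 \rangle}{\langle k \rangle} > 2$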
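
The branching condition ⟨k²⟩/⟨k⟩ > 2 is straightforward to evaluate for any degree distribution. In the sketch below a distribution is given as a dictionary mapping k to P(k); this representation and the function names are assumptions made for the example.

```python
def moment_ratio(degree_dist):
    """Return <k^2> / <k> for a degree distribution given as {k: P(k)}."""
    mean_k = sum(k * p for k, p in degree_dist.items())
    mean_k2 = sum(k * k * p for k, p in degree_dist.items())
    return mean_k2 / mean_k


def has_giant_component(degree_dist):
    """Branching condition: sum_k (k - 1) Q(k) > 1, i.e. <k^2> / <k> > 2."""
    return moment_ratio(degree_dist) > 2


print(has_giant_component({2: 1.0}))          # every vertex has degree 2
print(has_giant_component({3: 1.0}))          # every vertex has degree 3
print(has_giant_component({1: 0.5, 3: 0.5}))  # mixed degrees
```

For a Poisson distribution ⟨k²⟩ = ⟨k⟩² + ⟨k⟩, so the condition reduces to ⟨k⟩ > 1, in agreement with pc = 1/N for the Erdős-Rényi graph.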

Yeast protein network

Nodes: proteins
Links: physical interactions (binding)

P. Uetz, et al. Nature 403, 623-7 (2000).

C. elegans

Li et al. Science 2004

Drosophila M.

Giot et al. Science 2003

Origin of the scale-free topology of PPI networks: gene duplication

Proteins with more interactions are more likely to obtain new links: Π(k) ~ k (preferential attachment)

Wagner 2001; Vazquez et al. 2003; Sole et al. 2001; Rzhetsky & Gomez 2001; Qian et al. 2001; Bhan et al. 2002.

Metabolic network

The metabolic networks of organisms from all three domains of life are scale-free!

H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)

Archaea Bacteria Eukaryotes

Nodes: chemicals (substrates)
Links: biochemical reactions

Characterizing the links

Metabolism: Flux Balance Analysis (Palsson)
Metabolic flux for each reaction

Edwards, J. S. & Palsson, B. O., PNAS 97, 5528 (2000).
Edwards, J. S., Ibarra, R. U. & Palsson, B. O., Nat Biotechnol 19, 125 (2001).
Ibarra, R. U., Edwards, J. S. & Palsson, B. O., Nature 420, 186 (2002).

Steady state: $S \cdot v = 0$, where S is the stoichiometric matrix and v is the flux vector.

Maximize $c \cdot v$, where c is the unit vector in the direction of growth (biomass production).

Global flux organization in the E. coli metabolic network

E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, A.-L. Barabási, Nature, 2004; Goh et al., PRL 2002.

SUCC: Succinate uptake
GLU: Glutamate uptake

Central Metabolism, Emmerling et al., J Bacteriol 184, 152 (2002)

Inhomogeneity in the local flux distribution

~ $k^{-0.27}$

Mass flows along linear pathways

Robustness

Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns)

node failure

Robustness of scale-free networks

[Figure: S (from 0 to 1) versus the removed fraction f (from 0 to 1); curves for attacks, with threshold fc, and for failures.]

Albert, Jeong, Barabasi, Nature 406 378 (2000)

Cohen, Erez, ben-Avraham, Havlin, PRL 85, 4626 (2000)

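After random removal of a fraction f of the vertices, a vertex of original degree k_0 keeps k of its neighbors with probability:

$\binom{k_0}{k} (1-f)^k f^{k_0 - k}$

The new degree distribution:

$P(k) = \sum_{k_0 = k}^{\infty} P_0(k_0) \binom{k_0}{k} (1-f)^k f^{k_0 - k}$

$\langle k \rangle = \sum_{k} k P(k) = \langle k_0 \rangle (1-f)$

$\langle k(k-1) \rangle = \sum_{k} k(k-1) P(k) = \langle k_0 (k_0 - 1) \rangle (1-f)^2$

Percolation: $1 = \frac{\langle k(k-1) \rangle}{\langle k \rangle} = \frac{\langle k_0 (k_0 - 1) \rangle}{\langle k_0 \rangle} (1-f)$

Critical fraction: $f_c = 1 - \frac{\langle k_0 \rangle}{\langle k_0 (k_0 - 1) \rangle}$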

Absence of a critical percolation threshold for γ ≤ 3

Achilles’ Heel of complex networks

Internet

[Figure: failure vs. attack]

R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000)

Yeast protein network: lethality and topological position

Highly connected proteins are more essential (lethal)...

H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)

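Disease spreading in the susceptible-infected-susceptible (SIS) epidemic model

Rate of becoming infected by an infected neighbor: λ
Rate of recovery: 1

Mean-field approx. for “exponential” networks, where $k \approx \langle k \rangle$ (ρ(t) is the density of infected vertices):

$\frac{d\rho(t)}{dt} = -\rho(t) + \lambda \langle k \rangle \rho(t) [1 - \rho(t)]$

Steady state solution: $\rho = 1 - \frac{1}{\lambda \langle k \rangle}$

Epidemic threshold: $\lambda_c = \frac{1}{\langle k \rangle}$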

Pastor-Satorras and Vespignani, PRE 65, 036104 (2002)
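
The homogeneous mean-field equation dρ/dt = -ρ + λ⟨k⟩ρ(1 - ρ) can be integrated numerically to check the steady state and the threshold. The sketch below uses a plain forward-Euler scheme; the step size, initial density, and function names are assumptions made for the example.

```python
def sis_mean_field(lam, mean_k, rho0=0.01, dt=0.01, t_max=100.0):
    """Integrate d(rho)/dt = -rho + lam * <k> * rho * (1 - rho) with forward Euler."""
    rho = rho0
    for _ in range(int(t_max / dt)):
        rho += dt * (-rho + lam * mean_k * rho * (1 - rho))
    return rho


def steady_state(lam, mean_k):
    """Stationary density: 1 - 1 / (lam * <k>) above the threshold, 0 below it."""
    return max(0.0, 1 - 1 / (lam * mean_k))


for lam in (0.1, 0.5):
    print("lambda =", lam,
          "numerical:", sis_mean_field(lam, 4.0),
          "predicted:", steady_state(lam, 4.0))
```

With ⟨k⟩ = 4 the threshold is λ_c = 0.25, so the infection dies out for λ = 0.1 and saturates at ρ = 0.5 for λ = 0.5.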

SIS in complex networks

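Mean-field approximation:

$\frac{d\rho_k(t)}{dt} = -\rho_k(t) + \lambda k [1 - \rho_k(t)] \Theta(t)$

The probability that an edge leads to a vertex with degree k is:

$Q(k) = \frac{k P(k)}{\sum_{k'} k' P(k')} = \frac{k P(k)}{\langle k \rangle}$

The probability that a neighbor is infected:

$\Theta(t) = \sum_k Q(k) \rho_k(t) = \frac{\sum_k k P(k) \rho_k(t)}{\langle k \rangle}$

Steady state solution:

$\rho_k = \frac{\lambda k \Theta}{1 + \lambda k \Theta}$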

SIS in complex networks

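In the steady state Θ has to satisfy:

$\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta}$

This has a nontrivial solution when:

$\frac{d}{d\Theta} \left[ \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta} \right]_{\Theta = 0} \ge 1$

from which we get that the epidemic threshold is:

$\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle}$

Uniform immunization with probability g [λ → λ(1 - g)] does not help in scale-free networks if γ ≤ 3.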
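
The self-consistency condition for Θ can be solved by simple fixed-point iteration, and the result compared with the threshold λ_c = ⟨k⟩/⟨k²⟩. This is a sketch: the dictionary representation {k: P(k)}, the starting value, and the iteration count are assumptions for the example.

```python
def epidemic_threshold(degree_dist):
    """lambda_c = <k> / <k^2> for a degree distribution given as {k: P(k)}."""
    mean_k = sum(k * p for k, p in degree_dist.items())
    mean_k2 = sum(k * k * p for k, p in degree_dist.items())
    return mean_k / mean_k2


def solve_theta(degree_dist, lam, iterations=2000):
    """Iterate Theta -> (1/<k>) sum_k k P(k) lam k Theta / (1 + lam k Theta)."""
    mean_k = sum(k * p for k, p in degree_dist.items())
    theta = 0.5  # arbitrary nonzero starting point
    for _ in range(iterations):
        theta = sum(
            k * p * lam * k * theta / (1 + lam * k * theta)
            for k, p in degree_dist.items()
        ) / mean_k
    return theta


regular = {4: 1.0}  # every vertex has degree 4
print("threshold:", epidemic_threshold(regular))
print("Theta above threshold:", solve_theta(regular, 0.5))
print("Theta below threshold:", solve_theta(regular, 0.2))
```

For the degree-4 example the nontrivial solution is Θ = 1 - 1/(4λ), the same as in the homogeneous mean-field case. Distributions with a large ⟨k²⟩ push the threshold toward zero.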

Non-uniform immunization of complex networks

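If $\lambda k (1 - g_k) = \tilde{\lambda} = \text{const}$, i.e., when $g_k = 1 - \frac{\tilde{\lambda}}{\lambda k}$, then

$\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\tilde{\lambda} \Theta}{1 + \tilde{\lambda} \Theta} = \frac{\tilde{\lambda} \Theta}{1 + \tilde{\lambda} \Theta}$

$\Theta = 1 - \frac{1}{\tilde{\lambda}}$

Thus, the epidemic threshold is reintroduced: $\tilde{\lambda}_c = 1$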

Motifs

Motifs: Subgraphs that have a significantly higher density in the real network than in the randomized version of the studied network.

Randomized networks: Ensemble of maximally random networks preserving the degree distribution of the original network.

Function is often carried out by subnetworks, rather than by single components.

R. Milo et al., Science 298, 824-827 (2002)

Three-node connected subgraphs

Why do we have motifs?

Hypothesis: they are dynamically desirable “building blocks”.

The feed-forward (FF) motif is a noise filter.

Communities: “densely connected subgraphs”

Traditional method: hierarchical clustering (agglomerative method)

All edges are removed, and then added back one by one in decreasing order of their “strengths”.

Communities are defined as the components that form in this process.

Dendrogram:

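The strength of the relationship between any pair of vertices can, e.g., be defined as

$S = \sum_{l=0}^{\infty} (\alpha A)^l = [I - \alpha A]^{-1}$, where $\alpha < \lambda_{max}^{-1}$

(λ_max is the largest eigenvalue of A).

The matrix $A^l$ contains the number of walks with length l between the vertex pairs.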
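
The walk-counting property of the powers of A can be verified on a small graph. The sketch below uses a hypothetical 4-vertex example (a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2) and plain nested lists for the matrices.

```python
# Hypothetical example: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(X, l):
    """X raised to the integer power l >= 0 (l = 0 gives the identity)."""
    n = len(X)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(l):
        result = matmul(result, X)
    return result


A2 = matrix_power(A, 2)
A3 = matrix_power(A, 3)
print("closed walks of length 2 per vertex:", [A2[i][i] for i in range(4)])
print("closed walks of length 3 in total:", sum(A3[i][i] for i in range(4)))
```

The diagonal of A² reproduces the degrees (step to a neighbor and back), and the trace of A³ is six times the number of triangles, since each triangle can be traversed from three starting vertices in two directions.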

Girvan-Newman method (divisive method)

It also results in a dendrogram, by cutting the edges one by one. In each step the edge with the highest “betweenness centrality” (BC) is removed.

The BC of an edge is the number of shortest paths between all pairs of vertices that use this edge.

Girvan and Newman, PNAS 99, 7821 (2002)

Modularity

When should one stop with the agglomeration/division?

Newman and Girvan, PRE 69, 026113 (2004)

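At the maximal modularity:

$Q = \sum_g \left( e_{gg} - a_g^2 \right)$

$e_{gh} = \frac{1}{2} \, \frac{\#\text{ edges between groups } g \text{ and } h}{M}$ if $g \ne h$

$e_{gg} = \frac{\#\text{ edges in group } g}{M}$

$a_g = \sum_h e_{gh}$ (fraction of edge ends being in group g)

Q is the fraction of edges within the groups minus the fraction expected in the randomized network.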
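
A minimal sketch of the modularity for a given partition. The example graph (two triangles joined by a single edge) and the function name are assumptions made for illustration.

```python
def modularity(edges, group_of):
    """Q = sum_g (e_gg - a_g^2) for an edge list and a {vertex: group} mapping."""
    M = len(edges)
    inside = {}  # number of edges with both ends in group g
    ends = {}    # number of edge ends in group g
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        ends[gu] = ends.get(gu, 0) + 1
        ends[gv] = ends.get(gv, 0) + 1
        if gu == gv:
            inside[gu] = inside.get(gu, 0) + 1
    # e_gg = inside / M, a_g = ends / (2M)
    return sum(inside.get(g, 0) / M - (ends[g] / (2 * M)) ** 2 for g in ends)


# Hypothetical example: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}

print("two triangles:", modularity(EDGES, two_groups))
print("single group:", modularity(EDGES, one_group))
```

Putting everything into one group always gives Q = 0, while splitting the example into its two triangles gives Q = 5/14.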

Potts model

Minimization of the Hamiltonian:

Reichardt and Bornholdt, PRL 93, 218701 (2004)

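$H = -J \sum_{(i,j) \in E} \delta_{\sigma_i, \sigma_j} + \gamma \sum_{s=1}^{q} \frac{n_s (n_s - 1)}{2} = -\sum_{i<j} \left( J A_{ij} - \gamma \right) \delta_{\sigma_i, \sigma_j}$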

Clique percolation method (CPM)

Most real networks are characterized by overlapping and nested communities.

Divisive/agglomerative methods fail to identify the communities when overlaps are significant.

Derényi, Palla, and Vicsek, Phys. Rev. Lett. 94, 160202 (2005)

Palla, Derényi, Farkas, and Vicsek, Nature 435, 814-818 (2005)

Advantages of this method:

• local,
• allows overlaps,
• density (not distance) based,
• produces no cut-nodes, …

An example of overlapping k-clique communities for k=4:

k-cliques are complete subgraphs of size k:

k = 2 k = 3 k = 4 k = 5

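We define a community as a k-clique percolation cluster: the union of all k-cliques that can be reached from one another through a series of adjacent k-cliques (two k-cliques are adjacent if they share k-1 vertices).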
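
For very small graphs the definition can be applied by brute force: list all k-cliques, join those that share k-1 vertices, and take the union of the vertices in each cluster. The sketch below is only an illustration of the definition, not the algorithm used in the CFinder software; its exhaustive clique search is feasible only for tiny graphs.

```python
from itertools import combinations


def k_clique_communities(nodes, edges, k):
    """Return the k-clique percolation clusters as a list of vertex sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Brute force: every k-subset whose vertices are pairwise adjacent.
    cliques = [frozenset(c) for c in combinations(sorted(nodes), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Two k-cliques are adjacent if they share k - 1 vertices.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(groups.values(), key=sorted)


# Two triangles sharing only vertex 2: two overlapping communities for k = 3.
bow_tie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
# Two triangles sharing the edge 1-2: a single community for k = 3.
diamond = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(k_clique_communities(range(5), bow_tie, 3))
print(k_clique_communities(range(4), diamond, 3))
```

In the first example vertex 2 belongs to both communities, which is the kind of overlap that divisive and agglomerative methods cannot represent.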

Studied systems:

• Co-authorship network: Los Alamos cond-mat archive, 30,739 nodes and 136,065 links

• Word association network: South Florida Free Association norms list, 10,617 nodes and 63,788 links

• Protein-protein interaction network: DIP core list of the yeast S. cerevisiae, 2,609 nodes and 6,355 links

Links are usually weighted (w_ij). For each value of k (typically k = 3, 4, 5) a threshold weight can be introduced.

(Note that there is a critical threshold at which a giant cluster appears. Optimally the threshold weight should be chosen close to this critical value.)

Web of communities for the protein interaction network of yeast

links represent overlaps between the communities

Community statistics

community size distribution

community degree distribution

overlap size distr. membership number distr.

Clique percolation in an ER graph

Branching process:

$(k-1) N p_c^{k-1} = 1$, i.e., $p_c(k) = [(k-1) N]^{-1/(k-1)}$
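
The threshold condition (k-1) N p_c^(k-1) = 1 can be solved for p_c and evaluated for a few clique sizes. The function name and the value N = 1000 are assumptions made for the example.

```python
def clique_percolation_threshold(k, n):
    """Solve (k - 1) * n * p_c**(k - 1) = 1 for p_c."""
    return ((k - 1) * n) ** (-1.0 / (k - 1))


N = 1000
for k in (2, 3, 4, 5):
    print("k =", k, "p_c =", clique_percolation_threshold(k, N))
```

For k = 2 a k-clique is a single edge, and the expression reduces to p_c = 1/N, the threshold for the appearance of the ordinary giant component.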

Dedicated web page for the CPM (software, papers, data):

http://www.cfinder.org/

Some review papers:

Albert and Barabasi, Rev. Mod. Phys. 74, 47 (2002).

Dorogovtsev and Mendes, Adv. Phys. 51, 1079 (2002).

Useful web page with papers, data, and ppt presentations:

http://www.nd.edu/~networks/ (Many of the slides of this course have been “borrowed” from here.)
