COMS 6998-06 Network Theory Week 2: September 15, 2010. Dragomir R. Radev Wednesdays, 6:10-8 PM 325 Pupin Terrace Fall 2010. (3) Random graphs. Statistical analysis of networks. We want to be able to describe the behavior of networks under certain assumptions. - PowerPoint PPT Presentation
COMS 6998-06 Network Theory
Week 2: September 15, 2010
Dragomir R. Radev
Wednesdays, 6:10-8 PM
325 Pupin Terrace
Fall 2010
(3) Random graphs
Statistical analysis of networks
• We want to be able to describe the behavior of networks under certain assumptions.
• The behavior is described by the diameter, clustering coefficient, degree distribution, size of the largest connected component, the presence and count of complete subgraphs, etc.
• For statistical analysis, we need to introduce the concept of a random graph.
Erdos-Renyi model
• A very simple model with several variants.
• We fix n and connect each candidate edge with probability p. This defines an ensemble $G_{n,p}$.
• The two examples below are specific instances of $G_{10,0.2}$.
In other models, m is fixed. There are also versions in which some graphs are more likely than others, etc.
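As a minimal sketch of the fixed-n, fixed-p variant, the snippet below draws one graph from the ensemble by flipping an independent coin for every candidate edge. The function name `sample_gnp` and the `seed` parameter are illustrative choices, not something the slides define.

```python
import itertools
import random


def sample_gnp(n, p, seed=None):
    """Return the edge list of one graph drawn from G(n, p)."""
    rng = random.Random(seed)
    # Every unordered pair of distinct nodes is a candidate edge,
    # and each one is kept independently with probability p.
    return [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


# Two independent instances of G(10, 0.2), as on the slide.
for seed in (1, 2):
    edges = sample_gnp(10, 0.2, seed=seed)
    print(f"seed={seed}: {len(edges)} edges -> {edges}")
```

Different seeds give different members of the same ensemble, which is why properties of E-R graphs are stated as averages or distributions.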
Try Pajek
Erdos-Renyi model
• We are interested in the computation of specific properties of E-R random graphs.
• The number of candidate edges is: $\binom{n}{2} = \frac{n(n-1)}{2}$
• The actual number of edges m is on average: $\langle m \rangle = p\,\frac{n(n-1)}{2}$
• We will look at the actual distribution in a bit.
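A quick numerical check of the two counts, using the n = 10, p = 0.2 ensemble from the previous slide: there are n(n-1)/2 = 45 candidate edges, so the expected number of edges is 0.2 × 45 = 9. The helper names and the number of trials below are assumptions made for this sketch.

```python
import random
from math import comb


def expected_edges(n, p):
    """Expected edge count in G(n, p): p times the number of candidate edges."""
    return p * comb(n, 2)


def simulated_mean_edges(n, p, trials=2000, seed=0):
    """Average edge count over many sampled graphs (Monte Carlo estimate)."""
    rng = random.Random(seed)
    pairs = comb(n, 2)
    total = 0
    for _ in range(trials):
        # One coin flip per candidate edge; count the successes.
        total += sum(rng.random() < p for _ in range(pairs))
    return total / trials


print("candidate edges:", comb(10, 2))
print("expected edges: ", expected_edges(10, 0.2))
print("simulated mean: ", simulated_mean_edges(10, 0.2))
```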
Properties
• The expected value of a Poisson-distributed random variable is equal to λ and so is its variance.
• The mode of a Poisson-distributed random variable with non-integer λ is equal to floor(λ), which is the largest integer less than or equal to λ. When λ is a positive integer, the modes are λ and λ − 1.
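The mode rule can be checked directly against the Poisson probabilities. This is a small sketch; `poisson_pmf` and `poisson_modes` are hypothetical helper names introduced for illustration.

```python
from math import exp, factorial, floor


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return lam**k * exp(-lam) / factorial(k)


def poisson_modes(lam):
    """Modes according to the rule on the slide (lam must be positive)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if float(lam).is_integer():
        # Integer parameter: two equally likely modes.
        return [int(lam) - 1, int(lam)]
    return [floor(lam)]


for lam in (3.7, 4):
    probs = [poisson_pmf(k, lam) for k in range(12)]
    print(lam, poisson_modes(lam), [round(q, 4) for q in probs[:7]])
```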
Degree distribution
• The probability p(k) that a node has degree k is binomial: $p(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$
• In practice, this is the Poisson distribution for large n ($n \gg kz$): $p(k) = \frac{\lambda^k e^{-\lambda}}{k!}$, where $\lambda = z$ is the mean degree.
• Average degree = $\lambda$ = 2m/n = p(n-1) ≈ pn
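To see how close the two distributions are, the sketch below evaluates the binomial degree distribution, C(n-1, k) p^k (1-p)^(n-1-k), next to the Poisson form λ^k e^(-λ) / k! with λ = p(n-1). The values n = 10,000 and λ = 4 are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binomial_degree_pmf(k, n, p):
    """Exact probability that a node in G(n, p) has degree k."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pmf(k, lam):
    """Poisson approximation with mean degree lam."""
    return lam**k * exp(-lam) / factorial(k)


n, lam = 10_000, 4.0
p = lam / (n - 1)  # chosen so that the mean degree p(n-1) equals lam
for k in range(9):
    b = binomial_degree_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    print(f"k={k}  binomial={b:.6f}  poisson={q:.6f}  diff={abs(b - q):.2e}")
```

For small n the gap is visible; it shrinks as n grows with the mean degree held fixed.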
Giant component size
• Let v be the number of nodes that are not in the giant component. Then u = v/n is the fraction of the graph outside of the giant component.
• If a node is outside of the giant component, its k neighbors are too. The probability of this happening is $u^k$.
• Averaging over the Poisson degree distribution: $u = \sum_{k=0}^{\infty} p(k)\,u^k = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda u)^k}{k!} = e^{\lambda(u-1)}$
• Let S = 1 - u. We now have $S = 1 - e^{-\lambda S}$.
• For $\lambda < 1$, the only non-negative solution is S = 0.
• For $\lambda > 1$ (after the phase transition), S = 0 is still a solution, but there is also a non-zero solution, which is the size of the giant component (as a fraction of the graph).
• At the phase transition, the component sizes are distributed according to a power law with exponent 5/2.
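The self-consistency equation S = 1 - e^(-λS), with λ the mean degree, has no closed-form solution, but it is easy to solve numerically. The sketch below uses plain fixed-point iteration started at S = 1; that solver choice, the tolerance and the function name are assumptions of this example, not part of the slides.

```python
from math import exp


def giant_component_fraction(lam, tol=1e-12, max_iter=100_000):
    """Iterate S <- 1 - exp(-lam * S) starting from S = 1.

    Starting at 1 keeps the iteration away from the trivial root S = 0
    whenever a non-zero root exists (lam > 1). Convergence is slow
    very close to lam = 1.
    """
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - exp(-lam * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


for lam in (0.5, 1.5, 2.0, 4.0):
    print(f"mean degree {lam}: S = {giant_component_fraction(lam):.4f}")
```

Below the transition the iteration collapses to zero; above it, S grows quickly with the mean degree.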
Giant component size
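• Similarly, one can prove that the mean size $\langle s \rangle$ of the component containing a randomly chosen node outside the giant component is: $\langle s \rangle = \frac{1}{1 - \lambda + \lambda S}$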
[Newman 2003]
Diameter
• A given vertex i has $N_i^1$ first neighbors. The expected value of this number is $\lambda$.
• But we also know that $\lambda \approx pn$.
• Now move to $N_i^2$. This is the number of second neighbors of i. Let's make the assumption that these are the neighbors of the first neighbors. So, $N_i^2 \approx \lambda N_i^1 \approx \lambda^2$
• What does this remind you of?
• When must the procedure end?
Diameter (cont’d)
At all distances: $N_i^D \approx \lambda^D$
For D equal to the diameter of the graph: $n \approx N_i^1 + N_i^2 + \dots + N_i^D \approx \lambda^D$
In other words (after taking a logarithm): $D \approx \frac{\log n}{\log \lambda}$
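The estimate D ≈ log n / log λ (λ being the mean degree) can be evaluated for a few sizes to see how slowly it grows. This is only the rough, tree-like approximation derived above; the function name and the sample values are illustrative.

```python
from math import log


def estimated_diameter(n, mean_degree):
    """Rough E-R diameter estimate: log(n) / log(mean degree)."""
    if mean_degree <= 1:
        raise ValueError("the estimate assumes a mean degree above 1")
    return log(n) / log(mean_degree)


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n={n:>13,}  mean degree 10  D ~ {estimated_diameter(n, 10):.2f}")
```

Multiplying n by a thousand only adds a constant to D, which is the small-world behavior discussed on the next slide.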
Are E-R graphs realistic?
• They have small world properties (diameter is logarithmic in the size of the graph)
• But they have a low clustering coefficient. For example, for Internet autonomous systems, compare the observed 0.30 with 0.0004 for a comparable random graph [Pastor-Satorras and Vespignani].
• And unrealistic degree distributions
• Not to mention skinny tails
Clustering coefficient
• Given a vertex i and two of its neighbors j and k, what is the probability that the graph contains an edge between j and k?
• $C_i$ = #triangles at i / #triples at i
• C = average over all $C_i$
• Typical value in real graphs can be as high as 50% [Newman 2002].
• In random graphs, C = p (ignoring the fact that j and k share a neighbor, i).
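The definition can be tried out on a sampled E-R graph. In this sketch, a "triple at i" is taken to be a pair of neighbors of i, and nodes with fewer than two neighbors are given C_i = 0; both conventions, like the helper names and the choice n = 300, p = 0.1, are assumptions of the example.

```python
import itertools
import random


def gnp_adjacency(n, p, seed=None):
    """Sample G(n, p) and return it as a dict of neighbor sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, i):
    """C_i = (# connected neighbor pairs of i) / (# neighbor pairs of i)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: no triples centered at i
    triangles = sum(
        1 for j, l in itertools.combinations(neighbors, 2) if l in adj[j]
    )
    return triangles / (k * (k - 1) / 2)


def average_clustering(adj):
    """C = mean of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


graph = gnp_adjacency(300, 0.1, seed=42)
print("p = 0.1, measured C =", round(average_clustering(graph), 4))
```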
Some real networks
• From Newman 2002 (C is the measured clustering coefficient):

Network                       n          Mean degree z   C       C for random graph
Internet (AS level)           6,374      3.8             0.24    0.00060
WWW (sites)                   153,127    35.2            0.11    0.00023
Power grid                    4,941      2.7             0.080   0.00054
Biology collaborations        1,520,251  15.5            0.081   0.000010
Mathematics collaborations    253,339    3.9             0.15    0.000015
Film actor collaborations     449,913    113.4           0.20    0.00025
Company directors             7,673      14.4            0.59    0.0019
Word co-occurrence            460,902    70.1            0.44    0.00015
Neural network                282        14.0            0.28    0.049
Metabolic network             315        28.3            0.59    0.090
Food web                      134        8.7             0.22    0.065
[Newman 2002]
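The random-graph column appears consistent with C = p ≈ z/n, that is, the mean degree divided by the number of nodes. The sketch below recomputes z/n from the listed n and z and compares it with the listed value; this is an observation about the numbers in the table, not a statement of how they were originally computed.

```python
# (network, n, mean degree z, measured C, listed C for random graph)
ROWS = [
    ("Internet (AS level)", 6374, 3.8, 0.24, 0.00060),
    ("WWW (sites)", 153127, 35.2, 0.11, 0.00023),
    ("Power grid", 4941, 2.7, 0.080, 0.00054),
    ("Biology collaborations", 1520251, 15.5, 0.081, 0.000010),
    ("Mathematics collaborations", 253339, 3.9, 0.15, 0.000015),
    ("Film actor collaborations", 449913, 113.4, 0.20, 0.00025),
    ("Company directors", 7673, 14.4, 0.59, 0.0019),
    ("Word co-occurrence", 460902, 70.1, 0.44, 0.00015),
    ("Neural network", 282, 14.0, 0.28, 0.049),
    ("Metabolic network", 315, 28.3, 0.59, 0.090),
    ("Food web", 134, 8.7, 0.22, 0.065),
]


def random_graph_clustering(n, z):
    """E-R prediction: C = p, with p approximated by z / n."""
    return z / n


for name, n, z, c_real, c_listed in ROWS:
    c_pred = random_graph_clustering(n, z)
    print(f"{name:<28} z/n={c_pred:.2e}  listed={c_listed:.2e}  "
          f"real/random={c_real / c_listed:,.0f}x")
```

The last printed column shows how far real networks sit above the random-graph baseline, by several orders of magnitude for the large ones.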
Graphs with predetermined degree sequences
• Bender and Canfield introduced this concept.
• For a given degree sequence, give the same statistical weight to all graphs in the ensemble.
• Generate a random sequence in proportion to the predefined sequence
• Note that the sum of degrees must be even.
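One common way to sample a graph with a prescribed degree sequence is stub matching: give node i as many "stubs" as its degree, shuffle all stubs, and pair them up. The sketch below is that technique under stated assumptions, not necessarily the construction the slides have in mind: it can produce self-loops and repeated edges, so it samples multigraphs, and restricting to simple graphs would need an extra rejection step. The function name is hypothetical.

```python
import random


def configuration_model(degrees, seed=None):
    """Pair up degree 'stubs' uniformly at random; returns a list of edges.

    The result may contain self-loops (i, i) and repeated edges.
    """
    if sum(degrees) % 2 != 0:
        # Every edge uses two stubs, so the degree sum must be even.
        raise ValueError("the sum of degrees must be even")
    rng = random.Random(seed)
    # Node i appears degrees[i] times in the stub list.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list form an edge.
    return list(zip(stubs[::2], stubs[1::2]))


degrees = [3, 2, 2, 1]
print(configuration_model(degrees, seed=0))
```

The even-sum check is the condition noted in the last bullet above.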
(4) Software
List of packages
• Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
• Jung: http://jung.sourceforge.net/
• Guess: http://graphexploration.cond.org/
• Networkx: https://networkx.lanl.gov/wiki
• Pynetconv: http://pynetconv.sourceforge.net/
• Clairlib: http://www.clairlib.org
• UCINET