Eight Formalisms for Defining Graph Models

Models of Graphs

Jérôme Kunegis, Oberseminar, 2013-08-29


Erdős–Rényi

Each edge has probability p of existing

$P(G) = p^{m} (1 - p)^{M - m}$

m = number of edges, M = maximum possible number of edges
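A minimal Python sketch of this model, assuming an undirected simple graph on nodes 0..n−1 (so M = n(n − 1)/2). The names `sample_er` and `er_log_prob` are illustrative, not taken from the talk. The log probability is used because P(G) itself underflows for all but tiny graphs.

```python
import itertools
import math
import random


def sample_er(n, p, seed=None):
    """Sample an E–R graph on nodes 0..n-1: each of the M = n(n-1)/2 pairs
    is included independently with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p}


def er_log_prob(n, edges, p):
    """Natural log of P(G) = p^m (1 - p)^(M - m); requires 0 < p < 1."""
    M = n * (n - 1) // 2
    m = len(edges)
    return m * math.log(p) + (M - m) * math.log(1 - p)


g = sample_er(10, 0.3, seed=1)
print(len(g), er_log_prob(10, g, 0.3))
```

Only m enters `er_log_prob`: every graph with the same number of edges is equally probable under E–R.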


Barabási–Albert

An edge appears with probability proportional to the degree of the node it connects

$P(\{u, v\}) \propto d(u)$

d(u) = degree of node u


What Everybody Thinks

My network model leads to graphs that have the same properties as actual social networks

Hmmm...


$P(G) = p^{m} (1 - p)^{M - m}$

$P(\{u, v\}) \propto d(u)$

Why don't you use the same formalism??

Comparison


Formalisms for Graph Models

(1) Specify a graph generation algorithm
(2) Specify a graph growth algorithm
(3) Specify the probability of any graph
(4) Specify the probability of any edge
(5) Specify the probability of any event
(6) Specify a score for node pairs
(7) Matrix model
(8) Graph compression


(1) Specify a Graph Generation Algorithm

STEP 1: Specify rules for generating a graph

Take a lattice, and rewire a certain proportion of edges randomly

EXAMPLE: small-world model (Watts & Strogatz 1998)

STEP 2: Generate random graph(s)

STEP 3: Compare with actual networks

Hey, a small diameter and large clustering coefficient!

●Not generative
●Not probabilistic
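The three steps can be sketched in Python. This assumes one common variant of the rewiring step (each lattice edge independently gets a new, uniformly random far endpoint with probability `beta`, avoiding self-loops and duplicate edges) and uses the average shortest-path length as a stand-in for the diameter; all function names are hypothetical.

```python
import random
from collections import deque


def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice: each node is linked to its k nearest neighbours on each side.
    Each lattice edge (u, u + j) is then rewired with probability beta."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            if v in adj[u] and rng.random() < beta:
                # New far endpoint: not u and not already a neighbour,
                # so the graph stays simple and the edge count is unchanged.
                choices = [w for w in range(n) if w != u and w not in adj[u]]
                if choices:
                    w = rng.choice(choices)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = watts_strogatz(200, 3, 0.0, seed=0)
small_world = watts_strogatz(200, 3, 0.1, seed=0)
for name, g in (("lattice", lattice), ("rewired", small_world)):
    print(name, round(avg_clustering(g), 3), round(avg_path_length(g), 2))
```

Rewiring a small fraction of edges keeps most of the lattice's clustering while creating shortcuts that shrink path lengths, which is the comparison STEP 3 looks for.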


(2) Specify a Graph Growth Algorithm

An edge appears with probability proportional to the degree of the node it connects with probability p, and uniformly at random with probability (1 − p)

STEP 1: Specify exact growth rules

STEP 2: Generate random graph(s)

STEP 3: Compare with actual networks

Look, a power law!

EXAMPLE: preferential attachment (Barabási & Albert 1999)

●No overall probability
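A sketch of the growth rule above, under assumptions the slide leaves open: the process starts from a single edge, and each new node brings exactly one edge. The degree-proportional choice is made by picking a random entry of a list in which every node appears once per incident edge. Function names are illustrative.

```python
import random
from collections import Counter


def grow(n, p, seed=None):
    """Grow a graph from a single edge; every new node adds one edge to an existing node,
    chosen proportionally to degree with probability p and uniformly otherwise."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]   # node x appears d(x) times, so a uniform pick is degree-proportional
    for new in range(2, n):
        if rng.random() < p:
            target = rng.choice(endpoints)   # preferential attachment
        else:
            target = rng.randrange(new)      # uniform over existing nodes 0..new-1
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Degree of every node that has at least one edge."""
    return Counter(x for e in edges for x in e)


pa = degrees(grow(20000, 1.0, seed=0))
uniform = degrees(grow(20000, 0.0, seed=0))
print(max(pa.values()), max(uniform.values()))
```

With p = 1 the largest hub is typically far bigger than with p = 0, a first sign of the heavy-tailed degree distribution. Note that the procedure produces graphs but never assigns a probability to a whole graph.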


What We Need: A Probabilistic Model

A probabilistic model assigns a probability to each possible value.

X: set of possible values
x ∈ X: a value
p: a parameter of the model
P(x; p): probability of x, given p, OR likelihood of p, given x

$\sum_{x \in X} P(x; p) = 1$ (because P is a distribution for a given p)

Given a set of values $\{x_i\}$ for $i = 1, \ldots, N$, the best-fitting p can be found by maximum likelihood:

$\hat{p} = \arg\max_p \prod_i P(x_i; p)$
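A generic version of this recipe, as a sketch: the product is maximized through the sum of log-probabilities (same maximizer, no underflow), and p is searched over a grid. The Bernoulli model used here, where each value is 0 or 1 like an absent or present edge, is only an illustrative choice.

```python
import math


def log_likelihood(values, log_prob, p):
    """Sum of log P(x_i; p); it has the same maximizer as the product of P(x_i; p)."""
    return sum(log_prob(x, p) for x in values)


def max_likelihood(values, log_prob, grid):
    """Grid value of p with the highest likelihood."""
    return max(grid, key=lambda p: log_likelihood(values, log_prob, p))


def bernoulli_log_prob(x, p):
    """Illustrative model: x in {0, 1}, P(1; p) = p, P(0; p) = 1 - p."""
    return math.log(p) if x == 1 else math.log(1 - p)


values = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
grid = [i / 100 for i in range(1, 100)]
print(max_likelihood(values, bernoulli_log_prob, grid))  # 0.3
```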

So, are “values” whole graphs or individual edges?


(3) Specify the Probability of Any Graph

Each edge has probability p of existing

STEP 1: Specify the probability of any graph G

●Not generative
●Needs multiple graphs for inference

STEP 2: Given a set of graphs with the same number of nodes, compute the likelihood of any value p

EXAMPLE: (Erdős & Rényi 1959)
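For the E–R model, STEP 2 has a closed form. Differentiating log L(p) = Σ_i [m_i log p + (M − m_i) log(1 − p)] and setting the result to zero gives p̂ = Σ_i m_i / (N M). The sketch below assumes undirected simple graphs; the edge counts are made up for illustration.

```python
import math


def er_mle(edge_counts, n):
    """Maximum-likelihood p for E–R from N graphs on the same n nodes.

    log L(p) = sum_i [m_i log p + (M - m_i) log(1 - p)]; setting the derivative
    to zero gives p_hat = sum_i m_i / (N * M).
    """
    M = n * (n - 1) // 2
    return sum(edge_counts) / (len(edge_counts) * M)


def er_log_likelihood(edge_counts, n, p):
    """log L(p) for the same N graphs (only the edge counts m_i matter)."""
    M = n * (n - 1) // 2
    return sum(m * math.log(p) + (M - m) * math.log(1 - p) for m in edge_counts)


counts = [12, 9, 15]            # hypothetical edge counts of three 10-node graphs
p_hat = er_mle(counts, 10)      # 36 / (3 * 45)
print(p_hat)
```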


Example: Extension of Erdős–Rényi using Formalism (3)

Goal: Add a parameter that controls the number of triangles.

Idea: The E–R model with parameter p is an exponential family; the extension should be too.

$P(G) = \frac{1}{C}\, p^{m} (1 - p)^{M - m}\, q^{t} (1 - q)^{T - t}$

where t is the number of triangles and T is the maximum possible number of triangles.

Note: q = 1/2 gives the ordinary E–R model.
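The note follows directly from the formula: with q = 1/2 the triangle factor no longer depends on G.

```latex
q = \tfrac{1}{2}:\quad
P(G) = \frac{1}{C}\, p^{m} (1 - p)^{M - m} \left(\tfrac{1}{2}\right)^{t} \left(\tfrac{1}{2}\right)^{T - t}
     = \frac{2^{-T}}{C}\, p^{m} (1 - p)^{M - m}
```

Since the E–R probabilities $p^{m} (1 - p)^{M - m}$ already sum to 1 over all graphs on n nodes, normalization forces $C = 2^{-T}$, and P(G) is exactly the E–R probability.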

Result: exponential random graph models (ERGM) and p* models

The normalization constant C cannot be computed efficiently: it would require counting the graphs with n vertices, m edges and t triangles for every combination of m and t. This is a hard, open problem.

Gibbs sampling works, however.
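Gibbs sampling sidesteps C because only ratios of probabilities are needed: adding edge {u, v} raises m by 1 and t by the number c of common neighbours of u and v, so the conditional odds of the edge are (p / (1 − p)) (q / (1 − q))^c. A minimal systematic-scan sketch, assuming undirected simple graphs and starting from the empty graph; the sweep count and parameters in the usage lines are arbitrary. With q well above 1/2 such chains can drift toward very dense graphs, a known degeneracy of triangle-based ERGMs, so the second run mainly illustrates the direction of the effect.

```python
import math
import random


def gibbs_ergm(n, p, q, sweeps, seed=None):
    """Systematic-scan Gibbs sampler for
        P(G) = (1/C) p^m (1-p)^(M-m) q^t (1-q)^(T-t).
    Adding edge {u, v} raises m by 1 and t by c = #common neighbours of u and v, so
        P(edge | rest) / P(no edge | rest) = (p / (1-p)) * (q / (1-q))**c,
    in which the unknown constant C cancels."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]          # start from the empty graph
    log_odds_edge = math.log(p / (1 - p))
    log_odds_tri = math.log(q / (1 - q))
    for _ in range(sweeps):
        for u in range(n):
            for v in range(u + 1, n):
                c = len(adj[u] & adj[v])
                z = log_odds_edge + c * log_odds_tri
                # numerically stable logistic function 1 / (1 + e^-z)
                prob_on = 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))
                if rng.random() < prob_on:
                    adj[u].add(v)
                    adj[v].add(u)
                else:
                    adj[u].discard(v)
                    adj[v].discard(u)
    return adj


def count_triangles(adj):
    """Each triangle is seen once from each of its three edges."""
    return sum(len(adj[u] & adj[v]) for u in range(len(adj)) for v in adj[u] if u < v) // 3


er_like = gibbs_ergm(20, 0.2, 0.5, sweeps=20, seed=0)     # q = 1/2: plain E–R
clustered = gibbs_ergm(20, 0.2, 0.9, sweeps=20, seed=0)   # q > 1/2 favours triangles
print(count_triangles(er_like), count_triangles(clustered))
```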

Open problem: Use Gibbs sampling to generate mini-models of networks.


(4) Specify the Probability of Any Edge

STEP 1: Specify probability for all pairs {u, v}

EXAMPLE: Use a given degree vector d as a parameter, and $P(\{u, v\}) = d_u d_v$

EXAMPLE: The p1 model based on node attributes (Holland & Leinhardt 1981)

STEP 2: Compute likelihood of parameters

●Not generative

Let's model each edge as an event, not a full graph

●Supports multiple edges


Preliminary Results for Formalism (4)

The best rank-1 model is given by the preferential attachment model.

Let a graph G be given. Among all models of the form $P(\{u, v\}) = (x x^{T})_{uv}$, the one with maximum likelihood is given by

$P(\{u, v\}) = \frac{d(u)\, d(v)}{2m}$

Proof: By induction over n.
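The slides do not state which likelihood is maximized. Under one assumption that fits the multi-edge setting, every entry A_uv of the adjacency matrix (diagonal included) is Poisson with mean x_u x_v; the log-likelihood then becomes 2 Σ_u d(u) log x_u − (Σ_u x_u)², which is maximized at x_u = d(u)/√(2m). The sketch does not reproduce the proof; it only checks numerically, on a hypothetical four-node graph, that random perturbations of that x never do better.

```python
import math
import random


def rank1_poisson_loglik(degrees, x):
    """Log-likelihood, up to an additive constant, of a symmetric adjacency matrix A with
    row sums `degrees`, ASSUMING each entry A[u][v] (diagonal included) is Poisson with
    mean x[u] * x[v]:
        sum_uv [A_uv log(x_u x_v) - x_u x_v] = 2 sum_u d_u log x_u - (sum_u x_u)^2
    """
    return 2 * sum(d * math.log(xu) for d, xu in zip(degrees, x)) - sum(x) ** 2


degrees = [3, 2, 2, 1]                                  # hypothetical graph: edges 0-1, 0-2, 0-3, 1-2
two_m = sum(degrees)
x_star = [d / math.sqrt(two_m) for d in degrees]        # x_u x_v = d(u) d(v) / 2m
best = rank1_poisson_loglik(degrees, x_star)

rng = random.Random(0)
worse = all(
    rank1_poisson_loglik(degrees, [xu * math.exp(rng.gauss(0, 0.3)) for xu in x_star]) <= best + 1e-12
    for _ in range(1000)
)
print(worse)   # True: no random perturbation beats x_star
```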

Open problem: define other models using this formalism

Hey, that's different from minimizing the least-squares distance to the given adjacency matrix, where the SVD gives the best rank-1 approximation
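The contrast can be checked numerically with NumPy on a hypothetical four-node graph. The leading singular triple gives the best rank-1 approximation in Frobenius norm (Eckart–Young), so its error can only be smaller than or equal to that of d dᵀ/2m, the rank-1 model stated above.

```python
import numpy as np

# Adjacency matrix of a hypothetical graph with edges 0-1, 0-2, 0-3, 1-2
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

d = A.sum(axis=1)
degree_model = np.outer(d, d) / d.sum()          # d(u) d(v) / 2m

# Least-squares optimum among rank-1 matrices: leading singular triple of A
U, s, Vt = np.linalg.svd(A)
svd_model = s[0] * np.outer(U[:, 0], Vt[0])

err_degree = np.linalg.norm(A - degree_model)    # Frobenius norm
err_svd = np.linalg.norm(A - svd_model)
print(err_svd, err_degree)
```

The degree model has a property the SVD approximation lacks here: its row sums reproduce the degrees exactly.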


(5) Specify the Probability of Any Event

Let's specify the probability of an edge addition, given the current graph

STEP 1: Specify the probability of an edge addition given the current graph

EXAMPLE: $P(\{u, v\}) = \frac{p}{n^{2}} + (1 - p) \frac{d(u)\, d(v)}{2m}$

STEP 2: Compute the likelihood
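STEP 2 amounts to replaying the observed additions and summing log-probabilities. The sketch below assumes a fixed set of n nodes, a non-empty seed graph, and that each event is a draw over all n² ordered pairs (u, v); to make those probabilities sum to 1 it divides the degree term by (2m)², a normalization choice of this sketch. All names and the edge sequence are hypothetical.

```python
import math
from collections import Counter


def event_log_likelihood(n, seed_edges, new_edges, p):
    """Log-likelihood of a sequence of edge additions under the mixture model

        P(u, v) = p / n**2 + (1 - p) * d(u) * d(v) / (2m)**2

    over ordered pairs (u, v), with degrees d and edge count m taken from the graph
    *before* each addition. An unordered edge {u, v} with u != v has probability
    P(u, v) + P(v, u) = 2 P(u, v). Multi-edges are allowed.
    """
    deg = Counter()
    for u, v in seed_edges:
        deg[u] += 1
        deg[v] += 1
    m = len(seed_edges)
    ll = 0.0
    for u, v in new_edges:
        ordered = p / n**2 + (1 - p) * deg[u] * deg[v] / (2 * m) ** 2
        ll += math.log(2 * ordered if u != v else ordered)
        deg[u] += 1          # update the graph after scoring the event
        deg[v] += 1
        m += 1
    return ll


seed = [(0, 1), (1, 2)]
added = [(1, 3), (1, 4), (0, 1)]     # hypothetical observed additions
for p in (0.1, 0.5, 0.9):
    print(p, round(event_log_likelihood(5, seed, added, p), 3))
```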

OTHER EXAMPLE: (Akkermans et al. 2012)

Open problem: Inference of parameters from real networks.

Generalizes naturally to edge removal events.


(6) Specify a Score for Node Pairs

Read my paper

STEP 1: Given a graph, specify a score for each node pair

STEP 2: Evaluate using information retrieval methods

I know, that's link prediction!

●Not probabilistic

(Liben-Nowell & Kleinberg 2003)
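A sketch of both steps, using the common-neighbours count (one of the scores compared by Liben-Nowell & Kleinberg) and the area under the ROC curve as the retrieval measure. The planted-group test graph, the 10% hold-out, and the function names are all hypothetical choices for illustration.

```python
import random
from itertools import combinations


def common_neighbours(adj, u, v):
    """Score for the pair {u, v}: number of neighbours they share."""
    return len(adj[u] & adj[v])


def auc(adj, held_out, non_edges, rng, samples=10000):
    """Probability that a random held-out edge outscores a random non-edge (ties count 1/2)."""
    hits = 0.0
    for _ in range(samples):
        pos = common_neighbours(adj, *rng.choice(held_out))
        neg = common_neighbours(adj, *rng.choice(non_edges))
        hits += 1.0 if pos > neg else 0.5 if pos == neg else 0.0
    return hits / samples


# Hypothetical test graph: 40 nodes in 4 groups, dense inside groups, sparse between them.
rng = random.Random(0)
n, groups = 40, 4
edges = [(u, v) for u, v in combinations(range(n), 2)
         if rng.random() < (0.6 if u % groups == v % groups else 0.02)]

# Hide 10% of the edges and score pairs on the remaining graph.
rng.shuffle(edges)
cut = len(edges) // 10
held_out, train = edges[:cut], edges[cut:]
adj = {u: set() for u in range(n)}
for u, v in train:
    adj[u].add(v)
    adj[v].add(u)
edge_set = set(edges)
non_edges = [e for e in combinations(range(n), 2) if e not in edge_set]
print(round(auc(adj, held_out, non_edges, rng), 3))
```

An AUC of 0.5 means the score is no better than chance; the score itself makes no probabilistic claim.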


(7) Matrix Model

STEP 1: Specify a probability matrix

STEP 2: Map nodes of the graph to rows/columns of the matrix

STEP 3: Compute the likelihood

Let's try the Kronecker product

EXAMPLE: (Leskovec et al. 2005)

●Not generative

Can I do this with any matrix?
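The Kronecker idea in a short NumPy sketch: a hypothetical 2×2 initiator is raised to its third Kronecker power, giving an 8×8 matrix of edge probabilities, and a graph's likelihood is the product of independent Bernoulli terms. STEP 2 is the hard part: relabelling the nodes changes the likelihood, and this sketch simply uses the identity mapping rather than searching over mappings.

```python
import numpy as np


def kronecker_power(initiator, k):
    """k-th Kronecker power of a small matrix of edge probabilities."""
    P = initiator
    for _ in range(k - 1):
        P = np.kron(P, initiator)
    return P


def bernoulli_log_likelihood(A, P):
    """log P(A) when every entry A[i, j] is independently 1 with probability P[i, j].
    Node i of the graph is mapped to row/column i of P (identity mapping)."""
    return float(np.sum(A * np.log(P) + (1 - A) * np.log(1 - P)))


initiator = np.array([[0.9, 0.5],
                      [0.5, 0.1]])                 # hypothetical 2x2 initiator
P = kronecker_power(initiator, 3)                  # 8x8 probability matrix

rng = np.random.default_rng(0)
A = (rng.random(P.shape) < P).astype(float)        # directed graph drawn from P
perm = rng.permutation(len(P))
A_shuffled = A[np.ix_(perm, perm)]                 # same graph, nodes relabelled

print(bernoulli_log_likelihood(A, P), bernoulli_log_likelihood(A_shuffled, P))
```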


(8) Graph Compression

STEP 1: Specify a graph compression algorithm

STEP 2: Check how well it compresses a graph

(Shannon)

More probable values should have shorter representations

I wonder how the E–R model can be used here

●Not generative
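One way to connect the E–R model to compression, as a sketch: an ideal code spends −log₂ P(G) bits on G (Shannon), so setting p = m/M gives a code length of at most M bits, the cost of a plain adjacency bitmap. The sketch ignores the few bits needed to transmit m itself; the function name is hypothetical.

```python
import math


def er_code_length_bits(n, m):
    """Shannon code length -log2 P(G) of a graph with n nodes and m edges under E–R,
    with p set to its maximum-likelihood value m / M."""
    M = n * (n - 1) // 2
    if m in (0, M):
        return 0.0   # p is 0 or 1: the graph is the only possible outcome
    p = m / M
    return -(m * math.log2(p) + (M - m) * math.log2(1 - p))


print(er_code_length_bits(1000, 5000), 1000 * 999 // 2)
```

Sparse graphs compress well under this code, while a graph with m = M/2 costs exactly M bits, which is what p = 1/2 means: no better than the bitmap.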

Now let's do some research!

SUMMARY

(1) Graph generation (e.g., Watts–Strogatz)
(2) Graph growth (e.g., Barabási–Albert)
(3) Graph probability (e.g., Erdős–Rényi)
(4) Edge probability
(5) Event probability
(6) Edge score (link prediction)
(7) Matrix model (e.g., Leskovec et al.)
(8) Graph compression

[Diagram: Inference, Mini-models, Rank-2 model, Spectral model, Education; annotated "superseded by Equivalence"]
