Dense subgraphs of random graphs

Uriel Feige

Weizmann Institute

Talk Outline

Discuss problems related to dense subgraphs of random graphs:

Planted k-clique.

Dense k-subgraph (if time permits).

Random Clique

Random graph G on n vertices and edge probability ½.

Maximum clique size is almost surely about 2 log n (logarithm base 2).

Upper bound: expectation (first moment).

Lower bound: expectation + variance (second moment).

Not constructive.
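The first-moment upper bound can be checked numerically. The sketch below computes the expected number of k-cliques in a random graph with edge probability ½, which is C(n, k) · 2^(-C(k, 2)), and finds the largest k for which this expectation is still at least 1. The function names are illustrative and not from the talk.

```python
from math import comb


def expected_cliques(n: int, k: int) -> float:
    """Expected number of k-cliques in G(n, 1/2): C(n, k) * 2^(-C(k, 2))."""
    return comb(n, k) / 2 ** (k * (k - 1) // 2)


def first_moment_threshold(n: int) -> int:
    """Largest k whose expected number of k-cliques is at least 1."""
    k = 1
    # The expectation is unimodal in k, so the first drop below 1 is final.
    while expected_cliques(n, k + 1) >= 1:
        k += 1
    return k


if __name__ == "__main__":
    for n in (1_000, 40_000):
        print(n, first_moment_threshold(n))
```

For moderate n the threshold sits noticeably below 2 log n because of the low order terms; the expectation argument only gives the upper bound, and the matching lower bound needs the variance calculation.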

How to actually find the clique?

Greedy(degree) algorithm finds clique of size log n (plus low order terms).

No better polytime algorithm known.

Exhaustive search in time n^O(log n).
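One plausible reading of the Greedy(degree) algorithm is sketched below: repeatedly pick the candidate with the most neighbours among the remaining candidates, add it to the clique, and keep only its neighbours as candidates. The graph representation (a list of neighbour sets) and the tie-breaking are assumptions made for illustration.

```python
def greedy_clique(adj):
    """Grow a clique greedily; adj[v] is the set of neighbours of vertex v."""
    candidates = set(range(len(adj)))
    clique = []
    while candidates:
        # Candidate with the most neighbours inside the current candidate set.
        v = max(candidates, key=lambda u: len(adj[u] & candidates))
        clique.append(v)
        # Only common neighbours of the clique so far remain eligible.
        candidates &= adj[v]
    return clique
```

Each step roughly halves the candidate set on a random graph, which is why the clique found has size about log n rather than 2 log n.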

Cryptographic applications [Juels and Peinado]

Assuming state of the art is not improved:

• One-way functions.

• Hierarchical keys.

(Idea: distribution does not change if a small number of cliques of size 1.5 log n are planted in the graph.)

Planted/hidden clique

Random graph G on n vertices and edge probability ½.

A random set H of k vertices turned into a clique.

If k > 2 log n, H will almost surely be the unique maximum clique in G.

Find H. Becomes easier the larger k is.
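The input distribution is easy to sample. A minimal sketch, assuming vertices are numbered 0..n-1 and the graph is stored as a list of neighbour sets (the function name and the seed parameter are illustrative):

```python
import random


def planted_clique(n, k, seed=0):
    """Sample G(n, 1/2), then turn a random k-subset H into a clique."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:  # each edge present with probability 1/2
                adj[u].add(v)
                adj[v].add(u)
    hidden = set(rng.sample(range(n), k))
    for u in hidden:
        adj[u] |= hidden - {u}  # add all missing edges inside H
    return adj, hidden
```

This quadratic-time generator is only meant for small experiments; at the sizes used later in the talk a more compact representation would be needed.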

Degree concentration

• Degrees of vertices in G strongly concentrated around n/2.

• Degree distribution of H-vertices differs statistically from that of other vertices if k is larger than the standard deviation of the degrees.

• Kucera: if k > c(n log n)^(1/2), H is simply all vertices of largest degree.

(Greedy(degree) algorithm outputs H)
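The degree-based recovery is short enough to state in code. This is a sketch under the assumption that k is known and the graph is a list of neighbour sets; the helper names are illustrative.

```python
def top_degree_vertices(adj, k):
    """Return the k vertices of largest degree (ties broken by vertex index)."""
    order = sorted(range(len(adj)), key=lambda v: (-len(adj[v]), v))
    return set(order[:k])


def is_clique(adj, vertices):
    """Check that every two distinct vertices in `vertices` are adjacent."""
    return all(u in adj[v] for u in vertices for v in vertices if u != v)
```

Checking the output with `is_clique` gives a cheap sanity test: when k is well above (n log n)^(1/2) the returned set should be H, and below that range the check is expected to fail often.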

Use of eigenvectors [Alon, Krivelevich and Sudakov]

• Normalize adjacency matrix of G to sum up to 0.

• Eigenvalues of G strongly concentrated around 0. No value larger than n^(1/2).

• If k > cn^(1/2), H contributes a larger eigenvalue.

• H can be recovered from the eigenvector that corresponds to largest eigenvalue (takes some work).
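A rough sketch of the spectral idea follows. It makes several assumptions that are not in the slides: the normalized matrix is taken to have entry +1 for an edge, -1 for a non-edge and 0 on the diagonal; the top eigenvector is approximated by power iteration with a diagonal shift of 2·n^(1/2) so that the largest positive eigenvalue dominates; and the cleanup step keeps vertices adjacent to at least 3k/4 of the k largest coordinates, a rule of the kind used by Alon, Krivelevich and Sudakov.

```python
import math
import random


def spectral_clique(adj, k, iters=100, seed=0):
    """Guess the planted clique from an approximate top eigenvector."""
    n = len(adj)
    rng = random.Random(seed)
    shift = 2.0 * math.sqrt(n)  # assumed bound on the bulk of the spectrum
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        total = sum(x)
        y = []
        for v in range(n):
            nbr = sum(x[u] for u in adj[v])
            # Row v of the +1/-1 matrix applied to x, plus the shift.
            y.append(nbr - (total - x[v] - nbr) + shift * x[v])
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    # Candidate set: the k coordinates of largest absolute value.
    candidates = set(sorted(range(n), key=lambda v: -abs(x[v]))[:k])
    # Cleanup: keep vertices adjacent to at least 3/4 of the candidates.
    return {v for v in range(n) if len(adj[v] & candidates) >= 0.75 * k}
```

An exact eigen-solver would replace the power iteration in practice; the loop is used here only to keep the sketch free of dependencies.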

Constant improvements

• Guess a vertex from H, and restrict problem to its neighborhood.

• Clique relative size increases, and graph remains random.

• Can find planted cliques of size n^(1/2)/2^t in time n^O(t).

• Polynomial (but very slow) for fixed t.
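The restriction step can be sketched as follows. Each correct guess keeps the rest of H while roughly halving the number of other vertices, so the ratio of clique size to the square root of the graph size grows by about a factor of 2^(1/2) per guess. The function name and the dictionary output format are illustrative choices.

```python
def restrict_to_guess(adj, guessed):
    """Induced subgraph on the common neighbourhood of the guessed vertices."""
    common = set(range(len(adj)))
    for g in guessed:
        common &= adj[g]
    # Map each surviving vertex to its neighbours inside the subgraph.
    return {v: adj[v] & common for v in common}
```

Trying all sets of guessed vertices of a fixed size is what produces the n^O(t) running time.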

Use of SDP [Feige and Krauthgamer]

Lovasz theta function provides an upper bound on the clique size.

On random graphs, its value is known to be O(n^(1/2)).

Can be used to both find and certify optimality of H when k > n^(1/2).

Going below n^(1/2)

• A certain Markov chain approach fails [Jerrum].

• Use of t levels of Lovasz-Schrijver SDP relaxations is no better than simply guessing t vertices of the clique [Feige and Krauthgamer].

• For k > n^(1/3), H corresponds to a global maximum of a certain cubic form [Frieze and Kannan].

Why care about planted clique?

Seems to require the development of new algorithmic techniques.

A concrete challenge for understanding observable properties of random graphs (does planting a large clique make a noticeable difference?).

Related to some other problems.

Interesting connection

• In a 2-person game, an approximate Nash equilibrium with nearly best payoffs (compared to true Nash) can be found in time n^O(log n) [Lipton, Markakis and Mehta].

• A poly-time algorithm for approximate best Nash will solve the hidden clique problem in polynomial time [Hazan and Krauthgamer].

The experimental approach to the design and analysis of algorithms

For hidden clique, the input distribution is well defined and can be sampled from efficiently.

To evaluate a candidate algorithm, run it on a random sample and observe performance.

• If not good, modify the algorithm.

• If good, analyze the algorithm.

In practice, graphs for experiments are generated using pseudorandom generators.

Experimental results (with Dorit Ron)

• n = 40,000.

• m = 400,000,000.

• n^(1/2) = 200.

For success rate roughly ½:

• k = 158 (Alg1 - LDR), 137 (Alg2 - TPMR).

Is this good or bad?

• 2 log n = 30.

• n^(1/4) = 14.
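The reference values on this slide can be reproduced with a few lines of arithmetic. The sketch assumes logarithms are base 2 and that m denotes the expected number of edges, n(n-1)/4, which the slide rounds to 400,000,000.

```python
import math

n = 40_000
expected_edges = n * (n - 1) // 4   # half of the C(n, 2) possible edges
sqrt_n = math.isqrt(n)              # n^(1/2)
two_log_n = 2 * math.log2(n)        # size of the largest clique without planting
fourth_root = n ** 0.25             # n^(1/4)

if __name__ == "__main__":
    print(expected_edges, sqrt_n, int(two_log_n), int(fourth_root))
    for name, k in (("LDR", 158), ("TPMR", 137)):
        print(f"{name}: k / n^(1/2) = {k / sqrt_n:.3f}")
```

Measured against n^(1/2), the reported clique sizes correspond to leading constants of 158/200 = 0.79 for LDR and 137/200 = 0.685 for TPMR, both far above 2 log n and n^(1/4).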

Understanding large sets of results

To estimate the success probability within 1% error requires roughly 10,000 experiments.
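The figure of 10,000 experiments is consistent with a standard binomial estimate. The sketch below assumes the worst case p = ½ and a margin of two standard errors; both choices are assumptions, since the slide does not state the confidence level.

```python
import math


def standard_error(p: float, trials: int) -> float:
    """Standard deviation of the empirical success rate over `trials` runs."""
    return math.sqrt(p * (1.0 - p) / trials)


def experiments_needed(margin: float, z: float = 2.0, p: float = 0.5) -> int:
    """Number of runs so that z standard errors fit within `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1.0 - p))
```

With 10,000 runs the standard error at p = ½ is 0.005, so a 1% margin corresponds to two standard errors.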

To see patterns, it helps to display results graphically.

Do our algorithms work when k = n^0.49?

Need experiments with large n.

Jumping to conclusions

Care is needed.

• Is the PRG the issue?

• Is n sufficiently large to draw asymptotic conclusions?

• Might the choice of scaling of the x-axis be biasing our interpretation?

Jump to the analysis?

The TPMR algorithm (Truncated Power Method Removal) looks promising.

Difficult to analyze, but worth it, because the algorithm is so special.

Or is it? (there was also Alg1 …)

Information on the algorithms

General idea:

• Sort vertices by likelihood of being in H.

• Remove (one or more) least likely vertices.

• Repeat.

Our algorithms take linear time (in m).

Low Degree Removal (LDR)

Iterative removal phase:

• If current graph is a clique, move to expansion phase.

• Remove vertex of lowest degree (breaking ties arbitrarily).

Iterative expansion phase:

• Add vertices that are adjacent to every vertex of the clique.
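The two phases of LDR can be sketched as below. For brevity this version scans for the minimum degree in each step, which takes quadratic time; a linear-time version would keep vertices in degree buckets. Ties are broken by vertex index, which is one arbitrary choice among many.

```python
def low_degree_removal(adj):
    """LDR sketch; adj[v] is the set of neighbours of vertex v."""
    n = len(adj)
    alive = set(range(n))
    deg = [len(adj[v]) for v in range(n)]  # degrees within the alive set

    # Removal phase: stop as soon as the remaining graph is a clique,
    # i.e. the minimum degree equals (number of remaining vertices - 1).
    while alive:
        v = min(alive, key=lambda u: (deg[u], u))
        if deg[v] == len(alive) - 1:
            break
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1

    # Expansion phase: add vertices adjacent to every vertex of the clique.
    clique = set(alive)
    for v in range(n):
        if v not in clique and clique <= adj[v]:
            clique.add(v)
    return clique
```

The output is always a clique by construction; whether it equals H depends on k, as the theorem and conjectures that follow discuss.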

Theorem

For every ε < 1 there is a constant c such that if k > c n^(1/2) then LDR finds the hidden k-clique H for at least an ε fraction of the input instances.

Sketch of proof of theorem

Lemma 1. In every subgraph with t > 11k/10 vertices, some vertex not in H has degree at most t/2 + c_1 n^(1/2).

Proof. Straightforward. Large deviation bounds on average degree + union bound.

Corollary

As long as t > 11k/10 vertices remain, LDR removes a vertex of degree “not much larger” than t/2 (at most t/2 + c_1 n^(1/2)).

Lemma 2

For any vertex v,

with high probability (say 99/100),

up to the point v was removed (if at all),

v’s average degree to removed vertices not in H is at most 1/2,

with a total deviation no larger than c_2 n^(1/2).

Sketch of proof of Lemma 2

Reveal the edges of v only when needed.

Given a candidate vertex u for removal, if no edge (u,v) then remove u. Otherwise perhaps delay removal.

Average rate of removal at most 1/2.

Probability of excursion larger than c_2 n^(1/2) is small.

Most vertices of H survive LDR.

Almost all vertices of H start with “very high” degree (assuming that c > 4(c_1 + c_2)).

There are always vertices of not high degree available for removal. (Lemma 1.)

The first k/10 high degree vertices of H to be removed must have lost degree at a high rate. This is a low probability event, by Lemma 2 and Markov’s inequality.

Finishing the proof

9k/10 vertices of H among the last 11k/10 survivors.

Hence no vertex not in H can survive the removal phase.

Expansion phase will pick up remaining vertices from H.

Conjectures

• The leading constant c is small: when the success fraction ε = 1/2, c < 1 suffices.

• Order of quantifiers can be switched: for some c, the fraction tends to 1 as n grows.

• Lower bounds: LDR fails when k = o(n^(1/2)).

Open question

Does the size of the planted clique exhibit threshold behavior with respect to the success probability of the LDR algorithm?

Truncated Power Method Removal (TPMR) algorithm

• Initially x is the vector of degrees.

• Compute x’ = Ax.

• Normalize x’ to sum up to 0.

• Average x and x’ to get a new x.

• Repeat 6 times.

Sort vertices by their x value.

Remove the lower 10%.

Etc.
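The steps above can be sketched as follows, with several details filled in by assumption because the slide leaves them open: "normalize" is read as subtracting the mean, x is restarted from the current degrees in every round, at least one vertex is removed per round, the rounds stop once the remaining graph is a clique, and an LDR-style expansion phase is run at the end.

```python
def tpmr(adj, power_steps=6, drop=0.10):
    """TPMR sketch; adj[v] is the set of neighbours of vertex v."""
    n = len(adj)
    alive = set(range(n))
    while True:
        nbrs = {v: adj[v] & alive for v in alive}
        if all(len(nbrs[v]) == len(alive) - 1 for v in alive):
            break  # the remaining graph is a clique
        x = {v: float(len(nbrs[v])) for v in alive}  # start from degrees
        for _ in range(power_steps):
            y = {v: sum(x[u] for u in nbrs[v]) for v in alive}  # x' = A x
            mean = sum(y.values()) / len(alive)
            # Centre x' so it sums to 0, then average it with x.
            x = {v: 0.5 * (x[v] + y[v] - mean) for v in alive}
        order = sorted(alive, key=lambda v: (x[v], v))
        for v in order[:max(1, int(drop * len(alive)))]:
            alive.remove(v)  # drop the lowest-ranked vertices
    clique = set(alive)
    for v in range(n):  # expansion phase (assumed, as in LDR)
        if v not in clique and clique <= adj[v]:
            clique.add(v)
    return clique
```

Stopping after a fixed small number of power steps, rather than iterating to convergence, is the "truncated" part of the method.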

Some observations on TPMR

• Linear time in m, though slower than LDR.

• Finds smaller planted cliques than LDR.

Why not let x converge?

• Faster.

• Performs better in our experiments.

Any hope of analysing TPMR?

Summary

Experimental approach suggests interesting observations.

• Commit in small steps. (Related to “decimation” in message passing algs.)

• Truncated power method is better than power method.

Challenge: support observations by analysis.

Running times

Lenovo 2.53 GHz and 3 GB RAM.

20 samples with around 50% success rates.

n     | GEN | LDR      | TPMR
2500  | 14  | 17 (3)   | 48 (34)
5000  | 72  | 80 (8)   | 199 (127)
10000 | 334 | 365 (31) | 832 (498)

(In each row, the value in parentheses equals the entry minus the GEN column.)
