Influence at Scale: Distributed Computation of Complex Contagion in Networks
Brendan Lucier, Joel Oren, Yaron Singer Microsoft Research, University of Toronto, Harvard University
Motivation
β’ Information requirements: how much of the graph do we need to examine? (query complexity).β’ Efficient parallel computation: how to distribute computation; maintain high-volume intermediate data.
The FrameworkThe influence model: The Independent Cascade Model [KKTβ03]β’ Input: edge-weighted graph πΊ = (π, πΈ,p), initial seeds π0 β π.β’ At every synchronous step π‘ > 0, every node previously infected node
π’ β ππ‘ β ππ‘β1 successfully infects an uninfected neighbor π£ w.p. ππ’π£.ππ‘+1 -- set of infected nodes at step π‘ + 1.
β’ The influence function: π π0 = πΈ[ππ‘]
The MRC parallel computation model [KSVβ10]β’ Inspired by the MapReduce paradigm.β’ Synchronous round computation on π π; π£ on tuples. On every round:
1. Map: apply local transformation on tuples in a streaming fashion.2. Reduce: polytime computation on aggregates of tuples with same
key.β’ Constraintaints: π1βπ machines and space per machine; logππ rounds.
β’ Graph access model: Link server.
The Information Requirements of Influence Estimation
β’ Question: how much of the graph to we need to know in order to estimate the influence?
β’ Concretely: what is the query complexity of approximating π(π0)?
Theorem: Let π β 0,1
3, πΏ β [0,
1
2]. For large enough π = |π|, βπΊ = (π, πΈ, π)
for which getting an estimate π(π0) of π(π0) s.t. 1 β π π π0 β€ π π0 β€ π(π0) with confidence β₯ 1 β πΏ requires Ξ©((1 β 2πΏ) π link server queries.
Main algorithm: Intermediate Layer; Estimating the Expectation
Main intuitions:1. High/low influence few/many samples needed,
2. Can compute the Riemann integral for CDF (π π0 = 0
β1 β πΉ(πΌ) ππΌ)
Lemma (informal):w.h.p., VerifyGuess returns True if π > πππππ’ππππ.
Main Algorithm: Top Layer; Iterating on Candidate Values
Theorem: For any π β (0,1
4), InfEst provides a (1 + 8π)-approx. w.h.p.
Algorithm: Bottom Layer; Bounded SamplesHighlights: Input: Seeds π0 β π, link-server oracle π(β ), πΏ- # samples, bound π‘.1. Simultaneous prob-BFS in MapReduce, one layer per round.2. Finalize a sample when at least π‘ nodes infected.3. Return fraction of samples that reached bound.
Theorem (informal): 1) If πΏ is suff. small, # of tuples is Ξ©(π1.5 log4 π).2) If ππππ πΊ = logπ π, algorithm takes polylog rounds.
Experimental Results
1
2
5
3
6
7
4
πΊ
π
π
π₯
π π
π0
ππ₯π
ππ₯πππππππ
πππ
L-Sπ(π’)
π+(π’)
π
Running timeApprox. ratios (Epinions)
Heuristics(Epinions)
Scalable estimation of the influence of individuals in massive social networks