Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Influence at Scale: Distributed Computation of Complex Contagion in Networks
Brendan Lucier, Joel Oren, Yaron Singer Microsoft Research, University of Toronto, Harvard University
Motivation
โข Information requirements: how much of the graph do we need to examine? (query complexity).โข Efficient parallel computation: how to distribute computation; maintain high-volume intermediate data.
The FrameworkThe influence model: The Independent Cascade Model [KKTโ03]โข Input: edge-weighted graph ๐บ = (๐, ๐ธ,p), initial seeds ๐0 โ ๐.โข At every synchronous step ๐ก > 0, every node previously infected node
๐ข โ ๐๐ก โ ๐๐กโ1 successfully infects an uninfected neighbor ๐ฃ w.p. ๐๐ข๐ฃ.๐๐ก+1 -- set of infected nodes at step ๐ก + 1.
โข The influence function: ๐ ๐0 = ๐ธ[๐๐ก]
The MRC parallel computation model [KSVโ10]โข Inspired by the MapReduce paradigm.โข Synchronous round computation on ๐ ๐; ๐ฃ on tuples. On every round:
1. Map: apply local transformation on tuples in a streaming fashion.2. Reduce: polytime computation on aggregates of tuples with same
key.โข Constraintaints: ๐1โ๐ machines and space per machine; log๐๐ rounds.
โข Graph access model: Link server.
The Information Requirements of Influence Estimation
โข Question: how much of the graph to we need to know in order to estimate the influence?
โข Concretely: what is the query complexity of approximating ๐(๐0)?
Theorem: Let ๐ โ 0,1
3, ๐ฟ โ [0,
1
2]. For large enough ๐ = |๐|, โ๐บ = (๐, ๐ธ, ๐)
for which getting an estimate ๐(๐0) of ๐(๐0) s.t. 1 โ ๐ ๐ ๐0 โค ๐ ๐0 โค ๐(๐0) with confidence โฅ 1 โ ๐ฟ requires ฮฉ((1 โ 2๐ฟ) ๐ link server queries.
Main algorithm: Intermediate Layer; Estimating the Expectation
Main intuitions:1. High/low influence few/many samples needed,
2. Can compute the Riemann integral for CDF (๐ ๐0 = 0
โ1 โ ๐น(๐ผ) ๐๐ผ)
Lemma (informal):w.h.p., VerifyGuess returns True if ๐ > ๐๐๐๐๐ข๐๐๐๐.
Main Algorithm: Top Layer; Iterating on Candidate Values
Theorem: For any ๐ โ (0,1
4), InfEst provides a (1 + 8๐)-approx. w.h.p.
Algorithm: Bottom Layer; Bounded SamplesHighlights: Input: Seeds ๐0 โ ๐, link-server oracle ๐(โ ), ๐ฟ- # samples, bound ๐ก.1. Simultaneous prob-BFS in MapReduce, one layer per round.2. Finalize a sample when at least ๐ก nodes infected.3. Return fraction of samples that reached bound.
Theorem (informal): 1) If ๐ฟ is suff. small, # of tuples is ฮฉ(๐1.5 log4 ๐).2) If ๐๐๐๐ ๐บ = log๐ ๐, algorithm takes polylog rounds.
Experimental Results
1
2
5
3
6
7
4
๐บ
๐
๐
๐ฅ
๐ ๐
๐0
๐๐ฅ๐
๐๐ฅ๐๐๐๐๐๐๐
๐๐๐
L-S๐(๐ข)
๐+(๐ข)
๐
Running timeApprox. ratios (Epinions)
Heuristics(Epinions)
Scalable estimation of the influence of individuals in massive social networks