Social Influence & Community Detection

V.A. Traag

February 13, 2009


Outline

1 Social Influence
  • Introduction
  • BA-model
  • Social influence model
  • Empirical results
  • Further research

2 Community Detection
  • Introduction
  • Modularity & Potts model
  • Negative links
  • Empirical example
  • Further research



Introduction

• What items (e.g. movies, books) become popular?

• Based on an extension of the BA-model (adds a social influence balancing parameter).

• Idea emerged from the web-based experiment of Salganik et al. (Science, 2006).


Experiment from Salganik et al.

[Figure: arriving users are assigned to one of three conditions (no social influence, social influence, more social influence), each shown with groups numbered 1 to 8. Annotation across the conditions: “More inequality and uncertainty”.]


BA-model

• Rich-get-richer effect.

• Web sites (items) attract links (votes) in proportion to the number of links (votes) they already have:

\[ \dot{k}_i = \frac{m k_i}{\sum_j k_j} \]

• Yields a stationary degree distribution:

\[ \Pr(X = k) = 2 m^2 k^{-3} \]
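A minimal simulation sketch of the rich-get-richer rule, assuming the standard Barabási-Albert growth process in which every new item links to m distinct existing items chosen in proportion to their current degree. The function name `simulate_ba`, the fully connected seed of m + 1 items and the stub-list sampling trick are illustrative choices, not taken from the slides.

```python
import random

def simulate_ba(n_items, m, seed=0):
    """Grow a network by preferential attachment; return the degree of each item."""
    rng = random.Random(seed)
    # Assumption: start from m + 1 fully connected items, so each has degree m.
    degree = [m] * (m + 1)
    # Each item appears in `stubs` once per link end, so a uniform draw from
    # `stubs` picks item i with probability k_i / sum_j k_j.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_items):
        targets = set()
        while len(targets) < m:          # m distinct targets per new item
            targets.add(rng.choice(stubs))
        degree.append(m)
        for t in targets:
            degree[t] += 1
            stubs.extend((t, new))
    return degree

degree = simulate_ba(n_items=5000, m=2)

# Number of items per degree k, to compare with 2 m^2 k^-3 on a log-log plot.
counts = {}
for k in degree:
    counts[k] = counts.get(k, 0) + 1
```

Dividing each entry of `counts` by the number of items gives an empirical degree distribution; for large networks its tail should fall off roughly as k^-3.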


Social influence

• Additional good-get-richer effect.

• Introduce quality φ ≥ 0 with mean quality µ and variance σ.

• Balance quality and popularity through parameter 0 ≤ λ ≤ 1.

• New differential equation:

\[ \dot{k}_i = m \left[ (1 - \lambda) \frac{\phi_i}{\sum_j \phi_j} + \lambda \frac{k_i}{\sum_j k_j} \right]. \]
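The differential equation can be read as a voting process: each vote follows popularity with probability λ and quality with probability 1 − λ. The sketch below simulates that reading, assuming one new item arrives per time step and m votes are cast per step. The function name `simulate_votes`, the exponential quality draw and the fallback to quality while no votes exist yet are assumptions made for illustration.

```python
import random

def simulate_votes(n_steps, m, lam, qualities, seed=0):
    """One item arrives per step; m votes are then cast among all items present."""
    rng = random.Random(seed)
    votes = []
    for t in range(n_steps):
        votes.append(0)                      # the new item starts without votes
        phi = qualities[: t + 1]
        for _ in range(m):
            # With probability lam follow popularity, otherwise follow quality.
            # Popularity cannot be used while nobody has received a vote yet.
            if rng.random() < lam and sum(votes) > 0:
                weights = votes
            else:
                weights = phi
            i = rng.choices(range(t + 1), weights=weights, k=1)[0]
            votes[i] += 1
    return votes

quality_rng = random.Random(1)
qualities = [quality_rng.expovariate(1.0) for _ in range(500)]  # assumed quality distribution
votes = simulate_votes(n_steps=500, m=2, lam=0.5, qualities=qualities)
```

Setting `lam` close to 0 makes votes track quality only, while values close to 1 approach the pure rich-get-richer behaviour of the BA-model.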


Theoretical results

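The result, with t_i the time at which item i is introduced, is

\[ k_i(t) = \left[ \left( \frac{t}{t_i} \right)^{\lambda} - 1 \right] \frac{(1 - \lambda) m \phi_i}{\mu \lambda}, \]

from which we can see that: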

• Votes increase with time

• Older items obtain more votes

• Better items obtain more votes (might catch up with older, but worse, items)

• Higher social influence changes the growth pattern: votes grow more slowly right after introduction, but keep growing more strongly later.
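The closed-form result can be checked numerically. The sketch below assumes the usual mean-field step, which the slides do not spell out: one item arrives per unit time, so that Σ_j φ_j ≈ µt and Σ_j k_j ≈ mt, and the differential equation becomes dk_i/dt = (1 − λ)mφ_i/(µt) + λk_i/t with k_i(t_i) = 0. Function names and parameter values are illustrative.

```python
def closed_form(t, t_i, lam, m, phi, mu):
    """Closed-form solution k_i(t) stated above."""
    return ((t / t_i) ** lam - 1.0) * (1.0 - lam) * m * phi / (mu * lam)

def integrate(t_end, t_i, lam, m, phi, mu, n_steps=10000):
    """Integrate dk/dt = (1-lam)*m*phi/(mu*t) + lam*k/t with classical RK4."""
    def rhs(t, k):
        return (1.0 - lam) * m * phi / (mu * t) + lam * k / t

    h = (t_end - t_i) / n_steps
    t, k = t_i, 0.0                      # the item starts with no votes at t_i
    for _ in range(n_steps):
        k1 = rhs(t, k)
        k2 = rhs(t + h / 2, k + h * k1 / 2)
        k3 = rhs(t + h / 2, k + h * k2 / 2)
        k4 = rhs(t + h, k + h * k3)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return k

params = dict(t_i=1.0, lam=0.5, m=1.0, phi=1.0, mu=1.0)
numeric = integrate(100.0, **params)
exact = closed_form(100.0, **params)
```

Under these assumptions the two values should agree closely, which supports the reading of t_i as the introduction time of item i.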


Theoretical results

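• For a given quality φ, the “uncertainty” distribution is

\[ \Pr(X = k \mid \phi) = \frac{\mu \left( (1 - \lambda) m \phi \right)^{1/\lambda}}{\left( k \lambda \mu + (1 - \lambda) m \phi \right)^{1 + 1/\lambda}}. \]

• Mean popularity and variance:

\[ E(X \mid \phi) = \frac{m \phi}{\mu} \quad\text{and}\quad \operatorname{Var}(X \mid \phi) = \frac{E(X \mid \phi)^2}{1 - 2\lambda}. \]

• Expected number of votes rises with quality.

• Uncertainty rises with quality and with social influence.

• Consistent with the experiment of Salganik et al.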
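To make the conditional distribution concrete, here is a small sketch that evaluates the density, the tail probability Pr(X > k | φ) obtained by integrating the density over a continuous k, and the two moments. Treating k as continuous and all function names are assumptions for illustration. Note that the variance expression is only finite for λ < 1/2.

```python
def density(k, phi, lam, m, mu):
    """Pr(X = k | phi), with k treated as continuous."""
    a = (1.0 - lam) * m * phi
    return mu * a ** (1.0 / lam) / (k * lam * mu + a) ** (1.0 + 1.0 / lam)

def tail(k, phi, lam, m, mu):
    """Pr(X > k | phi): the integral of `density` from k to infinity."""
    a = (1.0 - lam) * m * phi
    return (a / (k * lam * mu + a)) ** (1.0 / lam)

def mean(phi, m, mu):
    return m * phi / mu

def variance(phi, lam, m, mu):
    if lam >= 0.5:
        raise ValueError("variance is infinite for lam >= 1/2")
    return mean(phi, m, mu) ** 2 / (1.0 - 2.0 * lam)

# Midpoint-rule check that the density integrates to 1 - tail(K) on [0, K].
phi, lam, m, mu, K, n = 1.0, 0.25, 1.0, 1.0, 50.0, 100000
h = K / n
area = h * sum(density((j + 0.5) * h, phi, lam, m, mu) for j in range(n))
```

The `tail` function is the quantity usually plotted on log-log axes when comparing a fitted distribution with data.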


Theoretical results

• Quality distribution is ρ(φ) with mean µ and variance σ.

• The “popularity” distribution can be deduced as

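\[ \Pr(X = k) = \int_{\phi_{\min}}^{\phi_{\max}} \rho(\phi) \, \Pr(X = k \mid \phi) \, d\phi. \]

• In general, mean popularity and variance are

\[ E(X) = m \quad\text{and}\quad \operatorname{Var}(X) = \frac{m^2 \left( 2\sigma(1 - \lambda) + \mu^2 \right)}{\mu^2 (1 - 2\lambda)}. \]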
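These two moments can be recovered from the conditional ones with the law of total variance. The short derivation below is a reconstruction that uses only E(X | φ) = mφ/µ, Var(X | φ) = E(X | φ)²/(1 − 2λ), E(φ) = µ and Var(φ) = σ, so that E(φ²) = σ + µ².

\[
\begin{aligned}
E(X) &= E\big(E(X \mid \phi)\big) = \frac{m\,E(\phi)}{\mu} = m, \\
\operatorname{Var}(X) &= E\big(\operatorname{Var}(X \mid \phi)\big) + \operatorname{Var}\big(E(X \mid \phi)\big) \\
&= \frac{m^2\,E(\phi^2)}{\mu^2 (1 - 2\lambda)} + \frac{m^2 \sigma}{\mu^2} \\
&= \frac{m^2 (\sigma + \mu^2) + m^2 \sigma (1 - 2\lambda)}{\mu^2 (1 - 2\lambda)}
 = \frac{m^2 \big( 2\sigma(1 - \lambda) + \mu^2 \big)}{\mu^2 (1 - 2\lambda)}.
\end{aligned}
\]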


Empirical results

• Quality is usually the problem: how can it be estimated?

• Workaround: assume a quality distribution (e.g. Dirac, exponential).

• Compare the empirical popularity distribution (#views, #sales) to the theoretical distribution.

• Estimate the social influence parameter λ by maximum likelihood estimation (MLE).
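A minimal sketch of the estimation step, assuming the Dirac case (every item has quality φ = µ), a known mean popularity m, and popularity treated as continuous. Under those assumptions the theoretical density reduces to b^(1/λ) / (λk + b)^(1 + 1/λ) with b = (1 − λ)m. The grid search, the synthetic data and all function names are illustrative, not the procedure used for the reported fits.

```python
import math
import random

def log_likelihood(lam, ks, m):
    """Log-likelihood of popularity values `ks` under Dirac quality."""
    b = (1.0 - lam) * m
    return sum(math.log(b) / lam - (1.0 + 1.0 / lam) * math.log(lam * k + b)
               for k in ks)

def estimate_lambda(ks, m, grid_size=99):
    """Maximise the log-likelihood over an evenly spaced grid in (0, 1)."""
    grid = [(j + 1) / (grid_size + 1) for j in range(grid_size)]
    return max(grid, key=lambda lam: log_likelihood(lam, ks, m))

def sample_popularity(lam, m, rng):
    """Draw one value by inverting the cumulative distribution."""
    b = (1.0 - lam) * m
    return b / lam * ((1.0 - rng.random()) ** (-lam) - 1.0)

# Synthetic data with a known parameter, to see whether it is recovered.
rng = random.Random(42)
true_lam, m = 0.6, 5.0
ks = [sample_popularity(true_lam, m, rng) for _ in range(10000)]
lam_hat = estimate_lambda(ks, m)
```

A grid search keeps the sketch dependency-free; a one-dimensional optimiser would do the same job with finer resolution.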


[Figure: log-log plot of the cumulative popularity distribution Pr(x > k) against k for the Hollywood and YouTube data, together with the fitted curves.]

YouTube: λ ≈ 0.878

Hollywood: λ ≈ 0.663

Both estimates assume an exponential quality distribution.


Other results

• Other research from Pennock et al. shows additional results.

• Hyperlink distribution per category of websites.

• The estimated balancing parameter is relatively high for companies (0.950) and newspapers (0.948).

• It is relatively low for universities (0.612) and scientists (0.602).

• Might be used as a rough estimate of the amount of social influence.


Social Influence

• Introduce the social influence parameter λ on a network.

• Balance between own preferences and preferences of others.

• Spreading (cascading) of preferences.

• Updating of exclusive preferences might result in a community detection algorithm.

• Popularity of items = size of communities?

One separate topic: estimate social influence in citation distributions over the last few years. Has it increased?


Social Balance Theory

[Figure: example network with nodes A, B, C, D, E1 and E2.]

• Triads (sets of three nodes) are balanced if their relationships are “symmetric”.

• Triad i, j, k is balanced if A_ij A_ik A_jk = 1.

• If a network is balanced, it can be split into two communities. (Harary, 1953)

• Social balance can be extended to k-balanced: a k-cycle does not contain exactly one negative edge.

• For unbalanced (or k-balanced) networks, how can communities be assigned such that nodes form cohesive groups?
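The triad condition is easy to check directly. The sketch below assumes a symmetric signed adjacency matrix with entries +1, −1 or 0 (no link) and skips triples in which a link is missing; the function name and the two toy matrices are illustrative.

```python
from itertools import combinations

def unbalanced_triads(A):
    """Return the fully linked triads (i, j, k) whose sign product is -1."""
    bad = []
    for i, j, k in combinations(range(len(A)), 3):
        # The product is 0 when a link is missing; such triples are skipped.
        if A[i][j] * A[i][k] * A[j][k] == -1:
            bad.append((i, j, k))
    return bad

# Two friends (0 and 1) who both dislike node 2: balanced.
friends_vs_one = [[0, 1, -1],
                  [1, 0, -1],
                  [-1, -1, 0]]

# Three mutual enemies: the product of the signs is -1.
mutual_enemies = [[0, -1, -1],
                  [-1, 0, -1],
                  [-1, -1, 0]]
```

The all-negative triad is unbalanced under this test, although it satisfies the weaker k-balanced condition, since it does not contain exactly one negative edge.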


Modularity

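Definition

Modularity (Newman & Girvan):

\[ Q = \frac{1}{m} \sum_{ij} (A_{ij} - p_{ij}) \, \delta(\sigma_i, \sigma_j). \]

• Modularity can also be expressed as

\[ Q = \frac{1}{m} \sum_c (a_c - e_c), \]

with a_c = Σ_{i,j∈c} A_ij the actual and e_c = Σ_{i,j∈c} p_ij the expected link weight within community c.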

• Optimising modularity yields a good community assignment.
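A minimal sketch of the modularity computation, assuming the degree-based null model p_ij = k_i k_j / m with m = Σ_ij A_ij (an assumption here, since the definition leaves p_ij general). The function names and the toy network of two triangles joined by one link are illustrative.

```python
def modularity(A, sigma):
    """Q = (1/m) * sum_ij (A_ij - p_ij) * delta(sigma_i, sigma_j)."""
    n = len(A)
    k = [sum(row) for row in A]          # degree of each node
    m = sum(k)                           # total weight, sum_ij A_ij
    total = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                total += A[i][j] - k[i] * k[j] / m   # assumed null model p_ij
    return total / m

def two_triangles():
    """Nodes 0-2 and 3-5 form triangles; nodes 2 and 3 are joined by one link."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    A = [[0] * 6 for _ in range(6)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

A = two_triangles()
q_split = modularity(A, [0, 0, 0, 1, 1, 1])    # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])   # everything in one community
```

Splitting the network into its two triangles scores higher than keeping all nodes together, which is the behaviour a community assignment is expected to reward.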


Potts approach

• Potts approach by Reichardt and Bornholdt: reward “allowed” links, penalise “forbidden” links.

Allowed • Links within communities (reward a_ij = 1 − γp_ij).

Forbidden • Absent links within communities (penalty b_ij = γp_ij).

• Formulated as an “energy/cost” function (Hamiltonian):

\[ H = \sum_{ij} \left[ -a_{ij} A_{ij} + b_{ij} (1 - A_{ij}) \right] \delta(\sigma_i, \sigma_j) \]

• Reformulated, this equals modularity up to a factor −m (if γ = 1):

\[ -mQ = H = -\sum_{ij} (A_{ij} - \gamma p_{ij}) \, \delta(\sigma_i, \sigma_j) \]

• Results in a tuneable (γ) version of modularity.
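The step from the reward/penalty form to the compact form can be checked numerically. The sketch below takes the reward as a_ij = 1 − γp_ij and the penalty as b_ij = γp_ij, the choice under which the two forms agree term by term, and assumes a binary adjacency matrix with the degree-based null model p_ij = k_i k_j / m. Function names and the four-node path are illustrative.

```python
def null_model(A):
    """Assumed null model p_ij = k_i * k_j / m, with m = sum_ij A_ij."""
    k = [sum(row) for row in A]
    m = sum(k)
    n = len(A)
    return [[k[i] * k[j] / m for j in range(n)] for i in range(n)]

def hamiltonian_rewards(A, sigma, gamma):
    """H as rewards for present links and penalties for absent links."""
    p = null_model(A)
    n = len(A)
    H = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                a = 1.0 - gamma * p[i][j]   # reward for a present link
                b = gamma * p[i][j]         # penalty for an absent link
                H += -a * A[i][j] + b * (1 - A[i][j])
    return H

def hamiltonian_compact(A, sigma, gamma):
    """H = -sum_ij (A_ij - gamma * p_ij) * delta(sigma_i, sigma_j)."""
    p = null_model(A)
    n = len(A)
    return -sum(A[i][j] - gamma * p[i][j]
                for i in range(n) for j in range(n)
                if sigma[i] == sigma[j])

# Path 0 - 1 - 2 - 3, split into the communities {0, 1} and {2, 3}.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
sigma = [0, 0, 1, 1]
h_rewards = hamiltonian_rewards(A, sigma, gamma=0.5)
h_compact = hamiltonian_compact(A, sigma, gamma=0.5)
```

Lowering γ makes expected links count for less, so larger communities become cheaper; raising it has the opposite effect.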


Problem with negative links

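[Figure: three nodes i, j and k with degrees k_i = 1, k_j = 1 and k_k = −1.]

Negative links pose a problem for modularity: the probabilities p_ij are not well defined.

\[ A = \begin{pmatrix} + & + & - \\ + & + & - \\ - & - & + \end{pmatrix} \]

\[ Q = \frac{1}{m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{m} \right) \delta(\sigma_i, \sigma_j) = 0 \]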
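The example can be reproduced in a few lines. The sketch below uses the 3 × 3 matrix above (including its positive diagonal), computes the degrees as signed row sums, and evaluates Q for every possible assignment of the three nodes. The function name and the exhaustive enumeration are illustrative.

```python
from itertools import product

A = [[1, 1, -1],
     [1, 1, -1],
     [-1, -1, 1]]

def modularity(A, sigma):
    """Q with p_ij = k_i * k_j / m, where k_i is the signed row sum."""
    k = [sum(row) for row in A]          # here k = [1, 1, -1]
    m = sum(k)                           # here m = 1
    n = len(A)
    return sum(A[i][j] - k[i] * k[j] / m
               for i in range(n) for j in range(n)
               if sigma[i] == sigma[j]) / m

# Evaluate every assignment of the three nodes to up to three communities.
scores = {sigma: modularity(A, sigma) for sigma in product(range(3), repeat=3)}
```

Every term A_ij − k_i k_j / m vanishes for this matrix, so all assignments score the same and modularity cannot tell the natural split {i, j}, {k} apart from any other.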


Negative links

• Solution is to change “allowed” and “forbidden” links:

Allowed • Positive links within communities (reward a_ij = 1 − γp⁺_ij).
        • Absent negative links within communities (reward d_ij = λp⁻_ij).

Forbidden • Absent positive links within communities (penalty b_ij = γp⁺_ij).
          • Negative links within communities (penalty c_ij = 1 − λp⁻_ij).

• Results in two separate Hamiltonians:

\[ H^+ = -\sum_{ij} (A^+_{ij} - \gamma p^+_{ij}) \, \delta(\sigma_i, \sigma_j) \quad\text{and}\quad H^- = -\sum_{ij} (A^-_{ij} - \lambda p^-_{ij}) \, \delta(\sigma_i, \sigma_j). \]


Hamiltonian

• We weigh both Hamiltonians equally.

• With A_ij = A⁺_ij − A⁻_ij, this results in

\[ -mQ = H^+ - H^- = -\sum_{ij} \left( A_{ij} - (\gamma p^+_{ij} - \lambda p^-_{ij}) \right) \delta(\sigma_i, \sigma_j) \]

• Changing the expected values in modularity allows community detection in networks with negative links.
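A minimal sketch of the combined Hamiltonian for a signed network. It assumes separate degree-based null models for the two parts, p⁺_ij = k⁺_i k⁺_j / m⁺ and p⁻_ij = k⁻_i k⁻_j / m⁻, which the slides do not specify, and a toy matrix without self-links. The function name and the list of candidate assignments are illustrative.

```python
def signed_hamiltonian(A, sigma, gamma=1.0, lam=1.0):
    """H = -sum_ij (A_ij - (gamma*p+_ij - lam*p-_ij)) * delta(sigma_i, sigma_j)."""
    n = len(A)
    pos = [[max(a, 0) for a in row] for row in A]    # A+
    neg = [[max(-a, 0) for a in row] for row in A]   # A-, stored as positive weights
    kp = [sum(row) for row in pos]
    kn = [sum(row) for row in neg]
    mp, mn = sum(kp), sum(kn)
    H = 0.0
    for i in range(n):
        for j in range(n):
            if sigma[i] == sigma[j]:
                # Assumed null models; zero when that link type is absent entirely.
                p_plus = kp[i] * kp[j] / mp if mp else 0.0
                p_minus = kn[i] * kn[j] / mn if mn else 0.0
                H -= A[i][j] - (gamma * p_plus - lam * p_minus)
    return H

# Nodes 0 and 1 are positively linked; both are negatively linked to node 2.
A = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
candidates = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0), (0, 1, 2)]
energies = {s: signed_hamiltonian(A, s) for s in candidates}
best = min(energies, key=energies.get)
```

Among these candidates the lowest energy belongs to the assignment that keeps the two positively linked nodes together and separates the node they are both negatively linked to.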


Empirical example

γ = 1, λ = 1


Empirical example

γ = 0.3, λ = 1


Empirical example

γ = 1, λ = 2


Further research

• Apply community detection scheme to citation networks.

• Communities in unsigned networks are “thematic” clusters.

• Communities in signed networks are “positional” clusters.

• For example: Dutch opinion makers.