Analysis of Social Media MLD 10-802, LTI 11-772 William Cohen 10-16-010


Page 1: Analysis of Social Media MLD 10-802, LTI 11-772

Analysis of Social Media
MLD 10-802, LTI 11-772

William Cohen
10-16-010

Page 2: Analysis of Social Media MLD 10-802, LTI 11-772

Review - LDA

• Latent Dirichlet Allocation
• “Mixed membership”

[Figure: LDA plate diagram with variables α, z, w and plates N, M]

• Randomly initialize each z_{m,n}
• Repeat for t=1,…
  – For each doc m, word n
    · Find Pr(z_{m,n}=k | other z’s)
    · Sample z_{m,n} according to that distribution
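
The sampling loop on this slide can be sketched in Python as below. This is a minimal illustration, not the lecture's own code: it assumes the usual collapsed Gibbs form Pr(z_{m,n}=k | other z's) ∝ (n_{m,k} + α)(n_{k,w} + β)/(n_k + Vβ), and the function name `gibbs_lda`, the argument names, and the default hyperparameters are all hypothetical.

```python
import numpy as np

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling sketch for LDA; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    n_mk = np.zeros((len(docs), num_topics))   # topic counts per document
    n_kw = np.zeros((num_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(num_topics)                 # total words assigned to each topic

    # Randomly initialize each z[m][n] and fill in the count tables.
    z = [rng.integers(num_topics, size=len(doc)) for doc in docs]
    for m, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[m][n]
            n_mk[m, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(iters):                     # repeat for t = 1, ...
        for m, doc in enumerate(docs):         # for each doc m, word n
            for n, w in enumerate(doc):
                k = z[m][n]
                # Remove the current assignment so counts reflect "other z's".
                n_mk[m, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Pr(z_mn = k | other z's), up to normalization (assumed form).
                p = (n_mk[m] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                p /= p.sum()
                # Sample z_mn according to that distribution.
                k = rng.choice(num_topics, p=p)
                z[m][n] = k
                n_mk[m, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_mk, n_kw
```

Each row of `n_mk` gives one document's topic counts after sampling, which is where the "mixed membership" shows up: one document can hold words from several topics.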

Page 3: Analysis of Social Media MLD 10-802, LTI 11-772

Outline

• Stochastic block models & inference question
• Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 4: Analysis of Social Media MLD 10-802, LTI 11-772

Parkkinen et al paper

Page 5: Analysis of Social Media MLD 10-802, LTI 11-772

Another mixed membership block model

Page 6: Analysis of Social Media MLD 10-802, LTI 11-772

Another mixed membership block model

z = (z_i, z_j) is a pair of block ids
n_z = #pairs z
q_{z1,i} = #links to i from block z1
q_{z1,·} = #outlinks in block z1
δ = indicator for diagonal
M = #nodes

Page 7: Analysis of Social Media MLD 10-802, LTI 11-772

Another mixed membership block model

Page 8: Analysis of Social Media MLD 10-802, LTI 11-772

Another mixed membership block model

Page 9: Analysis of Social Media MLD 10-802, LTI 11-772

Outline

• Stochastic block models & inference question
• Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 10: Analysis of Social Media MLD 10-802, LTI 11-772

Latent Space Model

• Each node i has a latent position in Euclidean space, z(i)
• z(i)’s drawn from a mixture of Gaussians
• Probability of interaction between i and j depends on the distance between z(i) and z(j)
• Inference is a little more complicated…

[Handcock & Raftery, 2007]
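
The generative side of this model can be sketched as follows. The slide only says that the interaction probability depends on distance; the logistic link `sigmoid(intercept - distance)` used here is an assumption for illustration, as are the function name, the argument names, and the choice of an undirected graph.

```python
import numpy as np

def sample_latent_space_graph(num_nodes, means, stds, weights, intercept=1.0, seed=0):
    """Sample latent positions from a Gaussian mixture, then edges from distances."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Pick a mixture component for each node, then a Gaussian position around its mean.
    comp = rng.choice(len(weights), size=num_nodes, p=weights)
    z = means[comp] + rng.normal(size=(num_nodes, means.shape[1])) * stds[comp, None]
    # Pairwise Euclidean distances between latent positions.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Assumed link: closer nodes are more likely to interact.
    prob = 1.0 / (1.0 + np.exp(-(intercept - dist)))
    # Sample each unordered pair once (undirected graph, no self-loops).
    upper = np.triu(rng.random((num_nodes, num_nodes)) < prob, k=1)
    adj = (upper | upper.T).astype(int)
    return z, adj
```

With two well-separated mixture components, most sampled edges fall inside a cluster, which is the block-like structure the latent positions are meant to capture.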

Page 11: Analysis of Social Media MLD 10-802, LTI 11-772

Airoldi’s MMSBM

Page 12: Analysis of Social Media MLD 10-802, LTI 11-772
Page 13: Analysis of Social Media MLD 10-802, LTI 11-772
Page 14: Analysis of Social Media MLD 10-802, LTI 11-772

Outline

• Stochastic block models & inference question
• Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
• Block models and inference
• Mixed-membership block models
• Multinomial block models and inference w/ Gibbs
• Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 15: Analysis of Social Media MLD 10-802, LTI 11-772

Exchangeable Graph Model

• Defined by a 2^k x 2^k table q(b1,b2)
• Draw a length-k bit string b(n) like 01101 for each node n from a uniform distribution.
• For each pair of nodes n,m
  – Flip a coin with bias q(b(n),b(m))
  – If it’s heads connect n,m

A more complicated alternative to the uniform draw:

• Pick k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the u_i’s are correlated.
• Pass each u_i thru a sigmoid so it’s in [0,1]; call that p_i
• Pick b_i using p_i
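
The generative steps on this slide can be sketched in Python. This is an illustration under stated assumptions: the covariance matrix has α on the diagonal and β off the diagonal, the graph is undirected, bit strings are mapped to table indices by reading them as binary numbers, and the function and argument names are hypothetical.

```python
import numpy as np

def sample_exchangeable_graph(num_nodes, q, k, alpha=1.0, beta=0.5, seed=0):
    """Sample node bit strings and edges; q is a (2**k, 2**k) table of coin biases."""
    rng = np.random.default_rng(seed)
    # Assumed covariance: variance alpha on the diagonal, covariance beta elsewhere.
    cov = np.full((k, k), beta) + (alpha - beta) * np.eye(k)
    u = rng.multivariate_normal(np.zeros(k), cov, size=num_nodes)
    # Pass each u_i through a sigmoid so it lies in [0, 1]; call that p_i.
    p = 1.0 / (1.0 + np.exp(-u))
    # Pick each bit b_i using p_i.
    bits = (rng.random((num_nodes, k)) < p).astype(int)
    # Read each bit string as a binary number to index into the table q.
    codes = bits @ (1 << np.arange(k)[::-1])
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    for n in range(num_nodes):
        for m in range(n + 1, num_nodes):
            # Flip a coin with bias q(b(n), b(m)); if heads, connect n and m.
            if rng.random() < q[codes[n], codes[m]]:
                adj[n, m] = adj[m, n] = 1
    return bits, adj
```

For example, with `k = 2` the table `q` is 4 x 4, and setting every entry to 0.5 gives a graph in which each pair is connected by a fair coin flip regardless of the bit strings.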

Page 16: Analysis of Social Media MLD 10-802, LTI 11-772

Exchangeable Graph Model

• Pick k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the u_i’s are correlated.
• Pass each u_i thru a sigmoid so it’s in [0,1]; call that p_i
• Pick b_i using p_i

If α is big then u_x, u_y are really big (or small), so p_x, p_y will end up in a corner.

[Figure: plot over the unit square, axes running from 0 to 1]

Page 18: Analysis of Social Media MLD 10-802, LTI 11-772

The p1 model for a directed graph

• Parameters, per node i:
  – Θ: background edge probability
  – α_i: “expansiveness” – how extroverted is i?
  – β_i: “popularity” – how much do others want to be with i?
  – ρ_ij: “reciprocation” – how likely is i to respond to an incoming link with an outgoing one?

log Pr(no edge between i and j) = λ_ij
log Pr(i → j only) = λ_ij + Θ + α_i + β_j
log Pr(j → i only) = λ_ij + Θ + α_j + β_i
log Pr(i ↔ j) = λ_ij + 2Θ + α_i + β_j + α_j + β_i + ρ_ij

Here λ_ij is the per-pair normalizing term that makes the four probabilities sum to 1.

Logistic-regression like procedure can be used to fit this to data from a graph
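
As a small worked sketch of how these parameters combine for one pair (i, j), the function below computes the probabilities of the four possible dyad states. It assumes the standard p1 log-linear form, in which the empty dyad has log-score 0, a single edge i → j adds Θ + α_i + β_j, and a mutual pair adds both single-edge scores plus ρ_ij; the function name and argument names are hypothetical.

```python
import numpy as np

def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho_ij):
    """Probabilities of the dyad states (none, i->j only, j->i only, mutual)."""
    scores = np.array([
        0.0,                                   # no edge in either direction
        theta + alpha_i + beta_j,              # i -> j only: i expansive, j popular
        theta + alpha_j + beta_i,              # j -> i only
        2 * theta + alpha_i + beta_j + alpha_j + beta_i + rho_ij,  # both edges
    ])
    # Normalize over the four states (subtract the max for numerical stability).
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()
```

With all parameters at zero the four states are equally likely; raising `rho_ij` shifts probability mass toward the mutual state, which is the reciprocation effect.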

Page 19: Analysis of Social Media MLD 10-802, LTI 11-772

Exponential Random Graph Model

• Basic idea:
  – Define some features of the graph (e.g., number of edges, number of triangles, …)
  – Build a MaxEnt-style model based on these features
• General:
  – includes Erdos-Renyi, p1, …
• Issues
  – Partition function is intractable
  – Alternative: model conditional pseudo-likelihood of each edge (i.e., Pr(edge|rest of graph))
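
The pseudo-likelihood alternative can be sketched for the two example features named above (number of edges, number of triangles). In a MaxEnt-style model the conditional log-odds of one edge equals the weight vector dotted with the change in the features when that edge is switched on, so no partition function is needed. The sketch assumes an undirected graph stored as a 0/1 NumPy adjacency matrix with no self-loops; all function names are hypothetical.

```python
import numpy as np

def change_stats(adj, i, j):
    """Change in (edge count, triangle count) when edge i-j is switched on."""
    # Every common neighbor of i and j closes one triangle through edge i-j.
    common = int(np.sum(adj[i] * adj[j]))
    return np.array([1.0, float(common)])

def edge_conditional_prob(adj, i, j, theta):
    """Pr(edge i-j present | rest of graph) under feature weights theta."""
    logit = float(np.dot(theta, change_stats(adj, i, j)))
    return 1.0 / (1.0 + np.exp(-logit))

def log_pseudo_likelihood(adj, theta):
    """Sum of log conditional probabilities over all unordered node pairs."""
    n = adj.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_conditional_prob(adj, i, j, theta)
            total += np.log(p) if adj[i, j] else np.log(1.0 - p)
    return total
```

Maximizing `log_pseudo_likelihood` over `theta` is an ordinary logistic-regression problem, with one training example per node pair and the change statistics as its features.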

Page 20: Analysis of Social Media MLD 10-802, LTI 11-772

Kronecker product graphs

Page 21: Analysis of Social Media MLD 10-802, LTI 11-772

Kronecker product graphs

Page 22: Analysis of Social Media MLD 10-802, LTI 11-772

Kronecker product graphs

• Good fit to many commonly-observed network properties
  – scale-free degree distribution
  – diameter
  – …

• Gradient descent can be used to fit an “initiator matrix” to a real adjacency matrix
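
A minimal sketch of how an initiator matrix generates a graph, assuming the stochastic variant in which initiator entries are edge probabilities and the full edge-probability matrix is the repeated Kronecker product of the initiator with itself. The function name `kronecker_graph` and its arguments are hypothetical, and the fitting step by gradient descent is not shown.

```python
import numpy as np

def kronecker_graph(initiator, power, seed=0):
    """Build the Kronecker-power edge-probability matrix and sample a directed graph."""
    rng = np.random.default_rng(seed)
    initiator = np.asarray(initiator, dtype=float)
    prob = initiator.copy()
    # Each Kronecker product multiplies the number of nodes by the initiator size.
    for _ in range(power - 1):
        prob = np.kron(prob, initiator)
    # Sample every possible edge independently with its own probability.
    adj = (rng.random(prob.shape) < prob).astype(int)
    return prob, adj
```

For example, a 2 x 2 initiator raised to the third Kronecker power gives an 8 x 8 probability matrix, so a graph with many nodes is described by only four parameters.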