Inference and Representation, Lab 9: Extensions of LDA
Yacine Jernite
November 6, 2014
Yacine Jernite Inference and Representation: Lab 9 Extensions of LDA
Lecture plan
Notes on MCMC methods
LDA inference and learning
Variations of LDA
Notes on MCMC
Stationary distribution of MCMC satisfies detailed balance:
T(x′|x)P(x) = T(x|x′)P(x′)

Hence: P(x′) = ∑_x T(x′|x)P(x)
∀n > N, x_n ∼ P
(x_n, x_{n+1}, . . . , x_{n+M}) are not i.i.d. from P
However, (1/M) ∑_{l=1}^M x_{n+l} is an unbiased estimator of E_P[x]
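These properties can be checked numerically on a toy chain. The sketch below (transition matrix and target distribution are made up for illustration) verifies detailed balance pairwise, then averages correlated post-burn-in samples to estimate E_P[x]:

```python
import random

# Hypothetical 2-state chain built to satisfy detailed balance w.r.t. P = (0.75, 0.25).
P = [0.75, 0.25]
T = [[0.9, 0.1],     # T[x][y] = T(y | x)
     [0.3, 0.7]]

# Detailed balance: T(x'|x) P(x) == T(x|x') P(x') for every pair of states.
for x in range(2):
    for y in range(2):
        assert abs(T[x][y] * P[x] - T[y][x] * P[y]) < 1e-12

# Burn in, then average correlated samples: each x_n is marginally ~ P,
# so the running mean still estimates E_P[x] = 0.25 despite correlation.
random.seed(0)
x = 0
for _ in range(1000):                          # burn-in
    x = 0 if random.random() < T[x][0] else 1
total, n_samples = 0, 200000
for _ in range(n_samples):
    x = 0 if random.random() < T[x][0] else 1
    total += x
estimate = total / n_samples
```

The samples are correlated (here the autocorrelation roughly quadruples the variance of the mean versus i.i.d. draws), but the estimator still converges to E_P[x].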
Notes on MCMC
Ergodicity of Gibbs Sampling
Irreducible: iff all variables can be explored (fixed or random update order).
Aperiodic: ∀X_0, ∀X, ∀n > N_0, P^n(X|X_0) > 0.
Depends on the model.
Collapsed Gibbs Sampling
Figure: chain A → B → C, collapsed to A → C.
P(C|A) = ∑_B P(B,C|A) = ∑_B P(C|B)P(B|A).
Empirical estimate of E[B] from a sample of A.
Remember: drop all constants in derivations!
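A minimal sketch of the marginalization behind collapsing, on the toy chain A → B → C with made-up binary conditional tables:

```python
# Hypothetical CPTs for a binary chain A -> B -> C.
P_B_given_A = {0: [0.8, 0.2], 1: [0.3, 0.7]}   # P(B = b | A = a)
P_C_given_B = {0: [0.6, 0.4], 1: [0.1, 0.9]}   # P(C = c | B = b)

def p_c_given_a(c, a):
    # Collapse B analytically: P(C|A) = sum_B P(C|B) P(B|A)
    return sum(P_C_given_B[b][c] * P_B_given_A[a][b] for b in (0, 1))

collapsed = p_c_given_a(1, 0)   # 0.8 * 0.4 + 0.2 * 0.9 = 0.5
```

A sampler over (A, C) alone then never has to visit B, which is exactly what the collapsed Gibbs sampler for LDA does with θ and the topic-word distributions.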
Lecture plan: LDA
Brief history
Model description and independence assumptions
Treewidth and cost of exact inference
Approximate inference method
Learning algorithm
History of topic modelling
LSI (Deerwester et al., 1990): classifying documents from a bag-of-words representation, via SVD of tf-idf counts.
      w1      w2      ...  wV
d1    0.01    0.021   ...  0.005
d2    0.0031  0.102   ...  0.3
...   ...     ...     ...  ...
dM    0.11    0.0041  ...  0.093
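As a rough illustration of the SVD step behind LSI, the sketch below runs power iteration on AᵀA of a tiny made-up document-term count matrix (standing in for tf-idf values) to recover the leading singular value and latent direction:

```python
import math

# Toy document-term matrix (rows: documents, cols: word counts),
# standing in for tf-idf values; all numbers are illustrative.
A = [[2.0, 0.0, 1.0],
     [1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

At = [list(col) for col in zip(*A)]   # transpose of A

v = [1.0, 1.0, 1.0]                   # initial guess for the top right singular vector
for _ in range(100):                  # power iteration on A^T A
    w = matvec(At, matvec(A, v))
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

sigma1 = math.sqrt(sum(x * x for x in matvec(A, v)))   # leading singular value
```

A full LSI pipeline keeps the top k singular triples and represents each document by its k coordinates in that latent space.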
pLSI (Hofmann, 1999):
p(d, w_n) = p(d) ∑_z p(w_n|z) p(z|d)
LDA (Blei, Ng, and Jordan, 2003): admixture model of text.
Extensions (2003-present): Tailored to many problems.
Model description and independence assumptions
Figure: Plate models for pLSI (top) and LDA (bottom)
Treewidth and cost of exact inference
Expanded model is a tree
BUT:
p(θ, z | w, α, β) = p(θ, z, w | α, β) / p(w | α, β)

with:

p(w | α, β) ∝ ∫ ( ∏_{i=1}^k θ_i^{α_i−1} ) ∏_{n=1}^N ∑_{i=1}^k ∏_{j=1}^V (θ_i β_{ij})^{w_n^j} dθ

Intractable.
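At toy scale the marginal can still be estimated by Monte Carlo over θ, which makes the coupling between θ and the per-word sums concrete (k = 2, V = 3 and all values below are made up; this does not scale to realistic k, N, V):

```python
import random

# Toy LDA instance: k = 2 topics, V = 3 word types, N = 3 tokens.
alpha = [1.0, 1.0]                 # Dirichlet(1, 1): uniform over the simplex
beta = [[0.7, 0.2, 0.1],           # beta[i][j] = p(word j | topic i)
        [0.1, 0.3, 0.6]]
words = [0, 2, 1]                  # the observed document

def sample_dirichlet(a):
    g = [random.gammavariate(ai, 1.0) for ai in a]
    s = sum(g)
    return [x / s for x in g]

# p(w | alpha, beta) = E_{theta ~ Dir(alpha)}[ prod_n sum_i theta_i beta[i][w_n] ]
random.seed(0)
S = 50000
total = 0.0
for _ in range(S):
    theta = sample_dirichlet(alpha)
    lik = 1.0
    for w in words:
        lik *= sum(theta[i] * beta[i][w] for i in range(2))
    total += lik
p_w = total / S                    # Monte Carlo estimate of the marginal
```

The product over words of sums over topics is what ties all components of θ together under the integral, so exact inference cannot exploit the tree structure of the expanded model.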
Approximate inference method: LDA
Mean field inference: see last week, assignment
Find a fully factorized q(θ, z) = q(θ) ∏_{i=1}^n q(z_i) that minimizes D_KL(q || p(·|w, α, β)).
Easily get pseudo-marginals.
Gibbs sampling: see assignment
Sample θ ∼ p(θ|z, w, α, β), then sample z_i ∼ p(z_i|θ, z_{−i}, w, α, β).
Collapsed version: z_i ∼ p(z_i|z_{−i}, w, α, β).
Get empirical estimates of marginals.
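A compact sketch of the collapsed sampler on a toy corpus, using the standard count-ratio form of p(z_i | z_{−i}, w, α, β) with θ and the topic-word distributions integrated out (corpus, K, V and hyperparameters are illustrative):

```python
import random

docs = [[0, 0, 1, 2], [2, 2, 3, 3], [0, 1, 1, 3]]   # toy corpus of word ids
K, V = 2, 4
alpha, beta = 0.5, 0.1

random.seed(0)
n_dk = [[0] * K for _ in docs]        # topic counts per document
n_kw = [[0] * V for _ in range(K)]    # word counts per topic
n_k = [0] * K                         # total tokens per topic
z = []                                # topic assignment of each token
for d, doc in enumerate(docs):
    z.append([])
    for w in doc:
        k = random.randrange(K)
        z[d].append(k)
        n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

for _ in range(200):                  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]               # remove the token from all counts
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            # p(z_i = j | z_-i, w) up to a constant
            p = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                 for j in range(K)]
            r = random.random() * sum(p)
            k = 0
            while r > p[k]:
                r -= p[k]
                k += 1
            z[d][i] = k               # reinsert the token with its new topic
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
```

Marginals such as the per-document topic proportions are then estimated from the counts, e.g. (n_dk[d][k] + α) / (len(docs[d]) + Kα), averaged over sweeps.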
Learning algorithm
Variational EM: see last week
E step is approximate inference: find q.
M step: max_{α,β} E_q[log p(θ, z, w; α, β)]
Bayesian Gibbs sampling
Bayesian prior on α, β reduces learning to inference.
Learning algorithm: Gibbs Sampling
Putting a prior distribution on the parameters turns them into random variables, which can then be sampled within the Gibbs Sampling algorithm:
Figure: Putting a Bayesian prior on the parameters: α ∼ Dirichlet(·; α_0) (optional, see Mallet) and β ∼ Dirichlet(·; β_0)
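With a Dirichlet prior, the conditional of each topic-word distribution given the current topic assignments is again Dirichlet (prior pseudo-counts plus observed counts), so β can be resampled inside the Gibbs sweep. A sketch with made-up counts for one topic:

```python
import random

beta0 = [0.1, 0.1, 0.1, 0.1]     # symmetric Dirichlet prior over V = 4 words
n_kw = [5, 0, 2, 1]              # toy counts of words currently assigned to topic k

def sample_dirichlet(a):
    g = [random.gammavariate(ai, 1.0) for ai in a]
    s = sum(g)
    return [x / s for x in g]

random.seed(0)
# Conjugacy: posterior parameters = prior pseudo-counts + observed counts.
beta_k = sample_dirichlet([b + n for b, n in zip(beta0, n_kw)])
```

The same conjugate update applies to α given the per-document topic counts, which is what reduces learning to inference.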
Lecture plan: Variations of LDA
Time and influence
Correlated topics
Supervised topic models
Dynamic topic models
First idea: introduce variability across time for α, β.
Figure: Dynamic Topic Model (Blei and Lafferty, 2006)
Dynamic topic models
Parametrization of P(α_t | α_{t−1}) and P(β_t | β_{t−1})
For the variational EM algorithm: keep the dependences across time
Figure: unrolled Dynamic Topic Model across time steps t−1, t, t+1, with variational factorization
q(z, θ, β) = ∏_{k=1}^K q(β_{k,1}, . . . , β_{k,T}) ∏_{t=1}^T ∏_{d=1}^{D_t} ( q(θ_{t,d}) ∏_{n=1}^{N_{t,d}} q(z_{t,d,n}) )
Dynamic topic models: modelling influence
Next step: model what drives the changes in β:
Figure: Document Influence Model (Gerrish and Blei, 2010)
Dynamic topic models: Document Influence Model
Originally applied to measuring the impact of scholarly publications
Parametrization of influence:
β_{k,t+1} | β_{k,t}, (w, l, z)_t ∼ N(·; β_{k,t} + exp(−β_{k,t}) ∑_d l_{d,k} ∑_n w_{d,n} z_{d,n,k}, σ²I)
Learnt with variational EM: same factorization, now with independent factors for l:
q(z, θ, β, l) = ∏_{k=1}^K q(β_{k,1}, . . . , β_{k,T}) ∏_{t=1}^T ∏_{d=1}^{D_t} ( q(θ_{t,d}) q(l_{t,d}) ∏_{n=1}^{N_{t,d}} q(z_{t,d,n}) )
Dynamic topic models: Document Influence Model
Dynamic Topic Model: http://pdf.aminer.org/000/334/521/dynamic_topic_models.pdf
Document Influence Model: https://www.cs.princeton.edu/~blei/papers/GerrishBlei2010.pdf
Application to musical influence: http://jmlr.org/proceedings/papers/v28/shalit13.pdf
Introducing structure over topics
Correlated Topic Model (Blei and Lafferty, 2006)
Models 2nd order moments of topics with a logistic normal distribution: θ⁰ ∼ N(·; α, Σ), and θ_k = exp(θ⁰_k) / ∑_{k′} exp(θ⁰_{k′})
Original paper: fully factorized mean field approximation and Taylor expansion.
K² additional parameters; only models 2nd order correlations.
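A minimal sketch of the logistic-normal draw in the CTM: sample η ∼ N(µ, Σ) using an assumed Cholesky factor of Σ, then map through the softmax onto the simplex (all numbers are illustrative):

```python
import math
import random

mu = [0.0, 0.5, -0.5]
L = [[1.0, 0.0, 0.0],        # assumed Cholesky factor of Sigma: Sigma = L L^T
     [0.8, 0.6, 0.0],        # off-diagonal entry correlates topics 1 and 2
     [0.0, 0.0, 1.0]]

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in mu]
# eta ~ N(mu, Sigma) via the reparametrization eta = mu + L eps
eta = [mu[i] + sum(L[i][j] * eps[j] for j in range(3)) for i in range(3)]
m = max(eta)                 # numerically stable softmax
exps = [math.exp(e - m) for e in eta]
s = sum(exps)
theta = [e / s for e in exps]
```

Unlike a Dirichlet draw, the off-diagonal entries of Σ let two topics systematically rise and fall together, at the cost of losing conjugacy with the multinomial.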
Pachinko Allocation Model (Li and McCallum, 2006)
DAG structure on topics; z_w is the path of word w in the DAG:
z_{w,i} ∼ Mult(·; θ^{(d)}_{z_{w,i−1}})
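Sampling one word's topic path in a hypothetical PAM DAG, descending level by level with document-specific child distributions θ^(d) (node names and probabilities are made up):

```python
import random

# theta_d[node] is the document-specific distribution over that node's children.
theta_d = {
    "root":   {"super1": 0.7, "super2": 0.3},
    "super1": {"sub1": 0.5, "sub2": 0.5},
    "super2": {"sub2": 0.2, "sub3": 0.8},
}

def sample_path(theta_d, node="root"):
    """Draw z_w level by level: z_{w,i} ~ Mult(theta_d[z_{w,i-1}])."""
    path = [node]
    while node in theta_d:            # descend until a leaf (sub-)topic
        children = list(theta_d[node].items())
        r = random.random()
        for child, p in children:
            r -= p
            if r <= 0:
                node = child
                break
        else:                         # guard against float round-off
            node = children[-1][0]
        path.append(node)
    return path

random.seed(0)
path = sample_path(theta_d)           # one root-to-leaf topic path
```

The word itself would then be drawn from the word distribution of the final sub-topic on the path.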
Pachinko Allocation Model: Inference
Figure: Different realizations of the Pachinko Allocation Model. How many parameters are used? What are the effects on topic structure?
Pachinko Allocation Model: Inference
Inference with a Gibbs Sampler for the Four-level PAM structure:
z_{w,1} = 1 (the root), θ are sampled as in LDA, and:

P(z_{w,2} = t_k, z_{w,3} = t_p | D, z_{−w}, α, β) ∝
(n^{(d)}_{1k} + α_{1k}) / (∑_{k′} n^{(d)}_{1k′} + α_{1k′}) × (n^{(d)}_{kp} + α_{kp}) / (∑_{p′} n^{(d)}_{kp′} + α_{kp′}) × (n_{pw} + β_w) / (∑_{w′} n_{pw′} + β_{w′})
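Evaluating this conditional amounts to multiplying three count ratios. A sketch with made-up count tables (the names n1_d, nkp_d, npw are illustrative, with a symmetric β for simplicity):

```python
# Hyperparameters and count tables for the current document d (toy values):
alpha1 = [0.5, 0.5]                  # root -> super-topic hyperparameters
alpha_kp = [[0.5, 0.5], [0.5, 0.5]]  # super -> sub-topic hyperparameters
beta, V = 0.1, 3                     # symmetric word smoothing, vocabulary size

n1_d = [4, 2]                        # n^(d)_{1k}: root -> super-topic counts
nkp_d = [[3, 1], [0, 2]]             # n^(d)_{kp}: super -> sub-topic counts
npw = [[2, 1, 0], [1, 0, 3]]         # n_{pw}: sub-topic -> word counts
w = 0                                # current word type

def conditional(k, p):
    # Product of the three count ratios from the slide (up to a constant).
    a = (n1_d[k] + alpha1[k]) / (sum(n1_d) + sum(alpha1))
    b = (nkp_d[k][p] + alpha_kp[k][p]) / (sum(nkp_d[k]) + sum(alpha_kp[k]))
    c = (npw[p][w] + beta) / (sum(npw[p]) + V * beta)
    return a * b * c

probs = {(k, p): conditional(k, p) for k in range(2) for p in range(2)}
Z = sum(probs.values())
probs = {kp: v / Z for kp, v in probs.items()}   # normalized joint over (z2, z3)
```

The sampler then draws the pair (z_{w,2}, z_{w,3}) jointly from this table, after decrementing the current word's contribution from the counts.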
Pachinko Allocation Model: Learning
Moment matching at every step of the Gibbs Sampler for α_{xy}:

α_{xy} ∝ ∑_d n^{(d)}_{xy} / n^{(d)}_x
Results: 6 super-topics, 12 sub-topics in Figure 3 of http://people.cs.umass.edu/~mccallum/papers/pam-icml06.pdf
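The moment-matching update itself is a short sum over per-document count fractions, assuming n^(d)_x = ∑_y n^(d)_{xy} (toy counts; the overall scale s is set separately):

```python
# Toy per-document counts n_xy^(d): rows are documents d, columns are the
# children y of a fixed interior node x (values and scale s are made up).
n_xy = [[3, 1], [2, 2], [0, 4]]
s = 1.0

# alpha_xy proportional to sum_d n_xy^(d) / n_x^(d), with n_x^(d) = sum_y n_xy^(d)
alpha_x = [s * sum(doc[y] / sum(doc) for doc in n_xy) for y in range(2)]
```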
(Correlated Topic Model: https://www.cs.princeton.edu/~blei/papers/BleiLafferty2006.pdf)
Supervised Topic Models
The inferred θ or z can be used as features in many prediction tasks.
Performance can be improved by jointly training the representation and the predictor.
Hence, supervised LDA:
Supervised Topic Models: MedLDA
Supervised Latent Dirichlet Allocation (using Generalized Linear Models) (Blei and McAuliffe, 2007): https://www.cs.princeton.edu/~blei/papers/BleiMcAuliffe2007.pdf
MedLDA (max-margin objective) (Zhu et al., 2009): http://www.cs.cmu.edu/~amahmed/papers/zhu_ahmed_xing_icml09.pdf
Gibbs MedLDA (using SVMs and Gibbs Sampling) (Zhu et al., 2014): http://jmlr.org/papers/volume15/zhu14a/zhu14a.pdf