Topic models

Source: “Topic models”, David Blei, MLSS ‘09

Topic modeling - Motivation

Discover topics from a corpus

Model connections between topics

Model the evolution of topics over time

Image annotation

Extensions*

• Malleable: can be quickly extended to data with tags (side information), class labels, etc.

• The (approximate) inference methods can be readily translated in many cases

• Most datasets can be converted to 'bag-of-words' format using a codebook representation, and LDA-style models can then be readily applied (they can work with continuous observations too)

*YMMV
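The codebook conversion mentioned above can be sketched in plain Python; the function and variable names here are illustrative, not from the lecture:

```python
from collections import Counter

def build_bow(docs):
    """Assign each distinct word an integer id (the codebook) and
    map each document to a sparse {word_id: count} bag of words."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    bows = [Counter(vocab[w] for w in doc.lower().split()) for doc in docs]
    return vocab, bows

vocab, bows = build_bow(["the cat sat", "the dog sat on the mat"])
# bows[1][vocab["the"]] == 2: "the" occurs twice in the second document
```

Once documents are in this form, word order is discarded and only the per-document counts reach the model, which is exactly the exchangeability assumption LDA relies on.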

Connection to ML research

Latent Dirichlet Allocation

LDA

Probabilistic modeling

Intuition behind LDA

Generative model

The posterior distribution

Graphical models (Aside)

LDA model

Dirichlet distribution

Dirichlet Examples

Darker implies lower magnitude

\alpha < 1 leads to sparser topics
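The sparsity effect can be checked empirically with NumPy's Dirichlet sampler (the dimension and \alpha values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# alpha < 1: mass concentrates on a few components (sparse draws)
sparse = rng.dirichlet([0.1] * 10, size=1000)
# alpha > 1: mass spreads out more evenly (dense draws)
dense = rng.dirichlet([10.0] * 10, size=1000)

# Every draw is a valid distribution (sums to 1); on average, sparse
# draws put far more of their mass on the single largest component.
print(sparse.max(axis=1).mean())  # noticeably larger than for dense
print(dense.max(axis=1).mean())
```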

LDA

Inference in LDA

Example inference

Example inference

Topics vs words

Explore and browse document collections

Why does LDA "work"?

LDA is modular, general, useful

LDA is modular, general, useful

LDA is modular, general, useful

Approximate inference

• An excellent reference is “On smoothing and inference for topic models” Asuncion et al. (2009).

Posterior distribution for LDA

The only parameters we need to estimate are \alpha, \beta

Posterior distribution

Posterior distribution for LDA

• Can integrate out either \theta or z, but not both

• Marginalizing \theta gives z ~ Polya(\alpha)

• The Polya distribution is also known as the Dirichlet compound multinomial (models "burstiness")

• Most algorithms marginalize out \theta
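The burstiness of the Polya/DCM can be seen by simulation: compared with a plain multinomial of the same mean proportions, its counts are over-dispersed. A minimal sketch (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = np.full(5, 0.2)   # small concentration -> bursty draws
n_words, n_docs = 50, 2000

# Polya / Dirichlet compound multinomial:
# theta ~ Dirichlet(alpha), counts ~ Multinomial(n_words, theta)
thetas = rng.dirichlet(alpha, size=n_docs)
polya = np.array([rng.multinomial(n_words, t) for t in thetas])

# Plain multinomial with the same mean proportions
plain = rng.multinomial(n_words, alpha / alpha.sum(), size=n_docs)

# Over-dispersion: per-word variance of DCM counts greatly exceeds
# that of the multinomial, mimicking words that occur in bursts.
print(polya[:, 0].var(), plain[:, 0].var())
```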

MAP inference

• Integrate out z

• Treat \theta as a random variable

• Can use the EM algorithm

• Updates are very similar to those of PLSA (except for additional regularization terms)

Collapsed Gibbs sampling

Variational inference

Can think of this as an extension of EM where we compute expectations with respect to a "variational distribution" instead of the true posterior
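Concretely, in the standard LDA notation this means maximizing a lower bound (the ELBO) on the log evidence, with expectations taken under a factorized q (\gamma and \phi below denote the variational parameters, which the slides may name differently):

```latex
\log p(w \mid \alpha, \beta)
  \;\ge\; \mathbb{E}_{q}\bigl[\log p(w, z, \theta \mid \alpha, \beta)\bigr]
        - \mathbb{E}_{q}\bigl[\log q(z, \theta)\bigr],
\qquad
q(z, \theta) = q(\theta \mid \gamma)\,\prod_{n} q(z_n \mid \phi_n).
```

The gap between the two sides is KL(q || posterior), so tightening the bound in the "E-step" is exactly fitting q to the true posterior.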

Mean field variational inference

MFVI and conditional exponential families

MFVI and conditional exponential families

Variational inference

Variational inference for LDA

Variational inference for LDA

Variational inference for LDA

Collapsed variational inference

• MFVI assumes \theta and z are independent

• \theta can be marginalized out exactly

• A variational inference algorithm operating on the same "collapsed space" as CGS

• Gives a strictly better lower bound than VB

• Can be thought of as a "soft" CGS that propagates uncertainty by using probabilities rather than samples

Estimating the topics

Inference comparison

Comparison of updates

“On smoothing and inference for topic models” Asuncion et al. (2009).

Update equations compared: MAP, VB, CVB0, CGS
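For reference, the four updates in Asuncion et al. (2009) share one functional form. The equations below are reproduced from memory of that paper (W is the vocabulary size, N are token-topic counts, and \neg ij means "excluding the current token"), so consult the paper for the exact conventions:

```latex
\text{MAP:}\quad \gamma_{ijk} \propto
  \frac{(N_{wk} + \beta - 1)\,(N_{jk} + \alpha - 1)}{N_{k} + W\beta - W}
\qquad
\text{VB:}\quad \gamma_{ijk} \propto
  \frac{e^{\psi(N_{wk} + \beta)}\; e^{\psi(N_{jk} + \alpha)}}{e^{\psi(N_{k} + W\beta)}}
```

```latex
\text{CVB0:}\quad \gamma_{ijk} \propto
  \frac{(N^{\neg ij}_{wk} + \beta)\,(N^{\neg ij}_{jk} + \alpha)}{N^{\neg ij}_{k} + W\beta}
\qquad
\text{CGS:}\quad p(z_{ij} = k \mid \cdot) \propto
  \frac{(N^{\neg ij}_{wk} + \beta)\,(N^{\neg ij}_{jk} + \alpha)}{N^{\neg ij}_{k} + W\beta}
```

Note how CVB0 is the CGS conditional evaluated with expected counts instead of sampled ones, and MAP is the same ratio with the Dirichlet modes (the -1 offsets) substituted in.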

Choice of inference algorithm

• Depends on vocabulary size (V) and the number of words per document (N_i)

• Collapsed algorithms are not parallelizable

• CGS needs to draw separate topic-assignment samples for multiple occurrences of the same word (slow when N_i >> V)

• MAP is fast, but performs poorly when N_i << V

• CVB0 is a good tradeoff between computational complexity and perplexity
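Putting the CGS conditional into code, a toy collapsed Gibbs sampler might look as follows (hyperparameter values, names, and the toy corpus are illustrative, not from the lecture):

```python
import numpy as np

def lda_cgs(docs, K, V, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of lists of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # topic totals
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):   # initialize counts from random z
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # CGS update: p(z=k) proportional to
                # (ndk + alpha) * (nkw + beta) / (nk + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k          # record the new assignment
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw

# Toy corpus: two groups of documents with disjoint vocabularies
docs = [[0, 1, 0, 1, 2]] * 5 + [[3, 4, 3, 4, 5]] * 5
z, ndk, nkw = lda_cgs(docs, K=2, V=6)
```

The slide's caveat is visible in the inner loop: every occurrence of a word gets its own sample, which is what makes CGS slow when documents are long relative to the vocabulary.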

Supervised and relational topic models

Supervised LDA

Supervised LDA

Supervised LDA

Supervised LDA

Variational inference in sLDA

ML estimation

Prediction

Example: Movie reviews

Diverse response types with GLMs

Example: Multi class classification

Supervised topic models

Upstream vs downstream models

Upstream: conditional models. Downstream: the predictor variable is generated from the actually observed z's rather than from \theta, which is E(z).

Relational topic models

Relational topic models

Relational topic models

Predictive performance of one type given the other

Predicting links from documents

Predicting links from documents

Things we didn’t address

• Model selection: nonparametric Bayesian approaches

• Hyperparameter tuning

• Evaluation can be a bit tricky for LDA (comparing approximate bounds), but supervised versions can use traditional metrics

Thank you!