Machine Learning 2 DS 4420 - Spring 2020 Topic Modeling 1 Byron C. Wallace

Machine Learning 2 - Northeastern University


Page 1:

Machine Learning 2 (DS 4420 - Spring 2020)

Topic Modeling 1
Byron C. Wallace

Page 2:

Last time: Clustering -> Mixture Models -> Expectation Maximization (EM)

Page 3:

Today: Topic models

Page 4:

Mixture models

Data: Assume we are given data $D$ consisting of $N$ fully unsupervised examples in $M$ dimensions:

$D = \{t^{(i)}\}_{i=1}^{N}$ where $t^{(i)} \in \mathbb{R}^M$

Model:

Joint: $p_{\theta,\phi}(t, z) = p_\theta(t \mid z)\, p_\phi(z)$

Marginal: $p_{\theta,\phi}(t) = \sum_{z=1}^{K} p_\theta(t \mid z)\, p_\phi(z)$

Generative story: $z \sim \mathrm{Multinomial}(\phi)$, then $t \sim p_\theta(\cdot \mid z)$

(Marginal) log-likelihood:

$\ell(\theta, \phi) = \log \prod_{i=1}^{N} p_{\theta,\phi}(t^{(i)}) = \sum_{i=1}^{N} \log \sum_{z=1}^{K} p_\theta(t^{(i)} \mid z)\, p_\phi(z)$

Slide credit: Matt Gormley and Eric Xing (CMU)
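The marginal log-likelihood above can be computed with a standard log-sum-exp over components; a minimal numpy sketch (the function name and array layout are my own, not from the slides):

```python
import numpy as np

def mixture_log_likelihood(log_p_t_given_z, log_pi):
    """Marginal log-likelihood of a mixture model.

    log_p_t_given_z: (N, K) array of component log-densities log p(t^(i) | z=k).
    log_pi: (K,) array of log mixing weights log p(z=k).
    Returns sum_i log sum_k p(t^(i) | z=k) p(z=k).
    """
    a = log_p_t_given_z + log_pi          # broadcast to (N, K)
    m = a.max(axis=1, keepdims=True)      # stabilize the log-sum-exp
    per_example = m.ravel() + np.log(np.exp(a - m).sum(axis=1))
    return per_example.sum()
```

Working in log space keeps the inner sum from underflowing when the component densities are tiny.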


Page 8:

Naive Bayes

evaluate the classifier. One good evaluation metric is accuracy, which is defined as

$\text{accuracy} = \dfrac{\#\text{ correctly predicted labels in the test set}}{\#\text{ of instances in the test set}}$  (6)
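Equation (6) is just a ratio of counts; as a sketch (the function name is mine):

```python
def accuracy(y_true, y_pred):
    """Accuracy as in eq. (6): fraction of test instances predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```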

2.2 Naive Bayes model

The naive Bayes model defines a joint distribution over words and classes, that is, the probability that a sequence of words belongs to some class c:

$p(c, w_{1:N} \mid \pi, \theta) = p(c \mid \pi) \prod_{n=1}^{N} p(w_n \mid \theta_c)$  (7)

$\pi$ is the probability distribution over the classes (in this case, the probability of seeing spam e-mails and of seeing ham e-mails): a discrete probability distribution over the classes that sums to 1. $\theta_c$ is the class-conditional probability distribution, that is, given a certain class, the probability distribution over the words in your vocabulary. Each class has a probability for each word in the vocabulary (in this case, there is one set of probabilities for the spam class and one for the ham class). Given a new test example, we can classify it using the model by calculating the conditional probability of the classes given the words in the e-mail, $p(c \mid w_{1:N})$, and seeing which class is most likely. There is an implicit independence assumption behind the model: since we multiply the $p(w_n \mid \theta_c)$ terms together, the words in a document are assumed to be conditionally independent given the class.

2.3 Class prediction with naive Bayes

We classify using the posterior distribution of the classes given the words, which is proportional to the joint distribution, since $P(X \mid Y) = P(X, Y)/P(Y) \propto P(X, Y)$. The posterior distribution is

$p(c \mid w_{1:N}, \pi, \theta) \propto p(c \mid \pi) \prod_{n=1}^{N} p(w_n \mid \theta_c)$  (8)

Note that we don't need a normalizing constant because we only care about which probability is bigger. The classifier assigns the e-mail to the class with the highest probability.
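In practice the argmax in eq. (8) is taken in log space to avoid underflow from the long product; a sketch (names and array layout are my own):

```python
import numpy as np

def nb_predict(word_ids, log_pi, log_theta):
    """Pick argmax_c of the unnormalized log-posterior in eq. (8).

    word_ids: list of vocabulary indices for the e-mail's words.
    log_pi: (C,) log class priors log p(c | pi).
    log_theta: (C, V) log class-conditional word probabilities log p(w | theta_c).
    """
    # log p(c | w_{1:N}) = log p(c) + sum_n log p(w_n | theta_c) + const
    scores = log_pi + log_theta[:, word_ids].sum(axis=1)
    return int(np.argmax(scores))
```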

2.4 Fitting a naive Bayes model with maximum likelihood

We find the parameters in the posterior distribution used in class prediction by learning on the labeled training data (in this case, the ham and spam e-mails). We use the maximum likelihood method, finding parameters that maximize the likelihood of the observed data set. Given data $\{w_{d,1:N}, c_d\}_{d=1}^{D}$, the likelihood under a model with parameters $(\theta_{1:C}, \pi)$ is

$p(D \mid \theta_{1:C}, \pi) = \prod_{d=1}^{D} \left( p(c_d \mid \pi) \prod_{n=1}^{N} p(w_{d,n} \mid \theta_{c_d}) \right)$
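Maximizing this likelihood gives the familiar count-based estimates: each class prior is a fraction of documents, and each word probability is a fraction of tokens. A sketch (the function name and document representation are my own; no smoothing, so unseen words get probability zero):

```python
import numpy as np

def nb_fit_mle(docs, labels, C, V):
    """MLE for (pi, theta_{1:C}) from labeled documents.

    docs: list of word-id lists; labels: list of class ids in {0..C-1}.
    pi[c] is the fraction of documents with class c; theta[c, w] is the
    fraction of tokens in class-c documents equal to word w.
    """
    pi = np.zeros(C)
    theta = np.zeros((C, V))
    for words, c in zip(docs, labels):
        pi[c] += 1.0
        for w in words:
            theta[c, w] += 1.0
    pi /= pi.sum()
    theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta
```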



The model

Page 9:

Slide credit: Matt Gormley and Eric Xing (CMU)

(Soft) EM

Initialize parameters randomly.
While not converged:

1. E-step: Create one training example for each possible value of the latent variables. Weight each example according to the model's confidence. Treat parameters as observed.

2. M-step: Set the parameters to the values that maximize likelihood. Treat the pseudo-counts from above as observed.
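The loop above, specialized to re-estimating just the mixing weights of a mixture (component likelihoods held fixed for illustration; all names are mine, and a full EM would also re-fit the component parameters in the M-step):

```python
import numpy as np

def soft_em_mixing_weights(log_p_t_given_z, n_iters=20):
    """Soft EM for the mixing weights pi of a mixture model.

    log_p_t_given_z: (N, K) fixed component log-likelihoods log p(t_i | z=k).
    """
    N, K = log_p_t_given_z.shape
    log_pi = np.full(K, -np.log(K))            # start uniform
    for _ in range(n_iters):
        # E-step: responsibilities r_ik proportional to p(t_i | z=k) pi_k
        a = log_p_t_given_z + log_pi
        a -= a.max(axis=1, keepdims=True)
        r = np.exp(a)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: pi_k = average responsibility for component k
        log_pi = np.log(r.mean(axis=0))
    return np.exp(log_pi)
```

The responsibilities play the role of the weighted pseudo-examples: each point contributes fractionally to every component.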

Page 10:

And for NB: in soft EM, $\theta_{c,t}$ is the expected fraction of times term $t$ occurs in class $c$:

$\theta_{c,t} = \dfrac{\sum_i p(z=c \mid x^{(i)})\,\mathrm{count}(t\ \text{in}\ x^{(i)})}{\sum_i p(z=c \mid x^{(i)})\,(\text{total token count in } x^{(i)})}$
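With a bag-of-words count matrix, the soft-count update for all classes and terms at once is a single matrix product; a sketch (names are mine):

```python
import numpy as np

def m_step_theta(doc_word_counts, resp):
    """Soft-EM M-step for naive Bayes word probabilities.

    doc_word_counts: (N, V) count of each term in each document x^(i).
    resp: (N, C) E-step posteriors p(z=c | x^(i)).
    Returns theta: (C, V), where theta[c, t] is the expected fraction of
    tokens in class c equal to term t.
    """
    expected = resp.T @ doc_word_counts  # (C, V): sum_i p(c|x_i) * count(t in x_i)
    return expected / expected.sum(axis=1, keepdims=True)
```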

Page 11:

TOPIC MODELS

Some content borrowed from: David Blei (Columbia)

Page 12:

Topic Models: Motivation

• Suppose we have a giant dataset ("corpus") of text, e.g., all of the NYTimes or all emails from a company

❖ Cannot read all documents
❖ But want to get a sense of what they contain

Page 13:

Topic Models: Motivation

• Topic models are a way of uncovering, well, "topics" (themes) in a set of documents

• Topic models are unsupervised

• Can be viewed as a type of clustering, so follows naturally from prior lectures; will come back to this.


Page 15:

Topic Models: Motivation

• Topic models are a way of uncovering, well, "topics" (themes) in a set of documents

• Topic models are unsupervised

• Can be viewed as a sort of soft clustering of documents into topics.

Page 16:

Topic Modeling: Beyond Bag-of-Words

in this paper. To enable a direct comparison, Dirichlet hyperpriors could also be placed on the hyperparameters of the priors described in section 3.

7. Conclusions

Creating a single model that integrates bigram-based and topic-based approaches to document modeling has several benefits. Firstly, the predictive accuracy of the new model, especially when using prior 2, is significantly better than that of either latent Dirichlet allocation or the hierarchical Dirichlet language model. Secondly, the model automatically infers a separate topic for function words, meaning that the other topics are less dominated by these words.


Table 1. Top: The most commonly occurring words in some of the topics inferred from the 20 Newsgroups training data by latent Dirichlet allocation. Middle: Some of the topics inferred from the same data by the new model with prior 1. Bottom: Some of the topics inferred by the new model with prior 2. Each column represents a single topic, and words appear in order of frequency of occurrence. Content words are in bold. Function words, which are not in bold, were identified by their presence on a standard list of stop words: http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words. All three sets of topics were taken from models with 90 topics.

Latent Dirichlet allocation

Topic 1     Topic 2        Topic 3       Topic 4
the         i              that          easter
“number”    is             proteins      ishtar
in          satan          the           a
to          the            of            the
espn        which          to            have
hockey      and            i             with
a           of             if            but
this        metaphorical   “number”      english
as          evil           you           and
run         there          fact          is

Bigram topic model (prior 1)

Topic 1     Topic 2        Topic 3       Topic 4
to          the            the           the
party       god            and           a
arab        is             between       to
not         belief         warrior       i
power       believe        enemy         of
any         use            battlefield   “number”
i           there          a             is
is          strong         of            in
this        make           there         and
things      i              way           it

Bigram topic model (prior 2)

Topic 1     Topic 2        Topic 3       Topic 4
party       god            “number”      the
arab        believe        the           to
power       about          tower         a
as          atheism        clock         and
arabs       gods           a             of
political   before         power         i
are         see            motherboard   is
rolling     atheist        mhz           “number”
london      most           socket        it
security    shafts         plastic       that

Example from Wallach, 2006

Page 17:

Key outputs

• Topics: Distributions over words; we hope these are somehow thematically coherent

• Document-topics: Probabilistic assignments of topics to documents

Page 18:

https://en.wikipedia.org/wiki/Enron_scandal

Example: Enron emails

https://www.cs.cmu.edu/~enron/

Page 19:

Example: Enron emails

Table 1.1: Five topics from a twenty-five topic model fit on Enron e-mails. Example topics concern financial transactions, natural gas, the California utilities, federal regulation, and planning meetings. We provide the five most probable words from each topic (each topic is a distribution over all words).

Topic   Terms
3       trading financial trade product price
6       gas capacity deal pipeline contract
9       state california davis power utilities
14      ferc issue order party case
22      group meeting team process plan

and stock prices (Figure 1.1). The first half of a topic model connects topics to a jumbled "bag of words". When we say that a topic is about X, we are manually assigning a post hoc label (more on this in Chapter 3.1). It remains the responsibility of the human consumer of topic models to go further and make sense of these piles of straw (we discuss labeling the topics more in Chapter 3).

Making sense of one of these word piles by itself can be difficult. The second half of a topic model links topics to individual documents. For example, the document in Figure 1.1 is about a California utility's reaction to the short-term electricity market and exemplifies Topic 9 from Figure ??. Considering examples of documents that are strongly connected to a topic, along with the words associated with the topic, can give us a more complete representation of the topic. If we get a sense that Topic 9 is of interest, we can explore deeper to find other documents.

1.3 Foundations

You might notice that we are using the general term "topic model". There are many mathematical formulations of topic models and many algorithms that learn the parameters of those models from data.

Example from Boyd-Graber, Hu and Mimno, 2017https://en.wikipedia.org/wiki/Enron_scandal

Page 20:

Document-topic probabilities

Yesterday, SDG&E filed a motion for adoption of an electric procurement cost recovery mechanism and for an order shortening time for parties to file comments on the mechanism. The attached email from SDG&E contains the motion, an executive summary, and a detailed summary of their proposals and recommendations governing procurement of the net short energy requirements for SDG&E's customers. The utility requests a 15-day comment period, which means comments would have to be filed by September 10 (September 8 is a Saturday). Reply comments would be filed 10 days later.

Topic   Probability
9       0.42
11      0.05
8       0.05

Figure 1.1: Example document from the Enron corpus and its association to topics. Although it does not contain the word "California", it discusses a single California utility's dissatisfaction with how much it is paying for electricity.

Dataset (M × V) ≈ Topic Assignment (M × K) × Topics (K × V)

Figure 1.2: A matrix formulation of finding K topics for a dataset with M documents and V unique words. While this view of topic modeling includes approaches such as latent semantic analysis (LSA, where the approximation is based on SVD), we focus on probabilistic techniques in the rest of this survey.

Example from Boyd-Graber, Hu and Mimno, 2017

Page 21:

Topics as Matrix Factorization

• One can view topics as a kind of matrix factorization


Figure from Boyd-Graber, Hu and Mimno, 2017

Page 22:

• One can view topics as a kind of matrix factorization


• We will try to take a more probabilistic view, but it is useful to keep this in mind

Figure from Boyd-Graber, Hu and Mimno, 2017

Topics as Matrix Factorization
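The factorization view is mostly shape bookkeeping: an M × K document-topic matrix times a K × V topic-word matrix approximates the M × V data matrix. A sketch with random Dirichlet-distributed rows (sizes and seed are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, V = 5, 2, 8                                  # documents, topics, vocabulary size
doc_topics = rng.dirichlet(np.ones(K), size=M)     # M x K: topic proportions per doc
topics = rng.dirichlet(np.ones(V), size=K)         # K x V: word distribution per topic
approx = doc_topics @ topics                       # M x V: approximates the dataset
```

Because each factor has rows that sum to 1, every row of the product is itself a distribution over words.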

Page 23:

Probabilistic Word Mixtures

Idea: Model text as a mixture over words (ignore order)

Generative model for LDA. Topics (distributions over words):

gene 0.04, dna 0.02, genetic 0.01, ...
life 0.02, evolve 0.01, organism 0.01, ...
brain 0.04, neuron 0.02, nerve 0.01, ...
data 0.02, number 0.02, computer 0.01, ...

• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics

Latent Dirichlet allocation (LDA)

Simple intuition: Documents exhibit multiple topics.
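The three bullets above can be run forwards as a tiny sampler for one document (sizes and seed are arbitrary; this is a sketch of the generative story, not an inference algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, n_words = 4, 10, 6
topics = rng.dirichlet(np.ones(V), size=K)   # each topic: a distribution over words
theta = rng.dirichlet(np.ones(K))            # this document: a mixture of topics

doc = []
for _ in range(n_words):
    z = rng.choice(K, p=theta)               # choose a topic for this word slot
    w = rng.choice(V, p=topics[z])           # draw the word from that topic
    doc.append(int(w))
```

Fitting LDA is the inverse problem: given only `doc`s, recover `topics` and each document's `theta`.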

Page 24:

Topic Modeling

Idea: Model corpus of documents with shared topics

Generative model for LDA. Topics (shared):

gene 0.04, dna 0.02, genetic 0.01, ...
life 0.02, evolve 0.01, organism 0.01, ...
brain 0.04, neuron 0.02, nerve 0.01, ...
data 0.02, number 0.02, computer 0.01, ...

• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics

[Figure: Topics (shared); Topic Proportions (document-specific); Words in Document (mixture over topics)]


Page 28:

xdn | zdn=k ⇠ Discrete(�k)<latexit sha1_base64="OX8/ryM68neOZUArDd/DTpCJlCE=">AAAGHHicfZTNbtQwEIDd0oWy/LVw5JKylyKtqqSt2l6QqkJFj6Xqn1SvVo4zu2tt4gTbabe1/Co8Ak/AkRvihpA4wLPg7EZiE0c4UjSa+cYz4xk7zGImle//Wli8t9S6/2D5YfvR4ydPn62sPj+XaS4onNE0TsVlSCTEjMOZYiqGy0wAScIYLsLx28J+cQ1CspSfqtsMegkZcjZglCir6q8cTfoaUxHpyHDj4YRF3l1Fs/YGr3ljD0uWeFjBROl3TFIBCsw6vtY4BEVMf/y6v9LxN/zp8lwhKIXOPpqt4/7q0hccpTRPgCsaEymvAj9TPU2EYjQG08a5hIzQMRnClRU5SUD29LRk41Wsp0FPD1KugNOKmyaJTIgaOcoCllUtHdnAIKphS2VPF7tEINmQV73CxLTbOIKBPf5pZjoK4xyMPnl/YLTf3dnqBpu7poYIiEoi2PO79qsDQwHAS2Rvuxvs7LlMlosshn+QX2BFNgI43NA0SQiPNL4Gaq7s+WDgMhdQFGKbluhOYIxx4Blqfab2Np43Toy2gzGfoO3/xNSx2zmsKNRCtw5017TXnYN9bMKwGhUz52avmmmSN7C5w+YuJBxI1DOExpiQSRan3KlnMEdP52TgBo3nmLLHxZaxvdMRcXbMRs14NmIOe1LrzIkpxmWeIGKYENtnnGYgiEpFcelumBrFLGFK6tJuXC/G/+9l7fVgh6Y6lMU/DPWhcUgaxtPBrJ6dO6H21apyRZUN2FBUsVnjGsCsBpYHXJD2vQvqr5srnG9uBFb+sN3ZPyhfvmX0Er1C6yhAu2gfHaFjdIYo+ox+oN/oT+tT62vrW+v7DF1cKH1eoMpq/fwLOsw/og==</latexit><latexit sha1_base64="OX8/ryM68neOZUArDd/DTpCJlCE=">AAAGHHicfZTNbtQwEIDd0oWy/LVw5JKylyKtqqSt2l6QqkJFj6Xqn1SvVo4zu2tt4gTbabe1/Co8Ak/AkRvihpA4wLPg7EZiE0c4UjSa+cYz4xk7zGImle//Wli8t9S6/2D5YfvR4ydPn62sPj+XaS4onNE0TsVlSCTEjMOZYiqGy0wAScIYLsLx28J+cQ1CspSfqtsMegkZcjZglCir6q8cTfoaUxHpyHDj4YRF3l1Fs/YGr3ljD0uWeFjBROl3TFIBCsw6vtY4BEVMf/y6v9LxN/zp8lwhKIXOPpqt4/7q0hccpTRPgCsaEymvAj9TPU2EYjQG08a5hIzQMRnClRU5SUD29LRk41Wsp0FPD1KugNOKmyaJTIgaOcoCllUtHdnAIKphS2VPF7tEINmQV73CxLTbOIKBPf5pZjoK4xyMPnl/YLTf3dnqBpu7poYIiEoi2PO79qsDQwHAS2Rvuxvs7LlMlosshn+QX2BFNgI43NA0SQiPNL4Gaq7s+WDgMhdQFGKbluhOYIxx4Blqfab2Np43Toy2gzGfoO3/xNSx2zmsKNRCtw5017TXnYN9bMKwGhUz52avmmmSN7C5w+YuJBxI1DOExpiQSRan3KlnMEdP52TgBo3nmLLHxZaxvdMRcXbMRs14NmIOe1LrzIkpxmWeIGKYENtnnGYgiEpFcelumBrFLGFK6tJuXC/G/+9l7fVgh6Y6lMU/DPWhcUgaxtPBrJ6dO6H21apyRZUN2FBUsVnjGsCsBpYHXJD2vQvqr5srnG9uBFb+sN3ZPyhfvmX0Er1C6yhAu2gfHaFjdIYo+ox+oN/oT+tT62vrW+v7DF1cKH1eoMpq/fwLOsw/og==</latexit><latexit 
$z_{dn} \sim \mathrm{Discrete}(\theta_d)$

Generative model for LDA

gene 0.04, dna 0.02, genetic 0.01, ...

life 0.02, evolve 0.01, organism 0.01, ...

brain 0.04, neuron 0.02, nerve 0.01, ...

data 0.02, number 0.02, computer 0.01, ...

[Figure panels: Topics | Documents | Topic proportions and assignments]

• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
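The three bullets above are exactly LDA's generative story. A minimal sketch of that process (the vocabulary, topics, and Dirichlet parameter below are invented for illustration, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and topics (beta): each row is a distribution over words.
vocab = ["gene", "dna", "brain", "neuron", "data", "computer"]
beta = np.array([
    [0.45, 0.45, 0.02, 0.02, 0.03, 0.03],  # a "genetics"-like topic
    [0.02, 0.02, 0.45, 0.45, 0.03, 0.03],  # a "neuroscience"-like topic
    [0.03, 0.03, 0.02, 0.02, 0.45, 0.45],  # a "computing"-like topic
])

def generate_document(theta_d, n_words=10):
    """For each word: draw its topic z_dn from theta_d, then the word from beta[z_dn]."""
    words = []
    for _ in range(n_words):
        z = rng.choice(beta.shape[0], p=theta_d)  # topic assignment z_dn
        w = rng.choice(len(vocab), p=beta[z])     # word x_dn ~ Discrete(beta_z)
        words.append(vocab[w])
    return words

theta = rng.dirichlet([0.5, 0.5, 0.5])  # document-specific topic proportions
doc = generate_document(theta)
print(doc)
```

Note that `theta` is drawn per document while `beta` is shared across the corpus, which is what makes topics "corpus-wide" and proportions "document-specific".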

Topics (shared)

Words in Document (mixture over topics)

Topic Proportions (document-specific)

Each document has different topic proportions

Topic Modeling

Page 29

LDA’s view of a document

Slide credit: William Cohen (CMU)

Page 30

Example: Discovering scientific topics

Page 31

Example inference

“Genetics”      “Evolution”     “Disease”       “Computers”

human           evolution       disease         computer
genome          evolutionary    host            models
dna             species         bacteria        information
genetic         organisms       diseases        data
genes           life            resistance      computers
sequence        origin          bacterial       system
gene            biology         new             network
molecular       groups          strains         systems
sequencing      phylogenetic    control         model
map             living          infectious      parallel
information     diversity       malaria         methods
genetics        group           parasite        networks
mapping         new             parasites       software
project         two             united          new
sequences       common          tuberculosis    simulations

Page 32

Example inference

[Bar plot: inferred topic proportions for one document; x-axis: Topics (1–96), y-axis: Probability (0.0–0.4)]

Page 33

Example inference (II)

problem          model          selection      species
problems         rate           male           forest
mathematical     constant       males          ecology
number           distribution   females        fish
new              time           sex            ecological
mathematics      number         species        conservation
university       size           female         diversity
two              values         evolution      population
first            value          populations    natural
numbers          average        population     ecosystems
work             rates          sexual         populations
time             data           behavior       endangered
mathematicians   density        evolutionary   tropical
chaos            measured       genetic        forests
chaotic          models         reproductive   ecosystem

Page 34

climate change, gene knockout techniques, and apoptosis (programmed cell death), the subject of the 2002 Nobel Prize in Physiology. The cold topics were not topics that lacked prevalence in the corpus but those that showed a strong decrease in popularity over time. The coldest topics were 37, 289, and 75, corresponding to sequencing and cloning, structural biology, and immunology. All these topics were very popular in about 1991 and fell in popularity over the period of analysis. The Nobel Prizes again provide a good means of validating these trends, with prizes being awarded for work on sequencing in 1993 and immunology in 1989.

Tagging Abstracts. Each sample produced by our algorithm consists of a set of assignments of words to topics. We can use these assignments to identify the role that words play in documents. In particular, we can tag each word with the topic to which it was assigned and use these assignments to highlight topics that are particularly informative about the content of a document. The abstract shown in Fig. 6 is tagged with topic labels as superscripts. Words without superscripts were not included in the vocabulary supplied to the model. All assignments come from the same single sample as used in our previous analyses, illustrating the

kind of words assigned to the evolution topic discussed above (topic 280).

This kind of tagging is mainly useful for illustrating the content of individual topics and how individual words are assigned, and it was used for this purpose in ref. 1. It is also possible to use the results of our algorithm to highlight conceptual content in other ways. For example, if we integrate across a set of samples, we can compute a probability that a particular word is assigned to the most prevalent topic in a document. This probability provides a graded measure of the importance of a word that uses information from the full set of samples, rather than a discrete measure computed from a single sample. This form of highlighting is used to set the contrast of the words shown in Fig. 6 and picks out the words that determine the topical content of the document. Such methods might provide a means of increasing the efficiency of searching large document databases, in particular, because it can be modified to indicate words belonging to the topics of interest to the searcher.
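The graded highlighting described here can be computed directly from a set of posterior samples. A minimal sketch with made-up sample data (each sample row is a hypothetical topic assignment for every word position in one document; the paper's actual samples come from its Gibbs sampler):

```python
import numpy as np

# Hypothetical topic assignments for a 6-word document across 4 samples.
samples = np.array([
    [0, 0, 1, 0, 2, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
])

# For each sample, find the most prevalent topic in the document, then
# average (over samples) whether each word was assigned to that topic.
hit = np.zeros(samples.shape[1])
for s in samples:
    prevalent = np.bincount(s).argmax()
    hit += (s == prevalent)
prob = hit / len(samples)  # graded importance per word position
print(prob)  # → [1.   0.75 0.25 1.   0.25 1.  ]
```

Words with probability near 1 consistently belong to the document's dominant topic across samples, so they get the highest contrast in a display like Fig. 6.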

Conclusion

We have presented a statistical inference algorithm for Latent Dirichlet Allocation (1), a generative model for documents in

Fig. 5. The plots show the dynamics of the three hottest and three coldest topics from 1991 to 2001, defined as those topics that showed the strongest positive and negative linear trends. The 12 most probable words in those topics are shown below the plots.

Fig. 6. A PNAS abstract (18) tagged according to topic assignment. The superscripts indicate the topics to which individual words were assigned in a single sample, whereas the contrast level reflects the probability of a word being assigned to the most prevalent topic in the abstract, computed across samples.


From Griffiths and Steyvers, PNAS 2004

Page 35

From Naive Bayes to Topic Models (board)

Page 36

Likelihood

$$
\begin{aligned}
\log p(x_d \mid \beta, \theta_d)
&= \sum_n \log p(x_{dn} \mid \beta, \theta_d) \\
&= \sum_n \log \Bigl( \prod_v p(x_{dn} = v \mid \beta, \theta_d)^{\,\mathbb{I}[x_{dn} = v]} \Bigr) \\
&= \sum_{n,v} \mathbb{I}[x_{dn} = v] \, \log p(x_{dn} = v \mid \beta, \theta_d) \\
&= \sum_{n,v} \mathbb{I}[x_{dn} = v] \, \log \Bigl( \sum_k p(x_{dn} = v,\, z_{dn} = k \mid \beta, \theta_d) \Bigr) \\
&= \sum_{n,v} \mathbb{I}[x_{dn} = v] \, \log \Bigl( \sum_k p(z_{dn} = k \mid \theta_d)\, p(x_{dn} = v \mid z_{dn} = k, \beta) \Bigr) \\
&= \sum_{n,v} \mathbb{I}[x_{dn} = v] \, \log \Bigl( \sum_k \theta_{d,k}\, \beta_{k,v} \Bigr) \\
&= \sum_{v} x_{d,v} \, \log \Bigl( \sum_k \theta_{d,k}\, \beta_{k,v} \Bigr),
\qquad x_{d,v} := \sum_n \mathbb{I}[x_{dn} = v]
\end{aligned}
$$
sha1_base64="LPiWkWvUWNYI5mpxmr9IeDTaWv4=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQglDwB6QKia0ob2wjptUd5WTuKnVxMkcp1wsfzO+yL7N7DSDJo7YVVoqVdbx75z/OT7OiZuEOGW2/W1ufqHx4OGjxWbr8ZOnz54vLb84TeOMeujEi8OYnrswRSEm6IRhFqLzhCIYuSE6c8fv1P7ZBNEUx+SYXSWoH8GA4CH2IJOmwfLCDQjjwAQhGrI1M1kDXkA5mPBLIQbcFyaIsG8Cj/rKCFzEoBCWNLhhbmCj3DLw101AcTBi663Xe2YLpFk0IOZ9ocmfxAZAhq+LDhIa+wM+ETU6e5NfV/rCD3u6f1/UZcCJJfVqeRNYPyn+d5L6S/EpP64/GiWY5Lbr27as7oFVc/x/8ivnkqdym0mNsHXv0dYEM8dWtaZ/kv9Yz4/7liwKaHqyVhW7IlronAtxF78SsRqo1Roste0NO39MfeEUi3bHmD5Hg+XGDfBjL4sQYV4I07Tn2Anrc0gZ9kIkWiBLUQK9MQxQTy4JjFDa5/mcEWZp99jp82FMGCJeyY3DKI0gG2lGBadlqzeSwoiWZQtjn6soPkpxQMpebiRLBz4aypmXZ8Z9N8yQ4N33+4Lb1vYby9ncERWEIr8gnF3bkr8qEFCESIHsblnO9q7OJBlNQnQH2QpT2VBE0IUXRxEkqkXIEz15PgCRNKNIFSKbFvG2I2TbqvAUlT75fgvMbl4KzkEpwellrGBXM5gqVEJXGnRdF+taw77WYT/uoJY9q6dhVsNmGpvpENUgWs0Q1WqiJMVhTLR6hjN0fk+Gumg4wxQ9ViFD+SH1oRYxGdXjyQhrbLfSma56b0sEpEEEZZ9BnCAKWUzVS3eB2SjEEWYpL/aF7oXJ/V5yvyp2IMqXUv27Lj8QGimHT34xy2en31A5k8qcqrIGC2gZmzauBkwqYHHAipTzzqlON31xurnhyPWnrXZnv5h8i8ZL45WxZjjGjtExPhhHxonhNdqNw0a38bm50nzb7DQLdn6u8FkxSk/z43ePMYEw</latexit><latexit 
sha1_base64="ZGSfvHkKuXAmqhUvswfMoyPCFyw=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQg1DwB6QEBPa0F5Yx03CXeUkbmrVcTLHKRfL34wvsm8zO82giSN2lZZKlXX8O+d/jo9z4iUEp7zT+TYzO9d48PDRfLP1+MnTZ88XFl+cpHHGfHTsxyRmZx5MEcEUHXPMCTpLGIKRR9CpN3qn90/HiKU4pkf8KkG9CIYUD7APuTL1F+duAIlDGxA04Ct2sgL8kAkwFpdS9kUgbRDhwAY+C7QReIhDKR1l8Ehu4MPc0g9WbcBwOOSrrdc7dgukWdSn9n2h6Z/EBkCFr4sOEhYHfTGWNTo7419X+iIOzk3/nqzLQFBH6dXyNnB+UvzvJPWX4hN+VH80WjDJbde3bVneAcv26P/kV84lT+U2kxph596jrQlmj5xqTf8k/5GZnwgcVRQw9FStOnZFtNA5k/IufiViNVCr1V9od9Y6+WObC7dYtK3iOewvNm5AEPtZhCj3CUzTc7eT8J6AjGOfINkCWYoS6I9giM7VksIIpT2Rzxlpl3aP3J4YxJQj6pfcBIzSCPKhYdRwWrb6QyWMWFm2MPaEjhKgFIe07OVFqnQQoIGaeXlmIvBIhqTovt+TouNsvnHc9S1ZQRgKCsLd7jjqVwVChhAtkO0Nx93cNpkkYwlBd1BHYzobhii68OMoglS3CPnyXJ0PQDTNGNKFqKZFou1K1bYqPEGVT77fAtObl1IIUEpwchkr2NUUpgtV0JUBXdfFujawr3XYjztoZM/raZjVsJnBZibEDIhVM0S1mihJMYmpUc9gis7vycAUJVNM0WMdkqgPaQCNiMmwHk+G2GC7lc509XtbIiALI6j6DOIEMchjpl+6C8yHBEeYp6LYl6YXpvd7qf2q2L4sX0r973liXxqkGj75xSyfnXlD1Uwqc7rKGixkZWzSuBowqYDFAWtSzTu3Ot3Mxcn6mqvWnzbau3vF5Ju3XlqvrBXLtbasXeuDdWgdW36j3ThodBufm0vNt83dZsHOzhQ+S1bpaX78Dj1igPA=</latexit>

log (p(x d | � ,✓ d)) =X

n

log (p(x dn | � ,✓ d))

=X

n

log

✓Y

v

p(x dn = v | � ,✓ d)I[x dn=v]◆

=X

n,v

I[x dn = v] log (p(x dn = v | � ,✓ d))

=X

n,v

I[x dn = v] log

ÇX

k

p(x dn = v, zdn=k | � ,✓ d)

å

=X

n,v

I[x dn = v] log

ÇX

k

p(zdn=k | ✓ d) p(x dn = v | zdn=k,�)

å

=X

n,v

I[x dn = v] log

ÇX

k

✓ d,k � k,v

å

= X log✓�<latexit sha1_base64="LPiWkWvUWNYI5mpxmr9IeDTaWv4=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQglDwB6QKia0ob2wjptUd5WTuKnVxMkcp1wsfzO+yL7N7DSDJo7YVVoqVdbx75z/OT7OiZuEOGW2/W1ufqHx4OGjxWbr8ZOnz54vLb84TeOMeujEi8OYnrswRSEm6IRhFqLzhCIYuSE6c8fv1P7ZBNEUx+SYXSWoH8GA4CH2IJOmwfLCDQjjwAQhGrI1M1kDXkA5mPBLIQbcFyaIsG8Cj/rKCFzEoBCWNLhhbmCj3DLw101AcTBi663Xe2YLpFk0IOZ9ocmfxAZAhq+LDhIa+wM+ETU6e5NfV/rCD3u6f1/UZcCJJfVqeRNYPyn+d5L6S/EpP64/GiWY5Lbr27as7oFVc/x/8ivnkqdym0mNsHXv0dYEM8dWtaZ/kv9Yz4/7liwKaHqyVhW7IlronAtxF78SsRqo1Roste0NO39MfeEUi3bHmD5Hg+XGDfBjL4sQYV4I07Tn2Anrc0gZ9kIkWiBLUQK9MQxQTy4JjFDa5/mcEWZp99jp82FMGCJeyY3DKI0gG2lGBadlqzeSwoiWZQtjn6soPkpxQMpebiRLBz4aypmXZ8Z9N8yQ4N33+4Lb1vYby9ncERWEIr8gnF3bkr8qEFCESIHsblnO9q7OJBlNQnQH2QpT2VBE0IUXRxEkqkXIEz15PgCRNKNIFSKbFvG2I2TbqvAUlT75fgvMbl4KzkEpwellrGBXM5gqVEJXGnRdF+taw77WYT/uoJY9q6dhVsNmGpvpENUgWs0Q1WqiJMVhTLR6hjN0fk+Gumg4wxQ9ViFD+SH1oRYxGdXjyQhrbLfSma56b0sEpEEEZZ9BnCAKWUzVS3eB2SjEEWYpL/aF7oXJ/V5yvyp2IMqXUv27Lj8QGimHT34xy2en31A5k8qcqrIGC2gZmzauBkwqYHHAipTzzqlON31xurnhyPWnrXZnv5h8i8ZL45WxZjjGjtExPhhHxonhNdqNw0a38bm50nzb7DQLdn6u8FkxSk/z43ePMYEw</latexit><latexit 
sha1_base64="LPiWkWvUWNYI5mpxmr9IeDTaWv4=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQglDwB6QKia0ob2wjptUd5WTuKnVxMkcp1wsfzO+yL7N7DSDJo7YVVoqVdbx75z/OT7OiZuEOGW2/W1ufqHx4OGjxWbr8ZOnz54vLb84TeOMeujEi8OYnrswRSEm6IRhFqLzhCIYuSE6c8fv1P7ZBNEUx+SYXSWoH8GA4CH2IJOmwfLCDQjjwAQhGrI1M1kDXkA5mPBLIQbcFyaIsG8Cj/rKCFzEoBCWNLhhbmCj3DLw101AcTBi663Xe2YLpFk0IOZ9ocmfxAZAhq+LDhIa+wM+ETU6e5NfV/rCD3u6f1/UZcCJJfVqeRNYPyn+d5L6S/EpP64/GiWY5Lbr27as7oFVc/x/8ivnkqdym0mNsHXv0dYEM8dWtaZ/kv9Yz4/7liwKaHqyVhW7IlronAtxF78SsRqo1Roste0NO39MfeEUi3bHmD5Hg+XGDfBjL4sQYV4I07Tn2Anrc0gZ9kIkWiBLUQK9MQxQTy4JjFDa5/mcEWZp99jp82FMGCJeyY3DKI0gG2lGBadlqzeSwoiWZQtjn6soPkpxQMpebiRLBz4aypmXZ8Z9N8yQ4N33+4Lb1vYby9ncERWEIr8gnF3bkr8qEFCESIHsblnO9q7OJBlNQnQH2QpT2VBE0IUXRxEkqkXIEz15PgCRNKNIFSKbFvG2I2TbqvAUlT75fgvMbl4KzkEpwellrGBXM5gqVEJXGnRdF+taw77WYT/uoJY9q6dhVsNmGpvpENUgWs0Q1WqiJMVhTLR6hjN0fk+Gumg4wxQ9ViFD+SH1oRYxGdXjyQhrbLfSma56b0sEpEEEZZ9BnCAKWUzVS3eB2SjEEWYpL/aF7oXJ/V5yvyp2IMqXUv27Lj8QGimHT34xy2en31A5k8qcqrIGC2gZmzauBkwqYHHAipTzzqlON31xurnhyPWnrXZnv5h8i8ZL45WxZjjGjtExPhhHxonhNdqNw0a38bm50nzb7DQLdn6u8FkxSk/z43ePMYEw</latexit><latexit 
sha1_base64="LPiWkWvUWNYI5mpxmr9IeDTaWv4=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQglDwB6QKia0ob2wjptUd5WTuKnVxMkcp1wsfzO+yL7N7DSDJo7YVVoqVdbx75z/OT7OiZuEOGW2/W1ufqHx4OGjxWbr8ZOnz54vLb84TeOMeujEi8OYnrswRSEm6IRhFqLzhCIYuSE6c8fv1P7ZBNEUx+SYXSWoH8GA4CH2IJOmwfLCDQjjwAQhGrI1M1kDXkA5mPBLIQbcFyaIsG8Cj/rKCFzEoBCWNLhhbmCj3DLw101AcTBi663Xe2YLpFk0IOZ9ocmfxAZAhq+LDhIa+wM+ETU6e5NfV/rCD3u6f1/UZcCJJfVqeRNYPyn+d5L6S/EpP64/GiWY5Lbr27as7oFVc/x/8ivnkqdym0mNsHXv0dYEM8dWtaZ/kv9Yz4/7liwKaHqyVhW7IlronAtxF78SsRqo1Roste0NO39MfeEUi3bHmD5Hg+XGDfBjL4sQYV4I07Tn2Anrc0gZ9kIkWiBLUQK9MQxQTy4JjFDa5/mcEWZp99jp82FMGCJeyY3DKI0gG2lGBadlqzeSwoiWZQtjn6soPkpxQMpebiRLBz4aypmXZ8Z9N8yQ4N33+4Lb1vYby9ncERWEIr8gnF3bkr8qEFCESIHsblnO9q7OJBlNQnQH2QpT2VBE0IUXRxEkqkXIEz15PgCRNKNIFSKbFvG2I2TbqvAUlT75fgvMbl4KzkEpwellrGBXM5gqVEJXGnRdF+taw77WYT/uoJY9q6dhVsNmGpvpENUgWs0Q1WqiJMVhTLR6hjN0fk+Gumg4wxQ9ViFD+SH1oRYxGdXjyQhrbLfSma56b0sEpEEEZZ9BnCAKWUzVS3eB2SjEEWYpL/aF7oXJ/V5yvyp2IMqXUv27Lj8QGimHT34xy2en31A5k8qcqrIGC2gZmzauBkwqYHHAipTzzqlON31xurnhyPWnrXZnv5h8i8ZL45WxZjjGjtExPhhHxonhNdqNw0a38bm50nzb7DQLdn6u8FkxSk/z43ePMYEw</latexit><latexit 
sha1_base64="ZGSfvHkKuXAmqhUvswfMoyPCFyw=">AAAJkXicxZVbT9swFMcDdBvtbjAe9xJWaQIpQg1DwB6QEBPa0F5Yx03CXeUkbmrVcTLHKRfL34wvsm8zO82giSN2lZZKlXX8O+d/jo9z4iUEp7zT+TYzO9d48PDRfLP1+MnTZ88XFl+cpHHGfHTsxyRmZx5MEcEUHXPMCTpLGIKRR9CpN3qn90/HiKU4pkf8KkG9CIYUD7APuTL1F+duAIlDGxA04Ct2sgL8kAkwFpdS9kUgbRDhwAY+C7QReIhDKR1l8Ehu4MPc0g9WbcBwOOSrrdc7dgukWdSn9n2h6Z/EBkCFr4sOEhYHfTGWNTo7419X+iIOzk3/nqzLQFBH6dXyNnB+UvzvJPWX4hN+VH80WjDJbde3bVneAcv26P/kV84lT+U2kxph596jrQlmj5xqTf8k/5GZnwgcVRQw9FStOnZFtNA5k/IufiViNVCr1V9od9Y6+WObC7dYtK3iOewvNm5AEPtZhCj3CUzTc7eT8J6AjGOfINkCWYoS6I9giM7VksIIpT2Rzxlpl3aP3J4YxJQj6pfcBIzSCPKhYdRwWrb6QyWMWFm2MPaEjhKgFIe07OVFqnQQoIGaeXlmIvBIhqTovt+TouNsvnHc9S1ZQRgKCsLd7jjqVwVChhAtkO0Nx93cNpkkYwlBd1BHYzobhii68OMoglS3CPnyXJ0PQDTNGNKFqKZFou1K1bYqPEGVT77fAtObl1IIUEpwchkr2NUUpgtV0JUBXdfFujawr3XYjztoZM/raZjVsJnBZibEDIhVM0S1mihJMYmpUc9gis7vycAUJVNM0WMdkqgPaQCNiMmwHk+G2GC7lc509XtbIiALI6j6DOIEMchjpl+6C8yHBEeYp6LYl6YXpvd7qf2q2L4sX0r973liXxqkGj75xSyfnXlD1Uwqc7rKGixkZWzSuBowqYDFAWtSzTu3Ot3Mxcn6mqvWnzbau3vF5Ju3XlqvrBXLtbasXeuDdWgdW36j3ThodBufm0vNt83dZsHOzhQ+S1bpaX78Dj1igPA=</latexit>
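The last line of the derivation can be sanity-checked numerically: under PLSA each word's probability is the topic-weighted mixture over topics. A minimal sketch with made-up numbers (the matrices below are hypothetical, not from the lecture):

```python
import numpy as np

# Hypothetical numbers: V = 4 vocabulary words, K = 2 topics.
beta = np.array([[0.5, 0.3, 0.1, 0.1],    # p(word | z = 0)
                 [0.1, 0.1, 0.4, 0.4]])   # p(word | z = 1)
theta_d = np.array([0.7, 0.3])            # document's topic proportions

x_d = np.array([0, 0, 2, 3, 1])           # the document, as word indices

# Last line of the derivation: each word's probability is the
# topic-weighted mixture sum_k theta_{d,k} * beta_{k,v}.
word_probs = theta_d @ beta               # length-V vector of mixture probs
log_lik = np.sum(np.log(word_probs[x_d]))
```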

Page 37: Machine Learning 2 - Northeastern University

How to estimate parameters in PLSA?

Page 38: Machine Learning 2 - Northeastern University

Let’s implement… (in class exercise)
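As a starting point for the exercise, here is one way EM for PLSA can be sketched in NumPy. The function name and details are my own illustration, not the official in-class solution: the E-step computes responsibilities p(z=k | d, v) proportional to theta[d,k] * beta[k,v]; the M-step re-estimates theta and beta from responsibility-weighted counts.

```python
import numpy as np

def plsa_em(X, K, n_iters=50, seed=0):
    """EM for PLSA on a document-term count matrix X (D x V).

    E-step: responsibilities q(z=k | d, v) proportional to theta[d,k] * beta[k,v].
    M-step: re-estimate theta and beta from responsibility-weighted counts.
    Returns theta (D x K) and beta (K x V), rows normalized.
    """
    rng = np.random.default_rng(seed)
    D, V = X.shape
    theta = rng.dirichlet(np.ones(K), size=D)   # document-topic proportions
    beta = rng.dirichlet(np.ones(V), size=K)    # topic-word distributions
    for _ in range(n_iters):
        # E-step: q has shape D x V x K; normalize over topics k.
        q = theta[:, None, :] * beta.T[None, :, :]
        q /= q.sum(axis=2, keepdims=True)
        # M-step: expected counts n(d, v, k) = X[d, v] * q[d, v, k].
        n_dvk = X[:, :, None] * q
        theta = n_dvk.sum(axis=1)
        theta /= theta.sum(axis=1, keepdims=True)
        beta = n_dvk.sum(axis=0).T
        beta /= beta.sum(axis=1, keepdims=True)
    return theta, beta
```

This materializes the full D x V x K responsibility tensor, which is fine for a classroom-sized corpus but not for large ones.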

Page 39: Machine Learning 2 - Northeastern University

Evaluation: Are these topics any good?

• As with clustering, this is a bit tricky. Thoughts on how we might evaluate topics?

Page 40: Machine Learning 2 - Northeastern University

Likelihood of held-out data

[Figure: graphical model with parameters θ and β, latent topic assignments z_n, and observed words x_n. Fit the model on observed data; compute the likelihood of held-out data.]
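One common protocol: freeze the topic-word distributions β learned on training documents, fit only θ for each held-out document (the "fold-in" heuristic), and report the resulting held-out log-likelihood or perplexity. A rough sketch under those assumptions (function name is mine; assumes every word has nonzero probability under at least one topic):

```python
import numpy as np

def heldout_loglik(x_held, beta, n_iters=50):
    """Score a held-out document (length-V count vector x_held) under
    fixed topics beta (K x V): fit theta alone by EM ("folding in"),
    then return sum_v x_held[v] * log(sum_k theta[k] * beta[k, v])."""
    K, _ = beta.shape
    theta = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        q = theta[:, None] * beta        # K x V responsibilities
        q /= q.sum(axis=0, keepdims=True)
        theta = (q * x_held).sum(axis=1)
        theta /= theta.sum()
    word_probs = theta @ beta            # length-V mixture probabilities
    return float((x_held * np.log(word_probs)).sum())
```

Because EM monotonically improves the likelihood with β fixed, the fitted score is at least as high as scoring with uniform topic proportions.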

Page 41: Machine Learning 2 - Northeastern University

“Intrusion detection”

Word Intrusion Topic Intrusion

Figure 2: Screenshots of our two human tasks. In the word intrusion task (left), subjects are presented with a set of words and asked to select the word which does not belong with the others. In the topic intrusion task (right), users are given a document’s title and the first few sentences of the document. The users must select which of the four groups of words does not belong.

word is selected at random from a pool of words with low probability in the current topic (to reduce the possibility that the intruder comes from the same semantic group) but high probability in some other topic (to ensure that the intruder is not rejected outright due solely to rarity). All six words are then shuffled and presented to the subject.
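The construction described above can be sketched in code. Everything here (the function name, and the choice of "bottom half of the topic" as the low-probability pool) is an illustrative assumption; Chang et al. describe the idea, not this exact implementation:

```python
import numpy as np

def word_intrusion_item(beta, topic, n_top=5, seed=0):
    """Construct one word-intrusion question from a fitted topic-word
    matrix beta (K x V): the n_top highest-probability words of `topic`,
    plus an intruder that has low probability in `topic` but high
    probability in some other topic. Returns (shuffled word ids, intruder id).
    Assumes the other topic's top words are not also top words of `topic`.
    """
    rng = np.random.default_rng(seed)
    K, V = beta.shape
    order = np.argsort(beta[topic])           # ascending by probability
    top_words = list(order[::-1][:n_top])
    low_here = set(order[: V // 2])           # "low probability" = bottom half
    other = rng.choice([k for k in range(K) if k != topic])
    # Highest-probability words of the other topic that are low in this one.
    candidates = [v for v in np.argsort(beta[other])[::-1] if v in low_here]
    intruder = candidates[0]
    words = top_words + [intruder]
    rng.shuffle(words)                        # shuffle before showing subjects
    return words, intruder
```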

3.2 Topic intrusion

The topic intrusion task tests whether a topic model’s decomposition of documents into a mixture of topics agrees with human judgments of the document’s content. This allows for evaluation of the latent space depicted by Figure 1(b). In this task, subjects are shown the title and a snippet from a document. Along with the document they are presented with four topics (each topic is represented by the eight highest-probability words within that topic). Three of those topics are the highest-probability topics assigned to that document. The remaining intruder topic is chosen randomly from the other low-probability topics in the model.

The subject is instructed to choose the topic which does not belong with the document. As before, if the topic assignment to documents were relevant and intuitive, we would expect that subjects would select the topic we randomly added as the topic that did not belong. The formulation of this task provides a natural way to analyze the quality of document-topic assignments found by the topic models. Each of the three models we fit explicitly assigns topic weights to each document; this task determines whether humans make the same association.

Due to time constraints, subjects do not see the entire document; they only see the title and first few sentences. While this is less information than is available to the algorithm, humans are good at extrapolating from limited data, and our corpora (encyclopedia and newspaper) are structured to provide an overview of the article in the first few sentences. The setup of this task is also meaningful in situations where one might be tempted to use topics for corpus exploration. If topics are used to find relevant documents, for example, users will likely be provided with similar views of the documents (e.g. title and abstract, as in Rexa).

For both the word intrusion and topic intrusion tasks, subjects were instructed to focus on the meanings of words, not their syntactic usage or orthography. We also presented subjects with the option of viewing the “correct” answer after they submitted their own response, to make the tasks more engaging. Here the “correct” answer was determined by the model which generated the data, presented as if it were the response of another user. At the same time, subjects were encouraged to base their responses on their own opinions, not to try to match other subjects’ (the models’) selections. In small experiments, we have found that this extra information did not bias subjects’ responses.

4 Experimental results

To prepare data for human subjects to review, we fit three different topic models on two corpora. In this section, we describe how we prepared the corpora, fit the models, and created the tasks described in Section 3. We then present the results of these human trials and compare them to metrics traditionally used to evaluate topic models.


From Chang et al., 2009

Page 42: Machine Learning 2 - Northeastern University

“Intrusion detection”

Word Intrusion Topic Intrusion

Which word doesn’t belong?

From Chang et al., 2009

Page 43: Machine Learning 2 - Northeastern University

“Intrusion detection”

Word Intrusion Topic Intrusion

Figure 2: Screenshots of our two human tasks. In the word intrusion task (left), subjects are presented with a set of words and asked to select the word which does not belong with the others. In the topic intrusion task (right), users are given a document's title and the first few sentences of the document. The users must select which of the four groups of words does not belong.

word is selected at random from a pool of words with low probability in the current topic (to reduce the possibility that the intruder comes from the same semantic group) but high probability in some other topic (to ensure that the intruder is not rejected outright due solely to rarity). All six words are then shuffled and presented to the subject.
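The intruder-selection recipe above can be sketched in code. This is a minimal illustration, not the authors' actual implementation: the function name `build_word_intrusion`, the topic-word matrix `phi`, and the "low probability" cutoff (bottom half of the topic) are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_word_intrusion(phi, vocab, topic, n_top=5):
    """Construct one word-intrusion item for `topic`.

    phi: (K, V) array of per-topic word probabilities.
    Returns the shuffled word list and the intruder word.
    """
    K, V = phi.shape
    # The n_top highest-probability words from the target topic.
    top_words = np.argsort(phi[topic])[::-1][:n_top]
    # Candidate intruders: low probability in this topic (assumed cutoff:
    # bottom half) but among the top words of some *other* topic.
    low_here = set(np.argsort(phi[topic])[: V // 2])
    high_elsewhere = set()
    for k in range(K):
        if k != topic:
            high_elsewhere.update(np.argsort(phi[k])[::-1][:n_top])
    candidates = sorted((low_here & high_elsewhere) - set(top_words))
    intruder = rng.choice(candidates)
    items = list(top_words) + [intruder]
    rng.shuffle(items)  # shuffle so the intruder's position is random
    return [vocab[i] for i in items], vocab[intruder]
```

If subjects reliably pick out the returned intruder, the topic's top words form a coherent semantic group.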

3.2 Topic intrusion

The topic intrusion task tests whether a topic model’s decomposition of documents into a mixture of topics agrees with human judgments of the document’s content. This allows for evaluation of the latent space depicted by Figure 1(b). In this task, subjects are shown the title and a snippet from a document. Along with the document they are presented with four topics (each topic is represented by the eight highest-probability words within that topic). Three of those topics are the highest probability topics assigned to that document. The remaining intruder topic is chosen randomly from the other low-probability topics in the model.

The subject is instructed to choose the topic which does not belong with the document. As before, if the topic assignment to documents were relevant and intuitive, we would expect that subjects would select the topic we randomly added as the topic that did not belong. The formulation of this task provides a natural way to analyze the quality of document-topic assignments found by the topic models. Each of the three models we fit explicitly assigns topic weights to each document; this task determines whether humans make the same association.
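Constructing one topic-intrusion item follows the same pattern: take the document's three highest-weight topics and add one random low-weight intruder. A hedged sketch, where the name `build_topic_intrusion` and the variable `theta_d` (the document's topic proportions) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_topic_intrusion(theta_d, n_shown=3):
    """Build one topic-intrusion item from document-topic weights.

    theta_d: length-K vector of topic proportions for one document.
    Returns the shuffled topic indices and the intruder's index.
    """
    order = np.argsort(theta_d)[::-1]       # topics by decreasing weight
    top_topics = order[:n_shown]            # the three highest-probability topics
    low_topics = order[n_shown:]            # intruder drawn from the remainder
    intruder = rng.choice(low_topics)
    shown = list(top_topics) + [intruder]
    rng.shuffle(shown)                      # randomize display order
    return shown, int(intruder)
```

A subject who picks the returned intruder agrees with the model's document-topic assignment.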

Due to time constraints, subjects do not see the entire document; they only see the title and first few sentences. While this is less information than is available to the algorithm, humans are good at extrapolating from limited data, and our corpora (encyclopedia and newspaper) are structured to provide an overview of the article in the first few sentences. The setup of this task is also meaningful in situations where one might be tempted to use topics for corpus exploration. If topics are used to find relevant documents, for example, users will likely be provided with similar views of the documents (e.g. title and abstract, as in Rexa).

For both the word intrusion and topic intrusion tasks, subjects were instructed to focus on the meanings of words, not their syntactic usage or orthography. We also presented subjects with the option of viewing the “correct” answer after they submitted their own response, to make the tasks more engaging. Here the “correct” answer was determined by the model which generated the data, presented as if it were the response of another user. At the same time, subjects were encouraged to base their responses on their own opinions, not to try to match other subjects’ (the models’) selections. In small experiments, we have found that this extra information did not bias subjects’ responses.

4 Experimental results

To prepare data for human subjects to review, we fit three different topic models on two corpora. In this section, we describe how we prepared the corpora, fit the models, and created the tasks described in Section 3. We then present the results of these human trials and compare them to metrics traditionally used to evaluate topic models.


Which topic doesn’t belong?

From Chang et al., 2009

Page 44: Machine Learning 2 - Northeastern University

Summing up

• PLSA is a simple ad-mixture model that uncovers topics (distributions over words) and soft-assigns instances to these.

• We saw parameter estimation via Expectation-Maximization.

• Next time: Introducing priors into topic models — Latent Dirichlet Allocation (LDA).

★ This will motivate sampling-based estimation
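The PLSA-plus-EM recap above can be condensed into a short NumPy sketch. This is an illustrative implementation under assumed names (`plsa_em`, a document-word count matrix `counts`), not the exact code from lecture: the E-step computes responsibilities p(z | d, w), and the M-step re-estimates p(w | z) and p(z | d) from the expected counts.

```python
import numpy as np

def plsa_em(counts, K, n_iters=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix.

    counts: (D, V) matrix of word counts n(d, w).
    Returns p(w|z) of shape (K, V) and p(z|d) of shape (D, K).
    """
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    # Random normalized initializations of both factors.
    p_w_z = rng.random((K, V)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # E-step: responsibilities p(z | d, w) for every (d, w) pair.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]        # (D, K, V)
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from expected counts.
        weighted = counts[:, None, :] * resp                 # (D, K, V)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```

Each iteration is guaranteed not to decrease the marginal log-likelihood, which is the EM property we derived last time.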
