Example

• 16,000 documents

• 100 topics

• Picked those with large p(w|z)
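A minimal sketch of this kind of fit with an off-the-shelf implementation (scikit-learn rather than the paper's own code; the tiny corpus, two topics, and variable names below are placeholders for the 16,000-document, 100-topic setup):

```python
# Sketch: fit an LDA model and inspect the highest-probability words per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stocks fell sharply in early trading on wall street",
    "the senate passed the budget bill after a long debate",
    "investors worried about rising interest rates and inflation",
    "lawmakers argued over the new spending bill",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)                      # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each row of lda.components_ is proportional to p(w|z); print the top words per topic.
vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-5:][::-1]
    print(f"topic {k}:", ", ".join(vocab[i] for i in top))
```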

New document?

• Given a new document, compute the approximate posterior over topics

• The allocation of words to topics approximates p(zn|w)

• Look at the cases where these values are relatively large

• 4 topics found
Unseen document (contd.)

• Bag of words: the words of "William Randolph Hearst Foundation" are assigned to different topics

Applications and empirical results

• Document modeling

• Document classification

• Collaborative filtering

Document modeling

• Task: density estimation; a good model assigns high likelihood to unseen documents

• Measure of goodness: perplexity

• Perplexity decreases monotonically as the likelihood increases (lower is better)
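For reference, the perplexity of a test set of M documents, where N_d is the number of words in document d (this is the definition used in the LDA paper):

```latex
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\left\{ -\, \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```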

The experiment

Corpus                 Articles   Terms
Scientific abstracts    5,225     28,414
Newswire articles      16,333     23,075

The experiment (contd.)

• Preprocessed
  – removed stop words
  – removed words appearing only once

• 10% of the documents held out for testing; trained on the remaining 90%

• All models trained with the same stopping criteria
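A rough equivalent of that preprocessing and split with standard tooling (a sketch: the toy corpus, the English stop-word list, and min_df=2 as a stand-in for "remove words appearing once" are assumptions, not the paper's exact pipeline):

```python
# Sketch: stop-word removal, dropping rare terms, and a 90/10 train/test split.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = [
    "gene expression in the nematode c elegans",
    "gene regulation during elegans larval development",
    "the federal reserve raised interest rates",
    "markets fell as the reserve signalled higher rates",
]

# Hold out 10% of the documents for evaluation, train on the remaining 90%.
train_docs, test_docs = train_test_split(docs, test_size=0.1, random_state=0)

# Drop English stop words and terms appearing in fewer than two training documents.
vectorizer = CountVectorizer(stop_words="english", min_df=2)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)                # unseen terms are simply ignored
print(X_train.shape, X_test.shape)
```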

Results

Overfitting in Mixture of unigrams

• Peaked posterior in the training set

• Unseen document with unseen word

• Word will have very small probability

• Remedy: smoothing
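One standard form of that remedy is additive (Laplace-style) smoothing of the topic-word counts n_{z,w}, shown below with vocabulary size V and a small constant δ; the LDA paper's own smoothing instead places an exchangeable Dirichlet prior on the topic-word parameters:

```latex
\hat{p}(w \mid z) \;=\; \frac{n_{z,w} + \delta}{\sum_{w'} n_{z,w'} + \delta\, V},
\qquad \delta > 0
```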

Overfitting in pLSI

• A mixture of topics per document is allowed

• Marginalize over d to find p(w)

• Restricted to the same topic proportions as the training documents

• "Folding in": ignore the p(z|d) parameters and refit p(z|dnew) (see the sketch below)
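A sketch of what folding in amounts to under this description: keep the trained p(w|z) fixed and run EM on the new document alone to fit its topic proportions (the function name, toy numbers, and iteration count are illustrative):

```python
import numpy as np

def fold_in(beta, counts, n_iter=50):
    """Refit p(z|d_new) for one unseen document by EM, holding beta = p(w|z) fixed.

    beta:   (K, V) array, each row a trained topic-word distribution p(w|z)
    counts: (V,) array of word counts for the new document
    """
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)                     # start from uniform p(z|d_new)
    for _ in range(n_iter):
        # E-step: posterior over topics for each word, r[k, w] ∝ theta[k] * beta[k, w]
        r = theta[:, None] * beta
        r /= r.sum(axis=0, keepdims=True) + 1e-12
        # M-step: new topic proportions from the expected word-to-topic assignments
        theta = (r * counts[None, :]).sum(axis=1)
        theta /= theta.sum()
    return theta

# Toy example: 2 topics over a 3-word vocabulary.
beta = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.2, 0.7]])
counts = np.array([5, 1, 0])                        # new document uses mostly word 0
print(fold_in(beta, counts))                        # heavily weighted toward topic 0
```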

LDA

• Documents can have different proportions of topics

• No heuristics

Document classification

• Generative or discriminative?

• Choice of features in document classification

• LDA as a dimensionality reduction technique

• Use the posterior Dirichlet parameters γ*(w) as the LDA features

The experiment

• Binary classification

• 8,000 documents, 15,818 words

• True class labels not used when estimating the LDA topics

• 50 topics

• Trained an SVM on the LDA features

• Compared with an SVM on all word features

• LDA reduced the feature space by 99.6%
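A sketch of that comparison with off-the-shelf components (the toy corpus, labels, two topics instead of 50, and the LinearSVC classifier are placeholders for the paper's Reuters setup):

```python
# Sketch: SVM on raw word counts vs. SVM on low-dimensional LDA features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "wheat and corn exports rose this season",
    "grain shipments delayed by the port strike",
    "quarterly earnings beat analyst expectations",
    "the company reported record profit and revenue",
]
labels = [1, 1, 0, 0]                               # e.g. GRAIN vs NOT GRAIN

# Baseline: SVM trained on the full word-count feature space.
svm_words = make_pipeline(CountVectorizer(), LinearSVC())
svm_words.fit(docs, labels)

# LDA as dimensionality reduction: each document is represented by its posterior
# topic proportions, and the SVM is trained on those instead of the word counts.
svm_lda = make_pipeline(CountVectorizer(),
                        LatentDirichletAllocation(n_components=2, random_state=0),
                        LinearSVC())
svm_lda.fit(docs, labels)

print(svm_words.predict(["grain prices fell"]),
      svm_lda.predict(["grain prices fell"]))
```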

GRAIN vs NOT GRAIN

EARN vs NOT EARN

LDA in document classification

• Feature space reduced, performance improved

• Results need further investigation

• Use for feature selection

Collaborative filtering

• Collection of users and the movies they prefer

• Trained on observed users

• Task: given an unobserved user and all movies they preferred but one, predict the held-out movie

• Only users who positively rated at least 100 movies

• Trained on 89% of the data

Some quantities required

• Probability of the held-out movie, p(w|wobs)

– For mixture of unigrams and pLSI sum out topic variable

– For LDA sum out topic and Dirichlet variables (quantity efficient to compute)
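Written out for LDA, where θ denotes the topic proportions (for the mixture of unigrams and pLSI only the topic variable z is summed out):

```latex
p(w \mid \mathbf{w}_{\mathrm{obs}})
  = \int \sum_{z} p(w \mid z)\, p(z \mid \theta)\, p(\theta \mid \mathbf{w}_{\mathrm{obs}})\, d\theta
```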

Results

Further work

• Other approaches for inference and parameter estimation

• Embedded in another model

• Other types of data

• Partial exchangeability

Example – Visual words

• Document = image

• Words = image features: bars, circles

• Topics = face, airplane

• Bag of words = no spatial relationship between objects

Visual words

Identifying the visual words and topics

Conclusion

• Exchangeability, de Finetti's theorem

• Dirichlet distribution

• Generative bag-of-words model

• Independence assumption in the Dirichlet distribution – addressed by correlated topic models

Implementations

• In C (by one of the authors)– http://www.cs.princeton.edu/~blei/lda-c/

• In C and Matlab– http://chasen.org/~daiti-m/dist/lda/

References

• Latent Dirichlet Allocation, D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, 2003.

• Discovering object categories in image collections. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman. MIT AI Lab Memo AIM-2005-005, February, 2005

• Correlated topic models, David Blei and John Lafferty, Advances in Neural Information Processing Systems 18, 2005.
