Nonparametric Bayes Pachinko Allocation by Li, Blei and McCallum (UAI 2007). Presented by Lihan He, ECE, Duke University, March 3rd, 2008.


Page 1

Nonparametric Bayes Pachinko Allocation by

Li, Blei and McCallum (UAI 2007)

Presented by Lihan He

ECE, Duke University

March 3rd, 2008

Page 2

Outline

• Reviews on Topic Models (LDA, CTM)

• Pachinko Allocation (PAM)

• Nonparametric Pachinko Allocation

• Experimental Results

• Conclusions

Page 3

Reviews on Topic Models – Notation

Notation and terminology:

• Word: the basic unit from a vocabulary of size V (V distinct words). The v-th word is represented by a V-dim unit vector w with w^v = 1 and w^u = 0 for u ≠ v.

• Document: a sequence of N words, W = (w_1, w_2, ..., w_N).

• Corpus: a collection of M documents, D = {W_1, W_2, ..., W_M}.

• Topic: a multinomial distribution over words.

Assumptions:

• The words in a document are exchangeable;

• Documents are also exchangeable.

Page 4

Reviews on Topic Models - Latent Dirichlet Allocation (LDA)

Generative process for each document W in a corpus D:

1. Choose θ ~ Dirichlet(α), where θ and α are k-dim.

2. For each of the N words in the document W:

(a) Choose a topic z_n ~ Multinomial(θ).

(b) Choose a word w_n ~ Multinomial(β_{z_n}), where β is a k×V matrix with β_ij = p(w^j = 1 | z^i = 1).

M, N, V, k: fixed known parameters. α, β: fixed unknown parameters. θ, z, w: random variables (w are observable).

θ is a document-level variable; z and w are word-level variables.
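The LDA generative process is easy to simulate directly from its definition; a minimal sketch in Python (function names and toy parameters are illustrative, not from the paper):

```python
import random

def dirichlet(alpha):
    """Draw from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(p):
    """Draw an index from a discrete distribution p."""
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

def generate_document(alpha, beta, n_words):
    """LDA generative process for one document.

    alpha: k-dim Dirichlet parameter; beta: k x V topic-word matrix."""
    theta = dirichlet(alpha)                 # 1. theta ~ Dirichlet(alpha)
    doc = []
    for _ in range(n_words):
        z = sample_categorical(theta)        # 2a. z_n ~ Multinomial(theta)
        w = sample_categorical(beta[z])      # 2b. w_n ~ Multinomial(beta_{z_n})
        doc.append(w)
    return doc
```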

Page 5

Reviews on Topic Models - Latent Dirichlet Allocation (LDA)

Limitations:

1. Because of the independence assumption implicit in the Dirichlet distribution, LDA is unable to capture the correlation between different topics. For θ ~ Dirichlet(α) with α_0 = Σ_{i=1}^k α_i,

Cov[θ_i, θ_j] = -α_i α_j / (α_0^2 (α_0 + 1)) ≈ 0 for i ≠ j,

since α_0 is usually very large for the posterior.

2. The number of topics k must be selected manually.
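The Dirichlet covariance formula can be checked numerically; a quick sketch (the helper name is illustrative):

```python
def dirichlet_cov(alpha, i, j):
    """Covariance between components i != j of a Dirichlet(alpha) draw:
    Cov[theta_i, theta_j] = -alpha_i * alpha_j / (alpha0^2 * (alpha0 + 1)),
    where alpha0 = sum(alpha)."""
    a0 = sum(alpha)
    return -alpha[i] * alpha[j] / (a0 ** 2 * (a0 + 1))

# The covariance is always negative and shrinks toward zero as alpha0 grows,
# which is why LDA's posterior (large alpha0) cannot express topic correlation.
small_a0 = dirichlet_cov([1.0, 1.0, 1.0], 0, 1)        # alpha0 = 3
large_a0 = dirichlet_cov([100.0, 100.0, 100.0], 0, 1)  # alpha0 = 300
```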

Page 6

Reviews on Topic Models - Correlated Topic Models (CTM)

Generative process for each document W in a corpus D:

1. Choose η ~ N(μ, Σ), where η is k-dim.

2. For each of the N words in the document W:

(a) Choose a topic z_n ~ Multinomial(f(η)), where f(η)_i = exp(η_i) / Σ_j exp(η_j), so that

p(z | η) = exp{η^T z - log Σ_{i=1}^k exp(η_i)}.

(b) Choose a word w_n ~ Multinomial(β_{z_n}), where β is a k×V matrix.

Key point: the topic proportions θ = f(η) are drawn from a logistic normal distribution rather than a Dirichlet distribution.
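The logistic-normal draw can be sketched in a few lines (the Cholesky-based Gaussian sampler and all names are illustrative assumptions, not from the paper):

```python
import math
import random

def softmax(eta):
    """f(eta)_i = exp(eta_i) / sum_j exp(eta_j) -- the logistic map."""
    m = max(eta)
    e = [math.exp(x - m) for x in eta]      # shift by max for numerical stability
    s = sum(e)
    return [x / s for x in e]

def sample_topic_proportions(mu, chol, rng=random):
    """Draw eta ~ N(mu, Sigma) and return theta = f(eta).

    chol is a lower-triangular Cholesky factor L with Sigma = L L^T
    (illustrative helper for sampling a correlated Gaussian)."""
    k = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    eta = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
    return softmax(eta)
```

Unlike a Dirichlet draw, off-diagonal entries of Σ let two topics co-occur more (or less) often than independence would allow.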

Page 7

Reviews on Topic Models - Correlated Topic Models (CTM)

Limitations:

1. Limited to pair-wise correlations between topics, and the number of parameters in the covariance matrix grows as the square of the number of topics.

2. The number of topics k must be selected manually.

Logistic normal density:

f(θ | μ, Σ) ∝ exp{ -(1/2) [log(θ_{1:k-1}/θ_k) - μ]^T Σ^{-1} [log(θ_{1:k-1}/θ_k) - μ] }

Page 8

Pachinko Allocation Model (PAM)

In PAM, the concept of a topic is extended to distributions not only over words (as in LDA and CTM), but also over other topics.

The structure of PAM is extremely flexible.

Pachinko: a Japanese game in which metal balls bounce down through a complex collection of pins until they land in various bins at the bottom.

Page 9

Pachinko Allocation Model (PAM): Four-level PAM

Structure: root → super-topic → sub-topic → word.

M, N, V, k, S: fixed known parameters. α_r, α_t, β: fixed unknown parameters. θ_r, θ_t, z_r, z_t, w: random variables.

Generative process for each document W in a corpus D:

1. Choose θ_r ~ Dirichlet(α_r), where θ_r and α_r are S-dim; θ_r gives the mixing weights over super-topics.

2. For each super-topic s = 1, ..., S, choose θ_t^{(s)} ~ Dirichlet(α_t), where θ_t^{(s)} and α_t are k-dim; θ_t^{(s)} gives the mixing weights over sub-topics.

3. For each of the N words in the document W:

(a) Choose a super-topic z_r ~ Multinomial(θ_r).

(b) Choose a sub-topic z_t ~ Multinomial(θ_t^{(z_r)}).

(c) Choose a word w_n ~ Multinomial(β_{z_t}), where β is a k×V matrix.
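The four-level generative process above can be simulated directly; a minimal sketch with toy dimensions (all names and parameters are illustrative):

```python
import random

def dirichlet(alpha):
    """Draw from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def draw(p):
    """Draw an index from a discrete distribution p."""
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

def generate_document_pam(alpha_r, alpha_t, beta, n_words):
    """Four-level PAM generative process for one document.

    alpha_r: S-dim root Dirichlet parameter (over super-topics);
    alpha_t: list of S k-dim Dirichlet parameters (one per super-topic);
    beta: k x V matrix of sub-topic word distributions."""
    theta_r = dirichlet(alpha_r)                # 1. root mixing weights
    theta_t = [dirichlet(a) for a in alpha_t]   # 2. per-super-topic weights
    doc = []
    for _ in range(n_words):
        z_r = draw(theta_r)                     # 3a. super-topic
        z_t = draw(theta_t[z_r])                # 3b. sub-topic
        doc.append(draw(beta[z_t]))             # 3c. word
    return doc
```

Because sub-topic weights depend on the drawn super-topic, sub-topics sharing a super-topic tend to co-occur, which is how PAM captures topic correlation.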

Page 10

Pachinko Allocation Model (PAM)

Advantage:

• Captures correlations between topics through a super-topic layer.

Limitation:

• The number of super-topics S and the number of sub-topics k must be selected manually.

Page 11

Nonparametric Pachinko Allocation

• Assumes an HDP-based prior for PAM.

• Based on a 5-level hierarchical Chinese restaurant process.

• Automatically decides the super-topic number S and the sub-topic number k.

Chinese restaurant process, denoted CRP({C(t)}_t, γ), where C(t) is the number of customers already seated at table t:

P(a new customer sits at occupied table t) = C(t) / (Σ_{t'} C(t') + γ)

P(a new customer sits at an unoccupied table) = γ / (Σ_{t'} C(t') + γ)
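The seating probabilities above translate directly into a sampler; a minimal sketch (function names are illustrative):

```python
import random

def crp_seat(counts, gamma, rng=random):
    """Seat one customer by the Chinese restaurant process.

    counts[t] = number of customers at table t. Returns a table index
    proportional to counts[t], or len(counts) (a new table) with
    probability gamma / (sum(counts) + gamma)."""
    total = sum(counts) + gamma
    r = rng.random() * total
    acc = 0.0
    for t, c in enumerate(counts):
        acc += c
        if r < acc:
            return t
    return len(counts)                  # new, previously unoccupied table

def simulate_crp(n_customers, gamma, rng=random):
    """Seat n_customers one by one; return the final table counts."""
    counts = []
    for _ in range(n_customers):
        t = crp_seat(counts, gamma, rng)
        if t == len(counts):
            counts.append(0)
        counts[t] += 1
    return counts
```

The number of occupied tables is unbounded a priori but grows only logarithmically with the number of customers, which is what lets the model decide S and k from data.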

Page 12

Nonparametric Pachinko Allocation

Restaurant analogy: root ↔ restaurant, super-topic ↔ category, sub-topic ↔ dish, word ↔ customer.

• There are infinitely many super-topics and sub-topics.

• Both super-topics (categories) and sub-topics (dishes) are globally shared among all documents.

• Sampling a super-topic involves a two-level CRP.

• Sampling a sub-topic involves a three-level CRP.

Page 13

Nonparametric Pachinko Allocation

Generative process: a customer x arrives at restaurant r_j.

1. He chooses the k-th entryway e_jk in the restaurant from CRP({C_j(k)}_k, γ_0).

2. If e_jk is a new entryway, a category c_l is associated to it from CRP({Σ_{j'} C_{j'}(l)}_l, γ_0).

3. After choosing the category, the customer decides which table he will sit at. He chooses table t_jln from CRP({C_{jl}(n)}_n, γ_1).

4. If the customer sits at an existing table, he shares the menu and dish with the other customers at that table. Otherwise, he chooses a menu m_lp for the new table from CRP({Σ_{j'} C_{j'l}(p)}_p, γ_1).

5. If the customer gets an existing menu, he eats the dish on the menu. Otherwise, he samples a dish d_m for the new menu from CRP({Σ_{l'} C_{l'}(m)}_m, γ_1).

Page 14

Nonparametric Pachinko Allocation

Graphical Model

Model parameters: scalar concentration parameters and a base measure H. Two-level clustering of indicator variables: the first level clusters using a 2-layer CRP and the second level using a 3-layer CRP. Atoms are all drawn from the base H.

Page 15

Experimental Results

Datasets:

• 20 newsgroups comp5 dataset: 5 different newsgroups, 4,836 documents, including 468,252 words and 35,567 unique words.

• Rexa dataset: a digital library of computer science; 5,000 randomly chosen documents, including 350,760 words and 25,597 unique words.

• NIPS dataset: 1,647 abstracts of NIPS papers from 1987-1999, including 114,142 words and 11,708 unique words.

Likelihood Comparison:

Page 16

Experimental Results

Topic Examples

20 newsgroup comp5 dataset

Page 17

Experimental Results

Topic Examples

NIPS dataset

Nonparametric Bayes PAM discovers a sparse topic structure.

Page 18

Conclusions

A nonparametric Bayesian prior for pachinko allocation is presented based on a variant of the hierarchical Dirichlet process;

Nonparametric PAM automatically discovers topic correlations and determines the numbers of topics at different levels;

The topic structure discovered by nonparametric PAM is usually sparse.

Page 19

Appendix: Hierarchical Latent Dirichlet Allocation (hLDA)

Key difference from LDA:

• Topics are organized as an L-level tree structure instead of a k×V matrix.

• L is prespecified manually.

Generative process for each document W in a corpus D:

1. Choose a path from the root of the topic tree to a leaf. The path includes L topics.

2. Choose θ ~ Dirichlet(α), where θ and α are L-dim.

3. For each of the N words in the document W:

(a) Choose a topic z_n ~ Multinomial(θ).

(b) Choose a word w_n ~ Multinomial(β^{(z_n)}), where β^{(z_n)} is a V-dim vector, the multinomial parameter of the z_n-th topic along the path from root to leaf chosen in step 1.
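Given the path chosen in step 1, steps 2-3 reduce to LDA over the L topics on that path; a minimal sketch (all names and parameters are illustrative, not from the paper):

```python
import random

def dirichlet(alpha):
    """Draw from Dirichlet(alpha) via normalized Gamma draws."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def draw(p):
    """Draw an index from a discrete distribution p."""
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

def generate_document_hlda(path_betas, alpha, n_words):
    """hLDA word sampling, given the root-to-leaf path chosen in step 1.

    path_betas: the L V-dim word distributions of the topics on the path;
    alpha: L-dim Dirichlet parameter over tree levels."""
    theta = dirichlet(alpha)                # 2. level proportions
    doc = []
    for _ in range(n_words):
        z = draw(theta)                     # 3a. pick a level (topic on path)
        doc.append(draw(path_betas[z]))     # 3b. word from that topic
    return doc
```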

Page 20

References:

W. Li, D. M. Blei, and A. McCallum. Nonparametric Bayes pachinko allocation. In Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI), 2007.

W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In Proceedings of International Conference on Machine Learning (ICML), 2006.

D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3: 993-1022, 2003.

D. M. Blei and J. D. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems (NIPS), 2006.

D. M. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems (NIPS), 2004.

J. Aitchison and S. M. Shen. Logistic-normal distributions: Some properties and uses. Biometrika, vol.67, no.2, pp.261-272, 1980.