45
Unsupervised Learning Of Finite Mixture Models With Deterministic Annealing For Large-scale Data Analysis Student: Jong Youl Choi Advisor: Geoffrey Fox SALSA project h*p://salsahpc.indiana.edu Thesis Defense, January 12, 2012 School of Informatics and Computing Pervasive Technology Institute Indiana University

Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Unsupervised Learning Of Finite Mixture Models

With Deterministic Annealing For Large-scale Data Analysis"

Student: Jong Youl Choi"Advisor: Geoffrey Fox!

SALSA  project    h*p://salsahpc.indiana.edu  

Thesis Defense, January 12, 2012"

School of Informatics and Computing!Pervasive Technology Institute!

Indiana University!

Page 2: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

1!

Outline!

Jong Youl Choi (Jan 12, 2012)!

Page 3: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

2! Jong Youl Choi (Jan 12, 2012)!

Page 4: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Learning from observed data!▸  Inferring a data generating process!▸  Essential structure, abstract, summary, …!

Jong Youl Choi (Jan 12, 2012)!3!

Machine Learning Problem!

Observations! Hidden Components!

Page 5: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Component Model!–  A mixture of simple

distributions (components)!–  Hidden or latent

components!–  Serve as abstract or

summary of data!▸  Generative Model!

–  Simulate observed random sample!

▸  Convenient and flexible !▸  Model fitting is hard!

–  Too many parameters!

Jong Youl Choi (Jan 12, 2012)!4!

Finite Mixture Model (FMM)!

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

Two  Gaussian  Mixture  Model  

Page 6: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Two Finite Mixture Models!

▸  Traditional model!▸  Component ≈ Cluster!▸  Clustering, Gaussian

Mixture (GM), GTM, …!

▸  Factor model!▸  Component ≈ Generator!▸  PLSA!

Jong Youl Choi (Jan 12, 2012)!5!

!1 !K· · ·

xn

"K"1

FMM  Type-­‐1   FMM  Type-­‐2  

KX

k=1

⇡k = 1KX

k=1

ik = 1

!1 !K

x1 x2 xN

· · ·

· · ·"11 "1K "N1 "NK

Page 7: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Bayes’ Theorem!!!–  𝛳i: Parameters!i: Parameters!–  X : Observations!–  P(𝛳i) : Prior!i) : Prior!–  P(X | 𝛳i) : Likelihood!i) : Likelihood!

▸  Maximum Likelihood Estimator (MLE)!–  Used to find the most plausible 𝛳, given X !–  Maximize likelihood or log-likelihood !! Optimization problem!

Jong Youl Choi (Jan 12, 2012)!6!

Model Fitting!

ObservaAons   Hidden  Components  

Page 8: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Problems in MLE!–  Observation X is often not complete!–  Latent (hidden) variable Z exists!–  Hard to explore whole parameter space!

▸  EM algorithm!–  Random initialization 𝛳old!

–  E-step : Expectation P(Z | X, 𝛳old)!–  M-step : Maximize (log-)likelihood!–  Repeat E-,M-step until converge.!

Jong Youl Choi (Jan 12, 2012)!7!

Expectation-Maximization (EM) Algorithm!

Page 9: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Motivation!▸  Local optimum problem!

–  Easily trap in the local optimum!

–  Sensitive to initial conditions or parameters!

–  High-variance solution!–  SA, GA, …!

▸  Overfitting problem!–  Poor generalization quality!–  Directly related with

predicting power!–  Early stopping, cross

validation, …!

Jong Youl Choi (Jan 12, 2012)!8!

(Maxima and Minima, Wikipedia)!

Validation Error!

Training Error!

Underfitting! Overfitting!

(Overfitting, Wikipedia)!

Page 10: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Solve FMM with DA!–  Avoid local optimum problem!–  Avoid overfitting problem!

▸  Present DA applications!–  GTM with DA (DA-GTM)!–  PLSA with DA (DA-PLSA)!

▸  Experimental results!–  Data visualization!–  Text mining!

Jong Youl Choi (Jan 12, 2012)!9!

Contributions!

Page 11: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

10! Jong Youl Choi (Jan 12, 2012)!

Page 12: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Optimization!–  Gradually lowering numeric

temperature!–  No stochastic process !

▸  Local optimum avoidance!–  Tracing the global solution by

changing level of smoothness !–  Smoothed bumpy!

▸  Principle of Maximum Entropy!–  A solution with maximum entropy!–  Minimize the free energy F of log-

likelihood!–  Eventually, we will have

maximized log-likelihood!

Jong Youl Choi (Jan 12, 2012)!11!

Deterministic Annealing (DA)!

Page 13: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Finite Mixture Model with EM!

Jong Youl Choi (Jan 12, 2012)!12!

E-step!

M-step!

Update Log-Likelihood!

Converged!No!

Yes!

Page 14: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Finite Mixture Model with DA!

Jong Youl Choi (Jan 12, 2012)!13!

Update Temperature!

E-step!

M-step!

Update Free Energy!

Converged!

Last?!

No!Yes!

No!

Set Temp High!

•  Minimize free energy!•  Free energy F =

f(Temp, Entropy of Log-Likelihood)!

•  Annealing (High Low)!•  High temperature!

-  Soft (or fuzzy) association !-  Smooth cost function!

•  Low temperature!-  Hard association!-  Bumpy cost function!-  Revealing full complexity!

Page 15: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Free Energy !

!!▸  General form for Finite Mixture Model!

–  Cost function: !

Jong Youl Choi (Jan 12, 2012)!14!

Free Energy for Finite Mixture Model!

Zn =

KX

k=1

exp

✓�dnkT

-  D : expected cost <dnk>!-  S : Shannon entropy!-  T : computational temperature!-  Zn : partition function!

= �TNX

n=1

lnZn

F = D � TS

FFMM = �T

NX

n=1

log

KX

k=1

{c(n, k) p(xn|yk)}1T

dnk = � log p(xn|yk)

Page 16: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

15! Jong Youl Choi (Jan 12, 2012)!

Page 17: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Dimension Reduction!

▸  Simplification, feature selection/extraction, visualization, etc. !

▸  Preserve the original data’s information as much as possible in lower dimension!

16!

High  Dimensional  Data   Low  Dimensional  Data  PubChem  Data  (166  dimensions)  

Jong Youl Choi (Jan 12, 2012)!

Page 18: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  An algorithm for dimension reduction !–  Find an optimal K latent variables in a latent space !–  f is a non-linear mappings!–  Strict Gaussian mixture model!–  EM model fitting!

▸  DA optimization can improve the fitting process!Jong Youl Choi (Jan 12, 2012)!17!

Generative Topographic Mapping!

Data Space (D dimension)Latent Space (L dimension)

zk yk

xn

f

K  latent  points  (Components)  N  data  points  

Page 19: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Dim1

Dim2

−1.5

−1.0

−0.5

0.0

0.5

1.0

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●● ●

●●

●● ●●● ●●

●●

● ●●●

●● ●●● ●

●●

●●

● ●● ●

●● ●●

●●

●●●

● ● ●●●

●● ● ●●

●●

●●●●

●●

●●

●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

● ●●●●●

●●

●●

●●

●●

●● ●●●●●

●●●

●●

●●

● ● ●●●

●●

●● ●

●●●

● ●

●● ● ●

●● ●●

● ●●

●● ● ●●●●

● ●

● ●●

●●●

●●

●●

● ●● ●

●●

●● ●

●●

●●●

●●●

●●

●●

●●●●

●●●

●●

●●

● ●

●●

●● ●

●●●

●●

●●

●●● ●●

●●●

● ●● ●

●● ●

● ●●●

● ●

●●

● ●

● ●●

●●

●●●

●●

●●●

● ●

●●

● ●●

●● ●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●● ●●● ●●

●●

● ●●

●●

●● ●●

●●

●●

●●

● ●● ●

● ●●

●●

●●●

● ●●

●●

●● ● ●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●

●● ● ●●

●●

● ●

●●●

● ●

●●

● ●

●● ●

●●

● ●

●● ● ●●●●

●●

● ●●

●●●

●●

● ●● ●

● ●●

●●

●●●

●●●

●●●●

●●●

●●

● ●

●●

● ●● ●

●●

●●

●●● ●

●●● ●

● ●

●●

●●●

●● ●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

−2 −1 0 1 2

Maximum Log−Likelihood = 1721.554

Dim1

Dim

2

−1.0

−0.5

0.0

0.5

1.0

●●

●●●

●●●

● ●●

●●

●●

● ●

● ●●

●●

●●● ●

●●● ●●

●●●

● ●

●●

●●

●●

●● ●

●●

●●

● ●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●●

● ●

● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

● ●●

● ●

● ●●

●●

●●

●●

●● ●●

●●

● ●

●●

●● ●

●●

●●

● ●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

● ●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

−1.0 −0.5 0.0 0.5 1.0

▸  Computational complexity is O(KN), where !–  N is the number of data points !–  K is the number of latent variables or clusters. K << N !

▸  Efficient, compared with MDS which is O(N2)!▸  Produce more separable map (right) than PCA (left)!

Jong Youl Choi (Jan 12, 2012)!18!

Advantages of GTM!

PCA   GTM  

Oil flow data!-. 1000 points!-. 12 Dimensions!-. 3 Clusters!

Page 20: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  GTM Model Setting!–  Strict Gaussian assumption!

–  Constant mixing weight!

19!

Free Energy for GTM!

Jong Youl Choi (Jan 12, 2012)!

FFMM = �T

NX

n=1

log

KX

k=1

{c(n, k) p(xn|yk)}1T

p(xn|yk) = N (xn|µk,�k)

c(n, k) =1

K

Page 21: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

GTM with Deterministic Annealing!

20!

�TNX

n=1

ln

(✓1

K

◆ 1T

KX

k=1

p(xn|yk)1T

)NX

n=1

ln

(1

K

KX

k=1

p(xn|yk)

)

Maximize  log-­‐likelihood  L   Minimize  free  energy  F  

  Very  sensiAve  to  parameters  

  Trapped  in  local  opAma    Faster  

  Less  sensiAve  to  poor  parameters      Avoid  local  opAmum    Require  more  computaAonal  Ame  

When  T  =  1,  L  =  -­‐F.  

EM-­‐GTM    (TradiAonal  method)  

DA-­‐GTM    (New  algorithm)  

Objec)ve  Func)on  

Op)miza)on  

Pros  &  Cons  

Jong Youl Choi (Jan 12, 2012)!

Page 22: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Traditional method : static cooling schedule!▸  Adaptive cooling, a dynamic cooling schedule!

–  Able to adjust the problem on the fly!–  Move to a temperature at which F may change!

Jong Youl Choi (Jan 12, 2012)!21!

Cooling Schedules!

Iteration

Temp

2

3

4

5

200 400 600 800 1000

Linear  

Iteration

Temp

1

2

3

4

5

200 400 600 800 1000

ExponenAal  

IteraAons  

Tempe

rature  

Iteration

Temp

1

2

3

4

5

200 400 600 800 1000 1200

AdapAve  

Page 23: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Discrete behavior of DA!–  In some temperatures, the free energy is stable!–  At a specific temperature, start to explode, which is

known as critical temperature Tc!

▸  Critical temperature Tc!–  Free energy F is drastically changing at Tc!

–  Second derivative test : Hessian matrix loose its positive definiteness at Tc!

–  det ( H ) = 0 at Tc , where!

Jong Youl Choi (Jan 12, 2012)!22!

Phase Transition!

H =

2

64H11 · · · H1K...

...HK1 · · · HKK

3

75 Hkk =�2F

�yk�yTrk

Hkk0 =�2F

�yk�yTrk0

Page 24: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

DA-GTM with Adaptive Cooling!

23!

Iteration

����

kelih

ood

valu

e

����

����

����

����

����

���� ���� ���� ����

TypeLikelihood

Everage Log-Likelihoodof EM-GTM

Iteration

Tem

pera

ture

1

2

3

4

5

6

7●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●

2000 4000 6000 8000

Type● Temp

Starting Temperature

1st Critical Temperature

Jong Youl Choi (Jan 12, 2012)!

Oil flow data (1000 points with 12 Dimensions)!

Progress of log-likelihood! Adaptive changes in cooling schedule!

Page 25: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Jong Youl Choi (Jan 12, 2012)!24!

DA-GTM Result!

Start Temperature

Log−

Like

lihoo

d (ll

h)

0.0

0.5

1.0

1.5

2.0

N/A 5 7 9

TypeEMAdaptiveExp−AExp−B

(α = 0.99)!

Start Temperature (1st Tc = 4.64)!

(α = 0.95)!

5.0! 7.0! 9.0!N/A!

Log-

Like

lihoo

d!

Oil flow data (1000 points with 12 Dimensions)!

Page 26: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Conclusion!

▸  GTM with Deterministic Annealing (DA-GTM)!–  Overcome short-comes of traditional EM method !–  Avoid local optimum !–  Robust against poor initial parameters!

▸  Phase-transitions in DA-GTM!–  Use Hessian matrix for detection!–  Eigenvalue computation!

▸  Adaptive cooling schedule!–  New convergence approach!–  Dynamically determine next convergence point!

25! Jong Youl Choi (Jan 12, 2012)!

Page 27: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

26! Jong Youl Choi (Jan 12, 2012)!

Page 28: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Corpus Analysis!

▸  Polysems!–  A word with multiple

meanings!–  E.g., ‘thread’!

▸  Synonyms!–  Different words that

have similar meaning, a topic!

–  E.g., ‘car’ and ‘automotive’!

Jong Youl Choi (Jan 12, 2012)!27!

Wa    

Wb    

Wc    

Wb    

Wc    

Wa    

Wb    

wa    

Wc    

Wd    

Document   Word  

Corpus  

Page 29: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Topic model !–  Assume latent K topics generating words!–  Each document is a mixture of K topics!

▸  FMM Type-2!–  The original proposal used EM for model fitting!

Jong Youl Choi (Jan 12, 2012)!28!

Probabilistic Latent Semantic Analysis (PLSA)!

Doc 1! Doc 2! Doc N!

Topic 1! Topic K!Topic 2! …!

…!

Page 30: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

An Example of DA-PLSA!Topic 1 Topic 2 Topic 3 Topic 4 Topic 5

percent

million

year

sales

billion

new

company

last

corp

share

stock

market

index

million

percent

stocks

trading

shares

new

exchange

soviet

gorbachev

party

i

president

union

gorbachevs

government

new

news

bush

dukakis

percent

i

jackson

campaign

poll

president

new

israel

percent

computer

aids

year

new

drug

virus

futures

people

two

Top  10  list  of  the  best  words  of  the  AP  news  dataset  for  30  topics.  Processed  by  DA-­‐PLSA  and  shown  only  5  topics  among  30  topics    

Jong Youl Choi (Jan 12, 2012)!29!

(AP:  2,246  documents  &  10,473  words)  

Page 31: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Predictive power !–  Maintain good performance on unseen data!–  A generalized model is preferable!

Jong Youl Choi (Jan 12, 2012)!30!

Overfitting Problem!

Model  

Training!

Query(Inference)!

Sample!

Unseen Data!

Queried  Documents  

 

Page 32: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  PLSA Model Setting!–  Use Multinomial distribution!

–  Flexible mixing weight!

31!

Free Energy for PLSA!

Jong Youl Choi (Jan 12, 2012)!

FFMM = �T

NX

n=1

log

KX

k=1

{c(n, k) p(xn|yk)}1T

c(n, k) = nk

p(xn|yk) = Multi(xn|✓k)

Page 33: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  DA can control smoothness!–  Smoothed solution at high temperature!–  Getting specific as annealing!–  Early stopping to get a smoothed (general) model!

▸  Stop condition!–  Use a V-fold cross validation method!–  Measure total perplexity, sum of log-likelihood of both

training set and testing set !

▸  Tempered-EM, proposed by Hofmann (the original author of PLSA), but annealing is done in a reversed way!

Jong Youl Choi (Jan 12, 2012)!32!

Overfitting Avoidance in DA!

Page 34: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

&KDQJHV�RI�/RJï/LkHOLKRRG

Temperature

/RJïOLkHOLKRRG

���

���

���

��

��

��

�5�������

Training Set

Testing Set

Mix B

Mix C

A B C D

Annealing in DA-PLSA!

Annealing progresses from high temp to low temp!

Over-fitting at Temp=1!

Improved fitting quality with training set during annealing!

Early-stop temperatures depending on schemes:!

A (a=0.0, b=1.0)!B (a=0.5, b=0.5)!C (a=0.9, b=0.1)!D (a=1.0, b=0.0)!

Jong Youl Choi (Jan 12, 2012)!33!

Testing Set!

Training Set!

Mix C!

Mix B!

Page 35: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Predicting Power in DA-PLSA!Log of word probabilities of AP data (100 topics for 10,473 words)!

Early stop (Temp = 49.98)!

20 40 60 80 100

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000ï40

ï35

ï30

ï25

ï20

ï15

ï10

ï5

0

Wor

d In

dex!

Over-fitting (Temp = 1.0)!

20 40 60 80 100

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000ï40

ï35

ï30

ï25

ï20

ï15

ï10

ï5

0

Wor

d In

dex!

Jong Youl Choi (Jan 12, 2012)!34!

High  Word  Probability  

Low  Word  Probability  

Page 36: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

AP data with DA-PLSA!

Jong Youl Choi (Jan 12, 2012)!35!

Test  Set  Only   Training  Set  Only  

(AP:  2,246  documents  &  10,473  words)  

Page 37: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

NIPS data with DA-PLSA!

Jong Youl Choi (Jan 12, 2012)!36!

Maximum Sum of Log−Likelihood of Training And Testing Sets

Latent space dimensions

Log−

likel

ihoo

d

−3000

−2500

−2000

●●

1 5 10 50 100 500

Method● DA−Train

DA−TestEM−TrainEM−Test

Maximum Sum of Log−Likelihood of Training And Testing Sets

Latent space dimensions

Log−

likel

ihoo

d−10000

−8000

−6000

−4000

−2000

●●

●●

●●

1 5 10 50 100 500

Method● DA−Train

DA−TestEM−TrainEM−Test

Test  Set  Only   Training  Set  Only  

(NIPS:  1,500  doc  &  12,419  words)    

Page 38: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

DA-PLSA with DA-GTM!

Corpus!(Set of documents)!

Embedded Corpus in 3D!

Corpus in K-dimension!

DA-PLSA!

DA-GTM!

Jong Youl Choi (Jan 12, 2012)!37!

Page 39: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

AP Data Top Topic Words!▸  In the previous picture, we found among 500

topics:!Topic 331! Topic 435! Topic 424! Topic 492! Topic 445! Topic 406!

lately !oferrell !

mandate !ACK!fcc !

cardboard !commuter !

exam !kuwaits !fabrics!

lately oferrell !ACK!fcc !

mandate !cardboard !

exam !commuter !

fabrics !corroon !

mandate !kuwaits !

cardboard !commuter !

ACK!fcc !

lately !exam !fabrics !oferrell !

mandate !kuwaits !

cardboard !commuter !

lately !ACK!exam !

fcc !oferrell !fabrics!

mandate !lately !ACK!

cardboard !fcc !

commuter !oferrell !exam !

kuwaits !fabrics!

plunging !referred !informal !

Anticommu. !origin !details !relieve !

psychologist !lately !

thatcher!

ACK : acknowledges !Anticommu. : anticommunist!

Jong Youl Choi (Jan 12, 2012)!38!

Page 40: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

EM vs. DA-{GTM, PLSA}!

�TNX

n=1

ln

(✓1

K

◆ 1T

KX

k=1

p(xn|yk)1T

)NX

n=1

ln

(1

K

KX

k=1

p(xn|yk)

)

Objec)v

e  Func)o

ns  

EM   DA  

Maximize  log-­‐likelihood  L   Minimize  free  energy  F  OpAmizaAon  

  Very  sensiAve    Trapped  in  local  opAma    Faster  

  Less  sensiAve  to  an  iniAal  condiAon    Find  global  opAmum    Require  more  computaAonal  Ame  

Pros  &  Cons  

39!

Note:  When  T  =  1,  L  =  -­‐F.    This  implies  EM  can  be  treated  as  a  special  case  in  DA  

GTM  

PLSA   �TNX

n=1

lnKX

k=1

{ nkMulti(xn|yk)}1T

NX

n=1

lnKX

k=1

{ nkMulti(xn|yk)}

Jong Youl Choi (Jan 12, 2012)!

Page 41: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

I.  Finite Mixture Model (FMM)!

II.  Model Fitting with Deterministic Annealing (DA)!

III.  Generative Topographic Mapping with Deterministic Annealing (DA-GTM)!

IV.  Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA)!

V.  Conclusion And Future Work!

40! Jong Youl Choi (Jan 12, 2012)!

Page 42: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Finite Mixture Model (FMM) problems!–  FMM-1 and FMM-2!–  Maximize log-likelihood for model fitting (MLE)!–  Traditional solutions use EM!

▸  Solve FMMs with DA!–  Avoid local optimum problem!–  Find generalized (smoothed) solution!

▸  Enhance and develop two data mining algorithms!–  DA-GTM!–  DA-PLSA!

Jong Youl Choi (Jan 12, 2012)!41!

Conclusion!

Page 43: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  Determine number of components!–  Help to choose the right number of clusters, topics,

the number of lower dimension, …!–  Bayesian model selection, minimum description

length (MDL), Bayesian information criteria (BIC), …!–  Need to develop in a DA framework!

▸  Quality study for DA-PLSA!–  Comparison with LDA!–  Precision and recall measurements!

▸  Performance study for data-intensive analysis!–  MPI, MapReduce, PGAS, …!

Jong Youl Choi (Jan 12, 2012)!42!

Future Work!

Page 44: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

▸  [CCPE] J. Y. Choi, S.-H. Bae, J. Qiu, B. Chen, and D. Wild. Browsing large scale cheminformatics data with dimension reduction. Concurrency and Computation: Practice and Experience, 2011.!

▸  [ECMLS] J. Y. Choi, S.-H. Bae, J. Qiu, G. Fox, B. Chen, and D. Wild. Browsing large scale cheminformatics data with dimension reduction. In Workshop on Emerging Computational Methods for Life Sciences (ECMLS), in conjunction with the 19th ACM International Symposium on High Performance Distributed Computing (HPDC) 2010, HPDC ’10, pages 503–506, Chicago, Illinois, June 2010. ACM.!

▸  [CCGrid] J. Y. Choi, S.-H. Bae, X. Qiu, and G. Fox. High performance dimension reduction and visualization for large high-dimensional data analysis. Cluster Computing and the Grid, IEEE International Symposium on, 0:331–340, 2010. !

▸  [ICCS] J. Y. Choi, J. Qiu, M. Pierce, and G. Fox. Generative Topographic Mapping by Deterministic Annealing. In Proceedings of the 10th International Conference on Computational Science and Engineering (ICCS 2010), 2010.!

Jong Youl Choi (Jan 12, 2012)!43!

Related Publications!

Page 45: Unsupervised Learning Of Finite Mixture Models With ...cgl.soic.indiana.edu/presentations/Jong_damix.v8.pptx.pdf · 7 2000 4000 6000 8000 Type Temp Starting Temperature 1st Critical

Thank you!!"

Question?"

Email me at [email protected]"

Jong Youl Choi (Jan 12, 2012)!44!