NMF with Python http://amath.unist.ac.kr 2016.01.21~22 Kyunghoon Kim


Why use Low Rank Approximation?

• Data Compression and Storage when k << r

• Remove noise and uncertainty

⇒ improved performance on the data mining task of retrieval (e.g., finding similar items)

⇒ improved performance on the data mining task of clustering

http://langvillea.people.cofc.edu/NISS-NMF.pdf


Weakness of Low Rank Approximation

• storage: the factors are usually completely dense

• interpretation of basis vectors is difficult due to mixed signs

• good truncation point k is hard to determine

All create basis vectors that are mixed in sign. Negative elements make interpretation difficult!
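The storage and mixed-sign points can be checked on a small example. The following is a minimal sketch with NumPy, assuming a random strictly positive toy matrix; the sizes `m`, `n`, `k` and all variable names are illustrative and not from the slides.

```python
import numpy as np

# Toy data (assumption): a strictly positive 20 x 15 matrix.
rng = np.random.default_rng(0)
m, n, k = 20, 15, 2
A = rng.random((m, n)) + 0.1

# Truncated SVD: keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_k = U_k @ np.diag(s_k) @ Vt_k

# Storage: k*(m + n + 1) numbers for the factors versus m*n for A itself.
full_storage = m * n
truncated_storage = k * (m + n + 1)

# Even though A is positive, the truncated factors contain negative entries.
has_negative = bool((U_k < 0).any() or (Vt_k < 0).any())
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)

print("storage:", truncated_storage, "vs", full_storage)
print("negative entries in U_k or V_k:", has_negative)
print("relative approximation error:", rel_error)
```

The factors need fewer numbers than A when k is small, but they are dense and their signs are mixed, which is the interpretation problem the slide describes.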


IDEA of NMF

use low-rank approximation with nonnegative factors to improve weaknesses of truncated-SVD

Truncated SVD: $A_k = U_k \Sigma_k V_k^T$ (signs: nonneg = mixed · nonneg · mixed)

NMF: $A_k = W_k H_k$ (signs: nonneg = nonneg · nonneg)


Interpretation of NMF

columns of W are the underlying basis vectors, i.e., each of the n columns of A can be built from the k columns of W.


columns of H give the weights associated with each basis vector.

$$A_k e_1 = W_k H_{*1} = h_{11} w_1 + h_{21} w_2 + \cdots + h_{k1} w_k$$
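The column-by-column reading above can be verified numerically. This is a minimal sketch with NumPy, assuming small random nonnegative factors; the sizes and variable names are illustrative.

```python
import numpy as np

# Illustrative nonnegative factors (assumption): W is m x k, H is k x n.
rng = np.random.default_rng(0)
m, n, k = 6, 4, 3
W = rng.random((m, k))
H = rng.random((k, n))
A_k = W @ H

# First column of A_k, i.e. A_k e_1.
first_col = A_k[:, 0]

# The same column built as h_11 w_1 + h_21 w_2 + ... + h_k1 w_k,
# where w_i is column i of W and h_i1 is entry i of the first column of H.
weighted_sum = sum(H[i, 0] * W[:, i] for i in range(k))

columns_match = bool(np.allclose(first_col, weighted_sum))
print(columns_match)
```

Because every weight and every basis vector is nonnegative, the column is built purely by adding parts together, never by cancelling them.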


Properties of NMF

• basis vectors $w_i$ are not ⊥ ⇒ can have overlap of topics

• can restrict W, H to be sparse

• immediate interpretation

large $w_{ij}$'s ⇒ basis vector $w_i$ is mostly about terms j

$h_{i1}$: how much doc1 is pointing in the "direction" of topic vector $w_i$

• NMF is algorithm-dependent: W, H not unique
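One simple reason W and H are not unique is rescaling: multiplying column i of W by a positive number and dividing row i of H by the same number leaves the product WH unchanged. The sketch below illustrates this with NumPy; the matrices and the scaling vector `d` are made-up values for illustration only.

```python
import numpy as np

# Illustrative nonnegative factors (assumption).
rng = np.random.default_rng(0)
W = rng.random((6, 3))
H = rng.random((3, 4))

# Positive scaling factors, one per basis vector (hypothetical values).
d = np.array([2.0, 0.5, 4.0])

W2 = W * d            # scales column i of W by d[i]
H2 = H / d[:, None]   # scales row i of H by 1 / d[i]

# Different factors, same product, and both pairs are nonnegative.
same_product = bool(np.allclose(W @ H, W2 @ H2))
different_factors = not np.allclose(W, W2)
print(same_product, different_factors)
```

Rescaling is only the simplest source of non-uniqueness; different initializations and algorithms can also end at different factorizations.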


Mean squared error objective function

A ≈ WH

$$\min_{W, H} \|A - WH\|_F^2 \quad \text{s.t.} \quad W, H \ge 0$$


Nonlinear Optimization Problem

• convex in W alone or in H alone, but not in both jointly ⇒ tough to get the global min

• huge number of unknowns: mk for W and kn for H

• above objective is one of many possible
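One common way to attack this objective is the multiplicative update rule of Lee and Seung, which alternates between updating H and W and keeps both nonnegative. The slides do not say which algorithm they have in mind, so the following is only a sketch of that one technique under stated assumptions; the function name `nmf_multiplicative`, the parameters `n_iter` and `eps`, and the toy data are all illustrative.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of Lee-Seung multiplicative updates for min ||A - WH||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random positive start; the result depends on this initialization.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(A - W @ H, "fro") ** 2)
    return W, H, errors

# Toy nonnegative data with an exact rank-4 nonnegative factorization (assumption).
rng = np.random.default_rng(1)
A = rng.random((30, 4)) @ rng.random((4, 20))

W, H, errors = nmf_multiplicative(A, k=4)
print("first error:", errors[0])
print("last error:", errors[-1])
```

Running it with a different `seed` generally gives different W and H, which reflects the lack of a guaranteed global minimum noted above.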


http://math.stackexchange.com/questions/393447/why-does-the-non-negative-matrix-factorization-problem-non-convex


http://nimfa.biolab.si/


Python Library: NIMFA

Installation

    pip install nimfa

Code

    import nimfa

    # matrix: the nonnegative data matrix A to factor, defined beforehand
    nmf = nimfa.Nmf(matrix, seed="random_vcol", rank=2, max_iter=2000)
    fit = nmf()
    W = fit.basis()  # basis matrix W
    H = fit.coef()   # coefficient matrix H