Initializations for Nonnegative Matrix Factorization
Shaina Race
North Carolina State University
slrace at ncsu.edu
March 26, 2012
Shaina Race (NCSU) NMF Initializations March 26, 2012 1 / 17
Motivation
NMF is a nonconvex optimization problem with inequality constraints

    min_{W,H} ||A − WH||   subject to   W, H ≥ 0,

and iterative methods are necessary for its solution.
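To make the iterative nature concrete, here is a minimal sketch of NMF with random initialization and Lee–Seung multiplicative updates (numpy; the function name and parameters are illustrative, not from the talk):

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, seed=0):
    """Minimal sketch: random nonnegative init + multiplicative updates
    for min ||A - WH||_F subject to W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))          # random nonnegative initialization
    H = rng.random((k, n))
    eps = 1e-12                     # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.abs(np.random.default_rng(1).random((20, 10)))
W, H = nmf_multiplicative(A, k=3)
print(np.linalg.norm(A - W @ H))   # residual after the iterations
```

A different seed gives a different (W, H), which is exactly the non-uniqueness the next slides complain about.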
Motivation
Current NMF algorithms typically converge slowly, and then only to local minima [1].
The majority of algorithms in the literature initialize W and H randomly.
Results of any given algorithm are not unique across different random initializations, so several instances must be run and the best solution chosen. This is expensive.
Initializations from matrix A
Meyer et al. [2] suggested a method called "random Acol," which initializes each column of the matrix W by taking the average of p randomly chosen columns of A.
If A is sparse, this makes more sense than creating a random dense W.
Pro: a very inexpensive technique.
Con: a "minimal" upgrade; the solution is still not unique.
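A minimal sketch of random Acol, assuming each column of W is the average of p randomly chosen columns of A (as the name "Acol" suggests; names here are illustrative):

```python
import numpy as np

def random_acol_init(A, k, p=5, seed=0):
    """Sketch of "random Acol": each column of W is the average of p
    randomly chosen columns of A (preserves nonnegativity and sparsity)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = np.empty((m, k))
    for j in range(k):
        cols = rng.choice(n, size=p, replace=False)  # p distinct columns
        W[:, j] = A[:, cols].mean(axis=1)
    return W

A = np.abs(np.random.default_rng(1).random((30, 12)))
W = random_acol_init(A, k=4)
print(W.shape)  # (30, 4)
```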
”Centroid” Initialization
Wild et al. suggested choosing the columns of W by first clustering the columns of A and using the k centroid vectors of these clusters as the columns of W.
Pros: improves the error of the factorization and the number of iterations substantially over random initialization. The result of the NMF algorithm can be made unique. Intuitive.
Cons: computationally more complex, because it involves additional clustering (specifically, spherical k-means was suggested, which itself must be initialized!).
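A minimal sketch of a centroid initialization. The suggestion was spherical k-means; as a simplification, this sketch runs a few Lloyd iterations of plain k-means on the columns of A and returns the centroids as W:

```python
import numpy as np

def centroid_init(A, k, iters=20, seed=0):
    """Sketch of a centroid initialization: cluster the columns of A
    (plain k-means here, as a stand-in for spherical k-means) and use
    the k centroids as the columns of W."""
    rng = np.random.default_rng(seed)
    X = A.T                                   # one row per column of A
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(iters):
        # assign each column of A to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):           # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers.T                          # m x k, nonnegative if A is

A = np.abs(np.random.default_rng(1).random((25, 40)))
W = centroid_init(A, k=5)
print(W.shape)  # (25, 5)
```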
Exact NMF
Suppose a matrix A had an entirely positive singular value decomposition of rank k. Then, using U_k and V_k^T to denote the singular vectors associated with the nontrivial singular values (and S the diagonal matrix of those singular values), we could factor A exactly:

    A = WH, where W = U_k √S and H = √S V_k^T.

Then k is defined to be the nonnegative rank of the matrix A.
An exact factorization is wishful thinking in practice!
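A small numeric illustration: a rank-1 strictly positive matrix has an entirely positive SVD, so W = U_k √S and H = √S V_k^T factor it exactly (a sketch, not part of the talk):

```python
import numpy as np

# A rank-1 positive matrix: its SVD factors are positive (up to a
# simultaneous sign flip of u and v, which np.abs undoes).
rng = np.random.default_rng(0)
u = rng.random(6) + 0.1
v = rng.random(4) + 0.1
A = np.outer(u, v)                    # rank 1, strictly positive

U, s, Vt = np.linalg.svd(A)
k = 1                                  # the nonnegative rank here is 1
sqrtS = np.diag(np.sqrt(s[:k]))
W = np.abs(U[:, :k]) @ sqrtS          # W = U_k sqrt(S)
H = sqrtS @ np.abs(Vt[:k, :])         # H = sqrt(S) V_k^T
print(np.allclose(A, W @ H))          # True: exact nonnegative factorization
```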
NNDSVD Initialization
First decompose the matrix A into its rank-k SVD:

    A = Σ_{i=1}^{k} σ_i C_i, where C_i = u_i v_i^T.

Then decompose each C_i into its positive and negative parts,

    C_i = C_i^+ − C_i^−.

Clearly, C_i^+ is the closest nonnegative matrix to C_i in the Frobenius norm.
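The positive/negative split can be illustrated directly; it is just an elementwise maximum with zero:

```python
import numpy as np

# Split a rank-1 SVD term C = u v^T into positive and negative parts,
# C = C_plus - C_minus (both elementwise nonnegative).
rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(3)
C = np.outer(u, v)

C_plus = np.maximum(C, 0.0)    # closest nonnegative matrix in F-norm
C_minus = np.maximum(-C, 0.0)
print(np.allclose(C, C_plus - C_minus))  # True
```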
NNDSVD Initialization
Lemma 1:
Consider any matrix C ∈ R^{m×n} such that rank(C) = 1, and write C = C^+ − C^−. Then rank(C^+), rank(C^−) ≤ 2.
Proof:

    C = xy^T = (x^+ − x^−)(y^+ − y^−)^T = (x^+ y^{+T} + x^− y^{−T}) − (x^+ y^{−T} + x^− y^{+T})

    ⟹ C^+ = x^+ y^{+T} + x^− y^{−T} and C^− = x^− y^{+T} + x^+ y^{−T}
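A quick numeric check of the lemma (a sketch; x^+ denotes the elementwise positive part, etc.):

```python
import numpy as np

# Verify: for rank-1 C = x y^T, the elementwise positive part equals
# C+ = x+ y+^T + x- y-^T, and rank(C+) <= 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(6)
y = rng.standard_normal(4)
C = np.outer(x, y)

xp, xm = np.maximum(x, 0), np.maximum(-x, 0)
yp, ym = np.maximum(y, 0), np.maximum(-y, 0)

C_plus = np.outer(xp, yp) + np.outer(xm, ym)
C_minus = np.outer(xm, yp) + np.outer(xp, ym)

print(np.allclose(C_plus, np.maximum(C, 0)))   # True: matches the elementwise split
print(np.allclose(C, C_plus - C_minus))        # True
print(np.linalg.matrix_rank(C_plus) <= 2)      # True, as the lemma claims
```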
NNDSVD Initialization
C^+ is nonnegative, so Perron-Frobenius ensures that its first left and right singular vectors are also nonnegative. Because of the special structure of C^+, it turns out that its second left and right singular vectors are also nonnegative.
Proof:

    C^+ = x^+ y^{+T} + x^− y^{−T}

Let x̂^± = x^± / ||x^±|| and ŷ^± = y^± / ||y^±|| be the normalized versions of x^+, x^−, y^+, y^−, and let µ^± = ||x^±|| ||y^±||. Then

    C^+ = µ^+ x̂^+ ŷ^{+T} + µ^− x̂^− ŷ^{−T}

is the (nonnegative) SVD of C^+ (x^+ and x^− have disjoint supports, as do y^+ and y^−, so the normalized vectors are orthonormal).
The term involving the larger of µ^+, µ^− is referred to as the "dominant singular triplet."
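A numeric check that this really is an SVD of C^+ (a sketch; the hatted/normalized vectors are built explicitly):

```python
import numpy as np

# Check that C+ = mu+ xhat+ yhat+^T + mu- xhat- yhat-^T reconstructs C+
# and that {mu+, mu-} are exactly its nonzero singular values.
rng = np.random.default_rng(0)
x = rng.standard_normal(6)
y = rng.standard_normal(4)
xp, xm = np.maximum(x, 0), np.maximum(-x, 0)
yp, ym = np.maximum(y, 0), np.maximum(-y, 0)
C_plus = np.outer(xp, yp) + np.outer(xm, ym)

mu_p = np.linalg.norm(xp) * np.linalg.norm(yp)
mu_m = np.linalg.norm(xm) * np.linalg.norm(ym)
xhat_p, xhat_m = xp / np.linalg.norm(xp), xm / np.linalg.norm(xm)
yhat_p, yhat_m = yp / np.linalg.norm(yp), ym / np.linalg.norm(ym)

recon = mu_p * np.outer(xhat_p, yhat_p) + mu_m * np.outer(xhat_m, yhat_m)
print(np.allclose(C_plus, recon))                         # True
s = np.linalg.svd(C_plus, compute_uv=False)
print(np.allclose(sorted([mu_p, mu_m], reverse=True), s[:2]))  # True
```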
NNDSVD Initialization
Let C_j = u_j v_j^T, where u_j and v_j^T are the left and right singular vectors of the matrix to be factored, A.
C_j^+ is the nearest nonnegative approximation to C_j, and C_j^+ has rank 2.
Compute the SVD of C_j^+ and let µ_j, x_j, y_j be its dominant singular triplet.
Initialize the first column of W and first row of H using the dominant singular triplet of A itself; initialize the subsequent columns and rows using the dominant singular triplets of the C_j^+:

    W(:, 1) = √σ_1 u_1        H(1, :) = √σ_1 v_1^T
    W(:, j) = √σ_j µ_j x_j    H(j, :) = √σ_j µ_j y_j^T
NNDSVD Initialization
Pros:
The NMF solution becomes unique.
The number of iterations is reduced, sometimes drastically.
Computation time is generally reduced, even with the added computational effort of the SVD.
We can bound the error of the method.
Cons:
Performance of the algorithm is convincing for some data sets, less so for others.
Shaina Race (NCSU) NMF Initializations March 26, 2012 11 / 17
Interesting Results
I used the centroid initialization on W using the centroids of the actual clusters (i.e., the generally unknown answers) in a document set. I then ran ACLS and used the resulting H to determine clusters (by looking at the maximum element in each column), with 79% success. While this is an improvement over the average 60% success rate obtained by NMF with random initialization, it suggests that the cluster information might be better used another way.
On the same dataset, I simply used the initialization of H (without any updates) to cluster the documents in the same manner, with 74% success. After 500 multiplicative update iterations (using Hoyer's snmf code), this accuracy was reduced to 23%! After convergence from ACLS, accuracy was reduced to 69%!
References
[1] C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41:1350–1362, 2008.
[2] Russell Albright, James Cox, David Duling, Amy N. Langville, and Carl D. Meyer. Algorithms, initializations, and convergence for the nonnegative matrix factorization. Math 81706, North Carolina State University.