
Guaranteed Learning of Latent Variable Models

through Spectral and Tensor Methods

Anima Anandkumar

U.C. Irvine

Application 1: Clustering

Basic operation of grouping data points.

Hypothesis: each data point belongs to an unknown group.

Probabilistic/latent variable viewpoint

The groups represent different distributions (e.g. Gaussian).

Each data point is drawn from one of these distributions (e.g. Gaussian mixtures).

Application 2: Topic Modeling

Document modeling

Observed: words in document corpus.

Hidden: topics.

Goal: carry out document summarization.

Application 3: Understanding Human Communities

Social Networks

Observed: network of social ties, e.g. friendships, co-authorships

Hidden: groups/communities of actors.

Application 4: Recommender Systems

Recommender System

Observed: Ratings of users for various products, e.g. Yelp reviews.

Goal: Predict new recommendations.

Modeling: Find groups/communities of users and products.

Application 5: Feature Learning

Feature Engineering

Learn good features/representations for classification tasks, e.g. image and speech recognition.

Sparse representations, low dimensional hidden structures.

Application 6: Computational Biology

Observed: gene expression levels

Goal: discover gene groups

Hidden variables: regulators controlling gene groups

How to model hidden effects?

Basic Approach: mixtures/clusters

Hidden variable h is categorical.

Advanced: Probabilistic models

Hidden variable h has more general distributions.

Can model mixed memberships.

[Figure: multi-level graphical model with hidden variables h_1, h_2, h_3 and observed variables x_1, …, x_5.]

This talk: basic mixture model. Tomorrow: advanced models.

Challenges in Learning

Basic goal in all mentioned applications

Discover hidden structure in data: unsupervised learning.

Challenge: Conditions for Identifiability

When can the model be identified (given infinite computation and data)?

Does identifiability also lead to tractable algorithms?

Challenge: Efficient Learning of Latent Variable Models

Maximum likelihood estimation is NP-hard.

In practice, EM and variational Bayes have no consistency guarantees.

Can we achieve efficient computational and sample complexities?

In this series: guaranteed and efficient learning through spectral methods

What this talk series will cover

This talk

Start with the most basic latent variable model: Gaussian mixtures.

Basic spectral approach: principal component analysis (PCA).

Beyond correlations: higher order moments.

Tensor notations, PCA extension to tensors.

Guaranteed learning of Gaussian mixtures.

Introduce topic models. Present a unified viewpoint.

Tomorrow: using tensor methods for learning latent variable models, and analysis of the tensor decomposition method.

Wednesday: implementation of tensor methods.

Resources for Today’s Talk

“A Method of Moments for Mixture Models and Hidden Markov Models,” by A. Anandkumar, D. Hsu, and S. M. Kakade. Proc. of COLT, June 2012.

“Tensor Decompositions for Learning Latent Variable Models,” by A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, Oct. 2012.

Resources available at http://newport.eecs.uci.edu/anandkumar/MLSS.html

Outline

1 Introduction

2 Warm-up: PCA and Gaussian Mixtures

3 Higher order moments for Gaussian Mixtures

4 Tensor Factorization Algorithm

5 Learning Topic Models

6 Conclusion

Warm-up: PCA

Optimization problem

For (centered) points x_i ∈ R^d, find a projection P ∈ R^{d×d} with Rank(P) = k that solves

min_P (1/n) ∑_{i∈[n]} ‖x_i − P x_i‖².

Result: If S = Cov(X) has eigendecomposition S = UΛU⊤, then P = U_{(k)} U_{(k)}⊤, where U_{(k)} holds the top-k eigenvectors.

Proof

By the Pythagorean theorem, ∑_i ‖x_i − P x_i‖² = ∑_i ‖x_i‖² − ∑_i ‖P x_i‖².

So it suffices to maximize (1/n) ∑_i ‖P x_i‖² = (1/n) ∑_i Tr[P x_i x_i⊤ P⊤] = Tr[P S P⊤].
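As a quick sanity check, here is a minimal MATLAB sketch of PCA via an eigendecomposition of the sample covariance; the data and dimensions below are made up for illustration.

% PCA sketch (illustrative): X is n-by-d with rows as data points.
n = 500; d = 10; k = 3;
X = randn(n, k) * randn(k, d) + 0.1 * randn(n, d);   % synthetic data
X = X - repmat(mean(X, 1), n, 1);                    % center
S = (X' * X) / n;                                    % sample covariance
[U, L] = eig(S);
[~, idx] = sort(diag(L), 'descend');
Uk = U(:, idx(1:k));                                 % top-k eigenvectors
P = Uk * Uk';                                        % rank-k projection
obj = norm(X - X * P, 'fro')^2 / n;                  % (1/n) sum_i ||x_i - P x_i||^2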

Review of linear algebra

For a matrix S, u is an eigenvector with eigenvalue λ if Su = λu.

For symmetric S ∈ R^{d×d}, there are d real eigenvalues, and S = ∑_{i∈[d]} λ_i u_i u_i⊤, where U = [u_1 | … | u_d] is orthogonal.

Rayleigh Quotient

For a symmetric matrix S with eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_d and corresponding eigenvectors u_1, …, u_d,

max_{‖z‖=1} z⊤Sz = λ_1,   min_{‖z‖=1} z⊤Sz = λ_d,

and the optimizing vectors are u_1 and u_d respectively.

Optimal Projection

Over rank-k orthogonal projections P (P² = P = P⊤), max Tr(P⊤SP) = λ_1 + λ_2 + … + λ_k, attained when P projects onto span{u_1, …, u_k}.

PCA on Gaussian Mixtures

Mixture of k Gaussians: each sample is x = Ah + z.

h ∈ {e_1, …, e_k}, the basis vectors. E[h] = w.

A ∈ R^{d×k}: columns a_i are the component means.

Let µ := Aw be the overall mean.

z ∼ N(0, σ²I) is white Gaussian noise.

E[(x − µ)(x − µ)⊤] = ∑_{i∈[k]} w_i (a_i − µ)(a_i − µ)⊤ + σ²I.

How the above equation is obtained:

E[(x − µ)(x − µ)⊤] = E[(Ah − µ)(Ah − µ)⊤] + E[zz⊤] = ∑_{i∈[k]} w_i (a_i − µ)(a_i − µ)⊤ + σ²I.

PCA on Gaussian Mixtures

E[(x − µ)(x − µ)⊤] = ∑_{i∈[k]} w_i (a_i − µ)(a_i − µ)⊤ + σ²I.

The vectors {a_i − µ} are linearly dependent: ∑_i w_i (a_i − µ) = 0. So the PSD matrix ∑_{i∈[k]} w_i (a_i − µ)(a_i − µ)⊤ has rank ≤ k − 1.

(k − 1)-PCA on the covariance matrix, together with µ, yields span(A).

The lowest eigenvalue of the covariance matrix yields σ².
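A minimal numerical check of this covariance structure (a sketch with made-up parameters, not part of the original slides):

% Spherical Gaussian mixture: the smallest eigenvalues of the covariance
% recover sigma^2, and the top ones carry the component means.
d = 20; k = 4; n = 100000; sigma = 0.5;
A = randn(d, k);                         % columns a_i = component means
labels = randi(k, n, 1);                 % uniform mixing weights w_i = 1/k
X = A(:, labels) + sigma * randn(d, n);  % samples x = a_h + z (columns)
S = cov(X');                             % sample covariance
lam = sort(eig(S), 'descend');
sigma2_hat = mean(lam(k:end));           % bottom d-k+1 eigenvalues ~= sigma^2
% the top (k-1) eigenvectors of S, together with the sample mean,
% span the column space of A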

How to Learn A?

Learning through Spectral Clustering

Learning A through Spectral Clustering

Project samples x onto span(A).

Distance-based clustering (e.g. k-means).

A series of works, e.g. Vempala & Wang.

Fails to cluster when the component variance is large relative to the separation between the means.
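For completeness, a sketch of this projection-plus-clustering recipe on well-separated synthetic data (the success regime); kmeans here is from MATLAB's Statistics and Machine Learning Toolbox, and all numbers are illustrative.

% Spectral clustering sketch: project onto the top (k-1) principal
% directions, then run distance-based clustering.
d = 50; k = 3; n = 3000; sigma = 0.2;
A = 5 * randn(d, k);                      % well-separated component means
labels = randi(k, n, 1);
X = A(:, labels) + sigma * randn(d, n);
Xc = X - repmat(mean(X, 2), 1, n);        % center
[U, ~] = eigs(cov(Xc'), k - 1);           % top (k-1) eigenvectors
Y = (U' * Xc)';                           % projected samples (n-by-(k-1))
est = kmeans(Y, k);                       % recovers the grouping up to relabeling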

Learning Gaussian Mixtures Without Separation Constraints?

Beyond PCA: Spectral Methods on Tensors

How to learn the component means A (not just its span) without separation constraints?

PCA is a spectral method on (covariance) matrices.

◮ Are higher order moments helpful?

What if the number of components is greater than the observed dimensionality, k > d?

◮ Do higher order moments help to learn such overcomplete models?

What if the data is not Gaussian?

◮ Moment-based estimation of probabilistic latent variable models?

Outline

1 Introduction

2 Warm-up: PCA and Gaussian Mixtures

3 Higher order moments for Gaussian Mixtures

4 Tensor Factorization Algorithm

5 Learning Topic Models

6 Conclusion

Tensor Notation for Higher Order Moments

Multi-variate higher order moments form tensors.

Are there spectral operations on tensors akin to PCA?

Matrix

E[x ⊗ x] ∈ R^{d×d} is a second order tensor.

E[x ⊗ x]_{i1,i2} = E[x_{i1} x_{i2}].

For matrices: E[x ⊗ x] = E[xx⊤].

Tensor

E[x ⊗ x ⊗ x] ∈ R^{d×d×d} is a third order tensor.

E[x ⊗ x ⊗ x]_{i1,i2,i3} = E[x_{i1} x_{i2} x_{i3}].
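In MATLAB these moments can be stored as plain multidimensional arrays; the following sketch (with made-up data) forms the empirical versions used in the rest of the talk.

% Empirical moments as arrays: M2_hat is d-by-d, M3_hat is d-by-d-by-d.
d = 5; n = 10000;
X = randn(d, n);                          % columns are samples x_t
M2_hat = (X * X') / n;                    % estimate of E[x ⊗ x]
M3_hat = zeros(d, d, d);                  % estimate of E[x ⊗ x ⊗ x]
for t = 1:n
    xt = X(:, t);
    M3_hat = M3_hat + reshape(kron(xt, xt * xt'), d, d, d);
end
M3_hat = M3_hat / n;                      % entry (i1,i2,i3) ~ E[x_{i1} x_{i2} x_{i3}]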

Third order moment for Gaussian mixtures

Consider a mixture of k Gaussians: each sample is x = Ah + z.

h ∈ {e_1, …, e_k}, the basis vectors. E[h] = w.

A ∈ R^{d×k}: columns are the component means. µ := Aw is the mean.

z ∼ N(0, σ²I) is white Gaussian noise.

E[x ⊗ x ⊗ x] = ∑_i w_i a_i ⊗ a_i ⊗ a_i + σ² ∑_i (µ ⊗ e_i ⊗ e_i + e_i ⊗ µ ⊗ e_i + e_i ⊗ e_i ⊗ µ).

Intuition behind the equation:

E[x ⊗ x ⊗ x] = E[(Ah) ⊗ (Ah) ⊗ (Ah)] + E[(Ah) ⊗ z ⊗ z] + E[z ⊗ (Ah) ⊗ z] + E[z ⊗ z ⊗ (Ah)]
             = ∑_i w_i a_i ⊗ a_i ⊗ a_i + σ² ∑_i (µ ⊗ e_i ⊗ e_i + e_i ⊗ µ ⊗ e_i + e_i ⊗ e_i ⊗ µ).

(Terms with an odd number of copies of z vanish, since z has zero mean and zero odd moments.)

How to recover the parameters A and w from the third order moment?

Simplifications

E[x ⊗ x ⊗ x] = ∑_i w_i a_i ⊗ a_i ⊗ a_i + σ² ∑_i (µ ⊗ e_i ⊗ e_i + e_i ⊗ µ ⊗ e_i + e_i ⊗ e_i ⊗ µ).

σ² is obtained from σ_min(Cov(x)).

E[x ⊗ x] = ∑_i w_i a_i ⊗ a_i + σ²I.

Subtracting these noise terms, we can obtain

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i,   M2 = ∑_i w_i a_i ⊗ a_i.

How to obtain the parameters A and w from M2 and M3?
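A sketch of these two corrections in MATLAB (illustrative; it assumes estimates E2 ≈ E[x ⊗ x], E3 ≈ E[x ⊗ x ⊗ x] and mu ≈ E[x] are already available, e.g. E2 = M2_hat and E3 = M3_hat from the earlier snippet, with mu the sample mean):

% From raw moments of x = A*h + z to the noise-free M2 and M3.
d = size(E2, 1);
outer3 = @(a, b, c) reshape(a * kron(c, b).', d, d, d);   % rank-1 tensor a⊗b⊗c
sigma2 = min(eig(E2 - mu * mu'));        % sigma^2 = smallest eigenvalue of Cov(x)
M2 = E2 - sigma2 * eye(d);               % sum_i w_i a_i ⊗ a_i
M3 = E3;
for i = 1:d
    ei = zeros(d, 1); ei(i) = 1;
    M3 = M3 - sigma2 * (outer3(mu, ei, ei) + outer3(ei, mu, ei) + outer3(ei, ei, mu));
end                                      % now M3 = sum_i w_i a_i ⊗ a_i ⊗ a_i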

Tensor Slices

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i,   M2 = ∑_i w_i a_i ⊗ a_i.

Multilinear transformation of a tensor

M3(B, C, D) := ∑_i w_i (B⊤a_i) ⊗ (C⊤a_i) ⊗ (D⊤a_i).

Slice of a tensor

M3(I, I, r) = ∑_i w_i a_i ⊗ a_i ⟨a_i, r⟩ = A · Diag(w) Diag(A⊤r) · A⊤.

In summary:

M3(I, I, r) = A · Diag(w) Diag(A⊤r) · A⊤,   M2 = A · Diag(w) · A⊤.
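With M3 stored as a d-by-d-by-d array (as in the earlier snippets) and r a d-by-1 vector, a slice is just a weighted sum of the frontal faces; a one-line sketch:

% Slice M3(I, I, r): contract the third mode with r.
slice = reshape(reshape(M3, d*d, d) * r, d, d);   % = sum_l M3(:,:,l) * r(l)
% for the mixture moments above, slice == A * diag(w) * diag(A' * r) * A'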

Eigen-decomposition

M3(I, I, r) = A · Diag(w) Diag(A⊤r) · A⊤,   M2 = A · Diag(w) · A⊤.

Assumption: A ∈ R^{d×k} has full column rank.

Let M2 = UΛU⊤ be its eigen-decomposition, with U ∈ R^{d×k}.

U⊤M2U = Λ ∈ R^{k×k} is invertible.

X = (U⊤M3(I, I, r)U)(U⊤M2U)^{−1} = V · Diag(λ̃) · V^{−1}.

Substitution: X = (U⊤A) Diag(A⊤r) (U⊤A)^{−1}.

Hence v_i ∝ U⊤a_i.

Technical Detail

Take r = Uθ with θ drawn uniformly from the unit sphere, to ensure an eigen-gap.

Learning Gaussian Mixtures through Eigen-decomposition of Tensor Slices

M3(I, I, r) = A · Diag(w) Diag(A⊤r) · A⊤,   M2 = A · Diag(w) · A⊤.

(U⊤M3(I, I, r)U)(U⊤M2U)^{−1} = V Diag(λ̃) V^{−1}.

Recovery of A (method 1)

Since v_i ∝ U⊤a_i, recover A up to scale: a_i ∝ U v_i.

Recovery of A (method 2)

Let Θ ∈ R^{k×k} be a random rotation matrix.

Consider the k slices M3(I, I, θ_i) and find the eigen-decomposition above for each.

Let Λ̃ = [λ̃_1 | … | λ̃_k] be the matrix of eigenvalues of all the slices.

Recover A as A = UΘ^{−1}Λ̃.
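A sketch of recovery method 1 in MATLAB (illustrative; it assumes M2, M3 and the dimensions d, k from the earlier snippets; with exact moments the eigenvalues of X are real and, with high probability, distinct):

% Recover the component means (up to scale and ordering) from one slice.
[U, ~] = eigs(M2, k);                     % top-k eigenvectors of M2
theta = randn(k, 1); theta = theta / norm(theta);
r = U * theta;                            % random direction r = U*theta
slice = reshape(reshape(M3, d*d, d) * r, d, d);        % M3(I, I, r)
X = (U' * slice * U) / (U' * M2 * U);     % (U' M3(I,I,r) U)(U' M2 U)^{-1}
[V, ~] = eig(X);                          % columns v_i proportional to U' a_i
A_hat = U * V;                            % columns proportional to the a_i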

Putting it together

Implications

Learn Gaussian mixtures through slices of the third-order moment.

Guaranteed learning through eigen-decomposition.

“A Method of Moments for Mixture Models and Hidden Markov Models,” by A. Anandkumar, D. Hsu, and S. M. Kakade. Proc. of COLT, June 2012.

Shortcomings

The resulting product is not symmetric, so the eigen-decomposition V Diag(λ̃) V^{−1} does not yield an orthonormal V. More involved in practice.

Recovery requires a good eigen-gap in Diag(λ̃). For r = Uθ, with θ drawn uniformly from the unit sphere, the gap is only of order 1/k^{2.5}, which causes numerical instability in practice.

M3(I, I, r) is only a (random) slice of the tensor; the full information is not utilized.

More efficient learning methods using higher order moments?

Outline

1 Introduction

2 Warm-up: PCA and Gaussian Mixtures

3 Higher order moments for Gaussian Mixtures

4 Tensor Factorization Algorithm

5 Learning Topic Models

6 Conclusion

Tensor Factorization

Recover A and w from M2 and M3:

M2 = ∑_i w_i a_i ⊗ a_i,   M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i.

[Figure: M3 depicted as a sum of rank-1 terms, w_1 · a_1 ⊗ a_1 ⊗ a_1 + w_2 · a_2 ⊗ a_2 ⊗ a_2 + … ]

a ⊗ a ⊗ a is a rank-1 tensor, since its (i_1, i_2, i_3)-th entry is a_{i1} a_{i2} a_{i3}.

M3 is a sum of rank-1 terms.

When is this the most compact representation? (Identifiability.)

Can we recover the decomposition? (Algorithm.)

The most compact such representation is known as the CP decomposition (CANDECOMP/PARAFAC).

Initial Ideas

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i.

What if A has orthogonal (unit-norm) columns?

M3(I, a_1, a_1) = ∑_i w_i ⟨a_i, a_1⟩² a_i = w_1 a_1.

The a_i are eigenvectors of the tensor M3.

Analogous to matrix eigenvectors: Mv = M(I, v) = λv.
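A tiny numerical check of this eigenvector property when the columns of A are orthonormal (everything below is made up for illustration):

% Build M3 with orthonormal a_i and verify M3(I, a_1, a_1) = w_1 * a_1.
d = 6; k = 3;
[A, ~] = qr(randn(d, k), 0);              % orthonormal columns a_1, ..., a_k
w = rand(k, 1);
outer3 = @(a, b, c) reshape(a * kron(c, b).', d, d, d);
M3 = zeros(d, d, d);
for i = 1:k
    M3 = M3 + w(i) * outer3(A(:, i), A(:, i), A(:, i));
end
a1 = A(:, 1);
lhs = reshape(reshape(M3, d*d, d) * a1, d, d) * a1;    % M3(I, a_1, a_1)
norm(lhs - w(1) * a1)                     % essentially zero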

Two Problems

How to find eigenvectors of a tensor?

A is not orthogonal in general.

Whitening

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i,   M2 = ∑_i w_i a_i ⊗ a_i.

Find a whitening matrix W such that V = W⊤A has orthogonal columns.

When A ∈ R^{d×k} has full column rank, this is an invertible transformation.

[Figure: whitening by W maps the non-orthogonal component means a_1, a_2, a_3 to orthogonal vectors v_1, v_2, v_3.]

Using Whitening to Obtain an Orthogonal Tensor

[Figure: the d × d × d tensor M3 is mapped to a k × k × k tensor T.]

Multilinear transform, with M3 ∈ R^{d×d×d} and T ∈ R^{k×k×k}:

T = M3(W, W, W) = ∑_i w_i (W⊤a_i)^{⊗3}.

T = ∑_{i∈[k]} w_i · v_i ⊗ v_i ⊗ v_i is an orthogonal tensor.

Dimensionality reduction when k ≪ d.

How to Find the Whitening Matrix?

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i,   M2 = ∑_i w_i a_i ⊗ a_i.

Use the pairwise moments M2 to find W such that W⊤M2W = I.

From the eigen-decomposition M2 = U Diag(λ̃) U⊤, take W = U Diag(λ̃)^{−1/2}.

V := W⊤A Diag(w)^{1/2} is an orthogonal matrix.

T = M3(W, W, W) = ∑_i w_i^{−1/2} (W⊤a_i √w_i)^{⊗3} = ∑_i λ_i v_i ⊗ v_i ⊗ v_i,   λ_i := w_i^{−1/2}.

T is an orthogonal tensor.
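A sketch of this whitening step and the multilinear transform in MATLAB (illustrative; assumes M2, M3 and the dimensions d, k as before, with M2 of rank k; because M3 is symmetric and the same W is applied on every mode, one reshape with a Kronecker product suffices):

% Whitening matrix from M2, then T = M3(W, W, W) as a k-by-k-by-k tensor.
[U, L] = eigs(M2, k);                     % M2 ~= U * L * U'
W = U * diag(1 ./ sqrt(diag(L)));         % W' * M2 * W = I
T = reshape(W' * reshape(M3, d, d*d) * kron(W, W), k, k, k);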

Orthogonal Tensor Eigen Decomposition

T = ∑_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i,   ⟨v_i, v_j⟩ = δ_{i,j} for all i, j.

T(I, v_1, v_1) = ∑_i λ_i ⟨v_i, v_1⟩² v_i = λ_1 v_1.

The v_i are eigenvectors of the tensor T.

Tensor Power Method

Start from an initial vector v and iterate  v ↦ T(I, v, v) / ‖T(I, v, v)‖.

Canonical recovery method

Randomly initialize the power method; run to convergence to obtain an eigenvector v with eigenvalue λ.

Deflate: T ← T − λ v ⊗ v ⊗ v, and repeat.

Putting it together

Gaussian mixture: x = Ah + z, where E[h] = w and z ∼ N(0, σ²I).

M2 = ∑_i w_i a_i ⊗ a_i,   M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i.

Obtain the whitening matrix W from the SVD of M2.

Use W for the multilinear transform: T = M3(W, W, W).

Find the eigenvectors of T through the power method and deflation.

What about learning other latent variable models?

Outline

1 Introduction

2 Warm-up: PCA and Gaussian Mixtures

3 Higher order moments for Gaussian Mixtures

4 Tensor Factorization Algorithm

5 Learning Topic Models

6 Conclusion

Topic Modeling

Probabilistic Topic Models

Useful abstraction for automatic categorization of documents.

Observed: words. Hidden: topics.

Bag of words: the order of words does not matter.

Graphical model representation

l words in a document: x_1, …, x_l.

h: proportions of topics in the document.

Word x_i is generated from topic y_i.

Exchangeability: x_1 ⊥⊥ x_2 ⊥⊥ … | h.

A(i, j) := P[x_m = i | y_m = j]: the topic-word matrix.

[Figure: graphical model with topic mixture h, topics y_1, …, y_5, and words x_1, …, x_5, each word generated via the topic-word matrix A.]

Distribution of the topic proportions vector h

If there are k topics, h has a distribution over the simplex ∆^{k−1},

∆^{k−1} := {h ∈ R^k : h_i ∈ [0, 1], ∑_i h_i = 1}.

Simplification for today: there is only one topic in each document.
◮ h ∈ {e_1, …, e_k}: basis vectors.

Tomorrow: Latent Dirichlet allocation.

Formulation as Linear Models

Distribution of the words x_1, x_2, …

Order the d words in the vocabulary. If x_1 is the j-th word, assign it e_j ∈ R^d.

The distribution of each x_i is supported on the vertices of ∆^{d−1}.

Properties

P[x_1 | topic h = i] = E[x_1 | topic h = i] = a_i, the i-th column of A.

Linear model: E[x_i | h] = Ah.

Multiview model: h is fixed and multiple words x_i are generated.

Geometric Picture for Topic Models

[Figure: the topic proportions vector h lies on the simplex, with each document a point; in the single-topic case h sits at a vertex, and the words x_1, x_2, x_3 are generated through A from that vertex.]

Gaussian Mixtures vs. Topic Models

(Spherical) mixture of Gaussians:

k means a_1, …, a_k.

Component h = i with probability w_i.

Observe x with spherical noise: x = a_i + z, z ∼ N(0, σ_i²I).

(Single-)topic model:

k topics a_1, …, a_k.

Topic h = i with probability w_i.

Observe l exchangeable words x_1, x_2, …, x_l drawn i.i.d. from a_i.

Unified linear model: E[x | h] = Ah.

Gaussian mixture: single view, spherical noise.

Topic model: multi-view, heteroskedastic noise.

Three words per document suffice for learning:

M3 = ∑_i w_i a_i ⊗ a_i ⊗ a_i,   M2 = ∑_i w_i a_i ⊗ a_i.
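Filling in the step behind this claim (using only the model assumptions above): by conditional independence of the words given h, and E[x_j | h] = Ah with h = e_i with probability w_i,

E[x_1 ⊗ x_2] = E[ E[x_1 | h] ⊗ E[x_2 | h] ] = E[(Ah) ⊗ (Ah)] = ∑_i w_i a_i ⊗ a_i = M2,

E[x_1 ⊗ x_2 ⊗ x_3] = E[(Ah) ⊗ (Ah) ⊗ (Ah)] = ∑_i w_i a_i ⊗ a_i ⊗ a_i = M3.

So, unlike the Gaussian case, no noise correction is needed: the cross-moments of distinct words are already the clean M2 and M3.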

Outline

1 Introduction

2 Warm-up: PCA and Gaussian Mixtures

3 Higher order moments for Gaussian Mixtures

4 Tensor Factorization Algorithm

5 Learning Topic Models

6 Conclusion

Recap: Basic Tensor Decomposition Method

Toy Example in MATLAB

Simulated Samples: Exchangeable Model

Whiten The Samples
◮ Second Order Moments
◮ Matrix Decomposition

Orthogonal Tensor Eigen Decomposition
◮ Third Order Moments
◮ Power Iteration

Simulated Samples: Exchangeable Model

Model Parameters

Hidden state: h ∈ basis {e_1, …, e_k}, k = 2.

Observed states: x_i ∈ basis {e_1, …, e_d}, d = 3.

Conditional independence: x_1 ⊥⊥ x_2 ⊥⊥ x_3 | h. Transition matrix: A.

Exchangeability: E[x_i | h] = Ah, for all i ∈ {1, 2, 3}.

Generate Samples Snippet

% setup assumed earlier in the script (values here are illustrative):
% n = 10000; k = 2; d = 3; A_true = [0.6 0.2; 0.3 0.3; 0.1 0.5];
% h = zeros(n, k); x1 = zeros(n, d); x2 = zeros(n, d); x3 = zeros(n, d);
for t = 1:n
    % generate h for this sample
    h_category = (rand() > 0.5) + 1;
    h(t, h_category) = 1;
    transition_cum = cumsum(A_true(:, h_category));
    % generate x1 for this sample | h
    x_category = find(transition_cum > rand(), 1);
    x1(t, x_category) = 1;
    % generate x2 for this sample | h
    x_category = find(transition_cum > rand(), 1);
    x2(t, x_category) = 1;
    % generate x3 for this sample | h
    x_category = find(transition_cum > rand(), 1);
    x3(t, x_category) = 1;
end

Whiten The Samples

Second Order Moments

M2 = (1/n) ∑_t x_1^t ⊗ x_2^t.

Whitening Matrix

W = U_w L_w^{−1/2}, where [U_w, L_w] = k-SVD(M2).

Whiten Data

y_1^t = W⊤x_1^t.

Orthogonal Basis

V = W⊤A → V⊤V = I.

Whitening Snippet

fprintf('The second order moment M2:');
M2 = x1' * x2 / n
[Uw, Lw, Vw] = svd(M2);
fprintf('M2 singular values:'); Lw
W = Uw(:, 1:k) * sqrt(pinv(Lw(1:k, 1:k)));
y1 = x1 * W; y2 = x2 * W; y3 = x3 * W;

a_1 is not orthogonal to a_2, but v_1 ⊥ v_2.

Orthogonal Tensor Eigen Decomposition

Third Order Moments

T = (1/n) ∑_{t∈[n]} y_1^t ⊗ y_2^t ⊗ y_3^t ≈ ∑_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i,   V⊤V = I.

Gradient Ascent

T(I, v_1, v_1) = (1/n) ∑_{t∈[n]} ⟨v_1, y_2^t⟩ ⟨v_1, y_3^t⟩ y_1^t ≈ ∑_i λ_i ⟨v_i, v_1⟩² v_i = λ_1 v_1.

The v_i are eigenvectors of the tensor T.

Orthogonal Tensor Eigen Decomposition

T ← T − ∑_j λ_j v_j^{⊗3},   v ← T(I, v, v) / ‖T(I, v, v)‖.

Power Iteration Snippet

% assumes whitened data y1, y2, y3 (n-by-k) plus Maxiter and TOL defined above
V = zeros(k, k); Lambda = zeros(k, 1);
for i = 1:k
    v_old = rand(k, 1); v_old = normc(v_old);
    for iter = 1:Maxiter
        v_new = (y1' * ((y2 * v_old) .* (y3 * v_old))) / n;
        if i > 1
            % deflation: subtract previously recovered components
            for j = 1:i-1
                v_new = v_new - (V(:, j) * (v_old' * V(:, j))^2) * Lambda(j);
            end
        end
        lambda = norm(v_new); v_new = normc(v_new);
        if norm(v_old - v_new) < TOL
            fprintf('Converged at iteration %d.', iter);
            V(:, i) = v_new; Lambda(i, 1) = lambda;
            break;
        end
        v_old = v_new;
    end
end


[Figure: power-method estimates (red) at iterations 0-5 converging to the ground truth (green).]

Conclusion


Gaussian mixtures and topic models. Unified linear representation.

Learning through higher order moments.

Tensor decomposition via whitening and power method.

Tomorrow’s lecture

Latent variable models and moments. Analysis of tensor power method.

Wednesday’s lecture: implementation of tensor methods.

Code snippet available at

http://newport.eecs.uci.edu/anandkumar/MLSS.html
