Applications of Random Matrices in Spectral Computations and Machine Learning
Dimitris Achlioptas, UC Santa Cruz



Pages 2-3: This talk

Viewpoint: use randomness to "transform" the data

Random Projections
Fast Spectral Computations
Sampling in Kernel PCA

Pages 4-7: The Setting

(Figure: the input is an n × d data matrix A, i.e. n data points in d dimensions, which is multiplied by a projection matrix P.)

Output: AP

Pages 8-9: The Johnson-Lindenstrauss lemma

Lemma (informally): any n points in Euclidean space can be mapped into k = O(log n / ε^2) dimensions while preserving all pairwise distances up to a factor of 1 ± ε.

Algorithm: projecting onto a random hyperplane (subspace) of dimension k succeeds with high probability.

Page 10: Applications

Approximation algorithms [Charikar '02]
Hardness of approximation [Trevisan '97]
Learning mixtures of Gaussians [Arora, Kannan '01]
Approximate nearest-neighbors [Kleinberg '97]
Data-stream computations [Alon et al. '99, Indyk '00]
Min-cost clustering [Schulman '00]
Information Retrieval (LSI) [Papadimitriou et al. '97]

Pages 11-18: How to pick a random hyperplane

Take P = (r_ij), where the r_ij are independent Gaussian N(0, 1) random variables (suitably scaled).
[Johnson, Lindenstrauss '82] [Dasgupta, Gupta '99] [Indyk, Motwani '99]

Intuition: each column of P points in a uniformly random direction in R^d. Each column is an unbiased, independent estimator of a vector's squared length (via its squared inner product with that vector), and the projection's squared length is the average estimate (since we take the sum).

With orthonormalization: the estimators are "equal" and "uncorrelated".
Without orthonormalization: same thing!
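
For concreteness, here is a minimal numpy sketch of this construction (illustrative only; the sizes are arbitrary): a matrix with i.i.d. N(0, 1) entries, scaled by 1/sqrt(k), approximately preserves pairwise distances without any orthonormalization.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 100, 5_000, 500             # n points in d dimensions, projected to k
X = rng.standard_normal((n, d))       # rows are the data points

# Random "hyperplane": i.i.d. Gaussian entries, no orthonormalization.
P = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ P                             # projected data, n x k

# Compare a few pairwise distances before and after projection.
for (i, j) in [(0, 1), (2, 3), (4, 5)]:
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): distortion {proj / orig:.3f}")   # close to 1.0
```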

Pages 19-20: Orthonormality: Take #1

Random vectors in high-dimensional Euclidean space are very nearly orthonormal.

Do they have to be uniformly random? Is the Gaussian distribution magical?

Pages 21-23: JL with binary coins

Take P = (r_ij), where the r_ij are independent random variables with r_ij = +1 or -1, each with probability 1/2.

Benefits:
- Much faster in practice: only additions/subtractions (no multiplications)
- Fewer random bits; derandomization
- Slightly smaller(!) k

Preprocessing with a randomized FFT [Ailon, Chazelle '06]
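
A similar sketch with binary coins (again illustrative, not code from the talk): the entries of P are ±1, so forming XP needs only additions and subtractions of the input coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, k = 100, 5_000, 500
X = rng.standard_normal((n, d))

# Entries are +1 or -1, each with probability 1/2 (the "binary coins").
P = rng.choice([-1.0, 1.0], size=(d, k)) / np.sqrt(k)
Y = X @ P

i, j = 0, 1
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))  # nearly equal
```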

Page 24: Let's at least look at the data

Page 25: The Setting

(Figure: the n × d data matrix A, multiplied by a projection matrix P.)

Output: AP

Pages 26-30: Low Rank Approximations

A_k: the best rank-k approximation to A (keep the top k terms of the SVD).

Spectral Norm: ||M||_2 = max over x != 0 of ||Mx|| / ||x||

Frobenius Norm: ||M||_F = ( sum over i,j of M_ij^2 )^(1/2)

Among all rank-k matrices B, A_k minimizes ||A - B|| in both norms.
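
As a quick illustration (not from the slides), A_k can be computed by truncating the SVD; the spectral error is then exactly the (k+1)-st singular value.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 300))  # rank <= 50

k = 10
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

print("spectral error :", np.linalg.norm(A - A_k, 2))       # equals s[k]
print("frobenius error:", np.linalg.norm(A - A_k, 'fro'))    # sqrt(sum(s[k:]**2))
```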

Pages 31-34: How to compute A_k

Start with a random unit vector x.

Repeat until fixpoint:
- Have each row of A vote for x: y = Ax (the i-th vote is the inner product of row i with x).
- Synthesize a new candidate by combining the rows of A according to their enthusiasm for x: x <- A^T y, normalized.

(This is power iteration on A^T A. Also known as PCA.)

Project A onto the subspace orthogonal to x and repeat.
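
A minimal numpy sketch of this loop (illustrative; convergence is handled crudely with a fixed iteration count rather than a true fixpoint test):

```python
import numpy as np

def top_k_directions(A, k, iters=200, seed=0):
    """Power iteration on A^T A with deflation: returns k orthonormal directions."""
    rng = np.random.default_rng(seed)
    A = A.astype(float).copy()
    dirs = []
    for _ in range(k):
        x = rng.standard_normal(A.shape[1])
        x /= np.linalg.norm(x)
        for _ in range(iters):
            y = A @ x                 # each row "votes" for x
            x = A.T @ y               # combine rows according to their enthusiasm
            x /= np.linalg.norm(x)
        dirs.append(x)
        A = A - np.outer(A @ x, x)    # project A onto the subspace orthogonal to x
    return np.array(dirs)

A = np.random.default_rng(3).standard_normal((100, 30))
V = top_k_directions(A, k=3)
print(np.round(V @ V.T, 3))           # approximately the 3 x 3 identity
```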

Pages 35-38: PCA for Denoising

Assume that we perturb the entries of a matrix A by adding independent Gaussian noise with standard deviation σ.

Claim: If σ is not "too big", then the optimal projections for the perturbed matrix are "close" to those for A.

Intuition:
• The perturbation vectors are nearly orthogonal
• No small subspace accommodates many of them
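
A small numerical illustration of the claim (my own sketch; the sizes and noise level are arbitrary): perturb a low-rank matrix with i.i.d. Gaussian noise and compare the top singular direction before and after.

```python
import numpy as np

rng = np.random.default_rng(4)

# A low-rank "signal" matrix plus independent Gaussian noise.
n, d, r = 500, 200, 5
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) * 3.0
sigma = 1.0
A_noisy = A + sigma * rng.standard_normal((n, d))

_, _, Vt = np.linalg.svd(A, full_matrices=False)
_, _, Vt_noisy = np.linalg.svd(A_noisy, full_matrices=False)

# Alignment of the top right singular vectors (1.0 means identical direction).
print(abs(Vt[0] @ Vt_noisy[0]))
```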

Pages 39-40: Rigorously

Lemma: For any matrices A and N, writing Â = A + N,
||A - Â_k||_2 <= ||A - A_k||_2 + 2 ||N||_2.

Perspective: for any fixed x, w.h.p. the noise N is nearly orthogonal to x.

Pages 41-43: Two new ideas

A rigorous criterion for choosing k: stop when A - A_k has "as much structure as" a random matrix.

Computation-friendly noise: inject data-dependent noise.

Pages 44-47: Quantization
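
The quantization rule itself is not preserved in this transcript. The sketch below uses the unbiased rounding scheme from Achlioptas and McSherry's work on fast low-rank approximation, which I am assuming is what these slides illustrate: round every entry to +b or -b, with b = max |A_ij| and probabilities chosen so the quantized matrix equals A in expectation.

```python
import numpy as np

def quantize(A, rng):
    """Round each entry to +b or -b (b = max |A_ij|), unbiasedly (assumed scheme)."""
    b = np.max(np.abs(A))
    p_plus = 0.5 + A / (2.0 * b)                 # P[entry becomes +b]
    return b * np.where(rng.random(A.shape) < p_plus, 1.0, -1.0)

rng = np.random.default_rng(5)
n = 600
u, v = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
A = np.outer(u, v)                               # strongly structured (rank 1)
Q = quantize(A, rng)                             # every entry is now +b or -b

print("||A - Q||_2 / ||A||_2 :", np.linalg.norm(A - Q, 2) / np.linalg.norm(A, 2))
_, _, Vt_A = np.linalg.svd(A)
_, _, Vt_Q = np.linalg.svd(Q)
print("top-direction alignment:", abs(Vt_A[0] @ Vt_Q[0]))   # close to 1
```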

Pages 48-51: Sparsification
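
As with quantization, the exact rule is not preserved here. The sketch below uses the standard unbiased sparsification scheme (an assumption on my part): keep each entry independently with probability p, rescaled by 1/p, and zero it out otherwise.

```python
import numpy as np

def sparsify(A, p, rng):
    """Keep each entry independently with probability p, rescaled by 1/p."""
    mask = rng.random(A.shape) < p
    return np.where(mask, A / p, 0.0)

rng = np.random.default_rng(6)
n = 600
u, v = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
A = np.outer(u, v)                               # strongly structured (rank 1)
S = sparsify(A, p=0.1, rng=rng)                  # ~90% of the entries become zero

print("fraction of nonzeros   :", np.mean(S != 0))
print("||A - S||_2 / ||A||_2  :", np.linalg.norm(A - S, 2) / np.linalg.norm(A, 2))
_, _, Vt_A = np.linalg.svd(A)
_, _, Vt_S = np.linalg.svd(S)
print("top-direction alignment:", abs(Vt_A[0] @ Vt_S[0]))
```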

Pages 52-54: Accelerating spectral computations

By injecting sparsification/quantization "noise" we can accelerate spectral computations:
• Fewer/simpler arithmetic operations
• Reduced memory footprint

The amount of "noise" that can be tolerated increases with the redundancy in the data.

The L2 error can be quadratically better than "Nystrom".

Page 55: Orthonormality: Take #2

Matrices with independent, 0-mean entries are "white noise" matrices.

Pages 56-60: A scalar analogue

Crude quantization at an extremely high rate + a low-pass filter = a 1-bit CD player ("Bitstream").

Page 61: Accelerating spectral computations

Injecting sparsification/quantization "noise" is useful even for exact computations.

Page 62: Accelerating exact computations

Page 63: Kernels

Page 64: Kernels & Support Vector Machines

Red and blue point clouds: which linear separator (hyperplane)? The one with maximum margin.

The optimal separator can be expressed by inner products with (a few) data points.
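
To make the last point concrete, here is a small scikit-learn sketch (my own example, not code from the talk): the fitted maximum-margin separator evaluates new points purely through inner products with a few support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# Two roughly separable point clouds ("red" and "blue").
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print("number of support vectors:", len(clf.support_))   # typically just a few

# The separator is expressed through inner products with the support vectors:
# decision(x) = sum_i coef_i * <x, sv_i> + b
x = X[0]
manual = (clf.dual_coef_ @ (clf.support_vectors_ @ x) + clf.intercept_)[0]
print(manual, clf.decision_function([x])[0])              # the two values agree
```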

Page 65: Not always linearly separable

Page 66: Population density

Page 67: Kernel PCA

Pages 68-72: Kernel PCA

We can also compute the SVD of A via the spectrum of AA^T.

(Figure: A is n × d and A^T is d × n, so AA^T is n × n.)

Each entry of AA^T is the inner product of two inputs.

Replace the inner product with a kernel function.
Work implicitly in a high-dimensional space.
Good linear separators in that space.
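
A bare-bones sketch of the idea (illustrative only; it skips the kernel-matrix centering that full KPCA performs): form the n × n matrix of kernel evaluations and take its leading eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((200, 3))                 # n = 200 inputs in 3 dimensions

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: an inner product in an implicit feature space."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# n x n kernel matrix: k(x_i, x_j) replaces the inner product <x_i, x_j>.
n = len(X)
K = np.array([[rbf(X[i], X[j]) for j in range(n)] for i in range(n)])

# Top kernel principal components = leading eigenvectors of K.
eigvals, eigvecs = np.linalg.eigh(K)
top2 = eigvecs[:, -2:] * np.sqrt(eigvals[-2:])    # coordinates of the inputs
print(top2.shape)                                  # (200, 2)
```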

Page 73: From linear to non-linear PCA

The ||X - Y||^p kernel illustrates how the contours of the first 2 components change from straight lines for p = 2 to non-linear for p = 1.5, 1, and 0.5.

From Schölkopf and Smola, Learning with Kernels, MIT Press, 2002.

Page 74: Kernel PCA with a Gaussian Kernel

KPCA with Gaussian kernels: the contours follow the cluster densities! The first two kernel PCs separate the data nicely.

Linear PCA has only 2 components, but kernel PCA has more, since the feature-space dimension is usually large (in this case infinite).

Pages 75-78: KPCA in brief

Good News:
- Work directly with non-vectorial inputs
- Very powerful: e.g. LLE, Isomap, Laplacian Eigenmaps [Ham et al. '03]

Bad News:
- n^2 kernel evaluations are too many...

Good News: [Shawe-Taylor et al. '03]
- good generalization => rapid spectral decay

Page 79: So, it's enough to sample...

(Figure: evaluate only a small sample of the entries of the n × n kernel matrix.)

In practice, 1% of the data is more than enough.
In theory, we can go down to n × polylog(n) kernel evaluations.
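
As a rough illustration of the sampling idea (my own sketch using the standard Nyström construction, which may differ from the exact scheme in the talk): evaluate the kernel only between all n points and m randomly chosen landmark points, and reconstruct an approximate Gram matrix from those n·m values.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((500, 3))
gamma = 0.5

def kernel_block(A, B):
    """RBF kernel evaluations between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

n, m = len(X), 50                       # sample m << n landmark columns
idx = rng.choice(n, size=m, replace=False)

C = kernel_block(X, X[idx])             # n x m  (only n*m kernel evaluations)
W = C[idx]                              # m x m  block among the sampled points
K_approx = C @ np.linalg.pinv(W) @ C.T  # Nystrom approximation of the full K

K_full = kernel_block(X, X)             # for comparison only
print("relative error:",
      np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```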

Page 80: Important Features are Preserved

Page 81: Open Problems

How general is this "stability under noise"? For example, does it hold for Support Vector Machines?

When can we prove such stability in a black-box fashion, i.e. as with matrices?

Can we exploit it for data privacy?