
Acknowledgement: NSF grants 1500713, 1514056, and NIH grant 1R01GM104975-01

Scalable Low-Rank Nonlinear Subspace Tracking

F. Sheikholeslami, D. Berberidis, and G. B. Giannakis

Dept. of ECE and DTC, University of Minnesota

Learning from “Big Data”

Big size (large number of data N and/or high dimension d)

Challenges

Incomplete data

Noise and outliers

Fast streaming


Opportunities in key tasks

Dimensionality reduction

Online and robust regression, classification, and clustering

Denoising and imputation


Outline

Scalable kernel-based learning

Sparsity-aware low-rank approximation of lifted data

Theoretical guarantees and test results

Online subspace tracking

Affordable memory and computation

Future directions


Linear or nonlinear functions for learning?

Regression or classification: given data {(x_n, y_n)}, n = 1, ..., N, find f such that y_n ≈ f(x_n)

Memory and computational requirements grow prohibitively with the number of data N

Pre-select kernel (inner product) function κ(x, x') = ⟨φ(x), φ(x')⟩

RKHS basis expansion: f(x) = Σ_n α_n κ(x, x_n)

Lift data via nonlinear map φ(·) so that learning becomes linear in the lifted space

Kernel-based nonparametric ridge regression
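For concreteness, a minimal kernel ridge regression sketch (NumPy); the Gaussian kernel choice, the regularizer lam, and the bandwidth sigma are illustrative assumptions, not the slides' notation:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of X and rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, sigma=1.0):
    # Dual coefficients alpha = (K + lam*I)^{-1} y: O(N^2) memory for the
    # kernel matrix and O(N^3) training, which is what motivates the
    # low-rank approximations discussed in these slides.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, sigma=1.0):
    # RKHS basis expansion: f(x) = sum_n alpha_n * kappa(x, x_n)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha
```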



Prior art

Kernel matrix approximation

Tractable by leveraging low-rank attribute (in dual domain)

o Nyström approximation [Williams-Seeger'00, Si et al.'14, Wang et al.'14]

o Incomplete Cholesky decomposition [Fine-Scheinberg’01]

Random feature approximation

Kernel approximation via feature approximation

o Random sampling and kernel matrix factorization [Rahimi-Recht'08, Zhang et al.'11] (see the sketch after this list)

Functional gradient descent

Tractable gradient descent (batch form in primal domain)

o Pegasos [Shalev-Shwartz et al.'11]; NORMA [Kivinen et al.'04]; [Wang et al.'12, Lu et al.'16]

Reduced-dimensionality features

Tractability through online extraction of low-dimensional features

o Online (kernel) PCA [Honeine'12]; linear subspace tracking [Mardani et al.'15]
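For instance, the random-feature route above can be sketched as a standard random Fourier feature map for the Gaussian kernel; the feature count D, bandwidth sigma, and function name are illustrative:

```python
import numpy as np

def random_fourier_features(X, D=200, sigma=1.0, seed=0):
    # Random Fourier features [Rahimi-Recht'08]: z(x)^T z(x') approximates
    # the Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2)) using D features.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```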


Low-rank approximation of mapped features

Nonlinear mapping lifts the data

Low-rank approximation of the lifted (virtual) features

Alternating minimization (see sketch below):

S1. Update virtual subspace while fixing feature vectors

S2. Update features while fixing subspace
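A minimal alternating-minimization sketch of the two steps; it assumes an explicit finite-dimensional lifting Phi (e.g., the random features above) so the subspace can be stored, which simplifies the kernel-based formulation. Names alternating_lowrank, r, lam are illustrative:

```python
import numpy as np

def alternating_lowrank(Phi, r=10, iters=20, lam=1e-3):
    # Alternating minimization for Phi ~ L @ Q, with L (D x r) playing the
    # role of the "virtual subspace" and Q (r x N) the low-dim features.
    D, N = Phi.shape
    rng = np.random.default_rng(0)
    L = rng.normal(size=(D, r))
    for _ in range(iters):
        # S2: update features with the subspace fixed (ridge least squares)
        Q = np.linalg.solve(L.T @ L + lam * np.eye(r), L.T @ Phi)
        # S1: update the virtual subspace with the features fixed
        L = np.linalg.solve(Q @ Q.T + lam * np.eye(r), Q @ Phi.T).T
    return L, Q
```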


Sparsity-aware low-rank approximation

Row-sparsity of A ⇒ fewer "basis vectors" for the virtual subspace

BCD (block coordinate descent)

S1. Update A with the features fixed, via group shrinkage (see sketch below)

S2. Update the features with A fixed

Solvers

i. BCD-Exact: exact per-block minimization; more expensive, larger descent

ii. BCD-PG: proximal gradient-based update; inexpensive, smaller descent

F. Sheikholeslami and G. B. Giannakis, "Scalable Kernel-based Learning via Low-rank Approximation of Lifted Data," Proc. of 55th Allerton Conf. on Comm., Control, and Computing, Oct. 4-6, 2017.
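A sketch of the group-shrinkage (row-wise soft-thresholding) operator behind the BCD-PG update referenced in S1; the step size mu, sparsity weight lam, and function name are illustrative assumptions:

```python
import numpy as np

def row_shrink(A, tau):
    # Group (row-wise) soft-thresholding: proximal operator of
    # tau * sum_i ||a_i||_2. Rows with norm below tau are zeroed,
    # which prunes unneeded "basis vectors" and yields row-sparsity.
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * A

# Illustrative BCD-PG style update for A: a gradient step on the smooth
# fitting term followed by group shrinkage:
#   A = row_shrink(A - mu * grad_fit(A), mu * lam)
```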


Convergence and generalization bounds

Proposition 1. Sequences generated by BCD-Exact and BCD-PG iterations converge to a stationary point.

Proposition 2. If , and , then for it holds wp >

Generalization: performance guarantees on unseen data

Equivalent optimization

Define

F. Sheikholeslami and G. B. Giannakis, "Scalable Kernel-based Learning via Low-rank Approximation of Lifted Data," Proc. Allerton Conf., 2017.


Linearized kernel regression and classification

Linear regression on the extracted features! (see sketch below)

Proposition 3. With iid data, the kernel matrix can be approximated by the extracted low-rank features, with approximation error bounded with high probability.

Kernel matrix approximation

Bounds also on support vector machines for regression and classification

F. Sheikholeslami, D. K. Berberidis, and G. B. Giannakis, "Kernel-based Low-rank Feature Extraction on a Budget for Big Data Streams," arXiv:1601.07947.
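To make the linear-regression-on-extracted-features point concrete, a minimal sketch; the ridge regularizer lam and the function name are illustrative assumptions:

```python
import numpy as np

def linear_ridge(Q, y, lam=1e-2):
    # Once r-dimensional features Q (N x r) have been extracted, the kernel
    # machine is replaced by a plain linear model; ridge regression is shown,
    # and a linear SVM plays the same role for classification.
    r = Q.shape[1]
    w = np.linalg.solve(Q.T @ Q + lam * np.eye(r), Q.T @ y)
    return w  # predict new data as Q_new @ w
```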


Simulation tests

F. Sheikholeslami and G. B. Giannakis, "Scalable Kernel-based Learning via Low-rank Approximation of Lifted Data," Proc. Allerton Conf. 2017.

IJCNN dataset: N = 140,230, d = 22

Adult dataset: N = 32,650, d = 123

Slice dataset: N = 53,500, d = 384

Datasets available at UCI repository: http://archive.ics.uci.edu/ml/datasets.html


Online function approximation on a budget

Iteration n: new datum and current subspace factor available (see sketch below)

S1. Find projection coefficients

S2. Update subspace factor (via stochastic gradient descent)

If the budget is exceeded, remove the row of A with minimum norm

Low-rank (r) subspace tracking [Mardani-Mateos-GG'14], here on lifted data

Censoring can mitigate the curse of dimensionality as the number of processed data grows

Rank surrogate (nuclear-norm regularization)

Budget maintenance

F. Sheikholeslami, D. K. Berberidis, and G. B. Giannakis, "Kernel-based Low-rank Feature Extraction on a Budget for Big Data Streams," arXiv:1601.07947.
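A minimal sketch of one online iteration and the budget-maintenance rule; the explicit lifting phi, step size mu, regularizer lam, and function names are simplifying assumptions relative to the kernel-based OK-FEB updates:

```python
import numpy as np

def online_step(L, phi, mu=0.01, lam=1e-3):
    # One online iteration on a lifted sample phi (explicit lifting used here
    # for illustration; the budgeted method stores kernel evaluations instead).
    r = L.shape[1]
    # S1: projection coefficients of phi onto the current virtual subspace
    q = np.linalg.solve(L.T @ L + lam * np.eye(r), L.T @ phi)
    # S2: stochastic gradient step on the reconstruction error
    L = L + mu * np.outer(phi - L @ q, q)
    return L, q

def enforce_budget(A, B):
    # Budget maintenance: once more than B support vectors are stored, drop
    # the one whose row of A has the smallest norm (least influential).
    if A.shape[0] > B:
        A = np.delete(A, np.argmin(np.linalg.norm(A, axis=1)), axis=0)
    return A
```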


OK-FEB with linear classification and regression

Slice dataset (regression): N = 53,500, d = 384, r = 10, B = 15

Year dataset (regression): N = 463,700, d = 90, r = 10, B = 15

OK-FEB followed by a linear SVM outperforms budgeted kernel SVM/SVR variants in classification/regression


Tracking dynamic subspaces

Synthetic dataset: data drawn from two subspaces

Recency-aware removal rule (see sketch below)

• Associate a recency factor with the i-th SV in the budget

• Decay the factor with the inclusion of each new SV

The recency parameter trades off tracking for precision: smaller values provide faster tracking, while larger values increase fitting precision

Successful use of affordable budget for nonlinear subspace tracking
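One plausible instantiation of the recency-aware removal rule; the decay parameter gamma, the weighting by row norms, and the function name are assumptions for illustration (smaller gamma forgets old SVs faster, matching the tracking/precision trade-off noted above):

```python
import numpy as np

def recency_prune(A, factors, gamma, B):
    # Each stored SV carries a factor that is decayed by gamma < 1 whenever a
    # new SV is admitted; A already contains the new SV as its last row.
    factors = np.append(gamma * factors, 1.0)
    if A.shape[0] > B:
        # Drop the SV with the smallest recency-weighted row norm.
        i = np.argmin(factors * np.linalg.norm(A, axis=1))
        A = np.delete(A, i, axis=0)
        factors = np.delete(factors, i)
    return A, factors
```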


Online physical activity tracking

Subjects asked to do various physical activities

d = 13 quantities measured from chest, ankle, and wrist via wireless IMUs

F. Sheikholeslami, D. K. Berberidis, and G. B. Giannakis, "Kernel-based Low-rank Feature Extraction on a Budget for Big Data Streams," arXiv:1601.07947.

Gaussian kernel

(Figure: average LS-fit)


Summary

Kernel-based learning

Sparsity-aware low-rank approximation of lifted data

Theoretical guarantees and test results

Nuclear norm regularization for online approximation

Budget enforcement for affordable memory and computation

Future directions

Nonlinear feature extraction for canonical correlation analysis

Kernel-based feature extraction over 2-D signals on networks

Thank you!
