An Online Learning View via a Projections’ Path in the Sparse-land
Sergios Theodoridis1
1Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece.
Workshop on Sparse Signal Processing, Friday, Sep. 16, 2016
Joint work with P. Bouboulis, S. Chouvardas, Y. Kopsinis, G. Papageorgiou, K. Slavakis
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 1/58
Sparsity
Sparse Modeling
• Sparse modeling has been a major focus of research effort over the last decade or so.
• Sparsity-promoting regularization of cost functions copes with:
Ill conditioning and overfitting when solving inverse problems; learning from data is an instance of an inverse problem.
It promotes zeros when the underlying models have many near-to-zero values.
Sparse Modeling
The need for sparse models: Two examples
• Compression
• Echo Cancellation
Sparse Modeling
The Generic Model
OUTPUT = INPUT × SPARSE MODEL + NOISE
Sparse Modeling
The Regression Model
• A generic model that covers a large class of problems (Filtering, Prediction):

y_n = u_n^T a_* + v_n

a_* ∈ R^L is the unknown vector.
u_n ∈ R^L is the incoming signal (sensing vectors).
y_n ∈ R is the observed signal (measurements).
v_n is the additive noise process.

• a_* is assumed to be sparse. That is, only a few, K ≪ L, of its components are nonzero:

a_* = [0, 0, ⋆, 0, …, 0, ⋆, 0, 0, …, 0, ⋆, 0, …, 0]^T,

where the K entries marked ⋆ are the nonzero ones.

• In its simplest formulation, the task comprises the estimation of a_*, based on a set of measurements (y_n, u_n), n = 1, …, N.
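As a minimal numerical sketch of the model above (NumPy; dimensions and names are illustrative, not from the talk), the following generates data from y_n = u_n^T a_* + v_n with a K-sparse a_* and fits a plain least-squares estimate, which is viable here only because N > L:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 50, 3, 200              # ambient dimension, sparsity level, number of measurements

# K-sparse ground truth a_*: only K of its L components are nonzero.
a_star = np.zeros(L)
support = rng.choice(L, size=K, replace=False)
a_star[support] = rng.standard_normal(K)

U = rng.standard_normal((N, L))   # rows are the sensing vectors u_n^T
v = 0.01 * rng.standard_normal(N) # additive noise v_n
y = U @ a_star + v                # y_n = u_n^T a_* + v_n

# Plain least squares ignores sparsity; it works here only because N > L.
a_ls, *_ = np.linalg.lstsq(U, y, rcond=None)
err = np.max(np.abs(a_ls - a_star))
```

In the underdetermined regime (N < L), which is the interesting one for sparse modeling, this unregularized estimate breaks down and the sparsity-promoting methods of the following slides are needed.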
Sparse Modeling
Dictionary Learning
• This is a powerful tool for analyzing signals in terms of overcomplete basis vectors:

[y_1, …, y_N] = [u_1, …, u_m] [a_1, …, a_N],  m > L,
   (L×N)           (L×m)          (m×N)

Y = UA

y_n ∈ R^L, n = 1, 2, …, N, are the observation vectors.
u_i ∈ R^L, i = 1, 2, …, m, are the unknown atoms of the dictionary.
a_n ∈ R^m, n = 1, 2, …, N, are the vectors of the unknown weights, corresponding to the respective expansion of the nth observation vector:

y_n = Σ_{i=1}^m u_i a_{ni},

where a_n, n = 1, 2, …, N, are sparse vectors.
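A small numerical sketch of the synthesis model Y = UA (illustrative only: the dictionary here is random, whereas in dictionary learning U is itself estimated from the data):

```python
import numpy as np

rng = np.random.default_rng(1)
L, m, N, K = 20, 40, 100, 3       # signal dim, atoms (m > L: overcomplete), signals, nonzeros

U = rng.standard_normal((L, m))   # columns are the atoms u_i
A = np.zeros((m, N))
for n in range(N):                # each weight vector a_n is K-sparse
    idx = rng.choice(m, size=K, replace=False)
    A[idx, n] = rng.standard_normal(K)

Y = U @ A                         # y_n = sum_{i=1}^m u_i * a_{ni}

# Each observation is rebuilt exactly from its K active atoms alone.
active = np.flatnonzero(A[:, 0])
err = np.max(np.abs(Y[:, 0] - U[:, active] @ A[active, 0]))
```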
Sparse Modeling
Low Rank Matrix Factorization
• This task is at the heart of dimensionality reduction.

Y = UA = Σ_{i=1}^r u_i a_i^T,

[y_1, …, y_N] = [u_1, …, u_r] [a_1^T; …; a_r^T].
   (L×N)           (L×r)          (r×N)

• r < N.
• PCA performs low-rank matrix factorization, by imposing sparsity on the singular values as well as orthogonality on U.
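The rank-r factorization can be computed via the truncated SVD; a minimal sketch (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
L, N, r = 30, 40, 4

# An exactly rank-r matrix, Y = U A with U (L x r) and A (r x N).
Y = rng.standard_normal((L, r)) @ rng.standard_normal((r, N))

W, s, Vt = np.linalg.svd(Y, full_matrices=False)
num_rank = int(np.sum(s > 1e-10 * s[0]))   # numerical rank; equals r here

# Keeping the r leading singular triplets gives Y = sum_{i=1}^r u_i a_i^T exactly.
U_r = W[:, :r]
A_r = s[:r, None] * Vt[:r]
err = np.linalg.norm(Y - U_r @ A_r)
```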
Sparse Modeling
Low Rank Matrix Factorization
• Matrix Completion is a special constrained version of low-rank matrix factorization.
• Y has missing elements, and the low-rank matrix factorization

Y = Σ_{i=1}^r u_i a_i^T

is constrained to reproduce the non-missing elements at the respective positions.
Sparse Modeling
Low Rank Matrix Factorization
• Robust PCA is another special constrained version of low-rank matrix factorization:

Y = L + V,

where L is a low-rank matrix and V is a sparse matrix. The latter models OUTLIER NOISE; outliers are, by nature, sparse.

• The goal of the task is to obtain estimates of L and V by imposing sparsity on the singular values of L as well as on the elements of V, constrained so that Y = L + V.
Sparse Modeling
Robust Regression
• Robust Regression is an old problem, with a major impact comingfrom the works of Huber. The revival of interest is due to a newlook via sparsity-aware learning techniques. For example, thenoise may comprise a few large values (outliers) on top of theGaussian component. Since the large values are only a few, theycan be treated via sparse modeling arguments.
Sparse Regression Modeling
There are two paths that lead to the “truth”, i.e., obtaining an estimate a of the unknown a_*.

Batch Learning Problem
Linear Regression Model: y_n = u_n^T a_* + v_n
• U := [u_1, u_2, …, u_N]^T ∈ R^{N×L}
• y := [y_1, y_2, …, y_N]^T ∈ R^N, and v := [v_1, v_2, …, v_N]^T ∈ R^N.
Batch Formulation: y = U a_* + v
Estimating the unknown
There are two paths that lead to the “truth”, i.e., obtaining an estimate a of the unknown a_*.

Batch vs Online Learning
Batch formulation: y = U a_* + v
Online formulation: y_n = u_n^T a_* + v_n;
obtain an estimate, a_n, after (y_n, u_n) has been received.
Sparse Vs Online Learning
Sparsity-promoting batch algorithms (Compressed Sensing)
• Are mobilized after a finite number of data, {(u_n, y_n)}_{n=0}^{N-1}, has been collected.
• For any new datum, the estimation of a_* is repeated from scratch.
• Computational complexity might become prohibitive.
• Excessive storage demands.
• It is a “mature” research field with a diverse number of techniques and applications.

Sparsity-promoting online algorithms
• Infinite number of data.
• For any new datum, the estimate of a_* is updated dynamically.
• Cases of time-varying a_* are “naturally” handled.
• Low complexity, as required for streaming applications.
• Fast convergence / tracking.
• Large potential in Big Data applications.
Sparsity-Promoting Methods
ℓ0-norm constrained minimization
• ℓ0 (pseudo-)norm minimization: an NP-hard, nonconvex task:
  a : min_{a∈R^L} ‖a‖_0, s.t. ‖y − Ua‖_2^2 ≤ ε
• The above is carried out via greedy-type algorithmic arguments.

Constrained Least-Squares Estimation: Three equivalent formulations
• a := argmin_{a∈R^L} { ‖y − Ua‖_2^2 + λ‖a‖_1 }
• a : min_{a∈R^L} ‖y − Ua‖_2^2, s.t. ‖a‖_1 ≤ ρ
• a : min_{a∈R^L} ‖a‖_1, s.t. ‖y − Ua‖_2^2 ≤ ε
• Why the ℓ1 norm: It is the “closest” convex norm to the ℓ0 “norm” (the number of nonzero elements).
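One standard way to solve the first (penalized) formulation, not detailed on the slides, is the iterative soft-thresholding algorithm (ISTA); a sketch with illustrative dimensions and parameter values of my choosing:

```python
import numpy as np

def soft(x, t):
    """Soft thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(U, y, lam, n_iter=2000):
    """Minimize 0.5 * ||y - U a||_2^2 + lam * ||a||_1 by gradient step + shrinkage."""
    step = 1.0 / np.linalg.norm(U, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(U.shape[1])
    for _ in range(n_iter):
        a = soft(a + step * (U.T @ (y - U @ a)), step * lam)
    return a

rng = np.random.default_rng(3)
L, K, N = 40, 3, 25                          # underdetermined: fewer measurements than unknowns
a_star = np.zeros(L)
a_star[rng.choice(L, K, replace=False)] = np.array([1.5, -2.0, 1.0])
U = rng.standard_normal((N, L)) / np.sqrt(N)
y = U @ a_star                               # noiseless, for a clean sanity check

a_hat = ista(U, y, lam=0.01)
err = np.max(np.abs(a_hat - a_star))
```

Note that even with N < L, the ℓ1 penalty recovers the sparse vector up to a small shrinkage bias, which is exactly the regime where plain least squares fails.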
Sparsity-Promoting Methods
Hard and Soft thresholding
• The ℓ1 norm is associated with a soft thresholding operation on the respective coefficients. Soft thresholding is a continuous function, but it adds bias even to the large values. Hard thresholding, on the other hand, is a discontinuous operation.
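The two operators side by side (a minimal sketch; the threshold value is illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    """Continuous, but shrinks every surviving coefficient by t (bias even for large values)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hard_threshold(x, t):
    """Discontinuous at |x| = t; surviving coefficients are left untouched (no bias)."""
    return np.where(np.abs(x) > t, x, 0.0)

x = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])
s = soft_threshold(x, 1.0)    # large entries shrink: -3 -> -2, 4 -> 3
h = hard_threshold(x, 1.0)    # large entries survive intact: -3 and 4
```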
Batch Penalized Least-Squares Estimator
Penalized Least-Squares - General Case

min_{a∈R^L} { (1/2)‖y − Ua‖_2^2 + λ Σ_{i=1}^L p(|a_i|) }

• p(·): sparsity-promoting penalty function.
• λ: regularization parameter.

Examples: Penalty functions
• p(|a_i|) := |a_i|^γ, ∀a_i ∈ R
• p(|a_i|) := λ(1 − e^{−β|a_i|})
• p(|a_i|) := (λ / log(γ + 1)) log(γ|a_i| + 1), ∀a_i ∈ R
Online Sparsity-Promoting Methods
Penalized Recursive LS

min_{a∈R^L} { (1/2) Σ_{n=1}^N β^{N−n} e_n^2 + λ Σ_{i=1}^L p(|a_i|) },

r_N := Σ_{n=1}^N β^{N−n} y_n u_n,   R_N := Σ_{n=1}^N β^{N−n} u_n u_n^T

r_{n+1} = β r_n + y_{n+1} u_{n+1},   R_{n+1} = β R_n + u_{n+1} u_{n+1}^T
a_{n+1} = f(r_{n+1}, R_{n+1})

• It works!
• Complexity O(L^2).
• The regularization parameter needs fine tuning.
• [Angelosante, Bazerque and Giannakis, 2010]
• [Eksioglu and Tanc, 2011]
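A sketch of the rank-one recursions for r_n and R_n (the final map a_{n+1} = f(·) depends on the chosen penalty and is left abstract here, as on the slide); it verifies that they reproduce the batch exponentially weighted sums:

```python
import numpy as np

rng = np.random.default_rng(4)
L, N, beta = 5, 50, 0.99
U = rng.standard_normal((N, L))   # row n holds u_n^T
y = rng.standard_normal(N)

# One rank-one update per incoming pair (y_n, u_n): O(L^2) per step.
r = np.zeros(L)
R = np.zeros((L, L))
for n in range(N):
    r = beta * r + y[n] * U[n]
    R = beta * R + np.outer(U[n], U[n])

# Batch check: r_N = sum_n beta^{N-n} y_n u_n and R_N = sum_n beta^{N-n} u_n u_n^T.
w = beta ** (N - 1 - np.arange(N))
err = max(np.max(np.abs(r - (w * y) @ U)),
          np.max(np.abs(R - (U * w[:, None]).T @ U)))
```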
Online Sparsity-Promoting Methods
Penalized stochastic gradient descent: LMS type

min_{a∈R^L} { (1/2) e_n^2 + λ Σ_{i=1}^L p(|a_i|) }

a_{n+1} = a_n + μ e_n(a) u_n − μλ f(a_n),

f(a_n) = [∂p(|a_{n,1}|)/∂a_{n,1}, ∂p(|a_{n,2}|)/∂a_{n,2}, …, ∂p(|a_{n,L}|)/∂a_{n,L}]^T

• Complexity O(L).
• It works! (when compared to standard LMS)
• Slow convergence.
• The regularization parameter needs fine tuning.
• [Chen, Gu and Hero, 2009]
• [Mileounis, Babadi, Kalouptsidis and Tarokh, 2010]
• [Wang and Gu, 2012]
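For the ℓ1 penalty, p(|a_i|) = |a_i|, the gradient vector f(a_n) reduces to sign(a_n), giving the zero-attracting LMS of [Chen, Gu and Hero, 2009]; a sketch with illustrative step sizes (which, as noted above, need fine tuning):

```python
import numpy as np

rng = np.random.default_rng(5)
L, T = 20, 5000
a_star = np.zeros(L)
a_star[[3, 11]] = [1.0, -1.5]      # sparse system to identify

mu, lam = 0.01, 0.05               # step size and regularization parameter
a = np.zeros(L)
for _ in range(T):
    u = rng.standard_normal(L)
    y = u @ a_star + 0.01 * rng.standard_normal()
    e = y - u @ a                  # a-priori error e_n(a)
    a = a + mu * e * u - mu * lam * np.sign(a)   # LMS step plus "zero attractor"

err = np.max(np.abs(a - a_star))
```

The attractor term pulls the inactive coefficients toward zero, at the price of a small shrinkage bias (of the order of λ) on the active ones.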
The Set-Theoretic Estimation Approach
The main concept
A descendant of POCS

Projection onto a Closed Convex Set
Let C be a closed convex set in R^L. Then, for each a ∈ R^L, there exists a unique a_* ∈ C such that

‖a − a_*‖ = min_{g∈C} ‖a − g‖.
Metric Projection Mapping
The metric projection is the mapping P_C : R^L → C : a ↦ P_C(a) := a_*.
Relaxed Projection Mapping
The relaxed projection is the mapping T_C(a) := a + μ(P_C(a) − a), μ ∈ (0, 2), ∀a ∈ R^L.
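Both mappings for the simplest nontrivial C, a closed ball (a sketch; the helper names are mine):

```python
import numpy as np

def project_ball(a, center, radius):
    """Metric projection P_C onto the closed ball C = {g : ||g - center|| <= radius}."""
    d = a - center
    n = np.linalg.norm(d)
    return a.copy() if n <= radius else center + radius * d / n

def relaxed_projection(a, center, radius, mu=1.5):
    """T_C(a) = a + mu * (P_C(a) - a), mu in (0, 2); mu = 1 recovers P_C itself."""
    return a + mu * (project_ball(a, center, radius) - a)

a = np.array([3.0, 4.0])                       # ||a|| = 5, outside the unit ball
p = project_ball(a, np.zeros(2), 1.0)          # lands on the unit sphere
t = relaxed_projection(a, np.zeros(2), 1.0)    # overshoots past P_C(a), since mu > 1
```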
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman ’65], [Gubin, Polyak, Raik ’67]

Given a finite number of closed convex sets C_1, …, C_q, with ⋂_{i=1}^q C_i ≠ ∅, let their associated projection mappings be P_{C_1}, …, P_{C_q}. For any a ∈ R^L, define the sequence of cyclic projections (shown here for q = 2):

· · · P_{C_2} P_{C_1} P_{C_2} P_{C_1}(a).
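The cyclic scheme for q = 2 can be sketched with two hyperplanes in R^2 (an illustration of the classical result; the sets and starting point are my own): the iterates converge to a point of the intersection.

```python
import numpy as np

def project_hyperplane(a, w, b):
    """Metric projection onto the closed convex set {g : w^T g = b}."""
    return a - (w @ a - b) / (w @ w) * w

w1, b1 = np.array([1.0, 0.0]), 1.0   # C1: x = 1
w2, b2 = np.array([1.0, 1.0]), 3.0   # C2: x + y = 3; the intersection is the point (1, 2)

a = np.array([5.0, -4.0])
for _ in range(100):                 # ... P_C2 P_C1 P_C2 P_C1 (a)
    a = project_hyperplane(a, w1, b1)
    a = project_hyperplane(a, w2, b2)
dist = np.linalg.norm(a - np.array([1.0, 2.0]))
```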
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman’65], [Gubin, Polyak, Raik ’67]
Given a finite number of closed convex sets C1, . . . , Cq, with⋂qi=1 Ci 6= ∅, let
their associated projection mappings be PC1, . . . , PCq
. For any a ∈ RL, definethe sequence of projections:
PC1(a).
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman’65], [Gubin, Polyak, Raik ’67]
Given a finite number of closed convex sets C1, . . . , Cq, with⋂qi=1 Ci 6= ∅, let
their associated projection mappings be PC1, . . . , PCq
. For any a ∈ RL, definethe sequence of projections:
PC2PC1
(a).
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman’65], [Gubin, Polyak, Raik ’67]
Given a finite number of closed convex sets C1, . . . , Cq, with⋂qi=1 Ci 6= ∅, let
their associated projection mappings be PC1, . . . , PCq
. For any a ∈ RL, definethe sequence of projections:
PC1PC2
PC1(a).
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman’65], [Gubin, Polyak, Raik ’67]
Given a finite number of closed convex sets C1, . . . , Cq, with⋂qi=1 Ci 6= ∅, let
their associated projection mappings be PC1, . . . , PCq
. For any a ∈ RL, definethe sequence of projections:
PC2PC1
PC2PC1
(a).
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
The Set-Theoretic Estimation Approach
The POCS: Finite number of Convex Sets [Von Neumann ’33], [Bregman’65], [Gubin, Polyak, Raik ’67]
Given a finite number of closed convex sets C1, . . . , Cq, with⋂qi=1 Ci 6= ∅, let
their associated projection mappings be PC1, . . . , PCq
. For any a ∈ RL, definethe sequence of projections:
· · ·PC2PC1
PC2PC1
(a).
Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 20/58
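As a concrete illustration of the cyclic POCS iteration, the sketch below alternates projections between two halfspaces in $\mathbb{R}^2$. The particular sets and starting point are arbitrary choices for the demo, not taken from the slides:

```python
import numpy as np

# Projection onto the halfspace {a : c^T a <= b}, a closed convex set.
def project_halfspace(a, c, b):
    viol = c @ a - b
    if viol <= 0:
        return a  # already inside the set
    return a - (viol / (c @ c)) * c

# Cyclic POCS: repeatedly apply P_{C2} P_{C1} starting from a.
def pocs(a, sets, iters=200):
    for _ in range(iters):
        for (c, b) in sets:
            a = project_halfspace(a, c, b)
    return a

# Two halfspaces in R^2 with nonempty intersection.
C1 = (np.array([1.0, 1.0]), 1.0)    # x + y <= 1
C2 = (np.array([1.0, -1.0]), 0.0)   # x - y <= 0
a_star = pocs(np.array([5.0, -3.0]), [C1, C2])

# The limit lies in both sets (here it is [0.5, 0.5]).
assert C1[0] @ a_star <= C1[1] + 1e-8
assert C2[0] @ a_star <= C2[1] + 1e-8
```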
The Set-Theoretic Estimation Approach

Convex Combination of Projection Mappings [Pierra '84]

Given a finite number of closed convex sets $C_1, \ldots, C_q$, with $\bigcap_{i=1}^q C_i \neq \emptyset$, let their associated projection mappings be $P_{C_1}, \ldots, P_{C_q}$. Let also a set of positive constants $w_1, \ldots, w_q$ be such that $\sum_{i=1}^q w_i = 1$. Then, for any $a_0$, the sequence

$$a_{n+1} = a_n + \mu_n \Big( \underbrace{\sum_{i=1}^q w_i P_{C_i}(a_n)}_{\text{convex combination of projections}} - a_n \Big), \quad \forall n,$$

converges weakly to a point $a_*$ in $\bigcap_{i=1}^q C_i$, where $\mu_n \in (\varepsilon, M_n)$, for $\varepsilon \in (0, 1)$, and

$$M_n := \frac{\sum_{i=1}^q w_i \| P_{C_i}(a_n) - a_n \|^2}{\| \sum_{i=1}^q w_i P_{C_i}(a_n) - a_n \|^2}.$$
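The parallel (Pierra-style) scheme can be sketched as follows, again with two illustrative halfspaces and uniform weights. Taking $\mu_n$ at the upper end of $(\varepsilon, M_n)$ is a design choice for the demo:

```python
import numpy as np

# Projection onto the halfspace {a : c^T a <= b}.
def project_halfspace(a, c, b):
    viol = c @ a - b
    return a if viol <= 0 else a - (viol / (c @ c)) * c

# Pierra-style iteration: step toward the convex combination of projections,
# with the extrapolated step size M_n from the slide.
def parallel_projections(a, sets, weights, iters=100):
    for _ in range(iters):
        projs = [project_halfspace(a, c, b) for (c, b) in sets]
        combo = sum(w * p for w, p in zip(weights, projs))
        den = float(np.sum((combo - a) ** 2))
        if den < 1e-16:
            break  # a is (numerically) a fixed point
        num = sum(w * float(np.sum((p - a) ** 2)) for w, p in zip(weights, projs))
        a = a + (num / den) * (combo - a)  # mu_n taken at the bound M_n
    return a

C1 = (np.array([1.0, 1.0]), 1.0)    # x + y <= 1
C2 = (np.array([1.0, -1.0]), 0.0)   # x - y <= 0
a_star = parallel_projections(np.array([5.0, -3.0]), [C1, C2], [0.5, 0.5])
```

Note that $M_n \ge 1$ always holds (by convexity of the squared norm), so the extrapolated step is at least as long as the plain averaged-projection step.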
Set-Theoretic Estimation: The Online Case Approach

Constructing the Convex Sets

For each received set of measurements (training pairs) $(u_n, y_n)$, construct a hyperslab:

$$S_n[\varepsilon] := \{ a \in \mathbb{R}^L : |u_n^T a - y_n| \le \varepsilon \}.$$

Solution

Find a point in the intersection of all the hyperslabs.

[Yamada 2001], [Yamada, Slavakis, Yamada 2002], [Yamada, Ogura 2004], [Slavakis, Yamada, Ogura 2006].

[Chouvardas, Slavakis, Theodoridis, Yamada, 2013]: under the assumption of bounded noise, the scheme converges with probability 1 arbitrarily close to the true model.
Adaptive Projection Subgradient Method (APSM)

The Algorithm

$$a_{n+1} := a_n + \mu_n \sum_{i=n-q+1}^{n} \omega_i^{(n)} \big( P_{S_i[\varepsilon]}(a_n) - a_n \big).$$

Projection onto a Hyperslab

$$P_{S_n[\varepsilon]}(a) = a + \begin{cases} \frac{y_n - \varepsilon - u_n^T a}{\|u_n\|^2}\, u_n, & \text{if } u_n^T a < y_n - \varepsilon, \\ 0, & \text{if } |u_n^T a - y_n| \le \varepsilon, \\ \frac{y_n + \varepsilon - u_n^T a}{\|u_n\|^2}\, u_n, & \text{if } u_n^T a > y_n + \varepsilon. \end{cases}$$
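The three-branch hyperslab projection and one APSM update can be sketched as below; the uniform weights $\omega_i^{(n)} = 1/q$ are an illustrative choice, not prescribed by the slide:

```python
import numpy as np

# Projection onto the hyperslab S_n[eps] = {a : |u^T a - y| <= eps},
# implementing the three-branch formula above.
def project_hyperslab(a, u, y, eps):
    r = u @ a
    if r < y - eps:
        return a + ((y - eps - r) / (u @ u)) * u
    if r > y + eps:
        return a + ((y + eps - r) / (u @ u)) * u
    return a  # already inside the slab

# One APSM update over the q most recent hyperslabs (uniform weights here).
def apsm_step(a, U, yvec, eps, mu=1.0):
    projs = [project_hyperslab(a, u, y, eps) for u, y in zip(U, yvec)]
    return a + mu * (np.mean(projs, axis=0) - a)
```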
Adaptive Projection Subgradient Method (APSM)
Geometric illustration example
APSM under the ℓ1-ball constraint

The ℓ1-ball case

Given $(u_n, y_n)$, $n = 0, 1, 2, \ldots$, find $a$ such that

$$|a^T u_n - y_n| \le \varepsilon, \quad n = 0, 1, 2, \ldots, \qquad \|a\|_1 \le \delta.$$

The recursion

$$a_{n+1} := P_{B_{\ell_1}[\delta]} \Big( a_n + \mu_n \Big( \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\varepsilon]}(a_n) - a_n \Big) \Big)$$

converges to

$$a_* \in B_{\ell_1}[\delta] \cap \Big( \bigcap_{n \ge n_0} S_n[\varepsilon] \Big).$$
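The recursion needs the Euclidean projection onto the ℓ1 ball $B_{\ell_1}[\delta]$. A standard way to compute it is the sort-based method of Duchi et al. (2008), sketched here:

```python
import numpy as np

# Euclidean projection onto the l1 ball {a : ||a||_1 <= delta}
# (sort-based method of Duchi et al., 2008).
def project_l1_ball(a, delta):
    if np.abs(a).sum() <= delta:
        return a.copy()               # already inside the ball
    mu = np.sort(np.abs(a))[::-1]     # magnitudes, descending
    cs = np.cumsum(mu)
    j = np.arange(1, len(a) + 1)
    rho = np.max(j[mu - (cs - delta) / j > 0])
    theta = (cs[rho - 1] - delta) / rho
    # Soft-threshold by theta; the result has l1 norm exactly delta.
    return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)
```

For example, projecting $[3, -1, 0.5]$ onto the ball of radius $\delta = 2$ yields $[2, 0, 0]$, i.e., a soft thresholding by the data-dependent level $\theta = 1$.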
APSM under the ℓ1-ball constraint

Geometric illustration example (figure).
APSM under the weighted ℓ1-ball constraint

The weighted ℓ1-ball case

• Convergence can be significantly sped up if the ℓ1 ball is replaced by the weighted ℓ1 ball.
• Definition:
  $$\|a\|_{1,w} := \sum_{i=1}^{L} w_i |a_i|.$$
• Time-adaptive weighted norm:
  $$w_{n,i} := \frac{1}{|a_{n,i}| + \varepsilon'_n}.$$
• This is a time-varying constraint case.
• The recursion:
  $$a_{n+1} := P_{B_{\ell_1}[w_n, \delta]} \Big( a_n + \mu_n \Big( \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\varepsilon]}(a_n) - a_n \Big) \Big).$$
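The effect of the time-adaptive weights is easy to see numerically: coefficients that are currently near zero receive large weights, so the weighted ℓ1 ball pushes them harder toward zero, while large (active) coefficients are penalized only lightly. A minimal sketch, with an arbitrary example vector:

```python
import numpy as np

# Time-adaptive weights w_{n,i} = 1 / (|a_{n,i}| + eps'), as on the slide.
def adaptive_weights(a_n, eps_prime=1e-2):
    return 1.0 / (np.abs(a_n) + eps_prime)

a_n = np.array([2.0, 0.0, -0.5])
w = adaptive_weights(a_n)
# The zero coefficient receives the largest weight; the largest
# coefficient receives the smallest one.
assert w[1] == w.max()
assert w[0] == w.min()
```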
APSM under the weighted ℓ1-ball constraint

Geometric illustration example (figure).
APSM under the weighted ℓ1-ball constraint

Convergence of the Scheme

• Does this scheme converge? Note that our constraint, the weighted ℓ1 ball, is a time-varying constraint.
  Remark: This case was not covered by the existing theory.
Simulation Examples

Example: Time-invariant signal sparse in the wavelet domain

$L := 1024$, $\|a_*\|_0 := 100$ wavelet coefficients. The radius of the ℓ1 ball is set to $\delta := 101$.

Example: Time-varying signal compressible in the wavelet domain

$L := 4096$. The signal is the sum of two chirp signals.

Example: Time-varying signal compressible in the wavelet domain

$L := 4096$. The radius of the ℓ1 ball is set to $\delta := 40$.

Movies of the OCCD and the APWL1sub.
Generalized Thresholding Rules

Thresholding rules associated with non-convex penalty functions

• Penalized LS thresholding operators:
  $$\min_a \tfrac{1}{2}(a - \bar{a})^2 + \lambda p(|a|).$$
• $p(\cdot)$: a nonnegative, nondecreasing and differentiable function on $(0, \infty)$.
• Under some general conditions, the problem has a unique solution [Antoniadis 2007].
• The penalized LS thresholding operator basically defines a mapping
  $$\bar{a} \mapsto \arg\min_a \tfrac{1}{2}(a - \bar{a})^2 + \lambda p(|a|),$$
  which corresponds to a shrinkage operator.
Generalized Thresholding Rules

Examples: Penalty functions

• $p(|a|) := |a|^\gamma, \ \forall a \in \mathbb{R}$
• $p(|a|) := \lambda \big( 1 - e^{-\beta |a|} \big)$
• $p(|a|) := \frac{\lambda}{\log(\gamma + 1)} \log(\gamma |a| + 1), \ \forall a \in \mathbb{R}$

Examples: Penalized Least-Squares Thresholding Operators (figure).
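The thresholding map can be evaluated numerically for any penalty by a dense grid search over the scalar argmin; this is only a sketch (closed forms exist for several penalties). For $p(|a|) = |a|$ (i.e., $\gamma = 1$ in the first example), the operator reduces to the familiar soft thresholding, which the code checks:

```python
import numpy as np

# Numerically evaluate the penalized-LS thresholding map
#   abar -> argmin_a 0.5*(a - abar)^2 + lam * p(|a|)
# by searching a dense grid (illustrative only; grid spacing ~1e-5 here).
def plst(abar, p, lam):
    grid = np.linspace(-abs(abar) - 1.0, abs(abar) + 1.0, 200001)
    cost = 0.5 * (grid - abar) ** 2 + lam * p(np.abs(grid))
    return grid[np.argmin(cost)]

lam = 0.5
soft = lambda t: np.sign(t) * max(abs(t) - lam, 0.0)  # closed form for p(|a|) = |a|

# For p(|a|) = |a| the numerical operator matches soft thresholding.
for abar in (-2.0, 0.3, 1.7):
    assert abs(plst(abar, lambda t: t, lam) - soft(abar)) < 1e-3
```

The same `plst` routine can be applied to the exponential and logarithmic penalties above to visualize their (non-convex) shrinkage curves.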
Generalized Thresholding Rules

Generalized Thresholding (GT) Operator: Definition

For any $a \in \mathbb{R}^L$, $z := T_{GT}^{(K)}(a)$ is obtained coordinate-wise:

$$\forall l \in \{1, \ldots, L\}, \quad z_l := \begin{cases} a_l, & \text{if } a_l \text{ is one of the } K \text{ largest (in magnitude) components}, \\ \mathrm{shr}(a_l), & \text{otherwise}. \end{cases}$$

Shrinkage Function (shr)

• $\tau \, \mathrm{shr}(\tau) \ge 0, \ \forall \tau \in \mathbb{R}$.
• shr acts as a strict shrinkage operator over all intervals which do not include 0.
• Any arbitrary function in line with the properties above can be used.
• All the penalized least-squares thresholding operators are included.

In words

• Keep the $K$ largest components of the estimate.
• The rest are shrunk according to the shrinkage rule.

Examples: Generalized Thresholding (GT) operator (figure).
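The GT operator is straightforward to sketch in code; soft thresholding is used here as one admissible shrinkage rule (any function with the properties above would do):

```python
import numpy as np

# Generalized Thresholding: keep the K largest-magnitude components unchanged,
# shrink the rest with a user-supplied shrinkage rule shr.
def gt_operator(a, K, shr):
    z = shr(a.copy())
    keep = np.argsort(np.abs(a))[-K:]   # indices of the K largest magnitudes
    z[keep] = a[keep]                   # the K largest pass through untouched
    return z

soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)
a = np.array([3.0, -0.2, 1.5, 0.1])
z = gt_operator(a, K=2, shr=soft)
# The two largest entries (3.0 and 1.5) are kept; -0.2 and 0.1 shrink to 0,
# so z = [3.0, 0.0, 1.5, 0.0].
```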
Adaptive Projection-Based Algorithm with Generalized Thresholding (APGT)

The Algorithm

$$a_{n+1} := T_n \Big( a_n + \mu_n \sum_{i=n-q+1}^{n} \omega_i^{(n)} \big( P_{S_i[\varepsilon]}(a_n) - a_n \big) \Big).$$

• Each piece of a-priori information is also represented by a set.
• $T_n$: the generalized thresholding operator.
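Putting the pieces together, one APGT step is an averaged hyperslab-projection update followed by generalized thresholding. The sketch below runs it on a tiny synthetic noiseless sparse-identification problem; the uniform weights, soft-threshold shrinkage, and data are illustrative choices:

```python
import numpy as np

def project_hyperslab(a, u, y, eps):
    r = u @ a
    if r < y - eps:
        return a + ((y - eps - r) / (u @ u)) * u
    if r > y + eps:
        return a + ((y + eps - r) / (u @ u)) * u
    return a

def gt(a, K, shr):
    z = shr(a.copy())
    keep = np.argsort(np.abs(a))[-K:]   # K largest-magnitude entries pass through
    z[keep] = a[keep]
    return z

# One APGT step: relaxed combination of projections onto the q most recent
# hyperslabs, then the thresholding operator T_n.
def apgt_step(a, U, yvec, eps, K, shr, mu=1.0):
    projs = [project_hyperslab(a, u, y, eps) for u, y in zip(U, yvec)]
    return gt(a + mu * (np.mean(projs, axis=0) - a), K, shr)

# Tiny noiseless sparse identification run (synthetic, for illustration only).
rng = np.random.default_rng(0)
a_true = np.array([1.0, 0.0, 0.0, 2.0, 0.0])     # 2-sparse target
U = rng.standard_normal((200, 5))
y = U @ a_true
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.05, 0.0)

a, q = np.zeros(5), 5
for n in range(q, 200):
    a = apgt_step(a, U[n - q:n], y[n - q:n], eps=1e-3, K=2, shr=soft)
```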
Adaptive Projection-Based Algorithm with GT (APGT)

Convergence of APGT

• Partially quasi-nonexpansive mapping:
  $$\forall x \in \mathbb{R}^L, \ \exists Y_x \subset \mathrm{Fix}(T) : \ \forall y \in Y_x, \ \|T(x) - y\| \le \|x - y\|.$$
• The fixed point set of GT is a union of subspaces (non-convex).

Examples: Union of subspaces for $s = 2$ (figure).
Adaptive Projection-Based Algorithm with GT (APGT)

Convergence of APGT

• Partially quasi-nonexpansive mapping.
• The fixed point set of GT is a union of subspaces (non-convex).
• It has been shown [Slavakis, Kopsinis, Theodoridis, McLaughlin, 2013] that:
  the algorithm leads to a sequence of estimates $(a_n)_{n \in \mathbb{Z}_{\ge 0}}$ whose set of cluster points is nonempty;
  each one of the cluster points is guaranteed to be, at most, $s$-sparse;
  the solution is located arbitrarily close to an intersection of an infinite number of hyperslabs.
Simulation Examples
Example: Time-varying case exhibiting an abrupt change
L := 1024, s = 100 (up to n = 1500, and s = 110 afterwards)
Per-iteration complexity — APGT: O(qL + qK); OSCD: O(L²); IPAPA: O(q³).
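The time-varying setup of this example can be reproduced with a sketch like the following (hypothetical data generation under the stated dimensions; the Gaussian nonzero model and all names are assumptions, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
L, n_change = 1024, 1500  # system length; abrupt-change instant

def random_sparse_system(s):
    """An s-sparse vector of length L with Gaussian nonzero entries."""
    a = np.zeros(L)
    support = rng.choice(L, size=s, replace=False)
    a[support] = rng.standard_normal(s)
    return a

# Abrupt change at n = 1500: the support size jumps from s = 100 to s = 110.
a_before, a_after = random_sparse_system(100), random_sparse_system(110)
unknown_system = lambda n: a_before if n < n_change else a_after
```

An online algorithm tracking `unknown_system(n)` must both re-estimate the nonzero values and detect the enlarged support after the change, which is what the reconvergence behavior in the figure illustrates.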
Simulation Examples
Example: Sparse system identification with colored input
L := 600, s = 60, AR input (condition number ≈ 100).
Bibliography
• I. Yamada and N. Ogura, "Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions," Numerical Functional Analysis and Optimization, vol. 25, no. 7&8, 2004.
• K. Slavakis, I. Yamada, and N. Ogura, "The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings," Numerical Functional Analysis and Optimization, vol. 27, no. 7&8, Nov. 2006.
• S. Theodoridis, K. Slavakis, and I. Yamada, "Adaptive learning in a world of projections: a unifying framework for linear and nonlinear classification and regression tasks," IEEE Signal Processing Magazine, vol. 28, Jan. 2011.
• K. Slavakis and I. Yamada, "The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning," SIAM Journal on Optimization, vol. 23, no. 1, 2013.
• Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls," IEEE Trans. Signal Processing, vol. 59, Mar. 2011.
• S. Chouvardas, K. Slavakis, and S. Theodoridis, "Adaptive robust distributed learning in diffusion sensor networks," IEEE Trans. Signal Processing, vol. 59, Oct. 2011.
• S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, "A sparsity promoting adaptive algorithm for distributed learning," IEEE Trans. Signal Processing, vol. 60, Oct. 2012.
• K. Slavakis, Y. Kopsinis, S. Theodoridis, and S. McLaughlin, "Generalized thresholding and online sparsity-aware learning in a union of subspaces," accepted for publication in IEEE Trans. Signal Processing, 2013.
• S. Chouvardas, K. Slavakis, S. Theodoridis, and I. Yamada, "Stochastic analysis of hyperslab-based adaptive projected subgradient method under bounded noise," IEEE Signal Processing Letters, vol. 20, 2013.
• Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Thresholding-based online algorithms of complexity comparable to sparse LMS methods," under preparation (a part submitted to ISCAS 2013).
• D. Angelosante, J. A. Bazerque, and G. B. Giannakis, "Online adaptive estimation of sparse signals: where RLS meets the ℓ1-norm," IEEE Trans. Signal Processing, vol. 58, Jul. 2010.
• E. M. Eksioglu and A. K. Tanc, "RLS algorithm with convex regularization," IEEE Signal Processing Letters, vol. 18, Aug. 2011.
• Y. Chen, Y. Gu, and A. O. Hero, "Sparse LMS for system identification," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.
• X. Wang and Y. Gu, "Proof of convergence and performance analysis for sparse recovery via zero-point attracting projection," IEEE Trans. Signal Processing, vol. 60, Aug. 2012.
• G. Mileounis, B. Babadi, N. Kalouptsidis, and V. Tarokh, "An adaptive greedy algorithm with application to nonlinear communications," IEEE Trans. Signal Processing, vol. 58, Jun. 2010.
• P. Di Lorenzo and A. H. Sayed, "Sparse distributed learning based on diffusion adaptation," IEEE Trans. Signal Processing, vol. 61, Mar. 2013.
“Machine Learning: A Bayesian and Optimization Perspective”
by
Sergios Theodoridis
Academic Press, 2015
1050 pages
Thank you for your patience...
I hope that there are NO QUESTIONS !!!!!!!!!