
Learning in the Context of Set Theoretic Estimation:
an Efficient and Unifying Framework for Adaptive Machine Learning and Signal Processing

Sergios Theodoridis¹

a joint work with K. Slavakis (Univ. of Peloponnese, Greece), and I. Yamada (Tokyo Institute of Technology, Japan)

¹ University of Athens, Athens, Greece.

Vienna, April 11th, 2012


“ΟΥΔΕΙΣ ΑΓΕΩΜΕΤΡΗΤΟΣ ΕΙΣΙΤΩ”

(“Those who do not know geometry are not welcome here”)

Plato’s Academy of Philosophy


Part A


Outline of Part A

The set theoretic estimation approach and multiple intersecting closed convex sets.

The fundamental tool of metric projections in Hilbert spaces.

Online classification and regression.

The concept of Reproducing Kernel Hilbert Spaces (RKHS) and nonlinear processing.

Distributed learning in sensor networks.


Machine Learning

Problem Definition

Given

A set of measurements {(x_n, y_n)}_{n=1}^{N}, which are jointly distributed, and

A parametric set of functions F = {f_α(x) : α ∈ A ⊂ R^k}.

Compute an f(·), within F, that best approximates y given the value of x:

y ≈ f(x).

Special Cases

Smoothing, prediction, curve-fitting, regression, classification, filtering, system identification, and beamforming.


The More Classical Approach

Select a loss function L(·, ·) and estimate f(·) so that

f(·) ∈ argmin_{f_α(·) : α ∈ A} ∑_{n=1}^{N} L(y_n, f_α(x_n)).

Drawbacks

Most often, in practice, the choice of the cost is dictated not by physical reasoning but by computational tractability.

The existence of a-priori information in the form of constraints makes the task even more difficult.

The optimization task is solved iteratively, and iterations freeze after a finite number of steps. Thus, the obtained solution lies in a neighborhood of the optimal one.

The stochastic nature of the data and the existence of noise add another uncertainty to the optimality of the obtained solution.


In this talk, we are concerned with finding a set of solutions which are in agreement with all the available information.

This will be achieved in the general context of:

Set theoretic estimation.

Convexity.

Mappings or operators, e.g., projections, and their associated fixed point sets.


Projection onto a Closed Subspace

Theorem
Given a Euclidean space R^m or a Hilbert space H, the projection of a point f onto a closed subspace M is the unique point P_M(f) ∈ M that lies closest to f (Pythagorean theorem).

[Figure: the projection P_M(f) of a point f ∈ H onto the closed subspace M]


Projection onto a Closed Convex Set

Theorem
Let C be a closed convex set in a Hilbert space H. Then, for each f ∈ H, there exists a unique f∗ ∈ C such that

‖f − f∗‖ = min_{g ∈ C} ‖f − g‖ =: d(f, C).

Definition (Metric Projection Mapping)
The projection is the mapping P_C : H → C : f ↦ P_C(f) := f∗.

[Figure: the projection P_C(f) of an exterior point f ∈ R^m (H) onto the convex set C, at distance d(f, C); any point f′ ∈ C is its own projection, f′ = P_C(f′)]


Projection Mappings

Example (Hyperplane H := {g ∈ H : 〈g, a〉 = c})

[Figure: the projection P_H(f) of a point f onto the hyperplane H with normal a, obtained by the correction −((〈f, a〉 − c)/‖a‖²) a]

P_H(f) = f − ((〈f, a〉 − c)/‖a‖²) a, ∀f ∈ H.
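The hyperplane projection above is a one-liner in code. A minimal sketch, assuming H = R^m with NumPy (the function name is my own, for illustration only):

```python
import numpy as np

def project_hyperplane(f, a, c):
    """Project f onto the hyperplane {g : <g, a> = c} in R^m."""
    return f - ((f @ a - c) / (a @ a)) * a

# tiny usage check
f = np.array([3.0, 1.0])
a = np.array([1.0, 1.0])
p = project_hyperplane(f, a, 1.0)   # p @ a == 1.0, i.e., p lies on the hyperplane
```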


Projection Mappings

Example (Halfspace H+ := {g ∈ H : 〈g, a〉 ≥ c})

[Figure: the projection P_{H+}(f) of an exterior point f onto the halfspace H+ with normal a]

P_{H+}(f) = f − (min{0, 〈f, a〉 − c}/‖a‖²) a, ∀f ∈ H.
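The only change from the hyperplane case is the min{0, ·}: points already inside the halfspace are left untouched. A minimal sketch under the same R^m/NumPy assumptions:

```python
import numpy as np

def project_halfspace(f, a, c):
    """Project f onto the halfspace {g : <g, a> >= c}."""
    return f - (min(0.0, f @ a - c) / (a @ a)) * a
```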


Projection Mappings

Example (Closed Ball B[0, δ] := {g ∈ H : ‖g‖ ≤ δ})

[Figure: the projection P_{B[0,δ]}(f) of an exterior point f onto the ball B[0, δ] of radius δ centered at 0]

P_{B[0,δ]}(f) := (δ/max{δ, ‖f‖}) f, ∀f ∈ H.
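The ball projection simply rescales points that are too long. A minimal NumPy sketch (illustrative only):

```python
import numpy as np

def project_ball(f, delta):
    """Project f onto the closed ball {g : ||g|| <= delta} centered at 0."""
    return (delta / max(delta, np.linalg.norm(f))) * f
```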


Projection Mappings

Example (Icecream Cone K := {(f, τ) ∈ H × R : ‖f‖ ≤ τ})

[Figure: the projection P_K((f, τ)) of a point (f, τ) ∈ H × R onto the cone K]

P_K((f, τ)) =
  (f, τ), if ‖f‖ ≤ τ,
  (0, 0), if ‖f‖ ≤ −τ,
  ((‖f‖ + τ)/2) (f/‖f‖, 1), otherwise,
∀(f, τ) ∈ H × R.
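A minimal sketch of the three-branch cone projection, again assuming H = R^m with NumPy (function name mine):

```python
import numpy as np

def project_cone(f, tau):
    """Project (f, tau) onto the second-order cone {(g, t) : ||g|| <= t}."""
    nf = np.linalg.norm(f)
    if nf <= tau:                    # already inside the cone
        return f, tau
    if nf <= -tau:                   # inside the polar cone: project to the apex
        return np.zeros_like(f), 0.0
    scale = (nf + tau) / 2.0         # otherwise, land on the cone's boundary
    return scale * f / nf, scale
```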


Alternating Projections

Composition of Projection Mappings: Let M1 and M2 be closed subspaces in the Hilbert space H. For any f ∈ H, define the sequence of projections:

· · · P_{M2} P_{M1} P_{M2} P_{M1}(f).

[Figure: alternating projections of f between the subspaces M1 and M2]

Theorem ([von Neumann ’33])
For any f ∈ H, lim_{n→∞} (P_{M2} P_{M1})^n(f) = P_{M1 ∩ M2}(f).
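Von Neumann's theorem is easy to check numerically. A minimal sketch with two planes in R^3 whose intersection is the x-axis (the specific subspaces and function names are my own choices):

```python
import numpy as np

def project_subspace(f, A):
    """Orthogonal projection of f onto the column space of A."""
    return A @ np.linalg.lstsq(A, f, rcond=None)[0]

A1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # M1: the x-y plane
A2 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])   # M2: the x-z plane
f = np.array([1.0, 2.0, 3.0])
for _ in range(50):                                    # (P_M2 P_M1)^n (f)
    f = project_subspace(project_subspace(f, A1), A2)
# f converges to P_{M1 ∩ M2}(f) = [1, 0, 0], the projection onto the x-axis
```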


Projections Onto Convex Sets (POCS)

Theorem (POCS¹)
Given a finite number of closed convex sets C1, . . . , Cp, with ⋂_{i=1}^{p} Ci ≠ ∅, let their associated projection mappings be P_{C1}, . . . , P_{Cp}. Then, for any f0 ∈ H, the sequence of points

f_{n+1} := P_{Cp} · · · P_{C1}(f_n), ∀n,

converges weakly to an f∗ ∈ ⋂_{i=1}^{p} Ci.

[Figure: cyclic projections (P_{C1} P_{C2})^n(f_n) between two convex sets C1 and C2, approaching their intersection]

¹ [Bregman ’65], [Gubin, Polyak, Raik ’67].
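A minimal POCS sketch in NumPy: cyclically project onto a halfspace and a ball, two convex sets with a nonempty intersection, so the iterates settle in that intersection. The particular sets and names are illustrative only:

```python
import numpy as np

def p_halfspace(f, a, c):      # projection onto {g : <g, a> >= c}
    return f - (min(0.0, f @ a - c) / (a @ a)) * a

def p_ball(f, delta):          # projection onto {g : ||g|| <= delta}
    return (delta / max(delta, np.linalg.norm(f))) * f

a, c, delta = np.array([1.0, 0.0]), 0.5, 1.0
f = np.array([-3.0, 2.0])
for _ in range(100):           # f_{n+1} = P_{C2} P_{C1} (f_n)
    f = p_ball(p_halfspace(f, a, c), delta)
# f now lies (approximately) in the intersection of the two sets
```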


Extrapolated Parallel Projection Method (EPPM)

EPPM²

Given a finite number of closed convex sets C1, . . . , Cp, with ⋂_{i=1}^{p} Ci ≠ ∅, let their associated projection mappings be P_{C1}, . . . , P_{Cp}. Let also a set of positive constants w1, . . . , wp such that ∑_{i=1}^{p} wi = 1. Then for any f0, the sequence

f_{n+1} = f_n + μ_n ( ∑_{i=1}^{p} w_i P_{Ci}(f_n) − f_n ), ∀n,

(the sum ∑_{i=1}^{p} w_i P_{Ci}(f_n) is a convex combination of projections) converges weakly to a point f∗ in ⋂_{i=1}^{p} Ci, where μ_n ∈ (ǫ, M_n), for ǫ ∈ (0, 1), and

M_n := ( ∑_{i=1}^{p} w_i ‖P_{Ci}(f_n) − f_n‖² ) / ‖ ∑_{i=1}^{p} w_i P_{Ci}(f_n) − f_n ‖².

[Figure: extrapolated parallel projections onto C1 and C2; each iterate f_{n+1} steps from f_n beyond the convex combination of the projections, and the sequence f_n, f_{n+1}, f_{n+2}, . . . moves toward the intersection]

² [Pierra ’84].
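One possible way to realize the EPPM update for two sets (the same halfspace and ball as in the POCS sketch), with equal weights and μ_n chosen inside (ǫ, M_n). This is an illustration of the recursion above, not a reference implementation:

```python
import numpy as np

def p_halfspace(f, a, c):
    return f - (min(0.0, f @ a - c) / (a @ a)) * a

def p_ball(f, delta):
    return (delta / max(delta, np.linalg.norm(f))) * f

a, c, delta, w = np.array([1.0, 0.0]), 0.5, 1.0, np.array([0.5, 0.5])
f = np.array([-3.0, 2.0])
for _ in range(100):
    projs = np.stack([p_halfspace(f, a, c), p_ball(f, delta)])
    combo = w @ projs                               # convex combination of projections
    den = np.sum((combo - f) ** 2)
    M = np.sum(w * np.sum((projs - f) ** 2, axis=1)) / den if den > 1e-12 else 1.0
    f = f + 0.9 * M * (combo - f)                   # mu_n = 0.9 * M_n lies in (eps, M_n)
# f ends up (approximately) in the intersection of the two sets
```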


Page 55:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Infinite Number of Closed Convex Sets

Adaptive Projected Subgradient Method (APSM)3

Given an infinite number of closed convex sets (Cn)n≥0, let their associated projectionmappings be (PCn)n≥0. For any starting point f0, and an integer q > 0, let thesequence

fn+1 = fn + µn

( n∑

j=n−q+1

wjPCj(fn)− fn

)

, ∀n,

where µn ∈ (0, 2Mn), and

Mn :=∑n

j=n−q+1 wj

∥PCj

(fn)−fn

2

nj=n−q+1

wjPCj(fn)−fn

2 .

Under certain constraints the abovesequence converges strongly to apoint f∗ ∈ clos

(⋃

m≥0

n≥m Cn

).

Cn

fn

Cn−1 fn+1

Cn+1

fn+2

³ [Yamada ’03], [Yamada, Ogura ’04], [Slavakis, Yamada, Ogura ’06].
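A minimal sketch of a single APSM update, written generically over a sliding window of q projection mappings passed in as Python callables; here μ_n is taken as a fraction of M_n inside (0, 2M_n). The names are mine and this is only an illustration of the recursion above:

```python
import numpy as np

def apsm_step(f, proj_fns, weights, mu_scale=1.0):
    """One APSM update over the q most recent sets (given as projection callables)."""
    projs = np.stack([P(f) for P in proj_fns])
    combo = weights @ projs                          # weighted combination of projections
    den = np.sum((combo - f) ** 2)
    M = np.sum(weights * np.sum((projs - f) ** 2, axis=1)) / den if den > 1e-12 else 1.0
    return f + mu_scale * M * (combo - f)            # mu_n = mu_scale * M_n, mu_scale in (0, 2)
```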


Application to Machine Learning

The Task
Given a set of training samples {x0, . . . , xN} ⊂ R^m and a set of corresponding desired responses {y0, . . . , yN}, estimate a function f(·) : R^m → R that fits the data.

The Expected / Empirical Risk Function Approach
Estimate f so that the expected risk based on a loss function L(·, ·) is minimized:

min_f E[ L(f(x), y) ],

or, in practice, the empirical risk is minimized:

min_f ∑_{n=0}^{N} L(f(x_n), y_n).


Loss Functions

Example (Classification)
For a given margin ρ ≥ 0, and y_n ∈ {+1, −1}, ∀n, define the soft margin loss function:

L(f(x_n), y_n) := max{0, ρ − y_n f(x_n)}, ∀n.

[Figure: the soft margin loss as a function of y_n f(x_n) ∈ R, decreasing linearly and becoming zero beyond the margin ρ]


Loss Functions

Example (Regression)
The square loss function:

L(f(x_n), y_n) := (y_n − f(x_n))², ∀n.

[Figure: the square loss as a function of y_n − f(x_n)]
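Both loss functions are one-liners; a quick NumPy sketch (names mine, for illustration):

```python
import numpy as np

def soft_margin_loss(fx, y, rho=1.0):
    """max{0, rho - y * f(x)} for labels y in {+1, -1}."""
    return np.maximum(0.0, rho - y * fx)

def square_loss(fx, y):
    """(y - f(x))^2."""
    return (y - fx) ** 2
```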


The Set Theoretic Estimation Approach

Main Idea
The goal here is to have a solution that is in agreement with all the available information, that resides in the data as well as in the available a-priori information.

The Means
Each piece of information, associated with the training pair (x_n, y_n), is represented in the solution space by a set.

Each piece of a-priori information, i.e., each constraint, is also represented by a set.

The intersection of all these sets constitutes the family of solutions.

The family of solutions is known as the feasibility set.


That is, represent each cost and constraint by an equivalent set Cn and find the solution

f ∈ ⋂_n Cn ⊂ H.


Classification: The Soft Margin Loss

The Setting

Let the training data set (x_n, y_n) ⊂ R^m × {+1, −1}, n = 0, 1, . . .

Assume the two-class task,
y_n = +1, x_n ∈ W1,
y_n = −1, x_n ∈ W2.

Assume linearly separable classes.

The Goal

Find f(x) = θᵗx + b, so that
θᵗx_n + b ≥ ρ, if y_n = +1,
θᵗx_n + b ≤ −ρ, if y_n = −1.

Hereafter, the bias is absorbed into the parameter vector: θ ← [θᵗ, b]ᵗ, x_n ← [x_nᵗ, 1]ᵗ.


Set Theoretic Estimation Approach to Classification

The Piece of Information

Find all those θ so that y_n θᵗx_n ≥ ρ, n = 0, 1, . . .

The Equivalent Set

H_n^+ := {θ ∈ R^m : y_n x_nᵗθ ≥ ρ}, n = 0, 1, . . .

[Figure: the halfspace {w ∈ R^m : y_n x_nᵗw ≥ ρ}, bounded by the hyperplane {w : y_n x_nᵗw = ρ} with normal y_n x_n, and the projection P_{H_n^+}(w) of a point w onto it]


The feasibility set

For each pair (x_n, y_n), form the equivalent halfspace H_n^+, and

find θ∗ ∈ ⋂_n H_n^+.

If linearly separable, the problem is feasible.

The Algorithm

Each H_n^+ is a convex set.

Start from an arbitrary initial θ0.

Keep projecting as each H_n^+ is formed:

P_{H_n^+}(θ) = θ − (min{0, 〈θ, y_n x_n〉 − ρ}/‖x_n‖²) y_n x_n.

[Figure: starting from θ_{n−1}, the estimate is projected sequentially onto the halfspaces H_{n−1}^+, H_n^+, H_{n+1}^+, . . . as they arrive, producing θ_n, θ_{n+1}, θ_{n+2}, . . .]
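The resulting online scheme is essentially a sequential projection (perceptron-like) update. A minimal NumPy sketch, assuming the rows of X are already augmented with a constant 1 and that the classes are linearly separable with margin ρ (names mine):

```python
import numpy as np

def classify_by_projections(X, y, rho=0.1, passes=10):
    """Sequentially project theta onto the halfspaces {t : y_n <t, x_n> >= rho}."""
    theta = np.zeros(X.shape[1])
    for _ in range(passes):
        for x_n, y_n in zip(X, y):
            viol = min(0.0, theta @ (y_n * x_n) - rho)   # 0 if already inside H_n^+
            theta = theta - (viol / (x_n @ x_n)) * (y_n * x_n)
    return theta
```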


Algorithmic Solution to Online Classification

θ_{n+1} := θ_n + μ_n ( ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{H_j^+}(θ_n) − θ_n ),

μ_n ∈ (0, 2M_n), and

M_n :=
  ( ∑_{j=n−q+1}^{n} ω_j^{(n)} ‖P_{H_j^+}(θ_n) − θ_n‖² ) / ‖ ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{H_j^+}(θ_n) − θ_n ‖², if ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{H_j^+}(θ_n) ≠ θ_n,
  1, otherwise.

[Figure: at each step, θ_n is projected in parallel onto the q most recent halfspaces, e.g., H_{n−1}^+ and H_n^+; θ_{n+1} is an extrapolated combination of these projections, and the process repeats with H_n^+ and H_{n+1}^+ to give θ_{n+2}]
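A sketch of one way to run the full online classifier: at time n, combine the projections onto the q most recent halfspaces with equal weights and extrapolate with μ_n = M_n. Data handling and names are mine; the loop simply instantiates the recursion above:

```python
import numpy as np

def p_halfspace_cls(theta, x, y, rho):
    """Projection onto {t : y * <t, x> >= rho}."""
    return theta - (min(0.0, theta @ (y * x) - rho) / (x @ x)) * (y * x)

def online_classification(X, y, rho=0.1, q=5):
    theta = np.zeros(X.shape[1])
    for n in range(len(y)):
        lo = max(0, n - q + 1)                       # sliding window of the q newest sets
        projs = np.stack([p_halfspace_cls(theta, X[j], y[j], rho)
                          for j in range(lo, n + 1)])
        w = np.full(len(projs), 1.0 / len(projs))
        combo = w @ projs
        den = np.sum((combo - theta) ** 2)
        M = np.sum(w * np.sum((projs - theta) ** 2, axis=1)) / den if den > 1e-12 else 1.0
        theta = theta + M * (combo - theta)          # mu_n = M_n, inside (0, 2 M_n)
    return theta
```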


Regression

The linear ǫ-insensitive loss function case

L(x) := max{0, |x| − ǫ}, x ∈ R.

[Figure: the ǫ-insensitive loss as a function of x, equal to zero on the interval [−ǫ, ǫ] and growing linearly outside it]


Set Theoretic Estimation Approach to Regression

The Piece of Information
Given (x_n, y_n) ∈ R^m × R, find θ ∈ R^m such that

|θᵗx_n − y_n| ≤ ǫ, ∀n.

The Equivalent Set (Hyperslab)

S_n[ǫ] := {θ ∈ R^m : |θᵗx_n − y_n| ≤ ǫ}, ∀n.

[Figure: the hyperslab S_n[ǫ] in R^m, bounded by the hyperplanes {θ : θᵗx_n − y_n = ǫ} and {θ : θᵗx_n − y_n = −ǫ} with common normal x_n, and the projection P_{S_n[ǫ]}(θ) of a point θ onto it]


Projection onto a Hyperslab

P_{S_n[ǫ]}(θ) = θ + βx_n, ∀θ ∈ R^m,

where

β :=
  (y_n − θᵗx_n − ǫ)/(x_nᵗx_n), if θᵗx_n − y_n < −ǫ,
  0, if |θᵗx_n − y_n| ≤ ǫ,
  −(θᵗx_n − y_n − ǫ)/(x_nᵗx_n), if θᵗx_n − y_n > ǫ.

The feasibility set
For each pair (x_n, y_n), form the equivalent hyperslab S_n[ǫ], and

find θ∗ ∈ ⋂_n S_n[ǫ].
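The hyperslab projection is again a closed-form, three-branch expression. A minimal NumPy sketch (function name mine):

```python
import numpy as np

def project_hyperslab(theta, x, y, eps):
    """Project theta onto the hyperslab {t : |<t, x> - y| <= eps}."""
    r = theta @ x - y
    if r < -eps:
        beta = (-r - eps) / (x @ x)
    elif r > eps:
        beta = (-r + eps) / (x @ x)
    else:
        beta = 0.0
    return theta + beta * x
```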


Algorithm for the Online Regression

Assume weights ω_j^{(n)} ≥ 0 such that ∑_{j=n−q+1}^{n} ω_j^{(n)} = 1. For any θ0 ∈ R^m,

θ_{n+1} := θ_n + μ_n ( ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) − θ_n ), ∀n ≥ 0,

where the extrapolation coefficient μ_n ∈ (0, 2M_n) with

M_n :=
  ( ∑_{j=n−q+1}^{n} ω_j^{(n)} ‖P_{S_j[ǫ]}(θ_n) − θ_n‖² ) / ‖ ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) − θ_n ‖², if ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) ≠ θ_n,
  1, otherwise.
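Putting the pieces together, a sketch of one possible online regression loop: equal weights over the q most recent hyperslabs and μ_n = M_n. The synthetic data and names are mine, for illustration only:

```python
import numpy as np

def project_hyperslab(theta, x, y, eps):
    r = theta @ x - y
    beta = (-r - eps) / (x @ x) if r < -eps else ((-r + eps) / (x @ x) if r > eps else 0.0)
    return theta + beta * x

def online_regression(X, y, eps=0.05, q=8):
    theta = np.zeros(X.shape[1])
    for n in range(len(y)):
        lo = max(0, n - q + 1)
        projs = np.stack([project_hyperslab(theta, X[j], y[j], eps)
                          for j in range(lo, n + 1)])
        w = np.full(len(projs), 1.0 / len(projs))
        combo = w @ projs
        den = np.sum((combo - theta) ** 2)
        M = np.sum(w * np.sum((projs - theta) ** 2, axis=1)) / den if den > 1e-12 else 1.0
        theta = theta + M * (combo - theta)
    return theta

# synthetic usage: recover a linear model from noisy samples
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((500, 3))
y = X @ theta_true + 0.01 * rng.standard_normal(500)
theta_hat = online_regression(X, y)    # close to theta_true
```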


Geometry of the Algorithm

[Figure: θ_n is projected onto the two most recent hyperslabs S_{n−1}[ǫ] and S_n[ǫ]; θ_{n+1} is their extrapolated combination, which is then projected onto S_n[ǫ] and S_{n+1}[ǫ] to produce θ_{n+2}]


Reproducing Kernel Hilbert Spaces (RKHS)

Definition
Consider a Hilbert space H of functions f : R^m → R. Assume there exists a kernel function κ(·, ·) : R^m × R^m → R such that

κ(x, ·) ∈ H, ∀x ∈ R^m,

〈f, κ(x, ·)〉 = f(x), ∀x ∈ R^m, ∀f ∈ H (reproducing property).

Then H is called a Reproducing Kernel Hilbert Space (RKHS).

[Figure: each x ∈ R^m is mapped via κ(·, ·) to the function κ(x, ·) in the space H, which may be infinite-dimensional, dim(H) ≤ ∞]


Properties of the Kernel Function

If such a kernel function exists, then it is a symmetric and positivedefinite kernel; for any real numbers a0, a1, . . . , aN , anyx0,x1, . . .xN ∈ R

m, and any N ,

N∑

i=0

N∑

j=0

aiajκ(xi,xj) ≥ 0.

The reverse is also true. Let

κ(·, ·) : Rm × Rm → R,

be symmetric and positive definite. Then, there exists an RKHS offunctions on R

m, such that κ(·, ·) is a reproducing kernel of H.

Each RKHS is uniquely defined by a κ(·, ·), and each (symmetric)positive definite kernel, κ(·, ·), uniquely defines an RKHS4.

4[Aronszajn ’50]Sergios Theodoridis A Framework for Online Learning Vienna 33 / 97

Properties of the Kernel Function (cntd): The Kernel Trick

The celebrated kernel trick is formulated as follows. Let

    x 7→ κ(x, ·) =: φ(x) ∈ H,
    y 7→ κ(y, ·) =: φ(y) ∈ H.

Then,

    〈φ(x), φ(y)〉 = κ(x, y).

This is an important property, since it leads to an easy, black-box rule which transforms a nonlinear task into a linear one; this is done by the following steps.
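Before moving to those steps, here is a minimal numerical illustration of the kernel trick, assuming NumPy; the explicit feature map φ below is the standard one for the degree-2 polynomial kernel on R², written out purely for illustration (for most kernels, e.g., the Gaussian, φ cannot be written down explicitly, which is exactly why the trick matters).

```python
import numpy as np

def poly_kernel(x, y, d=2):
    # Inhomogeneous polynomial kernel: kappa(x, y) = (x^t y + 1)^d.
    return (x @ y + 1.0) ** d

def phi(x):
    # Explicit feature map for d = 2 and x in R^2 (feature space of dimension 6).
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])
print(poly_kernel(x, y))    # kernel evaluation directly in R^2
print(phi(x) @ phi(y))      # inner product in the feature space: same value
```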

Steps for Kernel Methods

Assume the implicit mapping

    Rm ∋ x 7→ φ(x) ∈ H.

Solve the problem linearly in H.

Use an algorithm that can be cast (modified) in terms of inner products.

Replace inner product computations with kernel ones:

    〈φ(x), φ(y)〉 = κ(x, y).

This is the step that brings the nonlinearity into the modeling.

Kernel Functions Examples

The Gaussian kernel:

    κ(x, y) := exp(−‖x − y‖² / σ²).

[Figure: κ(x, y) plotted as a function of x for fixed y; a bell-shaped curve centered at y, with values in (0, 1].]

The polynomial kernel:

    κ(x, y) := (x^t y + 1)^d.

[Figure: κ(x, y) plotted as a function of x for fixed y; a polynomial curve of degree d.]
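The positive-definiteness property stated earlier can be checked numerically for these kernels: for any points x_0, . . . , x_N and coefficients a_0, . . . , a_N, the double sum equals aᵗKa, where K is the Gram matrix. A minimal sketch, assuming NumPy and the Gaussian kernel with σ = 1:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))      # 20 arbitrary points in R^3
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])   # Gram matrix

a = rng.standard_normal(20)           # arbitrary real coefficients
print(a @ K @ a)                      # sum_i sum_j a_i a_j kappa(x_i, x_j) >= 0
print(np.linalg.eigvalsh(K).min())    # smallest eigenvalue of K (nonnegative up to round-off)
```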

The Representer Theorem

Let a strictly monotone increasing function Ω : [0, ∞) → R, and a (cost) function L : R × R → R ∪ {∞}.
Then, the solution of the task

    min_{f∈H} ∑_{n=0}^{N} L(y_n, f(x_n)) + Ω(‖f‖)

admits a representation of the form:

    f∗ = ∑_{n=0}^{N} a_n κ(x_n, ·).

Example

    L(y_n, f(x_n)) := (y_n − f(x_n))²,
    Ω(‖f‖) := ‖f‖² = 〈f, f〉.
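For the squared-loss example above, the representer theorem turns the search over the (possibly infinite-dimensional) H into a finite linear system for the coefficients a: with Gram matrix K and a regularization weight λ (the example's Ω(‖f‖) = ‖f‖² corresponds to λ = 1; λ is exposed here only as a knob), the minimizer satisfies (K + λI)a = y. A minimal sketch, assuming NumPy and a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def fit_kernel_ridge(X, y, lam=1.0, sigma=1.0):
    # Representer theorem: f*(x) = sum_n a_n kappa(x_n, x), with (K + lam*I) a = y.
    K = np.array([[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X])
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, a, x, sigma=1.0):
    return sum(a_n * gaussian_kernel(x_n, x, sigma) for a_n, x_n in zip(a, X_train))

# Toy usage: fit a nonlinear function from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = fit_kernel_ridge(X, y, lam=0.1)
print(predict(X, a, np.array([1.0])))   # close to sin(1.0)
```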

Regression in RKHS

The Goal
Let the training data set (x_n, y_n) ⊂ Rm × R, n = 0, 1, . . ..

x_n 7→ κ(x_n, ·), which is a function of one variable.

Find f ∈ H such that

    |f(x_n) − y_n| ≤ ǫ, ∀n.

Set Theoretic Estimation Approach to Regression

The Piece of Information
Given (x_n, y_n) ∈ Rm × R, n = 0, 1, 2, . . ., find f ∈ H such that

    |〈f, κ(x_n, ·)〉 − y_n| ≤ ǫ, ∀n.

The Equivalent Set (Hyperslab)

    S_n[ǫ] := {f ∈ H : |〈f, κ(x_n, ·)〉 − y_n| ≤ ǫ}, ∀n.

[Figure: the hyperslab S_n[ǫ] ⊂ H is the region between the hyperplanes {f ∈ H : 〈f, κ(x_n, ·)〉 − y_n = ǫ} and {f ∈ H : 〈f, κ(x_n, ·)〉 − y_n = −ǫ}; a point f outside the slab is mapped to its metric projection P_{S_n[ǫ]}(f).]

Projection onto a Hyperslab

    P_{S_n[ǫ]}(f) = f + β κ(x_n, ·), ∀f ∈ H,

where

    β := (y_n − 〈f, κ(x_n, ·)〉 − ǫ) / κ(x_n, x_n),     if 〈f, κ(x_n, ·)〉 − y_n < −ǫ,
    β := 0,                                             if |〈f, κ(x_n, ·)〉 − y_n| ≤ ǫ,
    β := −(〈f, κ(x_n, ·)〉 − y_n − ǫ) / κ(x_n, x_n),    if 〈f, κ(x_n, ·)〉 − y_n > ǫ.

The feasibility set
For each pair (x_n, y_n), form the equivalent hyperslab S_n[ǫ], and find

    f∗ ∈ ⋂_{n≥n_0} S_n[ǫ].

Algorithm for Online Regression in RKHS

For f_0 ∈ H, execute the following algorithm [Slavakis, Theodoridis, Yamada '09]:

    f_{n+1} := f_n + μ_n ( ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(f_n) − f_n ),    ∀n ≥ 0,

where the extrapolation coefficient μ_n ∈ (0, 2M_n), with

    M_n := ( ∑_{j=n−q+1}^{n} ω_j^{(n)} ‖P_{S_j[ǫ]}(f_n) − f_n‖² ) / ‖∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(f_n) − f_n‖²,
           if ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(f_n) ≠ f_n,
    M_n := 1, otherwise.
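A minimal sketch of the update, assuming NumPy, the Gaussian kernel, and the simplest setting q = 1 (only the current hyperslab is used, in which case M_n = 1 and μ_n ∈ (0, 2)); the estimate is carried as a kernel expansion f_n = ∑_i γ_i κ(x_i, ·), and a new kernel center is appended only when the current slab is violated.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def online_kernel_regression(stream, eps=0.1, mu=1.0, sigma=1.0):
    # stream: iterable of (x_n, y_n); f_0 = 0, f_n kept as sum_i gamma_i kappa(c_i, .).
    centers, gammas = [], []
    for x_n, y_n in stream:
        f_xn = sum(g * gaussian_kernel(c, x_n, sigma) for g, c in zip(gammas, centers))
        err = f_xn - y_n
        # beta of the hyperslab projection P_{S_n[eps]} (previous slide):
        if err < -eps:
            beta = (y_n - f_xn - eps) / gaussian_kernel(x_n, x_n, sigma)
        elif err > eps:
            beta = -(err - eps) / gaussian_kernel(x_n, x_n, sigma)
        else:
            beta = 0.0
        if beta != 0.0:
            centers.append(x_n)
            gammas.append(mu * beta)   # f_{n+1} = f_n + mu * (P_{S_n[eps]}(f_n) - f_n)
    return centers, gammas

# Toy usage: learn y = sin(x) from a stream of noisy samples.
rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=200)
data = [(np.array([x]), np.sin(x) + 0.05 * rng.standard_normal()) for x in xs]
centers, gammas = online_kernel_regression(data)
f_hat = lambda x: sum(g * gaussian_kernel(c, np.array([x])) for g, c in zip(gammas, centers))
print(f_hat(1.0))   # roughly sin(1.0)
```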

Sparsification

As time goes by:

    f_n := ∑_{i=0}^{n−1} γ_i^{(n)} κ(x_i, ·).

Memory and computational load grow unbounded as n → ∞!

To cope with the problem, we additionally constrain the norm of f_n by a predefined δ > 0 [Slavakis, Theodoridis, Yamada '08], [Slavakis, Theodoridis '08]:

    ∀n ≥ 0,    f_n ∈ B[0, δ] := {f ∈ H : ‖f‖ ≤ δ}    (closed ball).

Goal
Thus, we are looking for an estimate f ∈ H such that

    f ∈ B[0, δ] ∩ ( ⋂_{n≥n_0} S_n[ǫ] ).
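Projecting a kernel expansion onto the closed ball B[0, δ] only requires rescaling the coefficient vector whenever ‖f‖ > δ, since ‖f‖² = γᵗKγ with K the Gram matrix of the kernel centers. A minimal sketch, assuming NumPy (the kernel and the numbers are illustrative):

```python
import numpy as np

def project_onto_ball(gammas, centers, delta, kernel):
    # f = sum_i gamma_i kappa(c_i, .), so ||f||^2 = gamma^t K gamma.
    K = np.array([[kernel(ci, cj) for cj in centers] for ci in centers])
    norm_f = np.sqrt(gammas @ K @ gammas)
    if norm_f <= delta:
        return gammas                      # already inside B[0, delta]
    return (delta / norm_f) * gammas       # radial projection: scale the expansion

kernel = lambda x, y: np.exp(-np.sum((x - y) ** 2))
centers = [np.array([0.0]), np.array([1.0]), np.array([2.5])]
gammas = np.array([3.0, -2.0, 1.5])
print(project_onto_ball(gammas, centers, delta=1.0, kernel=kernel))
```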

Geometric Illustration of the Algorithm

[Figure: the estimate f_n is projected onto the hyperslabs S_{n−1}[ǫ] and S_n[ǫ]; the combination of the projections, followed by the projection onto the closed ball B[0, δ] centered at 0, produces f_{n+1}.]

Distributive Learning for Sensor Networks

Problem Definition
In a distributed network, the nodes are tasked to collect information and estimate a parameter of interest. The general concept can be summarized as follows:

    The nodes sense an amount of data from the environment.
    Computations are performed locally in each node.
    Each node transmits the locally obtained estimate to a neighborhood of nodes.
    The goal is to drive the locally computed estimates to converge to the same value. This is known as consensus.

The Diffusion Topology

The most commonly used topology is the diffusion network:

[Figure: a diffusion network of nodes #1–#7; the neighborhoods of nodes #1 and #5 are highlighted.]

Problem Formulation
Let the node set be denoted as N := {1, 2, . . . , N}, and let each node, k, at time, n, have access to the measurements

    y_k(n) ∈ R,    x_{k,n} ∈ Rm.

We assume that there exists a linear system, θ∗, such that

    y_k(n) = x_{k,n}^t θ∗ + v_k(n),

where v_k(n) is the noise.
The task is to estimate the common θ∗.

The Algorithm (node k)

Combine the estimates received from the neighborhood N_k:

    φ_k(n) := ∑_{l∈N_k} c_{k,l}(n + 1) θ_l(n).

Perform the adaptation step [Chouvardas, Slavakis, Theodoridis '11]:

    θ_k(n+1) := φ_k(n) + μ_k(n+1) ( ∑_{j=n−q+1}^{n} ω_{k,j} P_{S_{k,j}}(φ_k(n)) − φ_k(n) ).
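A minimal sketch of one combine-then-adapt iteration at a single node, assuming NumPy; uniform combination weights, a single hyperslab per time instant (q = 1) built from the node's own measurement, and a unit step size are illustrative simplifications, not choices made in the talk.

```python
import numpy as np

def project_hyperslab(theta, x, y, eps):
    # Metric projection of theta onto S = {t in R^m : |x^t t - y| <= eps}.
    err = x @ theta - y
    if abs(err) <= eps:
        return theta
    return theta - (err - np.sign(err) * eps) / (x @ x) * x

def diffusion_step(theta_neighbors, x_k, y_k, eps=0.01, mu=1.0):
    # Combine: uniform average of the estimates received from the neighborhood.
    phi_k = np.mean(theta_neighbors, axis=0)
    # Adapt: move towards the hyperslab defined by the node's own data (q = 1).
    return phi_k + mu * (project_hyperslab(phi_k, x_k, y_k, eps) - phi_k)

# Toy usage: three neighbors' estimates and one local measurement at node k.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
neighbors = [theta_true + 0.3 * rng.standard_normal(3) for _ in range(3)]
x_k = rng.standard_normal(3)
y_k = x_k @ theta_true + 0.01 * rng.standard_normal()
print(diffusion_step(neighbors, x_k, y_k))
```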

The Geometry of the Algorithm

[Figure: node #1 receives the estimates θ_2(n), θ_3(n), θ_6(n) from its neighborhood in Rm and combines them into φ_1(n); φ_1(n) is projected onto the local hyperslabs S_{1,n−1} and S_{1,n}, and the combined projections yield θ_1(n + 1).]

Part B

Outline of Part B

Incorporate a-priori information into our algorithmic framework.

An operator theoretic approach will be followed. Such an approach will be illustrated through two paradigms:

    Beamforming task.
    Sparsity-aware learning problem.

Our objective is to show that a large variety of constrained online learning tasks can be unified under a common umbrella; the Adaptive Projected Subgradient Method (APSM).

The Underlying Concepts: A Mapping and its Fixed Point Set

A mapping defined in a Hilbert space H:

    T : H → H.

Given a mapping T : H → H, its fixed point set is defined as

    Fix(T) := {f ∈ H : T(f) = f},

i.e., all those points which are unaffected by T.

Example
If C is a closed convex set in H, then Fix(P_C) = C.

[Figure: in Rm (H), a point f outside C is mapped to P_C(f) at distance d(f, C) from f, whereas a point f′ ∈ C satisfies f′ = P_C(f′).]

Beamforming

Antenna arrays are vastly utilized for space-time filtering:

[Figure: a planar antenna array of M_1 × M_2 elements with spacings d_1, d_2; each element signal x(m_1, m_2, n) feeds a tapped-delay line (z^{−1}) of length P with weights θ∗(m_1, m_2, 0), . . . , θ∗(m_1, m_2, P − 1); the SOI and the jammers impinge from directions φ_1, φ_2, and the weighted sum forms the output y_n.]

The superscript ∗ stands for complex conjugation.
SOI: Signal Of Interest.

After some re-arrangements, the output of the array is given by

    y_n := θ^t x_n,    n = 0, 1, 2, . . . .

The beamformer is the vector θ.

The Goal of Beamforming
By utilizing all the available a-priori knowledge, reconstruct the SOIs while, at the same time, suppressing the jamming signals.

A-priori information
    Known locations of the SOIs and/or the jammers.
    Robustness against erroneous information and array imperfections:
        Knowledge of the approximate location of the SOIs and jammers.
        Array calibration errors.
        Inoperative array elements.
        Bounds on the weights of the array elements.

Given the previous a-priori info, and the set of data (y_n, x_n), n = 0, 1, 2, . . ., compute θ such that

    θ^t x_n ≈ y_n, ∀n.

Distortionless and Null Constraints

Definition (Steering vector)
Each transmitting source is associated to a steering vector, s, defined as the vector which collects all the signal values in the array if

    only the source of interest transmits a signal of value 1,
    and there is no noise in the system.

Remark: The steering vector comprises information like the location of the associated source, and the geometry of the array.

Distortionless constraint
If s_SOI is the steering vector associated to a SOI, then we would like to have:

    s_SOI^t θ = 1.

Nulls
If s_jam is the steering vector associated to a jammer, then we would like to have:

    s_jam^t θ = 0.

Affinely Constrained Beamforming

A large variety of a-priori knowledge in beamforming problems can be cast by means of affine constraints; given a matrix C and a vector g:

    C^t θ = g.

Example
Let C := [s_SOI, s_jam], and g := [1, 0]^t.

Define the following affine set

    V := argmin_{θ∈Rm} ‖C^t θ − g‖,

which contains, in general, an infinite number of points, and covers also the case of inconsistent a-priori constraints, i.e., the case:

    ∀θ, C^t θ ≠ g.

Projection onto the affine set V

Given V := argmin_{θ∈Rm} ‖C^t θ − g‖, the metric projection mapping onto V is given by

    P_V(θ) = θ − (C^t)†(C^t θ − g),    ∀θ ∈ Rm,

where (·)† denotes the Moore–Penrose pseudoinverse of a matrix.

[Figure: a point θ ∈ Rm and its projection P_V(θ) onto the affine set V.]
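A minimal sketch of P_V, assuming NumPy; the steering vectors below are arbitrary illustrative numbers (real-valued, for simplicity), not array data from the talk.

```python
import numpy as np

def project_affine(theta, C, g):
    # P_V(theta) = theta - (C^t)^dagger (C^t theta - g),  V = argmin ||C^t theta - g||.
    return theta - np.linalg.pinv(C.T) @ (C.T @ theta - g)

# Illustrative constraints: distortionless response towards the SOI, a null towards a jammer.
s_soi = np.array([1.0, 1.0, 1.0, 1.0])      # hypothetical steering vectors in R^4
s_jam = np.array([1.0, -1.0, 1.0, -1.0])
C = np.column_stack([s_soi, s_jam])
g = np.array([1.0, 0.0])

theta = np.zeros(4)
theta_v = project_affine(theta, C, g)
print(C.T @ theta_v)    # approximately [1, 0]: the constraints C^t theta = g are met
```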

Affinely Constrained Algorithm

At time n, given the training data (y_n, x_n), define the hyperslab:

    S_n[ǫ] := {θ ∈ Rm : |x_n^t θ − y_n| ≤ ǫ}.

For any initial point θ_0, and ∀n,

    θ_{n+1} := P_V ( θ_n + μ_n ( ∑_{i=n−q+1}^{n} ω_i^{(n)} P_{S_i[ǫ]}(θ_n) − θ_n ) ),

    μ_n ∈ (0, 2M_n),

    M_n := ( ∑_{j=n−q+1}^{n} ω_j^{(n)} ‖P_{S_j[ǫ]}(θ_n) − θ_n‖² ) / ‖∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) − θ_n‖²,
           if ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) ≠ θ_n,
    M_n := 1, otherwise.
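A minimal sketch of one iteration of this update, assuming NumPy; uniform weights over the q most recent hyperslabs and the choice μ_n = M_n are illustrative, and the toy data at the end (random regressors, constraints built to be consistent with the unknown θ∗) are made up for the demonstration.

```python
import numpy as np

def proj_slab(theta, x, y, eps):
    # Projection onto the hyperslab {t in R^m : |x^t t - y| <= eps}.
    err = x @ theta - y
    if abs(err) <= eps:
        return theta
    return theta - (err - np.sign(err) * eps) / (x @ x) * x

def proj_affine(theta, C, g):
    # Projection onto V = argmin ||C^t theta - g||.
    return theta - np.linalg.pinv(C.T) @ (C.T @ theta - g)

def apsm_step(theta, recent_data, C, g, eps=0.01):
    # One iteration: weighted hyperslab projections, extrapolation by M_n, then P_V.
    q = len(recent_data)
    w = np.full(q, 1.0 / q)                                   # uniform weights (illustrative)
    projs = [proj_slab(theta, x, y, eps) for x, y in recent_data]
    avg = sum(wj * pj for wj, pj in zip(w, projs))
    if np.allclose(avg, theta):
        M_n = 1.0
    else:
        M_n = sum(wj * np.sum((pj - theta) ** 2) for wj, pj in zip(w, projs)) \
              / np.sum((avg - theta) ** 2)
    mu_n = M_n                                                # any value in (0, 2*M_n) is allowed
    return proj_affine(theta + mu_n * (avg - theta), C, g)

# Toy usage: the iterates approach a theta* that satisfies both the data and the constraints.
rng = np.random.default_rng(0)
theta_star = rng.standard_normal(4)
C = rng.standard_normal((4, 2)); g = C.T @ theta_star
theta = np.zeros(4)
for _ in range(200):
    x = rng.standard_normal(4)
    theta = apsm_step(theta, [(x, x @ theta_star)], C, g)
print(np.linalg.norm(theta - theta_star))   # small
```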

Geometry of the Algorithm

[Figure: the estimate θ_n is projected onto the hyperslabs S_{n−1}[ǫ] and S_n[ǫ]; the combination of the projections is then projected onto the affine set V to give θ_{n+1}.]

Robustness in Beamforming: Towards More Elaborated Constrained Learning

Robustness is a key design issue in beamforming.

There are cases, for example, where the location of the SOI is known only approximately.
A mathematical formulation for such a scenario is as follows:

    given the approximate steering vector s̄,
    and a ball of uncertainty B[s̄, ǫ′], of radius ǫ′ around s̄,

[Figure: the true steering vector s lies inside the ball B[s̄, ǫ′] centered at the approximate steering vector s̄.]

    calculate those θ such that, for some user-defined ǫ′′ ≥ 0,

        θ^t s ∈ [1 − ǫ′′, 1 + ǫ′′],    ∀s ∈ B[s̄, ǫ′].

The Icecream Cone

The previous task breaks down to a number of more fundamental problems of the following type; find a vector that belongs to

    Γ := {θ ∈ Rm : θ^t s ≥ γ, ∀s ∈ B[s̄, ǫ′]}
       = {all vectors that satisfy an infinite number of inequalities}.

If Γ ≠ ∅, then the previous problem is equivalent to [Slavakis, Yamada '07], [Slavakis, Theodoridis, Yamada '09]

    finding a point in K ∩ Π,
    K: an icecream cone,
    Π: a hyperplane.

[Figure: the icecream cone K and the hyperplane Π in Rm; their intersection K ∩ Π is related to the set Γ.]

The Complete Picture

Given (x_n, y_n), find a θ ∈ Rm such that

    |θ^t x_n − y_n| ≤ ǫ,

    θ^t s ≥ γ, ∀s ∈ B[s̄, ǫ′]    (Robustness).

Algorithm for Robust Regression

Assume weights ω_j^{(n)} ≥ 0 such that ∑_{j=n−q+1}^{n} ω_j^{(n)} = 1. For any θ_0 ∈ Rm,

    θ_{n+1} := P_Π P_K ( θ_n + μ_n ( ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) − θ_n ) ),    ∀n ≥ 0,

where the extrapolation coefficient μ_n ∈ (0, 2M_n), with

    M_n := ( ∑_{j=n−q+1}^{n} ω_j^{(n)} ‖P_{S_j[ǫ]}(θ_n) − θ_n‖² ) / ‖∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) − θ_n‖²,
           if ∑_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ǫ]}(θ_n) ≠ θ_n,
    M_n := 1, otherwise.
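The composition P_Π P_K calls for the projections onto a hyperplane and onto an icecream cone. The sketch below, assuming NumPy, uses the standard Lorentz (second-order) cone K = {(u, t) : ‖u‖ ≤ t} and a generic hyperplane {θ : a^t θ = b} purely for illustration; the specific cone and hyperplane induced by the robustness constraint of the talk are those of [Slavakis, Yamada '07] and are not reproduced here.

```python
import numpy as np

def proj_lorentz_cone(z):
    # Projection onto K = {(u, t) in R^{m-1} x R : ||u|| <= t}.
    u, t = z[:-1], z[-1]
    nu = np.linalg.norm(u)
    if nu <= t:
        return z                        # already inside the cone
    if nu <= -t:
        return np.zeros_like(z)         # projects onto the apex
    alpha = (nu + t) / 2.0
    return np.concatenate([alpha * u / nu, [alpha]])

def proj_hyperplane(theta, a, b):
    # Projection onto Pi = {theta : a^t theta = b}.
    return theta - (a @ theta - b) / (a @ a) * a

# Composition P_Pi P_K applied to an arbitrary point (a, b are illustrative only).
theta = np.array([3.0, -1.0, 0.5])
a, b = np.array([1.0, 1.0, 1.0]), 1.0
print(proj_hyperplane(proj_lorentz_cone(theta), a, b))
```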

Geometry of the Algorithm

[Figure: θ_n is projected onto the hyperslabs S_{n−1} and S_n; the combination of the projections gives θ′_n, which is projected onto the icecream cone K and then onto the hyperplane Π to produce θ_{n+1}.]

Handling A-Priori Information

How did we handle a-priori information?

    Each piece of a-priori info was represented by a closed convex set, e.g., K, Π.
    In order to be in agreement with all of the pieces of the a-priori info, we looked for a point in the intersection of the closed convex sets, e.g., K ∩ Π.
    In algorithmic terms, we utilized the composition mapping P_Π P_K : Rm → Rm.

This strategy reminds us of POCS:

POCS
Given a finite number of closed convex sets C_1, . . . , C_p, with ⋂_{i=1}^{p} C_i ≠ ∅, let their associated projection mappings be P_{C_1}, . . . , P_{C_p}. Then, ∀θ ∈ Rm,

    (P_{C_p} · · · P_{C_1})^n (θ)  converges weakly, as n → ∞, to some θ∗ ∈ ⋂_{i=1}^{p} C_i.

Key assumption
The a-priori info is consistent, i.e., ⋂_{i=1}^{p} C_i ≠ ∅.
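A minimal POCS sketch, assuming NumPy: cyclic projections onto two closed convex sets with non-empty intersection, here a halfspace and a ball (both chosen only for illustration); the iterates settle at a point that belongs to both sets.

```python
import numpy as np

def proj_halfspace(theta, a, b):
    # Projection onto {t : a^t t <= b}.
    viol = a @ theta - b
    return theta if viol <= 0 else theta - viol / (a @ a) * a

def proj_ball(theta, center, radius):
    # Projection onto B[center, radius].
    d = theta - center
    nd = np.linalg.norm(d)
    return theta if nd <= radius else center + radius * d / nd

a, b = np.array([1.0, 1.0]), 0.5        # halfspace:  x + y <= 0.5
center, radius = np.zeros(2), 1.0       # unit ball centered at the origin
theta = np.array([3.0, 2.0])
for _ in range(100):                    # (P_C2 P_C1)^n applied to theta
    theta = proj_ball(proj_halfspace(theta, a, b), center, radius)
print(theta, a @ theta <= b + 1e-9, np.linalg.norm(theta) <= radius + 1e-9)
```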

Inconsistent A-Priori Information

Is the case of inconsistent a-priori info possible in practice?

Yes, in highly constrained learning tasks; the more constraints we add to the problem, the smaller the intersection of the associated convex sets becomes.

Example: A beamforming problem where there is erroneous info on the SOI and jammer steering vectors, array calibration errors, info on inoperative array elements, and stringent bounds on the weights of the array.

How do we deal with the case of inconsistent a-priori info, i.e., $\bigcap_{i=1}^{p} C_i = \emptyset$?

An Intuitive Suggestion

[Figure: an absolute constraint set K, convex sets C1, C2, . . . , Cp−1, a point θ ∈ K, and the distances d(θ, C1), d(θ, C2), . . . , d(θ, Cp−1).]

Definition (KΦ): All those points of K which minimize a function Φ of the distances $\{d(\cdot, C_i)\}_{i=1}^{p-1}$.

A Way to Handle Inconsistent A-Priori Information

Given a number of a-priori constraints, represented as p closed convex sets,

Identify one of them as the absolute constraint K, and rename the other ones as C1, C2, . . . , Cp−1.

Assign to each Ci a convex weight βi, i.e., βi ∈ (0, 1] and $\sum_{i=1}^{p-1} \beta_i = 1$.

Define the function:
$$\Phi(\theta) := \frac{1}{2} \sum_{i=1}^{p-1} \beta_i\, d^2(\theta, C_i), \quad \forall \theta \in K.$$
Our objective is to look for the minimizers KΦ of this function.

Notice that $\Phi' = I - \sum_{i=1}^{p-1} \beta_i P_{C_i}$.

Define the mapping T : Rm → Rm as
$$T := P_K\Big(I - \lambda\Big(I - \sum_{i=1}^{p-1} \beta_i P_{C_i}\Big)\Big), \quad \lambda \in (0, 2).$$

Then, Fix(T) = KΦ.
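A minimal sketch of the mapping T above, assuming the individual projections PK and PCi are available as functions; Euclidean balls are used below purely as illustrative stand-ins for the constraint sets.

```python
import numpy as np

def project_ball(theta, center, radius):
    """Projection onto the closed ball {x : ||x - center|| <= radius}."""
    d = theta - center
    nrm = np.linalg.norm(d)
    return theta if nrm <= radius else center + (radius / nrm) * d

def make_T(P_K, projections, betas, lam=1.0):
    """T = P_K( I - lam*(I - sum_i beta_i * P_Ci) ), with lam in (0, 2)."""
    def T(theta):
        avg = sum(b * P(theta) for b, P in zip(betas, projections))
        return P_K(theta - lam * (theta - avg))
    return T

# Two non-intersecting balls C1, C2 and an absolute constraint K (illustrative).
P_C1 = lambda th: project_ball(th, np.array([0.0, 0.0]), 1.0)
P_C2 = lambda th: project_ball(th, np.array([4.0, 0.0]), 1.0)
P_K  = lambda th: project_ball(th, np.array([2.0, 1.0]), 2.0)

T = make_T(P_K, [P_C1, P_C2], betas=[0.5, 0.5], lam=1.0)
theta = np.array([6.0, 6.0])
for _ in range(200):          # the iterates approach Fix(T) = K_Phi
    theta = T(theta)
print(theta)
```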

The Algorithm

For any θ0 ∈ Rm,
$$\theta_{n+1} := T\Bigg(\theta_n + \mu_n \Bigg(\sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(\theta_n) - \theta_n\Bigg)\Bigg), \quad \forall n \geq 0,$$
where the extrapolation coefficient $\mu_n \in (0, 2\mathcal{M}_n)$ with
$$\mathcal{M}_n := \begin{cases} \dfrac{\sum_{j=n-q+1}^{n} \omega_j^{(n)} \big\| P_{S_j[\epsilon]}(\theta_n) - \theta_n \big\|^2}{\big\| \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(\theta_n) - \theta_n \big\|^2}, & \text{if } \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(\theta_n) \neq \theta_n, \\[2ex] 1, & \text{otherwise.} \end{cases}$$
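A sketch of a single update of this recursion, assuming (as earlier in the talk) that each Sj[ǫ] is the hyperslab {θ : |xjᵀθ − yj| ≤ ǫ} with a closed-form projection; uniform weights ωj(n) = 1/q and the choice µn = Mn are illustrative.

```python
import numpy as np

def project_hyperslab(theta, x, y, eps):
    """Projection onto S[eps] = {theta : |x^T theta - y| <= eps}."""
    e = x @ theta - y
    if abs(e) <= eps:
        return theta
    return theta - ((e - np.sign(e) * eps) / (x @ x)) * x

def extrapolated_step(theta, window, eps, T=lambda th: th):
    """One step: theta <- T(theta + mu_n * (sum_j w_j P_{S_j}(theta) - theta))."""
    q = len(window)
    w = np.full(q, 1.0 / q)                       # uniform weights (illustrative)
    projs = [project_hyperslab(theta, x, y, eps) for x, y in window]
    combo = sum(wj * p for wj, p in zip(w, projs)) - theta
    if np.allclose(combo, 0.0):
        mu = 1.0
    else:
        num = sum(wj * np.linalg.norm(p - theta) ** 2 for wj, p in zip(w, projs))
        M_n = num / np.linalg.norm(combo) ** 2    # extrapolation bound, M_n >= 1
        mu = M_n                                  # any value in (0, 2*M_n) is admissible
    return T(theta + mu * combo)
```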

Sparsity-Aware Learning

Problem definition: In a number of applications, many of the parameters to be estimated are a-priori known to be zero. That is, the parameter vector, θ, is sparse:
$$\theta^t = [\ast, \ast, 0, 0, 0, \ast, 0, \ldots].$$

If the locations of the zeros were known, the problem would be trivial.

However, the locations of the zeros are not known a-priori. This makes the task challenging.

Typical applications include echo cancellation in Internet telephony, MIMO channel estimation, Compressed Sensing (CS), etc.

Sparsity promotion is achieved via ℓ1-norm regularization of a loss function:
$$\min_{\theta \in \mathbb{R}^m} \sum_{n=0}^{N} \mathcal{L}\big(y_n, \boldsymbol{x}_n^t \theta\big) + \lambda \|\theta\|_1, \quad \lambda > 0.$$

Measuring Sparsity

The ℓ0 norm:
$$\|\theta\|_0 := \mathrm{card}\{i : \theta_i \neq 0\}.$$

[Figure: a sparse parameter vector; only a few components, e.g., θ6, θ9, θ12, θ15, are nonzero.]

Consider the linear model:
$$y_n := \boldsymbol{x}_n^t \theta + v_n, \quad \forall n,$$
where (vn)n≥0 denotes the noise process.

Define $X_N := [\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_N]$, $\boldsymbol{y}_N := [y_0, y_1, \ldots, y_N]^t$, and ǫ ≥ 0.

A typical Compressed Sensing task is formulated as follows:
$$\min_{\theta \in \mathbb{R}^m} \|\theta\|_0 \quad \text{s.t.} \quad \big\| X_N^t \theta - \boldsymbol{y}_N \big\| \leq \epsilon.$$

Alternatives to the ℓ0 Norm

The ℓp norm (0 < p ≤ 1):
$$\|\theta\|_p := \left( \sum_{i=1}^{m} |\theta_i|^p \right)^{\frac{1}{p}}.$$

[Figure: the unit balls in the (θ1, θ2)-plane for p = 0, 0.5, 1, 2, and ∞; as p decreases towards 0, the ball concentrates along the coordinate axes.]
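A small sketch contrasting the ℓ0 count with the ℓ1 and ℓp (0 < p ≤ 1) measures above; the example vector is arbitrary.

```python
import numpy as np

def l0_norm(theta):
    """||theta||_0 = card{i : theta_i != 0}."""
    return int(np.count_nonzero(theta))

def lp_norm(theta, p):
    """||theta||_p = (sum_i |theta_i|^p)^(1/p); a (non-convex) quasi-norm for 0 < p < 1."""
    return float(np.sum(np.abs(theta) ** p) ** (1.0 / p))

theta = np.array([0.0, 2.0, 0.0, -0.5, 0.0, 1.0])
print(l0_norm(theta))        # 3
print(lp_norm(theta, 1.0))   # 3.5 (the l1 norm)
print(lp_norm(theta, 0.5))   # as p -> 0, sum_i |theta_i|^p tends to the l0 count
```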

Algorithm for Sparsity-Aware Learning

The ℓ1-ball case: Given (xn, yn), n = 0, 1, 2, . . ., find θ such that
$$\big|\theta^t \boldsymbol{x}_n - y_n\big| \leq \epsilon, \quad n = 0, 1, 2, \ldots,$$
$$\theta \in B_{\ell_1}[\delta] := \big\{\theta' \in \mathbb{R}^m : \|\theta'\|_1 \leq \delta\big\}.$$

The recursion:
$$\theta_{n+1} := P_{B_{\ell_1}[\delta]}\Bigg(\theta_n + \mu_n \Bigg(\sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(\theta_n) - \theta_n\Bigg)\Bigg).$$
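The recursion requires the metric projection onto Bℓ1[δ]. A minimal sketch of the standard sort-based Euclidean projection onto the ℓ1-ball follows; it is one common way of computing this projection and is not taken from the slides themselves.

```python
import numpy as np

def project_l1_ball(theta, delta):
    """Euclidean projection of theta onto {x in R^m : ||x||_1 <= delta}."""
    abs_t = np.abs(theta)
    if abs_t.sum() <= delta:
        return theta.copy()                       # already inside the ball
    u = np.sort(abs_t)[::-1]                      # magnitudes, descending
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - delta) / j > 0)[0][-1]
    tau = (css[rho] - delta) / (rho + 1.0)        # soft-thresholding level
    return np.sign(theta) * np.maximum(abs_t - tau, 0.0)

print(project_l1_ball(np.array([3.0, 1.0, 0.2]), delta=2.0))  # [ 2.  0.  0.]
```

Note that the projection acts as a data-dependent soft thresholding, which is what enforces sparsity on θn+1.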

Geometric Illustration of the Algorithm

[Figure: starting from θn, the projections PSn[ǫ](θn) and PSn−1[ǫ](θn) onto the hyperslabs Sn[ǫ], Sn−1[ǫ] are combined, and the result is projected onto the ℓ1-ball Bℓ1[δ] to yield θn+1; θ∗ denotes the sought sparse solution.]

Remark: The convergence can be significantly sped up if, in place of the ℓ1-ball, a weighted ℓ1-ball is used to constrain the solutions.

Definition:
$$\|\theta\|_{1,\boldsymbol{w}} := \sum_{i=1}^{m} w_i |\theta_i|, \qquad B_{\ell_1}[\boldsymbol{w}_n, \delta] := \big\{\theta \in \mathbb{R}^m : \|\theta\|_{1,\boldsymbol{w}} \leq \delta\big\}.$$

Time-adaptive weighted norm:
$$w_{n,i} := \frac{1}{|\theta_{n,i}| + \epsilon_n'}.$$

The recursion [Kopsinis, Slavakis, Theodoridis, '11]:
$$\theta_{n+1} := P_{B_{\ell_1}[\boldsymbol{w}_n, \delta]}\Bigg(\theta_n + \mu_n \Bigg(\sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(\theta_n) - \theta_n\Bigg)\Bigg).$$
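A small sketch of the weight update above; the current estimate θn and the small constant ǫ′n below are illustrative values.

```python
import numpy as np

def adaptive_weights(theta_n, eps_prime=1e-3):
    """w_{n,i} = 1/(|theta_{n,i}| + eps'_n): small components receive large weights."""
    return 1.0 / (np.abs(theta_n) + eps_prime)

def weighted_l1_norm(theta, w):
    """||theta||_{1,w} = sum_i w_i * |theta_i|."""
    return float(np.sum(w * np.abs(theta)))

theta_n = np.array([2.0, 0.0, -0.01, 0.0])
w_n = adaptive_weights(theta_n)
print(w_n)                              # nearly-zero components are penalized heavily
print(weighted_l1_norm(theta_n, w_n))
```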

Geometric Illustration of the Algorithm

[Figure: the same construction as before, with the combination of the hyperslab projections PSn[ǫ](θn), PSn−1[ǫ](θn) now followed by a projection onto the weighted ℓ1-ball Bℓ1[wn, δ] to yield θn+1.]

Projecting onto Bℓ1[wn, δ] is equivalent to a specific soft thresholding operation.

Note that our constraint, i.e., the weighted ℓ1-ball, is a time-varying constraint.

[Figure: the ℓ1-ball Bℓ1[δ] together with the weighted balls Bℓ1[wn, δ] and Bℓ1[wn+1, δ]; the weighted balls change from iteration to iteration and adapt their shape around the sparse target θ∗.]

Time Invariant Signal

[Figure: learning curves; the legend includes the TWL and TNWL variants.]

m := 1024, ‖θ∗‖0 := 100 wavelet coefficients. The radius of the ℓ1-ball is set to δ := 101.

Time Varying Signal

[Figure: (left) the wavelet coefficients log10(|αi|) versus the wavelet component index i; (right) the signal samples.]

m := 4096. The radius of the ℓ1-ball is set to δ := 40. The signal is the sum of two chirp signals.

Time Varying Signal

[Figure: learning curves; the legend includes the TWL variant.]

Movies of the OCCD, and the APWL1sub.

Generalized Thresholding

What we have seen so far corresponds to soft thresholding operations.

Hard thresholding:

Identify the K largest, in magnitude, components of a vector θ.

Keep those as they are, while nullifying the rest of them.

Generalized thresholding:

Identify the K largest, in magnitude, components of a vector θ.

Shrink, under some rule, the rest of the components.
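A minimal sketch of the two rules just described; the shrinkage applied to the non-selected components in the generalized case (plain soft shrinkage by a constant) is only one illustrative choice of "some rule".

```python
import numpy as np

def hard_threshold(theta, K):
    """Keep the K largest-magnitude components, set the rest to zero."""
    out = np.zeros_like(theta, dtype=float)
    keep = np.argsort(np.abs(theta))[-K:]
    out[keep] = theta[keep]
    return out

def generalized_threshold(theta, K, lam=0.5):
    """Keep the K largest-magnitude components, shrink the rest (soft rule here)."""
    out = np.asarray(theta, dtype=float).copy()
    rest = np.argsort(np.abs(theta))[:-K]
    out[rest] = np.sign(out[rest]) * np.maximum(np.abs(out[rest]) - lam, 0.0)
    return out

theta = np.array([3.0, -0.4, 0.1, -2.0, 0.7])
print(hard_threshold(theta, K=2))         # [ 3.  0.  0. -2.  0.]
print(generalized_threshold(theta, K=2))  # small components are shrunk, not kept
```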

Examples of Generalized Thresholding Mappings

[Figure: (a) Hard and soft thresholding, and the ridge regression estimate. (b) The SCAD and garrote thresholding rules.]

Mathematical Formulation of Thresholding

Penalized Least-Squares Thresholding:

Identify the K largest, in magnitude, components of a vector θ.

Let θi be one of the remaining components.

In order to shrink θi, solve the optimization task:
$$\min_{\hat{\theta}_i \in \mathbb{R}} \ \frac{1}{2}\big(\hat{\theta}_i - \theta_i\big)^2 + \lambda\, p\big(|\hat{\theta}_i|\big), \quad \lambda > 0,$$
where p(·) stands for a user-defined penalty function, which might be non-convex.

Under some mild conditions, the previous optimization task possesses a unique solution $\hat{\theta}_i^*$.

Definition (Generalized Thresholding Mapping): The Generalized Thresholding mapping is defined as follows:
$$T_{GT} : \theta_i \mapsto \hat{\theta}_i^*.$$
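For the particular convex choice p(|·|) = |·|, the scalar penalized least-squares problem above admits the well-known closed-form solution, namely the soft-thresholding rule; a short sketch:

```python
def penalized_ls_soft(theta_i, lam):
    """argmin_u 0.5*(u - theta_i)^2 + lam*|u|, i.e., the case p(|u|) = |u|."""
    if theta_i > lam:
        return theta_i - lam
    if theta_i < -lam:
        return theta_i + lam
    return 0.0

# Components with |theta_i| <= lam are nulled, the rest are shrunk towards zero.
print([penalized_ls_soft(t, lam=1.0) for t in (-3.0, -0.5, 0.2, 2.5)])
# [-2.0, 0.0, 0.0, 1.5]
```

Other penalties p(·), such as the SCAD or the garrote, lead to the alternative shrinkage rules shown in the previous figure.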

Fixed Point Set of TGT

Given K, define the set of all tuples of length K:
$$\mathcal{T} := \big\{(i_1, i_2, \ldots, i_K) : 1 \leq i_1 < i_2 < \ldots < i_K \leq m\big\}.$$

Given a tuple J ∈ T, define the subspace:
$$M_J := \big\{\theta \in \mathbb{R}^m : \theta_i = 0, \ \forall i \notin J\big\}.$$

Then, the fixed point set of TGT is a union of subspaces:
$$\mathrm{Fix}\big(T_{GT}\big) = \bigcup_{J \in \mathcal{T}} M_J \quad \text{(a non-convex set)}.$$

Example: For the 3-dimensional case R3, and if K := 2,
$$\mathrm{Fix}\big(T_{GT}\big) = xy\text{-plane} \cup yz\text{-plane} \cup xz\text{-plane}.$$

First Steps Towards a Unifying Framework

Definition (Nonexpansive Mapping): A mapping T : H → H is called nonexpansive if
$$\|T(f_1) - T(f_2)\| \leq \|f_1 - f_2\|, \quad \forall f_1, f_2 \in \mathcal{H}.$$
The fixed point set of a nonexpansive mapping is closed and convex.

Example (Projection Mapping): [Figure: for any f1, f2 ∈ H and any closed convex set C, ‖PC(f1) − PC(f2)‖ ≤ ‖f1 − f2‖.] Moreover, Fix(PC) = C.

Quasi-nonexpansive Mapping

Definition (Quasi-nonexpansive Mapping): A mapping T : H → H, with Fix(T) ≠ ∅, is called quasi-nonexpansive if
$$\|T(f) - h\| \leq \|f - h\|, \quad \forall f \in \mathcal{H}, \ \forall h \in \mathrm{Fix}(T).$$
The fixed point set of T is convex.

[Figure: a point f ∈ H, its image T(f), and a point h ∈ Fix(T); the image satisfies ‖T(f) − h‖ ≤ ‖f − h‖.]

Every nonexpansive mapping is quasi-nonexpansive.

Projecting onto arbitrary separating hyperplanes generates a quasi-nonexpansive mapping which is not nonexpansive.

[Figure: two points f1, f2 ∈ H are projected onto different hyperplanes separating them from Fix(T); each image moves closer to Fix(T), yet ‖T(f1) − T(f2)‖ can exceed ‖f1 − f2‖.]

The Subgradient

Definition (Subgradient): Given a convex function Θ : H → R, a subgradient Θ′(f) is an element of H such that
$$\langle g - f, \Theta'(f) \rangle + \Theta(f) \leq \Theta(g), \quad \forall g \in \mathcal{H}.$$
In other words, the hyperplane $\big\{\big(g, \langle g - f, \Theta'(f)\rangle + \Theta(f)\big) : g \in \mathcal{H}\big\}$ supports the graph of Θ at the point (f, Θ(f)).

Definition (Level set):
$$\mathrm{lev}_{\leq 0}(\Theta) := \{f \in \mathcal{H} : \Theta(f) \leq 0\}.$$

[Figure: the graph of Θ, a supporting hyperplane at (f, Θ(f)), and the level set lev≤0(Θ).]

The Subgradient Projection Mapping
A quasi-nonexpansive mapping

Definition (Subgradient projection mapping): Let Θ : H → R be a convex function with lev≤0(Θ) ≠ ∅. Then, the subgradient projection mapping TΘ : H → H is defined as follows:
$$T_\Theta(f) := \begin{cases} f - \dfrac{\Theta(f)}{\|\Theta'(f)\|^2}\, \Theta'(f), & \text{if } f \notin \mathrm{lev}_{\leq 0}(\Theta), \\[2ex] f, & \text{if } f \in \mathrm{lev}_{\leq 0}(\Theta). \end{cases}$$
The mapping TΘ is quasi-nonexpansive.

[Figure: a point f outside lev≤0(Θ) is projected onto the hyperplane generated by the subgradient at f, yielding TΘ(f), which lies closer to lev≤0(Θ).]
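A sketch of TΘ for one concrete convex choice, Θ(θ) = max(0, |xᵀθ − y| − ǫ), whose zero-level set is the hyperslab S[ǫ] used throughout; the function and data below are illustrative, and any valid subgradient may be used at points of non-differentiability.

```python
import numpy as np

def Theta(theta, x, y, eps):
    """Theta(theta) = max(0, |x^T theta - y| - eps); lev_{<=0}(Theta) = S[eps]."""
    return max(0.0, abs(x @ theta - y) - eps)

def Theta_sub(theta, x, y):
    """A subgradient of Theta at points where Theta(theta) > 0."""
    return np.sign(x @ theta - y) * x

def T_Theta(theta, x, y, eps):
    """Subgradient projection mapping for the Theta above."""
    val = Theta(theta, x, y, eps)
    if val <= 0.0:
        return theta                        # already inside the level set
    g = Theta_sub(theta, x, y)
    return theta - (val / (g @ g)) * g      # f - Theta(f)/||Theta'(f)||^2 * Theta'(f)

x, y, eps = np.array([1.0, 2.0]), 1.0, 0.1
theta = np.array([2.0, 2.0])
theta_new = T_Theta(theta, x, y, eps)
print(theta_new, Theta(theta_new, x, y, eps))   # the image lands on lev_{<=0}(Theta)
```

For this particular Θ, TΘ coincides with the metric projection onto the hyperslab; for a general convex loss, TΘ only projects onto a separating hyperplane, which is typically much cheaper than projecting onto the level set itself.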

Generalizing the Previous Methodologies
Adaptive Projected Subgradient Method (APSM)

For some user-defined α ∈ (0, 1), λ ∈ (0, 2), and any initial point f0 ∈ H,

APSM:
$$f_{n+1} := \big(\alpha R_n + (1-\alpha) I\big)\,\big((1-\lambda) I + \lambda T_{\Theta_n}\big)(f_n), \quad \forall n,$$
where the first factor, Tn := αRn + (1 − α)I, is an averaged quasi-nonexpansive mapping carrying the a-priori info, and the second factor, (1 − λ)I + λTΘn, is the relaxed subgradient projection TΘn carrying the loss function info. Moreover,

(Rn)n=0,1,... is a sequence of quasi-nonexpansive mappings. This sequence of mappings encodes the a-priori information.

(Θn)n=0,1,... is a sequence of loss/penalty functions which quantify the deviation of the sequential training data from the underlying model.
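A generic sketch of one APSM iteration as the composition above, assuming Rn is realized by a projection onto a convex constraint set and TΘn by the subgradient projection of the previous sketch; all concrete choices are placeholders.

```python
def apsm_step(f_n, R_n, T_Theta_n, alpha=0.5, lam=1.0):
    """f_{n+1} = (alpha*R_n + (1-alpha)*I)( (1-lam)*I + lam*T_Theta_n )(f_n)."""
    g = (1.0 - lam) * f_n + lam * T_Theta_n(f_n)   # relaxed subgradient projection
    return alpha * R_n(g) + (1.0 - alpha) * g      # averaged a-priori mapping

# Illustrative wiring: R_n enforces sparsity via the l1-ball projection and
# T_Theta_n handles the n-th datum via the hyperslab subgradient projection:
#   f = apsm_step(f, lambda t: project_l1_ball(t, delta),
#                 lambda t: T_Theta(t, x_n, y_n, eps))
```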

Page 340:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θn)n=0,1,...

Given the current estimate fn, define ∀f ∈ H,

Θn(f) :=

∑ni=n−q+1

ω(n)i d(fn,Si[ǫ])

∑nj=n−q+1 ω

(n)j d(fn,Sj [ǫ])

d(f, Si[ǫ]),

if f /∈⋂n

i=n−q+1 Si[ǫ],

0, otherwise.

Sergios Theodoridis A Framework for Online Learning Vienna 91 / 97

Page 341:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θn)n=0,1,...

Given the current estimate fn, define ∀f ∈ H,

Θn(f) :=

∑ni=n−q+1

ω(n)i d(fn,Si[ǫ])

∑nj=n−q+1 ω

(n)j d(fn,Sj [ǫ])

d(f, Si[ǫ]),

if f /∈⋂n

i=n−q+1 Si[ǫ],

0, otherwise.

Then, the APSM becomes: ∀n,

Sergios Theodoridis A Framework for Online Learning Vienna 91 / 97

Page 342:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θn)n=0,1,...

Given the current estimate fn, define ∀f ∈ H,

Θn(f) :=

∑ni=n−q+1

ω(n)i d(fn,Si[ǫ])

∑nj=n−q+1 ω

(n)j d(fn,Sj [ǫ])

d(f, Si[ǫ]),

if f /∈⋂n

i=n−q+1 Si[ǫ],

0, otherwise.

Then, the APSM becomes: ∀n,

fn + µn

n∑

i=n−q+1

ω(n)i PSi[ǫ](fn)− fn

,

Sergios Theodoridis A Framework for Online Learning Vienna 91 / 97

Page 343:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θn)n=0,1,...

Given the current estimate fn, define ∀f ∈ H,

Θn(f) :=

∑ni=n−q+1

ω(n)i d(fn,Si[ǫ])

∑nj=n−q+1 ω

(n)j d(fn,Sj [ǫ])

d(f, Si[ǫ]),

if f /∈⋂n

i=n−q+1 Si[ǫ],

0, otherwise.

Then, the APSM becomes: ∀n,

(αRn + (1− α)I

)

︸ ︷︷ ︸

Tn: averagedquasi-nonexpansive mapping

fn + µn

n∑

i=n−q+1

ω(n)i PSi[ǫ](fn)− fn

,

Sergios Theodoridis A Framework for Online Learning Vienna 91 / 97

Page 344:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θn)n=0,1,...

Given the current estimate fn, define ∀f ∈ H,

Θn(f) :=

∑ni=n−q+1

ω(n)i d(fn,Si[ǫ])

∑nj=n−q+1 ω

(n)j d(fn,Sj [ǫ])

d(f, Si[ǫ]),

if f /∈⋂n

i=n−q+1 Si[ǫ],

0, otherwise.

Then, the APSM becomes: ∀n,

fn+1 =(αRn + (1− α)I

)

︸ ︷︷ ︸

Tn: averagedquasi-nonexpansive mapping

fn + µn

n∑

i=n−q+1

ω(n)i PSi[ǫ](fn)− fn

,

Sergios Theodoridis A Framework for Online Learning Vienna 91 / 97

Page 345:  · 2012-05-02 · The More Classical Approach Select a loss function L(·,·) and estimate f(·) so that f(·) ∈argminf α(·): α∈A XN n=1 L yn,fα(xn). Drawbacks Most often,

Candidates for the Loss Functions (Θ_n)_{n=0,1,...}

Given the current estimate f_n, define, for all f ∈ H,

\[
\Theta_n(f) :=
\begin{cases}
\displaystyle\sum_{i=n-q+1}^{n} \frac{\omega_i^{(n)}\, d(f_n, S_i[\epsilon])}{\sum_{j=n-q+1}^{n} \omega_j^{(n)}\, d(f_n, S_j[\epsilon])}\, d(f, S_i[\epsilon]), & \text{if } f \notin \bigcap_{i=n-q+1}^{n} S_i[\epsilon],\\[2mm]
0, & \text{otherwise.}
\end{cases}
\]

Then, the APSM becomes: for all n,

\[
f_{n+1} = \underbrace{\bigl(\alpha R_n + (1-\alpha)I\bigr)}_{\substack{T_n:\ \text{averaged}\\ \text{quasi-nonexpansive mapping}}}
\left( f_n + \mu_n \left( \sum_{i=n-q+1}^{n} \omega_i^{(n)} P_{S_i[\epsilon]}(f_n) - f_n \right) \right),
\]

where the extrapolation coefficient μ_n ∈ (0, 2M_n), with

\[
M_n :=
\begin{cases}
\dfrac{\sum_{j=n-q+1}^{n} \omega_j^{(n)} \bigl\| P_{S_j[\epsilon]}(f_n) - f_n \bigr\|^2}{\bigl\| \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(f_n) - f_n \bigr\|^2}, & \text{if } \sum_{j=n-q+1}^{n} \omega_j^{(n)} P_{S_j[\epsilon]}(f_n) \neq f_n,\\[2mm]
1, & \text{otherwise.}
\end{cases}
\]
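A minimal sketch of this concrete update follows, under the illustrative assumption of a linear regression model in which each property set is the hyperslab S_i[ε] = {w ∈ R^m : |y_i − wᵀx_i| ≤ ε}. The function names (project_hyperslab, apsm_update) are hypothetical, `weights` holds the q convex coefficients ω_i^(n) (summing to one), and R defaults to the identity, i.e., no a-priori constraint.

```python
# Hypothetical illustration of the explicit APSM update with q concurrently processed
# hyperslabs (not code from the talk).
import numpy as np

def project_hyperslab(w, x, y, eps):
    """Metric projection of w onto S[eps] = {v : |y - v.T @ x| <= eps}."""
    r = y - np.dot(w, x)
    if r > eps:
        return w + ((r - eps) / np.dot(x, x)) * x
    if r < -eps:
        return w + ((r + eps) / np.dot(x, x)) * x
    return w

def apsm_update(w_n, batch, eps, weights, R=lambda w: w, alpha=0.5, mu_scale=1.0):
    """One APSM step; `batch` holds the q most recent (x_i, y_i) pairs and `weights`
    is a NumPy array of convex combination coefficients omega_i^(n)."""
    proj = np.array([project_hyperslab(w_n, x, y, eps) for (x, y) in batch])
    combo = weights @ proj                       # sum_i omega_i P_{S_i}(w_n)
    diff = combo - w_n
    if np.dot(diff, diff) > 1e-24:
        # Extrapolation bound M_n from the slide.
        M_n = np.sum(weights * np.sum((proj - w_n) ** 2, axis=1)) / np.dot(diff, diff)
    else:
        M_n = 1.0
    mu_n = mu_scale * M_n                        # any mu_n in (0, 2*M_n)
    inner = w_n + mu_n * diff
    return alpha * R(inner) + (1.0 - alpha) * inner
```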


Candidates for the Averaged Quasi-nonexpansive Mappings T_n

Example (Examples of averaged quasi-nonexpansive mappings)

The projection P_C onto a closed convex set C of H.

The projection P_V onto an affine set V of R^m (beamforming).

The projection P_{B_{ℓ1}[δ]} onto the ℓ1 ball (sparsity-aware learning); see the projection sketch after this list.

The projections (P_{B_{ℓ1}[w_n, δ]})_{n=0,1,...} onto a sequence of weighted ℓ1 balls (sparsity-aware learning).

The composition of projections P_{C_1} · · · P_{C_p}, where C_1, . . . , C_p are closed convex sets with ⋂_{i=1}^{p} C_i ≠ ∅.

The composition of projections P_Π P_K, where Π is a hyperplane and K is an ice-cream cone (beamforming).

The composition P_K (I − λ(I − ∑_{i=1}^{p−1} β_i P_{C_i})), λ ∈ (0, 2), where K ∩ (⋂_{i=1}^{p−1} C_i) = ∅ (beamforming).
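As an illustration of one of the listed a-priori mappings, here is a minimal sketch of the metric projection onto the ℓ1 ball B_{ℓ1}[δ], using the well-known sort-based algorithm; this is an assumption of ours, not code from the talk.

```python
# Hypothetical sketch: Euclidean projection onto the l1 ball {v : ||v||_1 <= delta}.
import numpy as np

def project_l1_ball(w, delta):
    """Projection of a NumPy vector w onto the l1 ball of radius delta."""
    if np.sum(np.abs(w)) <= delta:
        return w                                    # already inside the ball
    u = np.sort(np.abs(w))[::-1]                    # magnitudes in descending order
    cum = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, w.size + 1) > (cum - delta))[0][-1]
    tau = (cum[rho] - delta) / (rho + 1.0)          # soft-thresholding level
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)
```

Projection onto a weighted ℓ1 ball, used for sparsity-aware learning, proceeds along similar lines, with the weights entering the sorting and thresholding steps.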


The Trip Still Goes on and the Topic Still Grows...

Surprisingly, the APSM retains its performance and theoretical properties in the case where the Generalized Thresholding mapping T_GT is used in the place of T_n!

Recall that Fix(T_GT) is a union of subspaces, which is a non-convex set.

Such an application motivates the extension of the concept of a quasi-nonexpansive mapping to that of a partially quasi-nonexpansive one^10.

10 [Kopsinis et al. '11a].
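To see why such a fixed-point set is a union of subspaces, consider a plain hard-thresholding stand-in; this is an illustration of ours, and the actual generalized thresholding mapping of [Kopsinis et al. '11a] is more general.

```python
# Hypothetical sketch: a hard-thresholding mapping whose fixed points are exactly the
# K-sparse vectors, i.e., a (non-convex) union of K-dimensional subspaces.
import numpy as np

def hard_threshold(w, K):
    """Keep the K largest-magnitude entries of w and zero the rest."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-K:]   # indices of the K largest magnitudes
    out[keep] = w[keep]
    return out
```

Any vector with at most K nonzero entries is left unchanged, so the fixed-point set is {w : ||w||_0 ≤ K}; plugging such a mapping in place of T_n is what motivates the partially quasi-nonexpansive notion.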


Theoretical Properties

Define, at n ≥ 0, Ω_n := Fix(T_n) ∩ lev_{≤0} Θ_n. Let Ω := ⋂_{n≥n_0} Ω_n ≠ ∅, for some nonnegative integer n_0. Assume also that μ_n/M_n ∈ [ε_1, 2 − ε_2], ∀n ≥ n_0, for some sufficiently small ε_1, ε_2 > 0. Under the addition of some mild assumptions, the following statements hold true^11.

Monotone approximation. d(f_{n+1}, Ω) ≤ d(f_n, Ω), ∀n ≥ n_0.

Asymptotic minimization. lim_{n→∞} Θ_n(f_n) = 0.

Cluster points. If we assume that the set of all sequential strong cluster points S((f_n)_{n=0,1,...}) is nonempty, then

\[
S\bigl((f_n)_{n=0,1,\ldots}\bigr) \subset \limsup_{n\to\infty} \operatorname{Fix}(T_n) \,\cap\, \limsup_{n\to\infty} \operatorname{lev}_{\le 0}(\Theta_n),
\]

where lim sup_{n→∞} A_n := ⋂_{r>0} ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} (A_k + B[0, r]), and B[0, r] is a closed ball of center 0 and radius r.

Strong convergence. Assume that there exists a hyperplane Π ⊂ H such that ri_Π(Ω) ≠ ∅. Then, there exists an f_* ∈ H such that lim_{n→∞} f_n = f_*.

11 [Slavakis, Yamada, '11].
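A toy, self-contained numerical check of the monotone-approximation and asymptotic-minimization claims, for the simplest configuration q = 1 and R_n = I (so each step reduces to a projection onto the newest hyperslab); the scenario and all names below are our own illustration, not material from the slides. Since the noise is bounded by ε, the true vector lies in every hyperslab, so its distance to the iterates can never increase (Fejér monotonicity), and the per-step loss d(w_n, S_n[ε]) tends to zero.

```python
# Hypothetical toy check of the convergence claims for q = 1, R_n = I.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
eps = 0.01
w = np.zeros(3)
prev_dist = np.linalg.norm(w - w_true)
for n in range(5000):
    x = rng.standard_normal(3)
    y = np.dot(w_true, x) + rng.uniform(-eps, eps)   # noise stays inside the slab
    r = y - np.dot(w, x)
    if abs(r) > eps:                                  # outside S_n[eps]: project onto it
        w = w + ((r - np.sign(r) * eps) / np.dot(x, x)) * x
    dist = np.linalg.norm(w - w_true)
    assert dist <= prev_dist + 1e-12                  # distance to a point of Ω never grows
    prev_dist = dist
print("final distance to w_true:", prev_dist)
```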


Bibliography

K. Slavakis, P. Bouboulis, and S. Theodoridis. Adaptive multiregression in reproducing kernel Hilbert spaces: the multiaccess MIMO channel case. IEEE Trans. Neural Networks and Learning Systems, 23(2):260–276, Feb. 2012.

Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin. Generalized thresholding sparsity-aware online learning in a union of subspaces, 2011. http://arxiv.org/abs/1112.0665.

K. Slavakis and I. Yamada. The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning, 2011. http://arxiv.org/abs/1008.5231.

S. Theodoridis, K. Slavakis, and I. Yamada. Adaptive learning in a world of projections: a unifying framework for linear and nonlinear classification and regression tasks. IEEE Signal Processing Magazine, 28(1):97–123, 2011.

Y. Kopsinis, K. Slavakis, and S. Theodoridis. Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls. IEEE Trans. Signal Processing, 59(3):936–952, 2011.

S. Chouvardas, K. Slavakis, and S. Theodoridis. Adaptive robust distributed learning in diffusion sensor networks. IEEE Trans. Signal Processing, 59(10):4692–4707, 2011.


K. Slavakis, S. Theodoridis, and I. Yamada. Adaptive constrained learning in reproducing kernel Hilbert spaces: the robust beamforming case. IEEE Trans. Signal Processing, 57(12):4744–4764, Dec. 2009.

K. Slavakis, S. Theodoridis, and I. Yamada. Online kernel-based classification using adaptive projection algorithms. IEEE Trans. Signal Processing, 56(7), Part 1:2781–2796, July 2008.

K. Slavakis and I. Yamada. Robust wideband beamforming by the hybrid steepest descent method. IEEE Trans. Signal Processing, 55(9):4511–4522, Sept. 2007.

K. Slavakis, I. Yamada, and N. Ogura. The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings. Numerical Functional Analysis and Optimization, 27(7&8):905–930, 2006.

I. Yamada and N. Ogura. Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions. Numerical Functional Analysis and Optimization, 25(7&8):593–617, 2004.

I. Yamada, K. Slavakis, and K. Yamada. An efficient robust adaptive filtering algorithm based on parallel subgradient projection techniques. IEEE Trans. Signal Processing, 50(5):1091–1101, May 2002.


Matlab code: http://users.uop.gr/~slavakis/publications.htm
