The Dynamics of Learning Vector Quantization
RUG, 10.01.2005

Michael Biehl, Anarta Ghosh (Rijksuniversiteit Groningen, Mathematics and Computing Science)
Barbara Hammer (TU Clausthal-Zellerfeld, Institute of Computing Science)


Introduction
- prototype-based learning from example data: representation, classification
- Vector Quantization (VQ)
- Learning Vector Quantization (LVQ)

The dynamics of learning
- a model situation: randomized data
- learning algorithms for VQ and LVQ
- analysis and comparison: dynamics, success of learning

Summary

Outlook

Vector Quantization (VQ)

aim: representation of large amounts of data by (few) prototype vectors

example: identification and grouping in clusters of similar data

assignment of feature vector $\xi$ to the closest prototype $w$
(similarity or distance measure, e.g. Euclidean distance)

unsupervised competitive learning

• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example

intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function ...

quantization error

$H_{VQ} = \sum_{j=1}^{K} \sum_{\mu=1}^{P} (\xi^\mu - w_j)^2 \prod_{k \neq j} \Theta(d_k^\mu - d_j^\mu)$

- sum over $j$: prototypes
- sum over $\mu$: data
- product of $\Theta$ functions: $w_j$ is the winner
- here: Euclidean distance, $d_j^\mu = (\xi^\mu - w_j)^2$

aim: faithful representation (in general $\neq$ clustering)

Result depends on
- the number of prototype vectors
- the distance measure / metric used
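The competitive learning steps and the quantization error can be sketched as follows. This is a minimal illustration, not the authors' code: the toy data, the initial prototypes, the step size `eta` and all function names are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def winner(prototypes, xi):
    """Index j of the prototype w_j closest to the example xi."""
    return min(range(len(prototypes)), key=lambda j: sq_dist(xi, prototypes[j]))

def vq_step(prototypes, xi, eta):
    """Competitive learning: move only the winner towards xi."""
    j = winner(prototypes, xi)
    prototypes[j] = [w + eta * (x - w) for w, x in zip(prototypes[j], xi)]
    return j

def quantization_error(prototypes, data):
    """H_VQ: every example contributes its squared distance to the winner."""
    return sum(sq_dist(xi, prototypes[winner(prototypes, xi)]) for xi in data)

rng = random.Random(0)
# Toy data (assumption): two 2-d clusters around (0, 0) and (3, 3).
data = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)] for c in (0.0, 3.0) for _ in range(200)]
prototypes = [[0.5, -0.5], [2.5, 3.5]]      # K = 2, deliberately off-center
h_before = quantization_error(prototypes, data)
for _ in range(20):                          # 20 sweeps in random order
    for xi in rng.sample(data, len(data)):
        vq_step(prototypes, xi, eta=0.05)
h_after = quantization_error(prototypes, data)
print(h_before, h_after)
```

With a small constant step size the prototypes settle near the two cluster means and keep fluctuating around them, which is why the quantization error drops but does not reach its exact minimum.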

Learning Vector Quantization (LVQ)

aim: classification of data, learning from examples

Learning: choice of prototypes according to example data

example situation: 3 classes, 3 prototypes

classification: assignment of a vector $\xi$ to the class of the closest prototype $w$

aim: generalization ability, i.e. correct classification of novel data after training

prominent example [Kohonen]: "LVQ 2.1"

• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example

known convergence / stability problems, e.g. for infrequent classes

mostly: heuristically motivated variations of competitive learning
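A single step of the procedure listed above might look like the following sketch. It follows the simplified description on the slide (closest correct prototype attracted, closest wrong prototype repelled); the window rule of Kohonen's original LVQ 2.1 is left out, and the prototypes, labels and step size are made-up values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(prototypes, labels, xi):
    """Nearest-prototype classification: label of the closest prototype."""
    j = min(range(len(prototypes)), key=lambda k: sq_dist(xi, prototypes[k]))
    return labels[j]

def lvq21_step(prototypes, labels, xi, sigma, eta):
    """Attract the closest correct prototype, repel the closest wrong one."""
    def closest(indices):
        return min(indices, key=lambda k: sq_dist(xi, prototypes[k]))
    jc = closest([k for k, s in enumerate(labels) if s == sigma])
    jw = closest([k for k, s in enumerate(labels) if s != sigma])
    prototypes[jc] = [w + eta * (x - w) for w, x in zip(prototypes[jc], xi)]
    prototypes[jw] = [w - eta * (x - w) for w, x in zip(prototypes[jw], xi)]
    return jc, jw

# Hypothetical setup: 3 classes, one prototype per class.
prototypes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
labels = ["a", "b", "c"]
xi, sigma = [0.8, 0.2], "a"                  # one labelled example
d_before = [sq_dist(xi, w) for w in prototypes]
jc, jw = lvq21_step(prototypes, labels, xi, sigma, eta=0.1)
d_after = [sq_dist(xi, w) for w in prototypes]
print(jc, jw, d_before, d_after)
```

The repulsive part of the step is what later causes the stability problems: nothing bounds how far a wrong prototype can be pushed away.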

LVQ algorithms ...

- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain

[figure: healthy cells and damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms ...

- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:

analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples

typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning

aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

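model situation: two clusters of N-dimensional data

random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma P(\xi | \sigma)$, with $P(\xi | \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} (\xi - \ell B_\sigma)^2 \right]$

orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$

prior weights of classes $p_+, p_-$, with $p_+ + p_- = 1$

separation $\ell$: clusters centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)

independent components:
$\langle \xi_j \rangle_\sigma = \ell (B_\sigma)_j$
$\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$, so that $\langle \xi^2 \rangle_\sigma \approx N$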
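Sampling from this model can be sketched in a few lines. The choice $B_+ = e_0$, $B_- = e_1$ (the first two unit vectors) is an assumption made for the sketch; any orthonormal pair would do. The function name `sample` and the parameter values are illustrative.

```python
import random

def sample(N, ell, p_plus, rng):
    """Draw (xi, sigma): sigma = +1 with probability p_plus, else -1;
    xi has independent unit-variance Gaussian components around ell * B_sigma."""
    sigma = 1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    # Assumption for the sketch: B_+ = e_0 and B_- = e_1 (orthonormal by construction).
    xi[0 if sigma == 1 else 1] += ell
    return xi, sigma

rng = random.Random(1)
N, ell, p_plus = 50, 1.0, 0.6
examples = [sample(N, ell, p_plus, rng) for _ in range(2000)]
plus = [xi for xi, sigma in examples if sigma == 1]

frac_plus = len(plus) / len(examples)                  # estimate of p_+
mean_y_plus = sum(xi[0] for xi in plus) / len(plus)    # estimate of <B_+ . xi>_+ = ell
mean_y_minus = sum(xi[1] for xi in plus) / len(plus)   # estimate of <B_- . xi>_+ = 0
mean_sq = sum(sum(x * x for x in xi) for xi, _ in examples) / len(examples)
print(frac_plus, mean_y_plus, mean_y_minus, mean_sq / N)
```

The last printed number illustrates that $\xi^2 / N$ stays close to 1: the cluster structure lives in a two-dimensional subspace and is invisible in the overall length of the vectors.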

high-dimensional data (formally $N \to \infty$)

400 examples $\xi^\mu \in \mathbb{R}^N$, N = 200, ℓ = 1, p+ = 0.6 (240 and 160 examples per class)

[figure, left: projections into the plane of center vectors $B_+, B_-$: $y_\pm = B_\pm \cdot \xi^\mu$]
[figure, right: projections in two independent random directions $w_1, w_2$: $x_{1,2} = w_{1,2} \cdot \xi^\mu$]

Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

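dynamics of on-line training

sequence of independent random data $\{\xi^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\xi)$

update of prototype vectors:

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s[d_s^\mu, d_{-s}^\mu, S, \sigma^\mu, \ldots] (\xi^\mu - w_s^{\mu-1})$

with $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$ and $S, \sigma^\mu = \pm 1$ (class label of prototype $w_s$ and of the example)

- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data

above examples:

- unsupervised Vector Quantization: $f_s = \Theta(d_{-s}^\mu - d_s^\mu)$
  "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s = S \sigma^\mu$, i.e. $+1$ for the correct class, $-1$ for the wrong class
  here: two prototypes, no explicit competition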

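mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
$Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes
($s, t, \sigma \in \{+1, -1\}$)

random vector $\xi^\mu$ enters only in the form of
- projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$
- distances: $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$

recursions, obtained from the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s[\ldots] (\xi^\mu - w_s^{\mu-1})$:

$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$

$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + O(1/N)$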
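The step from the prototype update to recursions for the characteristic quantities can be checked numerically for a single example. The sketch below assumes $B_+ = e_0$, $B_- = e_1$ and the LVQ 2.1 choice $f_s = S\sigma$; it verifies the recursion for one $R_{s\tau}$ and for $Q_{ss}$, where the $\eta^2$ term appears with the exact factor $d_s / N$ that tends to 1 for large N. Variable names are illustrative.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(2)
N, eta = 100, 0.5
# Assumed orthonormal centers for the sketch: B_+ = e_0, B_- = e_1.
B = {+1: [1.0] + [0.0] * (N - 1), -1: [0.0, 1.0] + [0.0] * (N - 2)}
w = {s: [rng.gauss(0.0, 0.1) for _ in range(N)] for s in (+1, -1)}
xi, sigma = [rng.gauss(0.0, 1.0) for _ in range(N)], +1

s, tau = -1, +1                      # prototype w_-, projected on B_+
f_s = s * sigma                      # LVQ 2.1 choice: +1 correct, -1 wrong class
R_old, Q_old = dot(w[s], B[tau]), dot(w[s], w[s])
x, y = dot(w[s], xi), dot(B[tau], xi)              # projections before the step
d = sum((a - b) ** 2 for a, b in zip(xi, w[s]))    # squared distance d_s

# One on-line step: w_s <- w_s + (eta / N) * f_s * (xi - w_s)
w[s] = [wi + (eta / N) * f_s * (a - wi) for wi, a in zip(w[s], xi)]

lhs_R = N * (dot(w[s], B[tau]) - R_old)
rhs_R = eta * f_s * (y - R_old)
lhs_Q = N * (dot(w[s], w[s]) - Q_old)
rhs_Q = 2.0 * eta * f_s * (x - Q_old) + eta ** 2 * f_s ** 2 * d / N
print(lhs_R, rhs_R, lhs_Q, rhs_Q)
```

Both pairs agree up to rounding, which is the content of the reduction: the N-dimensional vectors never have to be tracked explicitly once $x_s$, $y_\sigma$, $R_{s\sigma}$ and $Q_{st}$ are known.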

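2. average over the current example

random vector acc. to $P(\xi^\mu | \sigma)$ enters only through $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$:
correlated Gaussian random quantities in the thermodynamic limit $N \to \infty$,
completely specified in terms of first and second moments (w/o indices $\mu$):

$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell R_{s\sigma}$

$\langle y_\tau \rangle_\sigma = \ell$ if $\tau = \sigma$, $0$ else

$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$

$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$

$\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$

averaged recursions closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$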

3. self-averaging properties

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: # of examples, # of learning steps per degree of freedom

recursions $\to$ coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$: evolution of projections

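5. learning curve

probability for misclassification of a novel example:

$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$

$\varepsilon_g = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell (R_{\sigma\sigma} - R_{-\sigma\sigma})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$, with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} e^{-x^2/2}$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples ($N \to \infty$)

investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$, e.g. maximize the decrease $-d\varepsilon_g / d\alpha$
- ...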
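For the model with unit-variance clusters, the misclassification probability follows from the Gaussian statistics of $x_+ - x_-$ and can be evaluated directly from the characteristic quantities. The sketch below is one way to write this down; the dictionary layout for `R` and `Q` and the function names are assumptions, and the example places the prototypes exactly on the cluster centers.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t,
    with s, t, sigma in {+1, -1}; clusters have unit variance."""
    spread = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])  # |w_+ - w_-|
    eps = 0.0
    for sigma, p in ((1, p_plus), (-1, 1.0 - p_plus)):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p * Phi(num / (2.0 * spread))
    return eps

# Example: prototypes exactly on the cluster centers, w_s = ell * B_s.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, ell, p_plus=0.5)
print(eps)   # equals Phi(-ell / sqrt(2)) for this symmetric configuration
```

Combined with the differential equations for $R_{s\sigma}(\alpha)$ and $Q_{st}(\alpha)$, a function of this kind turns a trajectory of characteristic quantities into a learning curve.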

optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with

$p_+ P(\xi | \sigma = +1) = p_- P(\xi | \sigma = -1)$

[figure: clusters at $B_+$ (weight $p_+$) and $B_-$ (weight $p_- > p_+$) with the optimal plane; excess error]

[figure: minimal $\varepsilon_g$ as a function of the prior weight $p_+$ (0 to 1) for ℓ = 0, ℓ = 1, ℓ = 2; $\varepsilon_g$ axis from 0 to 0.5]
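The minimal generalization error for this model can be computed in closed form by projecting onto the direction $B_+ - B_-$, along which the two centers are a distance $\sqrt{2}\,\ell$ apart. The following sketch is a worked version of that standard two-Gaussian calculation, not code from the talk; the function name `optimal_error` is made up.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(ell, p_plus):
    """Minimal eps_g for two unit-variance Gaussians centered at ell * B_+ and
    ell * B_- (orthonormal B), with prior weights p_plus and 1 - p_plus."""
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0                        # only one class present
    delta = math.sqrt(2.0) * ell          # distance between the two centers
    if delta == 0.0:
        return min(p_plus, p_minus)       # identical clusters: guess the frequent class
    # The optimal plane is shifted from the midpoint towards the rarer class.
    theta = math.log(p_minus / p_plus) / delta
    return (p_plus * Phi(theta - delta / 2.0)
            + p_minus * Phi(-theta - delta / 2.0))

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(optimal_error(ell, p), 3) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

For ℓ = 0 the result reduces to $\min\{p_+, p_-\}$; larger separations lower the whole curve, and the curve is symmetric under $p_+ \leftrightarrow p_-$.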

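"LVQ 2.1": update the correct and wrong winner

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} S \sigma^\mu (\xi^\mu - w_s^{\mu-1})$

[Seo, Obermayer]: LVQ 2.1 $\leftrightarrow$ cost function (likelihood ratios)

(analytical) integration for $w_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$):

$R_{++} = \ell \frac{1+m}{2m} \left( 1 - e^{-\eta m \alpha} \right)$, $R_{+-} = -\ell \frac{1-m}{2m} \left( 1 - e^{-\eta m \alpha} \right)$

$R_{-+} = \ell \frac{1+m}{2m} \left( 1 - e^{+\eta m \alpha} \right)$, $R_{--} = -\ell \frac{1-m}{2m} \left( 1 - e^{+\eta m \alpha} \right)$

for $\alpha \to \infty$: the $R$ and $Q$ involving the repelled prototype diverge, with ratios of these quantities remaining finite

[figure: theory and simulation (N = 100), p+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs; characteristic quantities vs. $\alpha$ from 0 to 10, vertical axis from -6 to 6]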
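Averaging the LVQ 2.1 update with $f_s = S\sigma$ over the model density gives, for the projections, the linear equation $dR_{S\tau}/d\alpha = \eta S (\ell \tau p_\tau - m R_{S\tau})$ with $m = p_+ - p_-$. The sketch below integrates this equation with a plain Euler scheme and compares it with the exponential solution for $R(0) = 0$. The derivation and the function names are a reconstruction under the model assumptions above, not taken verbatim from the slides.

```python
import math

def lvq21_R_closed(S, tau, alpha, eta, ell, p_plus):
    """Closed form R_{S tau}(alpha) for w_s(0) = 0, assuming p_plus != 1/2."""
    m = 2.0 * p_plus - 1.0                    # p_+- = (1 +- m) / 2
    p_tau = p_plus if tau == 1 else 1.0 - p_plus
    return ell * tau * p_tau / m * (1.0 - math.exp(-S * eta * m * alpha))

def lvq21_R_euler(S, tau, alpha, eta, ell, p_plus, steps=100000):
    """Euler integration of dR/dalpha = eta * S * (ell * tau * p_tau - m * R)."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == 1 else 1.0 - p_plus
    R, h = 0.0, alpha / steps
    for _ in range(steps):
        R += h * eta * S * (ell * tau * p_tau - m * R)
    return R

# Parameters of the figure: p+ = 0.8, ell = 1, eta = 0.5, alpha up to 10.
for S in (+1, -1):
    for tau in (+1, -1):
        print(S, tau,
              lvq21_R_closed(S, tau, 10.0, 0.5, 1.0, 0.8),
              lvq21_R_euler(S, tau, 10.0, 0.5, 1.0, 0.8))
```

The projections of the prototype of the more frequent class saturate, while those of the other prototype grow like $e^{\eta m \alpha}$: this is the instability of the algorithm in its simplest form.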

problem: instability of the algorithm due to repulsion of wrong prototypes

[figure: clusters with weights $p_+ > p_-$ and the diverging prototype]

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

strategies:

- selection of data in a window close to the current decision boundary
  slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]
  density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1-step only if the example is currently misclassified
  slow learning, poor generalization

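I) LVQ 1 [Kohonen]: "The winner takes it all"

only the winner is updated, according to the class membership:

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \Theta(d_{-s}^\mu - d_s^\mu) \, S \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$

- $\Theta(d_{-s}^\mu - d_s^\mu)$: $w_s$ is the winner
- $S \sigma^\mu = \pm 1$: towards or away from the example

numerical integration for $w_s(0) = 0$

[figure: theory and simulation (N = 200), p+ = 0.2, ℓ = 1.2, η = 1.2, averaged over 100 indep. runs; $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$]

[figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane relative to the centers $\ell B_+$, $\ell B_-$; (•) $\alpha$ = 20, 40, ..., 140; lines mark the optimal decision boundary and the asymptotic position]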
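A single LVQ 1 step for the two-prototype case can be sketched as follows. Prototypes are stored in a dictionary keyed by their class label $S = \pm 1$; this layout, the function name and the numbers in the example are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_step(w, xi, sigma, eta):
    """LVQ 1 for two prototypes w[+1], w[-1]: only the winner is updated,
    towards xi if its label equals sigma, away from xi otherwise."""
    N = len(xi)
    s = min((+1, -1), key=lambda k: sq_dist(xi, w[k]))   # winner (ties go to +1)
    direction = s * sigma                                 # +1 correct, -1 wrong
    w[s] = [wi + (eta / N) * direction * (x - wi) for wi, x in zip(w[s], xi)]
    return s

# Example in N = 2 dimensions: the example lies close to w[+1].
xi = [0.9, 0.2]

w_wrong = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
s1 = lvq1_step(w_wrong, xi, sigma=-1, eta=0.2)    # winner has the wrong label: repelled

w_right = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
s2 = lvq1_step(w_right, xi, sigma=+1, eta=0.2)    # winner has the correct label: attracted

print(s1, w_wrong[+1], s2, w_right[+1])
```

In contrast to LVQ 2.1, the prototype that loses the competition is left untouched, so a prototype is only repelled by examples it currently claims.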

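LVQ 1: learning curve (p+ = 0.2, ℓ = 1.2)

[figure: learning curves $\varepsilon_g(\alpha)$ for $\alpha$ up to 300 and learning rates η = 2.0, 0.4, 0.2; $\varepsilon_g$ axis from 0.14 to 0.26]

- role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- variable rate $\eta(\alpha)$ with $\eta \to 0$
- well-defined asymptotics (ODE linear in $\eta$): $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$

[figure: $\varepsilon_g$ as a function of $(\eta \alpha)$ from 0 to 50, compared with $\min \varepsilon_g$; $\varepsilon_g$ axis from 0.14 to 0.26; the asymptotic result is suboptimal]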

II) LVQ+ (only positive steps, without repulsion): "The winner takes it all"

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \Theta(d_{-s}^\mu - d_s^\mu) \, \delta_{S, \sigma^\mu} \, (\xi^\mu - w_s^{\mu-1})$

- $\Theta(d_{-s}^\mu - d_s^\mu)$: $w_s$ is the winner
- $\delta_{S, \sigma^\mu}$: update only if the winner is correct

LVQ+ $\approx$ VQ within the classes ($w_S$ updated only from class $S$)

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$

[figure: $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; p+ = 0.2, ℓ = 1.2, η = 1.2]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

asymptotics: $\eta \to 0$, $(\eta \alpha) \to \infty$

[figure: asymptotic $\varepsilon_g$ as a function of $p_+$, compared with the optimal classification]

- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here: close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

[figure: learning curves $\varepsilon_g(\alpha)$ for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]

Vector Quantization: competitive learning

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \Theta(d_{-s}^\mu - d_s^\mu) \, (\xi^\mu - w_s^{\mu-1})$

- $\Theta(d_{-s}^\mu - d_s^\mu)$: $w_s$ is the winner
- class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ (p+ = 0.2, ℓ = 1.0, η = 1.2)

[figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ from 0 to 300]

[figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]

system is invariant under exchange of the prototypes: weakly repulsive fixed points

interpretations:

- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[figure: asymptotics ($\eta \to 0$): $\varepsilon_g$ as a function of $p_+$]

for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Perspectives

• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i (w_{s,i} - \xi_i)^2$, with the $\lambda_i$ determined in training

• applications
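The relevance-weighted distance mentioned for Generalized Relevance LVQ is a small change to the squared Euclidean distance. The sketch below only evaluates the distance for fixed weights; how the relevances $\lambda_i$ are adapted during training is not shown, and the function name and numbers are illustrative.

```python
def relevance_distance(w, xi, lam):
    """d_lambda(w, xi) = sum_i lam_i * (xi_i - w_i)^2."""
    return sum(l * (x - wi) ** 2 for l, x, wi in zip(lam, xi, w))

w, xi = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
uniform = relevance_distance(w, xi, [1.0, 1.0, 1.0])    # plain squared Euclidean distance
weighted = relevance_distance(w, xi, [0.5, 0.5, 0.0])   # third feature ignored
print(uniform, weighted)
```

Setting a relevance to zero removes the corresponding feature from the winner competition, which is how such a metric can cope with heterogeneous or noisy features.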

Page 3: The Dynamics of Learning Vector Quantization

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ)

aim

representation of large amounts

of data by (few) prototype vectors

example

identification and grouping

in clusters of similar data

assignment of feature vector to the closest prototype w

(similarity or distance measure

eg Euclidean distance )

The Dynamics of Learning Vector Quantization RUG 10012005

unsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototype ie the so-called winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) on-line gradient descent with respect to

the cost function

The Dynamics of Learning Vector Quantization RUG 10012005

quantization error

μj

μk

K

jk

P

1μj

μK

1jVQ ddΘ

2 wξH

μjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors - the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)

aim

classification of data

learning from examples

Learning choice of prototypes according to example data

example situtation

3 classes

classification

assignment of a vector to the class of the closest

prototype w

3 prototypes

aim generalization ability ie correct

classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors (for different classes)

bull identify the closest correct and the closest wrong prototype

bull move the corresponding winner towards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristically motivated variations of competitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding - develop efficient LVQ schemes - test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situation two clusters of N-dimensional data

random vectors isin ℝN according to σ)P(p )P(1σ

σ ξξ

2σN2

-2

1exp

1σ)P( Βξξ mixture of two Gaussians

orthonormal center vectors

B+ B- isin ℝN ( B )2 =1 B+ B- =0

prior weights of classes p+ p-

p+ + p- = 1

B+

B-

(p+)

(p-)

separation ℓ ℓ

jj Bσσξ

22222 Nξ1ξξN

1σσ

j

jjj ξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Ninfin)

400 examples ξμ isinℝN N=200 ℓ=1 p+=06μ

B

(240)(160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

(240)(160)

projections in two independent random directions w12

μ 11x ξw

model for studying typical behavior of LVQ algorithmsnot density-estimation based classificationNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of on-line training

sequence of independent random data 123μμ ξ acc to μP ξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervised Vector Quantization dd f μs

μss

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo σS fs)(1)(1 classcorrect

classwrong

here two prototypes no explicit competition

1-μs

μμs-

μss

1-μs

μs σSddf

N

ηwξww

21

μs

μμsd

1σS

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

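mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma \quad \text{(projections in the } (B_+, B_-)\text{-plane)}, \qquad Q_{st}^\mu = w_s^\mu \cdot w_t^\mu \quad \text{(length and relative position of prototypes)}, \qquad s, t, \sigma \in \{+1, -1\}$$

random vector $\xi^\mu$ enters only in the form of

- projections: $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$, $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$
- distances: $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$

recursions, obtained from $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s \left( \xi^\mu - w_s^{\mu-1} \right)$:

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$

$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + O(1/N)$$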
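The reduction from N-dimensional vectors to scalar quantities can be checked numerically. The sketch below performs one update step on explicit vectors and, separately, on the scalars R and Q only. It keeps the exact second-order term in Q, which reduces to the $\eta^2 f_s f_t$ contribution when $\xi^2 / N \approx 1$. All numerical values are arbitrary choices made for the check.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
n, eta = 50, 0.8
b = [1.0 if j == 0 else 0.0 for j in range(n)]   # one unit-length center vector B_sigma
w_s = [rng.gauss(0.0, 0.1) for _ in range(n)]
w_t = [rng.gauss(0.0, 0.1) for _ in range(n)]
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
f_s, f_t = 1.0, -1.0                             # arbitrary modulation values for the check

# scalar quantities before the update
R, Q = dot(w_s, b), dot(w_s, w_t)
y, x_s, x_t, xi2 = dot(b, xi), dot(w_s, xi), dot(w_t, xi), dot(xi, xi)

# direct update of the N-dimensional prototype vectors
w_s_new = [w + eta / n * f_s * (x - w) for w, x in zip(w_s, xi)]
w_t_new = [w + eta / n * f_t * (x - w) for w, x in zip(w_t, xi)]

# the same step expressed through scalar quantities only
R_rec = R + eta / n * f_s * (y - R)
Q_rec = (Q + eta / n * (f_s * (x_t - Q) + f_t * (x_s - Q))
         + (eta / n) ** 2 * f_s * f_t * (xi2 - x_s - x_t + Q))

R_direct = dot(w_s_new, b)
Q_direct = dot(w_s_new, w_t_new)
```

Both routes agree up to floating-point rounding, which illustrates why the example vector enters the dynamics only through its projections and its squared length.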

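2. average over the current example

random vector $\xi^\mu$ according to $P(\xi^\mu \mid \sigma^\mu)$: in the thermodynamic limit $N \to \infty$ the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments (without indices $\mu$):

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$

averaged recursions are closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$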
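As a plausibility check of the first two moments, the following sketch estimates the mean and variance of a projection $x = w \cdot \xi$ by simple Monte Carlo sampling from one cluster. The prototype `w` is an arbitrary fixed vector, the center is again taken as a coordinate axis, and the sample size is an illustrative choice.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(2)
n, ell, samples = 20, 1.0, 4000
b_plus = [1.0 if j == 0 else 0.0 for j in range(n)]
w = [rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]   # arbitrary prototype, w.w of order 1
R, Q = dot(w, b_plus), dot(w, w)

xs = []
for _ in range(samples):
    # example from class sigma = +1: unit-variance noise around ell * B_+
    xi = [ell * c + rng.gauss(0.0, 1.0) for c in b_plus]
    xs.append(dot(w, xi))

mean_x = sum(xs) / samples                                # should be close to ell * R
var_x = sum((x - mean_x) ** 2 for x in xs) / samples      # should be close to Q
```

The estimates approach $\ell R$ and $Q$ as the number of samples grows, in line with the moments listed above.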

3. self-averaging properties

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: number of examples (learning steps) per degree of freedom

recursions → coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$: evolution of projections

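5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$$

$$\varepsilon_g = p_+ \, \Phi\left[ \frac{Q_{++} - Q_{--} - 2\ell \left( R_{++} - R_{-+} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_- \, \Phi\left[ \frac{Q_{--} - Q_{++} - 2\ell \left( R_{--} - R_{+-} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right], \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms:

- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions:

- time-dependent learning rate $\eta(\alpha)$
- variational optimization with respect to $f_s[\ldots]$, i.e. maximize the decrease of the generalization error, $-d\varepsilon_g / d\alpha$
- ...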
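For unit-variance clusters, the generalization error can be evaluated directly from the characteristic quantities. The sketch below assumes the Gaussian form of $\varepsilon_g$ that follows from the first and second moments of the projections. The dictionary layout for `R` and `Q` and the function names are hypothetical conventions chosen here.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t, with indices +1 / -1.
    Assumes unit-variance clusters and w_+ != w_- (non-zero denominator)."""
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    prior = {1: p_plus, -1: 1.0 - p_plus}
    eps = 0.0
    for sigma in (1, -1):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += prior[sigma] * phi(num / norm)
    return eps

# usage: prototypes placed exactly at the cluster centers, w_s = ell * B_s
ell = 1.0
R_centers = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q_centers = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps_centers = generalization_error(R_centers, Q_centers, ell, 0.5)
```

With prototypes at the centers and equal priors, the expression reduces to $\Phi(-\ell/\sqrt{2})$, the error of the perpendicular bisector between two unit-variance clusters at distance $\ell\sqrt{2}$.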

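optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with $p_+ P(\xi \mid \sigma = +1) = p_- P(\xi \mid \sigma = -1)$

[Figure: clusters around $B_+$ (weight $p_+$) and $B_-$ (weight $p_- > p_+$) with the optimal decision plane; the excess error of another plane is indicated]

[Figure: minimal $\varepsilon_g$ (axis 0 to 0.50) as a function of the prior weight $p_+$ (0 to 1.00), for $\ell = 0$, $\ell = 1$ and $\ell = 2$]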
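The minimal error can be computed in closed form for this model. The sketch below rests on one derivation step that is an assumption of this example rather than a formula from the slides: the problem is projected onto the unit vector $(B_+ - B_-)/\sqrt{2}$, where the two classes become one-dimensional unit-variance Gaussians with means $\pm \ell/\sqrt{2}$. The function name `optimal_error` is hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal generalization error for two unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0                      # only one class present
    a = ell / math.sqrt(2.0)            # class means at +a and -a along (B_+ - B_-)/sqrt(2)
    if a == 0.0:
        return min(p_plus, p_minus)     # identical clusters: always guess the frequent class
    # threshold z* where p_+ P(z | +) = p_- P(z | -); decide class +1 for z > z*
    z_star = math.log(p_minus / p_plus) / (2.0 * a)
    return p_plus * phi(z_star - a) + p_minus * (1.0 - phi(z_star + a))

errors = {ell: optimal_error(0.2, ell) for ell in (0.0, 1.0, 2.0)}
```

For $\ell = 0$ the result is $\min\{p_+, p_-\}$, and larger separations give smaller errors, matching the qualitative shape of the curves described for the figure.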

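"LVQ 2.1": update the correct and wrong winner

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, S \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

(analytical) integration for $w_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$): closed-form solutions $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$ containing the factors $e^{\pm \eta m \alpha}$

for $\alpha \to \infty$: characteristic quantities $R$, $Q$ diverge, while certain combinations of them remain finite

[Figure: characteristic quantities $R$, $Q$ (axis from -6 to 6) versus $\alpha$ (0 to 10); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]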
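As a hedged illustration of the analytical integration, the projections $R$ can be worked out from the averaged update. The derivation below is reconstructed from the model definitions (moments $\langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$, continuous time $\alpha = \mu/N$, prototype label $S$), not copied from the slide, and it uses $m = p_+ - p_-$, which is equivalent to $p_\pm = (1 \pm m)/2$.

```latex
\begin{align*}
\frac{dR_{S\tau}}{d\alpha}
  &= \eta \, S \sum_{\sigma = \pm 1} p_\sigma \, \sigma
     \left( \langle y_\tau \rangle_\sigma - R_{S\tau} \right)
   = \eta \, S \left( \ell \, \tau \, p_\tau - m \, R_{S\tau} \right),
   \qquad m = p_+ - p_- , \\[4pt]
R_{S\tau}(\alpha)
  &= \frac{\ell \, \tau \, p_\tau}{m} \left( 1 - e^{- S \eta m \alpha} \right)
   = \tau \, \ell \, \frac{1 + \tau m}{2m} \left( 1 - e^{- S \eta m \alpha} \right)
   \qquad \text{for } R_{S\tau}(0) = 0 .
\end{align*}
```

Under these assumptions the projections of the prototype with $S = +1$ saturate at $\ell \, \tau \, p_\tau / m$, whereas those of the prototype with $S = -1$ grow like $e^{\eta m \alpha}$ for $m > 0$, which is consistent with the instability of LVQ 2.1 noted in the talk.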

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: clusters with weights $p_-$ and $p_+ > p_-$]

strategies:

- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Robust Soft Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

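"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner $w_s$ is updated, according to the class membership ($S \sigma^\mu = \pm 1$)

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) S \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

numerical integration for $w_s(0) = 0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs]

[Figure: trajectories of $w_+$ and $w_-$ in the $(B_+, B_-)$-plane, axes $R_{S+}$ and $R_{S-}$, with cluster centers $\ell B_+$ and $\ell B_-$; (•) mark $\alpha = 20, 40, \ldots, 140$; lines indicate the optimal decision boundary and the asymptotic position]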
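A small simulation can illustrate the LVQ 1 rule in the two-cluster model. This is a sketch under stated assumptions, not the simulation code behind the figures: the centers are chosen as coordinate axes, ties at the very first step are resolved in favor of the prototype with label +1, and the system size is kept small so that the example runs quickly.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_lvq1(n, ell, p_plus, eta, alpha_max, seed=0):
    """On-line LVQ 1 with two prototypes w[+1], w[-1] started at zero.
    Returns the projections R[(s, sigma)] = w_s . B_sigma after alpha_max * n steps."""
    rng = random.Random(seed)
    centers = {1: [1.0 if j == 0 else 0.0 for j in range(n)],
               -1: [1.0 if j == 1 else 0.0 for j in range(n)]}
    w = {1: [0.0] * n, -1: [0.0] * n}
    for _ in range(int(alpha_max * n)):             # learning time alpha = mu / N
        sigma = 1 if rng.random() < p_plus else -1
        xi = [ell * c + rng.gauss(0.0, 1.0) for c in centers[sigma]]
        d = {s: sum((x - v) ** 2 for x, v in zip(xi, w[s])) for s in (1, -1)}
        s = 1 if d[1] <= d[-1] else -1              # winner; ties go to +1 (arbitrary choice)
        step = eta / n * s * sigma                  # attract if labels agree, repel otherwise
        w[s] = [v + step * (x - v) for v, x in zip(w[s], xi)]
    return {(s, t): dot(w[s], centers[t]) for s in (1, -1) for t in (1, -1)}

R_final = train_lvq1(n=50, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=20)
```

Averaging such runs over many seeds and increasing `n` is the kind of comparison between simulation and theory that the figures refer to. A single small run like this one fluctuates noticeably.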

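learning curve $\varepsilon_g(\alpha)$, $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$)

- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate

[Figure: $\varepsilon_g$ (axis 0.14 to 0.26) versus $\alpha$ (0 to 300); values $\eta = 2.0, 0.4, 0.2$ indicated]

- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$ with $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)

[Figure: $\varepsilon_g$ (axis 0.14 to 0.26) versus $(\eta\alpha)$ (0 to 50), compared with $\min \varepsilon_g$: the asymptotic result is suboptimal]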

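"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion): update only if the winner is correct

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \delta_{S, \sigma^\mu} \left( \xi^\mu - w_s^{\mu-1} \right)$$

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$

[Figure: asymptotic positions of $w_+$ and $w_-$ relative to the cluster centers $\ell B_+$ and $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

LVQ+ ≈ VQ within the classes ($w_S$ updated only from class $S$)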

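[Figure: asymptotic $\varepsilon_g$ versus $p_+$ for the different algorithms, together with the optimal classification and $\min\{p_+, p_-\}$; asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$]

- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

[Figure: learning curves $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]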

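Vector Quantization

competitive learning ($w_s$: winner):

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \left( \xi^\mu - w_s^{\mu-1} \right)$$

class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ 1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ (axis 0 to 1.0) versus $\alpha$ (0 to 300)]

system is invariant under exchange of the prototypes: weakly repulsive fixed points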

interpretations:

- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$; asymptotics ($\eta \to 0$)]

for $p_+ \approx 0$, $p_- \approx 1$:

- low quantization error
- high generalization error $\varepsilon_g$

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach, density estimation, Bayes optimal on-line
• several classes and prototypes

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution with respect to asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives

• Self-Organizing Maps (SOM): (many) N-dim. prototypes form a (low) d-dimensional grid; representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]: adaptive metrics, e.g. the distance measure $d^\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$ with relevances $\lambda_i$ adapted in training
• applications
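The relevance-weighted distance mentioned for Generalized Relevance LVQ is easy to state in code. The sketch below only evaluates the distance for given relevances. How the relevances are adapted during training is not covered here, and the function name is a hypothetical choice.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared Euclidean distance sum_i lam_i * (w_i - xi_i)^2."""
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (a - b) ** 2 for l, a, b in zip(lam, w, xi))

# usage: uniform relevances recover the plain squared Euclidean distance,
# a zero relevance switches the corresponding feature off
d_uniform = relevance_distance([0.0, 0.0], [1.0, 2.0], [1.0, 1.0])
d_first_only = relevance_distance([0.0, 0.0], [1.0, 2.0], [1.0, 0.0])
```

Setting a relevance to zero removes the influence of that feature on the winner selection, which is the sense in which the metric becomes adaptive.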

Page 6: The Dynamics of Learning Vector Quantization

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)

aim

classification of data

learning from examples

Learning choice of prototypes according to example data

example situtation

3 classes

classification

assignment of a vector to the class of the closest

prototype w

3 prototypes

aim generalization ability ie correct

classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors (for different classes)

bull identify the closest correct and the closest wrong prototype

bull move the corresponding winner towards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristically motivated variations of competitive learning

LVQ algorithms

- appear plausible, intuitive, flexible

- are fast, easy to implement

- are frequently applied in a variety of problems involving the classification of structured data, a few examples:

  - real-time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain

[Figure: healthy cells, damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms

- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability

- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data

- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:

analysis of LVQ algorithms w.r.t.

- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized, high-dimensional data
- essential features of LVQ learning

aim:

- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

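model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$$P(\boldsymbol{\xi}) = \sum_{\sigma = \pm 1} p_\sigma \, P(\boldsymbol{\xi} \mid \sigma), \qquad P(\boldsymbol{\xi} \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} \left( \boldsymbol{\xi} - \ell \, \mathbf{B}_\sigma \right)^2 \right]$$

orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\pm)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$

prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$

separation: $\ell$ (cluster centers at $\ell \, \mathbf{B}_+$ and $\ell \, \mathbf{B}_-$)

independent components: $\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$, so that $\langle \boldsymbol{\xi}^2 \rangle_\sigma = N + \ell^2 \approx N$ for large $N$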
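Drawing examples from this two-cluster model is straightforward and may help when reproducing the simulations. The sketch below assumes a specific (hypothetical) choice of the orthonormal center vectors, $\mathbf{B}_+ = \mathbf{e}_1$ and $\mathbf{B}_- = \mathbf{e}_2$, and unit variance in every component; the function name `sample_example` is likewise an assumption.

```python
import random


def sample_example(N, ell, p_plus, rng):
    """Draw one labelled example (xi, sigma) from the two-cluster model.

    Assumption of this sketch: B_+ is the first unit vector and B_- the
    second, so the cluster centers are ell*e_1 and ell*e_2.
    """
    sigma = 1 if rng.random() < p_plus else -1
    # Independent unit-variance Gaussian components ...
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    # ... shifted by ell along the center vector of the chosen class.
    xi[0 if sigma == 1 else 1] += ell
    return xi, sigma
```

With this choice of center vectors, the projections onto the center vectors are simply `xi[0]` (onto $\mathbf{B}_+$) and `xi[1]` (onto $\mathbf{B}_-$).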

high-dimensional data (formally: $N \to \infty$)

400 examples $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$ (class sizes: 240 and 160)

[Figure: projections into the plane of center vectors $\mathbf{B}_+, \mathbf{B}_-$: $y_\pm = \mathbf{B}_\pm \cdot \boldsymbol{\xi}^\mu$]

[Figure: projections in two independent random directions $\mathbf{w}_{1,2}$: $x_{1,2} = \mathbf{w}_{1,2} \cdot \boldsymbol{\xi}^\mu$]

Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

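dynamics of on-line training

sequence of independent random data $\boldsymbol{\xi}^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\boldsymbol{\xi})$

update of prototype vectors:

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_+^\mu, d_-^\mu, \sigma^\mu, \ldots \right] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \qquad d_s^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)^2$$

- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $\left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$: change of prototype towards or away from the current data

above examples:

unsupervised Vector Quantization: $f_s = \Theta\left( d_{-s}^\mu - d_s^\mu \right)$, "The Winner Takes It All" (classes irrelevant/unknown)

Learning Vector Quantization "2.1": $f_s = s \, \sigma^\mu = +1$ (correct class) or $-1$ (wrong class); here: two prototypes, no explicit competition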
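The generic on-line update, with the algorithm encoded in a modulation function, can be sketched as below for two prototypes labelled $s = \pm 1$. The dictionary layout, the function names and the convention $\Theta(0) = 0$ are assumptions of this sketch, not part of the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def f_vq(s, d, sigma):
    """Unsupervised VQ: 1 for the winner, 0 otherwise (Theta(0) = 0 assumed)."""
    return 1.0 if d[-s] - d[s] > 0 else 0.0


def f_lvq21(s, d, sigma):
    """LVQ 2.1 with two prototypes: +1 for the correct class, -1 for the wrong one."""
    return float(s * sigma)


def online_step(w, xi, sigma, eta, f):
    """Apply w_s <- w_s + (eta/N) * f_s * (xi - w_s) to both prototypes.

    w is a dict {+1: vector, -1: vector}; a new dict is returned.
    Distances are evaluated with the prototypes before the update.
    """
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (1, -1)}
    return {
        s: [ws + (eta / n) * f(s, d, sigma) * (x - ws) for ws, x in zip(w[s], xi)]
        for s in (1, -1)
    }
```

Passing `f_vq` ignores the label `sigma` entirely, while `f_lvq21` ignores the distances, which mirrors the remark that LVQ 2.1 with two prototypes involves no explicit competition.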

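mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma$: projections in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane

$Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu$: length and relative position of prototypes

(with $s, t, \sigma \in \{+1, -1\}$)

update of prototypes:

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_+^\mu, d_-^\mu, \sigma^\mu, \ldots \right] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

random vector $\boldsymbol{\xi}^\mu$ enters only in the form of

projections: $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$

distances: $d_s^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)^2 = (\boldsymbol{\xi}^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$

recursions:

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$

$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + \mathcal{O}(1/N)$$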
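To make the reduction from $2N$ prototype components to seven characteristic quantities concrete, the following sketch computes $R_{s\sigma} = \mathbf{w}_s \cdot \mathbf{B}_\sigma$ and $Q_{st} = \mathbf{w}_s \cdot \mathbf{w}_t$ and evaluates a distance from projections only, via $d_s = \boldsymbol{\xi}^2 - 2 x_s + Q_{ss}$. The dictionary layout and function names are assumptions made for illustration.

```python
def dot(a, b):
    """Scalar product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def order_parameters(w, B):
    """Return (R, Q) for prototypes w[+1], w[-1] and center vectors B[+1], B[-1].

    R has four entries, Q has four entries of which Q[(1, -1)] == Q[(-1, 1)],
    i.e. seven independent numbers in total.
    """
    labels = (1, -1)
    R = {(s, sig): dot(w[s], B[sig]) for s in labels for sig in labels}
    Q = {(s, t): dot(w[s], w[t]) for s in labels for t in labels}
    return R, Q


def distance_from_projections(xi, w, s, Q):
    """Squared distance d_s = xi^2 - 2 x_s + Q_ss, with x_s = w_s . xi."""
    x_s = dot(w[s], xi)
    return dot(xi, xi) - 2.0 * x_s + Q[(s, s)]
```

The second function returns the same value as a direct evaluation of $(\boldsymbol{\xi} - \mathbf{w}_s)^2$, which is why the example enters the dynamics only through its projections.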

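2. average over the current example

random vector acc. to $P(\boldsymbol{\xi}^\mu \mid \sigma)$ enters only through $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$ and $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$: correlated Gaussian random quantities in the thermodynamic limit $N \to \infty$

completely specified in terms of first and second moments (w/o indices $\mu$), for an example from class $\sigma$:

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \, \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j \, (B_\sigma)_j = \ell \, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$

averaged recursions closed in $\{ R_{s\sigma}, Q_{st} \}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \, \langle \cdots \rangle_\sigma$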

3. self-averaging properties

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom

recursions $\to$ coupled ordinary differential equations

evolution of projections: $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

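5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_- = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2 \ell \left( R_{\sigma\sigma} - R_{-\sigma\sigma} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$$

(with $\Phi$ the cumulative distribution function of the standard normal distribution)

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms:

- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions:

- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...

maximize the decrease of the error, $-\mathrm{d}\varepsilon_g / \mathrm{d}\alpha$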
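Given the characteristic quantities, the generalization error of the nearest-prototype classifier in the model with unit-variance clusters follows from a Gaussian integral: an example from class $\sigma$ is misclassified when $d_\sigma > d_{-\sigma}$, and $x_{-\sigma} - x_\sigma$ is Gaussian with mean $\ell (R_{-\sigma\sigma} - R_{\sigma\sigma})$ and variance $Q_{++} - 2Q_{+-} + Q_{--}$. The sketch below evaluates the resulting expression; the function names and the dictionary layout of `R` and `Q` are assumptions.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, ell, p_plus):
    """eps_g = sum_sigma p_sigma * Phi(argument_sigma) for two prototypes.

    R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t with s, t, sigma in {+1, -1}.
    Assumes unit cluster variances and distinct prototypes.
    """
    p = {1: p_plus, -1: 1.0 - p_plus}
    norm = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    if norm == 0.0:
        raise ValueError("prototypes coincide; the classifier is undefined")
    eps = 0.0
    for sigma in (1, -1):
        numerator = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
                     - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p[sigma] * phi(numerator / (2.0 * norm))
    return eps
```

For prototypes placed exactly at the cluster centers, $\mathbf{w}_s = \ell \, \mathbf{B}_s$, the argument reduces to $-\ell/\sqrt{2}$ for both classes, so the error is $\Phi(-\ell/\sqrt{2})$ independent of the prior weights.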

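optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with $p_+ \, P(\boldsymbol{\xi} \mid \sigma = +1) = p_- \, P(\boldsymbol{\xi} \mid \sigma = -1)$

[Figure: cluster centers $\mathbf{B}_+$ (weight $p_+$) and $\mathbf{B}_-$ (weight $p_- > p_+$) with the separating plane; annotation: excess error]

[Figure: minimal $\varepsilon_g$ (axis from 0 to 0.50) as a function of the prior weight $p_+$ (axis from 0 to 1.00), for $\ell = 0$, $\ell = 1$ and $\ell = 2$]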
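For two unit-variance Gaussian clusters centered at $\ell \, \mathbf{B}_+$ and $\ell \, \mathbf{B}_-$, the minimal generalization error can be evaluated in closed form by projecting onto the line through the two centers, whose distance is $\ell \sqrt{2}$. The sketch below is a derivation under exactly these model assumptions; the function name `optimal_error` is hypothetical.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def optimal_error(ell, p_plus):
    """Minimal error for the plane defined by p_+ P(xi|+1) = p_- P(xi|-1).

    Assumes unit variance in both clusters and orthonormal center vectors,
    so that the centers are a distance D = ell * sqrt(2) apart.
    """
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0  # only one class present
    D = ell * math.sqrt(2.0)
    if D == 0.0:
        return min(p_plus, p_minus)  # identical clusters: always guess the larger class
    # Offset of the optimal threshold from the midpoint between the centers.
    shift = math.log(p_plus / p_minus) / D
    return p_plus * phi(-D / 2.0 - shift) + p_minus * phi(-D / 2.0 + shift)
```

For equal priors the threshold lies at the midpoint and the result is $\Phi(-\ell/\sqrt{2})$; for $\ell = 0$ the clusters coincide and the best achievable error is the weight of the less frequent class.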

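"LVQ 2.1": update the correct and wrong winner

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

[Seo, Obermayer]: LVQ 2.1 $\leftrightarrow$ cost function (likelihood ratios)

(analytical) integration for $\mathbf{w}_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$): $R_{s\sigma}(\alpha)$ and $Q_{st}(\alpha)$ are combinations of exponentials $e^{\pm m \eta \alpha}$ with prefactors of the form $(1 \pm m)/(2m)$

$\alpha \to \infty$: characteristic quantities $Q$, $R$ diverge, with ratios of the diverging quantities remaining finite

[Figure: characteristic quantities (axis from -6 to 6) vs. $\alpha$ (axis from 0 to 10); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]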
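As a worked example of the analytical integration, the projections $R_{s\tau} = \mathbf{w}_s \cdot \mathbf{B}_\tau$ can be obtained directly from the averaged dynamics with $f_s = s \, \sigma$. The steps below are derived here from the model assumptions (class-conditional means $\langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$, prior weights $p_\pm$, $m = p_+ - p_-$), not transcribed from the slide, and cover only $R$; the corresponding expressions for $Q_{st}$ are not reproduced.

```latex
\begin{align}
\langle \sigma \, y_\tau \rangle
  &= \sum_{\sigma = \pm 1} p_\sigma \, \sigma \, \ell \, \delta_{\tau\sigma}
   = \ell \, \tau \, p_\tau ,
  \qquad
  \langle \sigma \rangle = p_+ - p_- = m , \\
\frac{dR_{s\tau}}{d\alpha}
  &= \eta \left\langle s \, \sigma \left( y_\tau - R_{s\tau} \right) \right\rangle
   = \eta \, s \left( \ell \, \tau \, p_\tau - m \, R_{s\tau} \right) , \\
R_{s\tau}(\alpha)
  &= \frac{\ell \, \tau \, p_\tau}{m} \left( 1 - e^{- s \, m \, \eta \, \alpha} \right)
  \qquad \text{for } R_{s\tau}(0) = 0 .
\end{align}
```

For $m > 0$ the prototype with $s = +1$ saturates at $R_{+\tau} = \ell \, \tau \, p_\tau / m$, while for $s = -1$ the exponent is positive and $|R_{-\tau}|$ grows like $e^{m \eta \alpha}$, which is the divergence of the prototype representing the less frequent class.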

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: clusters with prior weights $p_-$ and $p_+ > p_-$]

strategies:

- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable

- Robust Soft Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function

  limiting case "learning from mistakes": LVQ 2.1-step only if the example is currently misclassified

  slow learning, poor generalization

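I) LVQ 1 [Kohonen]: "The winner takes it all"

only the winner is updated, according to the class membership:

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

($\Theta$ selects the winner $\mathbf{w}_s$; $s \, \sigma^\mu = \pm 1$)

numerical integration for $\mathbf{w}_s(0) = 0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs]

[Figure: trajectories of $\mathbf{w}_+$, $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, axes $R_{s+}$ and $R_{s-}$, cluster centers $\ell \, \mathbf{B}_+$ and $\ell \, \mathbf{B}_-$; ($\bullet$) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; ____ asymptotic position]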
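A small Monte Carlo simulation of LVQ 1 in the two-cluster model can be used to compare against the theoretical curves. The sketch below makes several assumptions that are not taken from the slides: $\mathbf{B}_+ = \mathbf{e}_1$ and $\mathbf{B}_- = \mathbf{e}_2$, ties between prototypes are resolved in favor of $\mathbf{w}_+$, and a single run is performed instead of an average over many runs. The function name `lvq1_run` is hypothetical.

```python
import random


def lvq1_run(N, ell, p_plus, eta, alpha_max, seed=0):
    """Train two prototypes (labels +1, -1) with LVQ 1 and return (R, Q).

    Assumptions of this sketch: B_+ = e_1, B_- = e_2, unit-variance clusters,
    w_s(0) = 0, ties go to the prototype with label +1.
    """
    rng = random.Random(seed)
    w = {1: [0.0] * N, -1: [0.0] * N}
    for _ in range(int(alpha_max * N)):
        # Draw a labelled example from the mixture of two Gaussians.
        sigma = 1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        xi[0 if sigma == 1 else 1] += ell
        # Winner takes all: only the closest prototype is updated.
        d = {s: sum((x - ws) ** 2 for x, ws in zip(xi, w[s])) for s in (1, -1)}
        winner = 1 if d[1] <= d[-1] else -1
        # Attraction if the winner carries the correct label, repulsion otherwise.
        step = (eta / N) * winner * sigma
        w[winner] = [ws + step * (x - ws) for ws, x in zip(w[winner], xi)]
    index = {1: 0, -1: 1}
    R = {(s, sig): w[s][index[sig]] for s in (1, -1) for sig in (1, -1)}
    Q = {(s, t): sum(a * b for a, b in zip(w[s], w[t])) for s in (1, -1) for t in (1, -1)}
    return R, Q
```

Recording `R` and `Q` at intermediate values of $\alpha = \mu / N$ and averaging over independent seeds would give curves that can be set against the numerically integrated differential equations.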

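learning curve $\varepsilon_g(\alpha)$, $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$)

[Figure: $\varepsilon_g$ (axis from 0.14 to 0.26) vs. $\alpha$ (axis from 0 to 300)]

- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$

- role of the learning rate

[Figure: $\varepsilon_g$ (axis from 0.14 to 0.26) vs. $\eta\alpha$ (axis from 0 to 50) for $\eta = 2.0$, $0.4$, $0.2$ and $\eta \to 0$; annotations: min $\varepsilon_g$, suboptimal]

- variable rate $\eta(\alpha)$

- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)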

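II) LVQ+ (only positive steps, without repulsion): "The winner takes it all"

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \delta_{s, \sigma^\mu} \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

(update only if the winner is correct)

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell \, (\mathbf{B}_+ + \mathbf{B}_-)/2$

[Figure: asymptotic positions of $\mathbf{w}_+$, $\mathbf{w}_-$ relative to $\ell \, \mathbf{B}_+$, $\ell \, \mathbf{B}_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

LVQ+ $\approx$ VQ within the classes ($\mathbf{w}_s$ updated only from class $s$)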

asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$

- LVQ 2.1: trivial assignment to the more frequent class

- LVQ 1: here: close to optimal classification

- LVQ+: min-max solution, $p_\pm$-independent classification

[Figure: $\varepsilon_g$ vs. $p_+$ for the three algorithms, together with the optimal classification and $\min\{p_+, p_-\}$]

[Figure: learning curves $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]

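Vector Quantization: competitive learning

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

($\mathbf{w}_s$: winner)

class membership is unknown or identical for all data

numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: learning curves $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (axis from 0 to 300)]

system is invariant under exchange of the prototypes $\to$ weakly repulsive fixed points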

interpretations:

- VQ, unsupervised learning: unlabelled data
- LVQ, two prototypes of the same class: identical labels
- LVQ, different classes, but labels are not used in training

[Figure: asymptotic ($\eta \to 0$) $\varepsilon_g$ vs. $p_+$]

$p_+ \approx 0$, $p_- \approx 1$: low quantization error, high gen. error $\varepsilon_g$

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]

• model: different cluster variances, more clusters/prototypes

• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line

• several classes and prototypes

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes

  dynamics of on-line training

• comparison of algorithms:

  LVQ 2.1: instability, trivial (stationary) classification

  LVQ 1: close to optimal asymptotic generalization

  LVQ+: min-max solution w.r.t. asymptotic generalization

  VQ: symmetry breaking, representation

Perspectives

• Self-Organizing Maps (SOM)

  (many) N-dim. prototypes form a (low) d-dimensional grid

  representation of data in a topology-preserving map

  neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [Hammer & Villmann]

  adaptive metrics, e.g. distance measure $d_\lambda(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i \, (w_{s,i} - \xi_i)^2$, with the $\lambda_i$ adapted in training

• applications



Page 9: The Dynamics of Learning Vector Quantization

The Dynamics of Learning Vector Quantization RUG 10012005

illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustration microscopic images of (pig) semen cells after freezing and storage co Lidia Sanchez-Gonzalez LeonSpain

LVQ algorithms

- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability

- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data

- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:

analysis of LVQ algorithms w.r.t.

- dynamics of the learning process

- performance, i.e. generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized, high-dimensional data

- essential features of LVQ learning

aim:

- contribute to the theoretical understanding

- develop efficient LVQ schemes

- test in applications

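model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$$P(\boldsymbol{\xi}) = \sum_{\sigma=\pm 1} p_\sigma\, P(\boldsymbol{\xi} \mid \sigma), \qquad P(\boldsymbol{\xi} \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[ -\tfrac{1}{2} \left( \boldsymbol{\xi} - \ell\, \mathbf{B}_\sigma \right)^2 \right]$$

orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\pm)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$

prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$

separation $\ell$: the clusters with weights $p_+$ and $p_-$ are centered at $\ell\, \mathbf{B}_+$ and $\ell\, \mathbf{B}_-$

independent components:

$$\langle \xi_j \rangle_\sigma = \ell\, (\mathbf{B}_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1, \qquad \langle \boldsymbol{\xi}^2 \rangle_\sigma = N + \ell^2$$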
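The model density above can be sampled directly. The following is a minimal sketch, not the authors' code: the function names `make_centers` and `sample_example` are hypothetical, and the centers are simply taken to be the first two unit vectors, which is one convenient orthonormal choice.

```python
import random

def make_centers(n):
    """Two orthonormal center vectors B_+ and B_- (assumption: first two unit vectors)."""
    b_plus = [0.0] * n
    b_minus = [0.0] * n
    b_plus[0] = 1.0
    b_minus[1] = 1.0
    return {+1: b_plus, -1: b_minus}

def sample_example(n, ell, p_plus, centers, rng):
    """Draw (xi, sigma): sigma = +1 with probability p_plus, then
    xi = ell * B_sigma + independent unit-variance Gaussian noise per component."""
    sigma = +1 if rng.random() < p_plus else -1
    b = centers[sigma]
    xi = [ell * b[j] + rng.gauss(0.0, 1.0) for j in range(n)]
    return xi, sigma

rng = random.Random(0)
n, ell, p_plus = 200, 1.0, 0.6
centers = make_centers(n)
data = [sample_example(n, ell, p_plus, centers, rng) for _ in range(400)]

n_plus = sum(1 for _, s in data if s == +1)
# empirical mean of xi^2, to be compared with N + ell^2
mean_sq = sum(sum(x * x for x in xi) for xi, _ in data) / len(data)
print(n_plus, len(data) - n_plus, mean_sq)
```

The empirical mean of $\boldsymbol{\xi}^2$ should lie close to $N + \ell^2$, which illustrates why the cluster structure is hardly visible in a random projection when $N$ is large.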

high-dimensional data (formally $N \to \infty$)

[Figure: 400 examples $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$ with $N = 200$, $\ell = 1$, $p_+ = 0.6$ (240 and 160 examples per class). One panel shows projections into the plane of the center vectors, $y_\pm^\mu = \mathbf{B}_\pm \cdot \boldsymbol{\xi}^\mu$; the other shows projections in two independent random directions $\mathbf{w}_{1,2}$, $x_{1,2}^\mu = \mathbf{w}_{1,2} \cdot \boldsymbol{\xi}^\mu$.]

Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

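dynamics of on-line training

sequence of independent random data $\boldsymbol{\xi}^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\boldsymbol{\xi})$

update of prototype vectors:

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[ d_+^\mu, d_-^\mu, S, \sigma^\mu, \ldots \right] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \qquad d_s^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)^2, \qquad S, \sigma = \pm 1$$

- $\eta$: learning rate, step size

- $f_s[\ldots]$: competition, direction of update, etc.

- $(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data

above examples:

- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_s = \Theta\!\left( d_{-s}^\mu - d_s^\mu \right)$

- Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition): $f_s = S\, \sigma^\mu$, i.e. $+1$ for the correct class and $-1$ for the wrong class, where $S$ is the class label of prototype $\mathbf{w}_s$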
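The generic update with a modulation function $f_s$ can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: prototypes are stored in a dict keyed by their label $s = \pm 1$, and the names `online_step`, `f_vq` and `f_lvq21` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (a - b)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def f_vq(s, d, sigma):
    """Unsupervised VQ: Theta(d_{-s} - d_s), only the winner moves; sigma is ignored."""
    return 1.0 if d[s] < d[-s] else 0.0

def f_lvq21(s, d, sigma):
    """LVQ 2.1: +1 for the prototype carrying the correct label, -1 for the wrong one."""
    return float(s * sigma)

def online_step(w, xi, sigma, eta, f):
    """One step w_s <- w_s + (eta/N) f_s (xi - w_s) for s = +1, -1.
    Distances are evaluated with the prototypes before the update."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    new_w = {}
    for s in (+1, -1):
        step = eta / n * f(s, d, sigma)
        new_w[s] = [w_j + step * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]
    return new_w

# toy example in N = 2 dimensions; the example carries the label sigma = -1
w = {+1: [0.0, 0.0], -1: [1.0, 1.0]}
xi, sigma = [0.2, 0.0], -1
w_vq = online_step(w, xi, sigma, eta=1.0, f=f_vq)        # winner w_+ is attracted
w_lvq21 = online_step(w, xi, sigma, eta=1.0, f=f_lvq21)  # w_+ repelled, w_- attracted
print(w_vq, w_lvq21)
```

Passing the modulation function as an argument keeps the step itself identical for all algorithms, mirroring the single update equation with different choices of $f_s$.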

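mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$$R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma, \qquad Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu, \qquad s, t, \sigma \in \{-1, +1\}$$

- $R_{s\sigma}$: projections in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane

- $Q_{st}$: length and relative position of prototypes

random vector $\boldsymbol{\xi}^\mu$ enters only in the form of

- projections: $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $\quad y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$

- distances: $d_s^\mu = (\boldsymbol{\xi}^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$

recursions, obtained from the update $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} f_s \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$:

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$

$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta\, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta\, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + \mathcal{O}(1/N)$$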
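The reduction from $2N$ prototype components to seven numbers is just a set of dot products. A minimal sketch, with the hypothetical helper name `order_parameters` and prototypes and centers stored in dicts keyed by $\pm 1$:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def order_parameters(w, centers):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t.
    Q is symmetric, so there are 4 + 3 = 7 independent quantities."""
    labels = (+1, -1)
    R = {(s, sig): dot(w[s], centers[sig]) for s in labels for sig in labels}
    Q = {(s, t): dot(w[s], w[t]) for s in labels for t in labels}
    return R, Q

# toy example in N = 3 dimensions with B_+ = e_1 and B_- = e_2
centers = {+1: [1.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0]}
w = {+1: [0.5, 0.1, 2.0], -1: [0.0, 0.3, -1.0]}
R, Q = order_parameters(w, centers)
print(R, Q)
```

Components of the prototypes orthogonal to the $(\mathbf{B}_+, \mathbf{B}_-)$-plane (here the third one) enter only through $Q_{st}$.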

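2. average over the current example

random vector acc. to $P(\boldsymbol{\xi}^\mu \mid \sigma)$: in the thermodynamic limit $N \to \infty$ the projections

$$x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu, \qquad y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$$

are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (\mathbf{w}_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (\mathbf{w}_s)_j (\mathbf{B}_\sigma)_j = \ell\, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$

averaged recursions closed in $\{ R_{s\sigma}, Q_{st} \}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$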

3. self-averaging properties

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data

- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom

recursions → coupled ordinary differential equations: evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

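5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_- = \sum_{\sigma = \pm 1} p_\sigma\, \Phi\!\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2 \ell \left( R_{\sigma\sigma} - R_{-\sigma\sigma} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms:

- repulsive/attractive fixed points of the dynamics

- asymptotic behavior for $\alpha \to \infty$

- dependence on learning rate, separation, initialization

- ...

optimization and development of new prescriptions:

- time-dependent learning rate $\eta(\alpha)$

- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease of the generalization error, $-\,d\varepsilon_g / d\alpha$

- ...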
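Given the characteristic quantities, the generalization error of the nearest-prototype classifier can be evaluated numerically. The sketch below assumes the Gaussian closed form that follows from the first and second moments of $x_s$ in the model with unit cluster variance; the function names are hypothetical and the prototypes are assumed to be distinct, so that the denominator is non-zero.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g = sum_sigma p_sigma * Phi(argument), with the argument built from R and Q.
    R and Q are dicts keyed by pairs of labels in {+1, -1}."""
    v = Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)]   # (w_+ - w_-)^2, assumed > 0
    eps = 0.0
    for sigma, p in ((+1, p_plus), (-1, 1.0 - p_plus)):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p * phi(num / (2.0 * math.sqrt(v)))
    return eps

# check: prototypes placed exactly on the cluster centers, w_s = ell * B_s
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, ell, p_plus=0.3)
print(eps, phi(-ell / math.sqrt(2.0)))
```

For prototypes sitting on the cluster centers the argument reduces to $-\ell/\sqrt{2}$ for both classes, so the two printed numbers should agree.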

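optimal classification with minimal generalization error

separation of classes by the plane with $p_+ P(\boldsymbol{\xi} \mid \sigma = +1) = p_- P(\boldsymbol{\xi} \mid \sigma = -1)$ in the model situation (equal variances of clusters)

[Figure: sketch of the clusters around $\mathbf{B}_-$ ($p_- > p_+$) and $\mathbf{B}_+$ ($p_+$) with the separating plane; the excess error is marked.]

[Figure: minimal $\varepsilon_g$ (0 to 0.50) as a function of the prior weight $p_+$ (0 to 1.00) for $\ell = 0$, $\ell = 1$ and $\ell = 2$.]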
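The minimal error can be worked out explicitly for this model. The sketch below is a derivation-based illustration, not taken from the slides: with equal unit variances the log-likelihood ratio is $\ln(p_+/p_-) + \ell\, \boldsymbol{\xi} \cdot (\mathbf{B}_+ - \mathbf{B}_-)$, so the decision depends only on the projection $u = \boldsymbol{\xi} \cdot (\mathbf{B}_+ - \mathbf{B}_-)/\sqrt{2}$, which is Gaussian with mean $\pm \ell/\sqrt{2}$ and unit variance. The function name `optimal_error` is hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(ell, p_plus):
    """Minimal generalization error for the two-cluster model with unit variances."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus <= 0.0 or p_plus >= 1.0:
        # no separation or only one class: always choose the more frequent class
        return min(p_plus, p_minus)
    # classify as '+' if u > theta
    theta = -math.log(p_plus / p_minus) / (ell * math.sqrt(2.0))
    shift = ell / math.sqrt(2.0)
    return p_plus * phi(theta - shift) + p_minus * phi(-theta - shift)

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(optimal_error(ell, p), 3) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

For $\ell = 0$ this gives $\min\{p_+, p_-\}$, and for $p_+ = 1/2$ it gives $\Phi(-\ell/\sqrt{2})$, consistent with the limiting cases one expects for the curves described in the figure.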

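"LVQ 2.1": update the correct and the wrong winner

$$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\, S\, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)$$

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

(analytical) integration for $\mathbf{w}_S(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$):

$$R_{++} = \ell\, \frac{1+m}{2m} \left( 1 - e^{-m\eta\alpha} \right), \qquad R_{+-} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{-m\eta\alpha} \right)$$

$$R_{-+} = \ell\, \frac{1+m}{2m} \left( 1 - e^{+m\eta\alpha} \right), \qquad R_{--} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{+m\eta\alpha} \right)$$

for $\alpha \to \infty$ the characteristic quantities involving $\mathbf{w}_-$ diverge, with ratios such as $R_{-\sigma} / \sqrt{Q_{--}}$ remaining finite

[Figure: theory and simulation ($N = 100$) for $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs; characteristic quantities (range $-6$ to $6$) vs. $\alpha$ (0 to 10).]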
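The instability of LVQ 2.1 can be made explicit for the projections $R_{S\tau}$. The sketch below is an assumption-labelled reconstruction: averaging the update with $f_S = S\sigma$ over the model density gives $dR_{S\tau}/d\alpha = \eta S \left( \ell\, \tau\, p_\tau - m R_{S\tau} \right)$ with $m = p_+ - p_-$, and the code compares the closed-form solution for $R_{S\tau}(0) = 0$ with a simple Euler integration. The function names are hypothetical.

```python
import math

def r_closed(S, tau, alpha, eta, ell, p_plus):
    """Closed-form R_{S tau}(alpha) for LVQ 2.1 with w_S(0) = 0 (requires m > 0)."""
    m = 2.0 * p_plus - 1.0                       # m = p_+ - p_-
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    return ell * tau * p_tau / m * (1.0 - math.exp(-S * m * eta * alpha))

def r_euler(S, tau, alpha, eta, ell, p_plus, steps=20000):
    """Euler integration of dR/dalpha = eta * S * (ell * tau * p_tau - m * R)."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    h = alpha / steps
    r = 0.0
    for _ in range(steps):
        r += h * eta * S * (ell * tau * p_tau - m * r)
    return r

eta, ell, p_plus = 0.5, 1.0, 0.8
for S in (+1, -1):
    for tau in (+1, -1):
        print(S, tau, r_closed(S, tau, 2.0, eta, ell, p_plus),
              r_euler(S, tau, 2.0, eta, ell, p_plus))
```

For $S = +1$ the projections saturate, whereas for $S = -1$ (the prototype of the less frequent class when $m > 0$) they grow exponentially in $\alpha$.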

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{ p_+, p_- \}$

[Figure: clusters labelled ($p_-$) and ($p_+ > p_-$).]

strategies:

- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable

- Robust Soft Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function; limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

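"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership

$$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) S\, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)$$

where $\Theta(\cdot)$ selects the winner $\mathbf{w}_S$ and $S \sigma^\mu = \pm 1$

numerical integration for $\mathbf{w}_S(0) = 0$

[Figure: theory and simulation ($N = 200$) for $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs. One panel shows $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\pm}$ ($R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$) vs. $\alpha$. The other shows the trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane relative to $\ell\, \mathbf{B}_+$ and $\ell\, \mathbf{B}_-$, with (•) marking $\alpha = 20, 40, \ldots, 140$; the optimal decision boundary and the asymptotic position are indicated.]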
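A direct Monte Carlo simulation of LVQ 1 in the model situation can be sketched as follows. This is an illustration under assumptions, not the simulation code behind the figure: the centers are taken as the first two unit vectors, ties in the distance (as for $\mathbf{w}_S(0) = 0$) are broken at random, and the names `lvq1_step` and `simulate_lvq1` are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_step(w, xi, sigma, eta, rng=None):
    """LVQ 1: only the winner w_S moves, towards xi if S == sigma, away otherwise."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    if d[+1] == d[-1]:
        # tie-breaking convention is an assumption (random if an rng is given)
        winner = rng.choice((+1, -1)) if rng else +1
    else:
        winner = +1 if d[+1] < d[-1] else -1
    step = eta / n * winner * sigma
    new_w = {s: list(w[s]) for s in (+1, -1)}
    new_w[winner] = [w_j + step * (x_j - w_j) for w_j, x_j in zip(w[winner], xi)]
    return new_w

def simulate_lvq1(n, ell, p_plus, eta, alpha_max, seed=0):
    """Train for alpha_max * n steps from w_S(0) = 0 and return the projections R."""
    rng = random.Random(seed)
    w = {+1: [0.0] * n, -1: [0.0] * n}
    for _ in range(int(alpha_max * n)):
        sigma = +1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi[0 if sigma == +1 else 1] += ell      # cluster centers ell*e_1 and ell*e_2
        w = lvq1_step(w, xi, sigma, eta, rng)
    # with B_+ = e_1 and B_- = e_2 the projections are single components
    return {(s, sig): w[s][0 if sig == +1 else 1]
            for s in (+1, -1) for sig in (+1, -1)}

R = simulate_lvq1(n=50, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=10)
print(R)
```

A single run at small $N$ fluctuates noticeably; averaging over independent runs and increasing $N$, as done for the figure, is what allows a comparison with the theoretical curves.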

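learning curve ($p_+ = 0.2$, $\ell = 1.2$)

- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$

- role of the learning rate

- variable rate $\eta(\alpha)$?

- well-defined asymptotics (ODE linear in $\eta$): $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$; the result is suboptimal compared with $\min \varepsilon_g$

[Figure: $\varepsilon_g$ (0.14 to 0.26) vs. $\alpha$ (0 to 300) for $\eta = 1.2$, with further curves labelled $\eta = 2.0$, $0.4$, $0.2$ and the limit $\eta \to 0$.]

[Figure: $\varepsilon_g$ (0.14 to 0.26) vs. $(\eta \alpha)$ (0 to 50) in the limit $\eta \to 0$, compared with $\min \varepsilon_g$.]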

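"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion): the winner is updated only if it is correct

$$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) \delta_{S \sigma^\mu} \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)$$

where $\Theta(\cdot)$ selects the winner and $\delta_{S\sigma^\mu}$ requires it to be correct

LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ updated only from class $S$)

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\, (\mathbf{B}_+ + \mathbf{B}_-)/2$

[Figure: asymptotic positions of $\mathbf{w}_+$ and $\mathbf{w}_-$ relative to $\ell\, \mathbf{B}_+$ and $\ell\, \mathbf{B}_-$ for $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$.]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)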
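Why a symmetric configuration leads to a generalization error that does not depend on the prior weights can be checked numerically. The sketch below reads "symmetric about $\ell(\mathbf{B}_+ + \mathbf{B}_-)/2$" as $\mathbf{w}_\pm = \mathbf{c} \pm \mathbf{a}$ with $\mathbf{c} = \ell(\mathbf{B}_+ + \mathbf{B}_-)/2$, which is an assumption about the intended meaning; the offset $\mathbf{a}$ is an arbitrary illustrative value and the error is evaluated with the Gaussian closed form for unit cluster variance.

```python
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def generalization_error(w, centers, ell, p_plus):
    """Error of the nearest-prototype classifier for the two-cluster model."""
    diff = [x - y for x, y in zip(w[+1], w[-1])]
    v = dot(diff, diff)                          # (w_+ - w_-)^2, assumed > 0
    eps = 0.0
    for sigma, p in ((+1, p_plus), (-1, 1.0 - p_plus)):
        num = (dot(w[sigma], w[sigma]) - dot(w[-sigma], w[-sigma])
               - 2.0 * ell * (dot(w[sigma], centers[sigma]) - dot(w[-sigma], centers[sigma])))
        eps += p * phi(num / (2.0 * math.sqrt(v)))
    return eps

ell = 1.2
centers = {+1: [1.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0]}
c = [ell / 2.0, ell / 2.0, 0.0]                  # midpoint between the cluster centers
a = [0.4, -0.3, 0.2]                             # arbitrary offset (illustrative)
w = {+1: [ci + ai for ci, ai in zip(c, a)],
     -1: [ci - ai for ci, ai in zip(c, a)]}

errors = [generalization_error(w, centers, ell, p) for p in (0.2, 0.5, 0.7)]
print(errors)
```

The decision boundary of such a configuration passes through the midpoint between the cluster centers, so both classes are misclassified with the same probability and the weighted sum does not change with $p_+$.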

[Figure: asymptotics ($\eta \to 0$, $(\eta\alpha) \to \infty$) of $\varepsilon_g$ as a function of $p_+$ for the different algorithms, compared with the optimal classification.]

- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{ p_+, p_- \}$

- LVQ 1: here close to optimal classification

- LVQ+: min-max solution, $p_\pm$-independent classification

[Figure: learning curves $\varepsilon_g(\alpha)$ of LVQ+ and LVQ1 for $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$.]

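Vector Quantization

competitive learning ($\mathbf{w}_s$: winner):

$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$

class membership is unknown or identical for all data

numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1; $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300).]

system is invariant under exchange of the prototypes → weakly repulsive fixed points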

interpretations:

- VQ: unsupervised learning, unlabelled data

- LVQ: two prototypes of the same class, identical labels

- LVQ: different classes, but labels are not used in training

[Figure: asymptotics ($\eta \to 0$) of $\varepsilon_g$ as a function of $p_+$.]

$p_+ \approx 0$, $p_- \approx 1$:

- low quantization error

- high gen. error $\varepsilon_g$
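The quantization error referred to here measures how well the prototypes represent the data, regardless of labels. The slides do not spell out its definition, so the sketch below assumes the common choice: the mean squared distance of each example to its closest prototype (some conventions add a factor 1/2). The function name is hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantization_error(data, prototypes):
    """Mean squared distance of each data point to its closest prototype."""
    total = 0.0
    for xi in data:
        total += min(sq_dist(xi, w) for w in prototypes)
    return total / len(data)

data = [[0.0, 0.0], [2.0, 0.0]]
print(quantization_error(data, [[0.0, 0.0], [2.0, 0.0]]))   # prototypes on the data
print(quantization_error(data, [[1.0, 0.0], [5.0, 5.0]]))   # one prototype far away
```

A configuration can score well on this measure while classifying poorly, for instance when both prototypes settle inside the dominant cluster, which is the situation the slide describes for $p_+ \approx 0$.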

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]

• model: different cluster variances, more clusters/prototypes

• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line

• several classes and prototypes

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes; dynamics of online training

• comparison of algorithms:

LVQ 2.1: instability, trivial (stationary) classification

LVQ 1: close to optimal asymptotic generalization

LVQ+: min-max solution w.r.t. asymptotic generalization

VQ: symmetry breaking, representation

Perspectives

• Self-Organizing Maps (SOM)

(many) N-dim. prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [Hammer & Villmann]

adaptive metrics, e.g. distance measure $d_\lambda(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$ with relevances $\lambda_i$ adapted in training

• applications
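The relevance-weighted distance mentioned for Generalized Relevance LVQ is easy to state in code. The sketch below only evaluates the distance for given relevance factors; how the factors are adapted during training is not covered here, and the function name and example values are illustrative assumptions.

```python
def relevance_distance(w, xi, lam):
    """d_lambda(w, xi) = sum_i lambda_i * (w_i - xi_i)^2."""
    return sum(l * (w_i - x_i) ** 2 for l, w_i, x_i in zip(lam, w, xi))

w, xi = [1.0, 2.0], [0.0, 0.0]
print(relevance_distance(w, xi, [1.0, 1.0]))   # all relevances 1: squared Euclidean distance
print(relevance_distance(w, xi, [0.9, 0.1]))   # second feature is down-weighted
```

With all $\lambda_i = 1$ the measure reduces to the squared Euclidean distance used throughout the analysis above; unequal relevances let individual features contribute more or less to the comparison.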




numericalintegrationfor ws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 =12

averaged over 100 indep runs

Q++

Q--

Q+-

α

w+

w-

ℓ B+

ℓ B-

trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary____ asymptotic position

RS

+

RS-

R--

R-+

R--

R++

winner ws 1

I) LVQ 1 [Kohonen] 1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class membership

w-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curve

εg =12

(p+=02 ℓ=12)

εg (αinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

η0 - variable rate η(α)

- well-defined asymptotics

(ODE linear in η)

10

εg

20 30 40 50 0 014

026

022

018

min εg

(η α)

η0η 0 αinfin

( η α ) infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquo

II ) LVQ+ ( only positive steps without repulsion)

1-μs

μSμσμS

μS

1-μs

μs δdd

N

ηwξww

winner correct

αinfin asymptotic configuration

symmetric about ℓ (B++B-)2

w-

w+

ℓ B+

ℓ B-

p+=02 ℓ=12 =12

classification scheme and the

achieved generalization error are

independent of the prior weights p

(and optimal for p = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

p+

min p+p-

- LVQ 1

here close to optimal

classification

p+

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 =10εg

α

learning curves

LVQ+

LVQ1

asymptotics η0 (ηα)infin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization

competitive learning 1-μs

μμS

μS

1-μs

μs dd

N

ηwξww

ws winner

class membership is unknown

or identical for all data

numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10system is invariant under

exchange of the prototypes

weakly repulsive fixed

points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learning unlabelled data

- LVQ two prototypes of the same class identical labels

- LVQ different classes but labels are not used in training

εg

p+

asymptotics (0 )

p+asymp0

p-asymp1

- low quantization error- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototype-based learning

Vector Quantization and Learning Vector Quantization

bulla model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic

generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelf-Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure

N

i

iii w1

2)( sλ ξξwd

training

bullapplications

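model situation: two clusters of N-dimensional data

random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma \, P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\frac{1}{2} \left( \xi - \ell B_\sigma \right)^2 \right]$$

orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$

prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$

separation: $\ell$

[Figure: two clusters centered at $\ell B_+$ and $\ell B_-$, with prior weights $(p_+)$ and $(p_-)$]

independent components:

$$\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1, \qquad \langle \xi^2 \rangle_\sigma = N + \ell^2$$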
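The model situation can be sampled directly. The following is a minimal sketch, assuming unit variance per component and taking the first two unit vectors of $\mathbb{R}^N$ as the orthonormal centers; that choice, the function name `sample_mixture` and its arguments are illustrative and not part of the talk.

```python
import numpy as np

def sample_mixture(n_examples, N, ell, p_plus, rng):
    """Draw labelled examples from the two-cluster Gaussian mixture.

    Assumption (illustrative): B_+ and B_- are the first two unit vectors
    of R^N, which satisfies B_+^2 = B_-^2 = 1 and B_+ . B_- = 0.
    """
    B = np.zeros((2, N))
    B[0, 0] = 1.0                      # B_+
    B[1, 1] = 1.0                      # B_-
    # class label sigma = +1 with probability p_plus, otherwise -1
    sigma = np.where(rng.random(n_examples) < p_plus, 1, -1)
    # pick the center vector of each example's class (broadcast over components)
    centers = np.where(sigma[:, None] == 1, B[0], B[1])
    # cluster mean ell * B_sigma plus independent unit-variance noise
    xi = ell * centers + rng.standard_normal((n_examples, N))
    return xi, sigma, B
```

For instance, `sample_mixture(400, 200, 1.0, 0.6, np.random.default_rng(0))` draws a data set of the size used in the talk's illustration of high-dimensional data.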

high-dimensional data (formally: $N \to \infty$)

400 examples $\xi^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$

[Figure: projections into the plane of the center vectors $B_+$, $B_-$, with axes $y_\pm = B_\pm \cdot \xi^\mu$; class sizes (240) and (160)]

[Figure: projections in two independent random directions $w_1$, $w_2$, with axes $x_1 = w_1 \cdot \xi^\mu$ and $x_2 = w_2 \cdot \xi^\mu$; class sizes (240) and (160)]

Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification
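The contrast between the two kinds of projection can be reproduced numerically. This is a sketch under assumptions: the centers are taken as the first two unit vectors, the random directions are Gaussian vectors of roughly unit length, and `class_gap` is a hypothetical helper that measures how far apart the two class means are in a two-dimensional projection.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, ell, p_plus = 200, 400, 1.0, 0.6

B = np.eye(N)[:2]                                # assumed: B_+ = e_1, B_- = e_2
sigma = np.where(rng.random(n) < p_plus, 1, -1)  # class labels
xi = rng.standard_normal((n, N))                 # unit-variance noise
xi[sigma == 1] += ell * B[0]                     # shift class +1 to ell * B_+
xi[sigma == -1] += ell * B[1]                    # shift class -1 to ell * B_-

y = xi @ B.T                                     # projections on the centers
w = rng.standard_normal((2, N)) / np.sqrt(N)     # two random directions, |w| ~ 1
x = xi @ w.T                                     # projections on random directions

def class_gap(proj):
    """Distance between the two class means in a 2-d projection."""
    return np.linalg.norm(proj[sigma == 1].mean(axis=0)
                          - proj[sigma == -1].mean(axis=0))

print("gap in (B+, B-) plane:", class_gap(y))
print("gap in random plane:  ", class_gap(x))
```

In the plane of the center vectors the class means are separated by roughly $\ell\sqrt{2}$, whereas in random directions the cluster structure is hardly visible.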

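dynamics of on-line training

sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\xi)$

update of prototype vectors:

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_s^\mu, d_{-s}^\mu, S, \sigma^\mu \right] \left( \xi^\mu - w_s^{\mu-1} \right), \qquad d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2, \qquad S, \sigma^\mu = \pm 1$$

- $\eta$: learning rate, step size
- $f_s$: competition, direction of update etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data

above examples:

unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown):

$$f_s = \Theta\left( d_{-s}^\mu - d_s^\mu \right)$$

Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition):

$$f_s = S \sigma^\mu = \begin{cases} +1 & \text{correct class} \\ -1 & \text{wrong class} \end{cases}$$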
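The generic on-line step, prototype plus (learning rate / N) times a modulation factor times (example minus prototype), can be written once and specialised by the modulation function. The sketch below is illustrative; the names `online_step`, `f_vq` and `f_lvq21` and the array layout are assumptions, not notation from the talk.

```python
import numpy as np

def online_step(w, S, xi, sigma, eta, modulation):
    """One on-line update of all prototypes for a single example (xi, sigma).

    w          : array (K, N), prototype vectors w_s
    S          : array (K,), prototype labels, +1 or -1
    modulation : callable f(d, S, sigma) -> array (K,) of factors f_s
    """
    N = w.shape[1]
    d = np.sum((xi - w) ** 2, axis=1)      # squared distances d_s
    f = modulation(d, S, sigma)
    return w + (eta / N) * f[:, None] * (xi - w)

def f_vq(d, S, sigma):
    """Unsupervised VQ: only the winner (smallest d_s) is updated."""
    f = np.zeros(len(d))
    f[np.argmin(d)] = 1.0
    return f

def f_lvq21(d, S, sigma):
    """LVQ 2.1 with two prototypes: attract if labels agree, repel otherwise."""
    return (S * sigma).astype(float)
```

Keeping the modulation separate from the step mirrors the structure of the update rule: the algorithms compared in the talk differ only in $f_s$.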

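mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

projections in the $(B_+, B_-)$-plane: $R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$

length and relative position of prototypes: $Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$

with $s, t, \sigma \in \{+1, -1\}$

the random vector $\xi^\mu$ enters only in the form of

projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$

distances: $d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$

the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_s^\mu, d_{-s}^\mu, S, \sigma^\mu \right] \left( \xi^\mu - w_s^{\mu-1} \right)$ yields the recursions

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$

$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t + O(1/N)$$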
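The reduction to seven numbers (four projections R and three independent overlaps Q) can be made concrete. The sketch below computes both sets of quantities and checks that the squared distance is recoverable from the projection onto the prototype and the overlap alone; the concrete vectors are arbitrary illustrative inputs and `order_parameters` is a hypothetical name.

```python
import numpy as np

def order_parameters(w, B):
    """Return R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t."""
    return w @ B.T, w @ w.T

rng = np.random.default_rng(0)
N = 50
B = np.eye(N)[:2]                  # assumed orthonormal centers B_+, B_-
w = rng.standard_normal((2, N))    # arbitrary prototypes, for illustration only
xi = rng.standard_normal(N)        # arbitrary example

R, Q = order_parameters(w, B)
x = w @ xi                         # projections x_s = w_s . xi

# squared distance computed directly and from (xi^2, x_s, Q_ss) only
d_direct = np.sum((xi - w) ** 2, axis=1)
d_from_projections = xi @ xi - 2.0 * x + np.diag(Q)
```

Both distance arrays agree up to rounding, which is why the example enters the dynamics only through a handful of scalar projections.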

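2. average over the current example

random vector $\xi^\mu$ acc. to $P(\xi^\mu \mid \sigma)$: in the thermodynamic limit $N \to \infty$, the projections

$$x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu, \qquad y_\tau^\mu = B_\tau \cdot \xi^\mu$$

are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$; $\tau, \rho = \pm 1$ index the center vectors):

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell \, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$

averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$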
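The conditional moments of the projections can be checked by simple Monte Carlo sampling. The sketch below draws examples from class +1 only, for fixed illustrative prototypes, and compares sample means and covariances with the predictions: mean of the prototype projections equal to separation times R, their covariance equal to Q, and their covariance with the center projections equal to R. All concrete values and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, ell, n = 100, 1.0, 20000
B = np.eye(N)[:2]                              # assumed centers B_+ = e_1, B_- = e_2
w = rng.standard_normal((2, N)) / np.sqrt(N)   # fixed prototypes with |w_s| ~ 1
R, Q = w @ B.T, w @ w.T                        # characteristic quantities

xi = ell * B[0] + rng.standard_normal((n, N))  # examples from class sigma = +1
x = xi @ w.T                                   # columns: x_+, x_-
y = xi @ B.T                                   # columns: y_+, y_-

mean_x = x.mean(axis=0)                        # prediction: ell * R[:, 0]
cov_xx = np.cov(x, rowvar=False)               # prediction: Q
cov_xy = (x - mean_x).T @ (y - y.mean(axis=0)) / (n - 1)   # prediction: R
```

For Gaussian data these moment relations hold at any N; the thermodynamic limit is what justifies treating the projections as Gaussian for more general component distributions.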

3. self-averaging properties

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$$\alpha = \frac{\mu}{N}$$

# of examples, # of learning steps per degree of freedom

recursions → coupled ordinary differential equations

evolution of projections: $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

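5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$$

$$\varepsilon_g = p_+ \, \Phi\left[ \frac{Q_{++} - Q_{--} - 2\ell \left( R_{++} - R_{-+} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_- \, \Phi\left[ \frac{Q_{--} - Q_{++} - 2\ell \left( R_{--} - R_{+-} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right], \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms

- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions

- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...

e.g. maximize $-\,d\varepsilon_g / d\alpha$ (fastest decrease of the generalization error)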
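The generalization error is a function of the characteristic quantities alone: an example of class +1 is misclassified when it lies closer to the prototype labelled -1, and for Gaussian projections this probability is a cumulative normal. The sketch below evaluates it; the function names and the nested-list layout (index 0 for +, index 1 for -) are assumptions made for illustration.

```python
from math import erf, sqrt

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """Misclassification probability from R[s][sigma] and Q[s][t].

    Index 0 stands for +1 and index 1 for -1 (an illustrative convention).
    """
    p = (p_plus, 1.0 - p_plus)
    # twice the distance between the two prototypes
    norm = 2.0 * sqrt(Q[0][0] - 2.0 * Q[0][1] + Q[1][1])
    eps = 0.0
    for k in (0, 1):            # class of the example
        o = 1 - k               # index of the wrongly labelled prototype
        arg = (Q[k][k] - Q[o][o] - 2.0 * ell * (R[k][k] - R[o][k])) / norm
        eps += p[k] * Phi(arg)
    return eps
```

As a plausibility check, prototypes placed exactly on the cluster centers with equal priors and separation 1 give an error of about 0.24, the value of the cumulative normal at $-1/\sqrt{2}$.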

optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with

$$p_+ \, P(\xi \mid \sigma = +1) = p_- \, P(\xi \mid \sigma = -1)$$

[Figure: clusters around $B_+$ $(p_+)$ and $B_-$ $(p_- > p_+)$ with the optimal separating plane; the excess error is indicated]

[Figure: minimal $\varepsilon_g$ as a function of the prior weights, for $\ell = 0, 1, 2$; axes: $\varepsilon_g$ from 0 to 0.50, $p_+$ from 0 to 1.0]
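For two unit-variance Gaussian clusters the minimal error follows from a one-dimensional problem along the line connecting the centers, which are a distance of separation times $\sqrt{2}$ apart because the center vectors are orthonormal. The sketch below evaluates this under exactly those assumptions; `minimal_error` is a hypothetical name and the formula is a standard result for equal-variance Gaussians rather than something quoted from the slides.

```python
from math import erf, log, sqrt

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def minimal_error(p_plus, ell):
    """Smallest achievable error for unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus in (0.0, 1.0):
        # clusters coincide or one class is absent: always guess the larger class
        return min(p_plus, p_minus)
    delta = ell * sqrt(2.0)                  # distance between the centers
    t = log(p_plus / p_minus) / delta        # shift of the boundary off the midpoint
    return p_plus * Phi(-delta / 2.0 - t) + p_minus * Phi(-delta / 2.0 + t)

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(minimal_error(p, ell), 3) for p in (0.1, 0.3, 0.5)])
```

The curves are symmetric in the prior weights and peak at equal priors, in line with the figure described above.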

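"LVQ 2.1": update the correct and wrong winner

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, S \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

(analytical) integration for $w_s(0) = 0$, with $p_+ = (1 + m)/2$ $(m > 0)$, i.e. $m = p_+ - p_-$:

$$R_{++} = \ell \, \frac{1+m}{2m} \left( 1 - e^{-\alpha m \eta} \right), \qquad R_{+-} = -\ell \, \frac{1-m}{2m} \left( 1 - e^{-\alpha m \eta} \right)$$

$$R_{-+} = \ell \, \frac{1+m}{2m} \left( 1 - e^{+\alpha m \eta} \right), \qquad R_{--} = -\ell \, \frac{1-m}{2m} \left( 1 - e^{+\alpha m \eta} \right)$$

for $\alpha \to \infty$: $Q_{--}$, $Q_{+-}$, $R_{-+}$, $R_{--}$ diverge, while $Q_{++}$, $R_{++}$, $R_{+-}$ remain finite

[Figure: $R_{S\sigma}$ and $Q_{ST}$ versus $\alpha$ (0 to 10), vertical range $-6$ to $6$; theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]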
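The divergence can be seen from the averaged equation for the projections. With modulation factor $S\sigma$ the averaged recursion reads $dR_{S\tau}/d\alpha = \eta S \left( \ell \tau p_\tau - m R_{S\tau} \right)$ with $m = p_+ - p_-$, which is linear and has a closed-form solution for zero initial conditions. The sketch below evaluates that solution; it is a derivation under the stated model assumptions, and the function name `R_lvq21` is hypothetical.

```python
from math import exp

def R_lvq21(alpha, S, tau, eta, ell, p_plus):
    """Projection R_{S,tau}(alpha) for LVQ 2.1 started from w_s(0) = 0.

    S   : label of the prototype (+1 or -1)
    tau : index of the center vector B_tau (+1 or -1)
    Assumes p_plus != 0.5 so that m = p_+ - p_- is nonzero.
    """
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == 1 else 1.0 - p_plus
    # solves dR/dalpha = eta * S * (ell * tau * p_tau - m * R) with R(0) = 0
    return ell * tau * p_tau / m * (1.0 - exp(-S * m * eta * alpha))

for alpha in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):
    print(alpha,
          round(R_lvq21(alpha, +1, +1, 0.5, 1.0, 0.8), 3),   # stays finite
          round(R_lvq21(alpha, -1, +1, 0.5, 1.0, 0.8), 3))   # grows exponentially
```

For $m > 0$ the prototype of the more frequent class settles at a finite position, while the other prototype is pushed away exponentially fast.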

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: cluster configuration with weights $(p_-)$ and $(p_+ > p_-)$]

strategies:

- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function; limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

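"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner is updated according to the class membership

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) S \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

where $\Theta\left( d_{-s}^\mu - d_s^\mu \right) = 1$ for the winner $w_s$

numerical integration for $w_s(0) = 0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\sigma}$ versus $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs]

[Figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, relative to $\ell B_+$ and $\ell B_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; ____ asymptotic position]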
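A finite-N simulation of the kind compared with the theory can be sketched in a few lines. The code below is an illustrative Monte Carlo run, not the authors' program: the centers are assumed to be the first two unit vectors, ties at the start are resolved towards the first prototype, and the name `simulate_lvq1` and its default parameters (taken from the figure caption) are assumptions.

```python
import numpy as np

def simulate_lvq1(N=200, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=50.0, seed=0):
    """Run on-line LVQ1 with two prototypes and return (R, Q) at alpha_max."""
    rng = np.random.default_rng(seed)
    B = np.eye(N)[:2]                     # assumed centers: B_+ = e_1, B_- = e_2
    S = np.array([1, -1])                 # prototype labels
    w = np.zeros((2, N))                  # initial condition w_s(0) = 0
    for _ in range(int(alpha_max * N)):   # learning time alpha = mu / N
        sigma = 1 if rng.random() < p_plus else -1
        xi = ell * B[0 if sigma == 1 else 1] + rng.standard_normal(N)
        d = np.sum((xi - w) ** 2, axis=1)
        s = int(np.argmin(d))             # winner; ties go to index 0
        # winner is attracted if its label matches, repelled otherwise
        w[s] += (eta / N) * S[s] * sigma * (xi - w[s])
    return w @ B.T, w @ w.T               # R[s, sigma], Q[s, t]

R, Q = simulate_lvq1(N=100, alpha_max=20.0)
print(np.round(R, 2))
print(np.round(Q, 2))
```

Averaging such runs over many seeds and recording R and Q at intermediate learning times gives curves that can be compared with the numerically integrated differential equations.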

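learning curve ($p_+ = 0.2$, $\ell = 1.2$)

[Figure: $\varepsilon_g$ versus $\alpha$ (0 to 300), $\varepsilon_g$ axis from 0.14 to 0.26; $\eta$ values shown: 2.0, 0.4, 0.2]

- role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows lin. with $\eta$
- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)

[Figure: $\varepsilon_g$ versus $(\eta\alpha)$ (0 to 50) for $\eta \to 0$, $\varepsilon_g$ axis from 0.14 to 0.26, compared with $\min \varepsilon_g$; the asymptotic result is suboptimal]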

"The winner takes it all"

II) LVQ+ (only positive steps without repulsion)

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \delta_{S, \sigma^\mu} \left( \xi^\mu - w_s^{\mu-1} \right)$$

update only if the winner is correct

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell \, (B_+ + B_-)/2$

[Figure: asymptotic positions of $w_+$, $w_-$ relative to $\ell B_+$ and $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

LVQ+ ≈ VQ within the classes ($w_S$ updated only from class $S$)
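The difference between LVQ+ and LVQ1 lies only in what happens when the winner carries the wrong label. The sketch below writes both modulation functions side by side; the names `f_lvq_plus` and `f_lvq1` and the argument layout (distances, prototype labels, example label) are illustrative assumptions.

```python
import numpy as np

def f_lvq_plus(d, S, sigma):
    """LVQ+: the winner moves towards the example only if its label is correct."""
    f = np.zeros(len(d))
    s = int(np.argmin(d))          # index of the winner
    if S[s] == sigma:
        f[s] = 1.0                 # attraction; there is never any repulsion
    return f

def f_lvq1(d, S, sigma):
    """LVQ1, for comparison: the winner is attracted (+1) or repelled (-1)."""
    f = np.zeros(len(d))
    s = int(np.argmin(d))
    f[s] = float(S[s] * sigma)
    return f
```

With distances `[1.0, 2.0]`, prototype labels `[1, -1]` and an example of class -1, `f_lvq_plus` returns all zeros (no update), whereas `f_lvq1` returns `[-1.0, 0.0]` (the winner is repelled).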

learning curves ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$)

[Figure: $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ1]

asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$

[Figure: $\varepsilon_g$ versus $p_+$ for optimal classification, LVQ 2.1, LVQ 1 and LVQ+]

- LVQ 2.1: trivial assignment to the more frequent class, $\min\{p_+, p_-\}$
- LVQ 1: here: close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

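Vector Quantization

competitive learning: class membership is unknown or identical for all data

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) \left( \xi^\mu - w_s^{\mu-1} \right)$$

where $\Theta\left( d_{-s}^\mu - d_s^\mu \right) = 1$ for the winner $w_s$

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300), vertical range 0 to 1.0]

system is invariant under exchange of the prototypes: weakly repulsive fixed points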

interpretations:

- VQ, unsupervised learning: unlabelled data
- LVQ, two prototypes of the same class: identical labels
- LVQ, different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$, asymptotics ($\eta \to 0$)]

for $p_+ \approx 0$, $p_- \approx 1$:

- low quantization error
- high gen. error $\varepsilon_g$

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation

Perspectives

• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, with the relevances $\lambda_i$ adapted in training
• applications
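The adaptive distance measure mentioned for Generalized Relevance LVQ weights each squared coordinate difference by a relevance factor. A minimal sketch, assuming the relevances are given as a non-negative vector; the function name `relevance_distance` is illustrative and the adaptation of the relevances during training is not shown.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)**2."""
    return float(np.sum(lam * (w - xi) ** 2))

w = np.array([1.0, 2.0])
xi = np.array([0.0, 0.0])
print(relevance_distance(w, xi, np.ones(2)))            # plain squared Euclidean distance
print(relevance_distance(w, xi, np.array([1.0, 0.0])))  # second component ignored
```

With all relevances equal to one the measure reduces to the squared Euclidean distance used throughout the talk; a relevance of zero removes a component from the comparison.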



Page 16: The Dynamics of Learning Vector Quantization

The Dynamics of Learning Vector Quantization RUG 10012005

Ν1Οffη QxfηQxfη

1N

QQ

Ryfη1N

RR

ts1-μ

stμst

1-μst

μts

1-μst

μst

1-μsσ

μσs

1-μsσ

μsσ

2

1-μs

μμs-

μss

1-μs

μs σSddf

N

ηwξww recursions

mathematical analysis of the learning dynamics

1221 -μss

μs

μμs

μμs Q2xd ξwξ

μμμ1-μs

μs ξByx ξwprojections

distances

random vector ξμ enters only in the form of

11 σtsμt

μs

μstσ

μs

μsσ QBR www

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities

( here ℝ2N ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x

Bww j

completely specified in terms of first and second moments (wo indices μ)

in the thermodynamic limit N

random vector acc to σ)|P( μ ξμμ

μ1-μs

μs

By

wx

ξ

ξ

correlated Gaussian random quantities

stσtσsσt s Q xx- xx sσσsσ s R yx- yx

yy- yy σσσ

else

σ ifsσσ y

0

S

2 average over the current example

averaged recursions closed in Rsσ Qst p σ1σ

σ

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N (here prop N-1)

μsσ

μst R Q

learning dynamics is completely described in terms of averages

3 self-averaging properties

4 continuous learning time

N

μ α of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions coupled ordinary differential equations

evolution of projections

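5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$$

$$\varepsilon_g = p_+\, \Phi\!\left( \frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right) + p_-\, \Phi\!\left( \frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms

- repulsive/attractive fixed points of the dynamics

- asymptotic behavior for $\alpha \to \infty$

- dependence on learning rate, separation, initialization

- ...

optimization and development of new prescriptions

- time-dependent learning rate $\eta(\alpha)$

- variational optimization w.r.t. $f_s[\ldots]$, e.g. maximize the decrease $-\mathrm{d}\varepsilon_g / \mathrm{d}\alpha$ of the generalization error

- ...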
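The generalization error is a function of the seven characteristic quantities alone, which the following sketch evaluates. It assumes unit cluster variances and cluster centers at $\ell B_\pm$; the function names and the nested-dictionary layout for $R$ and $Q$ are choices made here, not part of the talk.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, p_plus: float, ell: float) -> float:
    """epsilon_g for two prototypes in the two-cluster model (unit variances).

    R[s][sigma] = w_s . B_sigma and Q[s][t] = w_s . w_t, keyed by +1 / -1.
    """
    # Variance of x_+ - x_-; zero would mean coinciding prototypes.
    var = Q[1][1] - 2.0 * Q[1][-1] + Q[-1][-1]
    if var <= 0.0:
        raise ValueError("prototypes coincide; the classification is undefined")
    p = {+1: p_plus, -1: 1.0 - p_plus}
    eps = 0.0
    for sig in (+1, -1):
        num = Q[sig][sig] - Q[-sig][-sig] - 2.0 * ell * (R[sig][sig] - R[-sig][sig])
        eps += p[sig] * phi(num / (2.0 * math.sqrt(var)))
    return eps


# Prototypes placed exactly on the cluster centers, w_s = ell * B_s, ell = 1.
ell = 1.0
R = {+1: {+1: ell, -1: 0.0}, -1: {+1: 0.0, -1: ell}}
Q = {+1: {+1: ell ** 2, -1: 0.0}, -1: {+1: 0.0, -1: ell ** 2}}
eps_centers = generalization_error(R, Q, p_plus=0.5, ell=ell)
```

For prototypes sitting on the cluster centers and equal priors the result reduces to $\Phi(-\ell/\sqrt{2})$, half the center-to-center distance measured in units of the cluster width.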

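optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with

$$p_+\, P(\xi \mid \sigma = +1) = p_-\, P(\xi \mid \sigma = -1)$$

[Figure: the two clusters along $B_+$ and $B_-$ with the optimal separating plane; labels $(p_- > p_+)$ and $(p_+)$; the excess error is marked.]

[Figure: minimal $\varepsilon_g$ (axis 0 to 0.50) as a function of the prior weight $p_+$ (axis 0 to 1.0), curves for $\ell = 0$, $\ell = 1$, $\ell = 2$.]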
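The minimal error as a function of the prior weight can be evaluated in closed form once the problem is projected onto the line connecting the two cluster centers. The sketch below assumes unit-variance clusters at $\ell B_\pm$ with orthonormal $B_\pm$; the function name and the handling of the degenerate case $\ell = 0$ are choices made here.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def minimal_generalization_error(p_plus: float, ell: float) -> float:
    """Smallest achievable error for clusters at ell*B_+ and ell*B_- with unit
    variance and orthonormal B_+, B_- (assumptions of this sketch)."""
    if not 0.0 < p_plus < 1.0:
        raise ValueError("p_plus must lie strictly between 0 and 1")
    p_minus = 1.0 - p_plus
    if ell == 0.0:
        # Identical clusters: always guess the more frequent class.
        return min(p_plus, p_minus)
    # Along the unit vector (B_+ - B_-)/sqrt(2) the centers sit at +a and -a.
    a = ell / math.sqrt(2.0)
    # Threshold where p_+ N(t; +a, 1) equals p_- N(t; -a, 1).
    t = -math.log(p_plus / p_minus) / (2.0 * a)
    # Class + is missed below t, class - is missed above t.
    return p_plus * phi(t - a) + p_minus * phi(-t - a)


eps_balanced = minimal_generalization_error(0.5, 1.0)
eps_skewed = minimal_generalization_error(0.2, 1.0)
eps_no_separation = minimal_generalization_error(0.2, 0.0)
```

Unequal priors shift the plane towards the less frequent cluster, so the minimal error for $p_+ \neq 1/2$ stays below $\min\{p_+, p_-\}$ whenever $\ell > 0$.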

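"LVQ 2.1": update the correct and the wrong winner

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, (s\, \sigma^{\mu}) \left(\xi^{\mu} - w_s^{\mu-1}\right)$$

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

(analytical) integration for $w_s(0) = 0$, with $p_\pm = (1 \pm m)/2$ ($m > 0$): closed-form expressions for $R_{s\sigma}(\alpha)$ and $Q_{st}(\alpha)$ built from the factors $e^{\pm m \eta \alpha}$

for $\alpha \to \infty$: some of the quantities $Q$, $R$ diverge, with certain combinations of them remaining finite

[Figure: characteristic quantities vs. $\alpha$ (0 to 10), vertical axis from −6 to 6; theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs.]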
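The instability can be seen directly in the projections $R_{s\sigma}$. The sketch below is derived here rather than transcribed from the slide: assuming the LVQ 2.1 modulation $f_s = s\,\sigma$, unit-variance clusters at $\ell B_\sigma$ and $m = p_+ - p_-$, averaging gives $dR_{s\sigma}/d\alpha = \eta\, s\, (\ell\, \sigma\, p_\sigma - m R_{s\sigma})$, which for $R_{s\sigma}(0) = 0$ is solved by $R_{s\sigma}(\alpha) = (\ell\, \sigma\, p_\sigma / m)\,(1 - e^{-s m \eta \alpha})$. The function names are hypothetical, and the Euler integrator serves only as a cross-check.

```python
import math


def r_closed_form(s: int, sigma: int, alpha: float,
                  eta: float, p_plus: float, ell: float) -> float:
    """R_{s sigma}(alpha) for LVQ 2.1 started from w_s(0) = 0 (requires m != 0)."""
    m = 2.0 * p_plus - 1.0                        # p_+ - p_-
    p_sigma = p_plus if sigma == +1 else 1.0 - p_plus
    return ell * sigma * p_sigma / m * (1.0 - math.exp(-s * m * eta * alpha))


def r_euler(s: int, sigma: int, alpha: float, eta: float,
            p_plus: float, ell: float, steps: int = 20000) -> float:
    """Explicit Euler integration of dR/dalpha = eta*s*(ell*sigma*p_sigma - m*R)."""
    m = 2.0 * p_plus - 1.0
    p_sigma = p_plus if sigma == +1 else 1.0 - p_plus
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * s * (ell * sigma * p_sigma - m * r)
    return r


eta, p_plus, ell = 0.5, 0.8, 1.0
r_attracted = r_closed_form(+1, +1, 10.0, eta, p_plus, ell)   # saturates
r_repelled = r_closed_form(-1, -1, 10.0, eta, p_plus, ell)    # keeps growing
```

For $m > 0$ the prototype of the more frequent class saturates, while the projections of the other prototype grow like $e^{m \eta \alpha}$.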

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: cluster sketch with labels $(p_-)$ and $(p_+ > p_-)$.]

strategies

- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable

- Robust Soft Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function

limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

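"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-s}^{\mu} - d_{+s}^{\mu}\right) s\, \sigma^{\mu} \left(\xi^{\mu} - w_s^{\mu-1}\right), \qquad \Theta(\cdot) = 1 \text{ if } w_s \text{ is the winner}$$

numerical integration for $w_s(0) = 0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs.]

[Figure: trajectories of $w_+$ and $w_-$ in the $(B_+, B_-)$-plane relative to $\ell B_+$ and $\ell B_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; ____ asymptotic position.]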
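A single LVQ 1 step can be written out explicitly. The sketch below follows the update rule with step size $\eta/N$; the function names, the dictionary keyed by the prototype labels $\pm 1$ and the tie-breaking rule are assumptions of this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lvq1_step(w, xi, sigma, eta):
    """One on-line LVQ 1 step; w maps the labels +1 / -1 to prototype lists.

    Only the winner (closest prototype) moves: towards xi if its label equals
    sigma, away from xi otherwise. The step size is eta / N as in the update rule.
    Ties are resolved in favour of the +1 prototype (a choice of this sketch).
    """
    n = len(xi)
    winner = +1 if squared_distance(w[+1], xi) <= squared_distance(w[-1], xi) else -1
    sign = winner * sigma                      # +1: attraction, -1: repulsion
    updated = dict(w)
    updated[winner] = [wj + eta / n * sign * (xj - wj)
                       for wj, xj in zip(w[winner], xi)]
    return updated, winner


w = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
xi = [0.8, 0.2]

# Correct winner: w_+ is attracted towards the example.
w_attract, winner = lvq1_step(w, xi, sigma=+1, eta=1.0)
# Wrong winner: the same prototype is pushed away instead.
w_repel, _ = lvq1_step(w, xi, sigma=-1, eta=1.0)
```

In both calls the prototype $w_-$ is left untouched, since it is not the winner for this example.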

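learning curve

[Figure: $\varepsilon_g$ vs. $\alpha$ (0 to 300), $\varepsilon_g$ axis from 0.14 to 0.26; $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$); further curve labels $\eta = 2.0$, $0.4$, $0.2$.]

- stationary state

- role of the learning rate: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$

- variable rate $\eta(\alpha)$

- well-defined asymptotics (ODE linear in $\eta$): $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$

[Figure: $\varepsilon_g$ vs. $(\eta\alpha)$ (0 to 50), $\varepsilon_g$ axis from 0.14 to 0.26, with $\min \varepsilon_g$ marked; the asymptotic result is suboptimal.]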

"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion): update only if the winner is correct

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-s}^{\mu} - d_{+s}^{\mu}\right) \delta_{s, \sigma^{\mu}} \left(\xi^{\mu} - w_s^{\mu-1}\right)$$

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\,(B_+ + B_-)/2$

[Figure: asymptotic positions of $w_+$ and $w_-$ relative to $\ell B_+$ and $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$.]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

LVQ+ ≈ VQ within the classes ($w_s$ updated only from class $s$)

[Figure: asymptotic $\varepsilon_g$ vs. $p_+$ ($\eta \to 0$, $(\eta\alpha) \to \infty$), compared with the optimal classification.]

- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$

- LVQ 1: here close to optimal classification

- LVQ+: min-max solution, $p_\pm$-independent classification

[Figure: learning curves $\varepsilon_g(\alpha)$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$.]

Vector Quantization

competitive learning: $w_s$ is the winner; class membership is unknown or identical for all data

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-s}^{\mu} - d_{+s}^{\mu}\right) \left(\xi^{\mu} - w_s^{\mu-1}\right)$$

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300).]

system is invariant under exchange of the prototypes → weakly repulsive fixed points

[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ 1.]
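The four prescriptions compared in the talk differ only in the modulation function $f_s$ that multiplies the step $(\eta/N)(\xi - w_s)$. The sketch below puts them side by side for one example that lies closest to $w_+$ but carries the label $-1$; all function names are hypothetical, and the convention $\Theta(0) = 0$ is a choice made here.

```python
def theta(z: float) -> int:
    """Heaviside step function (value 0 at z = 0 in this sketch)."""
    return 1 if z > 0 else 0


def f_lvq21(s, d, sigma):
    """LVQ 2.1: both prototypes move, sign given by label agreement."""
    return s * sigma


def f_lvq1(s, d, sigma):
    """LVQ 1: only the winner moves, attracted or repelled."""
    return theta(d[-s] - d[s]) * s * sigma


def f_lvq_plus(s, d, sigma):
    """LVQ+: only a correct winner moves (no repulsion)."""
    return theta(d[-s] - d[s]) * (1 if s == sigma else 0)


def f_vq(s, d, sigma):
    """VQ: the winner always moves towards the example; the label is ignored."""
    return theta(d[-s] - d[s])


def generic_step(w, xi, sigma, eta, f):
    """w_s <- w_s + (eta / N) * f_s * (xi - w_s) for s = +1, -1."""
    n = len(xi)
    d = {s: sum((x - ws) ** 2 for x, ws in zip(xi, w[s])) for s in (+1, -1)}
    return {s: [ws + eta / n * f(s, d, sigma) * (x - ws)
                for ws, x in zip(w[s], xi)] for s in (+1, -1)}


w = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
xi, sigma = [0.8, 0.2], -1           # example closest to w_+, but labelled -1
after = {name: generic_step(w, xi, sigma, 1.0, f)
         for name, f in [("LVQ 2.1", f_lvq21), ("LVQ 1", f_lvq1),
                         ("LVQ+", f_lvq_plus), ("VQ", f_vq)]}
```

For this misclassified example LVQ+ leaves both prototypes where they are, VQ pulls $w_+$ towards the example, LVQ 1 pushes $w_+$ away, and LVQ 2.1 additionally pulls $w_-$ closer.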

interpretations

- VQ: unsupervised learning, unlabelled data

- LVQ: two prototypes of the same class, identical labels

- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$, $(\eta\alpha) \to \infty$).]

$p_+ \approx 0$, $p_- \approx 1$: low quantization error, high gen. error $\varepsilon_g$
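The tension between a low quantization error and a high classification error for strongly unbalanced priors can be reproduced with a toy data set. The sketch below uses hand-picked one-dimensional points and two hand-placed prototype configurations; none of these numbers come from the talk, and the function names are illustrative.

```python
def quantization_error(prototypes, data):
    """Mean squared distance of each data point to its closest prototype."""
    if not data:
        raise ValueError("data must not be empty")
    total = 0.0
    for xi in data:
        total += min(sum((x - w) ** 2 for x, w in zip(xi, proto))
                     for proto in prototypes)
    return total / len(data)


def error_rate(prototypes, proto_labels, data, labels):
    """Fraction of points whose closest prototype carries the wrong label."""
    wrong = 0
    for xi, sigma in zip(data, labels):
        dists = [sum((x - w) ** 2 for x, w in zip(xi, p)) for p in prototypes]
        if proto_labels[dists.index(min(dists))] != sigma:
            wrong += 1
    return wrong / len(data)


# Eight points of the frequent class -1 in two groups, one point of class +1.
data = [[-1.0]] * 4 + [[1.0]] * 4 + [[3.0]]
labels = [-1] * 8 + [+1]
proto_labels = [+1, -1]

vq_like = [[1.0], [-1.0]]        # both prototypes represent the frequent class
classifier_like = [[3.0], [0.0]]  # one prototype per class

q_vq = quantization_error(vq_like, data)
q_cls = quantization_error(classifier_like, data)
e_vq = error_rate(vq_like, proto_labels, data, labels)
e_cls = error_rate(classifier_like, proto_labels, data, labels)
```

The configuration that represents the data best (smaller quantization error) misclassifies four of the nine points, whereas the configuration with one prototype per class classifies all of them correctly at the price of a larger quantization error.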

work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]

• model: different cluster variances, more clusters/prototypes

• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line

• several classes and prototypes

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes; dynamics of online training

• comparison of algorithms:

LVQ 2.1: instability, trivial (stationary) classification

LVQ 1: close to optimal asymptotic generalization

LVQ+: min-max solution w.r.t. asymptotic generalization

VQ: symmetry breaking, representation

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

(analytical)integrationfor ws(0) = 0

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

p = (1+m ) 2 (mgt0)

[Seo Obermeyer] LVQ21 harr cost function

(likelihood ratios)

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 =05 averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

strategies

- selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquo

numericalintegrationfor ws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 =12

averaged over 100 indep runs

Q++

Q--

Q+-

α

w+

w-

ℓ B+

ℓ B-

trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary____ asymptotic position

RS

+

RS-

R--

R-+

R--

R++

winner ws 1

I) LVQ 1 [Kohonen] 1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class membership

w-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curve

εg =12

(p+=02 ℓ=12)

εg (αinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

η0 - variable rate η(α)

- well-defined asymptotics

(ODE linear in η)

10

εg

20 30 40 50 0 014

026

022

018

min εg

(η α)

η0η 0 αinfin

( η α ) infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquo

II ) LVQ+ ( only positive steps without repulsion)

1-μs

μSμσμS

μS

1-μs

μs δdd

N

ηwξww

winner correct

αinfin asymptotic configuration

symmetric about ℓ (B++B-)2

w-

w+

ℓ B+

ℓ B-

p+=02 ℓ=12 =12

classification scheme and the

achieved generalization error are

independent of the prior weights p

(and optimal for p = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

- LVQ 2.1:
  trivial assignment to the more frequent class, εg = min{p+, p−}

- LVQ 1:
  here close to optimal classification

- LVQ+:
  min-max solution, p±-independent classification

asymptotics: η → 0, (η α) → ∞

[Figure: εg versus p+ for LVQ 2.1, LVQ 1 and LVQ+, together with the optimal classification]

[Figure: learning curves εg versus α for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]

Vector Quantization

competitive learning

    w_S^{\mu} = w_S^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left(d_{-S}^{\mu} - d_{S}^{\mu}\right) \left(\xi^{\mu} - w_S^{\mu-1}\right)

    w_S: winner

class membership is unknown or identical for all data

system is invariant under exchange of the prototypes: weakly repulsive fixed points

[Figure: numerical integration for w_S(0) ≈ 0 (p+ = 0.2, ℓ = 1.0, η = 1.2): εg versus α for VQ, LVQ+ and LVQ1; R++, R+−, R−+, R−− versus α (0 to 300)]
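
Unsupervised competitive learning can be sketched as a short training loop in Python. This is an illustration under simple assumptions: unlabelled examples, squared Euclidean distance, step size η/N. The names `vq_train`, `epochs` and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_train(prototypes, data, eta, epochs):
    """Competitive learning: for every example the closest prototype moves towards it.

    prototypes: list of prototype vectors (lists of floats), modified in place
    data: list of unlabelled example vectors
    """
    for _ in range(epochs):
        for xi in data:
            n = len(xi)
            k = min(range(len(prototypes)), key=lambda j: sq_dist(prototypes[j], xi))
            prototypes[k] = [w + (eta / n) * (x - w) for w, x in zip(prototypes[k], xi)]
    return prototypes


# Two well separated 1-d points and two prototypes that start close together.
protos = vq_train([[0.9], [1.1]], data=[[0.0], [2.0]], eta=0.5, epochs=20)
```

Starting from nearly identical prototypes, each one ends up representing one of the two points. Swapping the two initial positions only swaps the roles of the prototypes, which is the exchange symmetry noted above.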

Interpretations

- VQ: unsupervised learning, unlabelled data

- LVQ: two prototypes of the same class, identical labels

- LVQ: different classes, but labels are not used in training

asymptotics (η → 0), for p+ ≈ 0, p− ≈ 1:

- low quantization error
- high generalization error εg

[Figure: εg versus p+]
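
The gap between quantization error and classification error can be made concrete with a toy example in Python. It is only an illustration: the data set, the prototype positions and the function names are hypothetical, the quantization error is taken as the mean squared distance to the nearest prototype, and the classification error as the fraction of examples whose nearest prototype carries the wrong label.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_error(prototypes, data):
    """Mean squared distance of each example to its nearest prototype."""
    total = sum(min(sq_dist(w, xi) for w in prototypes.values()) for xi, _ in data)
    return total / len(data)


def classification_error(prototypes, data):
    """Fraction of examples whose nearest prototype has a different label."""
    wrong = 0
    for xi, sigma in data:
        winner = min(prototypes, key=lambda s: sq_dist(prototypes[s], xi))
        wrong += winner != sigma
    return wrong / len(data)


# Unbalanced toy data: two examples of class -1, one of class +1.
data = [([-4.0], -1), ([-2.0], -1), ([0.0], +1)]
label_blind = {+1: [-4.0], -1: [-1.0]}  # placed without regard to the labels
label_aware = {+1: [0.0], -1: [-3.0]}   # one prototype per class

q_blind = quantization_error(label_blind, data)
q_aware = quantization_error(label_aware, data)
e_blind = classification_error(label_blind, data)
e_aware = classification_error(label_aware, data)
```

Both configurations represent the data equally well in the quantization sense, yet only the label-aware one classifies every example correctly.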

Work in progress, outlook

• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]

• model: different cluster variances, more clusters/prototypes

• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal online

• several classes and prototypes

Summary

• prototype-based learning:
  Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes;
  dynamics of online training

• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives

• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure

      d(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left(w_{s,i} - \xi_i\right)^2

  with the \lambda_i adapted in training

• applications
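
The relevance-weighted distance can be written as a small Python function. This is a sketch of the distance measure only, with a hypothetical function name `relevance_distance`; how the relevances λ_i are adapted in training is not shown.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam[i] * (w[i] - xi[i]) ** 2.

    w: prototype vector, xi: example vector, lam: one relevance factor per dimension
    """
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))


w, xi = [0.0, 0.0], [3.0, 4.0]
d_uniform = relevance_distance(w, xi, [1.0, 1.0])  # plain squared Euclidean distance
d_first = relevance_distance(w, xi, [1.0, 0.0])    # only the first dimension counts
```

With all λ_i equal to 1 the measure reduces to the squared Euclidean distance; setting a λ_i to 0 removes that dimension from the comparison.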
