
Page 1

Michael Biehl
Johann Bernoulli Institute for Mathematics and Computer Science
University of Groningen

www.cs.rug.nl/biehl

Prototype-based learning and adaptive distances for classification

Page 2

overview

Basic concepts of similarity / distance based classification

example system: Learning Vector Quantization (LVQ)

application: Classification of Adrenal Tumors

Distance measures and Relevance Learning

predefined distances, e.g. divergence based LVQ

application: Detection of Cassava Mosaic Disease

adaptive distances, e.g. Matrix Relevance LVQ

application: Classification of Adrenal Tumors (cont’d)

extensions: combined distances, relational data

(excursion: uniqueness and regularization of relevance matrices)

Page 3

Part I: Basic concepts of distance/similarity based classification

Page 4

classification problems

- character/digit/speech recognition

- medical diagnoses

- pixel-wise segmentation in image processing

- object recognition/scene analysis

- fault detection in technical systems

- remote sensing

...

machine learning approach:

extract information from example data, parameterized in a learning system (neural network, LVQ, SVM, ...)

working phase: application to novel data

here only: supervised learning, classification

Page 5

distance based classification

assignment of data (objects, observations,...)

to one or several classes (crisp/soft) (categories, labels)

based on comparison with reference data (samples, prototypes)

in terms of a distance measure (dis-similarity, metric)

representation of data (a key step!)

- collection of qualitative/quantitative descriptors

- vectors of numerical features

- sequences, graphs, functional data

- relational data, e.g. in terms of pairwise (dis-) similarities

Page 6

K-NN classifier

a simple distance-based classifier

- store a set of labeled examples

- classify a query according to the label of the Nearest Neighbor (or the majority of K NN)

- local decision boundaries according to (e.g.) Euclidean distances

- piece-wise linear class borders parameterized by all examples

feature space

+ conceptually simple, no training required, one parameter (K)

- expensive storage and computation; sensitivity to “outliers” can result in overly complex decision boundaries
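For illustration, a minimal K-NN sketch in Python/NumPy (the helper name knn_classify and the array layout are assumptions, not part of the talk):

    import numpy as np

    def knn_classify(x, X_train, y_train, K=3):
        """Assign x the majority label among its K nearest stored examples
        (squared Euclidean distance)."""
        d = np.sum((X_train - x) ** 2, axis=1)   # distances to all stored examples
        nearest = np.argsort(d)[:K]              # indices of the K nearest neighbors
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]         # majority vote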

Page 7

prototype based classification

a prototype based classifier [Kohonen 1990, 1997]

- represent the data by one or several prototypes per class

- classify a query according to the label of the nearest prototype (or alternative schemes)

- local decision boundaries according to (e.g.) Euclidean distances

- piece-wise linear class borders parameterized by prototypes

feature space

+ less sensitive to outliers, lower storage needs, little computational effort in the working phase

- training phase required in order to place prototypes; model selection problem: number of prototypes per class, etc.

Page 8

Nearest Prototype Classifier

set of prototypes $\mathbf{w}_1, \dots, \mathbf{w}_M$ carrying class labels $c_1, \dots, c_M$,

based on a dissimilarity/distance measure $d(\mathbf{w}, \mathbf{x})$

nearest prototype classifier (NPC):

given $\mathbf{x}$ - determine the winner $\mathbf{w}_L$ with $d(\mathbf{w}_L, \mathbf{x}) \le d(\mathbf{w}_j, \mathbf{x})$ for all $j$

- assign $\mathbf{x}$ to the class $c_L$

most prominent example: (squared) Euclidean distance $d(\mathbf{w}, \mathbf{x}) = (\mathbf{x} - \mathbf{w})^2$

reasonable requirements: $d(\mathbf{w}, \mathbf{x}) \ge 0$, $d(\mathbf{w}, \mathbf{w}) = 0$
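A corresponding NPC sketch (same conventions as the K-NN sketch above; npc_classify is an illustrative name):

    import numpy as np

    def npc_classify(x, prototypes, proto_labels):
        """Assign x the class label of the nearest prototype (the 'winner')."""
        d = np.sum((prototypes - x) ** 2, axis=1)   # squared Euclidean distances
        return proto_labels[np.argmin(d)]           # label of the winner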

Page 9

∙ identification of prototype vectors from labeled example data

∙ distance based classification (e.g. Euclidean)

Learning Vector Quantization

N-dimensional data, feature vectors

heuristic scheme: LVQ1 [Kohonen, 1990, 1997]

• initialize prototype vectors for different classes

• present a single example

• identify the winner (closest prototype)

• move the winner

- closer towards the data (same class)

- away from the data (different class)

Page 10

Learning Vector Quantization (cont’d)

∙ tesselation of feature space [piece-wise linear]

∙ distance-based classification [here: Euclidean distances]

∙ generalization ability: correct classification of new data

∙ aim: discrimination of classes (≠ vector quantization or density estimation)

Page 11

LVQ1

iterative training procedure:

randomized initialization of prototypes, e.g. close to the class-conditional means

sequential presentation of labelled examples $\{\mathbf{x}^\mu, y^\mu\}$

LVQ1 update step ... the winner takes it all:

$\mathbf{w}_L \leftarrow \mathbf{w}_L + \eta_w \, \psi(c_L, y^\mu)\,(\mathbf{x}^\mu - \mathbf{w}_L)$ with $\psi = +1$ if the classes coincide, $\psi = -1$ otherwise,

with learning rate $\eta_w$

many heuristic variants/modifications: [Kohonen, 1990, 1997]

- learning rate schedules $\eta_w(t)$ [Darken & Moody, 1992]

- update more than one prototype per step
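A sketch of one LVQ1 training sweep along these lines (Python/NumPy; assumes squared Euclidean distance and one winner per example; lvq1_epoch is an illustrative name):

    import numpy as np

    def lvq1_epoch(X, y, W, W_labels, eta=0.01):
        """One sweep of LVQ1: move the winner towards the example if the
        classes coincide, away from it otherwise (winner takes all)."""
        for xi, yi in zip(X, y):
            d = np.sum((W - xi) ** 2, axis=1)    # distances to all prototypes
            L = np.argmin(d)                     # index of the winner
            psi = 1.0 if W_labels[L] == yi else -1.0
            W[L] += eta * psi * (xi - W[L])      # LVQ1 update step
        return W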

Page 12

LVQ1

LVQ1-like update for a generalized distance $d(\mathbf{w}, \mathbf{x})$:

$\mathbf{w}_L \leftarrow \mathbf{w}_L - \eta_w \, \psi(c_L, y^\mu)\,\frac{\partial d(\mathbf{w}_L, \mathbf{x}^\mu)}{\partial \mathbf{w}_L}$

requirement: the update decreases (increases) the distance if the classes coincide (are different)

Page 13

Generalized LVQ

one example of cost function based training: GLVQ [Sato & Yamada, 1995]

two winning prototypes: $\mathbf{w}_J$, the closest prototype of the correct class, and $\mathbf{w}_K$, the closest prototype of any wrong class

minimize the cost function

$E = \sum_{\mu} \Phi\!\left( \frac{d(\mathbf{w}_J, \mathbf{x}^\mu) - d(\mathbf{w}_K, \mathbf{x}^\mu)}{d(\mathbf{w}_J, \mathbf{x}^\mu) + d(\mathbf{w}_K, \mathbf{x}^\mu)} \right)$

sigmoidal $\Phi$ (linear for small arguments): E approximates the number of misclassifications

linear $\Phi$: E favors large margin separation of classes
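A sketch of the cost-function term for a single example (Python/NumPy; np.tanh serves here as a stand-in for the sigmoidal Φ):

    import numpy as np

    def glvq_cost_term(xi, yi, W, W_labels, phi=np.tanh):
        """Relative difference e = (dJ - dK)/(dJ + dK), passed through Phi;
        e < 0 corresponds to a correct classification of xi."""
        d = np.sum((W - xi) ** 2, axis=1)   # squared Euclidean distances
        same = (W_labels == yi)
        dJ = np.min(d[same])                # closest prototype, correct class
        dK = np.min(d[~same])               # closest prototype, wrong class
        return phi((dJ - dK) / (dJ + dK))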

Page 14

GLVQ

training = optimization with respect to prototype position,

e.g. single example presentation, stochastic sequence of examples,

update of two prototypes per step

based on non-negative, differentiable distance


Page 16

GLVQ (cont’d)

based on the (squared) Euclidean distance, the gradient update moves $\mathbf{w}_J$ towards and $\mathbf{w}_K$ away from the sample, with prefactors obtained from the derivative of the cost function

Page 17

prototype/distance based classifiers

+ frequently applied in a variety of practical problems

+ intuitive interpretation: prototypes defined in feature space

+ natural for multi-class problems

+ flexible, easy to implement

- often based on purely heuristic arguments ... or cost functions with unclear relation to classification error

- model/parameter selection (# of prototypes, learning rate, ...)

Important issue: which is the ‘right’ distance measure?

simple Euclidean distance? features may
- scale differently
- be of completely different nature
- be highly correlated / dependent ...

Page 18

related schemes

Many variants of LVQ

intuitive schemes: LVQ2.1, LVQ3, OLVQ, ...

cost function based: RSLVQ (likelihood ratios)

Supervised Neural Gas (NG)

many prototypes, rank based update

Supervised Self-Organizing Maps (SOM)

neighborhood relations, topology preserving mapping

Radial Basis Function Networks (RBF)

hidden units = centers (prototypes) with Gaussian activation

Page 19

remark: the curse of dimension?

concentration of distances for large N:

„distance based methods are bound to fail in high dimensions“ ???

LVQ:

- prototypes are not just random data points

- carefully selected low-noise representatives of the data

- distances of a given data point to prototypes are compared

→ projection to a non-trivial low-dimensional subspace!

see also: [Ghosh et al., 2007; Witoelar et al., 2010]

models of LVQ training, analytical treatment in the limit of large N:

successful training needs a number of training examples that grows with N

Page 20

Questions ?


Page 21

An example problem: classification of adrenal tumors

Wiebke Arlt, Angela Taylor, Dave J. Smith, Peter Nightingale, P.M. Stewart, C.H.L. Shackleton et al.
School of Medicine, Queen Elizabeth Hospital, University of Birmingham/UK (+ several centers in Europe)

Petra Schneider, Han Stiekema, Michael Biehl
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen

tumor classification

[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]

Page 22

∙ adrenal tumors are common (1-2%) and mostly found incidentally

∙ adrenocortical carcinomas (ACC) account for 2-11% of adrenal incidentalomas (ACA: adrenocortical adenomas)

∙ conventional diagnostic tools (CT, MRI) lack sensitivity and are labor and cost intensive

www.ensat.org

adrenal gland

∙ idea: tumor classification based on steroid excretion profile

tumor classification

Page 23

- urinary steroid excretion (24 hours)
- 32 potential biomarkers
- biochemistry imposes correlations, grouping of steroids

tumor classification

Page 24

data set:

102 patients with benign ACA, 45 patients with malignant ACC

(heat map: steroid marker vs. patient, color coded excretion values, log. scale, relative to healthy controls)

tumor classification

Page 25

Generalized LVQ, training and performance evaluation

∙ data divided into 90% training and 10% test set

∙ determine prototypes by stochastic gradient descent: typical profiles (1 per class)

∙ employ Euclidean distance measure in the 32-dim. feature space

∙ apply classifier to test data, evaluate performance (error rates)

∙ repeat and average over many random splits

tumor classification

Page 26

prototypes: steroid excretion in ACA / ACC (one profile per class)

tumor classification

Page 27

∙ Receiver Operating Characteristics (ROC) [Fawcett, 2000]

obtained by introducing a biased NPC, varying a threshold θ in the comparison of the two distances:

(ROC plot: true positive rate (sensitivity) vs. false positive rate (1-specificity); the diagonal corresponds to random guessing; θ = 0 marks the unbiased NPC; Area under Curve (AUC) as a scalar summary)

one extreme: all tumors classified as ACA

- no false alarms

- no true positives detected

other extreme: all tumors classified as ACC

- all true positives detected

- max. number of false alarms

tumor classification
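A minimal sketch of such a biased-NPC ROC in Python/NumPy (hypothetical names; the score is the difference of the two distances, compared against the bias θ):

    import numpy as np

    def roc_points(scores, y_true):
        """ROC from a scalar score per sample, e.g.
        s = d(w_ACA, x) - d(w_ACC, x); larger s = 'more likely ACC'."""
        tpr, fpr = [], []
        P, N = np.sum(y_true == 1), np.sum(y_true == 0)
        for theta in np.sort(scores)[::-1]:          # sweep the bias
            pred = scores > theta
            tpr.append(np.sum(pred & (y_true == 1)) / P)
            fpr.append(np.sum(pred & (y_true == 0)) / N)
        return np.array(fpr), np.array(tpr)

    # AUC via the trapezoidal rule: fpr, tpr = roc_points(...); auc = np.trapz(tpr, fpr)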

Page 28

GLVQ performance: ROC characteristics (averaged over splits of the data set), AUC = 0.87

tumor classification

Page 29

Questions ?


Page 30

Part II: distance measures and relevance learning

Page 31

distance measures

fixed distance measures:

- select distance measures according to prior knowledge

- data driven choice in a preprocessing step

- determine prototypes for a given distance

- compare performance of various measures

example: divergence based LVQ

Page 32

Relevance Matrix LVQ [Schneider et al., 2009]

generalized quadratic distance in LVQ:

$d_\Lambda(\mathbf{w}, \mathbf{x}) = (\mathbf{x} - \mathbf{w})^\top \Lambda \,(\mathbf{x} - \mathbf{w})$ with $\Lambda = \Omega^\top \Omega$ positive semi-definite

normalization: $\sum_i \Lambda_{ii} = 1$

diagonal matrices: single feature weights [Bojer et al., 2001] [Hammer et al., 2002]

variants: one global, several local, class-wise relevance matrices → piecewise quadratic decision boundaries

rectangular Ω: discriminative low-dim. representation, e.g. for visualization [Bunte et al., 2012]

possible constraints: rank-control, sparsity, ...
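A sketch of this distance in Python/NumPy (Omega as the adaptive transformation; function names are illustrative):

    import numpy as np

    def gmlvq_distance(x, w, Omega):
        """d_Lambda(w, x) = (x - w)^T Omega^T Omega (x - w), i.e. the squared
        Euclidean distance of the linearly transformed difference vector."""
        z = Omega @ (x - w)
        return float(z @ z)

    def normalize(Omega):
        """Enforce the normalization sum_i Lambda_ii = 1,
        since Trace(Omega^T Omega) = sum of squared entries of Omega."""
        return Omega / np.sqrt(np.sum(Omega ** 2))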

Page 33

Relevance Matrix LVQ: optimization of prototypes and distance measure

Matrix-LVQ1: WTA (winner-takes-all) update of the winning prototype and of Ω

Page 34

Relevance Matrix LVQ: optimization of prototypes and distance measure

Generalized Matrix-LVQ (GMLVQ): updates given by gradients of the GLVQ cost function E

Page 35

heuristic interpretation

$\Lambda_{jj} = \sum_i \Omega_{ij}^2$ summarizes

- the contribution of the original dimension j to the distance

- the relevance of the original features for the classification

interpretation assumes implicitly: features have equal order of magnitude,

e.g. after z-score transformation → zero mean, unit variance (averages over the data set)

note: $d_\Lambda(\mathbf{w}, \mathbf{x}) = \big[\Omega\,(\mathbf{x} - \mathbf{w})\big]^2$ is the standard Euclidean distance for linearly transformed features

Page 36

Relevance Matrix LVQ

optimization of prototype positions and distance measure(s) in one training process (≠ pre-processing)

motivation:

improved performance
- weighting of features and pairs of features

simplified classification schemes
- elimination of non-informative, noisy features
- discriminative low-dimensional representation

insight into the data / classification problem
- identification of most discriminative features
- incorporation of prior knowledge (e.g. structure of Ω)

Page 37

Generalized Matrix LVQ, ACC vs. ACA classification [Arlt et al., 2011] [Biehl et al., 2012]

∙ data divided into 90% training and 10% test set (z-score transformed)

∙ determine prototypes: typical profiles (1 per class)

∙ adaptive generalized quadratic distance measure, parameterized by Ω

∙ apply classifier to test data, evaluate performance (error rates, ROC)

∙ repeat and average over many random splits

tumor classification (cont’d)

Page 38

Relevance matrix: diagonal and off-diagonal elements

fraction of runs (random splits) in which a steroid is rated among the 9 most relevant markers

subset of 9 selected steroids ↔ technical realization (patented, University of Birmingham/UK)

tumor classification

Page 39

relevance matrix, diagonal and off-diagonal elements: e.g. steroid 19 is discriminative between ACA and ACC

tumor classification

Page 40

relevance matrix, diagonal and off-diagonal elements: non-trivial role of steroid 8, among the most relevant markers!

tumor classification

Page 41

(scatter plot of steroids 8 and 12: individually weakly discriminative markers, but a highly discriminative combination of markers!)

tumor classification

Page 42

ROC characteristics: clear improvement due to adaptive distances

(ROC plot: sensitivity vs. 1-specificity)

Euclidean: AUC 0.87
GRLVQ (diagonal relevances): AUC 0.93
GMLVQ (full matrix): AUC 0.97

tumor classification

Page 43

observation / theory :

low rank of resulting relevance matrix

often: single relevant eigendirection

(plot: eigenvalues of Λ in the ACA/ACC classification)

intrinsic regularization:

nominally ~ N×N adaptive parameters in Matrix LVQ reduce to ~ N effective degrees of freedom

low-dimensional representation

facilitates, e.g., visualization of labeled data sets

tumor classification

Stationarity of Matrix Relevance LVQ

[M. Biehl, B. Hammer, F.-M. Schleif, T. Villmann, IJCNN 2015, in press]
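A sketch of this analysis in Python/NumPy (illustrative names; eigendecomposition of the relevance matrix, plus projection of the data onto the leading eigendirections for visualization):

    import numpy as np

    def relevance_spectrum(X, Lam, k=2):
        """Eigenvalues of Lambda (descending) and projection of the data
        onto the k leading eigenvectors, e.g. for 2D visualization."""
        vals, vecs = np.linalg.eigh(Lam)    # ascending eigenvalues
        order = np.argsort(vals)[::-1]      # sort descending
        return vals[order], X @ vecs[:, order[:k]]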

Page 44

tumor classification

visualization of the data set

(scatter: ACA and ACC samples in the projected space)

Page 45

a multi-class example: classification of coffee samples based on hyperspectral data (256-dim. feature vectors) [U. Seiffert et al., IFF Magdeburg]

(scatter plot: projection on first vs. second eigenvector of Λ; prototypes marked)

Page 46

related schemes

Relevance LVQ variants

local, rectangular, structured, restricted... relevance matrices

for visualization, functional data, texture recognition, etc.

relevance learning in Robust Soft LVQ, Supervised NG, etc.

combination of distances for mixed data ...

Relevance Learning related schemes in supervised learning ...

RBF Networks [Backhaus et al., 2012]

Neighborhood Component Analysis [Goldberger et al., 2005]

Large Margin Nearest Neighbor [Weinberger et al., 2006, 2010]

Linear Discriminant Analysis (LDA): one prototype per class + global matrix, but a different objective function!

and many more!

Page 47

links

Matlab collection: Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ): http://matlabserver.cs.rug.nl/gmlvqweb/web/

Pre/re-prints etc.: http://www.cs.rug.nl/~biehl/

Challenging data sets? [email protected]

Page 48

Questions ?


Page 49

uniqueness / regularization

quadratic distance measure (positive semi-definite pseudo-metric):

$d_\Lambda(\mathbf{w}, \mathbf{x}) = (\mathbf{x} - \mathbf{w})^\top \Lambda \,(\mathbf{x} - \mathbf{w})$

intrinsic representation by linear transformation: $\Lambda = \Omega^\top \Omega$, i.e. $d_\Lambda(\mathbf{w}, \mathbf{x}) = \big[\Omega\,(\mathbf{x} - \mathbf{w})\big]^2$

uniqueness (i): the matrix square root is not unique*: $\widetilde{\Omega} = R\,\Omega$ yields the same $\Lambda$ for any orthogonal $R$

canonical representation, e.g. the symmetric square root $\widehat{\Omega} = \Lambda^{1/2}$

* irrelevant rotations, reflections, symmetries
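A sketch of the canonical (symmetric) square root in Python/NumPy (computed via eigendecomposition; assumes Lambda positive semi-definite):

    import numpy as np

    def canonical_omega(Lam):
        """Symmetric square root: Omega = V diag(sqrt(lambda_i)) V^T,
        so that Omega^T Omega = Lambda and Omega = Omega^T."""
        vals, V = np.linalg.eigh(Lam)
        vals = np.clip(vals, 0.0, None)   # guard against round-off negatives
        return (V * np.sqrt(vals)) @ V.T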

Page 50

uniqueness of relevance matrix

uniqueness (ii)

given the mapping $\mathbf{x} \mapsto \Omega\,\mathbf{x}$ of the training data:

$\widetilde{\Omega} = \Omega + \Delta$ is possible if $\Delta$ exists with $\Delta\,\mathbf{x}^\mu = 0$ for all examples, i.e. the rows of $\Delta$ lie in the null-space of the data

→ identical mapping of all examples and prototypes, same distances and classification scheme w.r.t. the training data

the data matrix is singular if features are highly correlated, interdependent

Page 51

a simple example

uniqueness of relevance matrix

consider two identical, entirely irrelevant features, e.g. a duplicated input dimension:

their contributions to $\Omega\,(\mathbf{x} - \mathbf{w})$ cancel exactly if the corresponding columns of Ω have opposite sign

(the features are disregarded in the classification)

but naïve interpretation of the diagonal of Λ suggests high relevance!

Page 52

posterior null-space projection

training process yields $\Omega$

determine eigenvectors $\mathbf{v}_i$ and eigenvalues $\lambda_i$ of $X X^\top$, where the data matrix $X$ contains the examples as columns

column space projection: $\widehat{\Omega} = \Omega\,P$ with $P = \sum_{i:\,\lambda_i > 0} \mathbf{v}_i \mathbf{v}_i^\top$

Note: $\widehat{\Omega}$ minimizes the (Frobenius) norm under the condition that the mapping of all training data is preserved

formal solution: $\widehat{\Omega} = \Omega\,X X^{+}$ ($X^{+}$: Moore-Penrose pseudo-inverse)

removes null-space contributions
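A sketch of the projection in Python/NumPy (here the data matrix X holds one example per row, so the slide's projector X X⁺ becomes X⁺ X):

    import numpy as np

    def column_space_projection(Omega, X):
        """Project the rows of Omega onto the span of the training examples;
        removes the null-space contributions of Omega."""
        P = np.linalg.pinv(X) @ X    # orthogonal projector onto the span
        return Omega @ P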

Page 53

posterior regularization

training process yields $\Omega$; determine eigenvectors $\mathbf{v}_i$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ as above

regularization: $\widehat{\Omega} = \Omega\,P_K$ with $P_K = \sum_{i=1}^{K} \mathbf{v}_i \mathbf{v}_i^\top$

- retains the eigenspace corresponding to the K largest eigenvalues only

- also removes eigenspaces of (small) non-zero eigenvalues

- potentially improved generalization performance

- smoothens the mapping, less data set specific
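A corresponding sketch of the rank-K regularization (same row-wise data convention as above; choosing K equal to the rank of X reproduces the null-space projection):

    import numpy as np

    def posterior_regularization(Omega, X, K):
        """Retain only the K leading eigendirections of the data."""
        vals, V = np.linalg.eigh(X.T @ X)        # ascending eigenvalues
        Vk = V[:, np.argsort(vals)[::-1][:K]]    # K leading eigenvectors
        return Omega @ (Vk @ Vk.T)               # projector P_K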

Page 54

posterior regularization

two possible settings:

- pre-processing of data (PCA-like): mapped feature space, fixed K, prototypes yet unknown (*)

- regularized mapping after/during training: retains original features, flexible K, can include the prototypes

here: posterior regularization in classification schemes

- dependence of generalization performance on the parameter K

- improved interpretability of the mapping / distance measure

(*) remark: prototypes are (close to) linear combinations of feature vectors when converged

Page 55

illustrative example

infra-red spectral data: 124 wine samples, 256 wavelengths; 30 training data, 94 test spectra

classes: alcohol content (binned)

GMLVQ classification

high correlation of features (neighboring channels) and P = 30 → effective dimension ≪ 256 can be expected

Page 56

illustrative example

(plot: performance vs. number of retained dimensions K; null-space correction corresponds to P = 30 dimensions; best performance with 7 dimensions remaining; over-fitting effect for larger K)

regularization (beyond column space projection)

- potentially enhances generalization, controls over-fitting

Page 57

regularization

- enhances generalization

- smoothens relevance profile/matrix

- removes ‘false relevances’

- improves interpretability of Λ

(relevance profile/matrix before and after regularization)

Page 58

links

Matlab collection: Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ): http://matlabserver.cs.rug.nl/gmlvqweb/web/

Pre/re-prints etc.: http://www.cs.rug.nl/~biehl/

Page 59

Questions ?
