Classification and regression based on derivatives: a consistency result for sampled functions

Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Toulouse School of Economics

TSE Seminar, April 29, 2009

Research interests

  • Graph mining (& social networks)

  • Modelling land use evolution

  • Functional data analysis

Based on a preprint (available on demand) written with Fabrice Rossi. A short version is given on the TSE web site.

Outline

1 Introduction and motivations

2 A general consistency result

3 Examples

4 Conclusion and prospects

Introduction and motivations

A regression or classification problem where the predictor takes values in an infinite dimensional space

Settings: $(X, Y)$ is a random pair of variables where

  • $Y \in \{-1, 1\}$ (binary classification problem) or $Y \in \mathbb{R}$;

  • $X \in (\mathcal{X}, \langle \cdot, \cdot \rangle_{\mathcal{X}})$, an infinite dimensional Hilbert space.

We are given a learning set $S_n = \{(X_i, Y_i)\}_{i=1}^{n}$ of $n$ i.i.d. copies of $(X, Y)$.

Purpose: find $\phi_n : \mathcal{X} \to \{-1, 1\}$ or $\mathbb{R}$ that is universally consistent:

  • Classification case: $\lim_{n \to +\infty} P(\phi_n(X) \neq Y) = L^*$ where $L^* = \inf_{\phi : \mathcal{X} \to \{-1,1\}} P(\phi(X) \neq Y)$ is the Bayes risk.

  • Regression case: $\lim_{n \to +\infty} \mathbb{E}([\phi_n(X) - Y]^2) = L^*$ where $L^* = \inf_{\phi : \mathcal{X} \to \mathbb{R}} \mathbb{E}([\phi(X) - Y]^2)$ will also be called the Bayes risk.

An example

Predicting the rate of yellow berry in durum wheat from its NIR spectrum.

Using derivatives

Practically, $X^{(m)}$ is often more relevant than $X$ for the prediction. But $X \to X^{(m)}$ induces information loss and

$$\inf_{\phi : D^m\mathcal{X} \to \{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \geq \inf_{\phi : \mathcal{X} \to \{-1,1\}} P\left(\phi(X) \neq Y\right) = L^*$$

and

$$\inf_{\phi : D^m\mathcal{X} \to \mathbb{R}} \mathbb{E}\left(\left[\phi(X^{(m)}) - Y\right]^2\right) \geq \inf_{\phi : \mathcal{X} \to \mathbb{R}} \mathbb{E}\left(\left[\phi(X) - Y\right]^2\right) = L^*.$$

Sampled functions

Practically, the $(X_i)_i$ are not perfectly known: we are given a discrete sampling of them, $X_i^{\tau_d} = (X_i(t))_{t \in \tau_d}$ where $\tau_d = \{t_1^{\tau_d}, \ldots, t_{|\tau_d|}^{\tau_d}\}$. The sampling can be noisy.

Then, $X_i^{(m)}$ is estimated from $X_i^{\tau_d}$ and, denoting the estimate by $\widehat{X}^{(m)}_{\tau_d}$, this also induces information loss:

$$\inf_{\phi : D^m\mathcal{X} \to \{-1,1\}} P\left(\phi(\widehat{X}^{(m)}_{\tau_d}) \neq Y\right) \geq \inf_{\phi : D^m\mathcal{X} \to \{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \geq L^*$$

and

$$\inf_{\phi : D^m\mathcal{X} \to \mathbb{R}} \mathbb{E}\left(\left[\phi(\widehat{X}^{(m)}_{\tau_d}) - Y\right]^2\right) \geq \inf_{\phi : D^m\mathcal{X} \to \mathbb{R}} \mathbb{E}\left(\left[\phi(X^{(m)}) - Y\right]^2\right) \geq L^*.$$

Purpose of the presentation

Find a classifier or a regression function $\phi_{n,\tau_d}$ built from $\widehat{X}^{(m)}_{\tau_d}$ such that the risk of $\phi_{n,\tau_d}$ asymptotically reaches the Bayes risk $L^*$:

$$\lim_{|\tau_d| \to +\infty} \lim_{n \to +\infty} P\left(\phi_{n,\tau_d}(\widehat{X}^{(m)}_{\tau_d}) \neq Y\right) = L^*$$

or

$$\lim_{|\tau_d| \to +\infty} \lim_{n \to +\infty} \mathbb{E}\left(\left[\phi_{n,\tau_d}(\widehat{X}^{(m)}_{\tau_d}) - Y\right]^2\right) = L^*.$$

Main idea: use a relevant way to estimate $X^{(m)}$ from $X^{\tau_d}$ (by smoothing splines) and combine the consistency of splines with the consistency of an $\mathbb{R}^{|\tau_d|}$-classifier or regression function.

A general consistency result

Basics about smoothing splines I

Suppose that $\mathcal{X}$ is the Sobolev space

$$\mathcal{H}^m = \left\{h \in L^2_{[0,1]} \mid \forall\, j = 1, \ldots, m,\ D^j h \text{ exists in the weak sense and } D^m h \in L^2\right\}$$

equipped with the scalar product

$$\langle u, v \rangle_{\mathcal{H}^m} = \langle D^m u, D^m v \rangle_{L^2} + \sum_{j=1}^{m} B^j u\, B^j v$$

where the $B^j$ are $m$ boundary conditions such that $\operatorname{Ker} B \cap \mathcal{P}^{m-1} = \{0\}$.

$(\mathcal{H}^m, \langle \cdot, \cdot \rangle_{\mathcal{H}^m})$ is a RKHS: there are $k_0 : \mathcal{P}^{m-1} \times \mathcal{P}^{m-1} \to \mathbb{R}$ and $k_1 : \operatorname{Ker} B \times \operatorname{Ker} B \to \mathbb{R}$ such that

$$\forall\, u \in \mathcal{P}^{m-1},\ t \in [0, 1], \quad \langle u, k_0(t, \cdot) \rangle_{\mathcal{H}^m} = u(t)$$

and

$$\forall\, u \in \operatorname{Ker} B,\ t \in [0, 1], \quad \langle u, k_1(t, \cdot) \rangle_{\mathcal{H}^m} = u(t).$$

See [Berlinet and Thomas-Agnan, 2004] for further details.

Basics about smoothing splines II

A simple example of boundary conditions:

$$h(0) = h^{(1)}(0) = \ldots = h^{(m-1)}(0) = 0.$$

Then,

$$k_0(s, t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2}$$

and

$$k_1(s, t) = \int_0^1 \frac{(t - w)_+^{m-1} (s - w)_+^{m-1}}{((m-1)!)^2}\, dw.$$
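For concreteness, here is a minimal numerical sketch of these two kernels (not from the talk; the function names and the use of scipy for the integral are my own choices):

```python
# Minimal sketch of k0 and k1 for the boundary conditions
# h(0) = h'(0) = ... = h^{(m-1)}(0) = 0; function names are illustrative.
from math import factorial
from scipy.integrate import quad

def k0(s, t, m):
    """Kernel of the polynomial part P^{m-1}: sum_{k=0}^{m-1} (st)^k / (k!)^2."""
    return sum((s * t) ** k / factorial(k) ** 2 for k in range(m))

def k1(s, t, m):
    """Kernel of Ker B: int_0^1 (t-w)_+^{m-1} (s-w)_+^{m-1} / ((m-1)!)^2 dw."""
    def integrand(w):
        if w >= min(s, t):   # (t-w)_+ (s-w)_+ vanishes beyond min(s, t)
            return 0.0
        return ((t - w) * (s - w)) ** (m - 1) / factorial(m - 1) ** 2
    value, _ = quad(integrand, 0.0, 1.0, points=[min(s, t)])
    return value

# Sanity check: for m = 1 one recovers k1(s, t) = min(s, t),
# the Brownian-motion kernel.
print(k1(0.3, 0.7, 1))   # ~0.3
```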

Estimating the predictors with smoothing splines I

Assumption (A1)

  • $|\tau_d| \geq m - 1$;

  • the sampling points are distinct in $[0, 1]$;

  • the $B^j$ are linearly independent of $h \mapsto h(t)$ for all $t \in \tau_d$.

[Kimeldorf and Wahba, 1971]: for $x^{\tau_d}$ in $\mathbb{R}^{|\tau_d|}$, there is a unique $\hat{x}_{\lambda,\tau_d} \in \mathcal{H}^m$ solution of

$$\arg\min_{h \in \mathcal{H}^m} \frac{1}{|\tau_d|} \sum_{l=1}^{|\tau_d|} \left(h(t_l) - x^{\tau_d}_l\right)^2 + \lambda \int_{[0,1]} (h^{(m)}(t))^2\, dt,$$

and $\hat{x}_{\lambda,\tau_d} = S_{\lambda,\tau_d} x^{\tau_d}$ where $S_{\lambda,\tau_d}$ is a full rank operator from $\mathbb{R}^{|\tau_d|}$ to $\mathcal{H}^m$. These assumptions are fulfilled by the previous simple example as long as $0 \notin \tau_d$.

Estimating the predictors with smoothing splines II

$S_{\lambda,\tau_d}$ is given by:

$$S_{\lambda,\tau_d} = \omega^T \left(U (K_1 + \lambda I_{|\tau_d|})^{-1} U^T\right)^{-1} U (K_1 + \lambda I_{|\tau_d|})^{-1} + \eta^T (K_1 + \lambda I_{|\tau_d|})^{-1} \left(I_{|\tau_d|} - U^T \left(U (K_1 + \lambda I_{|\tau_d|})^{-1} U^T\right)^{-1} U (K_1 + \lambda I_{|\tau_d|})^{-1}\right) = \omega^T M_0 + \eta^T M_1$$

with

  • $\{\omega_1, \ldots, \omega_m\}$ a basis of $\mathcal{P}^{m-1}$, $\omega = (\omega_1, \ldots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\ldots,m,\ t \in \tau_d}$;

  • $\eta = (k_1(t, \cdot))^T_{t \in \tau_d}$ and $K_1 = (k_1(t, t'))_{t,t' \in \tau_d}$.

The observations of the predictor $X$ (NIR spectra) are then estimated from their sampling $X^{\tau_d}$ by $\widehat{X}_{\lambda,\tau_d}$.
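As an illustration, a sketch (my notation; it assumes the kernel k1 above and the monomial basis $\omega_i(t) = t^{i}$, one valid choice) of how $M_0$ and $M_1$ can be assembled with numpy:

```python
# Sketch: assemble M0 and M1 of the smoothing-spline operator S = w^T M0 + eta^T M1.
import numpy as np

def smoother_matrices(tau, lam, m, k1):
    """tau: sorted sampling points in (0, 1]; returns (M0, M1)."""
    tau = np.asarray(tau)
    n = len(tau)
    U = np.vstack([tau ** i for i in range(m)])                 # U[i, l] = omega_i(t_l)
    K1 = np.array([[k1(s, t, m) for t in tau] for s in tau])    # K1[l, l'] = k1(t_l, t_l')
    A = np.linalg.inv(K1 + lam * np.eye(n))                     # (K1 + lam I)^{-1}
    M0 = np.linalg.solve(U @ A @ U.T, U @ A)                    # (U A U^T)^{-1} U A
    M1 = A @ (np.eye(n) - U.T @ M0)                             # A (I - U^T M0)
    return M0, M1
```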

Two important consequences

1 No information loss:

$$\inf_{\phi : \mathcal{H}^m \to \{-1,1\}} P\left(\phi(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \{-1,1\}} P\left(\phi(X^{\tau_d}) \neq Y\right)$$

and

$$\inf_{\phi : \mathcal{H}^m \to \mathbb{R}} \mathbb{E}\left(\left[\phi(\widehat{X}_{\lambda,\tau_d}) - Y\right]^2\right) = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \mathbb{R}} \mathbb{E}\left(\left[\phi(X^{\tau_d}) - Y\right]^2\right).$$

2 Easy way to use derivatives:

$$(u^{\tau_d})^T M_0^T W M_0 v^{\tau_d} + (u^{\tau_d})^T M_1^T K_1 M_1 v^{\tau_d} = (u^{\tau_d})^T M_{\lambda,\tau_d} v^{\tau_d} = (Q_{\lambda,\tau_d} u^{\tau_d})^T (Q_{\lambda,\tau_d} v^{\tau_d}) = \langle \hat{u}_{\lambda,\tau_d}, \hat{v}_{\lambda,\tau_d} \rangle_{\mathcal{H}^m} \simeq \langle \hat{u}^{(m)}_{\lambda,\tau_d}, \hat{v}^{(m)}_{\lambda,\tau_d} \rangle_{L^2}$$

where $W = (\langle \omega_i, \omega_j \rangle_{\mathcal{H}^m})_{i,j=1,\ldots,m}$, the matrix $M_{\lambda,\tau_d} = M_0^T W M_0 + M_1^T K_1 M_1$ is symmetric positive definite, and $Q_{\lambda,\tau_d}$ is the Cholesky triangle of $M_{\lambda,\tau_d}$: $Q_{\lambda,\tau_d}^T Q_{\lambda,\tau_d} = M_{\lambda,\tau_d}$.

Remark: $Q_{\lambda,\tau_d}$ is calculated only from the RKHS, $\lambda$ and $\tau_d$: it does not depend on the data set.
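A sketch of the resulting transform (assuming $M_0$, $M_1$, $K_1$ from the previous sketch and the Gram matrix $W$ of the polynomial basis as given):

```python
# Sketch: Q such that (Q u)^T (Q v) reproduces the H^m inner product of the splines.
import numpy as np

def cholesky_transform(M0, M1, W, K1):
    M = M0.T @ W @ M0 + M1.T @ K1 @ M1    # M_{lambda, tau_d}, symmetric positive definite
    L = np.linalg.cholesky(M)             # lower triangular, M = L @ L.T
    return L.T                            # Q = L.T, so Q.T @ Q = M

# Each sampled curve x_tau is then represented by Q @ x_tau before applying any
# multivariate method based on the Euclidean scalar product or norm.
```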

Classification and regression based on derivatives

Suppose that we know a consistent classifier or regression function in $\mathbb{R}^{|\tau_d|}$ that is based on the $\mathbb{R}^{|\tau_d|}$ scalar product or norm. The corresponding derivative based classifier or regression function is obtained by using the norm induced by $Q_{\lambda,\tau_d}$.

Example: nonparametric kernel regression. Let

$$\Psi : u \in \mathbb{R}^{|\tau_d|} \mapsto \frac{\sum_{i=1}^{n} T_i K\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^{n} K\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}$$

where $(U_i, T_i)_{i=1,\ldots,n}$ is a learning set in $\mathbb{R}^{|\tau_d|} \times \mathbb{R}$. Then

$$\phi_{n,d} = \Psi \circ Q_{\lambda,\tau_d} : x \in \mathcal{H}^m \mapsto \frac{\sum_{i=1}^{n} Y_i K\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^{n} K\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)} \simeq \frac{\sum_{i=1}^{n} Y_i K\left(\frac{\|x^{(m)} - X_i^{(m)}\|_{L^2}}{h_n}\right)}{\sum_{i=1}^{n} K\left(\frac{\|x^{(m)} - X_i^{(m)}\|_{L^2}}{h_n}\right)}.$$
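A sketch of this estimator (the Gaussian kernel is an assumed choice of $K$; $Q$, the training curves X_tau, one row per sampled curve, and the responses Y are taken as given):

```python
# Sketch: derivative-based Nadaraya-Watson estimate phi_{n,d}(x) = Psi(Q x^{tau_d}).
import numpy as np

def kernel_regression(x_tau, X_tau, Y, Q, h):
    dists = np.linalg.norm(X_tau @ Q.T - Q @ x_tau, axis=1)   # ||Q X_i - Q x||
    weights = np.exp(-0.5 * (dists / h) ** 2)                 # Gaussian kernel K(./h)
    return np.dot(weights, Y) / np.sum(weights)
```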

Remark for consistency

Classification case (approximately the same holds for the regression case):

$$P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) - L^* = P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) - L^*_d + L^*_d - L^*$$

where $L^*_d = \inf_{\phi : \mathbb{R}^{|\tau_d|} \to \{-1,1\}} P(\phi(X^{\tau_d}) \neq Y)$.

1 For all fixed $d$,

$$\lim_{n \to +\infty} P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) = L^*_d$$

as long as the $\mathbb{R}^{|\tau_d|}$-classifier is consistent, because there is a one-to-one mapping between $X^{\tau_d}$ and $\widehat{X}_{\lambda,\tau_d}$.

2 Since

$$L^*_d - L^* \leq \mathbb{E}\left(\left|\mathbb{E}(Y | \widehat{X}_{\lambda,\tau_d}) - \mathbb{E}(Y | X)\right|\right),$$

consistency would follow from the consistency of the spline estimate $\widehat{X}_{\lambda,\tau_d}$ combined with a regularity assumption on $\mathbb{E}(Y | X = \cdot)$. But continuity of $\mathbb{E}(Y | X = \cdot)$ is a strong assumption in the infinite dimensional case, and is not easy to check.

Spline consistency

Let $\lambda$ depend on $d$ and denote by $(\lambda_d)_d$ the sequence of regularization parameters. Also introduce

$$\overline{\Delta}\tau_d := \max\{t_1, t_2 - t_1, \ldots, 1 - t_{|\tau_d|}\}, \qquad \underline{\Delta}\tau_d := \min_{1 \leq i < |\tau_d|}\{t_{i+1} - t_i\}.$$

Assumption (A2)

  • There is $R$ such that $\overline{\Delta}\tau_d / \underline{\Delta}\tau_d \leq R$ for all $d$;

  • $\lim_{d \to +\infty} |\tau_d| = +\infty$;

  • $\lim_{d \to +\infty} \lambda_d = 0$.

[Ragozin, 1983]: under (A1) and (A2), there exist $A_{R,m}$ and $B_{R,m}$ such that for any $x \in \mathcal{H}^m$ and any $\lambda_d > 0$,

$$\left\|\hat{x}_{\lambda_d,\tau_d} - x\right\|^2_{L^2} \leq \left(A_{R,m} \lambda_d + B_{R,m} \frac{1}{|\tau_d|^{2m}}\right) \|D^m x\|^2_{L^2} \xrightarrow{d \to +\infty} 0.$$

Bayes risk consistency

Assumption (A3a): $\mathbb{E}(\|D^m X\|^2_{L^2})$ is finite and $Y \in \{-1, 1\}$.

or

Assumption (A3b): $\tau_d \subset \tau_{d+1}$ for all $d$ and $\mathbb{E}(Y^2)$ is finite.

Under (A1)-(A3), $\lim_{d \to +\infty} L^*_d = L^*$.

Proof under assumption (A3a)

Assumption (A3a): $\mathbb{E}(\|D^m X\|^2_{L^2})$ is finite and $Y \in \{-1, 1\}$.

The proof is based on a result of [Faragó and Györfi, 1975]: for a pair of random variables $(X, Y)$ taking values in $\mathcal{X} \times \{-1, 1\}$, where $\mathcal{X}$ is an arbitrary metric space, and for a sequence of functions $T_d : \mathcal{X} \to \mathcal{X}$ such that

$$\mathbb{E}(\delta(T_d(X), X)) \xrightarrow{d \to +\infty} 0,$$

one has $\lim_{d \to +\infty} \inf_{\phi : \mathcal{X} \to \{-1,1\}} P(\phi(T_d(X)) \neq Y) = L^*$.

Taking for $T_d$ the spline estimate, the previous inequality of [Ragozin, 1983] about these estimates gives the required convergence, and the result follows.

Proof under assumption (A3b)

Assumption (A3b): $\tau_d \subset \tau_{d+1}$ for all $d$ and $\mathbb{E}(Y^2)$ is finite.

Under (A3b), $(\mathbb{E}(Y | \widehat{X}_{\lambda_d,\tau_d}))_d$ is a uniformly bounded martingale and thus converges in $L^1$-norm. Using the consistency of $(\widehat{X}_{\lambda_d,\tau_d})_d$ to $X$ completes the proof.

Concluding result

Theorem. Under assumptions (A1)-(A3),

$$\lim_{d \to +\infty} \lim_{n \to +\infty} P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$$

and

$$\lim_{|\tau_d| \to +\infty} \lim_{n \to +\infty} \mathbb{E}\left(\left[\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) - Y\right]^2\right) = L^*.$$

Proof: for $\varepsilon > 0$, fix $d_0$ such that, for all $d \geq d_0$, $L^*_d - L^* \leq \varepsilon/2$. Then conclude by consistency of the $\mathbb{R}^{|\tau_d|}$-classifier or regression function.

A practical application to SVM I

Recall that, for a learning set $(U_i, T_i)_{i=1,\ldots,n}$ in $\mathbb{R}^p \times \{-1, 1\}$, the Gaussian SVM is the classifier

$$u \in \mathbb{R}^p \mapsto \operatorname{Sign}\left(\sum_{i=1}^{n} \alpha_i T_i e^{-\gamma \|u - U_i\|^2_{\mathbb{R}^p}}\right)$$

where the $(\alpha_i)_i$ satisfy the quadratic optimization problem

$$\arg\min_w \sum_{i=1}^{n} \left|1 - T_i w(U_i)\right|_+ + C \|w\|^2_{\mathcal{S}}$$

where $w(u) = \sum_{i=1}^{n} \alpha_i e^{-\gamma \|u - U_i\|^2_{\mathbb{R}^p}}$, $\mathcal{S}$ is the RKHS associated with the Gaussian kernel and $C$ is a regularization parameter.

Under suitable assumptions, [Steinwart, 2002] proves the consistency of SVM classifiers.

A practical application to SVM II

Additional assumptions related to SVM: Assumption (A4)

  • For all $d$, the regularization parameter depends on $n$ in such a way that $\lim_{n \to +\infty} n C_n^d = +\infty$ and $C_n^d = O_n\left(n^{\beta_d - 1}\right)$ for a $0 < \beta_d < 1/d$.

  • For all $d$, there is a bounded subset $\mathcal{B}_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $\mathcal{B}_d$.

Result: under assumptions (A1)-(A4), the SVM

$$\phi_{n,d} : x \in \mathcal{H}^m \mapsto \operatorname{Sign}\left(\sum_{i=1}^{n} \alpha_i Y_i e^{-\gamma \|Q_{\lambda_d,\tau_d} x^{\tau_d} - Q_{\lambda_d,\tau_d} X_i^{\tau_d}\|^2_{\mathbb{R}^{|\tau_d|}}}\right) \simeq \operatorname{Sign}\left(\sum_{i=1}^{n} \alpha_i Y_i e^{-\gamma \|x^{(m)} - X_i^{(m)}\|^2_{L^2}}\right)$$

is consistent: $\lim_{d \to +\infty} \lim_{n \to +\infty} P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$.
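In practice this is just a standard Gaussian SVM fed with the Q-transformed coordinates; a sketch with scikit-learn (an assumed implementation choice, not the one used in the talk):

```python
# Sketch: Gaussian SVM on Q-transformed sampled curves (~ SVM on m-th derivatives).
from sklearn.svm import SVC

def fit_derivative_svm(X_tau, Y, Q, gamma=1.0, C=1.0):
    Z = X_tau @ Q.T                       # row i is Q X_i^{tau_d}
    return SVC(kernel="rbf", gamma=gamma, C=C).fit(Z, Y)

# Prediction on new sampled curves X_new: clf.predict(X_new @ Q.T)
```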

Examples

Example 1: Classification with SVM

Predicting the fat content class of pieces of meat from NIR spectra. Method:

1 Split the data set into a training sample and a test sample;

2 Estimate $\lambda_d$ by leave-one-out on the $X_i^{\tau_d}$ of the training sample. The Sobolev space is the one with boundary conditions $h(0) = \ldots = h^{(m-1)}(0) = 0$ (for $m = 1$ and $m = 2$);

3 Calculate $Q_{\lambda_d,\tau_d}$;

4 Calculate Gaussian and linear SVM classifiers for the initial data, the first order derivatives and the second order derivatives (using $Q_{\lambda_d,\tau_d}$ for the derivatives);

5 Calculate the percentage of misclassifications made by all these classifiers on the test sample (a sketch of this protocol is given after the table).

Kernel (SVM)                               Error rate on test (and sd)
Linear (L)                                 3.78% (2.52)
Linear on derivatives (L(1))               2.97% (1.93)
Linear on second derivatives (L(2))        3.12% (1.71)
Gaussian (G)                               5.97% (2.76)
Gaussian on derivatives (G(1))             3.69% (2.03)
Gaussian on second derivatives (G(2))      2.77% (2.07)

The differences are significant (Wilcoxon paired test at the 1% level) between G(2) and G(1) and between G(1) and G.
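A sketch of this protocol (repeated random splits; the number of repetitions, the test fraction, the hyperparameters and fit_derivative_svm from the earlier sketch are all assumptions):

```python
# Sketch: test error rate over repeated train/test splits, as in Example 1.
import numpy as np
from sklearn.model_selection import train_test_split

errors = []
for seed in range(50):                                  # number of repetitions is illustrative
    Xtr, Xte, ytr, yte = train_test_split(X_tau, Y, test_size=0.25, random_state=seed)
    clf = fit_derivative_svm(Xtr, ytr, Q, gamma=1.0, C=1.0)
    errors.append(np.mean(clf.predict(Xte @ Q.T) != yte))
print(f"error rate: {100 * np.mean(errors):.2f}% (sd {100 * np.std(errors):.2f})")
```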

Example 2: Regression with kernel ridge regression

Predicting yellow berry on durum wheat. Recall that kernel ridge regression is given by solving

$$\arg\min_w \sum_{i=1}^{n} (T_i - w(U_i))^2 + \|w\|^2_{\mathcal{S}}$$

where $\mathcal{S}$ is a RKHS in which $w$ takes its values, defined from a given kernel (such as the Gaussian kernel), and $(U_i, T_i)_i$ is a training sample in $\mathbb{R}^p \times \mathbb{R}$.

Kernel                                     MSE on test (and sd ×10⁻³)
Linear (L)                                 0.122 (8.77)
Linear on derivatives (L(1))               0.138 (9.53)
Linear on second derivatives (L(2))        0.122 (1.71)
Gaussian (G)                               0.110 (20.2)
Gaussian on derivatives (G(1))             0.098 (7.92)
Gaussian on second derivatives (G(2))      0.094 (8.35)

The differences are significant between G(2) / G(1) and between G(1) / G.
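A sketch of the same pipeline for Example 2 (scikit-learn's KernelRidge as an assumed stand-in for the talk's kernel ridge regression; the data arrays and the Q matrix are taken as given):

```python
# Sketch: kernel ridge regression on the Q-transformed curves.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

Z_train = X_tau @ Q.T                                   # ~ m-th derivative coordinates
reg = KernelRidge(kernel="rbf", gamma=1.0, alpha=1.0).fit(Z_train, Y)
mse = np.mean((reg.predict(X_test @ Q.T) - Y_test) ** 2)
```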

Conclusion and prospects

Conclusion

We presented a method that justifies the common use of derivatives as predictors and guarantees no dramatic loss of information.

Prospects

1 Including errors in the sampling of $X$ (related to errors-in-variables models);

2 Adding regularity assumptions on $\mathbb{E}(Y | X = \cdot)$ to obtain a convergence rate and relate $d$ to $n$.

Some references

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Faragó, T. and Györfi, L. (1975). On the continuity of the error distortion function for multiple-hypothesis decisions. IEEE Transactions on Information Theory, 21(4):458-460.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82-95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335-355.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768-791.

Thank you for your attention!