
ENEE698A Graduate Seminar

Reproducing Kernel Hilbert Space (RKHS), Regularization Theory, and Kernel Methods

Shaohua (Kevin) Zhou

Center for Automation Research

Department of Electrical and Computer Engineering

University of Maryland, College Park


Overview

• Reproducing Kernel Hilbert Space (RKHS)
  – From $\mathbb{R}^N$ to RKHS

• Regularization Theory with RKHS
  – Regularization Network (RN)
  – Support Vector Regression (SVR)
  – Support Vector Classification (SVC)

• Kernel Methods
  – Kernel Principal Component Analysis (KPCA)
  – More examples


Vector Space $\mathbb{R}^N$

• Positive definite matrix $S = [s_i(j)]$
  – $S = [s_1, s_2, \dots, s_N]$, with $s_i(j)$ the $j$-th entry of column $s_i$
  – Eigensystem: $S = \sum_{n=1}^{N} \lambda_n \phi_n \phi_n^T$

• Inner product $\langle f, g\rangle = f^T S^{-1} g$
  – $\langle f, g\rangle = \sum_n \lambda_n^{-1} f^T \phi_n \phi_n^T g = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$
  – $(u, v) = u^T v$, the regular inner product

• Two properties (checked numerically in the sketch below):
  – $\langle s_i, s_j\rangle = s_i^T S^{-1} s_j = s_i^T e_j = s_i(j)$
  – $\langle s_i, f\rangle = s_i^T S^{-1} f = e_i^T f = f(i)$, with $f = [f(1), f(2), \dots, f(N)]^T$
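The two properties above are easy to check numerically. A minimal NumPy sketch (the random positive definite matrix and the helper name `ip` are illustrative assumptions, not from the slides):

```python
# Verifies the two properties of the S-weighted inner product <f,g> = f^T S^{-1} g:
#   <s_i, s_j> = s_i(j)   and   <s_i, f> = f(i).
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
S = A @ A.T + N * np.eye(N)          # a positive definite matrix (assumed for illustration)
S_inv = np.linalg.inv(S)

def ip(f, g):
    """S-weighted inner product <f, g> = f^T S^{-1} g."""
    return f @ S_inv @ g

f = rng.standard_normal(N)
i, j = 1, 3
s_i, s_j = S[:, i], S[:, j]          # columns of S

print(np.isclose(ip(s_i, s_j), S[j, i]))   # <s_i, s_j> = s_i(j): True
print(np.isclose(ip(s_i, f), f[i]))        # <s_i, f>   = f(i):   True
```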


Reproducing Kernel Hilbert Space (RKHS)

• Positive kernel function $k_x(\cdot) = k(x, \cdot)$
  – Mercer's theorem
  – Eigensystem: $k(x, y) = \sum_{n=1}^{\infty} \lambda_n \phi_n(x) \phi_n(y)$, with $\sum_{n=1}^{\infty} \lambda_n^2 < \infty$

• Inner product $\langle f, g\rangle_H$
  – $\langle f, g\rangle_H = \sum_n \lambda_n^{-1} (f, \phi_n)(g, \phi_n)$
  – $(u, v) = \int u(y)\, v(y)\, dy$, the regular inner product

• Two properties:
  – $\langle k_x, k_y\rangle_H = k(x, y)$
  – $\langle k_x, f\rangle_H = f(x)$, the reproducing property


More on RKHS

• Let $f(y)$ be an element of the RKHS
  – $f(y) = \sum_{n=1}^{\infty} a_n \phi_n(y)$
  – $(f, \phi_n) = a_n$
  – $\langle f, f\rangle_H = \sum_{n=1}^{\infty} \lambda_n^{-1} a_n^2$

• One particular function $f(y)$
  – $f(y) = \sum_{i=1}^{n} c_i k(y, x_i)$
  – Is $f(y)$ in the RKHS? Yes: its norm is finite, as shown next.
  – $\langle f, f\rangle_H = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j k(x_i, x_j) = c^T K c$, with $c = [c_1, c_2, \dots, c_n]^T$ and $K = [k(x_i, x_j)]$ the Gram matrix
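As a quick illustration of the last bullet, the sketch below (assuming an RBF kernel and arbitrary points and coefficients) builds $f(y) = \sum_i c_i k(y, x_i)$ and evaluates its RKHS norm through the Gram matrix:

```python
# For f(y) = sum_i c_i k(y, x_i), the RKHS norm is <f, f>_H = c^T K c,
# where K = [k(x_i, x_j)] is the Gram matrix.
import numpy as np

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))                      # sample points x_1, ..., x_n
c = rng.standard_normal(6)                           # coefficients c_1, ..., c_n
K = np.array([[rbf(a, b) for b in X] for a in X])    # Gram matrix

def f(y):
    """f(y) = sum_i c_i k(y, x_i): an element of the RKHS built from the kernel."""
    return sum(ci * rbf(y, xi) for ci, xi in zip(c, X))

print("f(0)     =", f(np.zeros(2)))
print("<f,f>_H  =", c @ K @ c)                       # nonnegative, since K is PSD
```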


More on RKHS

• Nonlinear mapping $\Phi: \mathbb{R}^N \to \mathbb{R}^\infty$
  – $\Phi(x) = [\lambda_1^{1/2}\phi_1(x), \dots, \lambda_n^{1/2}\phi_n(x), \dots]^T$

• Regular inner product in the feature space $\mathbb{R}^\infty$
  – $(\Phi(x), \Phi(y)) = \Phi(x)^T \Phi(y) = \sum_{n=1}^{\infty} \lambda_n^{1/2}\phi_n(x)\, \lambda_n^{1/2}\phi_n(y) = k(x, y) = \langle k_x, k_y\rangle_H$
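The relation $(\Phi(x), \Phi(y)) = k(x, y)$ can be made concrete with a kernel whose feature map is finite. A small sketch assuming the homogeneous polynomial kernel $k(x, y) = (x^T y)^2$ on $\mathbb{R}^2$, whose explicit feature map is known in closed form (this specific kernel is an illustrative choice, not one from the slides):

```python
# Illustrates (Phi(x), Phi(y)) = k(x, y) with an explicit, finite feature map Phi.
import numpy as np

def k(x, y):
    return (x @ y) ** 2

def phi(x):
    """Explicit feature map for (x^T y)^2 on R^2: [x1^2, sqrt(2) x1 x2, x2^2]."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(np.isclose(phi(x) @ phi(y), k(x, y)))   # True: dot product in feature space = kernel
```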


Kernel Choices

• Gaussian kernel or RBF kernel
  – $k(x, y) = \exp(-\sigma^{-2}\|x - y\|^2)$

• Polynomial kernel
  – $k(x, y) = ((x, y) + d)^p$

• Construction rules
  – Covariance function of a Gaussian process
  – $k(x, y) = \int g(x, z)\, g(z, y)\, dz$
  – $k(x, y) = c$, $c > 0$
  – $k(x, y) = k_1(x, y) + k_2(x, y)$
  – $k(x, y) = k_1(x, y) \cdot k_2(x, y)$
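A sketch of the two kernels and of the last two closure rules (the sum and the product of kernels remain kernels), checked by inspecting the eigenvalues of the resulting Gram matrices; the data, kernel parameters, and helper functions are assumptions for illustration:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def poly(x, y, d=1.0, p=3):
    return (x @ y + d) ** p

def gram(kern, X):
    return np.array([[kern(a, b) for b in X] for a in X])

def is_psd(K, tol=1e-8):
    return np.all(np.linalg.eigvalsh(K) >= -tol)

X = np.random.default_rng(2).standard_normal((8, 3))
K1, K2 = gram(rbf, X), gram(poly, X)
print(is_psd(K1), is_psd(K2))              # both kernels give PSD Gram matrices
print(is_psd(K1 + K2), is_psd(K1 * K2))    # sum and (elementwise) product stay PSD
```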


Regularization Theory

• Regularization task
  – $\min_{f \in H} J(f) = \sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda \langle f, f\rangle_H$, where $L$ is the loss function and $\langle f, f\rangle_H$ is a stabilizer

• Optimal solution (the representer theorem)
  – $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) = [k(x, x_1), \dots, k(x, x_n)]\, c$
  – $\{h_i(x) = k(x, x_i);\ i = 1, \dots, n\}$ are basis functions
  – The optimal coefficients $\{c_i;\ i = 1, \dots, n\}$ depend on the loss function $L$ and on $\lambda$


Regularization Network (RN)

• RN assumes a quadratic loss function
  – $\min_{f \in H} J(f) = \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \langle f, f\rangle_H$

• Find $\{c_i\}$ (see the sketch below)
  – $[f(x_1), f(x_2), \dots, f(x_n)]^T = Kc$
  – $J(f) = (y - Kc)^T (y - Kc) + \lambda c^T K c$
  – $c = (K + \lambda I)^{-1} y$

• Practical considerations
  – An intercept term: $f(x) = \sum_{i=1}^{n} c_i k(x, x_i) + b$
  – Too many coefficients → support vector regression (SVR)
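The closed-form solution $c = (K + \lambda I)^{-1} y$ is a few lines of NumPy. A minimal sketch on an assumed toy regression problem (RBF kernel, noisy sine data, illustrative $\lambda$):

```python
import numpy as np

def rbf_gram(A, B, sigma=0.3):
    d2 = (A[:, None] - B[None, :]) ** 2          # squared distances, 1-D inputs
    return np.exp(-d2 / sigma ** 2)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 30))               # training inputs x_1, ..., x_n
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

lam = 1e-2
K = rbf_gram(x, x)
c = np.linalg.solve(K + lam * np.eye(len(x)), y) # c = (K + lambda I)^{-1} y

x_test = np.linspace(0, 1, 5)
f_test = rbf_gram(x_test, x) @ c                 # f(x) = sum_i c_i k(x, x_i)
print(np.round(f_test, 3))
```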


Support Vector Regression (SVR)

• SVR assumes an $\epsilon$-insensitive loss function
  – $\min_{f \in H} J(f) = \sum_{i=1}^{n} |y_i - f(x_i)|_\epsilon + \lambda \langle f, f\rangle_H$, with $|x|_\epsilon = \max(0, |x| - \epsilon)$

• Primal problem
  – $\min J(f, \xi, \xi^*) = \sum_{i=1}^{n} (\xi_i + \xi_i^*) + \lambda \langle f, f\rangle_H$
  – s.t. (1) $f(x_i) - y_i \le \epsilon + \xi_i$; (2) $y_i - f(x_i) \le \epsilon + \xi_i^*$; (3) $\xi_i \ge 0$; (4) $\xi_i^* \ge 0$
  – Quadratic programming (QP) → dual problem
  – $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
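In practice the QP is usually delegated to a library. A minimal sketch using scikit-learn's SVR (the library choice, toy data, and parameter values are assumptions, not prescribed by the slides):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 1, (40, 1)), axis=0)
y = np.sin(2 * np.pi * X).ravel() + 0.1 * rng.standard_normal(40)

# epsilon sets the width of the insensitive tube; C plays the role of 1/lambda.
svr = SVR(kernel="rbf", gamma=10.0, C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(svr.support_), "of", len(X))  # only SVs carry nonzero multipliers
print(np.round(svr.predict(X[:5]), 3))
```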


Support Vector Classification (SVC)

• SVC assumes a soft-margin loss function
  – $\min_{f \in H} J(f) = \sum_{i=1}^{n} |1 - y_i f(x_i)|_+ + \lambda \langle f, f\rangle_H$, with $|x|_+ = \max(0, x)$
  – Determine the label of $x$ as $\mathrm{sgn}(\sum_i c_i y_i k(x, x_i) + b)$

• Primal problem
  – $\min J(f, \xi) = \sum_{i=1}^{n} \xi_i + \lambda \langle f, f\rangle_H$
  – s.t. (1) $1 - y_i f(x_i) \le \xi_i$; (2) $\xi_i \ge 0$
  – Quadratic programming (QP) → dual problem
  – $x_i$ is called a support vector (SV) if its Lagrange multiplier is nonzero
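A minimal sketch using scikit-learn's SVC, checking that its decision value matches $\sum_i c_i y_i k(x, x_i) + b$ computed by hand from the support vectors (library, toy data, and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(5)
X = rng.standard_normal((60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
svc = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# dual_coef_ holds c_i * y_i for the support vectors only (zero elsewhere).
x_new = rng.standard_normal((1, 2))
manual = rbf_kernel(x_new, svc.support_vectors_, gamma=gamma) @ svc.dual_coef_.ravel() + svc.intercept_
print(np.isclose(manual, svc.decision_function(x_new)))   # True
print("label:", np.sign(manual))                          # sgn(sum_i c_i y_i k(x, x_i) + b)
```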


Kernel Methods

• General strategy of kernel methods
  – Nonlinear mapping $\Phi: \mathbb{R}^N \to \mathbb{R}^\infty$, embedded implicitly in the kernel function
  – Linear learning methods employing geometry / linear algebra
  – Kernel trick: cast all computations in terms of dot products, so that $\Phi$ never has to be evaluated explicitly


Gram Matrix

• Gram matrix (dot product matrix, kernel matrix)

– The covariance matrix, over any finite sample, of the Gaussian process whose covariance function is $k$

– Combines the information of the data and the kernel

– Contains all the information the kernel learning method needs

– $K = [k(x_i, x_j)] = [\Phi(x_i)^T \Phi(x_j)] = \Phi^T \Phi$, where $\Phi = [\Phi(x_1), \Phi(x_2), \dots, \Phi(x_n)]$


Geometry in the RKHS

• Distance in the RKHS
  – $(\Phi(x) - \Phi(y))^T (\Phi(x) - \Phi(y)) = \Phi(x)^T\Phi(x) + \Phi(y)^T\Phi(y) - 2\Phi(x)^T\Phi(y) = k(x, x) + k(y, y) - 2k(x, y)$

• Distance to the center
  – $\Phi_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi \mathbf{1}/n$
  – $(\Phi(x) - \Phi_0)^T (\Phi(x) - \Phi_0) = \Phi(x)^T\Phi(x) + \Phi_0^T\Phi_0 - 2\Phi(x)^T\Phi_0$
    $= k(x, x) + \mathbf{1}^T\Phi^T\Phi\mathbf{1}/n^2 - 2\Phi(x)^T\Phi\mathbf{1}/n = k(x, x) + \mathbf{1}^T K \mathbf{1}/n^2 - 2\, g(x)^T \mathbf{1}/n$
  – $g(x) = \Phi^T\Phi(x) = [k(x, x_1), \dots, k(x, x_n)]^T$
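Both distances use only kernel evaluations, never $\Phi$ itself. A minimal NumPy sketch under assumed data and an RBF kernel:

```python
# Feature-space distances via the kernel trick:
#   ||Phi(x) - Phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y)
#   ||Phi(x) - Phi_0||^2  = k(x,x) + 1^T K 1 / n^2 - 2 g(x)^T 1 / n
import numpy as np

def k(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(6)
X = rng.standard_normal((10, 2))                  # sample x_1, ..., x_n defining the centroid Phi_0
n = len(X)
K = np.array([[k(a, b) for b in X] for a in X])   # Gram matrix
one = np.ones(n)

x, y = rng.standard_normal(2), rng.standard_normal(2)
d_xy = k(x, x) + k(y, y) - 2 * k(x, y)            # squared distance between Phi(x) and Phi(y)

g = np.array([k(x, xi) for xi in X])              # g(x) = [k(x, x_1), ..., k(x, x_n)]
d_x0 = k(x, x) + one @ K @ one / n**2 - 2 * g @ one / n
print(d_xy, d_x0)                                 # both are nonnegative squared norms
```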


Geometry in the RKHS

• Centered inner product in the RKHS
  – $(\Phi(x) - \Phi_0)^T (\Phi(y) - \Phi_0) = \Phi(x)^T\Phi(y) + \Phi_0^T\Phi_0 - \Phi(x)^T\Phi_0 - \Phi(y)^T\Phi_0$
    $= k(x, y) + \mathbf{1}^T K \mathbf{1}/n^2 - g(x)^T\mathbf{1}/n - g(y)^T\mathbf{1}/n$

• Centered Gram matrix
  – $\hat{K} = [\Phi(x_1) - \Phi_0, \dots, \Phi(x_n) - \Phi_0]^T [\Phi(x_1) - \Phi_0, \dots, \Phi(x_n) - \Phi_0]$
    $= [\Phi - \Phi\mathbf{1}\mathbf{1}^T/n]^T[\Phi - \Phi\mathbf{1}\mathbf{1}^T/n] = [\Phi Q]^T[\Phi Q] = Q^T \Phi^T\Phi\, Q = Q^T K Q$, with $Q = I_n - \mathbf{1}\mathbf{1}^T/n$
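The identity $\hat{K} = Q K Q$ can be verified directly against the entrywise centered-kernel formula above. A minimal sketch with assumed data and kernel:

```python
import numpy as np

def k(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

X = np.random.default_rng(7).standard_normal((8, 2))
n = len(X)
K = np.array([[k(a, b) for b in X] for a in X])

Q = np.eye(n) - np.ones((n, n)) / n
K_hat = Q @ K @ Q                                  # centered Gram matrix

# Entrywise: k(x_i, x_j) + 1^T K 1/n^2 - g(x_i)^T 1/n - g(x_j)^T 1/n
K_check = K + K.sum() / n**2 - K.sum(axis=1, keepdims=True) / n - K.sum(axis=0, keepdims=True) / n
print(np.allclose(K_hat, K_check))                 # True
```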


Kernel Principal Component Analysis (KPCA)

• Kernel PCA
  – Mean $\Phi_0 = \sum_{i=1}^{n} \Phi(x_i)/n = \Phi\mathbf{1}/n$
  – Covariance matrix
    $C = n^{-1}[\Phi(x_1) - \Phi_0, \dots, \Phi(x_n) - \Phi_0][\Phi(x_1) - \Phi_0, \dots, \Phi(x_n) - \Phi_0]^T = n^{-1}[\Phi Q][\Phi Q]^T = n^{-1}\hat\Phi\hat\Phi^T$, with $\hat\Phi = \Phi Q$

• Eigensystem of $C$
  – The 'reciprocal' matrix: $\hat\Phi^T\hat\Phi\, u = \hat{K} u = \gamma u$
  – $n^{-1}\hat\Phi\hat\Phi^T\hat\Phi u = n^{-1}\gamma\,\hat\Phi u$, so $C v = n^{-1}\gamma\, v$ with $v = \hat\Phi u$
  – Normalization: $v^T v = u^T \hat{K} u = \gamma\, u^T u = \gamma$, so $\tilde{v} = \hat\Phi u\, \gamma^{-1/2}$


Kernel Principal Component Analysis (KPCA)

• Eigen-projection (see the KPCA sketch below)
  – $(\Phi(x) - \Phi_0)^T \tilde{v} = (\Phi(x) - \Phi_0)^T \Phi Q u\, \gamma^{-1/2}$
    $= \Phi(x)^T \Phi Q u\, \gamma^{-1/2} - \mathbf{1}^T\Phi^T\Phi Q u\, \gamma^{-1/2}/n = g(x)^T Q u\, \gamma^{-1/2} - \mathbf{1}^T K Q u\, \gamma^{-1/2}/n$
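Putting the last two slides together, a from-scratch KPCA sketch (assumed RBF kernel and random data; the eigenvector ordering and the number of components m are illustrative choices, not an optimized library implementation):

```python
import numpy as np

def rbf_gram(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

X = np.random.default_rng(8).standard_normal((30, 2))
n = len(X)
K = rbf_gram(X, X)
Q = np.eye(n) - np.ones((n, n)) / n
K_hat = Q @ K @ Q                                   # centered Gram matrix

# Eigensystem of K_hat: K_hat u = gamma u (eigh returns ascending order, so reverse).
gammas, U = np.linalg.eigh(K_hat)
gammas, U = gammas[::-1], U[:, ::-1]

# Eigen-projection of a new point x onto the first m kernel principal components:
#   g(x)^T Q u gamma^{-1/2} - 1^T K Q u gamma^{-1/2} / n
m = 2
x_new = np.random.default_rng(9).standard_normal((1, 2))
g = rbf_gram(x_new, X)                              # g(x)^T as a row vector
proj = (g @ Q @ U[:, :m] - np.ones((1, n)) @ K @ Q @ U[:, :m] / n) / np.sqrt(gammas[:m])
print(np.round(proj, 3))                            # KPCA features of x_new
```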


Kernel Principal Component Analysis (KPCA)

[Figure: contour plots of the kernel PCA features]


More Examples of Kernel Methods

• Examples
  – Kernel Fisher Discriminant Analysis (KFDA)

– Kernel K-Means Clustering

– Spectral Clustering and Graph Cutting

– Kernel …

– Kernel Independent Component Analysis (KICA) ?


Summary of Kernel Methods

• Pros and Cons
  – Nonlinear embedding

– Linear algorithm

– Large storage requirement

– Computational inefficiency

• Important Issues
  – Kernel selection and design