Lecture 3 Theory of Kernel Functions - uni-tuebingen.de · Lecture 3 Theory of Kernel Functions ... In this lecture, we will formally deﬁne kernel functions ... Recall: Kernel Magic

Lecture 3Theory of Kernel Functions

Pavel Laskov1 Blaine Nelson1

1Cognitive Systems GroupWilhelm Schickard Institute for Computer ScienceUniversitat Tubingen, Germany

Advanced Topics in Machine Learning, 2012

P. Laskov and B. Nelson (Tubingen) Lecture 3: Kernel Functions April 17, 2012 1 / 47

Part I

Introduction: Kernel Functions


Overview

In this lecture, we will formally define kernel functions

Recall: advantages of kernel-based learning:1 Kernels allow for learning in high-dimensional feature spaces without explicit

mapping into feature space2 Kernels make learning in high-dimensional feature spaces computationally

feasible3 Kernel methods learn non-linear function with the machinery of algorithms

for learning linear functions4 Kernels provide an abstraction that separates data representation & learning

Questions to be addressed:1 What properties do kernels have & what properties does a function need to

be a kernel?2 How can we verify that a kernel function is valid?3 How does one construct a kernel function?


Recall: Kernel MagicExample 2: 2-dimensional Polynomials of Degree 3

Consider a (slightly modified) feature space for 2-dimensionalpolynomials of degree 3:

Φ(x) = [x31 , x32 ,√3x21 x2,

√3x1x

22 ,√3x21 ,√3x22 ,√6x1x2,

√3x1,√3x2, 1]




Φ(x) = [x31 , x32 ,√3x21 x2,

√3x1x

22 ,√3x21 ,√3x22 ,√6x1x2,

√3x1,√3x2, 1]

Let us compute the inner product between two points in the featurespace:

Φ(x)⊤Φ(y) = x31 y31 + x32 y

32 + 3x21 x2y

21 y2 + 3x1x

22 y1y

22 + 3x21 y

21 + 3x22 y

22

+ 6x1x2y1y2 + 3x1y1 + 3x2y2 + 1

= (x1y1 + x2y2 + 1)3




Φ(x) = [x31 , x32 ,√3x21 x2,

√3x1x

22 ,√3x21 ,√3x22 ,√6x1x2,

√3x1,√3x2, 1]

Let us compute the inner product between two points in the featurespace:

Φ(x)⊤Φ(y) = x31 y31 + x32 y

32 + 3x21 x2y

21 y2 + 3x1x

22 y1y

22 + 3x21 y

21 + 3x22 y

22

+ 6x1x2y1y2 + 3x1y1 + 3x2y2 + 1

= (x1y1 + x2y2 + 1)3

Complexity: 3 multiplications instead of 10.


Kernel Questions

Which of the following functions are kernels?

κ1 (x, z) =∑D

i=1 (xi + zi ) κ2 (x, z) =∏D

i=1 h(xi−ca

)h( zi−ca

)

κ3 (x, z) = − 〈x,z〉‖x‖2‖z‖2

κ4 (x, z) =√

‖x− z‖22 + 1

where h(x) = cos(1.75x) exp(−x2/2)


Part II

Linear Algebra Review


Vector Space

Definition 1

A set X is a vector space (over the reals) if it is closed under an additionoperator ‘+’ (i.e., ∀ x, z ∈ X x+ z ∈ X ) & a scalar multiplicationoperator ‘·’ (i.e., ∀ x ∈ X , a ∈ ℜ a · x ∈ X ) and these operators satisfy;

1 (Additive Associativity) u+ (v + w) = (u+ v) + w

2 (Additive Commutativity) u+ v = v + u

3 (Additive Identity) ∃ 0 ∈ X s.t. ∀ u ∈ X u+ 0 = u

4 (Additive Inverse) ∀ u ∈ X ∃ −u ∈ X s.t. u+ (−u) = 0

5 (Distibutivity) a · (u+ v) = a · u+ a · v & (a + b) · u = a · u+ b · u6 (Multiplicative Associativity) a · (b · u) = (a · b) · u7 (Multiplicative Identity) 1 · u = u

Example: For any D ∈ ℵ, ℜD is a vector space.


Vectors I

A D-dimensional vector x is a list of D-reals in vector space ℜD

Vectors {xi}Ni=1 are linearly dependent if there exists c1, c2, . . . , cD (atleast one not 0) such that

∑Ni=1 ci · xi = 0 ;

otherwise, they are linearly independent.

The inner product of x and z is defined as: x⊤z =∑D

i=1 xi · ziNon-trivial vectors {xi}Ni=1 are orthogonal if for i 6= j , x⊤i xj = 0. They arenormal vectors if for all i , x⊤i xi = 1. They are orthonormal if both hold; i.e.,∀ i , j 〈xi , xj〉 = δi ,j = I [i == j ]A set of orthogonal vectors are linearly independentEuclidean norm of a vector: ‖x‖2 =

√

〈x, x〉Angle between vectors: θ (u, v) = arccos 〈u,v〉

‖u‖‖v‖

Projection onto vector z: projz(x) =〈z,x〉〈z,z〉z


Vectors II

Vectors {xi}Ni=1 spans X if for every x ∈ X , x =∑N

i=1 αixi

{xi}Ni=1 is a basis for X if it spans X & is linearly independent

The dimension of X is the number of elements in any basis of X :Every vector in X can be represented as its projection onto an orthonormalbasis {xi}Ni=1 of X :

z =∑D

i=1 projxi (z) =∑D

i=1 〈xi , z〉 · xi

This is a Fourier decomposition of z in that basis.


Matrices

Matrices are a N × D grid of reals.

A =

A1,1 A1,2 . . . A1,D

A2,1 A2,2 . . . A2,D

......

. . ....

AN,1 AN,2 . . . AN,D

=

−A⊤

1,•−

−A⊤

2,•−...

−A⊤

N,•−

=

| | |A•,1 A•,2 . . . A•,D

| | |

Matrix-Vector Multiplication (x is a D-vector & z is a N-vector):

Ax =

−A⊤

1,•−−A⊤

2,•−...

−A⊤

N,•−

|x|

=

A⊤

1,•xA⊤

2,•x...

A⊤

N,•x

z⊤A =[

z⊤A•,1 z⊤A•,2 . . . z⊤A•,1

]

Matrix-Matrix Multiplication:

AB =

−A⊤

1,•−−A⊤

2,•−...

−A⊤

N,•−

| | |B•,1 B•,2 . . . B•,D

| | |

=

A⊤

1,•B•,1 A⊤

1,•B•,2 . . . A⊤

1,•B•,D

A⊤

2,•B•,1 A⊤

2,•B•,2 . . . A⊤

2,•B•,D

......

. . ....

A⊤

N,•B•,1 A⊤

N,•B•,2 . . . A⊤

N,•B•,D


Matrices

Matrix Multiplication as summations:

[Ax]i =∑

ℓ Ai ,ℓxℓ

[

z⊤A]

j=

∑

k zkAk,j

z⊤Ax =∑

k,ℓ zkAk,ℓxℓ [AB]i ,k =∑

j Ai ,jBj ,k

Special forms of matrices:

Lower Triangular

• 0 0 . . . 0• • 0 . . . 0• • • . . . 0...

......

. . ....

• • • . . . •

Diagonal Matrix

• 0 0 . . . 00 • 0 . . . 00 0 • . . . 0...

......

. . ....

0 0 0 . . . •

Upper Triangular

• • • . . . •0 • • . . . •0 0 • . . . •...

......

. . ....

0 0 0 . . . •


Basic Linear Algebra

Suppose matrix A is an N × D matrix

A square matrix has same # of rows & columns; i.e., N = D

The identity matrix IN is a N × N diagonal matrix of 1’s

For any A, AID = A and INA = A

The transpose of A is denoted by A⊤ (it is D × N

A symmetric matrix is its own transpose: A = A⊤

The inverse of A is denoted by A−1: AA−1 = I & A−1A = I

An orthonormal or unitary matrix is its own inverse:AA⊤ = A⊤A = I. . . both the columns & rows of A form a basis

The rank of matrix A is the maximum number of columns of A that arelinearly independent (i.e., the dimension of its column space). A isfull-rank if rank (A) = min(M,N)


Singularity

Definition 2

A matrix A is singular if there exists some x 6= 0 such that Ax = 0;otherwise, A is nonsingular.

Theorem 3

The following are equivalent:

Matrix A is invertible

Matrix A is nonsingular

Matrix A is full-rank

The spectrum of A does not contain 0; i.e., 0 6∈ eig (A)

If A is square, the (linear) function f (x) = Ax is one-to-one & onto,f (x) = b has at least 1 solution, and f (x) = 0 only has solution x = 0


Eigenvalues & Eigenvectors

Given an N × N matrix A, an eigenvector of A is a non-trivial vector vthat satisfies

Av = λv ;

the corresponding value λ is an eigenvalue

The Rayleigh quotient is defined by

λ =v⊤Av

v⊤v

In fact, the maximum eigen-value/vector pair of A is a solution to

max‖x‖=1

x⊤Ax

x⊤x

with x restricted to have norm 1


Eigen-Decomposition & Deflation

Deflation: for any eigen-value/vector pair (λ,v) of A, the transform

A← A− λvv⊤

deflates the matrix; i.e., v is an eigenvector of A but has eigenvalue 0

A symmetric matrix has N orthonormal eigenvectors {vi} correspondingto N eigenvalues—its spectrum; eig (A)

λ1 (A) ≥ λ2 (A) ≥ . . . ≥ λN (A)

Eigen-vectors/values form orthonormal matrix V & diagonal matrix Λ

V =

| | |v1 v2 . . . vN| | |

Λ =

λ1 (A) 0 . . . 00 λ2 (A) . . . 0...

.... . .

...0 0 λN (A)

which form the eigen-decomposition of A: A = VΛV⊤


The Spectral Theorem

Theorem 4

If A is a symmetric N × N real-valued matrix, it can be written as

A =

N∑

i=1

λN (A) viv⊤i

where (λi , vi ) are eigen-value/vector pairs of A. This is called the spectraldecomposition of A

For a matrix with rank K < N, the spectral decomposition of A onlyhas K summands


Matrix Functions

Properties of diagonal matrix D with entries Di ,i :

For k = 0, 1, 2, . . .: Dk is diagonal with entries [Dk ]i ,i = (Di ,i)k

If ∄ i s.t. Di ,i = 0 then D−1 exists, is diagonal, & [D−1]i ,i = (Di ,i)−1

√D is diagonal with entries [

√D]i ,i =

√

Di ,i

Functions of A are defined by its eigen-decomposition A = VΛV⊤ &the fact that V⊤V = I

For k = 0, 1, 2, . . . Ak = VΛkV⊤

If A is non-singular, then A−1 = VΛ−1V⊤√A = V

√ΛV⊤ (Note: this satisfies

√A√A = A)

exp (A) = V exp (Λ)V⊤

log (A) = V log (Λ)V⊤


Part III

Positive (Semi-)Definiteness


Positive (Semi-)Definite Matrices

Definition 5 (Positive Semi-Definite Matrix)

Matrix A is positive semi-definite (PSD) if all its eigenvalues arenon-negative (∀ i λi (A) ≥ 0); i.e., for all x ∈ X :

x⊤Ax ≥ 0

from the Rayleigh quotient. We use A � 0 to denote that A is PSD

Definition 6 (Positive Definite Matrix)

Matrix A is positive definite if all its eigenvalues are positive(∀ i λi (A) > 0); i.e., A is PSD &

x⊤Ax = 0 ⇔ x = 0

We denote this as A ≻ 0.


PSD Matrices

Proposition 7

Matrix A is PSD iff there exists a real matrix B such that A = B⊤B

ProofCase ⇐: Suppose A = B⊤B, then for any x

x⊤Ax = x⊤B⊤Bx = ‖Bx‖2 ≥ 0

Case ⇒: If A � 0 then its eigen-decomposition (A = VΛV⊤) has onlynon-negative eigenvalues and thus,

√Λ is a real-valued matrix. Thus, let

B =√ΛV⊤ and we have

B⊤B = V√Λ√ΛV⊤ = VΛV⊤ = A


Part IV

Reproducing Kernel Hilbert Spaces


Inner Product Space

Definition 8

An inner product space X is a vector space with an associated innerproduct 〈·, ·〉 : X × X → ℜ that satisfies:

1 (Symmetry) 〈x, z〉 = 〈z, x〉2 (Linearity) 〈a · x, z〉 = a · 〈x, z〉 & 〈w + x, z〉 = 〈w, z〉+ 〈x, z〉3 (Positive Semi-Definiteness) 〈x, x〉 ≥ 0

The inner product space is strict if 〈x, x〉 = 0 ⇔ x = 0

A strict inner product space X has a natural norm given by‖x‖2 =

√

〈x, x〉. The associated metric is d (x, z) = ‖x− z‖2The space ℜD has the inner product 〈x, z〉 = x⊤z which yields theEuclidean norm:

‖x‖22 =D∑

i=1

x2i


Hilbert Space

Definition 9

A strict inner product space F is a Hilbert space if it is

1 Complete: Every (Cauchy) sequence {hi ∈ F}∞i=1 such that

limn→∞

supm>n‖hn − hm‖ = 0

converges to an element h ∈ F ; i.e., hi → h

2 Separable: There is a countable subset F = {hi ∈ F}∞i=1 such that forall h ∈ F and ǫ > 0, there exists hi ∈ F such that

‖hi − h‖ < ǫ

Hilbert Space Examples: the interval [0, 1], the reals ℜ, the complexnumbers C, & Euclidean spaces ℜD for D ∈ ℵ.P. Laskov and B. Nelson (Tubingen) Lecture 3: Kernel Functions April 17, 2012 23 / 47

Hilbert Space

Definition 9



limn→∞



Technical Condition required for potentially infinite-dimensional sets


‖hi − h‖ < ǫ


Hilbert Space

Definition 9



limn→∞



Technical Condition required for potentially infinite-dimensional sets


‖hi − h‖ < ǫCondition required to make Hilbert space isomorphisms


Function Spaces

What is a vector? An ordered list of D elements from ℜ indexed by theindex set ID = {1, 2, . . . ,D}. The set of all such lists is ℜD

We can extend this notion to countable sequences x = (x1, x2, . . .) byusing the index set I = ℵ

Inner-product generalizes naturally as

〈x, y〉 =∑

i∈ℵ

xi · yi

However, we need additional restrictions to make such spaces well-behavedThe subspace ℓ2 for which ∀ x 〈x, x〉 <∞ is a Hilbert space

Further, a function f : X → ℜ maps each x ∈ X to exactly one y ∈ ℜ;i.e., it is also a vector with an uncountable index set (e.g., I = ℜD)

Inner-product again generalizes naturally as

〈f , g〉 =∫

X

f (x) g (x)dx

Subspace L2 (X ) defined on X , a compact subspace of ℜD , for which∀ f ∈ L2 (X ) , 〈f , f 〉 <∞ is a Hilbert space


Properties of (Separable) Hilbert Spaces

Hilbert space F is isomorphic to H if there is a one-to-one linearmapping T : F → H such that for all x, z ∈ F

〈T (x),T (z)〉H = 〈x, z〉F

Every separable Hilbert space (A) of dimension D is isomorphic to ℜD

and (B) of infinite dimension is isomorphic to ℓ2

Since Hilbert space F is isomorphic to ℜD or ℓ2, F has an orthonormalbasis {φi} & element in x ∈ F have a Fourier decomposition:

x =∑

i 〈φi , x〉F · φi


Part V

Characterizing Kernel Functions


Kernel Terminology

Definition 10

A kernel is a two-argument real-valued function over X ×X(κ : X × X → ℜ) such that for any x, z ∈ X

κ (x, z) = 〈φ (x) , φ (z)〉F (1)

for some inner-product space F such that ∀ x ∈ X φ (x) ∈ F

Kernel functions must be symmetric since inner products are symmetric

To show that κ is a valid kernel, it is sufficient to show that a mapping φexists that yields Eq. 1. However, this is generally difficult to construct.

In this rest of this lecture, we will demonstrate additional ways toconstruct & validate kernels


Kernel Matrices

Definition 11

A kernel matrix (or Gram matrix) K is the matrix that results fromapplying κ to all pairs of datapoints in set {xi}Ni=1

K =

κ (x1, x1) κ (x1, x2) . . . κ (x1, xN)κ (x2, x1) κ (x2, x2) . . . κ (x2, xN)

......

. . ....

κ (xN , x1) κ (xN , x2) . . . κ (xN , xN)

that is, Ki ,j = κ (xi , xj )

Kernel matrices are square & symmetric


Kernel Matrices are PSD

Proposition 12

Kernel matrices, which are constructed from a kernel corresponding to astrict inner product space F , are PSD.

ProofBy definition of a kernel matrix, for all i , j ∈ 1, . . . ,N

Ki ,j = κ (xi , xj ) = 〈φ (xi ) , φ (xj )〉FThus, for any v ∈ ℜN :

v⊤Kv =∑

i ,j viKi ,jvj =∑

i ,j vi 〈φ (xi ) , φ (xj )〉F vj

=⟨

∑

i viφ (xi ),∑

j vjφ (xj)⟩

F

= ‖∑

i viφ (xi )‖2F ≥ 0


Reproducing Kernel Function

Definition 13 (Reproducing Kernel Function (Aronszajn, 1950) [1])

Suppose F is a Hilbert space of functions over X ; the functionκ : X ×X → ℜ is a reproducing kernel of F if

1 For every x ∈ X , the function fx (·) = κ (·, x) is in F .2 Reproducing Property: for every z ∈ X and every f ∈ F

f (z) = 〈f , κ (·, z)〉F

Further, the space is called a Reproducing Kernel Hilbert Space (RKHS)

By 1st property & closure of F , for any αi ∈ ℜ and xi ∈ X , we have∑N

i=1 αi · κ (·, xi ) ∈ XApplying fx from 1st property to 2nd property, for any x, z ∈ X , we have

κ (x, z) = 〈κ (·, x) , κ (·, z)〉FP. Laskov and B. Nelson (Tubingen) Lecture 3: Kernel Functions April 17, 2012 30 / 47

Kernel Functions

Definition 14 (Finitely Positive Semi-definite)

A function κ : X × X → ℜ is finitely positive semi-definite (FPSD) if

It is symmetric; i.e., ∀ x, z ∈ X κ (x, z) = κ (z, x)

The matrix K formed by applying κ to any finite subset of X is positivesemi-definite: K � 0


Kernel Functions I

Theorem 15

κ : X ×X → ℜ (either continuous or with a countable domain) is FPSDiff ∃ Hilbert space F with feature map φ : X → F s.t.

κ (x, z) = 〈φ (x) , φ (z)〉

ProofCase ⇐: Follows from Proposition 12.Case ⇒: Suppose κ if FPSD & we construct Hilbert Space Fκ with κ asits reproducing kernel; i.e., Fκ is the closure of functions: fx (·) = κ (·, x).Thus, for any αi , xi , g (·) = ∑

i αiκ (·, xi ) is in Fκ &, by the reproducingproperty,

〈g , g〉 = ∑

i ,j αiαjκ (xi , xj ) = α⊤Kα

where K is the kernel matrix {xi}, & thus α⊤Kα ≥ 0 since K � 0.


Kernel Functions II

(Completeness) Follows from the Cauchy-Schwarz inequality, but beyondthe scope of this course.

(Separability) Separability follows from κ being continuous or having acountable domain, but is not shown here.

Finally, the mapping φ is specified by κ and φ (x) = κ (·, x) ∈ Fκ .

Note, the inner product defined above is strict since if ‖f ‖ = 0, then forall x, |f (x)| ≤ ‖f ‖ ‖φ (x)‖ = 0


Part VI

Kernel Constructions


Simple Kernels

Clearly, the linear kernel defined by

κlin (x, z) = 〈x, z〉 = x⊤z

is a valid kernel function since it is an inner product in XFor any N × N matrix B � 0,

κB (x, z) = 〈x |B| z〉 = x⊤Bz

is a valid kernel function


Closure Properties of Kernels I

Proposition 16

Suppose κ1 & κ2 are kernels on X , a > 0, f : X → ℜ, φ : X → ℜM , & κ3is a kernel on ℜM . Then these are all kernel functions on X :

16.1 κ (x, z) = κ1 (x, z) + κ2 (x, z)

16.2 κ (x, z) = a · κ1 (x, z)16.3 κ (x, z) = κ1 (x, z) · κ2 (x, z)16.4 κ (x, z) = f (x) f (z)

16.5 κ (x, z) = κ3 (φ (x) , φ (z))


Closure Properties of Kernels II

ProofLet K1 & K2 be the kernel matrices of κ1 & κ2 applied to any set{xi}Ni=1—both these matrices are PSD. Also let α be any N-vector:

(Part 1): K = K1 +K2 ⇒ α⊤Kα = α⊤K1α+α⊤K2α ≥ 0(Part 2): K = aK1 ⇒ α⊤Kα = a · α⊤K1α ≥ 0(Part 3): Take the spectral decomposition of K1 =

∑Ni=1 λiviv

⊤i and

K2 =∑N

i=1 γiwiw⊤i . The spectral decomposition of their element-wise

product, K = K1 ⊙K2, is then K =∑N

i ,j=1

√

λiγj(vi ⊙ wj)(vi ⊙ wj)⊤;

i.e., a summation of rank-1 matrices with positive coefficients ⇒ PSD.(Part 4): κ (x, z) = 〈ψ (x) , ψ (z)〉 where ψ : x 7→ f (x); thus, κ is PSD.(Part 5): Since κ3 is a kernel, applying it to any set of vectors {φ (xi )}Ni=1

yields a PSD matrix.


Closure Properties of Kernels III

The feature spaces for these kernels are as follows:

For kernel κ1 (x, z) + κ2 (x, z), the new feature map is equivalent tostacking the feature maps of κ1 & κ2:

φ (x) =

[

φ1 (x)φ2 (x)

]

For kernel a · κ1 (x, z), its feature space is scaled by√a

For kernel κ1 (x, z) · κ2 (x, z), if φ1 has dimension N1 and φ2 hasdimension N2, φ has N1N2 features given by

[φ (x)]ij = [φ1 (x)]i · [φ2 (x)]j

It follows that the features of κ1 (x, z)d are all monomials of the form

[φ1 (x)]d11 [φ1 (x)]

d22 . . . [φ1 (x)]

dNN

∑

i di = d


Additional Kernel Functions

Proposition 17

Suppose κ1 is a kernel on X & p : ℜ → ℜ is a polynomial withnon-negative coefficients. Then, the following are kernels:

1 κ (x, z) = p (κ1 (x, z))

2 κ (x, z) = exp (κ1 (x, z))

3 Gaussian or RBF kernel: κ (x, z) = exp(

−‖x−z‖222σ2

)

Proof(Part 1) Constructing a polynomial kernel from base kernel κ1 proceedsdirectly from Proposition 16.1, 16.2, & 16.3(Part 2) Consider that exp (x) = 1+ x + 1

2x2 + . . . 1

i !xi + . . .. Thus, it is a

limit of polynomials & the PSD property is closed under pointwise limits.(Part 3) Left as an exercise.


Common Kernel Functions

Linear Kernel: κlin (x, z) = x⊤z

Polynomial Kernel: κpoly (x, z) = (x⊤z+ R)d

RBF Kernel: κrbf (x, z) = exp(

−‖x−z‖222σ2

)


Kernel Questions

Which of the following functions are kernels?

κ1 (x, z) =∑D

i=1 (xi + zi ) κ2 (x, z) =∏D

i=1 h(xi−ca

)h( zi−ca

)

κ3 (x, z) = − 〈x,z〉‖x‖2‖z‖2

κ4 (x, z) =√

‖x− z‖22 + 1

where h(x) = cos(1.75x) exp(−x2/2)κ1 is not a kernel. Consider x1 =

[

1 0]⊤

& x2 =[

0 2]⊤

. Their kernelmatrix has eigenvalues −1 and 5.κ2 is a kernel because it can be written as the product f (x)f (z) wheref (x) =

∏Di=1 h(

xi−ca

)κ3 is not a kernel because it is the negation of a valid non-trivial kernel& thus will have negative eigenvalues

κ4 is not a kernel. Consider x1 =[

1 0]⊤

& x2 =[

0 1]⊤

. Again, theirkernel matrix has a negative eigenvalue


Part VII

Transforming Kernel Matrices


Operations on Kernel MatricesSimple Transformations

Adding a non-negative constant to the Kernel Matrix: corresponds toadding a new constant feature to each training example; i.e., given thematrix Φ of features such that K = ΦΦ⊤,

[

Φ c1]

∗[

Φ c1]⊤

= K+ c211⊤

Adding a non-negative constant to its diagonal: corresponds to addingan indicator feature for every data point

φ (x1) c 0 . . . 0φ (x2) 0 c . . . 0

......

.... . .

...φ (xN) 0 0 . . . c

φ (x1) c 0 . . . 0φ (x2) 0 c . . . 0

......

.... . .

...φ (xN) 0 0 . . . c

⊤

= K+ c2I


Operations on Kernel MatricesCentering Data

Suppose we want to translate the origin to the data’s center of mass. . .

x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12




x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12x




x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12x




x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12x

As we will see next lecture, this transformation can be expressed askernel transform

K← K− 1N11⊤K− 1

NK11⊤ + 1⊤K1

N2 11⊤


Operations on Kernel MatricesNormalizing Data

Suppose we want to project all data to be norm 1; i.e., ‖x‖ = 1. . .

x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12




x1

x2

x3

x4

x5x6

x7

x8

x9

x10

x11

x12




x1

x2

x3

x4

x5

x6

x7

x8

x9

x10

x11

x12




x1

x2

x3

x4

x5

x6

x7

x8

x9

x10

x11

x12

This transformation can be achieved using only the information from thekernel matrix:

κ (x, z) =κ (x, z)

√

κ (x, x) κ (z, z)


Summary

We explored a formal framework for kernels

We saw a formal defintion for kernel functions & matrices

We saw the properties that kernels must exhibit and how thoseproperties can be used to validate kernel functions & construct newkernels from existing kernels

We explored some operations that allow us to manipulate data infeature space

Next Lecture: we will see basic kernel-based learning algorithms

We will explore how to take the mean of data in feature space & use that toconstruct a novelty detection algorithmWe will explore how to project data in feature space & use that for a basicsubspace algorithm


Bibliography I

The Majority of the work from this talk can be found in the lecuture’saccompanying book, “Kernel Methods for Pattern Analysis.”

[1] N. Aronszajn. Theory of reproducing kernels. Transactions of theAmerican Mathematical Society, 68(3):337–404, 1950.


Documents

Lecture 3 Theory of Kernel Functions - uni-tuebingen.de · Lecture 3 Theory of Kernel Functions ... In this lecture, we will formally deﬁne kernel functions ... Recall: Kernel Magic