51
Kernel Methods and Semi-Supervised Learning Zal´ anBod´o The aims of the short course Top 10 data mining algorithms Machine learning. Supervised learning Kernel methods Kernels Centroid method kNN SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation Graph-based distances Semi-supervised kNN and k-means Cluster kernels Laplacian SVM Kernel Methods and Semi-Supervised Learning Zal´ an Bod´ o Faculty of Mathematics and Computer Science, Babe¸ s–Bolyai University, Cluj-Napoca/Kolozsv´ ar Szeged, March 2012 1/41

Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Kernel Methods and Semi-SupervisedLearning

Zalan Bodo

Faculty of Mathematics and Computer Science,Babes–Bolyai University, Cluj-Napoca/Kolozsvar

−15 −10 −5 0 5 10 15 20 25−10

−5

0

5

10

15

Szeged, March 20121/41

Page 2: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Contents

The aims of the short courseTop 10 data mining algorithms

Machine learning. Supervised learning

Kernel methodsKernelsCentroid methodkNNSVMk-means

Semi-supervised learningAssumptions in SSLClasses of SSLSelf-trainingData graphsLabel propagationGraph-based distancesSemi-supervised kNN and k-meansCluster kernelsLaplacian SVM

2/41

Page 3: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

The aims of the short courseI to get an overview of kernel methods

I to get an overview of semi-supervised classification

I kernel methods: change the kernel ⇒ obtain a newalgorithm

I semi-supervised learning: use a large unlabeled dataset

I semi-supervised kernels: use a supervised algorithm with asemi-supervised kernel to obtain a semi-supervised learningmethod

I no need to develop semi-supervised methods

I use a supervised algorithm with a good – and replaceable –semi-supervised kernel

3/41

Page 4: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Top 10 data mining algorithms – of all times

1. C4.5

2. k-Means ←3. SVM ←4. Apriori

5. EM

6. PageRank (←)

7. AdaBoost

8. kNN ←9. Naive Bayes

10. CART

I website: http://www.cs.uvm.edu/∼icdm/algorithms/index.shtml

I article: http://www.cs.uvm.edu/∼icdm/algorithms/10Algorithms-08.pdf

I slides with results:http://www.cs.uvm.edu/∼icdm/algorithms/ICDM06-Panel.pdf

4/41

Page 5: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Machine learning. Supervised learningI machine learning (ML): subdomain of AI, modeling the most

important brain activities: classification, differentiation,prediction

I supervised learning: examples with teaching instructionsI example: learn to differentiate between relevant and spam

emails

I unsupervised learning: no teaching instructions; groupsimilar points into clusters

I example: find products similar to a given one

I semi-supervised learning: halfway between supervised andunsupervised learning

I example: learn to classify handwritten digits based on a fewlabeled and a large set of unlabeled examples

spamham (1200x287x16M jpeg)

5/41

Page 6: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5

0.5

1

1.5

2

2.5

3

3.5

I training data: Dtrain = {(xxx i , yi ) | i = 1, . . . ,N}, xxx i ∈ X ,yi ∈ Y

I X is the input space, xxx i ’s are vectors

I Y is the label set, yi ’s are scalars

I goal: find f : X → Y to approximate f given by Dtrain

I if |Y | = 2 ⇒ binary classificationI if |Y | > 2 ⇒ multi-class classificationI if f : X → Y ⇒ single-label classificationI if f : X → 2Y ⇒ multi-label classification

6/41

Page 7: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Kernel methodsI James Mercer (1909): any continuous symmetric, positive

semi-definite kernel function can be expressed as a dotproduct in a high-dimensional space

I M. Aizerman, E. Braverman, and L. Rozonoer – 1964

I Boser, Guyon, and Vapnik – 1992 (Support Vector Machine)

I linear algorithm → non-linear algorithm

I φ: feature mapping

I kernels: k(xxx ,zzz) = 〈φ(xxx), φ(zzz)〉, φ : X → H, X ⊆ Rn i.e.cosine of the angle enclosed by the vectors in ahigh-dimensional space

I covers all geometric constructions that can be formulated interms of angles, lengths and distances

I any positive definite kernel is a dot product in another space;any mapping φ to a dot product space defines a positivedefinite kernel by 〈φ(xxx), φ(zzz)〉

7/41

Page 8: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

FEJEZET 6. KERNEL MÓDSZEREK 132

×××

×××

◦◦◦

◦◦◦ 00.2

0.40.6

0.81 0

0.2

0.4

0.6

0.8

1

0

0.2

0.4

0.6

0.8

1

X22

X12

X 1X 2

(a) (b)

6.1. ábra. Az XOR feladat – bal oldali ábra – és a nemlineáris vetítés, mely szeparál-hatóvá teszi az adatokat – jobbra

tól a legtávolabb van. Ez összekapcsolható a 2.4. fejezetben említett regularizációmódszerével, ugyanis ez a hipersík a legstabilabb a lehetséges hipersíkok közül. Amódszer kernelesített alkalmazásáig sok évnek kellett eltelnie: a paradigma a modernszámítógépek kapacitásának nagymértéku bovülésével lett népszeru.

19. példa. []„Mi újat hoz egy lineáris módszer kernelesítése?”A kérdésre az XOR feladattal válaszolunk a 4.4 részbol. A feladat a 6.1.a ábránlátható négy pont két osztályba való sorolása, ahol a osztályok kódjai az XOR függvénykimenetei. A pontjaink

X =

{(0, 0), (0, 1), (1, 0), (1, 1)

}

és hozzájuk a {−1, 1, 1,−1} címkék tartoznak, ebben a sorrendben. Korábban beláttuk,hogy az adatok lineárisan nem szétválaszthatók – a 6.1.a ábra pontjait nem tudjuk ket-téosztani úgy, hogy a két kör egy félsík egyik felén, a két „X” a félsík másik felén legyen.A megoldáshoz az adatokat vetítjük a következo függvény szerint:

φ(xxx) =[x21, x

22,√2x1x2

]T(6.1)

a háromdimenziós euklidészi térbe, ahol xk az xxx vektor k-adik komponensét jelöli. A

kutató, majd 2002-tol a NEC Princeton-i kutatócsoportjában csatlakozik a „MachineLearning” csoporthoz. 1995 óta a londoni Royal Holloway egyetem, 2003 óta pediga New York-i Columbia egyetem professzora.Kutatási eredményei közül fontosak a „szupport vektor gépek” és az ezt megalapozóVapnik-Chervonenkis elmélet. Wikipedia Vapnik-ról

00.2

0.40.6

0.81 0

0.2

0.4

0.6

0.8

1

0

0.2

0.4

0.6

0.8

1

X22

X12

X1X

2

I we want to solve the XOR-problem (shown above) by a linearseparator; data: (0, 0), (1, 1), (1, 0), (0, 1)

I easy to observe: this cannot be done in the initial space

I lets use the following mapping: φ(xxx) =[x2

1 x22

√2x1x2

]′

I then the dot product:φ(xxx)′φ(zzz) = x2

1 z21 + 2x1z1x2z2 + x2

2 z22 = (xxx ′zzz)2 = k(xxx ,zzz)

I it is called the second order homogeneous polynomial kernel

8/41

Page 9: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

General purpose kernels

I linear (dot product):

k(xxx ,zzz) = 〈xxx ,zzz〉

I polynomial:k(xxx ,zzz) = (a〈xxx ,zzz〉+ b)c

I RBF (Gaussian):

k(xxx ,zzz) = exp

(−‖xxx − zzz‖2

2σ2

)

I sigmoid:k(xxx ,zzz) = tanh(a〈xxx ,zzz〉+ r)

9/41

Page 10: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

General purpose kernels

I linear (dot product):

k(xxx ,zzz) = 〈xxx ,zzz〉

I polynomial:k(xxx ,zzz) = (a〈xxx ,zzz〉+ b)c

I RBF (Gaussian):

k(xxx ,zzz) = exp

(−‖xxx − zzz‖2

2σ2

)

I sigmoid:k(xxx ,zzz) = tanh(a〈xxx ,zzz〉+ r)

9/41

Page 11: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

General purpose kernels

I linear (dot product):

k(xxx ,zzz) = 〈xxx ,zzz〉

I polynomial:k(xxx ,zzz) = (a〈xxx ,zzz〉+ b)c

I RBF (Gaussian):

k(xxx ,zzz) = exp

(−‖xxx − zzz‖2

2σ2

)

I sigmoid:k(xxx ,zzz) = tanh(a〈xxx ,zzz〉+ r)

9/41

Page 12: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

General purpose kernels

I linear (dot product):

k(xxx ,zzz) = 〈xxx ,zzz〉

I polynomial:k(xxx ,zzz) = (a〈xxx ,zzz〉+ b)c

I RBF (Gaussian):

k(xxx ,zzz) = exp

(−‖xxx − zzz‖2

2σ2

)

I sigmoid:k(xxx ,zzz) = tanh(a〈xxx ,zzz〉+ r)

9/41

Page 13: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Kernel trickGiven an algorithm which is formulated in terms of a positivedefinite kernel k, one can construct an alternative algorithm byreplacing k by another positive definite kernel k .

10/41

Page 14: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Distances ↔ kernels

1. kernel → (squared) distance:

DDD(2) = diag(KKK )111NN + 111NNdiag(KKK )− 2KKK

2. distance → kernel:

−1

2JJJDDD(2)JJJ = −1

2JJJdiag(KKK )111NNJJJ − 1

2JJJ111NNdiag(KKK )′JJJ + JJJKKKJJJ

= JJJKKKJJJ

where JJJ = III − 1N111111′ = III − 1

N111NN is the centering matrix(since 111NNJJJ = 000)

11/41

Page 15: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Centroid method

Outline:

I learn: calculate the class centroids

I classify: label an unseen example as belonging to the classwith closest centroid

+ +

+ +

+ +

– –

c+ c–

x

.

12/41

Page 16: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Binary classification:

m+ = |{xxx i |yi = +1}|,m− = |{xxx i |yi = −1}|

centers:

ccc+ =1

m+

{i|yi=+1}

xxx i

ccc− =1

m−

{i|yi=−1}

xxx i

Let www := ccc+ − ccc− and ccc := (ccc+ + ccc−)/2; label will be determinedby 〈www ,xxx − ccc〉

y = sgn〈xxx − ccc ,www〉= sgn (〈ccc+,xxx〉 − 〈ccc−,xxx〉+ b) ,

where b :=(‖ccc−‖2 − ‖ccc+‖2

)/2

13/41

Page 17: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Substituting the corresponding expressions for ccc+ and ccc− weobtain

y = sgn

1

m+

{i|yi=+1}

〈xxx ,xxx i 〉 −1

m−

{i|yi=−1}

〈xxx ,xxx i 〉+ b

= sgn

1

m+

{i|yi=+1}

k(xxx ,xxx i )−1

m−

{i|yi=−1}

k(xxx ,xxx i ) + b

,

where

b :=1

2

1

m2−

{(i,j)|yi=yj=−1}

k(xxx i ,xxx j)−1

m2+

{(i,j)|yi=yj=+1}

k(xxx i ,xxx j)

and k(·, ·) is an arbitrary kernel function.

14/41

Page 18: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

For multi-class classification:

ccc i =1

|Ci |∑

xxx j∈Ci

xxx i

The Euclidean distance can be written in terms of the dot product:

‖xxx − zzz‖2 = xxx ′xxx − 2xxx ′zzz + zzz ′zzz = k(xxx ,xxx)− 2k(xxx ,zzz) + k(zzz ,zzz)

Since distances from the centroids are to be considered, we cancalculate:

‖xxx − ccc i‖2 = ‖xxx − 1

|Ci |∑

xxx j∈Ci

xxx i‖2

= xxx ′xxx − 2

|Ci |∑

xxx j∈Ci

xxx ′xxx j +1

|Ci |2∑

xxx j∈Ci ,xxxk∈Ci

xxx ′jxxxk

= k(xxx ,xxx)− 2

|Ci |∑

xxx j∈Ci

k(xxx ,xxx j) +1

|Ci |2∑

xxx j∈Ci ,xxxk∈Ci

k(xxx j ,xxxk)

15/41

Page 19: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

k-Nearest Neighbors

Outline:

I seemingly no separate learning phase

I classify: find the k nearest neighbors of the example inquestion and label with the majority class

1

2

2

2

34

5

5

3

33

−10 −5 0 5 10 15 20

−10

−5

0

5

10

15

16/41

Page 20: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

I assign the label which is in majority among the k-nearestneighbors

f (xxx) = argmaxc=1,2,...,K

zzz∈Nk (xxx),zzz∈Dtrain

sim(zzz ,xxx) · δ(c , f (zzz))

(we use sim(zzz ,xxx) = 1,∀zzz ,xxx)

δ(a, b) =

{1, a = b0, otherwise

(Kronecker delta)

I to determine Nk(xxx) the Euclidean distance is used

I as before, we can rewrite the Euclidean distance using dotproducts:

‖xxx − zzz‖2 = xxx ′xxx − 2xxx ′zzz + zzz ′zzz = k(xxx ,xxx)− 2k(xxx ,zzz) + k(zzz ,zzz)

17/41

Page 21: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Support Vector Machines

Outline:

I learn: find a separating hyperplane – with maximal margin –that separates the positive and negative examples

I classify: on one side of the hyperplane there are the positivepoints, on the other side the negative points

I large-margin separation: the probability of misclassification ofan unseen example to be minimized

I this hyperplane is determined only by the points which lie onthe marginal hyperplanes → Support Vector Machines(SVMs)

I setting: binary classification

18/41

Page 22: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Hyperplane:www ′xxx i + b ≥ 1 ∀xxx i s.t. yi = 1

www ′xxx i + b ≤ −1 ∀xxx i s.t. yi = −1

Decision function:f (xxx) = sgn(www ′xxx + b)

I margin of the separating hyperplane: 2/‖www‖I optimization problem:

min F (www , b) =1

2‖www‖2

subject to yi (www′xxx i + b) ≥ 1, i = 1, . . . , `

19/41

Page 23: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Linearly non-separable case

I introduce slack variables; show the level of violation of linearseparability

I new conditions:

www ′xxx i + b

{≥ 1− ξi if yi = 1

≤ −1 + ξi if yi = −1

I new optimization problem:

min F (www , b, ξξξ) =1

2‖www‖2 + C

i=1

ξi

s.t. yi (www′xxx i + b) ≥ 1− ξi , ξi ≥ 0, i = 1, . . . , `

20/41

Page 24: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

k-means

Outline:

I clustering method; no supervised information is given

I goal: cluster data into K sets, such that similar points aregrouped in the same set

I determine K cluster centers

−15 −10 −5 0 5 10 15−15

−10

−5

0

5

10

15

21/41

Page 25: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

I objective function:

F =K∑

i=1

N∑

j=1

Uijd(xxx j ,ccc i )

whereI ccc i , i = 1, . . . ,K are the cluster centersI UUU is the membership matrix of size K × N (K clusters, N

points)I Uij ∈ {0, 1}; Uij = 1 if xxx j belongs to Ci (i-th cluster);

otherwise 0I d(·, ·) is a distance, usually the squared Euclidean:

d(xxx ,zzz) = ‖xxx − zzz‖2

I F is minimized if the points are assigned to the closest clustercenter, provided they are fixed:

Uij =

{1, if ‖xxx j − ccc i‖2 ≤ ‖xxx j − ccck‖2,∀k 6= i0, otherwise

I on the other side, if UUU is fixed, F is minimized by taking thecenters of the groups:

ccc i =1

∑Nj=1 Uij

N∑

k=1

Uikxxxk

22/41

Page 26: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

k-Means

1. Initialize the centers.

2. Determine UUU.

3. Compute cost function F .

4. If converged, then STOP.Else update centers using the new UUU; GOTO step 2.

23/41

Page 27: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Fuzzy c-means:

I generalization of k-means

I cluster membership values are ∈ [0, 1]

I objective function:

F ′ =K∑

i=1

N∑

j=1

Upij ‖xxx j − ccc i‖2

such that Uij ∈ [0, 1],∑K

i=1 Uij = 1,∀j , p ∈ [1,∞)

I if the centers are fixed, F ′ is minimized by

Uij =1

∑Kk=1

(‖xxx j−ccc i‖‖xxx j−ccck‖

) 2N−1

I the values which minimize F ′ for fixed UUU are

ccc i =1

∑Nj=1 Up

ij

N∑

j=1

Upijxxx j

24/41

Page 28: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Kernel k-means

I k-means and fuzzy c-means are limited to (properly) findpairwise linearly separable clusters

I to obtain non-linear cluster boundaries we use kernels

I the only expression one needs to rewrite for the kernel-basedvariant of the algorithm is the distance of a point to a center:

‖φ(xxx)− 1

sj

N∑

k=1

Ujkφ(xxxk)‖2

= k(xxx ,xxx) +1

s2j

N∑

k,`=1

UjkUjlk(xxxk ,xxx`)−2

sj

N∑

k=1

Ujkk(xxxk ,xxx)

where sj is the size of the j-th cluster

25/41

Page 29: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Kernel k-means

1. Generate a random UUU which satisfies the conditions, that ismake a random assignment of the points.

2. Compute the cost function F .

3. If converged then STOP.Else update UUU using the update formula used in k-means (=get new UUU); GOTO step 2.

26/41

Page 30: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Semi-supervised learning (SSL)I supervised learning:

D = {(xxx i , yi ) |xxx i ∈ X ⊆ Rd , yi ∈ {−1,+1}, i = 1, . . . , `};find f : X → {−1,+1} which agrees with D

I semi-supervised learning:D = {(xxx i , yi ) | i = 1, . . . , `} ∪ {xxx j | j = `+ 1, . . . , `+ u},`� u, N = `+ u;

I inductive: find f : X → {−1,+1} which agrees with D +use the information of DU

I transductive: find f : DU → {−1,+1} by using D = DL ∪DU

0 5 10

−6

−4

−2

0

2

4

0 5 10

−6

−4

−2

0

2

4

27/41

Page 31: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Semi-supervised learning (SSL)I supervised learning:

D = {(xxx i , yi ) |xxx i ∈ X ⊆ Rd , yi ∈ {−1,+1}, i = 1, . . . , `};find f : X → {−1,+1} which agrees with D

I semi-supervised learning:D = {(xxx i , yi ) | i = 1, . . . , `} ∪ {xxx j | j = `+ 1, . . . , `+ u},`� u, N = `+ u;

I inductive: find f : X → {−1,+1} which agrees with D +use the information of DU

I transductive: find f : DU → {−1,+1} by using D = DL ∪DU

0 5 10

−6

−4

−2

0

2

4

0 5 10

−6

−4

−2

0

2

4

27/41

Page 32: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Assumptions in SSL

I smoothness assumption: If two points xxx i and xxx j in a highdensity region are close, then so should be the correspondingoutputs yi and yj .

I cluster assumption: If two points are in the same cluster,they are likely to be of the same class.

I manifold assumption: The high dimensional data lie roughlyon a low dimensional manifold. – regarding dimensionality;but if manifold = approximation of the high-dimensionalregion ⇒ smoothness assumption

−10 −5 0 5 10 15 20−5

0

5

10

28/41

Page 33: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Assumptions in SSL

I smoothness assumption: If two points xxx i and xxx j in a highdensity region are close, then so should be the correspondingoutputs yi and yj .

I cluster assumption: If two points are in the same cluster,they are likely to be of the same class.

I manifold assumption: The high dimensional data lie roughlyon a low dimensional manifold. – regarding dimensionality;but if manifold = approximation of the high-dimensionalregion ⇒ smoothness assumption

−10 −5 0 5 10 15 20−5

0

5

10

28/41

Page 34: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Assumptions in SSL

I smoothness assumption: If two points xxx i and xxx j in a highdensity region are close, then so should be the correspondingoutputs yi and yj .

I cluster assumption: If two points are in the same cluster,they are likely to be of the same class.

I manifold assumption: The high dimensional data lie roughlyon a low dimensional manifold. – regarding dimensionality;but if manifold = approximation of the high-dimensionalregion ⇒ smoothness assumption

−10 −5 0 5 10 15 20−5

0

5

10

28/41

Page 35: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Classes of SSL

I Generativemodels

I Low-densityseparation

I Graph-basedmethods

I Change ofrepresentation

29/41

Page 36: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Classes of SSL

I Generativemodels

I Low-densityseparation

I Graph-basedmethods

I Change ofrepresentation

I model class conditional density P(xxx |y)and class priors P(y) and use the Bayestheorem for calculating posteriors, whichare used for classification

I e.g. Naive Bayes + EM

1. Train the classifier on the trainingexamples; determine probabilitiesP(ddd i | cj) and P(cj).

2. Repeat while there is improvement:

E-step: Classify unlabeled examplesusing the trained classifier.M-step: Re-estimate the classifierusing the previously classifiedexamples; recalculate P(ddd i | cj ) andP(cj ).

29/41

Page 37: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Classes of SSL

I Generativemodels

I Low-densityseparation

I Graph-basedmethods

I Change ofrepresentation

I push away a decision boundary fromlabeled and unlabeled data; naturalchoice: use a large-margin classifier

I e.g. Transductive SVM (TSVM)

min1

2‖www‖2

s.t. yi (www′xxx i + b) ≥ 1, i = 1, . . . , `

yj(www′xxx j + b) ≥ 1, j = `+ 1, . . . ,N

yj ∈ {−1,+1}, j = `+ 1, . . . ,N

29/41

Page 38: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Classes of SSL

I Generativemodels

I Low-densityseparation

I Graph-basedmethods

I Change ofrepresentation

I use the graph of the labeled andunlabeled data, represented by weightmatrix WWW ; use graph Laplacian LLL−WWW

I e.g. Label Propagation

1. Compute YYY (t + 1) = PPP YYY (t).2. Reset the labeled data,

YYY L(t + 1) = YYY L(0).3. t = t + 1 and repeat the above steps

until convergence.

29/41

Page 39: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Classes of SSL

I Generativemodels

I Low-densityseparation

I Graph-basedmethods

I Change ofrepresentation

I find some structure in the whole data set;use the information provided by thelabeled and unlabeled data set to create anew representation

1. Build the new representation – newdistance, dot-product or kernel – of thelearning examples.

2. Use a supervised learning method forobtaining the decision functionemploying the new representationobtained in the previous step.

I e.g. PCA, KPCA, LLE, ISOMAP, etc.

29/41

Page 40: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Self-training/Bootstrapping

Simple self-trainingUntil convergence:

1. Train classifier with the labeled data.

2. Classify unlabeled data with the trained classifier.

3. Add the most confident unlabeled points to the training data.

4. GOTO step 1.

Committee-based learningUntil convergence:

1. Train separate classifiers on the same labeled data.

2. Make predictions on the unlabeled data.

3. Add the most agreed points to the training set.

4. GOTO step 1.

Final prediction: weighted majority vote among all the learners.

30/41

Page 41: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Data graphs

I G = (V ,W ) undirected weighted graph(usually)

I V = XL ∪ XU (training data)

I W : V × V → R (similarity function);W (xxx i ,xxx j) = sim(xxx i ,xxx j) (def

= Wij)for example:

sim(xxx i ,xxx j) = exp(−‖xxx i−xxx j‖2

2

σ2

)

I sparsification techniques: εNN, kNNgraphs

I graph Laplacian: LLL = DDD −WWW ,DDD = diag(WWW · 111)

I nice properties: e.g. positivesemi-definite; has as many 0 eigenvaluesas connected components contains; etc.

31/41

Page 42: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Label propagation

I build data graph: Wij = exp(− 1

2σ2 ‖xxx i − xxx j‖2)

I compute transition probabilities: DDD = diag(WWW111), PPP = DDD−1WWW

I vector of labels: fff = [fff L fff U ]′

LP

1. i = 0; Initialize fff(i)U .

2. fff (i+1) = PPPfff (i).

3. Clamp the labeled data, fff(i+1)L = fff L.

4. i = i + 1; Unless convergence go tostep 2.

32/41

Page 43: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

I can be rewritten as

fi =1

∑Nj=1 Wij

N∑

k=1

Wik fk

I exact solution (LLL – graph Laplacian)

fff U = −LLL−1UU · LLLUL · fff L

I minimizes

E (fff ) =1

2

N∑

i,j=1

Wij(fi − fj)2 = fff ′LLLfff

I LP is equivalent with searching for a minimum cut in the datagraph

33/41

Page 44: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Transduction → induction

I fix the other labels and compute the new one

I we minimize:C +

i

Wiy (fi − y)2

I from setting the derivative to zero we obtain:

y =

∑i Wiy fi∑j Wjy

34/41

Page 45: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Graph-based distances

I if data lie on a manifold, graph-based distances are better

I graph-based distances = shortest path distances calculatedusing a known algorithm: Floyd–Warshall, Dijkstra etc.

��

��

��

Floyd–Warshall algorithm

1. Dij = weight of the edge; if no edge then ∞2. for k=1:N

for i=1:Nfor j=1:N

if Dij > Dik + Dkj

Dij = Dik + Dkj

35/41

Page 46: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Semi-Supervised kNN and k-Means

here: semi-supervised = graph-based

1. Initial distance matrix ⇐ kNN/εNN matrix

2. Compute the all-pair shortest path distance matrix – using forexample the Floyd–Warshall algorithm:

Dij = shortest path value among all paths from i to j

3. Perform kNN/k-Means using these graph distances.

36/41

Page 47: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Cluster kernels

Neighborhood kernel

I representing each as the average of its neighbors →neighborhood structure influences representation:

φnbd(xxx) =1

|N(xxx)|∑

xxx′∈N(xxx)

φb(xxx)

I the kernel:

knbd(xxx ,zzz) =

∑xxx′∈N(xxx),zzz′∈N(zzz) kb(xxx ′,zzz ′)

|N(xxx)||N(zzz)|

37/41

Page 48: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Bagged cluster kernel

I idea: reweighting the base kernel values by the probabilitythat the points belong to the same cluster

I use k-means clustering: random initial cluster centers →affects the output of the algorithm

BCK

1. Run k-means t times, which results in cj(xxx i ) clusterassignments, j = 1, . . . , t, i = 1, . . . ,N, c·(·) ∈ {1, . . . ,K}.

2. Construct the bagged kernel in the following way:

kbag(xxx ,zzz) =

∑tj=1[cj(xxx) = cj(zzz)]

t

The final kernel:

k(xxx ,zzz) = kb(xxx ,zzz) · kbag(xxx ,zzz)

38/41

Page 49: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Laplacian SVM

I use SVM + semi-supervised (smoothness) assumption

I add the following member to the objective function:

γI1

2

N∑

i,j=1

wij(fi − fj)2 = γI fff

′LLLfff

where LLL is the graph Laplacian, fff is the N × 1 vectorcontaining the labels of the points;γI is a parameter determining how much the unlabeled pointsinfluence the result/kernel

I the optimization problem thus becomes:

minwww ,b

1

2‖www‖2 + C

i=1

ξi + γI fff′LLLfff

s.t. yi (www′xxx i + b) ≥ 1− ξi

39/41

Page 50: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

I it can be shown that it is the same problem which is solved inthe original SVM formulation, but now using the kernel

k(xxx ,zzz) = k(xxx ,zzz)− kkk ′xxx

(1

2γIIII + LLLKKKNN

)−1

LLLkkkzzz

where k(·, ·) is the new semi-supervised kernel, while k(·, ·)is the base kernel

−20

−10

0

10

20

−20

−10

0

10

200

50

40/41

Page 51: Kernel Methods and Semi-Supervised Learningzbodo/pres/szeged2012.pdf · SVM k-means Semi-supervised learning Assumptions in SSL Classes of SSL Self-training Data graphs Label propagation

Kernel Methods andSemi-Supervised

Learning

Zalan Bodo

The aims of the shortcourse

Top 10 data miningalgorithms

Machine learning.Supervised learning

Kernel methods

Kernels

Centroid method

kNN

SVM

k-means

Semi-supervisedlearning

Assumptions in SSL

Classes of SSL

Self-training

Data graphs

Label propagation

Graph-based distances

Semi-supervised kNNand k-means

Cluster kernels

Laplacian SVM

Thank you! Questions?

41/41