Machine learning techniques in image analysis


Semi-supervised learning

Learning from both labeled and unlabeled data

Motivation: labeled data may be hard/expensive to get, but unlabeled data is usually cheaply available in much greater quantity


How can unlabeled data help?


Example: Text classification (Source: J. Zhu)

Classify astronomy vs. travel articles

Similarity measured by word overlap


Example: Text classification (Source: J. Zhu)

When labeled data alone fails:

What if there are no overlapping words?


Example: Text classification (Source: J. Zhu)

Unlabeled data as stepping stones:

Labels “propagate” via similar unlabeled articles


Another example (Source: J. Zhu)

Handwritten digit recognition with pixel-wise Euclidean distance

Figure: two digits that are not directly similar become indirectly similar via stepping-stone examples.


Types of semi-supervised learning

Inductive learning: given a training set L of labeled data and U of unlabeled data, learn a predictor that can be applied to a brand-new unlabeled point not in U.

Transductive learning: given L and U, learn a predictor that can be applied only to U (i.e., the predictor cannot be easily extended to previously unseen data).


Simplest semi-supervised learning algorithm: Self-training (Source: J. Zhu)

Input: labeled data L and unlabeled data U. Repeat:

1. Learn predictor f from labeled data L using supervised learning.

2. Apply f to the unlabeled instances in U.

3. Remove a subset from U and add that subset and its inferred labels to L.

How might we select this subset?

Advantages/disadvantages of this scheme?
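
As a concrete illustration, here is a minimal Python sketch of this loop. The helper names fit and predict_proba, and the rule "move the k most confident predictions per round", are assumptions made for the sketch, not part of the slide; any supervised learner with a confidence score would do.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, fit, predict_proba, k=10, max_iter=100):
    """Generic self-training loop (sketch).

    fit(X, y) -> model and predict_proba(model, X) -> class probabilities
    are supplied by the caller; k points are moved from U to L per round.
    """
    X_l, y_l, X_u = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        model = fit(X_l, y_l)                    # 1. learn f from the labeled data L
        proba = predict_proba(model, X_u)        # 2. apply f to the unlabeled data U
        conf, pred = proba.max(axis=1), proba.argmax(axis=1)
        top = np.argsort(-conf)[:k]              # 3. pick the most confident subset...
        X_l = np.vstack([X_l, X_u[top]])         #    ...and add it, with its inferred
        y_l = np.concatenate([y_l, pred[top]])   #    labels, to L
        X_u = np.delete(X_u, top, axis=0)
    return fit(X_l, y_l)
```

Picking the most confident predictions is the usual heuristic for selecting the subset; its main weakness is that an early mistake can be added to L and then reinforced in later rounds.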


Self-training with nearest-neighbor classifier (Source: J. Zhu)

Input: labeled data L and unlabeled data U. Repeat:

1. Find unlabeled point x that is closest to a labeled point x′ and assign to x the label of x′.

2. Remove x from U; add it and its estimated label to L.
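
A short sketch of this variant, assuming Euclidean distance (the distance choice and the array bookkeeping are implementation details added here):

```python
import numpy as np

def propagating_1nn(X_lab, y_lab, X_unlab):
    """Self-training with a 1-NN classifier (sketch): repeatedly label the
    unlabeled point that is closest to any currently labeled point."""
    X_l, y_l, X_u = list(X_lab), list(y_lab), list(X_unlab)
    while X_u:
        L, U = np.asarray(X_l), np.asarray(X_u)
        d = np.linalg.norm(U[:, None, :] - L[None, :, :], axis=2)  # all U-to-L distances
        u_idx, l_idx = np.unravel_index(np.argmin(d), d.shape)
        X_l.append(X_u.pop(u_idx))   # move the closest unlabeled point x into L...
        y_l.append(y_l[l_idx])       # ...with the label of its nearest labeled point x'
    return np.asarray(X_l), np.asarray(y_l)
```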


Propagating nearest-neighbor: Example (Source: J. Zhu)

Figure: (a) Iteration 1, (b) Iteration 25, (c) Iteration 74, (d) Final.


Another example (Source: J. Zhu): figure panels (a)–(d).

Another simple approach: Cluster-and-label (Source: J. Zhu)

Input: labeled data L and unlabeled data U.

1. Cluster L ∪ U.

2. For each cluster, let S be the set of labeled instances in that cluster.

3. Learn a supervised predictor from S and apply it to all the unlabeled instances in that cluster.

What is the underlying assumption here?
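
One possible instantiation, sketched with a plain k-means clusterer and a majority-vote predictor inside each cluster; the number of clusters, the tie-breaking, and the fallback for clusters that contain no labeled points are all assumptions of the sketch.

```python
import numpy as np
from collections import Counter

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (sketch); returns a cluster index for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[assign == c].mean(axis=0) if np.any(assign == c)
                            else centers[c] for c in range(k)])
    return assign

def cluster_and_label(X_lab, y_lab, X_unlab, k=2):
    y_lab = np.asarray(y_lab)
    X = np.vstack([X_lab, X_unlab])                  # 1. cluster L ∪ U
    assign = kmeans(X, k)
    y_u = np.empty(len(X_unlab), dtype=y_lab.dtype)
    overall = Counter(y_lab.tolist()).most_common(1)[0][0]
    for c in range(k):
        in_c = assign == c
        S = y_lab[in_c[:len(X_lab)]]                 # 2. labeled instances in this cluster
        majority = Counter(S.tolist()).most_common(1)[0][0] if len(S) else overall
        y_u[in_c[len(X_lab):]] = majority            # 3. label the cluster's unlabeled points
    return y_u
```

The underlying assumption, as the question above suggests, is that the clusters found by the unsupervised clustering actually line up with the classes.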


Cluster-and-label: Examples (Source: J. Zhu)

Hierarchical clustering, majority vote predictor within cluster


Generative models (Source: J. Zhu)

Labeled data (Xl, Yl):

Assuming each class has a Gaussian distribution, how do we find the decision boundary?


Generative models (Source: J. Zhu)

Labeled data (Xl, Yl):

The most likely model, and its decision boundary


Generative models (Source: J. Zhu)

Labeled data (Xl, Yl) and unlabeled data Xu:

What is the most likely decision boundary now?


Generative models (Source: J. Zhu)

The two boundaries are different because they maximize different quantities:

p(X_l, Y_l | θ)   vs.   p(X_l, Y_l, X_u | θ)

Gaussian mixture model: θ are the component weights, means, and covariances


Generative models (Source: J. Zhu)

Only labeled data:

p(X_l, Y_l | θ) = ∏_i p(x_i, y_i | θ) = ∏_i p(y_i | θ) p(x_i | y_i, θ)

ML estimate for θ: sample means, covariances, and proportions for each of the classes

Labeled and unlabeled data:

p(X_l, Y_l, X_u | θ) = p(X_l, Y_l | θ) ∑_{Y_u} p(X_u, Y_u | θ)

= ( ∏_{i labeled} p(y_i | θ) p(x_i | y_i, θ) ) ( ∏_{j unlabeled} ∑_c p(c | θ) p(x_j | c, θ) )

ML estimate for θ: use EM (Y_u are hidden variables)
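
To make the second factorization concrete, here is a sketch that evaluates this log-likelihood for a given θ in a Gaussian mixture; the (p, mu, cov) layout of θ is an assumption of the sketch.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ssl_log_likelihood(theta, X_l, y_l, X_u):
    """log p(X_l, Y_l, X_u | theta): labeled points contribute p(y) p(x | y),
    unlabeled points contribute the marginal sum_c p(c) p(x | c)."""
    p, mu, cov = theta          # class proportions, means, covariances (one per class)
    ll = 0.0
    for x, y in zip(X_l, y_l):
        ll += np.log(p[y]) + multivariate_normal(mu[y], cov[y]).logpdf(x)
    for x in X_u:
        ll += np.log(sum(p[c] * multivariate_normal(mu[c], cov[c]).pdf(x)
                         for c in range(len(p))))
    return ll
```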


The EM algorithm for Gaussian mixtures (Source: J. Zhu)

1. Start from MLE θ = {p_c, µ_c, Σ_c} on (X_l, Y_l):

   p_c: proportion of class c
   µ_c: sample mean of class c
   Σ_c: sample covariance matrix of class c

Repeat:

2. The E-step: compute the expected label p(y | x, θ) for all x in X_u.

3. The M-step: update MLE θ with the “softly labeled” X_u.

Special case of EM for Gaussian mixtures where the component assignments of labeled data are fixed.

Can also be viewed as a special case of self-training.
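
A compact sketch of this loop for Gaussian classes. The covariance regularization reg, the fixed iteration count, integer class labels 0..C−1, and the requirement that every class appears in the labeled set are assumptions added for the sketch.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ssl_gmm_em(X_l, y_l, X_u, n_classes, n_iter=50, reg=1e-6):
    d = X_l.shape[1]
    X = np.vstack([X_l, X_u])
    # responsibilities: one-hot (fixed) for labeled points, zero for unlabeled so far
    R = np.zeros((len(X), n_classes))
    R[np.arange(len(X_l)), y_l] = 1.0

    def m_step(R):
        Nc = R.sum(axis=0)
        p = Nc / Nc.sum()                               # class proportions
        mu = (R.T @ X) / Nc[:, None]                    # weighted means
        cov = []
        for c in range(n_classes):
            Z = X - mu[c]
            cov.append((R[:, c, None] * Z).T @ Z / Nc[c] + reg * np.eye(d))
        return p, mu, cov

    p, mu, cov = m_step(R)                              # 1. MLE on (X_l, Y_l) alone
    for _ in range(n_iter):
        lik = np.stack([np.atleast_1d(p[c] * multivariate_normal(mu[c], cov[c]).pdf(X_u))
                        for c in range(n_classes)], axis=1)
        R[len(X_l):] = lik / lik.sum(axis=1, keepdims=True)   # 2. E-step: p(y | x, θ) on X_u
        p, mu, cov = m_step(R)                                # 3. M-step with the softly labeled X_u
    return p, mu, cov
```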


Limitations of mixture models (Source: J. Zhu)

Assumption: mixture components correspond to class-conditional distributions.

When the assumption is wrong:


Discriminative approach: Semi-supervised SVMs (Source: J. Zhu)

Idea: try to keep both labeled and unlabeled points outside the margin, while maximizing the margin.


Review: Standard SVMs

Classification function: f(x) = w^T x + w_0.

Standard SVM objective function:

min_{w, w_0}  ‖w‖^2 + λ_1 ∑_i (1 − y_i f(x_i))_+


Semi-supervised SVMs (Source: J. Zhu)

Classification function: f(x) = w^T x + w_0.

To incorporate unlabeled points, assign to them putative labels sgn(f(x)).

Semi-supervised SVM objective function:

min_{w, w_0}  ‖w‖^2 + λ_1 ∑_{i labeled} (1 − y_i f(x_i))_+ + λ_2 ∑_{j unlabeled} (1 − |f(x_j)|)_+
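
To make the extra term concrete, here is a sketch that simply evaluates this objective for a given (w, w_0); actually minimizing it is a non-convex problem and is not attempted here.

```python
import numpy as np

def s3vm_objective(w, w0, X_l, y_l, X_u, lam1=1.0, lam2=1.0):
    """Semi-supervised SVM objective (sketch); y_l must be in {-1, +1}."""
    f_l = X_l @ w + w0
    f_u = X_u @ w + w0
    hinge = np.maximum(0.0, 1.0 - y_l * f_l).sum()     # labeled: hinge loss
    hat = np.maximum(0.0, 1.0 - np.abs(f_u)).sum()     # unlabeled: "hat" loss, small when
    return w @ w + lam1 * hinge + lam2 * hat           # unlabeled points are outside the margin
```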


Graph-based semi-supervised learning (Source: J. Zhu)

Idea: construct a graph where the nodes are labeled and unlabeled examples, and the edges are weighted by the similarity of examples.

Unlabeled data can help “glue” the objects of the same class together.

Assumption: items connected by “heavy” edges are likely to have the same label.
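
For instance, one common construction is a fully connected graph with Gaussian edge weights (a sketch; the kernel and its bandwidth σ are choices the slide leaves open):

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), with no self-loops."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```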


Graph-based semi-supervised learning (Source: J. Zhu)

The mincut algorithm:

Assume binary classification (class labels are 0, 1).

Approach: fix Y_l, find Y_u to minimize ∑_{i∼j} w_ij |y_i − y_j|.

Combinatorial problem, but has polynomial-time solution.

Harmonic functions:

Let’s relax discrete labels to continuous values in R.

We want to find the harmonic function f that satisfies f(x) = y for all x in X_l and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))^2.


A random walk interpretation (Source: J. Zhu)

Randomly walk from node i to j with probability w_ij / ∑_k w_ik.

Stop if we hit a labeled node.

The harmonic function has the following interpretation: f(x_i) = P(hit label 1 | start from i).
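
A small Monte-Carlo sketch of this interpretation (binary labels 0/1; it assumes every walk eventually reaches a labeled node, and the estimate approaches the harmonic value as the number of walks grows):

```python
import numpy as np

def hit_probability(W, labels, start, n_walks=2000, seed=0):
    """Estimate P(hit a node labeled 1 | start) by simulating random walks.
    W: symmetric weight matrix; labels: dict {node index: 0 or 1} of labeled nodes."""
    rng = np.random.default_rng(seed)
    n, hits = len(W), 0
    for _ in range(n_walks):
        i = start
        while i not in labels:                        # walk until a labeled node is hit
            i = rng.choice(n, p=W[i] / W[i].sum())    # step i -> j with prob w_ij / Σ_k w_ik
        hits += labels[i]
    return hits / n_walks
```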


The harmonic solution (Source: J. Zhu)

We want to find the harmonic function f that satisfies f(x) = y for all labeled points x and minimizes the energy ∑_{i∼j} w_ij (f(x_i) − f(x_j))^2.

It can be shown that f(x_i) = ( ∑_{j∼i} w_ij f(x_j) ) / ( ∑_{j∼i} w_ij ) at all unlabeled points x_i.

Iterative algorithm to compute harmonic function:

Initially, fix f(x) = y for all labeled data and set f to arbitrary values for all unlabeled data.

Repeat until convergence: for each unlabeled x_i, set f(x_i) to its weighted neighborhood average:

f(x_i) = ( ∑_{j∼i} w_ij f(x_j) ) / ( ∑_{j∼i} w_ij ).
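
The iteration in a few lines (a sketch; the convergence tolerance is an added detail, and W is assumed to have a zero diagonal):

```python
import numpy as np

def harmonic_iterative(W, labeled_idx, y_l, n_iter=1000, tol=1e-6):
    """Compute the harmonic function by repeated neighborhood averaging."""
    n = len(W)
    f = np.zeros(n)
    f[labeled_idx] = y_l                            # clamp f(x) = y on labeled points
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    for _ in range(n_iter):
        f_old = f.copy()
        for i in unlabeled:                         # weighted neighborhood average
            f[i] = W[i] @ f / W[i].sum()
        if np.max(np.abs(f - f_old)) < tol:
            break
    return f
```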


The graph Laplacian (Source: J. Zhu)

Let W be a symmetric weight matrix with entries w_ij, and D be a diagonal matrix with entries D_ii = ∑_j w_ij.

The graph Laplacian matrix is defined as L = D − W.

Then we can write ∑_{i∼j} w_ij (f(x_i) − f(x_j))^2 = f^T L f.

We want to minimize f^T L f subject to the constraints f(x_i) = y_i on labeled data.

Solution: f_u = −L_uu^{-1} L_ul y_l, where y_l are the labels for the labeled data, and

L = [ L_ll  L_lu
      L_ul  L_uu ].
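
The same solution computed directly from the Laplacian blocks (a sketch; it assumes L_uu is invertible, i.e., every group of unlabeled points is connected to at least one labeled point):

```python
import numpy as np

def harmonic_closed_form(W, labeled_idx, y_l):
    """f_u = -inv(L_uu) @ L_ul @ y_l, with L = D - W blocked by labeled/unlabeled."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    u = np.setdiff1d(np.arange(n), labeled_idx)     # indices of unlabeled nodes
    f = np.zeros(n)
    f[labeled_idx] = y_l
    f[u] = -np.linalg.solve(L[np.ix_(u, u)], L[np.ix_(u, labeled_idx)] @ y_l)
    return f
```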


The graph Laplacian (Source: J. Zhu)

Alternative approach: Allow f(x_i) to be different from y_i on labeled data, but penalize it:

min_f  ∑_{i labeled} c (f(x_i) − y_i)^2 + f^T L f.

Let C be a diagonal matrix where C_ii = c if i is a labeled point, and C_ii = 0 otherwise. Then we can write the objective function as

min_f  (f − y)^T C (f − y) + f^T L f

where y is a vector whose entries correspond to the labels of labeled points, and are arbitrary otherwise.

Then the solution is given by the linear system

(C + L)f = Cy.
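
The penalized variant is then a single linear solve (a sketch; the weight c is a free parameter):

```python
import numpy as np

def harmonic_regularized(W, labeled_idx, y_l, c=100.0):
    """Solve (C + L) f = C y, with C_ii = c on labeled points and 0 otherwise."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    C = np.zeros((n, n))
    C[labeled_idx, labeled_idx] = c
    y = np.zeros(n)                 # entries of y off the labeled set are arbitrary (zero here)
    y[labeled_idx] = y_l
    return np.linalg.solve(C + L, C @ y)
```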


Graph spectrum (Source: J. Zhu)

The spectrum of the graph represented by W is given by the eigenvalues and eigenvectors (λ_i, φ_i)_{i=1}^n of the Laplacian L.

Properties of the graph spectrum:

A graph has k connected components if and only if λ_1 = λ_2 = ... = λ_k = 0. The corresponding eigenvectors are constant on individual connected components, and zero elsewhere.

L = ∑_{i=1}^n λ_i φ_i φ_i^T.

Any function f on the graph can be written as a linear combination of eigenvectors: f = ∑_{i=1}^n a_i φ_i.

The “smoothness” of f can be written as f^T L f = ∑_{i=1}^n a_i^2 λ_i.
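
A small numerical check of these properties on a toy graph (the example graph and the use of numpy's symmetric eigendecomposition are choices made for the sketch):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)        # two components: {0, 1, 2} and {3}
L = np.diag(W.sum(axis=1)) - W
lam, phi = np.linalg.eigh(L)                     # eigenvalues ascending, orthonormal eigenvectors

print(np.sum(np.isclose(lam, 0)))                # 2 zero eigenvalues = 2 connected components
print(np.allclose(L, (phi * lam) @ phi.T))       # L = Σ_i λ_i φ_i φ_i^T
f = np.random.default_rng(0).normal(size=4)
a = phi.T @ f                                    # coefficients of f = Σ_i a_i φ_i
print(np.isclose(f @ L @ f, np.sum(a**2 * lam))) # smoothness f^T L f = Σ_i a_i^2 λ_i
```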


Using the graph spectrum

Objective function

min_f  ∑_{i labeled} c (f(x_i) − y_i)^2 + f^T L f = (f − y)^T C (f − y) + f^T L f.

We can restrict our solution to “smooth” functions f, i.e., linear combinations of the first k eigenvectors associated with the smallest eigenvalues: f = ∑_{i=1}^k a_i φ_i.

Now we can obtain f by solving a k × k linear system instead of an n × n linear system.
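
A sketch of this reduction; k and c are free parameters, and substituting f = Φa into the objective above gives the k × k system solved here.

```python
import numpy as np

def spectral_harmonic(W, labeled_idx, y_l, k=5, c=100.0):
    """Restrict f = Σ_{i<=k} a_i φ_i and solve a k x k system for a (sketch)."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)
    Phi = phi[:, :k]                                # the k "smoothest" eigenvectors
    C = np.zeros((n, n))
    C[labeled_idx, labeled_idx] = c
    y = np.zeros(n)
    y[labeled_idx] = y_l
    # minimize (Phi a - y)^T C (Phi a - y) + a^T diag(lam_1..lam_k) a over a:
    A = Phi.T @ C @ Phi + np.diag(lam[:k])          # k x k system matrix
    b = Phi.T @ C @ y
    return Phi @ np.linalg.solve(A, b)              # back to an n-vector f
```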


References

J. Zhu, Semi-supervised learning survey, University of Wisconsin technical report, 2008. http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html

J. Zhu, Semi-supervised learning tutorial, Chicago Machine Learning Summer School, 2009. http://pages.cs.wisc.edu/~jerryzhu/pub/sslchicago09.pdf
