Semi-supervised Learning Rong Jin

Semi-supervised Learning Rong Jin. Semi-supervised learning Label propagation Transductive learning Co-training Active learning

Page 1: Semi-supervised Learning Rong Jin. Semi-supervised learning  Label propagation  Transductive learning  Co-training  Active learning

Semi-supervised Learning

Rong Jin

Semi-supervised learning

Label propagation
Transductive learning
Co-training
Active learning

Label Propagation: A toy problem

Each node in the graph is an example. Two examples are labeled; most examples are unlabeled.

Compute the similarity $S_{ij}$ between examples.

Connect each example to its most similar examples, with edge weight $w_{ij}$.

How can we predict labels for the unlabeled nodes using this graph?

[Figure: graph with two labeled examples and many unlabeled examples]
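The graph construction described above can be sketched as follows. This is a minimal illustration, not the construction used in the slides: the Gaussian similarity, the bandwidth `sigma`, the neighborhood size `k`, and the function name `knn_similarity_graph` are all assumptions made for the example.

```python
import numpy as np

def knn_similarity_graph(X, k=2, sigma=1.0):
    """Return a symmetric weight matrix linking each point to its k most similar points."""
    # Pairwise squared Euclidean distances between all examples.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Gaussian similarity S_ij (an assumed choice of similarity function).
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    W = np.zeros_like(S)
    for i in range(len(X)):
        nearest = np.argsort(-S[i])[:k]  # indices of the k most similar examples
        W[i, nearest] = S[i, nearest]
    # Keep an edge if either endpoint selected it, so that W is symmetric.
    return np.maximum(W, W.T)

# Two well-separated groups of one-dimensional points.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = knn_similarity_graph(X, k=2)
```

With these toy points, each example links only to members of its own group, so the graph splits into two connected components.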

Label Propagation: Forward propagation

[Figure: labels spread outward from the labeled nodes over successive propagation steps]

How should conflicting cases be resolved? What label should be given to a node that receives conflicting labels?

Label Propagation

Let S be the similarity matrix, $S = [S_{i,j}]_{n \times n}$.

Let D be a diagonal matrix where $D_i = \sum_{j \neq i} S_{i,j}$.

Compute the normalized similarity matrix $S' = D^{-1/2} S D^{-1/2}$.

Let Y be the initial assignment of class labels:
$Y_i = 1$ when the i-th node is assigned to the positive class
$Y_i = -1$ when the i-th node is assigned to the negative class
$Y_i = 0$ when the i-th node is not initially labeled

Let F be the predicted class labels:
The i-th node is assigned to the positive class if $F_i > 0$
The i-th node is assigned to the negative class if $F_i < 0$

Label Propagation

One iteration: $F = Y + \alpha S' Y = (I + \alpha S') Y$, where $\alpha$ weights the propagation values.

Two iterations: $F = Y + \alpha S' Y + \alpha^2 S'^2 Y = (I + \alpha S' + \alpha^2 S'^2) Y$

How about infinitely many iterations?

$F = \left( \sum_{n=0}^{\infty} \alpha^n S'^n \right) Y = (I - \alpha S')^{-1} Y$

Any problems with such an approach?
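The closed form for infinitely many iterations can be checked numerically. The sketch below assumes $0 < \alpha < 1$, which makes the series converge because the eigenvalues of $S'$ lie in $[-1, 1]$. The chain graph, the value `alpha=0.5`, and the function name `label_propagation` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def label_propagation(S, Y, alpha=0.5):
    """Closed-form label propagation: F = (I - alpha * S')^{-1} Y."""
    d = S.sum(axis=1)  # node degrees D_i; every node must have at least one edge
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized similarity S' = D^{-1/2} S D^{-1/2}.
    S_norm = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    n = S.shape[0]
    # Solve (I - alpha * S') F = Y rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S_norm, Y)

# Chain graph 0 - 1 - 2 - 3 - 4; node 0 is positive, node 4 is negative.
S = np.zeros((5, 5))
for i in range(4):
    S[i, i + 1] = S[i + 1, i] = 1.0
Y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
F = label_propagation(S, Y, alpha=0.5)
```

On this symmetric chain, the nodes next to the positive example get positive scores, the nodes next to the negative example get negative scores, and the middle node sits at zero, which is exactly the kind of conflicting case raised earlier.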

Label Consistency Problem

The predicted vector F may not be consistent with the initially assigned class labels Y.

Energy Minimization

Using the same notation:
$S_{i,j}$: similarity between the i-th node and the j-th node
Y: initially assigned class labels
F: predicted class labels

Energy: $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2$

Goal: find a label assignment F that is consistent with the labeled examples Y and also minimizes the energy function E(F).
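To see why a low energy corresponds to a sensible labeling, the sketch below evaluates E(F) for two candidate assignments on a small hand-made graph. The graph weights and the names `energy`, `smooth`, and `rough` are assumptions chosen for illustration.

```python
import numpy as np

def energy(S, F):
    """E(F) = sum over all ordered pairs (i, j) of S_ij * (F_i - F_j)^2."""
    diff = F[:, None] - F[None, :]
    return float((S * diff ** 2).sum())

# Two tightly connected pairs {0, 1} and {2, 3}, joined by one weak edge (1, 2).
S = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
smooth = np.array([1.0, 1.0, -1.0, -1.0])  # labels change only across the weak edge
rough = np.array([1.0, -1.0, 1.0, -1.0])   # labels change across the strong edges

E_smooth = energy(S, smooth)
E_rough = energy(S, rough)
```

The assignment that only cuts the weak edge has a much lower energy than the one that separates strongly similar nodes.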

Harmonic Function

$E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2 = F^T (D - S) F$ (up to a constant factor)

Thus, the minimizer of E(F) should satisfy $(D - S) F = 0$, and F should also be consistent with Y.

Split F and Y into labeled and unlabeled parts: $F^T = (F_l^T, F_u^T)$, $Y^T = (Y_l^T, Y_u^T)$, with $F_l = Y_l$.

$$L = D - S = \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix}$$

$$L F = 0 \;\Rightarrow\; \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix} \begin{pmatrix} Y_l \\ F_u \end{pmatrix} = 0 \;\Rightarrow\; F_u = -L_{uu}^{-1} L_{ul} Y_l$$
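The harmonic solution $F_u = -L_{uu}^{-1} L_{ul} Y_l$ can be sketched in a few lines. The chain graph and the function name `harmonic_solution` are assumptions for illustration, and the code assumes every unlabeled node is connected to some labeled node so that $L_{uu}$ is invertible.

```python
import numpy as np

def harmonic_solution(S, labeled_idx, y_labeled):
    """Solve F_u = -L_uu^{-1} L_ul Y_l with L = D - S and F_l = Y_l held fixed."""
    n = S.shape[0]
    L = np.diag(S.sum(axis=1)) - S  # graph Laplacian L = D - S
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    F = np.zeros(n)
    F[labeled_idx] = y_labeled  # consistency with the given labels
    F[unlabeled_idx] = -np.linalg.solve(L_uu, L_ul @ y_labeled)
    return F

# Chain graph 0 - 1 - 2 - 3 - 4 with unit weights; the two end nodes are labeled.
S = np.zeros((5, 5))
for i in range(4):
    S[i, i + 1] = S[i + 1, i] = 1.0
F = harmonic_solution(S, labeled_idx=[0, 4], y_labeled=np.array([1.0, -1.0]))
```

On this chain each unlabeled value is the average of its two neighbors, so F interpolates linearly between the labeled ends: 1, 0.5, 0, -0.5, -1.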

Optical Character Recognition

Given an image of a digit, determine its value.

[Figure: example digit images, e.g. 1 and 2]

Create a graph over the digit images.

Optical Character Recognition

#Labeled_Examples + #Unlabeled_Examples = 4000

CMN: label propagation

1NN: for each unlabeled example, use the label of its closest neighbor
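The 1NN baseline is simple enough to sketch directly. The version below assumes that "closest neighbor" means the closest labeled example under Euclidean distance; the toy points and the name `one_nn_predict` are illustrative.

```python
import numpy as np

def one_nn_predict(X_labeled, y_labeled, X_unlabeled):
    """Give each unlabeled example the label of its closest labeled example."""
    # Squared Euclidean distance from every unlabeled to every labeled example.
    sq_dists = ((X_unlabeled[:, None, :] - X_labeled[None, :, :]) ** 2).sum(axis=2)
    return y_labeled[np.argmin(sq_dists, axis=1)]

X_labeled = np.array([[0.0, 0.0], [4.0, 4.0]])
y_labeled = np.array([1, -1])
X_unlabeled = np.array([[0.5, 0.2], [3.5, 4.1], [1.0, 1.5]])
pred = one_nn_predict(X_labeled, y_labeled, X_unlabeled)
```

Unlike label propagation, this baseline ignores the other unlabeled examples entirely, which is why it serves as a point of comparison.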

Spectral Graph Transducer

Problem with the harmonic function: why could this happen?

The condition $(D - S) F = 0$ does not hold in the constrained case. With $F_l = Y_l$ fixed, in general

$$L F = \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix} \begin{pmatrix} Y_l \\ F_u \end{pmatrix} \neq 0$$

Spectral Graph Transducer

$$\min_F \; F^T L F + c (F - Y)^T C (F - Y) \quad \text{s.t.} \quad F^T F = n, \; F^T e = 0$$

C is the diagonal cost matrix: $C_{i,i} = 1$ if the i-th node is initially labeled, and zero otherwise.

The parameter c controls the balance between the consistency requirement and the energy minimization requirement.

The problem can be solved efficiently by computing eigenvectors.

Empirical Studies

Green’s Function

The problem of minimizing the energy while staying consistent with the initially assigned class labels can be formulated as a Green’s function problem.

Minimizing $E(F) = F^T L F$ gives $L F = 0$.

It turns out that L can be viewed as the Laplacian operator in the discrete case: $L F = 0 \Leftrightarrow \nabla^2 F = 0$.

Thus, our problem is to find a solution F with $\nabla^2 F = 0$, subject to F = Y for the labeled examples.

We can treat the constraint F = Y for the labeled examples as a boundary condition. Because it fixes the values of F on the boundary, it is a Dirichlet boundary condition (a Neumann condition would fix the derivative instead).

This is a standard Green’s function problem.

Why Energy Minimization?

$$E(Y) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} (y_i - y_j)^2$$

[Figure: final classification results]

Cluster Assumption

The decision boundary should pass through low-density areas.

Unlabeled data provide a more accurate estimate of the local density.

Cluster Assumption vs. Maximum Margin

Maximum margin classifier (e.g. SVM): decision function $w \cdot x + b$ with maximum margin

[Figure: points labeled +1 and -1 separated by a maximum-margin boundary]

Maximum margin means low density around the decision boundary, which is the cluster assumption.

Any thoughts about utilizing the unlabeled data in a support vector machine?

Transductive SVM

Decision boundary given a small number of labeled examples

How will the decision boundary change given both labeled and unlabeled examples?

Move the decision boundary to a place with low local density.

[Figure: classification results]

How to formulate this idea?

Transductive SVM: Formulation

Labeled data: $L = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$

Unlabeled data: $D = \{x_{n+1}, x_{n+2}, \ldots, x_{n+m}\}$

Maximum margin principle for a mixture of labeled and unlabeled data:
For each label assignment of the unlabeled data, compute its maximum margin.
Find the label assignment whose maximum margin is maximized.

Transductive SVM

Different label assignments for the unlabeled data lead to different maximum margins.

Transductive SVM: Formulation

Original SVM:

$$\{w^*, b^*\} = \arg\min_{w, b} \; w \cdot w$$
$$\text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \ldots, n \quad \text{(labeled examples)}$$

Transductive SVM:

$$\{w^*, b^*\} = \arg\min_{y_{n+1}, \ldots, y_{n+m}} \; \arg\min_{w, b} \; w \cdot w$$
$$\text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \ldots, n \quad \text{(labeled examples)}$$
$$\phantom{\text{s.t.}} \quad y_{n+j} (w \cdot x_{n+j} + b) \geq 1, \quad j = 1, \ldots, m \quad \text{(unlabeled examples)}$$

The second set of constraints covers the unlabeled data, with a binary variable $y_{n+j}$ for the label of each unlabeled example.

Computational Issue

No longer a convex optimization problem. (Why?)

How to optimize the transductive SVM? Alternating optimization.

$$\{w^*, b^*\} = \arg\min_{y_{n+1}, \ldots, y_{n+m}} \; \arg\min_{w, b} \; w \cdot w + \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{m} \eta_i$$
$$\text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad i = 1, \ldots, n \quad \text{(labeled examples)}$$
$$\phantom{\text{s.t.}} \quad y_{n+i} (w \cdot x_{n+i} + b) \geq 1 - \eta_i, \quad i = 1, \ldots, m \quad \text{(unlabeled examples)}$$

Alternating Optimization

Step 1: fix $y_{n+1}, \ldots, y_{n+m}$, learn the weights w.

Step 2: fix the weights w, try to predict $y_{n+1}, \ldots, y_{n+m}$. (How?)
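The two alternating steps can be sketched as below. This is a naive illustration under several assumptions, not the procedure behind the reported experiments: the linear SVM is trained by plain subgradient descent on the hinge loss, step 2 simply sets each unknown label to the sign of $w \cdot x + b$, and the names `train_linear_svm`, `predict`, and `alternating_tsvm`, along with the hyperparameters, are hypothetical. Practical transductive SVM solvers use more careful label updates to avoid poor local minima.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM via full-batch subgradient descent on the hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # examples that violate the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    return np.where(X @ w + b >= 0, 1.0, -1.0)

def alternating_tsvm(X_l, y_l, X_u, max_rounds=10):
    # Initialise the unknown labels with an SVM trained on the labeled data only.
    w, b = train_linear_svm(X_l, y_l)
    y_u = predict(w, b, X_u)
    X_all = np.vstack([X_l, X_u])
    for _ in range(max_rounds):
        # Step 1: fix the labels of the unlabeled examples, learn w and b.
        w, b = train_linear_svm(X_all, np.concatenate([y_l, y_u]))
        # Step 2: fix w and b, re-predict the labels of the unlabeled examples.
        y_new = predict(w, b, X_u)
        if np.array_equal(y_new, y_u):
            break  # the label assignment no longer changes
        y_u = y_new
    return w, b, y_u

# Toy data: two tight clusters of unlabeled points and one labeled point per class.
rng = np.random.default_rng(0)
X_u = np.vstack([rng.normal(-2.0, 0.3, size=(20, 2)),
                 rng.normal(2.0, 0.3, size=(20, 2))])
X_l = np.array([[-2.0, -2.0], [2.0, 2.0]])
y_l = np.array([-1.0, 1.0])
w, b, y_u = alternating_tsvm(X_l, y_l, X_u)
```

Each step can only keep or lower the objective for the current choice of the other variables, but because the joint problem is not convex the result depends on the initial label assignment.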

Empirical Study with Transductive SVM

10 categories from the Reuters collection

3299 test documents

1000 informative words selected using the mutual information (MI) criterion

Co-training for Semi-supervised Learning

Consider the task of classifying web pages into two categories: a category for students and a category for professors.

Two aspects of web pages should be considered:
Content of web pages: “I am currently the second year Ph.D. student …”
Hyperlinks: “My advisor is …”, “Students: …”

Co-training for Semi-Supervised Learning

[Figure: two example web pages]

It is easy to classify the type of this web page based on its content.

It is easier to classify this web page using hyperlinks.

Co-training

Two representations for each web page

Content representation: (doctoral, student, computer, university…)

Hyperlink representation:
Inlinks: Prof. Cheng
Outlinks: Prof. Cheng

Co-training: Classification Scheme

1. Train a content-based classifier using labeled web pages

2. Apply the content-based classifier to classify unlabeled web pages

3. Label the web pages that have been confidently classified

4. Train a hyperlink-based classifier using the web pages that are initially labeled and those labeled by the content-based classifier

5. Apply the hyperlink-based classifier to classify the unlabeled web pages

6. Label the web pages that have been confidently classified
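The six steps above can be sketched as a loop over two feature views. Everything concrete here is an assumption for illustration: a nearest-centroid classifier stands in for the content-based and hyperlink-based classifiers, the confidence is the gap between the distances to the two class centroids, and the names `fit_centroids`, `predict_with_confidence`, `co_train`, and the `threshold` value are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class (+1 and -1)."""
    return X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)

def predict_with_confidence(centroids, X):
    pos, neg = centroids
    d_pos = np.linalg.norm(X - pos, axis=1)
    d_neg = np.linalg.norm(X - neg, axis=1)
    labels = np.where(d_pos < d_neg, 1, -1)
    return labels, np.abs(d_pos - d_neg)  # a larger gap means more confidence

def co_train(view_a, view_b, y, labeled, rounds=5, threshold=1.0):
    """y holds +1/-1 for labeled examples; `labeled` is a boolean mask."""
    y, labeled = y.copy(), labeled.copy()
    for _ in range(rounds):
        for view in (view_a, view_b):  # alternate between the two classifiers
            if labeled.all():
                return y, labeled
            model = fit_centroids(view[labeled], y[labeled])
            pred, conf = predict_with_confidence(model, view)
            confident = (~labeled) & (conf >= threshold)
            y[confident] = pred[confident]  # adopt confident predictions as labels
            labeled |= confident
    return y, labeled

# Examples 2 and 3 are clear in the content view; 4 and 5 are clear in the hyperlink view.
content = np.array([[2.0], [-2.0], [1.8], [-1.9], [0.1], [-0.1]])
hyperlinks = np.array([[2.0], [-2.0], [0.2], [-0.1], [1.9], [-2.1]])
y_init = np.array([1, -1, 0, 0, 0, 0])
mask = np.array([True, True, False, False, False, False])
y_out, labeled_out = co_train(content, hyperlinks, y_init, mask)
```

In this toy run the content view first labels the two pages it is confident about, and the hyperlink view, retrained on the enlarged labeled set, then labels the remaining two.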

Co-training

Train a content-based classifier using labeled examples.

Label the unlabeled examples that are confidently classified.

Train a hyperlink-based classifier (e.g. Prof.: outlinks to students).

Label the unlabeled examples that are confidently classified.