
On Semi-Supervised Learning and Beyond


Page 1: On Semi-Supervised Learning and Beyond

{ On SSL, and beyond }

Lab Seminar Presentation Eunjeong Park

Nov. 23rd, 2009

- Theories, Methods, and a Possible Suggestion on Semi-Supervised Learning -

Page 2: On Semi-Supervised Learning and Beyond

1. Background

2. Semi-Supervised Learning Methods

3. Assumptions on SSL

4. Future Work

Agenda

Page 3: On Semi-Supervised Learning and Beyond

Agenda

1. Background

2. Semi-Supervised Learning Methods

3. Assumptions on SSL

4. Future Work

Page 4: On Semi-Supervised Learning and Beyond

Examples (1/2)

• Spam E-mail Classification

Background

spam

inbox

?

Page 5: On Semi-Supervised Learning and Beyond

Examples (2/2)

• Response Modeling

Background

respondents

non-respondents

unlabeled

?

Page 6: On Semi-Supervised Learning and Beyond

the Question (1/2)

• Statistical learning methods require LOTS of training data
  – But since we only have a limited amount of labeled data,
  – can we figure out a way for our learning algorithms to take advantage of all the unlabeled data?

Background

Unlabeled … Labeled

Page 7: On Semi-Supervised Learning and Beyond

the Question (2/2)

• Text/Web Mining
  – Document classification
    • f: Doc → Class
    • Spam filtering, web page classification
  – Information extraction
    • f: Sentence → Fact, f: Doc → Fact
  – Translation
    • f: EnglishDoc → FrenchDoc

f : x → y

<xi, yi> <xi> …?

Background

• Marketing
  – Response Modeling
    • f: Demo+RFM → Response
  – Fraud Detection
    • f: Demo+PaymentHistory → Fraud
  – Customer Segmentation
    • f: Demo+RFM → Customer Seg.

Page 8: On Semi-Supervised Learning and Beyond

Agenda

1. Background

2. Semi-Supervised Learning Methods

3. Assumptions on SSL

4. Future Work

Page 9: On Semi-Supervised Learning and Beyond

Methodology [1]

• Generative models
  – Unlabeled data is used to either modify or reprioritize hypotheses obtained from labeled data alone
  – Given the Bayesian formula below, we can easily see that p(x) influences p(y|x)
  – Mixture models with EM are in this category, and to some extent self-training, too

• Discriminative models

– Original discriminative training cannot be used for SSL, since p(y|x) is estimated ignoring p(x)

– To solve the problem, p(x)-dependent terms are often brought into the objective function, which amounts to assuming p(y|x) and p(x) share parameters (see the illustration below)

– Transductive SVM, Gaussian processes, information regularization, graph-based methods are in this category

Semi-Supervised Learning

p(y|x) = p(x|y) P(y) / p(x)

※ For more on GM, DM refer to Appendix 1.
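As a hedged illustration (not from the original slides) of how a p(x)-dependent term can enter a discriminative objective, the entropy-minimization approach listed on the next slide penalizes uncertain predictions on the unlabeled points; λ here is an assumed trade-off weight:

```latex
% Illustrative only: a discriminative objective with an added
% unlabeled-data (p(x)-dependent) term, in the spirit of entropy minimization.
% l = number of labeled examples, u = number of unlabeled examples,
% lambda = assumed trade-off weight.
\[
\min_{\theta}\;
-\sum_{i=1}^{l} \log p(y_i \mid x_i;\theta)
\;+\;
\lambda \sum_{j=l+1}^{l+u} H\!\left( p(\cdot \mid x_j;\theta) \right)
\]
```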

Page 10: On Semi-Supervised Learning and Beyond

Previous methods

Semi-Supervised Learning

SSL (Semi-Supervised Learning)

• EM w/ Generative Mixture Models (Nigam et al., 2000; Miller & Uyar, 1997)

• Self-Training

• Co-Training and Multiview Learning (Blum & Mitchell, 1998; Goldman & Zhou, 2000)

• TSVMs (Bennett et al., 1999; Joachims, 1999)

• Gaussian Processes

• Information Regularization

• Entropy Minimization

• Graph-based methods (Blum & Chawla, 2001)

Ref [1], [2] reorganized

※ For more on the use of above methods, refer to Appendix 2.

Page 11: On Semi-Supervised Learning and Beyond

Previous methods: EM w/Generative Models (1/3)

Semi-Supervised Learning

Basic EM Algorithm Incorporated w/ unlabeled data [3]

Page 12: On Semi-Supervised Learning and Beyond

• In a binary classification problem, if we assume each class has a Gaussian distribution, then we can use unlabeled data to help parameter estimation [1] (a minimal code sketch follows below)

Previous methods: EM w/Generative Models (2/3)

Semi-Supervised Learning
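To make this concrete, here is a minimal, hedged sketch (not from the slides) of EM for a two-class, 1-D Gaussian mixture in which labeled points keep their labels and unlabeled points contribute soft responsibilities; the names ssl_gmm_em, X_l, y_l, X_u and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def ssl_gmm_em(X_l, y_l, X_u, n_iter=50):
    """EM for a 2-class, 1-D Gaussian mixture using labeled + unlabeled data."""
    X_all = np.concatenate([X_l, X_u])
    n_total = len(X_all)
    # Initialize each class's mean/std and prior from the labeled points only.
    params, prior = [], np.zeros(2)
    for c in (0, 1):
        Xc = X_l[y_l == c]
        params.append((Xc.mean(), Xc.std() + 1e-6))
        prior[c] = len(Xc) / len(X_l)

    for _ in range(n_iter):
        # E-step: class responsibilities for the unlabeled points.
        lik = np.stack([prior[c] * norm.pdf(X_u, *params[c]) for c in (0, 1)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: labeled points count with weight 1 for their own class,
        # unlabeled points with their soft responsibilities.
        for c in (0, 1):
            w = np.concatenate([(y_l == c).astype(float), resp[:, c]])
            mu = np.average(X_all, weights=w)
            sd = np.sqrt(np.average((X_all - mu) ** 2, weights=w)) + 1e-6
            params[c] = (mu, sd)
            prior[c] = w.sum() / n_total
    return params, prior

# Tiny illustrative run: 4 labeled points, 200 unlabeled points.
rng = np.random.default_rng(0)
X_l = np.array([-2.1, -1.9, 2.0, 2.2]); y_l = np.array([0, 0, 1, 1])
X_u = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
print(ssl_gmm_em(X_l, y_l, X_u))
```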

Page 13: On Semi-Supervised Learning and Beyond

Previous methods: EM w/Generative Models (3/3)

Semi-Supervised Learning

Page 14: On Semi-Supervised Learning and Beyond

Previous methods: Co-Training (1/4)

Semi-Supervised Learning

Professor Cho My Advisor

Page 15: On Semi-Supervised Learning and Beyond

Previous methods: Co-Training (2/4)

Semi-Supervised Learning

Professor Cho My Advisor

Classifier 1: Hyperlinks only
Classifier 2: Page only

• Key Idea: Classifier1 and Classifier2 must…
  – Correctly classify labeled examples
  – Agree on the classification of unlabeled examples

Page 16: On Semi-Supervised Learning and Beyond

• Given: labeled data L, unlabeled data U

• Loop:
  – Train g1 (hyperlink classifier) using L
  – Train g2 (page classifier) using L
  – Allow g1 to label p positive, n negative examples from U
  – Allow g2 to label p positive, n negative examples from U
  – Add these self-labeled examples to L
  (a minimal Python sketch of this loop follows below)

Previous methods: Co-Training (3/4) [4]

Semi-Supervised Learning

[Diagram: Classifier1 and Classifier2 each produce their own answer (Answer1, Answer2) for the example page "Professor Cho My Advisor"]
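Below is a minimal, hedged sketch of the loop above. Naive Bayes base learners stand in for the hyperlink/page classifiers, binary labels 0/1 and count features are assumed, and the names co_training, X1, X2, p, n are illustrative rather than the original implementation.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_training(X1, X2, y, labeled_idx, unlabeled_idx, p=1, n=3, rounds=10):
    """X1, X2: the two feature views (e.g. hyperlink words vs. page words), as count matrices."""
    labeled, unlabeled = list(labeled_idx), list(unlabeled_idx)
    y = y.copy()                          # labels of self-labeled examples get filled in here
    g1, g2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        if not unlabeled:
            break
        g1.fit(X1[labeled], y[labeled])   # train g1 (view-1 classifier) using L
        g2.fit(X2[labeled], y[labeled])   # train g2 (view-2 classifier) using L
        for g, X in ((g1, X1), (g2, X2)):
            if not unlabeled:
                break
            proba = g.predict_proba(X[unlabeled])
            # let this classifier label its p most confident positives
            # and n most confident negatives from U (assumes classes are {0, 1})
            pos = np.argsort(-proba[:, 1])[:p]
            neg = np.argsort(-proba[:, 0])[:n]
            picked = list(dict.fromkeys(unlabeled[i] for i in np.concatenate([pos, neg])))
            y[picked] = g.predict(X[picked])      # add these self-labeled examples to L
            labeled.extend(picked)
            unlabeled = [i for i in unlabeled if i not in picked]
    return g1, g2, y
```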

Page 17: On Semi-Supervised Learning and Beyond

• Experimental settings:
  – Begin with 12 labeled web pages (academic course pages)
  – Provide 1,000 additional unlabeled web pages
  – Average error, learning from labeled data only: 11.1%
  – Average error, co-training: 5.0%

Previous methods: Co-Training (4/4)

Semi-Supervised Learning

Page 18: On Semi-Supervised Learning and Beyond

Previous methods: TSVMs

Semi-Supervised Learning

[Figure: scatter of labeled + and − examples illustrating the TSVM decision boundary]

Page 19: On Semi-Supervised Learning and Beyond

Previous methods: Graph-based methods

• Key idea: Define a graph where…
  – nodes are labeled and unlabeled examples in the dataset, and
  – edges (may be weighted) reflect the similarity of examples
  – Then, nodes connected by a large-weight edge tend to have the same label, and labels can propagate throughout the graph (a minimal label-propagation sketch follows below)

• Note: Graph-based methods enjoy nice properties from spectral graph theory

Semi-Supervised Learning
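A minimal, hedged sketch of the propagation idea: iterative label propagation on an RBF similarity graph. The function and variable names (label_propagation, sigma, labeled_mask) are illustrative, not from the slides.

```python
import numpy as np

def label_propagation(X, y, labeled_mask, sigma=1.0, n_iter=200):
    """X: (n, d) features; y: (n,) labels in {0, 1}; labeled_mask: (n,) booleans."""
    # Edges: pairwise similarity of examples via an RBF (Gaussian) kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    P = W / W.sum(axis=1, keepdims=True)         # row-normalized propagation matrix

    F = np.zeros((len(X), 2))
    F[labeled_mask, y[labeled_mask]] = 1.0        # one-hot scores for labeled nodes
    for _ in range(n_iter):
        F = P @ F                                 # push labels along (weighted) edges
        F[labeled_mask] = 0.0                     # clamp: labeled nodes keep their labels
        F[labeled_mask, y[labeled_mask]] = 1.0
    return F.argmax(axis=1)                       # predicted label for every node
```

Nodes connected by large-weight edges end up sharing a label, which is exactly the key idea stated above.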

Page 20: On Semi-Supervised Learning and Beyond

Agenda

1. Background

2. Semi-Supervised Learning Methods

3. Assumptions on SSL

4. Future Work

Page 21: On Semi-Supervised Learning and Beyond

The Utility of Unlabeled Data

• Many SSL papers start with an introduction like…

  "labeled data…is often very difficult and expensive to obtain, and thus…unlabeled data holds significant promise in terms of vastly expanding the applicability of learning methods [5]"

  …but is this necessarily true?
  – No! Do not take it for granted!
  – Even though you don't have to spend as much time labeling training data, you still need to spend considerable effort designing good models / features / kernels / similarity functions for SSL!

• A good match between problem structure and model assumptions is necessary to use unlabeled data effectively
  – A bad match can lead to degradation in classifier performance

Assumptions on SSL

Page 22: On Semi-Supervised Learning and Beyond

An Example (1/2)

• Unlabeled Data Can Degrade Classification Performance of Generative Classifiers [6] (1/2)

Assumptions on SSL

Naive Bayes classifier from data generated from a Naive Bayes model (left) and a TAN model (right). Each point summarizes 10 runs of each classifier on testing data; bars cover 30 to 70 percentiles.

Page 23: On Semi-Supervised Learning and Beyond

An Example (2/2)

Assumptions on SSL

[Figure: histogram of the count of the word 'Loan' for Spam=0 vs. Spam=1]

Q1: Is this e-mail spam?
Q2: Was this e-mail written on a Sunday?

Page 24: On Semi-Supervised Learning and Beyond

Agenda

1. Background

2. Semi-Supervised Learning Methods

3. Assumptions on SSL

4. Future Work

Page 25: On Semi-Supervised Learning and Beyond

Multi-Edge Graph-Based SSL

• Aside from Semi-Supervised Classification, there are more…
  – Semi-Supervised Clustering
  – Semi-Supervised Regression

• There are also closely related methods such as…
  – Active learning

• Based on the theories noted above, here’s my question:

Future Work

f : x → y <x1i> <x2i> <x3i> <x4i>

Page 26: On Semi-Supervised Learning and Beyond

• Ex1:
• Ex2:

Multi-Edge Graph-Based SSL

Future Work

Page 27: On Semi-Supervised Learning and Beyond

Any Questions?

?

Page 28: On Semi-Supervised Learning and Beyond

• Discriminative models

  – Methodology: introduce a decision boundary

  – From the 1950s, when pattern recognition (PR) was first used to interpret radar signals, until the mid-1990s, this was effectively the dominant approach representing PR

  – Rosenblatt's Perceptron (1958) and the PDP group's MLP (1986) were likewise proposed from this viewpoint

• Generative models

  – First introduced in 1996 by Geoffrey Hinton, a core member of the PDP group (Hinton, G., Using Generative Models for Handwritten Digit Recognition, tPAMI, 1996.)

  – As a result, unsupervised learning, previously regarded as little more than clustering, came back into the spotlight, soon gained an ally in subspace analysis (e.g., PCA), and developed rapidly

  – That is, the view that classes need not lie apart from one another, so the data should instead be described by central distributions that capture it well, i.e., by a mixture of bases (e.g., a Fourier series)

Appendix 1 GM vs. DM

Page 29: On Semi-Supervised Learning and Beyond

The Use of SSL Methods[1]

• Do the classes produce well clustered data? – EM w/ generative mixture models

• Is the existing supervised classifier complicated and hard to modify? – Self-training

• Do the features naturally split into two sets? – Co-training

• Already using SVM? – TSVMs

• Is it true that two points with similar features tend to be in the same class? – Graph-based methods

Appendix 2

Page 30: On Semi-Supervised Learning and Beyond

[1] Zhu, X. (2005). Semi-Supervised Learning Literature Survey. Computer Sciences, University of Wisconsin-Madison.

[2] Seeger, M. (2001). Learning with Labeled and Unlabeled Data (Technical Survey).

[3] Nigam, K., McCallum, A. K., Mitchell, T. M. (2000). Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning 39, 103-134.

[4] Mitchell, T. M. (1999). The Role of Unlabeled Data in Supervised Learning. Sixth International Colloquium on Cognitive Science.

[5] Raina, R., Battle, A., Packer, B., Ng, A. Y. (2007). Self-taught Learning: Transfer Learning from Unlabeled Data. 24th International Conference on Machine Learning.

[6] Cozman, F. G., Cohen, I., Cirelo, M. (2002). Unlabeled Data Can Degrade Classification Performance of Generative Classifiers. FLAIRS-02.

[7] Balcan, M., Blum, A., Choi, P. P., Lafferty, J., Pantano, B., Rwebangira, M. R., Zhu, X. (2005). Person Identification in Webcam Images: An Application of Semi-Supervised Learning. Proc. of the 22nd ICML Workshop on Learning with Partially Classified Training Data, Bonn, Germany.

References