Is Unlabeled Data Suitable for Multiclass SVM-based Web Page Classification?

My presentation at SSLNLP Workshop (NAACL 2009) on June 4th, 2009

Is Unlabeled Data Suitable for Multiclass SVM-based Web Page Classification?

Arkaitz Zubiaga, Víctor Fresno, Raquel Martínez

Universidad Nacional de Educación a Distancia

June 4, 2009

Index

1 Text Classification

2 Motivation

3 Support Vector Machines

4 Multiclass SVM

5 S3VM

6 Multiclass S3VM

7 Compared Approaches: Multiclass SVM vs Multiclass S3VM

8 Experiments

9 Results

10 Conclusions and Outlook

11 Thank you

A. Zubiaga, V. Fresno, R. Martínez (UNED) Unlabeled Data for Multiclass SVM June 4, 2009 2 / 31

Text Classification

What is it?

We have a set of documents:

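$$D = \{d_1, \dots, d_{|D|}\}$$

With a set of predefined categories:

$$C = \{c_1, \dots, c_{|C|}\}$$

Classification decides, for each pair below, whether document $d_j$ belongs to category $c_i$:

$$\langle d_j, c_i \rangle \in D \times C$$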

Motivation

Several studies address plain-text classification (e.g., news), but only a few address web page classification.

Typical web page classification task:

Semi-supervised: few labeled documents.
Multiclass: taxonomy with more than 2 categories.

(Joachims, 1999) showed that unlabeled data is suitable for binary tasks.

What about multiclass tasks?
(Chapelle et al., 2006) studied them over image datasets, but never for text or web pages.

Support Vector Machines

SVM

It looks for a hyperplane to separate the classes

Margin maximization

Optimization function:

$$\min \; \frac{1}{2} \|\omega\|^2 + C \cdot \sum_{i=1}^{n} \xi_i^d$$

Subject to:

$$y_i (\omega \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

It only handles binary and supervised problems by nature.
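As a minimal sketch of the objective above, the following evaluates it for a fixed hyperplane. It assumes the smallest slack that satisfies the constraint, that is, ξ_i = max(0, 1 − y_i(ω · x_i + b)), and the exponent d = 1 by default. The function name `svm_objective` and the toy data are illustrative only and do not come from the slides.

```python
def svm_objective(w, b, X, y, C=1.0, d=1):
    """Evaluate 1/2 * ||w||^2 + C * sum(xi_i ** d) for a fixed hyperplane (w, b).

    Each slack xi_i is the smallest value satisfying
    y_i * (w . x_i + b) >= 1 - xi_i and xi_i >= 0.
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    norm_sq = sum(w_k * w_k for w_k in w)
    return 0.5 * norm_sq + C * sum(s ** d for s in slacks), slacks


# Toy binary problem: labels are +1 / -1, as the binary formulation requires.
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
objective, slacks = svm_objective([1.0, 0.0], 0.0, X, y, C=1.0)
print(objective, slacks)
```

Only the second example falls inside the margin, so it is the only one that contributes a non-zero slack to the objective.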

Multiclass SVM

Approaches to multiclass SVM:

Direct.
Combining binary classifiers:
  One-against-one.
  One-against-all.

Usually applied to supervised tasks, but hardly ever to semi-supervised ones.

Multiclass SVM: Direct approach

The optimization function considers all the hyperplanes at the same time.

$$\min \; \frac{1}{2} \sum_{m=1}^{n} \|w_m\|^2 + C \sum_{i=1}^{l} \sum_{m \neq y_i} \xi_i^m$$

Subject to:

$$w_{y_i} \cdot x_i + b_{y_i} \ge w_m \cdot x_i + b_m + 2 - \xi_i^m, \quad \xi_i^m \ge 0$$
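The direct objective above can be evaluated for a fixed set of hyperplanes as in the sketch below. It assumes one hyperplane (w_m, b_m) per class, class labels encoded as indices 0, 1, ..., and the smallest slack that satisfies each constraint. The names `dot` and `direct_multiclass_objective` are hypothetical helpers, not part of SVM-multiclass.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def direct_multiclass_objective(W, B, X, y, C=1.0):
    """Evaluate 1/2 * sum_m ||w_m||^2 + C * sum_i sum_{m != y_i} xi_i^m.

    xi_i^m is the smallest value satisfying
    w_{y_i}.x_i + b_{y_i} >= w_m.x_i + b_m + 2 - xi_i^m and xi_i^m >= 0.
    """
    norm_term = 0.5 * sum(dot(w, w) for w in W)
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        true_score = dot(W[y_i], x_i) + B[y_i]
        for m in range(len(W)):
            if m == y_i:
                continue  # the sum only runs over the wrong classes
            slack_total += max(0.0, dot(W[m], x_i) + B[m] + 2.0 - true_score)
    return norm_term + C * slack_total


# Three classes, one hyperplane each (toy values).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]
X = [[3.0, 0.0], [0.0, 1.0]]
y = [0, 1]
value = direct_multiclass_objective(W, B, X, y, C=1.0)
print(value)
```

Every document is compared against all wrong classes at once, which is what distinguishes this formulation from combining independent binary classifiers.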

Multiclass SVM: One-against-one

It creates $\frac{k \cdot (k-1)}{2}$ binary classifiers

$\mathrm{sign}(\omega_{ij}^T \cdot x + b_{ij})$ → add a vote for the winning class between $i$ and $j$

The class with the most votes is the output.
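The voting scheme can be sketched as follows. This assumes a positive decision value votes for class i and a non-positive one for class j, and that ties in the final count go to the lowest class index; the slides do not specify either convention. The function name and the toy hyperplanes are illustrative.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, pairwise, k):
    """Predict a class by majority vote over the k*(k-1)/2 pairwise classifiers.

    pairwise maps each pair (i, j) with i < j to a hyperplane (w_ij, b_ij).
    """
    votes = Counter()
    for i, j in combinations(range(k), 2):
        w, b = pairwise[(i, j)]
        score = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
        votes[i if score > 0 else j] += 1
    # max() returns the first maximal element, so ties go to the lowest index.
    return max(range(k), key=lambda c: votes[c])


k = 3
pairwise = {
    (0, 1): ([1.0, -1.0], 0.0),
    (0, 2): ([1.0, 1.0], 0.0),
    (1, 2): ([0.0, 1.0], 0.0),
}
print(one_against_one_predict([2.0, 1.0], pairwise, k))
print(one_against_one_predict([-1.0, -2.0], pairwise, k))
```

With k = 3 the dictionary holds exactly three pairwise classifiers, matching the count given on the slide.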

Multiclass SVM: One-against-all

It creates k binary classifiers

$$C_i = \arg\max_{i=1,\dots,k} \, (\omega_i \cdot x + b_i)$$
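A minimal sketch of this decision rule, assuming each of the k classifiers is given as a pair (ω_i, b_i) and classes are indexed from 0. The function name and the toy hyperplanes are illustrative only.

```python
def one_against_all_predict(x, classifiers):
    """Return the index of the class whose hyperplane gives the largest score."""
    scores = [
        sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
        for w, b in classifiers
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# One (w_i, b_i) pair per class: k = 3 binary classifiers.
classifiers = [
    ([1.0, 0.0], 0.0),
    ([0.0, 1.0], 0.0),
    ([-1.0, -1.0], 0.0),
]
print(one_against_all_predict([0.2, 0.9], classifiers))
print(one_against_all_predict([-1.0, -1.0], classifiers))
```

Unlike one-against-one, the raw decision values are compared directly, so the k classifiers need to produce scores on a comparable scale.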

Semi-supervised SVM (S3VM)

Unlabeled documents are considered during the learning phase.

The resulting optimization function is:

$$\min \; \frac{1}{2} \cdot \|\omega\|^2 + C \cdot \sum_{i=1}^{l} \xi_i^d + C^* \cdot \sum_{j=1}^{u} \xi_j^{*d}$$

Convex optimization algorithms required.

Commonly used over binary taxonomies, but hardly ever with more classes.
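The sketch below evaluates the S3VM objective for a fixed hyperplane. The slide does not show the constraints on the unlabeled slacks, so this assumes a common choice: an unlabeled document is penalized by max(0, 1 − |ω · x + b|), which is the hinge slack under whichever label suits it best. The function name, the default values of C and C*, and the toy data are assumptions for illustration.

```python
def s3vm_objective(w, b, labeled, unlabeled, C=1.0, C_star=0.5, d=1):
    """Evaluate 1/2*||w||^2 + C*sum(xi_i^d) + C_star*sum(xi_j*^d) for fixed (w, b).

    labeled   -- list of (x, y) pairs with y in {+1, -1}
    unlabeled -- list of x vectors without labels
    """
    def f(x):
        return sum(w_k * x_k for w_k, x_k in zip(w, x)) + b

    # Labeled documents: ordinary hinge slack.
    labeled_slack = sum(max(0.0, 1.0 - y * f(x)) ** d for x, y in labeled)
    # Unlabeled documents (assumed form): penalized for lying inside the margin,
    # on either side of the hyperplane.
    unlabeled_slack = sum(max(0.0, 1.0 - abs(f(x))) ** d for x in unlabeled)
    norm_sq = sum(w_k * w_k for w_k in w)
    return 0.5 * norm_sq + C * labeled_slack + C_star * unlabeled_slack


labeled = [([2.0, 0.0], 1), ([-2.0, 0.0], -1)]
unlabeled = [[0.5, 0.0], [3.0, 0.0]]
value = s3vm_objective([1.0, 0.0], 0.0, labeled, unlabeled)
print(value)
```

Under this assumption the unlabeled term pushes the hyperplane away from regions dense with unlabeled documents, and C* controls how much weight that pressure receives relative to the labeled data.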

Multiclass S3VM

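(Yajima and Kuo, 2006) present the following optimization function:

$$\min \left( \frac{1}{2} \sum_{i=1}^{h} {\beta^{i}}^{T} K^{-1} \beta^{i} + C \sum_{j=1}^{l} \sum_{i \neq y_j} \max\left(0,\; 1 - (\beta_j^{y_j} - \beta_j^{i})\right)^2 \right)$$

where β represents the product of a vector and a kernel matrix defined by the authors.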

(Chapelle et al., 2006): direct approach by means of the Continuation Method.

Two-step approaches:

(Qi et al., 2004) use Fuzzy C-Means to predict new unlabeled documents.
(Xu and Schuurmans, 2005) rely on a clustering-based approach to label the unlabeled data.

Multiclass SVM vs Multiclass S3VM

2-steps-SVM/1-step-SVM: Multiclass SVM.
Does an intermediate step that adds newly labeled data improve the classifier's performance?

One-against-all-S3VM/One-against-all-SVM.

One-against-one-S3VM/One-against-one-SVM.
Does unlabeled data help to improve the results of classifiers that combine binary classifiers?
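The difference between the 1-step and 2-step variants can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the supervised multiclass SVM, and the intermediate step is assumed to label the unlabeled documents with the first model before retraining on everything. All function names are hypothetical.

```python
def train_centroids(X, y):
    """Stand-in for a supervised multiclass learner: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], x))
    return min(sorted(centroids), key=sq_dist)


def one_step(X_labeled, y_labeled):
    """1-step: train on the labeled documents only."""
    return train_centroids(X_labeled, y_labeled)


def two_steps(X_labeled, y_labeled, X_unlabeled):
    """2-step: label the unlabeled documents with a first model, then retrain."""
    first_model = train_centroids(X_labeled, y_labeled)
    y_predicted = [predict(first_model, x) for x in X_unlabeled]
    return train_centroids(X_labeled + X_unlabeled, y_labeled + y_predicted)


X_labeled = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
y_labeled = [0, 1, 2]
X_unlabeled = [[1.0, 0.0], [5.0, 1.0], [0.0, 5.0]]

model_1 = one_step(X_labeled, y_labeled)
model_2 = two_steps(X_labeled, y_labeled, X_unlabeled)
print(predict(model_1, [2.4, 0.0]), predict(model_2, [2.4, 0.0]))
```

The newly labeled documents shift the class centroids, so the two models can disagree on a borderline document. Whether that shift helps or hurts depends on how accurate the intermediate labels are.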

Experimental settings

Datasets:

BankSearch: 10,000 web documents / 10 categories (4,000 for the training set).
WebKB: 4,518 web documents / 6 categories (2,000 for the training set).
Yahoo! Science: 788 web documents / 6 categories (200 for the training set).

Numerous labeled/unlabeled sets.

9 executions for each.

Representation: TF-IDF.

Software:

SVM-light (http://svmlight.joachims.org)
SVM-multiclass

Evaluation by means of accuracy (percentage of correct predictions).
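The representation and the evaluation measure can be sketched as follows. TF-IDF has several variants and the slides do not say which one was used; this sketch assumes term frequency normalized by document length and idf = log(N / df). It also assumes documents are already tokenized and non-empty. The function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def accuracy(predicted, gold):
    """Percentage of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


docs = [["svm", "web", "page"], ["svm", "kernel"], ["web", "page", "page"]]
vectors = tf_idf(docs)
print(vectors[1])
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```

A term that appears in fewer documents, such as "kernel" here, receives a higher idf factor than terms shared across documents.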

Results: BankSearch

Results: WebKB

Results: Yahoo! Science

Results

Supervised multiclass approaches (2-steps-SVM & 1-step-SVM) outperform the rest.

Among binary combinations, one-against-all outperforms one-against-one.

Unlabeled data slightly helps for one-against-all.

1-step-SVM and 2-steps-SVM show similar results, except for WebKB, where the former wins.

This could be due to the homogeneous nature of the WebKB dataset.

Conclusions

Comparison of multiclass SVM and S3VM approaches for web page classification.

Direct and combining approaches.

Direct approaches outperform the rest.

Unlabeled data did not provide considerable improvements, and even worsened the results in some cases.

Future Work

To add more multiclass S3VM approaches to the study.

To test with different SVM settings (kernel, parameters, ...).

Thank you
