
Intelligent Database Systems Lab

Presenter : Chang,Chun-Chih

Authors : Youngjoong Ko, Jungyun Seo

2009, IPM

Text classification from unlabeled documents with bootstrapping and feature projection techniques


Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments


Motivation

• In general, a text classifier is built automatically by an inductive process that learns from labeled training examples; this is known as supervised learning.

• The most notable problem with supervised methods is that they require a large number of labeled training documents to learn accurately.


Objectives

• The authors propose a new text classification method based on unsupervised or semi-supervised learning.

• The proposed method carries out text classification tasks starting from unlabeled documents only, without any labeled training documents.


Methodology - Framework


Methodology - Creating keyword lists


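Example with the title words "Student" and "traffic": a candidate word's score for a category is its similarity to that category's title word plus its margin over the other title word.

• "is" has similarity 1.0 to both title words: $1.0 + (1.0 - 1.0) = 1.0$
• "book" has similarity 0.6 to one title word and 0.05 to the other: $0.6 + (0.6 - 0.05) = 1.15$

A word tied closely to a single title word ("book") therefore outranks a word that is equally similar to every title word ("is").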
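The scoring rule in the example above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the similarity table is a hypothetical one that reproduces the slide's numbers, assigning "book"'s 0.6 to "Student" (rather than "traffic") is a guess, and taking the margin over the *strongest* competing title word when there are more than two categories is an assumed generalization. The names `SIM`, `keyword_score`, and `rank_keywords` are illustrative, not from the paper.

```python
from typing import Dict, List

# Hypothetical similarity table sim[word][title_word] reproducing the slide's
# numbers. Assigning "book"'s 0.6 to "Student" (not "traffic") is an assumption.
SIM: Dict[str, Dict[str, float]] = {
    "is": {"Student": 1.0, "traffic": 1.0},
    "book": {"Student": 0.6, "traffic": 0.05},
}


def keyword_score(word: str, title: str, sim: Dict[str, Dict[str, float]]) -> float:
    """Similarity to `title` plus the margin over the strongest competing title word.

    With two categories this is exactly the slide's s + (s - s_other); using the
    maximum over competitors for more categories is an assumed generalization.
    """
    own = sim[word][title]
    others = [s for t, s in sim[word].items() if t != title]
    best_other = max(others, default=0.0)
    return own + (own - best_other)


def rank_keywords(title: str, sim: Dict[str, Dict[str, float]]) -> List[str]:
    """Candidate words sorted from highest to lowest keyword score for one title word."""
    return sorted(sim, key=lambda w: keyword_score(w, title, sim), reverse=True)


if __name__ == "__main__":
    for word in SIM:
        print(word, round(keyword_score(word, "Student", SIM), 2))
    print(rank_keywords("Student", SIM))
```

Subtracting the competing similarity is what penalizes generic words: a word equally close to every title word gains no margin, so it stays at its raw similarity.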


Methodology - Extracting & verifying centroid-context


Methodology - Creating the context-cluster of each category



Example contexts: (1) eat Banana, (2) taste Banana, (3) eat Apple
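One reading of this example is that contexts can become related even when they share no word: "eat Banana" links "Banana" with "eat", and "eat Apple" links "eat" with "Apple", so "taste Banana" and "eat Apple" end up connected. The slides do not give the actual formulas, so the sketch below assumes one simple iterative scheme: context similarity is computed from word similarity, then word similarity from context similarity, starting from identity word similarity. The function names and the averaging rule are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def _directed(xs: Sequence, ys: Sequence, sim: Callable) -> float:
    """Average over x in xs of the best similarity between x and any y in ys."""
    return sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)


def _symmetric(xs: Sequence, ys: Sequence, sim: Callable) -> float:
    """Average of the two directed scores, so the result is symmetric."""
    return 0.5 * (_directed(xs, ys, sim) + _directed(ys, xs, sim))


def propagate(contexts: List[List[str]], iterations: int = 2):
    """Alternately recompute context-context and word-word similarity.

    Word similarity starts as the identity (1.0 for the same word, else 0.0).
    Returns (word_sim, context_sim) as dicts keyed by pairs.
    """
    vocab = sorted({w for c in contexts for w in c})
    # Indices of the contexts in which each word occurs.
    occurs = {w: [i for i, c in enumerate(contexts) if w in c] for w in vocab}
    wsim: Dict[Tuple[str, str], float] = {
        (a, b): 1.0 if a == b else 0.0 for a, b in product(vocab, repeat=2)
    }
    csim: Dict[Tuple[int, int], float] = {}
    n = len(contexts)
    for _ in range(iterations):
        # Two contexts are similar when their words are similar.
        csim = {
            (i, j): _symmetric(contexts[i], contexts[j], lambda a, b: wsim[(a, b)])
            for i, j in product(range(n), repeat=2)
        }
        # Two words are similar when the contexts they occur in are similar.
        wsim = {
            (a, b): _symmetric(occurs[a], occurs[b], lambda i, j: csim[(i, j)])
            for a, b in product(vocab, repeat=2)
        }
    return wsim, csim


if __name__ == "__main__":
    contexts = [["eat", "Banana"], ["taste", "Banana"], ["eat", "Apple"]]
    _, before = propagate(contexts, iterations=1)
    wsim, after = propagate(contexts, iterations=2)
    print(before[(1, 2)])  # 0.0: "taste Banana" and "eat Apple" share no word
    print(after[(1, 2)])  # 0.5625: now related, although they still share no word
    print(wsim[("Apple", "Banana")] > 0)  # True
```

After one round only contexts with a shared word are similar; the second round carries that similarity over to contexts (2) and (3).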


Methodology - The TCFP classifier with robustness to noisy data

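The slides name the TCFP classifier but do not give its formulas. The sketch below only illustrates the general idea of classifying by feature projections as we understand it, and is not the authors' exact TCFP: each training document is projected onto each of its features, each feature votes for categories in proportion to how often they occur among its projections, and the category with the largest total vote wins. Because many features vote, a few mislabeled (noisy) training documents only shift part of the vote, which is one intuition for the robustness claim. The class name, the term-frequency weighting, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class FeatureProjectionVoter:
    """Simplified feature-projection voting classifier (hypothetical, not exact TCFP)."""

    def __init__(self) -> None:
        # feature -> Counter of category labels among training docs containing it
        self.projections: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, docs: List[List[str]], labels: List[str]) -> "FeatureProjectionVoter":
        for words, label in zip(docs, labels):
            for feature in set(words):  # project the document onto each of its features
                self.projections[feature][label] += 1
        return self

    def predict(self, words: List[str]) -> str:
        votes: Counter = Counter()
        for feature, tf in Counter(words).items():
            counts = self.projections.get(feature)  # .get does not create entries
            if not counts:
                continue  # unseen feature: it casts no vote
            total = sum(counts.values())
            for category, n in counts.items():
                # Vote share = category ratio among projections, weighted by term frequency.
                votes[category] += tf * n / total
        if not votes:
            raise ValueError("document contains no known features")
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = [
        ["student", "book", "class"],
        ["student", "exam", "book"],
        ["traffic", "car", "road"],
        ["traffic", "road", "jam"],
        ["student", "road"],  # deliberately noisy: labeled "traffic" below
    ]
    labels = ["Student", "Student", "traffic", "traffic", "traffic"]
    clf = FeatureProjectionVoter().fit(docs, labels)
    print(clf.predict(["book", "student", "exam"]))
    print(clf.predict(["road", "car"]))
```

The noisy document gives "traffic" only a one-third share of the "student" feature's vote, so a document about students is still assigned to "Student".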


Experiments



Conclusions

• The proposed method is useful for low-cost text classification.

• When a text classification task requires high accuracy, the method can serve as an assistant tool for easily creating training data.


Comments

• Advantages
  – Faster
  – Less expensive

• Applications
  – Text classification