Intelligent Database Systems Lab
Presenter: Chang, Chun-Chih
Authors: Youngjoong Ko, Jungyun Seo
2009, IPM
Text classification from unlabeled documents with bootstrapping
and feature projection techniques
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Supervised learning is a general inductive process that automatically builds a text classifier by learning from labeled documents.
• Its most notable problem is that it requires a large number of labeled training documents for accurate learning.
Objectives
• To propose a new text classification method based on unsupervised or semi-supervised learning.
• The proposed method launches text classification tasks with only unlabeled documents.
Methodology - Framework
Methodology - Creating keyword lists
[Figure: keyword scoring example; labels: Title Word, Student, traffic, is, book]
• 1.0 = 1.0 + (1.0 - 1.0)
• 1.15 = 0.6 + (0.6 - 0.05)
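The two worked lines on the keyword-list slide share one arithmetic shape, score = a + (a - b). A minimal Python sketch of that arithmetic follows. The function name `margin_score` and the parameter names `a` and `b` are assumptions for illustration only, since the slide does not preserve what the two values denote.

```python
def margin_score(a: float, b: float) -> float:
    """Return a + (a - b), the arithmetic shape of both worked lines.

    The meaning of `a` and `b` is not recoverable from the slide;
    the names are placeholders (assumption).
    """
    return a + (a - b)


# Reproduce the two worked lines from the slide.
first = margin_score(1.0, 1.0)    # 1.0 + (1.0 - 1.0)
second = margin_score(0.6, 0.05)  # 0.6 + (0.6 - 0.05)

print(round(first, 2))
print(round(second, 2))
```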
Methodology - Extracting & verifying centroid-context
Methodology - Creating the context-cluster of each category
• Steps 1 to 3 [figures not preserved]
• EX: (1) eat Banana, (2) taste Banana, (3) eat Apple
Methodology - The TCFP classifier with robustness from noisy data
Experiments
Conclusions
• The proposed method is useful for low-cost text classification.
• For text classification tasks that require high accuracy, it can serve as an assistant tool for easily creating training data.
Comments
• Advantages
– Faster
– Less expensive
• Applications
– Text classification