Clustering tagged documents with labeled and unlabeled documents. Presenter: Jian-Ren Chen. Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen. 2013, IPM.
Intelligent Database Systems Lab
Presenter: Jian-Ren Chen
Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen
2013, IPM
Clustering tagged documents with labeled and unlabeled documents
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
Tags provide semantic information about resources, and they can help machines
perform classification or clustering tasks accurately.
Probabilistic latent semantic analysis (PLSA)
- aspect model
- statistical clustering model
Objectives
This study employs Constrained-PLSA to cluster tagged documents
with a small number of seeds (labeled documents).
Constrained-PLSA is based on the statistical clustering model
rather than the aspect model.
Methodology - PLSA
Terms (keywords) of the document collection
documents
E-step
M-step
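The E-step and M-step equations on this slide did not survive extraction. As a reminder of what the two steps do, here is a minimal sketch of the textbook PLSA aspect model, written in plain Python. It is not taken from the paper: the function name `plsa`, its parameters, and the small smoothing constant are assumptions made for illustration.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Textbook PLSA (aspect model) fitted by EM.

    counts[d][w] is n(d, w), the number of times term w occurs in document d.
    Returns (p_w_z, p_z_d): P(w|z) per topic and P(z|d) per document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive initialisation of both distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Accumulators for the M-step (tiny constant avoids all-zero rows).
        acc_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        acc_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                for z in range(n_topics):
                    acc_w_z[z][w] += n_dw * post[z]
                    acc_z_d[d][z] += n_dw * post[z]
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d

# Toy document-term counts (hypothetical data, 2 documents x 3 terms).
p_w_z, p_z_d = plsa([[2, 1, 0], [0, 1, 3]], n_topics=2)
```

In the aspect model each word occurrence gets its own topic posterior, so a document is a mixture of topics rather than a member of a single cluster.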
Methodology - Constrained-PLSA
E-step
M-step
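The Constrained-PLSA update equations are not preserved in this transcript, so the following is only a sketch of one plausible reading, not the authors' formulation. It assumes a statistical clustering model (a mixture of multinomials in which each document belongs to one cluster) fitted by EM, with the posterior of every seed document clamped to its known label. The function name `constrained_cluster`, the `seeds` mapping, the smoothing constant, and the toy counts are all hypothetical.

```python
import math

def constrained_cluster(counts, n_clusters, seeds, n_iter=20, smooth=0.01):
    """Seeded EM for a mixture-of-multinomials clustering model.

    counts[d][w] is n(d, w); seeds maps a document index to its known cluster.
    Returns p_z_d, one row of cluster posteriors P(z|d) per document.
    At least one seed is needed to break the symmetry of the uniform start.
    """
    n_docs, n_words = len(counts), len(counts[0])

    def clamp(d, row):
        # Assumed constraint: a seed document keeps its given label.
        if d in seeds:
            return [1.0 if z == seeds[d] else 0.0 for z in range(n_clusters)]
        return row

    # Unlabeled documents start with a uniform posterior.
    p_z_d = [clamp(d, [1.0 / n_clusters] * n_clusters) for d in range(n_docs)]

    for _ in range(n_iter):
        # M-step: cluster priors P(z) and term distributions P(w|z).
        p_z = [smooth + sum(p_z_d[d][z] for d in range(n_docs))
               for z in range(n_clusters)]
        total = sum(p_z)
        p_z = [x / total for x in p_z]
        p_w_z = []
        for z in range(n_clusters):
            row = [smooth + sum(p_z_d[d][z] * counts[d][w] for d in range(n_docs))
                   for w in range(n_words)]
            s = sum(row)
            p_w_z.append([x / s for x in row])

        # E-step: P(z|d) is proportional to P(z) * prod_w P(w|z)^n(d,w),
        # computed in log space to avoid underflow on long documents.
        for d in range(n_docs):
            log_post = [math.log(p_z[z])
                        + sum(counts[d][w] * math.log(p_w_z[z][w])
                              for w in range(n_words))
                        for z in range(n_clusters)]
            top = max(log_post)
            row = [math.exp(lp - top) for lp in log_post]
            s = sum(row)
            p_z_d[d] = clamp(d, [x / s for x in row])
    return p_z_d

# Toy data (hypothetical): 6 documents x 4 terms, two of them labeled.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4],
          [0, 0, 4, 5], [3, 5, 0, 1], [0, 1, 4, 4]]
seeds = {0: 0, 2: 1}
posteriors = constrained_cluster(counts, n_clusters=2, seeds=seeds)
labels = [row.index(max(row)) for row in posteriors]
```

Unlike the aspect model, the posterior here is computed once per document over all of its terms, which is what makes the result a clustering of documents.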
Experiments - Data set A (CiteULike)
Experiments (Data set A)
Experiments - Data set B (CiteULike)
Experiments (Data set B)
Conclusions
• The performance of the “tags as words” representation scheme is
more stable than that of the “words + tags” representation scheme.
• Unsupervised learning methods fail to function properly on
data sets with noisy information, but Constrained-PLSA
functions properly and remains stable even when only a small amount
of labeled data is available.
Comments
• Advantages
- Constrained-PLSA outperforms the other methods
• Disadvantage
- too much artificial processing in the experiments
• Applications
- text mining
- tagged document clustering