http://nlp.uned.es/~alpgarcia/pub_index.php
Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps
Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
NLP & IR Group, UNED
December 12, 2008
Table of Contents
1 Objectives
2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
3 Experiment Description
4 Results
5 Conclusion
Objectives
Group HTML documents by content similarity.
Use Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection.
Define a term weighting function that takes advantage of HTML tags.
Combine, by means of fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document.
Hypothesis
An improvement in document representation will lead to an increase in map quality.
Fuzzy logic
Capturing human expert knowledge.
Close to natural language.
Knowledge base: defined by a set of IF-THEN rules.
Linguistic variables
Defined using natural language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
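As an illustration of these ideas, the following minimal Python sketch defines one linguistic variable with three fuzzy sets and fires a single IF-THEN rule. The variable name, the set labels, the breakpoints, the example rule and the use of the minimum as the AND operator are all assumptions made for illustration; the slides do not specify the membership functions or rules used in EFCC.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic variable "frequency" over a value normalised to [0, 1].
# Labels and breakpoints are illustrative assumptions, not the EFCC definitions.
FREQUENCY = {
    "low": lambda x: max(0.0, min(1.0, (0.5 - x) / 0.5)),
    "medium": lambda x: triangular(x, 0.0, 0.5, 1.0),
    "high": lambda x: max(0.0, min(1.0, (x - 0.5) / 0.5)),
}


def fuzzify(variable, x):
    """Return the membership degree of x in every fuzzy set of the variable."""
    return {label: mu(x) for label, mu in variable.items()}


def rule_and(*degrees):
    """Firing strength of a rule whose antecedents are joined by AND (minimum)."""
    return min(degrees)


degrees = fuzzify(FREQUENCY, 0.75)
# Hypothetical rule: IF frequency IS high AND title IS high THEN importance IS high.
title_high = 1.0  # assumed degree to which the term appears in the title
strength = rule_and(degrees["high"], title_high)
print(degrees, strength)
```

A value of 0.75 belongs partly to "medium" and partly to "high", which is the kind of graded membership that a crisp threshold cannot express.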
Extended Fuzzy Combination of Criteria
Linguistic Variables
Knowledge Base
Dimensionality Reduction
Input vector dimensions range from 100 to 5000.
Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed.
Two well-known methods:
Document frequency reduction.
Random projection method.
Three proposed rank-based methods:
Most Valued Terms.
Fixed reduction method.
More Frequent Terms until n level (MFTn).
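The two well-known methods can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function names, the toy document-term matrix, the choice of keeping the terms with the highest document frequency, and the Gaussian projection matrix are not taken from the slides. The three rank-based methods are not sketched because the slides do not define them.

```python
import numpy as np


def document_frequency_reduction(X, k):
    """Keep the k terms (columns) that occur in the largest number of documents."""
    df = np.count_nonzero(X, axis=0)            # document frequency of each term
    top = np.argsort(-df, kind="stable")[:k]    # k most frequent, ties by column order
    keep = np.sort(top)                         # preserve the original column order
    return X[:, keep], keep


def random_projection(X, k, seed=0):
    """Project each document vector onto k random Gaussian directions."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


# Toy document-term matrix: 4 documents, 5 terms (illustrative values only).
X = np.array([
    [1, 0, 2, 0, 1],
    [0, 0, 1, 0, 1],
    [3, 0, 1, 1, 1],
    [0, 1, 1, 0, 2],
], dtype=float)

X_df, kept = document_frequency_reduction(X, k=2)
X_rp = random_projection(X, k=3)
print(kept, X_df.shape, X_rp.shape)
```

Document frequency reduction keeps interpretable term columns, whereas random projection mixes all terms into each output dimension.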
Document Map Construction
Benchmark dataset for clustering: Banksearch [1]
10,000 documents
10 classes
The SOM size was set to 5x2, so that the number of units equals the number of classes in the input documents and clustering results can be compared.
[1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
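To make the map construction concrete, here is a minimal NumPy sketch of training a 5x2 SOM and mapping documents to their best-matching units. The learning rate, neighbourhood width, number of epochs, linear decay schedule, Gaussian neighbourhood and random toy data are assumptions chosen for illustration; the slides give only the map size.

```python
import numpy as np


def train_som(data, rows=5, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM with online updates; returns the unit vectors."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every unit, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit vectors from randomly chosen input vectors.
    weights = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood shrinks, stays > 0
            # Best-matching unit: closest unit vector to the input.
            bmu = np.argmin(np.linalg.norm(weights - data[i], axis=1))
            # Gaussian neighbourhood measured on the grid, centred on the BMU.
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def map_documents(data, weights):
    """Return, for each document vector, the index of its best-matching unit."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)


# Toy input: 60 random 8-dimensional vectors standing in for document vectors.
docs = np.random.default_rng(1).random((60, 8))
som = train_som(docs)
units = map_documents(docs, som)
print(som.shape, units[:10])
```

With ten units and ten classes, each unit can be read as one candidate cluster, which is what makes a direct comparison with the class labels possible.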
Evaluation Methods
Weighted average of the F-measure for each class.
After mapping the collection onto the trained map, each unit is labelled with the class that has the greatest number of documents mapped to that neuron.
All document vectors in a neuron whose class differs from the neuron label are counted as errors.
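The labelling and scoring steps can be sketched as follows. This is one possible reading, stated as an assumption: every document is treated as predicted to belong to the class of its unit's label, precision and recall are computed per class from those predictions, and each class is weighted by its share of the collection. The slides do not give the exact formula, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def label_units(assignments, classes):
    """Label each unit with the class that has the most documents mapped to it."""
    per_unit = defaultdict(Counter)
    for unit, cls in zip(assignments, classes):
        per_unit[unit][cls] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in per_unit.items()}


def count_errors(assignments, classes):
    """Count documents whose class differs from the label of their unit."""
    labels = label_units(assignments, classes)
    return sum(1 for u, c in zip(assignments, classes) if labels[u] != c)


def weighted_f_measure(assignments, classes):
    """Average of per-class F-measures, weighted by class size (assumed weighting)."""
    labels = label_units(assignments, classes)
    predicted = [labels[u] for u in assignments]
    total = len(classes)
    score = 0.0
    for cls, n_cls in Counter(classes).items():
        tp = sum(1 for p, t in zip(predicted, classes) if p == cls and t == cls)
        n_pred = sum(1 for p in predicted if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_cls
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_cls / total) * f
    return score


# Toy example: six documents mapped to two units.
assignments = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "a"]
print(label_units(assignments, classes))
print(count_errors(assignments, classes), weighted_f_measure(assignments, classes))
```

In the toy example each unit holds two documents of its majority class and one of the other, so two documents are counted as errors.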
Best reduction for each term weighting function
MFTn reduction provides stability
EFCC+MFTn obtains its best results with the smallest number of features
Conclusion
Unsupervised document representation method, based on fuzzy logic, focused on clustering HTML documents by means of self-organizing maps.
MFTn reduction is the most stable reduction in all cases.
EFCC representation obtains better results using a smaller vocabulary.
A smaller number of features is needed to represent the input documents and SOM unit vectors, which reduces computational cost.
Thank You!
Related Work
Work                                                     VSM   Topic Information   Document Type   Weighting Function     Modifies SOM
Self organization of a Massive Document Collection [2]   Yes   Yes                 Text            Shannon's Entropy      No
Document Clustering using Phrases [3]                    Yes   No                  Text            Binary, TF, TF-IDF     No
Document Clustering using WordNet [4]                    Yes   Yes                 Text            ESVM, HSVM, HyM        No
Conceptional SOM [5]                                     Yes   No                  Text            TF                     Yes

[2] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
[3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
[4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
[5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. In Neurocomputing, 2008.