Intelligent Database Systems Lab
Presenter: JHOU, YU-LIANG
Authors: Shady Shehata, Fakhri Karray, Mohamed S. Kamel, Fellow
2012, IEEE
An Efficient Concept-Based Mining Model for Enhancing Text Clustering
Outline
• Motivation
• Objectives
• Methodology
• Evaluation
• Conclusions
• Comments
Motivation
• In text mining, the term frequency is computed to measure the importance of a term in a document.
• However, two terms can have the same frequency in a document while one of them contributes more to the meaning of its sentences than the other.
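To make the limitation concrete, here is a minimal sketch of plain term frequency (raw counts of lower-cased word tokens), applied to the example sentence used later in the methodology. The tokenizer regex and the function name term_frequency are illustrative assumptions, not part of the authors' model.

```python
import re
from collections import Counter

def term_frequency(text: str) -> Counter:
    """Count raw occurrences of each lower-cased word token (hyphenated words kept whole)."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)

sentence = ("Texas and Australia researchers have created industry-ready sheets "
            "of materials made from nanotubes that could lead to the development "
            "of artificial muscles.")
tf = term_frequency(sentence)
print(tf["researchers"], tf["nanotubes"])
```

Both "researchers" and "nanotubes" get a frequency of 1 here, so plain term frequency cannot tell which of the two matters more to the sentence's meaning.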
Objectives
• Use a concept-based mining model for text clustering to improve clustering quality.
Methodology CONCEPT-BASED MINING MODEL
Example: a concept c appears in two sentences of document d, the first and the second. The concept c appears five times in the verb argument structures of the first sentence s1 and three times in the verb argument structures of the second sentence s2. The conceptual term frequency (ctf) of c in d is the average over these sentences: ctf = (5 + 3)/2 = 4.
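The example implies the rule ctf = (ctf_1 + ... + ctf_sn) / sn, where ctf_n is the count of c in the verb argument structures of the n-th sentence containing c and sn is the number of such sentences. A minimal sketch of that arithmetic, assuming the per-sentence counts are already available; the function name document_ctf is hypothetical.

```python
def document_ctf(per_sentence_counts: list[int]) -> float:
    """Average a concept's per-sentence ctf values over the sentences that contain it."""
    # Assumption: a concept that occurs in no sentence gets a ctf of 0.
    if not per_sentence_counts:
        return 0.0
    return sum(per_sentence_counts) / len(per_sentence_counts)

print(document_ctf([5, 3]))
```

Averaging rather than summing keeps a long document from inflating ctf merely because it has more sentences mentioning c.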
Methodology Example of Conceptual Term Frequency
[ARG0 Texas and Australia researchers] have [TARGET created] [ARG1 industry-ready sheets of materials made from nanotubes that could lead to the development of artificial muscles].
[ARG1 materials] [TARGET made] [ARG2 from nanotubes that could lead to the development of artificial muscles].
[ARG1 nanotubes] [R-ARG1 that] [ARGM-MOD could] [TARGET lead] [ARG2 to the development of artificial muscles].
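Each annotated line uses bracketed role labels (ARG0, ARG1, ARG2, TARGET, ARGM-MOD, R-ARG1). A minimal sketch that splits one annotated line into (role, text) pairs, assuming every span has the form [LABEL text] with no nested brackets; the regex and the name parse_frame are illustrative and are not the labeler that produces these annotations.

```python
import re

# One bracketed span: the role label is the first token, the rest is the span text.
SPAN = re.compile(r"\[([A-Z0-9-]+)\s+([^\]]+)\]")

def parse_frame(annotated: str) -> list[tuple[str, str]]:
    """Return (role, text) pairs for every labeled span; unbracketed words are ignored."""
    return [(role, text.strip()) for role, text in SPAN.findall(annotated)]

frame = parse_frame("[ARG1 nanotubes] [R-ARG1 that] [ARGM-MOD could] "
                    "[TARGET lead] [ARG2 to the development of artificial muscles].")
for role, text in frame:
    print(f"{role:9} {text}")
```

Words outside any bracket, such as "have" in the first line, are dropped because they carry no role label.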
Methodology Example of Conceptual Term Frequency
1. First verb argument structure for the verb created:
   • [ARG0 Texas and Australia researchers]
   • [TARGET created]
   • [ARG1 industry-ready sheets of materials made from nanotubes that could lead to the development of artificial muscles]
2. Second verb argument structure for the verb made:
   • [ARG1 materials]
   • [TARGET made]
   • [ARG2 from nanotubes that could lead to the development of artificial muscles]
3. Third verb argument structure for the verb lead:
   • [ARG1 nanotubes]
   • [R-ARG1 that]
   • [ARGM-MOD could]
   • [TARGET lead]
   • [ARG2 to the development of artificial muscles]
Methodology Example of Conceptual Term Frequency
1. Concepts in the first verb argument structure of the verb created:
   • Texas Australia researchers
   • created
   • industry-ready sheets materials nanotubes lead development artificial muscles
2. Concepts in the second verb argument structure of the verb made:
   • materials
   • nanotubes lead development artificial muscles
3. Concepts in the third verb argument structure of the verb lead:
   • nanotubes
   • lead
   • development artificial muscles
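Counting how often each concept occurs across the three verb argument structures shows the contrast raised in the motivation: "nanotubes" and "researchers" each occur once in the raw sentence, yet "nanotubes" occurs in three structures and "researchers" in only one. A minimal sketch, with the concept lists copied from above; treating each word as a separate concept is a simplification (the lists above group some words into phrases), and the name concept_counts is hypothetical.

```python
from collections import Counter

# Concept lists copied from the three verb argument structures above.
structures = [
    ["Texas Australia researchers", "created",
     "industry-ready sheets materials nanotubes lead development artificial muscles"],
    ["materials", "nanotubes lead development artificial muscles"],
    ["nanotubes", "lead", "development artificial muscles"],
]

def concept_counts(structures: list[list[str]]) -> Counter:
    """Count occurrences of each single-word concept across all verb argument structures."""
    counts = Counter()
    for arguments in structures:
        for argument in arguments:
            counts.update(argument.split())
    return counts

ctf = concept_counts(structures)
print(ctf["nanotubes"], ctf["materials"], ctf["researchers"])
```

Because this is a single sentence, these counts are the per-sentence values that would feed the document-level average for each concept.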