High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic. Presenter: Min-Cong Wu. Authors: Lam Hong Lee, Dino Isa, Wou Onn Choo, Wen Yeen Chue. 2012.ESA.
Intelligent Database Systems Lab
Presenter: MIN-CONG WU
Authors: LAM HONG LEE, DINO ISA, WOU ONN CHOO, WEN YEEN CHUE
2012.ESA
High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• The advantage of Bayesian classification over other classification approaches is its ability and simplicity in handling raw text data directly.
• As a trade-off for its simplicity, Bayesian classification has been reported as one of the poorest-performing classification approaches.
Objectives
• Use the HRKE facility to enhance the accuracy of the Bayesian classifier without sacrificing its low cost.
Methodology – Block diagram
Methodology – Bayesian Classifier
Methodology – TF-IDF method
• TF-IDF = TF × IDF
• TF (Term Frequency), IDF (Inverse Document Frequency)
• N = the number of documents in the dataset that contain this word
• Example:

TF*IDF    Boat    Aircrafts    Cars    Trains
coupe     5       5            90      5
engine    90      90           100     100
wheel     5       90           90      90
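
The exact formula on this slide did not survive extraction, so the sketch below uses one common variant as an assumption: TF is the term count divided by document length, and IDF is log(D / N), where D is the total number of documents and N is the number of documents containing the word. The function name `tf_idf` and the toy documents are made up for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    ASSUMPTION: score = (count / doc length) * log(D / N), a common
    textbook variant; the paper's exact weighting may differ.
    """
    n_docs = len(docs)
    # N for each term: number of documents that contain it at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical tokenised documents
docs = [
    ["coupe", "engine", "wheel", "coupe"],
    ["engine", "wheel", "wing"],
    ["engine", "hull"],
]
scores = tf_idf(docs)
```

Under this variant a word that appears in every document (here "engine") gets a score of zero, while a word concentrated in one document ("coupe") gets the highest score, which mirrors the pattern in the example table.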
Methodology – HRKE facility
• The degree of relevance of keywords in the classification task can be adjusted by setting a threshold, m/n.
• Threshold values range from 1.0 (without the inclusion of the HRKE facility) down to 0.1.
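
The slide does not define m and n, so the following is only one possible reading, sketched under stated assumptions: n is the number of categories, m is the number of categories in which a keyword scores highly, and a keyword is kept when m/n does not exceed the threshold. The function `high_relevance_keywords` and the cut-off `high_score=50` are hypothetical; the scores are copied from the TF*IDF example table.

```python
# Scores from the TF*IDF example table (rows: keywords, columns: categories)
SCORES = {
    "coupe":  {"Boat": 5,  "Aircrafts": 5,  "Cars": 90,  "Trains": 5},
    "engine": {"Boat": 90, "Aircrafts": 90, "Cars": 100, "Trains": 100},
    "wheel":  {"Boat": 5,  "Aircrafts": 90, "Cars": 90,  "Trains": 90},
}

def high_relevance_keywords(scores, threshold, high_score=50):
    """Keep keywords that score highly in at most `threshold` of the categories.

    ASSUMPTION: m = number of categories with score >= high_score,
    n = total number of categories. This is an illustrative reading of
    the m/n threshold, not the paper's confirmed definition.
    """
    kept = []
    for keyword, per_category in scores.items():
        m = sum(1 for s in per_category.values() if s >= high_score)
        n = len(per_category)
        if m / n <= threshold:
            kept.append(keyword)
    return kept

all_keywords = high_relevance_keywords(SCORES, 1.0)   # nothing is filtered out
selective = high_relevance_keywords(SCORES, 0.5)      # only category-specific keywords
```

With this reading, a threshold of 1.0 keeps every keyword, which is consistent with the slide's note that 1.0 corresponds to running without the HRKE facility; lowering the threshold progressively drops keywords such as "engine" that are prominent in every category.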
Experiment - The basic flat ranking multivariate
Experiment - Featured Articles dataset
Experiment - Vehicles dataset
Experiment - Mathematics dataset
Experiment - 20-Newsgroups dataset
Experiment - Summary
Conclusions
• The HRKE facility is achieved by applying a unique feature selection method that is based on the occurrence of keywords in documents from a specified category and compares it with the occurrence of those keywords in each of the competing categories.
Comments
• Advantages
- Improves Bayesian classification performance while maintaining low cost.
• Applications
- Feature selection