Intelligent Database Systems Presenter : MIN-CONG WU Authors : LAM HONG LEE, DINO ISA, WOU ONN CHOO , WEN YEEN CHUE 2012.ESA High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic


Page 1: Outlines

Intelligent Database Systems Lab

Presenter : MIN-CONG WU

Authors : LAM HONG LEE, DINO ISA, WOU ONN CHOO, WEN YEEN CHUE

2012.ESA

High Relevance Keyword Extraction facility for Bayesian text classification on different domains of varying characteristic

Page 2: Outlines

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3: Outlines

Motivation

• The strength of Bayesian classification compared to other classification approaches is its ability and simplicity in handling raw text data directly.

• As a trade-off for its simplicity, Bayesian classification has been reported as one of the poorest-performing classification approaches.

Page 4: Outlines

Objectives

• Use the High Relevance Keyword Extraction (HRKE) facility to enhance the accuracy of the Bayesian classifier without sacrificing its low cost.

Page 5: Outlines

Methodology – Block diagram

Page 6: Outlines

Methodology – Bayesian Classifier
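
The slide body did not survive extraction, so the following is only a minimal sketch of a generic multinomial naive Bayes text classifier with add-one smoothing. It is not necessarily the formulation the authors use; the function names `train` and `classify` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """labelled_docs: list of (category, tokens) pairs.

    Returns (priors, word_counts, vocab) for a generic naive Bayes model.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for category, tokens in labelled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    total_docs = sum(doc_counts.values())
    priors = {c: n / total_docs for c, n in doc_counts.items()}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the category with the highest log posterior score."""
    best, best_score = None, -math.inf
    for category, prior in priors.items():
        total = sum(word_counts[category].values())
        score = math.log(prior)
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing, an assumption of this sketch
            count = word_counts[category][token]
            score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical toy training data
model = train([
    ("Cars", ["coupe", "engine", "wheel"]),
    ("Boat", ["engine", "sail"]),
])
print(classify(["coupe"], *model))
```

The classifier works on raw token counts, which reflects the low-cost handling of raw text that the Motivation slide attributes to Bayesian classification.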

Page 7: Outlines

Methodology – TF-IDF method

• TF-IDF = TF * IDF

• TF (Term Frequency), IDF (Inverse Document Frequency)

• N = the number of documents in the dataset that contain this word

• Example:

TF*IDF   Boat  Aircrafts  Cars  Trains
coupe       5          5    90       5
engine     90         90   100     100
wheel       5         90    90      90
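
The exact TF-IDF formula on the slide was lost, so the sketch below assumes the common definition: term frequency times log(total documents / N), where N is the number of documents containing the word. The function name `tf_idf` and the toy documents are assumptions, and the scores it produces are not on the same scale as the example table above.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # N in the slide's notation: number of documents that contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy documents
docs = [["coupe", "engine"], ["engine", "wheel"]]
scores = tf_idf(docs)
print(scores[0])
```

A word that appears in every document, such as "engine" here, gets a score of zero under this definition, which is the same intuition as the "engine" row in the table: it does not help tell categories apart.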

Page 8: Outlines

Methodology – HRKE facility

• The degree of relevance of keywords in the classification task can be adjusted by setting a threshold, m/n.

Threshold: 1.0 (without the inclusion of the HRKE facility), ..., 0.1
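
The slides do not define m and n, so the sketch below rests on one possible reading, stated here as an assumption: n is the number of categories, m is the number of categories in which a keyword scores highly, and a keyword is kept when m/n does not exceed the threshold. The `high_cutoff` value and the function names are hypothetical; the scores are the example values from the TF-IDF slide.

```python
# Example scores from the TF-IDF slide
TABLE = {
    "coupe": {"Boat": 5, "Aircrafts": 5, "Cars": 90, "Trains": 5},
    "engine": {"Boat": 90, "Aircrafts": 90, "Cars": 100, "Trains": 100},
    "wheel": {"Boat": 5, "Aircrafts": 90, "Cars": 90, "Trains": 90},
}


def category_fraction(scores, high_cutoff):
    """Assumed m/n: fraction of categories where the keyword scores highly."""
    m = sum(1 for score in scores.values() if score >= high_cutoff)
    return m / len(scores)


def select_keywords(table, threshold, high_cutoff=50):
    """Keep keywords whose assumed m/n ratio is at most the threshold.

    high_cutoff is a hypothetical parameter, not taken from the slides.
    """
    return [
        keyword
        for keyword, scores in table.items()
        if category_fraction(scores, high_cutoff) <= threshold
    ]


print(select_keywords(TABLE, 1.0))   # no filtering
print(select_keywords(TABLE, 0.25))  # only keywords tied to a single category
```

Under this reading a threshold of 1.0 keeps every keyword, which is consistent with the slide's note that 1.0 corresponds to running without the HRKE facility, while lower thresholds keep only keywords concentrated in fewer categories.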

Page 9: Outlines

Experiment - The basic flat ranking multivariate

Page 10: Outlines

Experiment - Featured Articles dataset

Page 11: Outlines

Experiment - Vehicles dataset

Page 12: Outlines

Experiment - Mathematics dataset

Page 13: Outlines

Experiment - 20-Newsgroups dataset

Page 14: Outlines

Experiment - Summary

Page 15: Outlines

Conclusions

• The HRKE facility is achieved by applying a unique feature selection method that is based on the occurrence of keywords in documents from a specified category and compares it with the occurrence of those keywords in each of the competing categories.

Page 16: Outlines

Comments

• Advantages
  - Improves Bayesian classification performance while maintaining low cost.

• Applications
  - Feature selection