Incremental Context Mining for Adaptive Document Classification. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Rey-Long Liu, Yun-Ling Lu.

Page 1: Incremental Context Mining for Adaptive Document Classification

Incremental Context Mining for Adaptive Document Classification

Advisor: Dr. Hsu

Graduate: Chien-Shing Chen

Authors: Rey-Long Liu, Yun-Ling Lu

Page 2: Incremental Context Mining for Adaptive Document Classification

Outline

Motivation
Objective
Introduction
Overview of the approach
Incremental context mining for ACclassifier
Experiments
Conclusions
Personal Opinion
Review

Page 3: Incremental Context Mining for Adaptive Document Classification

Motivation

Adaptive document classification (ADC) adapts a document classification (DC) system to the evolving contextual requirement of each document category, so that input documents can be classified by their contexts of discussion.

Page 4: Incremental Context Mining for Adaptive Document Classification

Objective

1. CR terms should be mined by analyzing multiple documents from multiple categories.

2. Inappropriate features may introduce inefficiency and errors.

3. ADC may serve as the basis for efficient, high-precision DC.

Page 5: Incremental Context Mining for Adaptive Document Classification

1. Introduction

ACclassifier (Adaptive Context-based Classifier) has two components:

1. An incremental context miner

2. A document classifier

Both components work on a given text hierarchy in which a node corresponds to a document category.

Page 6: Incremental Context Mining for Adaptive Document Classification

2. Overview of the approach

CR of College of Management

CR of Information Management    CR of Financial Management

CR of MIS    CR of DSS    CR of Management
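The hierarchy on this slide, with one CR attached to each category node, can be sketched as a small tree structure. This is only an illustration: the class name `CategoryNode`, its fields, and the choice to store a CR as a word-to-strength dictionary are assumptions, not the authors' data structure. Placing Management under Financial Management is also an assumption, since the slide layout does not make the parent explicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class CategoryNode:
    """One node of the text hierarchy; each node is a document category."""
    name: str
    parent: Optional["CategoryNode"] = field(default=None, repr=False)
    children: List["CategoryNode"] = field(default_factory=list)
    # Contextual requirement (CR): context word -> strength (assumed layout).
    cr: Dict[str, float] = field(default_factory=dict)

    def add_child(self, name: str) -> "CategoryNode":
        child = CategoryNode(name, parent=self)
        self.children.append(child)
        return child

    def siblings(self) -> List["CategoryNode"]:
        """Categories under the same parent, including this one."""
        return self.parent.children if self.parent else [self]


root = CategoryNode("College of Management")
info_mgmt = root.add_child("Information Management")
fin_mgmt = root.add_child("Financial Management")
mis = info_mgmt.add_child("MIS")
dss = info_mgmt.add_child("DSS")
mgmt = fin_mgmt.add_child("Management")  # assumed parent

print([node.name for node in mis.siblings()])  # ['MIS', 'DSS']
```

Keeping a link to the parent makes it easy to find sibling categories, which matters because a word's strength in one category is judged against its siblings.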

Page 7: Incremental Context Mining for Adaptive Document Classification

3-1. An incremental context miner

College of Management

Information Management    Financial Management

MIS    DSS    Management

Page 8: Incremental Context Mining for Adaptive Document Classification

3-2. An incremental context miner

Information Management

MIS    DSS

Computer 5/20, Dos 10/20, EC 2/20

Manage 5/30, BtoB 3/20, Computer 10/40

Notebook 3/40, Computer 3/15

Page 9: Incremental Context Mining for Adaptive Document Classification

3-3. CR

CR: Contextual Requirement of the category

MIS

Computer 15/90, Dos 10/90

Manage 5/90, EC 3/90

BtoB 3/90, Notebook 3/90

DSS

Computer 3/15, EC 10/15
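The MIS totals on this slide look like per-document counts added up, for example Computer 5/20 plus Computer 10/40 giving 15 out of 20 + 30 + 40 = 90 words. A minimal sketch of that incremental bookkeeping follows. The class `CategoryCounts`, the method names, and the grouping of terms into three MIS documents of 20, 30, and 40 words are assumptions made for illustration. The per-document EC count (2) is used here, although the slide lists EC 3/90.

```python
from collections import Counter


class CategoryCounts:
    """Running term counts for one category, updated one document at a time."""

    def __init__(self):
        self.term_counts = Counter()
        self.total_words = 0

    def add_document(self, term_counts, doc_length):
        # Only the new document is processed; earlier documents are not revisited.
        self.term_counts.update(term_counts)
        self.total_words += doc_length

    def support(self, word):
        """Fraction of the category's words that are `word` (assumed definition)."""
        if self.total_words == 0:
            return 0.0
        return self.term_counts[word] / self.total_words


mis_counts = CategoryCounts()
mis_counts.add_document({"computer": 5, "dos": 10, "ec": 2}, doc_length=20)
mis_counts.add_document({"manage": 5, "btob": 3}, doc_length=30)  # assumed grouping
mis_counts.add_document({"computer": 10, "notebook": 3}, doc_length=40)

print(mis_counts.term_counts["computer"], mis_counts.total_words)  # 15 90
```

Because only running totals are stored, adding a document costs time proportional to that document alone.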

Page 10: Incremental Context Mining for Adaptive Document Classification

3-4. TFIDF

TFIDF (Term Frequency * Inverse Document Frequency)

Strength: the strength of a word w serving as a context word for the documents under category c

Page 11: Incremental Context Mining for Adaptive Document Classification

3-5. TFIDF

Strength(w_computer, c_MIS) = 0.909

Strength(w_dos, c_MIS) = 2
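The strength formula itself does not appear in the text of these slides. As a hedged reconstruction, the sketch below takes a word's support in a category, divides it by the word's total support across the sibling categories, and scales by the number of siblings. This form is an assumption, chosen because it reproduces the values 0.909 and 2 that the later slides use for MIS; the function name `strength` and the `supports` layout are hypothetical.

```python
def strength(word, category, supports):
    """Assumed reconstruction: share of `word`'s support held by `category`
    among its sibling categories, scaled by the number of siblings."""
    total = sum(s.get(word, 0.0) for s in supports.values())
    if total == 0:
        return 0.0
    return len(supports) * supports[category].get(word, 0.0) / total


# Supports taken from the CR slide: count / total words in the category.
supports = {
    "MIS": {"computer": 15 / 90, "dos": 10 / 90},
    "DSS": {"computer": 3 / 15},
}

print(round(strength("computer", "MIS", supports), 3))  # 0.909
print(round(strength("dos", "MIS", supports), 3))       # 2.0
```

Under this reading, a word that occurs only in one of two siblings gets the maximum strength of 2, while a word spread evenly across both gets 1.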

Page 12: Incremental Context Mining for Adaptive Document Classification

3-6. The incremental context miner

Information Management

MIS    DSS

S(computer)=0.909

S(dos)=2

S(EC)=0.476

S(computer)=0.022

S(computer)>0.909    Electrical Engineering

Page 13: Incremental Context Mining for Adaptive Document Classification

3-7. An incremental context miner

Page 14: Incremental Context Mining for Adaptive Document Classification

4-1. DOA

Given a document d to be classified, the basic idea is to compute its degree of acceptance (DOA) for each category c. The DOA is computed from the strengths of d's distinct words on c.

Page 15: Incremental Context Mining for Adaptive Document Classification

4-2. Two phases of classifier

(1) The estimation of DOA for each category.

(2) The identification of the winner category.
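The two phases can be sketched as a score-then-select routine. This is a simplified reading: the function `classify`, its parameters, and the rule that the winner is simply the category with the highest DOA are assumptions, and the slides do not spell out how the winner is chosen within the hierarchy.

```python
from typing import Callable, Dict, Iterable, Tuple


def classify(document: str, categories: Iterable[str],
             doa: Callable[[str, str], float]) -> Tuple[str, Dict[str, float]]:
    """Two-phase classification: score every category, then pick the winner."""
    # Phase 1: estimate the DOA of the document for each category.
    scores = {category: doa(document, category) for category in categories}
    # Phase 2: identify the winner, assumed here to be the highest DOA.
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical stand-in for a real DOA estimator: 0.9545 is the MIS value
# worked out on the slides; the DSS value is a made-up placeholder.
example_doa = {"MIS": 0.9545, "DSS": 0.1}
winner, scores = classify("D_new", ["MIS", "DSS"],
                          lambda doc, cat: example_doa[cat])
print(winner)  # MIS
```

Passing the DOA estimator in as a function keeps the selection step independent of how the strengths were mined.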

Page 16: Incremental Context Mining for Adaptive Document Classification

4-3. Estimation of DOA for each C

DOA of College of Management

DOA of Information Management    DOA of Financial Management

DOA of MIS    DOA of DSS    DOA of Management

Page 17: Incremental Context Mining for Adaptive Document Classification

4-4. DOA

Frequency: 5

D1: 5000

minSupport: 0.001

If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d.

Page 18: Incremental Context Mining for Adaptive Document Classification

4-5. Constraint I

New Di

Computer 20/40, DOS 10/40, Java 2/40

Mouse 3/40, Delphi 1/40

Page 19: Incremental Context Mining for Adaptive Document Classification

4-6. Constraint II

Information Management courses

Operating Systems: S(DOS)=2

Algorithms: S(DOS)=0.9982

Information Networks: S(DOS)=0.6

E-Commerce: S(DOS)=0.003

Data Structures: S(DOS)=1.112

Page 20: Incremental Context Mining for Adaptive Document Classification

4-7. Given a document to be classified

MIS DSS New Di

Computer 20/40, DOS 10/40

S(computer)=0.909

S(dos)=2

S(EC)=0.476

S(computer)=0.022

If w is a strong context word in c and occurs many times in d, c is more likely to “accept” d.

Page 21: Incremental Context Mining for Adaptive Document Classification

4-8. DOA

DOA_MIS of D_new

Computer: 0.909 * 20/40 = 0.4545

DOS: 2 * 10/40 = 0.5

DOA_MIS = 0.4545 + 0.5 = 0.9545
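The arithmetic on this slide amounts to summing, over the document's words, the word's strength in the category times its relative frequency in the document. A minimal sketch that reproduces the 0.9545 figure follows. The function name `degree_of_acceptance` and the choice to give words outside the category's CR a contribution of zero are assumptions.

```python
def degree_of_acceptance(doc_counts, doc_length, strengths):
    """Sum over the document's distinct words of strength * relative frequency."""
    return sum(
        strengths.get(word, 0.0) * count / doc_length
        for word, count in doc_counts.items()
    )


mis_strengths = {"computer": 0.909, "dos": 2.0}  # values from the slides
new_doc = {"computer": 20, "dos": 10}            # D_new, 40 words in total

doa_mis = degree_of_acceptance(new_doc, 40, mis_strengths)
print(round(doa_mis, 4))  # 0.9545
```

A strong context word that occurs often in the document dominates the sum, which matches the slide's remark that such a category is more likely to accept the document.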

Page 22: Incremental Context Mining for Adaptive Document Classification

4-9. Complete the DOA of all categories

DOA of College of Management

DOA of Information Management    DOA of Financial Management

DOA of MIS    DOA of DSS    DOA of Management

DOA of Management Practice

Page 23: Incremental Context Mining for Adaptive Document Classification

4-9. The document classifier

Page 24: Incremental Context Mining for Adaptive Document Classification

5-1. correct classification

Built from the 1100 documents used for initial training.

Page 25: Incremental Context Mining for Adaptive Document Classification

5-2. correct classification

Baselines: allowed to use 5000 features in their feature sets.

Page 26: Incremental Context Mining for Adaptive Document Classification

5-3. correct classification

Using all training documents to build their feature set and classifiers.

Page 27: Incremental Context Mining for Adaptive Document Classification

5-4. Consider the test document entitled

“Setting up Email in DOS with today’s ISP using a dialup PPP TCP/IP connection”.

Baseline systems: “Software”, “Windows”, and “Operating Systems”

ACclassifier: “TCP/IP”, “connection”, “computer networking”, “userID”

Page 28: Incremental Context Mining for Adaptive Document Classification

5-5. cumulative training & testing time(sec.)

The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

Page 29: Incremental Context Mining for Adaptive Document Classification

5-6. cumulative training & testing time(sec.)

The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

Page 30: Incremental Context Mining for Adaptive Document Classification

6. Conclusions

1. Efficient mining of the contextual requirements for high-precision DC.

2. Incremental mining without reprocessing previous documents.

3. Evolutionary maintenance of the feature set.

4. Efficient and fault-tolerant hierarchical DC.

Page 31: Incremental Context Mining for Adaptive Document Classification

7. Personal Opinion

It’s acceptable on purity in hierarchy.

Page 32: Incremental Context Mining for Adaptive Document Classification

8. Review