Improving Health Question Classification by Word Location Weights Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan


Improving Health Question Classification

by Word Location Weights

Rey-Long Liu

Dept. of Medical Informatics

Tzu Chi University

Taiwan


Outline

• Background

• Problem definition

• The proposed approach: WLW

• Empirical evaluation

• Conclusion


Background


Categories of Health Questions


Classification of Health Questions

• Why health questions?
  – Health questions provide both reliable and readable health information
• Why classification of health questions?
  – Given a health question q, retrieve related questions (and their answers)


Problem Definition


Goal & Motivation

• Goal
  – Target: Chinese Health Questions (CHQs)
  – Contribution: Developing a technique, WLW (Word Location Weight), that estimates the weights of words in a CHQ based on their locations
• Motivation
  – Location weights can be used by classifiers (e.g., SVM) to improve classification
    • Classifying in-space CHQs (cause, diagnosis, process)
    • Filtering out-space CHQs (questions outside these categories, labeled "whatever")


Basic Idea

• Words that are more related to the category of a CHQ tend to appear at the beginning and end of the CHQ
• Examples:
  – 如何 (how to) 克服 (deal with) 緊張 (nervous) 的情緒 (mood)? (category: process)
  – 嬰兒 (infant) 體溫 (body temperature) 太低 (too low) 怎麼辦 (what to do)? (category: process)


Related Work

• Recognition of question types (e.g., when, where)
  – Weakness: Types ≠ intended categories of CHQs
• Classification by parsing
  – Weakness I: Parsing Chinese is still challenging
  – Weakness II: CHQs are NOT always well-formed
• Classification by pattern matching
  – Weakness: Difficult to construct the string patterns


The Proposed Approach: WLW


Main Challenges

(1) Defining the two weights of a location p in a CHQ q


Main Challenges (cont.)

(2) Encoding the location weights of a word w into two features for the underlying classifier


Interesting Behaviors of WLW

• A word w in a question q has two features
  – Fvalue_front and Fvalue_rear
  – Applicable to different categories and languages (e.g., English)
• When w is far from the front and the rear
  – Both features reduce to the term frequency (TF) of w
  – WLW reduces to the traditional feature-encoding approach (using TF as the features)
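
The slides do not reproduce the actual WLW weighting formulas, so the following is only a minimal sketch of the behavior described above, not the author's method. It assumes a hypothetical linear location weight (`location_weight`) that equals 2.0 at the boundary of the question and decays to 1.0 once a word is `window` or more positions away. The function names, the `window` parameter, and the weight shape are all illustrative assumptions. With this choice, a word far from both ends gets both features equal to its term frequency, as the slide states.

```python
from collections import defaultdict


def location_weight(distance, window=2):
    """Hypothetical weight (an assumption, not the WLW formula):
    2.0 at distance 0, decaying linearly to 1.0 at distance >= window."""
    return 1.0 + max(0.0, 1.0 - distance / window)


def wlw_features(tokens, window=2):
    """Map each word of a segmented question to (front_feature, rear_feature).

    Each occurrence of a word contributes its location weight, so a word
    far from both ends contributes 1.0 per occurrence, i.e. plain TF.
    """
    front = defaultdict(float)
    rear = defaultdict(float)
    last = len(tokens) - 1
    for position, word in enumerate(tokens):
        front[word] += location_weight(position, window)        # distance from the front
        rear[word] += location_weight(last - position, window)  # distance from the rear
    return {word: (front[word], rear[word]) for word in front}


# Segmentation taken from the first example question in the slides.
question = ["如何", "克服", "緊張", "的情緒"]
for word, (f_front, f_rear) in wlw_features(question, window=1).items():
    print(word, f_front, f_rear)
```

In this sketch the two feature values per word would be passed to the underlying classifier in place of a single TF value.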


Empirical Evaluation


Experimental Design

• CHQs were downloaded from a health information provider
  – 864 in-space CHQs
    • cause (category 1): 313
    • diagnosis (category 2): 92
    • process (category 3): 459
  – 100 out-space CHQs
    • whatever (general description)
• Five-fold cross validation
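
As a rough illustration of five-fold cross validation over the 864 in-space CHQs, the sketch below partitions question indices into five folds and holds each fold out once for testing. The slides do not say how the folds were formed, so the shuffling, the fixed seed, and the function name `five_fold_splits` are assumptions made only for this example.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Assumption: items are shuffled once with a fixed seed and dealt
    round-robin into folds; the slides do not specify the procedure.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test


for train, test in five_fold_splits(864):
    print(len(train), len(test))
```

Each question appears in exactly one test fold, so every CHQ is used for testing once and for training four times.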


Underlying Classifiers

• Underlying classifier
  – The Support Vector Machine (SVM) classifier


Results: Classification of In-Space CHQs

• Evaluation criteria
  – Micro-averaged F1 (MicroF1)
  – Macro-averaged F1 (MacroF1)
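
The two criteria can be sketched from per-category counts of true positives, false positives, and false negatives, using the standard definitions: MicroF1 pools the counts over all categories before computing F1, while MacroF1 averages the per-category F1 scores. The function names and the counts in the usage lines are made up for illustration and are not results from the evaluation.

```python
def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(counts):
    """counts maps each category to a (tp, fp, fn) tuple.

    Returns (micro_f1, macro_f1).
    """
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro = f1(total_tp, total_fp, total_fn)                     # pool counts, then F1
    macro = sum(f1(*c) for c in counts.values()) / len(counts)   # F1 per category, then mean
    return micro, macro


# Illustrative counts only (not taken from the experiments).
example = {"cause": (8, 2, 2), "diagnosis": (1, 1, 3), "process": (9, 1, 1)}
micro, macro = micro_macro_f1(example)
print(round(micro, 4), round(macro, 4))
```

Because MacroF1 weights every category equally, a small category such as diagnosis (92 CHQs) influences it much more than it influences MicroF1.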


SVM+WLW is significantly better than SVM


Results: Filtering of Out-Space CHQs

• Evaluation criteria
  – Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
  – Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
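
A minimal sketch of the two filtering criteria follows. It assumes that each category has its own accept/reject decision and that every acceptance of an out-space CHQ by a category counts as one misclassification; that reading of the definitions, the function name `filtering_metrics`, and the sample data are assumptions made for illustration.

```python
def filtering_metrics(accepted_by):
    """Compute (FR, AM) for a list of out-space CHQs.

    accepted_by holds, for each out-space CHQ, the set of categories
    that (wrongly) accepted it. An empty set means the question was
    rejected by all categories.
    """
    n = len(accepted_by)
    if n == 0:
        raise ValueError("at least one out-space CHQ is required")
    rejected_by_all = sum(1 for categories in accepted_by if not categories)
    misclassifications = sum(len(categories) for categories in accepted_by)
    return rejected_by_all / n, misclassifications / n


# Illustrative decisions for four out-space questions (made-up data).
decisions = [set(), {"cause"}, {"cause", "process"}, set()]
fr, am = filtering_metrics(decisions)
print(fr, am)
```

A higher FR and a lower AM both indicate better filtering: more out-space questions are rejected outright, and those that slip through are accepted by fewer categories.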


SVM+WLW achieves higher FR and lower AM


Conclusion


• Healthcare consumers often read health information on the Internet
• Health questions are a valuable resource for healthcare consumers
  – Providing both reliable and readable health information
• Classification of health questions is the basis for the retrieval of related questions
  – cause, diagnosis, process, whatever
• WLW can help SVM improve the classification of CHQs
