Improving Health Question Classification
by Word Location Weights
Rey-Long Liu
Dept. of Medical Informatics
Tzu Chi University
Taiwan
Outline
• Background
• Problem definition
• The proposed approach: WLW
• Empirical evaluation
• Conclusion
Background
Categories of Health Questions
Classification of Health Questions
• Why health questions?
– Health questions provide both reliable and readable health information
• Why classification of health questions?
– Given a health question q, retrieve related questions (and their answers)
Problem Definition
Goal & Motivation
• Goal
– Target: Chinese Health Questions (CHQs)
– Contribution: Developing a technique, WLW (Word Location Weight), that estimates the location weights of words in a CHQ based on their locations
• Motivation
– Location weights can be used by classifiers (e.g., SVM) to improve the classification
  • Classifying in-space CHQs (cause, diagnosis, process)
  • Filtering out-space CHQs (may be whatever)
Basic Idea
• Words that are more closely related to the category of a CHQ tend to appear at the beginning and end of the CHQ
• Examples:
– 如何 (how to) 克服 (deal with) 緊張 (nervous) 的情緒 (mood)? (category: process)
– 嬰兒 (infant) 體溫 (body temperature) 太低 (too low) 怎麼辦 (what to do)? (category: process)
Related Work
• Recognition of question types (e.g., when, where)
– Weakness: Question types are not the same as the intended categories of CHQs
• Classification by parsing
– Weakness I: Parsing Chinese is still challenging
– Weakness II: CHQs are NOT always well-formed
• Classification by pattern matching
– Weakness: Difficult to construct the string patterns
The Proposed Approach: WLW
Main Challenges
(1) Defining the two weights of a location p in a CHQ q
Main Challenges (cont.)
(2) Encoding the location weights of a word w into two features for the underlying classifier
Interesting Behaviors of WLW
• A word w in a question q has two features
– Fvalue_front and Fvalue_rear
– Applicable to different categories and languages (e.g., English)
• When w is far from the front and the rear
– Both features reduce to the term frequency (TF) of w
– WLW reduces to the traditional feature-encoding approach (using TF as the features)
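The exact weight definitions from the slides are not reproduced in this text, so the following is only a minimal sketch under an assumed form. It illustrates the stated behavior: every word gets a front feature and a rear feature, and both approach the plain term frequency when the word is far from both ends. The names location_weight, wlw_features, and decay, and the geometric decay itself, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def location_weight(distance, decay=0.5):
    """Assumed bonus for a word `distance` positions away from a boundary.

    The bonus is 1 at the boundary and shrinks geometrically toward 0.
    This form is an illustration, not the definition used in the slides.
    """
    return decay ** distance


def wlw_features(tokens, decay=0.5):
    """Return {word: (fvalue_front, fvalue_rear)} for a tokenized question."""
    n = len(tokens)
    front = defaultdict(float)
    rear = defaultdict(float)
    for p, word in enumerate(tokens):
        # Each occurrence contributes 1 (its term-frequency count)
        # plus a bonus that depends on its distance to each end.
        front[word] += 1.0 + location_weight(p, decay)
        rear[word] += 1.0 + location_weight(n - 1 - p, decay)
    return {w: (front[w], rear[w]) for w in front}


# Example question from the slides, already segmented into words.
question = ["嬰兒", "體溫", "太低", "怎麼辦"]
features = wlw_features(question)
```

With this assumed decay, a word in the middle of a long question gets two values that are both very close to its term frequency, which mirrors the reduction to TF described above.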
Empirical Evaluation
Experimental Design
• CHQs were downloaded from a health information provider
– 864 in-space CHQs
  • cause (category 1): 313
  • diagnosis (category 2): 92
  • process (category 3): 459
– 100 out-space CHQs
  • whatever (general description)
• Five-fold cross validation
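As a rough illustration of the five-fold setup, the sketch below splits the 864 in-space CHQs into five folds using only the category sizes listed above. The slides do not say whether the folds were stratified by category; stratification, the function name stratified_folds, and the seed parameter are assumptions made here.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with similar category proportions."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this category round-robin.
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds


# Category sizes reported in the slides for the in-space CHQs.
labels = ["cause"] * 313 + ["diagnosis"] * 92 + ["process"] * 459
folds = stratified_folds(labels)

for test_fold in folds:
    test_set = set(test_fold)
    train = [i for i in range(len(labels)) if i not in test_set]
    # Train on `train`, evaluate on `test_fold`, then average the scores.
```

Each question is used for testing exactly once, and the reported score would be the average over the five test folds.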
Underlying Classifiers
• Underlying classifier
– The Support Vector Machine (SVM) classifier
Results: Classification of In-Space CHQs
• Evaluation criteria
– Micro-averaged F1 (MicroF1)
– Macro-averaged F1 (MacroF1)
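The two criteria differ only in where the averaging happens, which the following sketch tries to make concrete. It uses the standard definitions of micro- and macro-averaged F1; the function names and the toy labels are invented for illustration and are not results from the slides. Each question is represented by a set of categories so that a question may be accepted by zero, one, or several categories.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold, predicted, categories):
    """gold and predicted are lists of category sets, one set per question."""
    counts = {c: [0, 0, 0] for c in categories}  # tp, fp, fn per category
    for g, p in zip(gold, predicted):
        for c in categories:
            if c in p and c in g:
                counts[c][0] += 1
            elif c in p:
                counts[c][1] += 1
            elif c in g:
                counts[c][2] += 1
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(*(sum(counts[c][i] for c in categories) for i in range(3)))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(*counts[c]) for c in categories) / len(categories)
    return micro, macro


# Toy data, invented for illustration only.
categories = ["cause", "diagnosis", "process"]
gold = [{"cause"}, {"cause"}, {"diagnosis"}, {"process"}]
predicted = [{"cause"}, {"process"}, {"diagnosis"}, {"process"}]
micro, macro = micro_macro_f1(gold, predicted, categories)
```

Because MacroF1 weights every category equally, a small category such as diagnosis (92 questions) influences it as much as process (459 questions), whereas MicroF1 is dominated by the larger categories.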
SVM+WLW is significantly better than SVM
Results: Filtering of Out-Space CHQs
• Evaluation criteria
– Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
– Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
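The two ratios above can be computed as in the sketch below. It assumes that each out-space CHQ is represented by the set of categories that accepted it, and that every such acceptance counts as one misclassification; that reading of "misclassification", the function name filtering_metrics, and the toy outputs are assumptions, not details from the slides.

```python
def filtering_metrics(accepted_by):
    """accepted_by: one set per out-space CHQ with the categories that accepted it."""
    total = len(accepted_by)
    # A question is filtered out only if no category accepted it.
    rejected_by_all = sum(1 for cats in accepted_by if not cats)
    # Assumption: each accepting category is one misclassification.
    misclassifications = sum(len(cats) for cats in accepted_by)
    fr = rejected_by_all / total
    am = misclassifications / total
    return fr, am


# Toy classifier outputs for four out-space CHQs, invented for illustration.
outputs = [set(), {"cause"}, set(), {"cause", "process"}]
fr, am = filtering_metrics(outputs)
```

A better filter pushes FR toward 1 and AM toward 0.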
SVM+WLW achieves higher FR and lower AM
Conclusion
• Healthcare consumers often read health information on the Internet
• Health questions are a valuable resource for healthcare consumers
– They provide both reliable and readable health information
• Classification of health questions is the basis for retrieving related questions
– cause, diagnosis, process, whatever
• WLW can help SVM improve the classification of CHQs