A Classification-based Approach to Question Routing in Community Question Answering
Tom Chao Zhou1, Michael R. Lyu1, Irwin King1,2
1 The Chinese University of Hong Kong
2 AT&T Labs Research
{czhou,lyu,king}@[email protected]
Workshop on Community Question Answering on the Web, in conjunction with World Wide Web 2012
April 17, 2012
Introduction
Problem Definition and Feature
Experiments
Conclusions and Future Work
Related Work
Community-based Question Answering
• Knowledge dissemination, information seeking
• Natural language questions
• Explicit, self-contained answers
How CQA Works
• CQA users submit a question
• Get answers?
  – Yes: answer selection, question resolved
  – No: question not resolved
• The number of posted questions grows fast.
• Can users get their questions resolved within a reasonable period?
Whether Questions Get Resolved
• Randomly sample 140 questions from each category in Yahoo! Answers
• 26 top-level categories
• In total 3,640 questions
• Track the status of each question
Percentage of Questions Resolved
  1        2        3        4        5        6        7        8        9
  11.95%   19.95%   24.75%   26.48%   27.31%   51.32%   61.92%   63.41%   64.45%
How CQA Works
• What if we carefully select a set of CQA users who may be interested in the question?
Question Routing
• Definition
  – Routing open questions to suitable answerers who may be interested in the question
  – Interested in the question: Yes
  – Not interested in the question: No
Question Routing
• Benefits
  – Asker’s perspective
    • Reduces the time lag between when a question is posted and when it is answered
  – Answerer’s perspective
    • Answerers are more enthusiastic about answering questions that interest them
  – CQA’s perspective
    • Leverages users’ passion for answering, which improves the CQA service and boosts users’ stickiness and loyalty to the system
Problem Definition
Question Routing Problem
Given a question and a user in CQA, determine whether the user will contribute his/her
knowledge to answer the question
Feature Investigation
• Local Features
  – Only local information about the question, the user history, and the question-user relationship is needed
• Global Features
  – Take into account the global information of the CQA service
  – Consider the category as the global information
  – Questions in the same category discuss similar topics
  – Incorporating global information acts as a smoothing effect
Feature Investigation
Feature Investigation Summary (number of features)
                   Question   User History   Question-User Relationship
Local Features     3          10             7
Global Features    3          2              1
Local Features
• Question (3 features)
  – Question Length
    • Agichtein et al. 2008 found question length to be an important feature for measuring question quality
    1. Title length
    2. Detail length
  – Question Type
    3. 5W1H type
    • Why, what, where, who and how
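The three question features above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how length is measured or how the 5W1H type is detected, so the sketch counts whitespace-separated words and takes the first interrogative word in the title. The function name `question_features` and the `"other"` fallback are hypothetical.

```python
import re

# Interrogatives for the 5W1H question type. "when" is included here as an
# assumption: the slide itself lists only why, what, where, who and how.
FIVE_W_ONE_H = ("why", "what", "where", "when", "who", "how")


def question_features(title: str, detail: str) -> dict:
    """Return the three local question features as a plain dict."""
    tokens = re.findall(r"[a-z']+", title.lower())
    # Assumption: the type is the first interrogative word found in the title.
    question_type = next((t for t in tokens if t in FIVE_W_ONE_H), "other")
    return {
        "title_length": len(title.split()),    # length in whitespace tokens
        "detail_length": len(detail.split()),
        "question_type": question_type,
    }


features = question_features(
    "How do I reset my router?",
    "It stopped working after a power cut.",
)
```

For a classifier, the categorical `question_type` would still need to be encoded numerically, for example as one indicator per type.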
Local Features
• User History (10 features)
  – Users’ history would have implications for users’ interests and behaviors
  – Profile, question and answering behaviors
    1. Member since
    2. Percentage of best answer
    3. Total points
    4. Number of answers
    5. Number of best answers
    6. Number of asked questions
    7. Number of resolved questions
    8. Number of stars received
    9. Answer/question ratio
    10. Best answer/question ratio
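The ratio features in this list are derived from the raw counts. A minimal sketch follows, assuming that "answer/question ratio" means answers given divided by questions asked and that "best answer/question ratio" means best answers divided by questions asked; the slides do not spell out either definition, and the function names are hypothetical.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def user_ratio_features(num_answers: int, num_best_answers: int,
                        num_asked: int) -> dict:
    """Derive the ratio-style user history features from raw profile counts."""
    return {
        # Feature 2: share of the user's answers chosen as best answer.
        "pct_best_answer": safe_ratio(num_best_answers, num_answers),
        # Feature 9 (assumed definition): answers given per question asked.
        "answer_question_ratio": safe_ratio(num_answers, num_asked),
        # Feature 10 (assumed definition): best answers per question asked.
        "best_answer_question_ratio": safe_ratio(num_best_answers, num_asked),
    }


ratios = user_ratio_features(num_answers=40, num_best_answers=10, num_asked=8)
```

Guarding against a zero denominator matters here because many users answer without ever asking a question.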
Local Features
• Question-User Relationship (7 features)
  – Capture the relationship between a question and a user
  – Features adapted from the existing CQA service
    1. Top contributor
  – Features that measure the extent to which the user is interested in the category the given question belongs to
    2. Ratio of answered questions in the category
    3. Ratio of best answered questions in the category
    4. Ratio of asked questions in the category
    5. Ratio of starred questions in the category
  – Features describing the similarity of the question’s language model and the user’s language model
    6. KL-divergence between the given question and the user’s answered questions
    7. KL-divergence between the given question and the user’s background language model (answered, asked, and starred questions)
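The KL-divergence features compare a language model of the question with a language model of the user's questions. The sketch below is one possible reading, not the authors' implementation: it assumes unigram models, additive smoothing with a hypothetical `alpha`, and the direction KL(question || user), none of which the slides specify. The example texts are made up.

```python
import math
from collections import Counter


def unigram_model(texts, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w)), same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Made-up texts: one open question, a user's answered questions, and an
# unrelated set of questions for contrast.
question = ["how to train a linear svm"]
answered = ["what kernel should a svm use", "how to tune a linear classifier"]
unrelated = ["best pizza topping"]

vocabulary = sorted(
    {w for text in question + answered + unrelated for w in text.lower().split()}
)
question_model = unigram_model(question, vocabulary)
answered_model = unigram_model(answered, vocabulary)
unrelated_model = unigram_model(unrelated, vocabulary)

kl_answered = kl_divergence(question_model, answered_model)
kl_unrelated = kl_divergence(question_model, unrelated_model)
```

A smaller divergence means the question's wording is closer to what the user has answered before; here `kl_answered` comes out lower than `kl_unrelated`. Smoothing keeps every probability positive so the logarithm is always defined.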
Global Features
• Question (3 features)
  – Category-level features that smooth each question
    1. Average title length
    2. Average detail length
  – Whether the question is representative of its category
    3. KL-divergence between the given question and the questions in the category it belongs to
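The two category-level averages can be computed in one pass over the questions. This is a sketch under assumptions: lengths are counted in whitespace-separated words, and the record layout (`category`, `title`, `detail` keys) and sample questions are hypothetical.

```python
from collections import defaultdict


def category_averages(questions):
    """Map each category to its average title and detail length in words."""
    sums = defaultdict(lambda: [0, 0, 0])  # title words, detail words, count
    for q in questions:
        entry = sums[q["category"]]
        entry[0] += len(q["title"].split())
        entry[1] += len(q["detail"].split())
        entry[2] += 1
    return {
        category: {"avg_title_length": t / n, "avg_detail_length": d / n}
        for category, (t, d, n) in sums.items()
    }


# Made-up sample questions.
sample = [
    {"category": "Pets", "title": "Why does my cat purr", "detail": "She does it all day"},
    {"category": "Pets", "title": "Best dog food", "detail": ""},
    {"category": "Cars", "title": "How to change oil", "detail": "Old sedan"},
]
averages = category_averages(sample)
```

Every question in a category then receives the same pair of averages, which is what makes these features act as category-level smoothing.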
Global Features
• User History (2 features)
  – Capture the uniqueness of a user
• Question-User Relationship (1 feature)
  – The more similar the language model of a user’s answered questions is to that of the questions in a category, the more probable it is that the user would answer questions from the category
    • KL-divergence between the user’s answered questions and the questions in the category the given question belongs to
Experiments
• Classification Algorithm
  – Support vector machines (SVM) with linear kernel
• Metrics
  – Precision, recall, F1 for positive class
  – Accuracy for both classes
• Dataset
  – Crawled from 3,500 users’ “Answers”, “Questions”, and “Starred Questions” pages from Yahoo! Answers
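To make the classifier concrete, here is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the L2-regularized hinge loss. The slides name only "SVM with linear kernel"; the solver, the hyperparameters (`lr`, `lam`, `epochs`), and the one-feature toy data below are assumptions for illustration, not the authors' setup.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Fit w, b by stochastic subgradient descent on the regularized hinge loss.

    X is a list of feature vectors; y is a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin: step toward classifying xi correctly.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (the user would answer) or -1 (the user would not)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data: one hypothetical feature per (question, user) pair.
X = [[2.0], [-2.0]]
y = [1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice each vector in `X` would hold the local and global features for one (question, user) pair, and an established SVM library would normally replace this hand-written loop.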
Effect of Local Features
                             Precision   Recall   F1       Accuracy
Question                     0.5314      0.3896   0.4496   0.5157
User History                 0.8278      0.4682   0.5981   0.6805
Question-User Relationship   0.5824      0.935    0.7178   0.6267
• Question-User Relationship achieves the best F1 and recall
  – Captures the user’s performance and interests in the category of the given question
  – Captures the semantic relatedness of the given question and the user
• User History achieves the best precision
  – Some users are quite active in the system
  – These highly active users account for only a small percentage of all users
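The four reported metrics follow from the confusion counts of the positive class, and F1 is the harmonic mean of precision and recall. The sketch below uses hypothetical function names and made-up counts; it also checks that the User History row (precision 0.8278, recall 0.4682) is consistent with its reported F1 of 0.5981.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1_score(precision, recall), accuracy


# Consistency check against the User History row of the table.
f1_user_history = f1_score(0.8278, 0.4682)  # about 0.598

# Made-up confusion counts, purely for illustration.
precision, recall, f1, accuracy = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

The same function makes it easy to see why a feature group with very high recall but modest precision, such as Question-User Relationship, can still reach the best F1.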
Effect of Local Features
                                    Precision   Recall   F1       Accuracy
Q + QU Relationship                 0.5974      0.9134   0.7223   0.6435
U + QU Relationship                 0.7362      0.8275   0.7792   0.7619
Q + U + QU Relationship             0.7418      0.8253   0.7814   0.7655
Top 10 features in Local features   0.6964      0.8095   0.7487   0.7241
• The combination of all local features achieves the best F1
• Results of employing the top 10 features are also encouraging
Effect of Local Features
• Two most important local features
  – KL-divergence value between the given question and questions answered by the user
    • Captures the most accurate semantic relatedness between the given question and the knowledge of the user
  – KL-divergence value between the given question and questions answered, asked, and starred by the user
    • Considers the user’s interests as well by incorporating other factors
Effect of Local and Global Features
                 Precision   Recall   F1       Accuracy
Local            0.7418      0.8253   0.7814   0.7655
Global           0.5779      0.8713   0.6949   0.6109
Local + Global   0.7279      0.8499   0.7842   0.7689
• Combining local and global features retains the strengths of both and consequently achieves the best F1 score
Effect of Local and Global Features
• Three most important features
  – KL-divergence value between the given question and questions answered by the user
  – KL-divergence value between the given question and questions answered, asked, and starred by the user
  – KL-divergence value between the given question and questions from the same category
    • If a question is quite typical of its category, it has a higher chance of being answered by users; this could also partially explain why CQA services usually have well-structured categories
Related Work
• Question Routing
  – Zhou et al. 2009, expertise-based question routing
  – Li and King 2010, language model based framework for combining expertise estimation and availability estimation
  – Li et al. 2011, category-sensitive language model
• Link analysis and Expert Finding
  – Jurczyk and Agichtein, 2007
  – Zhang, Ackerman and Adamic, 2007
  – Apply PageRank and HITS in social media
Conclusions
• Formulate question routing as a classification task
• Derive a variety of local and global features
• Analyze the contributions from different sources
• Thorough experimental study
Future Work
• Semi-supervised approach
• Incorporate social aspects into the model
Thanks
Q&A