Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR 2009 2010.04.27 Summarized and presented by Sang-il Song, IDS Lab., Seoul National University


Context-Aware Query Classification

Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang

Microsoft Research Asia
SIGIR 2009

2010.04.27
Summarized and presented by Sang-il Song, IDS Lab., Seoul National University


Copyright 2010 by CEBT

Query Classification

Query Classification (QC)
  Understanding the user’s search intent
  Classifying user queries into predefined target categories

Difference from traditional text classification
  – Queries are usually very short
  – Many queries are ambiguous, so a query may belong to multiple categories

Approaches
  – Augmenting the queries with extra data (search results)
  – Leveraging unlabeled data to help improve the accuracy of supervised learning
  – Expanding training data by automatically labeling some queries in click-through data via self-training

These approaches do not consider the user’s behavior history


Context Query Classification

Motivation example
  Query “jaguar” without context
    – Ambiguous whether the user is interested in “car” or “animal”
  Query “jaguar” before “BMW”
    – Clear that the user is interested in “car”

Context information
  Adjacent queries
  Clicked URLs

This paper models context information with a Conditional Random Field (CRF)


User Session

User search session
  A series of observations o_1, ..., o_T
  Each observation o_t consists of a query q_t and a set of URLs U_t clicked by the user for q_t
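A session can be pictured as a small data structure. The sketch below is illustrative only: the class names `Observation` and `Session` and the helper `context_of` are hypothetical, and the query/URL pairing in the example follows the order in which they appear on the later "Context-Aware QC with CRF" slide, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One step of a search session: a query plus the URLs clicked for it."""
    query: str
    clicked_urls: tuple[str, ...] = ()


@dataclass
class Session:
    """An ordered series of observations issued by one user."""
    observations: list[Observation] = field(default_factory=list)

    def context_of(self, t: int) -> list[Observation]:
        """Observations that precede position t (empty for the first query)."""
        return self.observations[:t]


# Example session; the pairing of queries and URLs is assumed from slide order.
session = Session([
    Observation("world cup", ("worldcup.fifa.com",)),
    Observation("fifa", ("fifa10.ea.com",)),
    Observation("fifa news", ("fifaworldcup.ea.com",)),
])
```

The first query has an empty context, which is the situation the Discussions slide calls the first-query problem.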


Taxonomy

Taxonomy
  Tree of categories
  Each node corresponds to a predefined category


Conditional Random Field

Undirected graphical model
  Input sequence
  p_ij depends on feature functions

Motivation for using CRF
  Suitable for capturing context information
  Doesn’t need any prior knowledge
  Flexible to richer features

[Figure: graph of four states s1 to s4, with a weight p_ij for every pair i, j = 1..4]


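Context-Aware QC with CRF

[Figure: a session of three queries with clicked URLs, in slide order: “world cup” (worldcup.fifa.com), “fifa” (fifa10.ea.com), “fifa news” (fifaworldcup.ea.com). Each query receives a category label, “soccer” or “game”, and the figure shows a tree of label probabilities:
  Query 1 branches: 0.8, 0.2
  Query 2 branches: 0.3, 0.7 | 0.05, 0.95
  Query 3 branches: 0.7, 0.3 | 0.4, 0.6 | 0.7, 0.3 | 0.4, 0.6
  Path products after query 2: 0.24, 0.56, 0.01, 0.19
  Path products after query 3: 0.168, 0.072, 0.224, 0.336, 0.007, 0.003, 0.076, 0.114]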
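The numbers on this slide multiply along each path of the tree, which can be checked with a short sketch. Two things here are assumptions: the slide does not make clear which branch is "soccer" and which is "game", so branches are indexed 0 and 1, and the third-step probabilities are read as depending only on the second label, which is what the printed products imply.

```python
from itertools import product

# Branch probabilities read off the slide (0 = first branch, 1 = second branch).
# The mapping of branches to "soccer" / "game" is not recoverable, so they stay unnamed.
first = [0.8, 0.2]
second = {0: [0.3, 0.7], 1: [0.05, 0.95]}   # conditioned on the first label
third = {0: [0.7, 0.3], 1: [0.4, 0.6]}      # conditioned on the second label


def path_probabilities():
    """Return {(b1, b2, b3): probability} for every path through the tree."""
    paths = {}
    for b1, b2, b3 in product((0, 1), repeat=3):
        paths[(b1, b2, b3)] = first[b1] * second[b1][b2] * third[b2][b3]
    return paths


paths = path_probabilities()
best_path = max(paths, key=paths.get)   # most probable label sequence
```

The eight products reproduce the bottom row of the figure and sum to 1, and the largest one (0.336) identifies the most probable label sequence for the session.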


Conditional Probability

Conditional probability
  Category label sequence c
  Observation sequence o
  Conditional probability p(c | o)
    – Z(o): normalization factor
  Potential function
    – f_k: feature function
    – λ_k: weight of f_k
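For reference, a sketch of the standard linear-chain CRF form that is consistent with the symbols named on this slide (Z(o), f_k, λ_k). The symbol φ for the potential function, the indexing from t = 1 to T, and the label sequences c' in the normalizer are assumptions rather than a transcription of the slide.

```latex
p(\mathbf{c} \mid \mathbf{o}) = \frac{1}{Z(\mathbf{o})} \prod_{t=1}^{T} \phi(c_{t-1}, c_t, \mathbf{o}),
\qquad
\phi(c_{t-1}, c_t, \mathbf{o}) = \exp\Big( \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{o}) \Big),
\qquad
Z(\mathbf{o}) = \sum_{\mathbf{c}'} \prod_{t=1}^{T} \phi(c'_{t-1}, c'_t, \mathbf{o})
```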


Training and Classification

Training
  Given training data
  Objective
    – Find a set of parameters (the weights of the feature functions)
    – Maximize the conditional log-likelihood

Inferring the category label c_t for the test query under the trained model
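A toy sketch of both steps, under loudly stated assumptions: the features are made-up indicator features (one local term feature, one transition feature), the weights are invented, the label set has two hypothetical categories, and the last query is classified by its marginal probability. Everything is computed by brute force over all label sequences, which only works for tiny examples; it is not the paper's training or inference procedure.

```python
import math
from itertools import product

LABELS = ("soccer", "game")   # hypothetical two-category label set
START = "<start>"             # hypothetical label preceding the first query


def log_potential(prev, cur, t, obs, weights):
    """Weighted feature sum for one step, using toy indicator features."""
    score = weights.get(("term", obs[t], cur), 0.0)    # local feature
    score += weights.get(("trans", prev, cur), 0.0)    # contextual feature
    return score


def sequence_score(labels, obs, weights):
    prev, total = START, 0.0
    for t, cur in enumerate(labels):
        total += log_potential(prev, cur, t, obs, weights)
        prev = cur
    return total


def conditional_probabilities(obs, weights):
    """Brute-force p(c | o) over every label sequence."""
    scores = {c: math.exp(sequence_score(c, obs, weights))
              for c in product(LABELS, repeat=len(obs))}
    z = sum(scores.values())                           # normalization factor Z(o)
    return {c: s / z for c, s in scores.items()}


def log_likelihood(labeled_sessions, weights):
    """Training objective: sum of log p(c | o) over labeled sessions."""
    return sum(math.log(conditional_probabilities(obs, weights)[tuple(labels)])
               for obs, labels in labeled_sessions)


def classify_last(obs, weights):
    """Label the last query by its marginal probability (an assumed rule)."""
    probs = conditional_probabilities(obs, weights)
    marginal = {lab: sum(p for c, p in probs.items() if c[-1] == lab)
                for lab in LABELS}
    return max(marginal, key=marginal.get), marginal


weights = {   # invented weights, for illustration only
    ("term", "fifa", "soccer"): 0.2,
    ("term", "fifa", "game"): 0.3,
    ("term", "world cup", "soccer"): 1.5,
    ("trans", "soccer", "soccer"): 1.0,
    ("trans", "game", "game"): 1.0,
}
alone, _ = classify_last(["fifa"], weights)
in_context, _ = classify_last(["world cup", "fifa"], weights)
```

With these weights the query "fifa" alone leans toward "game", while the same query after "world cup" is labeled "soccer", which is the effect context is meant to have.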


Features

Type                 Feature                                               What does it use?
Local feature        Query terms                                           Query terms
                     Pseudo feedback                                       External Web directory
                     Implicit feedback                                     External Web directory + click information
Contextual feature   Direct association between adjacent labels            Previous labels
                     Taxonomy-based association between adjacent labels    Taxonomy structure


Local Feature

Query terms
  Elementary feature
  Too sparse: the training data cannot cover terms sufficiently

Pseudo feedback
  Using the top M results returned by an external Web directory
  Mapping each result’s category label to a category in the target taxonomy
  General label confidence
    – Based on the number of returned search results of query q whose category labels are c after mapping
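A minimal sketch of a general label confidence score. Dividing the count by M to obtain a fraction is an assumption, and the directory labels, the mapping, and the function name `general_label_confidence` are hypothetical; the target categories "car" and "animal" are borrowed from the jaguar example.

```python
from collections import Counter


def general_label_confidence(result_labels, mapping, top_m):
    """Fraction of the top-M directory results that map to each target category."""
    mapped = [mapping[label] for label in result_labels[:top_m] if label in mapping]
    counts = Counter(mapped)
    return {category: n / top_m for category, n in counts.items()}


# Hypothetical directory labels for the top results of the query "jaguar".
directory_to_target = {
    "Autos/Makes": "car",
    "Science/Animals": "animal",
}
results = ["Autos/Makes", "Autos/Makes", "Science/Animals", "Autos/Makes"]
conf = general_label_confidence(results, directory_to_target, top_m=4)
```

Here three of the four results map to "car", so "car" gets the higher confidence for this query.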


Local Features (contd.)

Implicit feedback
  Similar to pseudo feedback, but using click information
  Click-based label confidence score
  Calculation
    1. Using the Web directory, get the corresponding categories
    2. Obtain a document collection for each possible query
    3. Build a Vector Space Model for each category
    4. Use the cosine similarity between the term vector of the category and the snippets of the clicked URLs
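The cosine-similarity step can be sketched as follows. This is a simplified illustration: term vectors are plain term-frequency bags of whitespace tokens, the per-category score is an average over snippets (an assumption), and the category texts and snippet are invented.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def click_based_confidence(snippets, category_vectors):
    """Average similarity between clicked-URL snippets and each category vector."""
    if not snippets:
        return {c: 0.0 for c in category_vectors}
    vecs = [term_vector(s) for s in snippets]
    return {c: sum(cosine_similarity(v, cv) for v in vecs) / len(vecs)
            for c, cv in category_vectors.items()}


# Invented category term vectors and snippet, for illustration only.
category_vectors = {
    "soccer": term_vector("soccer football match league world cup team"),
    "game": term_vector("video game console play download ea sports"),
}
scores = click_based_confidence(["official ea sports video game download"],
                                category_vectors)
```

A snippet from a clicked game-publisher page shares terms only with the "game" vector, so the click evidence favors "game" for that query.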


Contextual Features

Direct association between adjacent labels
  Using the occurrence of a pair of adjacent labels (c_{t-1}, c_t)
  The higher the weight, the larger the probability that c_{t-1} transits into c_t

Taxonomy-based association between adjacent labels
  Limited by the size of the training data, some transitions may not occur
  Using the structure of the taxonomy
  The association between two sibling categories is stronger than that between two non-sibling categories
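The sibling idea can be sketched as an indicator feature over category paths. The path format with a backslash separator follows the category names on the Case Study slide; the function names and the category "Living\Food & Cooking" are hypothetical, and a real feature would likely be graded rather than 0/1.

```python
def parent(category_path, sep="\\"):
    """Parent path of a category such as 'Living\\Travel & Vacation'."""
    head, _, _ = category_path.rpartition(sep)
    return head


def are_siblings(a, b):
    """Two distinct categories are siblings when they share a non-empty parent."""
    return a != b and parent(a) != "" and parent(a) == parent(b)


def taxonomy_association(a, b):
    """Toy indicator feature: 1.0 for sibling categories, 0.0 otherwise."""
    return 1.0 if are_siblings(a, b) else 0.0


travel = "Living\\Travel & Vacation"
food = "Living\\Food & Cooking"            # hypothetical sibling category
local = "Information\\Local & Regional"
```

Such a feature lets a transition between two sibling categories receive support even when that exact label pair never occurs in the training sessions.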


Experimental Setup

Taxonomy of ACM KDD Cup’05
  Target taxonomy
    – 7 level-one categories
    – 67 level-two categories

Data set
  10,000 sessions extracted from one day’s search log
  Each session contains at least two queries
  Three human labelers label the queries of each session


Baseline

Bridging classifier (BC)
  Training a classifier on an intermediate taxonomy
  Bridging the queries and the target taxonomy in the online step of QC
  Outperformed the winning approach in KDD Cup’05

Collaborating classifier (CC)
  Naïve context-aware approach
  Defines a score function of query q and category c using BC
  Uses the current query, the past query, and the association between the previous category and the estimated category


Evaluation

For a test query with a known true category label, the classification result is the set of the top K predicted category labels

Metrics
  Recall
  Precision
  F1 score
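One common way to compute these metrics for top-K predictions is sketched below. The definitions are assumptions, not the slide's formulas: each query is taken to have exactly one true label, recall is the share of queries whose true label appears in the top K, and precision is the number of hits divided by the total number of predicted labels.

```python
def top_k_metrics(true_labels, predicted_top_k):
    """Recall, precision and F1 for one true label and K predictions per query."""
    hits = sum(1 for truth, preds in zip(true_labels, predicted_top_k)
               if truth in preds)
    n_queries = len(true_labels)
    n_predicted = sum(len(preds) for preds in predicted_top_k)
    recall = hits / n_queries if n_queries else 0.0
    precision = hits / n_predicted if n_predicted else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1


# Invented labels and top-2 predictions for three test queries.
truth = ["soccer", "game", "car"]
top2 = [["soccer", "game"], ["soccer", "car"], ["car", "animal"]]
recall, precision, f1 = top_k_metrics(truth, top2)
```

Under these definitions, raising K can only keep or increase recall, while precision tends to fall because more labels are predicted per query.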


Results

CRF-B: CRF with basic features (query terms, general label confidence, and direct association between adjacent labels)
CRF-B-C: CRF-B + click-based label confidence
CRF-B-C-T: CRF-B-C + taxonomy-based association

[Figure: the average overall recall]


Results (contd.)

The average overall F1 score

The average overall precision


Case Study

Without considering context, there are many possible search intents
  – General information about Santa Fe => Information\Local & Regional
  – Travel information about Santa Fe => Living\Travel & Vacation


Conclusions

A novel approach for leveraging context information to classify queries by modeling search context through CRFs
This approach consistently outperforms a non-context-aware baseline and a naïve context-aware baseline
The results demonstrate the effectiveness of context information


Discussions

Experiments on a real data set clearly show that this approach outperforms the non-context-aware baseline

The first-query problem
  No search context is available if the query is located at the beginning of the session

Experiments are too simple
  Size of session
  Height of taxonomy


Q & A

Thank you