Exploring the Query-Flow Graph with a Mixture Model for
Query Recommendation
Lu Bai, Jiafeng Guo, Xueqi Cheng, Xiubo Geng, Pan Du
Institute of Computing Technology, CAS
Outline
• Introduction
• Our approach
• Experimental results
• Conclusion & Future work
Introduction
• Query recommendation
  – Generated from web query logs
  – Different types of information are considered, including search results, clickthrough data, and search sessions.
Introduction
• Recently, the query-flow graph was introduced into query recommendation.
[Figure: example query-flow graph. Nodes include 360, Xbox 360, kinect, Xbox 720, Yahoo, Yahoo mail, Yahoo messenger, apple, and apple tree; directed edges are labeled with weights.]
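As a rough illustration of how such a graph can be derived from a query log, the sketch below counts how often one query is immediately followed by another within a session. The function name `build_query_flow_graph`, the toy sessions, and the use of raw counts as edge weights are assumptions made for this example, not details taken from the slides.

```python
from collections import defaultdict


def build_query_flow_graph(sessions):
    """Return {(src, dst): count} for consecutive query pairs in each session."""
    weights = defaultdict(int)
    for session in sessions:
        # Pair every query with the one issued right after it.
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore immediate repeats of the same query
                weights[(src, dst)] += 1
    return dict(weights)


# Hypothetical sessions, loosely modeled on the node names in the figure.
sessions = [
    ["360", "xbox 360", "kinect"],
    ["360", "xbox 360", "xbox 720"],
    ["yahoo", "360"],
    ["apple", "yahoo"],
    ["apple", "apple tree"],
]
graph = build_query_flow_graph(sessions)
for (src, dst), weight in sorted(graph.items()):
    print(f"{src} -> {dst}: {weight}")
```

Here "kinect", "xbox 720", and "apple tree" have no outgoing edges, so they would be dangling queries in this toy graph.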
Introduction
• Traditionally, a personalized random walk over the query-flow graph was used for recommendation.
• Dangling queries
  – No out-links
  – Nearly 9% of all queries
• Ambiguous queries
  – Mixed recommendations
    • Hard to read
  – Dominant recommendations
    • Cannot satisfy different needs
[Figure: the example query-flow graph with recommendations for two queries. Query = 360: Xbox 360, Xbox 720, Kinect. Query = apple: Yahoo, apple tree, Yahoo mail.]
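The baseline described above can be sketched as a random walk with restart on the weighted graph. This is a minimal illustration, not the exact baseline evaluated in the slides: the function name, the `restart` value of 0.15, and the choice to spread the mass of dangling queries uniformly over all nodes are assumptions.

```python
def personalized_random_walk(weights, start, restart=0.15, iters=100):
    """Approximate random-walk-with-restart scores from `start`.

    weights: {(src, dst): weight} for the directed query-flow graph.
    """
    nodes = sorted({q for edge in weights for q in edge})
    out = {q: {} for q in nodes}
    for (src, dst), w in weights.items():
        out[src][dst] = w

    scores = {q: 1.0 if q == start else 0.0 for q in nodes}
    for _ in range(iters):
        nxt = {q: 0.0 for q in nodes}
        for q, mass in scores.items():
            nxt[start] += restart * mass  # jump back to the input query
            total = sum(out[q].values())
            if total == 0:
                # Dangling query: no out-links, so (by assumption) the
                # remaining mass is spread uniformly over all nodes.
                for t in nodes:
                    nxt[t] += (1 - restart) * mass / len(nodes)
            else:
                for t, w in out[q].items():
                    nxt[t] += (1 - restart) * mass * w / total
        scores = nxt
    return scores


# Hypothetical toy graph.
toy_graph = {("360", "xbox 360"): 2, ("360", "yahoo 360"): 1, ("xbox 360", "kinect"): 1}
scores = personalized_random_walk(toy_graph, start="360")
ranking = sorted((q for q in scores if q != "360"), key=scores.get, reverse=True)
print(ranking)
```

Starting the walk from a dangling query such as "kinect" leaves only the uniform jump to drive the ranking, which is one way to see why recommendations for dangling queries can be unrelated to the query.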
Our Work
• Explore the query-flow graph for better recommendation
  – Apply a novel mixture model over the query-flow graph to learn the intents of queries.
  – Perform an intent-biased random walk on the query-flow graph for recommendation.
Probabilistic model of generating query-flow graph
• Model the generation of the query-flow graph with a novel mixture model
• Assumptions
  – Queries are triggered by query intents.
  – Consecutive queries in one search session are from the same intent.
Probabilistic model of generating query-flow graph
• Process of generating a directed edge $e_{ij}$
  – Draw an intent indicator $g_{ij} = r$ from the multinomial distribution $\pi$.
  – Draw query nodes $q_i, q_j$ from the same multinomial intent distribution $\theta_r$, respectively.
  – Draw the directed edge $e_{ij}$ from a binomial distribution $\tau_{ij,r}$.
• Likelihood function

$$\Pr(G \mid \pi, \theta, \tau) = \prod_{i=1}^{N} \; \prod_{j:\, j \in C(i)} \Bigl( \sum_{r=1}^{K} \pi_r \, \theta_{r,i} \, \theta_{r,j} \, \tau_{ij,r} \Bigr)^{w_{ij}}$$
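To make the likelihood concrete, the sketch below evaluates its logarithm for a toy graph. The parameter names `pi` (intent prior), `theta` (per-intent query distribution), and `tau` (per-intent edge-direction probability), as well as the data layout, are assumptions chosen for this example; the weight of an edge acts as the exponent on its mixture term.

```python
import math


def log_likelihood(edges, pi, theta, tau):
    """Log-likelihood of a weighted directed graph under the mixture model.

    edges: {(i, j): w_ij}    observed directed edges and their weights
    pi:    pi[r]             probability of intent r
    theta: theta[r][i]       probability of query i under intent r
    tau:   tau[r][(i, j)]    probability of direction i -> j under intent r
    """
    total = 0.0
    for (i, j), w in edges.items():
        # Mixture over intents for this edge.
        mix = sum(
            pi[r] * theta[r][i] * theta[r][j] * tau[r][(i, j)]
            for r in range(len(pi))
        )
        total += w * math.log(mix)
    return total


# Hypothetical toy setting: three queries, two intents.
edges = {(0, 1): 2, (1, 2): 1}
pi = [0.5, 0.5]
theta = [[0.5, 0.4, 0.1], [0.1, 0.4, 0.5]]
tau = [{(0, 1): 0.9, (1, 2): 0.9}, {(0, 1): 0.9, (1, 2): 0.9}]
print(log_likelihood(edges, pi, theta, tau))
```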
Probabilistic model of generating query-flow graph
• EM algorithm is used to estimate parameters
  – E step

$$q_{ij,r} = \frac{\pi_r \, \theta_{r,i} \, \theta_{r,j} \, \tau_{ij,r}}{\sum_{r'=1}^{K} \pi_{r'} \, \theta_{r',i} \, \theta_{r',j} \, \tau_{ij,r'}}$$

  – M step

$$\pi_r = \frac{\sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r}}{\sum_{r'=1}^{K} \sum_{i=1}^{N} \sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r'}}$$

$$\theta_{r,i} = \frac{\sum_{j:\, j \in C(i)} w_{ij} \, q_{ij,r} + \sum_{k:\, i \in C(k)} w_{ki} \, q_{ki,r}}{\sum_{i'=1}^{N} \Bigl( \sum_{j:\, j \in C(i')} w_{i'j} \, q_{i'j,r} + \sum_{k:\, i' \in C(k)} w_{ki'} \, q_{ki',r} \Bigr)}$$

$$\tau_{ij,r} = \frac{w_{ij} \, q_{ij,r}}{w_{ij} \, q_{ij,r} + w_{ji} \, q_{ji,r}}$$
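The E and M steps can be sketched in plain Python as follows. This is an illustrative sketch under assumptions: the names `pi`, `theta`, `tau`, and `q` stand for the intent prior, the per-intent query distribution, the per-intent direction probability, and the posterior of an intent given an edge; the random initialization, the fixed iteration count, and the small smoothing constant `eps` (added only to avoid division by zero) are choices made for this example.

```python
import random


def em_query_intents(edges, n_queries, n_intents, iters=50, seed=0, eps=1e-12):
    """Fit the mixture model to {(i, j): w_ij} with EM; returns (pi, theta, tau)."""
    rng = random.Random(seed)
    K, N = n_intents, n_queries
    pi = [1.0 / K] * K
    theta = []
    for _ in range(K):
        row = [rng.random() + 0.1 for _ in range(N)]
        s = sum(row)
        theta.append([v / s for v in row])
    tau = [{e: 0.5 for e in edges} for _ in range(K)]

    for _ in range(iters):
        # E step: posterior of each intent for every observed edge.
        q = {}
        for (i, j) in edges:
            joint = [
                pi[r] * theta[r][i] * theta[r][j] * tau[r][(i, j)] + eps
                for r in range(K)
            ]
            z = sum(joint)
            q[(i, j)] = [v / z for v in joint]

        # M step: re-estimate pi, theta, and tau from weighted posteriors.
        mass = [sum(w * q[e][r] for e, w in edges.items()) for r in range(K)]
        total = sum(mass)
        pi = [(m + eps) / (total + K * eps) for m in mass]
        for r in range(K):
            deg = [0.0] * N
            for (i, j), w in edges.items():
                deg[i] += w * q[(i, j)][r]  # edge leaving i
                deg[j] += w * q[(i, j)][r]  # edge entering j
            s = sum(deg)
            theta[r] = [(d + eps) / (s + N * eps) for d in deg]
            for (i, j), w in edges.items():
                fwd = w * q[(i, j)][r]
                # The reverse edge contributes only if it was observed.
                bwd = edges.get((j, i), 0.0) * q.get((j, i), [0.0] * K)[r]
                tau[r][(i, j)] = (fwd + eps) / (fwd + bwd + 2 * eps)
    return pi, theta, tau


# Hypothetical toy graph with two loosely separated groups of queries.
toy_edges = {(0, 1): 5, (1, 0): 2, (1, 2): 4, (3, 4): 5, (4, 5): 3}
pi, theta, tau = em_query_intents(toy_edges, n_queries=6, n_intents=2)
print(pi)
```

EM only finds a local optimum, so in practice one would likely try several seeds and compare the resulting likelihoods.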
Intent-biased random walk
• Based on the learned query intents, we apply an intent-biased random walk for query recommendation.
  – Dangling queries: back off to their intents
  – Ambiguous queries: recommend under each intent

$$A_{i,r} = (1 - \lambda)\, M + \lambda\, \mathbf{1}\, P_{i,r}$$

$$P_{i,r} = \rho \cdot e_i^{T} + (1 - \rho)\, \theta_r$$

  – $A_{i,r}$: transition probability matrix
  – $M$: row-normalized weight matrix
  – $P_{i,r}$: preference vector
  – $e_i$: all entries are zeros, except that the i-th is 1
  – $\theta_r$: a row vector of the query distribution of intent r
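A minimal sketch of the intent-biased walk by power iteration is given below. The parameter names `lam` and `rho` for the two mixing coefficients, their default values, and the handling of dangling rows (their mass is redirected to the preference vector, which is one way to read "back off to its intents") are assumptions for this example; `theta_r` is assumed to be a learned query distribution for one intent that sums to 1.

```python
def intent_biased_walk(weights, n, start, theta_r, lam=0.2, rho=0.5, iters=100):
    """Score n queries for input query `start` under one intent.

    weights: {(i, j): w_ij} over query indices 0..n-1
    theta_r: query distribution of the chosen intent (sums to 1)
    """
    row_sum = [0.0] * n
    for (i, _j), w in weights.items():
        row_sum[i] += w

    # Preference vector: mix the input query with the intent distribution.
    pref = [(1 - rho) * theta_r[k] for k in range(n)]
    pref[start] += rho

    v = pref[:]
    for _ in range(iters):
        nxt = [0.0] * n
        # Follow out-links through the row-normalized weight matrix.
        for (i, j), w in weights.items():
            nxt[j] += (1 - lam) * v[i] * w / row_sum[i]
        # Mass on dangling queries has no out-link to follow, so it is
        # sent to the preference vector as well (assumption).
        dangling = sum(v[i] for i in range(n) if row_sum[i] == 0)
        jump = lam + (1 - lam) * dangling
        for k in range(n):
            nxt[k] += jump * pref[k]
        v = nxt
    return v


# Hypothetical example: query 3 is isolated (dangling), but the chosen
# intent puts weight on queries 2 and 3.
toy_weights = {(0, 1): 1.0, (1, 2): 1.0}
walk_scores = intent_biased_walk(toy_weights, n=4, start=3, theta_r=[0.0, 0.0, 0.5, 0.5])
print(walk_scores)
```

For an ambiguous query, the same function could be run once per prominent intent, producing one recommendation list per intent instead of a single mixed list.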
Experiments
• Data set
  – A 3-month query log generated from a commercial search engine.
  – Sessions are split by 30 minutes.
  – No stemming and no stop-word removal.
  – The largest connected graph is extracted for experiments; it consists of 16,980 queries and 51,214 edges.
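The preprocessing steps above can be sketched as follows. Two assumptions are made here: "split by 30 minutes" is read as starting a new session after an inactivity gap of more than 30 minutes, and "connected" is read as weakly connected (edge direction ignored). The function names and input layouts are hypothetical.

```python
from collections import defaultdict


def split_sessions(events, gap_seconds=30 * 60):
    """Split one user's time-sorted (timestamp_seconds, query) events into sessions."""
    sessions, current, last = [], [], None
    for ts, query in events:
        if last is not None and ts - last > gap_seconds:
            sessions.append(current)
            current = []
        current.append(query)
        last = ts
    if current:
        sessions.append(current)
    return sessions


def largest_component(edges):
    """Return the node set of the largest weakly connected component."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for node in neighbors:
        if node in seen:
            continue
        # Depth-first search from an unvisited node.
        stack, comp = [node], {node}
        while stack:
            cur = stack.pop()
            for nb in neighbors[cur]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


print(split_sessions([(0, "hilton"), (60, "marriott"), (4000, "lyrics")]))
print(largest_component([("a", "b"), ("b", "c"), ("x", "y")]))
```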
Experiments
• Learning performance for different numbers of intents.
Experiments
• Learned query intents:

lyrics | cars | poems
--- | --- | ---
lyrics | bmw | poems
song lyrics | lexus | love poems
lyrics com | audi | poetry
a z lyrics | toyota | friendship poems
music lyrics | acura | famous love poems
azlyrics | nissan | love quotes
lyric | infiniti | sad poems
az lyrics | mercedes benz | quotes
rap lyrics | volvo | mother s day poems
country lyrics | mercedes | mothers day poems
Experiments
• Dangling query suggestion

Query = yamaha motor

Baseline | Ours
--- | ---
mapquest | yamaha
american idol | honda
yahoo mail | suzuki
home depot | kawasaki
bank of america | yamaha motorcycles
target | yamaha motorcycle

• Ambiguous query suggestion

Query = hilton

Baseline | Ours
--- | ---
marriott | [hotel]
expedia | marriott
holiday inn | holiday inn
hyatt | sheraton
hotel | hampton inn
mapquest | embassy suites
hampton inn | hotels com
sheraton | [celebrity]
hilton com | paris hilton
hotels com | michelle wie
embassy suites | nicole richie
residence inn | jessica simpson
choice hotels | pamela anderson
marriot | daniel dipiero
hilton honors | richard hatch
Experiments
• Performance improvement based on user click behaviors

Metric | Baseline method | Our approach
--- | --- | ---
Average Hit Number | 4.09 | 4.21 (+2.9%)
Average Hit Score | 0.598 | 0.652 (+9.0%)
Average Score | 0.181 | 0.194 (+7.1%)
Conclusion and Future work
• Conclusion
  – We explore the query-flow graph with a novel probabilistic mixture model for learning query intents.
  – An intent-biased random walk is introduced to integrate the learned intents for recommendation.
• Future work
  – Learn query intents with more auxiliary information: clicks, URLs, words, etc.