WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME IN SESSION SEARCH
Jiyun Luo, Sicong Zhang, Grace Hui Yang
Department of Computer Science, Georgetown University
jl1749, [email protected] [email protected]
AGE OF EMPIRE
A NEW PERSPECTIVE TO LOOK AT SEARCH
[Figure: a user with an information need, the observed documents, and the documents still to explore]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
WHY DO USERS MAKE CERTAIN MOVES?
Markov Chain of Decision Making States
RELATED WORK
• queries suitable for personalization [Teevan et al. SIGIR’08]
• task types [Kanoulas et al. TREC’12]
• roles of task stage and task type [Liu et al. SIGIR’10]
• session query changes [Guan et al. SIGIR’13]
• user intentions and attention [Carterette et al. CIKM’11]
• user click model [Craswell et al. SIGIR’07]
• page re-ranking [Jin et al. WWW’13]
• search topics [Jones et al. CIKM’08]
• ads selection using POMDP [Yuan et al. CIKM’12]
Our work is a retrieval model, not a user study.
OUR SOLUTION
Try to find an optimal solution through a sequence of dynamic interactions.
Trial and error: learn from repeated, varied attempts, which are continued until success.
TRIAL AND ERROR
• q1 – "dulles hotels"
• q2 – "dulles airport"
• q3 – "dulles airport location"
• q4 – "dulles metrostop"
RECAP – CHARACTERISTICS OF DYNAMIC IR
• Rich interactions: query formulation, document clicks, document examination, eye movement, mouse movements, etc.
• Temporal dependency
• Overall goal
WHAT IS A DESIRABLE MODEL FOR DYNAMIC IR
• Model interactions, which means it needs placeholders for actions;
• Model the information need hidden behind user queries and other interactions;
• Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
• Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do!
A Markov model will do!
WIN-WIN SEARCH
• Two agents work together to fulfill the information need
• Dual-agent stochastic game
• Partially Observable Markov Decision Process
• Joint optimization
• To achieve win-win
• A tuple (S, T, A, R, γ, O, Θ, B)
  - S: state space
  - T: transition matrix
  - A: action space (Au, Ase, Σu, Σse)
  - R: reward function (Ru, Rse)
  - γ: discount factor, 0 < γ ≤ 1
  - O: observation set (Ωu, Ωse); an observation is a symbol emitted according to a hidden state.
  - Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a).
  - B: belief space; a belief is a probability distribution over hidden states.
Name | Symbol | Meaning
state | S | the four hidden decision states
user action | Au | add/remove/keep query terms
search engine action | Ase | increase/decrease/keep term weights, adjust search techniques, etc.
message from user to search engine | Σu | clicked and SAT clicked documents
message from search engine to user | Σse | top k returned documents
user's observation | Ωu | observations that the user makes from the world
search engine's observation | Ωse | observations that the search engine makes from the world and from the user
user reward | Ru | relevant information the user gains from reading the documents
search engine reward | Rse | nDCG that the search engine gains by returning documents
belief state | B | belief states generated from the belief updater and shared by both agents
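Since the search engine reward Rse is defined as nDCG, a small worked computation may help. The sketch below uses one common nDCG variant (linear gain, log2 discount); the slides do not say which variant or cutoff was used, so the function names, the default `k`, and the choice to build the ideal ranking from the same list of gains are all assumptions made for illustration.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=10):
    """nDCG@k for a ranked list of graded relevance values.

    Assumption: the ideal ranking is built from the same gains, which
    ignores relevant documents that were never retrieved.
    """
    best = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / best if best > 0 else 0.0

perfect = ndcg([3, 2, 1])   # already in ideal order
reversed_order = ndcg([1, 2, 3])
```

A ranking that places the most relevant documents first scores 1.0; any other ordering of the same documents scores lower.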
STATES (S)
• SRT: Relevant & Exploitation
• SRR: Relevant & Exploration
• SNRT: Non-Relevant & Exploitation
• SNRR: Non-Relevant & Exploration
Example query changes:
• scooter price ⟶ scooter stores
• collecting old US coins ⟶ selling old US coins
• Philadelphia NYC travel ⟶ Philadelphia NYC train
• Boston tourism ⟶ NYC tourism
ACTIONS (Au, Ase, Σu, Σse)
• User action (Au)
  - add query terms (+Δq)
  - remove query terms (-Δq)
  - keep query terms (qtheme)
• Search engine action (Ase)
  - increase term weights
  - decrease term weights
  - keep term weights
  - adjust search techniques, etc.
• Message from the user (Σu)
  - clicked documents
  - SAT clicked documents
• Message from the search engine (Σse)
  - top k returned documents
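The three user actions can be read off two consecutive queries. The following is a minimal sketch, assuming queries are compared as lowercase whitespace-separated terms; the function name `query_change` and this tokenization are illustrative choices, not something the slides specify.

```python
def query_change(prev_query: str, curr_query: str):
    """Split a query transition into added, removed, and kept (theme) terms."""
    prev_terms = prev_query.lower().split()
    curr_terms = curr_query.lower().split()
    prev_set, curr_set = set(prev_terms), set(curr_terms)
    added = [t for t in curr_terms if t not in prev_set]    # +Δq
    removed = [t for t in prev_terms if t not in curr_set]  # -Δq
    theme = [t for t in curr_terms if t in prev_set]        # qtheme
    return added, removed, theme

# Query change from q1 to q2 in the "dulles" session shown earlier.
added, removed, theme = query_change("dulles hotels", "dulles airport")
```

Here the user adds "airport", removes "hotels", and keeps the theme term "dulles".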
DUAL-AGENT STOCHASTIC GAME
1. At iteration t, the user agent takes action $a_u^t$ (a query change).
2. The search engine picks the best action $a_{se}^t$ to search.
3. The search engine returns document set $D_t$ as message $\Sigma_{se}^t$.
4. The user agent examines $D_t$ and sends clicks as feedback message $\Sigma_u^t$.
5. The user agent again takes an action $a_u^{t+1}$ (a query change).
6. The world moves into iteration t + 1.
7. The loop continues.
Messages are essentially the documents that an agent thinks are relevant.
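The iteration loop can be sketched as a toy simulation. Everything concrete here is an assumption for illustration: the three-document corpus, the term-overlap ranker standing in for the search engine's action selection, and the user who clicks any document containing a wanted term. It only shows the order in which actions and messages are exchanged.

```python
def search_engine_act(query_terms, corpus, k=2):
    """Rank documents by term overlap with the query; the top k form the message Σse."""
    ranked = sorted(corpus, key=lambda d: -len(set(query_terms) & set(corpus[d].split())))
    return ranked[:k]

def user_click(returned, corpus, wanted_term):
    """Toy user: click every returned document mentioning the wanted term (message Σu)."""
    return [d for d in returned if wanted_term in corpus[d].split()]

def run_session(queries, corpus, wanted_term):
    history = []
    for t, query in enumerate(queries, start=1):            # steps 1 and 5: user query change
        returned = search_engine_act(query.split(), corpus)  # steps 2 and 3: act, return D_t
        clicks = user_click(returned, corpus, wanted_term)   # step 4: feedback clicks
        history.append((t, query, returned, clicks))         # step 6: move to iteration t + 1
    return history

corpus = {
    "d1": "dulles airport hotels",
    "d2": "dulles airport location map",
    "d3": "dulles metro stop silver line",
}
history = run_session(["dulles hotels", "dulles airport", "dulles metro"], corpus, "metro")
```

In this toy run the user only clicks in the third iteration, once the query change brings the metro document into the returned set.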
OBSERVATION FUNCTION (O)
Probability of making observation ω after taking action a and landing in state s.
e.g., the probability of making observation ω after taking action a and landing in state SRT:
$$O(S_{RT}, a, \omega) = O(S_{Rel}, a, \omega)\, O(S_{Exploitation}, a, \omega)$$
OBSERVATION FUNCTION (O)
• Intuition: Relevant or Non-Relevant?
  - $s_t$ is likely to be Relevant if $\exists d \in D_{t-1}$ such that d is SAT clicked, and Non-Relevant otherwise.
• Observation function
$$O(s_t = \text{Rel}, \Sigma_u, \omega_t = \text{Rel}) \propto P(s_t = \text{Rel} \mid \omega_t = \text{Rel})\, P(\omega_t = \text{Rel} \mid \Sigma_u)$$
• $P(s_t = \text{Rel} \mid \omega_t = \text{Rel})$ and $P(\omega_t = \text{Rel} \mid \Sigma_u)$ are estimated from
  - log data
  - TREC ground truth
• The estimates are count ratios:
  - (# of observed relevance) / (# of observations)
  - (# of observed true relevance) / (# of observed relevance)
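The relevance intuition (a SAT click on a document from the previous result set) can be written as a small predicate. The slides do not define a SAT click; the 30-second dwell threshold and the `(doc_id, dwell_seconds)` click format below are assumptions chosen only to make the sketch concrete.

```python
SAT_DWELL_SECONDS = 30.0  # assumed threshold; not given in the slides

def observe_relevance(clicks):
    """Return the relevance observation for one iteration.

    clicks: list of (doc_id, dwell_seconds) for clicked documents in D_{t-1}.
    """
    sat_clicked = any(dwell >= SAT_DWELL_SECONDS for _, dwell in clicks)
    return "Relevant" if sat_clicked else "Non-Relevant"

long_read = observe_relevance([("d1", 45.0), ("d2", 3.0)])
quick_bounce = observe_relevance([("d1", 5.0)])
```

An iteration with no clicks at all also yields "Non-Relevant" under this rule.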
OBSERVATION FUNCTION (O)
• Intuition: Exploration or Exploitation?
  - $s_t$ is likely to be Exploration if ($+\Delta q_t \neq \emptyset$ and $+\Delta q_t \notin D_{t-1}$) or ($+\Delta q_t = \emptyset$ and $-\Delta q_t \neq \emptyset$).
  - $s_t$ is likely to be Exploitation if ($+\Delta q_t \neq \emptyset$ and $+\Delta q_t \in D_{t-1}$) or ($+\Delta q_t = \emptyset$ and $-\Delta q_t = \emptyset$).
• Observation function
$$O(s_t = \text{Exploration},\; a_u = +\Delta q_t,\; \Sigma_{se} = D_{t-1},\; \omega_t = \text{Exploration}) \propto P(s_t = \text{Exploration} \mid \omega_t = \text{Exploration})\, P(\omega_t = \text{Exploration} \mid +\Delta q_t, D_{t-1})$$
• $P(s_t = \text{Exploration} \mid \omega_t = \text{Exploration})$ and $P(\omega_t = \text{Exploration} \mid +\Delta q_t, D_{t-1})$ are estimated from
  - log data
  - human judgment
• The estimates are count ratios:
  - (# of observed exploration due to added terms) / (# of observations due to added terms)
  - (# of observed true exploration) / (# of observed exploration)
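The exploration/exploitation intuition maps directly onto a four-case rule. In this sketch, "+Δq_t ∈ D_{t-1}" is read as "at least one added term occurs in the text of the previously returned documents"; that reading, the whitespace tokenization, and the function name are assumptions.

```python
def observe_exploration(added_terms, removed_terms, prev_docs):
    """Return 'Exploration' or 'Exploitation' for one query change.

    added_terms / removed_terms: +Δq_t and -Δq_t as lists of terms.
    prev_docs: texts of the documents in D_{t-1}.
    """
    seen = {w for text in prev_docs for w in text.lower().split()}
    if added_terms:
        # Added terms already present in D_{t-1} suggest the user is exploiting.
        in_prev = any(t.lower() in seen for t in added_terms)
        return "Exploitation" if in_prev else "Exploration"
    # No added terms: removing terms counts as exploration, no change as exploitation.
    return "Exploration" if removed_terms else "Exploitation"

same_topic = observe_exploration(["train"], ["travel"], ["philadelphia nyc train schedule"])
new_topic = observe_exploration(["nyc"], ["boston"], ["boston tourism guide"])
```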
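BELIEF UPDATES (B)
• At every search iteration the belief state b is updated when a new observation is obtained.
$$\begin{aligned}
b_{t+1}(s_j) &= P(s_j \mid \omega_t, a_t, b_t) \\
&= \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} \\
&= \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}
\end{aligned}$$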
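The belief update is a standard Bayes filter step over the four hidden states, and can be sketched in a few lines. The dictionary layout, the function name, and all numbers in the usage example are made up for illustration; in particular the transition and observation probabilities are not taken from the slides.

```python
STATES = ["SRT", "SRR", "SNRT", "SNRR"]

def update_belief(belief, transition, obs_likelihood):
    """One belief update for a fixed action a_t and observation ω_t.

    belief: {s_i: b_t(s_i)}
    transition: {s_i: {s_j: P(s_j | s_i, a_t)}}
    obs_likelihood: {s_j: O(s_j, a_t, ω_t)}
    """
    unnormalized = {}
    for s_j in STATES:
        predicted = sum(transition[s_i][s_j] * belief[s_i] for s_i in STATES)
        unnormalized[s_j] = obs_likelihood[s_j] * predicted
    evidence = sum(unnormalized.values())  # plays the role of P(ω_t | a_t, b_t)
    if evidence == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / evidence for s, p in unnormalized.items()}

# Made-up inputs: uniform prior, uniform transitions, observation favoring SNRR.
uniform = {s: 0.25 for s in STATES}
transition = {s: dict(uniform) for s in STATES}
likelihood = {"SRT": 0.1, "SRR": 0.1, "SNRT": 0.3, "SNRR": 0.5}
new_belief = update_belief(uniform, transition, likelihood)
```

With a uniform prior and uniform transitions, the updated belief is simply the normalized observation likelihood, which makes the role of each term easy to check by hand.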
TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
• q1 = “best US destinations”, observation = NRR
  - SRT (Relevant & Exploitation): 0.1784
  - SRR (Relevant & Exploration): 0.1135
  - SNRT (Non-Relevant & Exploitation): 0.2838
  - SNRR (Non-Relevant & Exploration): 0.4243
• q2 = “distance New York Boston”, observation = RT
  - SRT (Relevant & Exploitation): 0.0005
  - SRR (Relevant & Exploration): 0.0068
  - SNRT (Non-Relevant & Exploitation): 0.0715
  - SNRR (Non-Relevant & Exploration): 0.9212
• q3 = “maps.bing.com”, observation = NRT
  - SRT (Relevant & Exploitation): 0.0151
  - SRR (Relevant & Exploration): 0.4347
  - SNRT (Non-Relevant & Exploitation): 0.0276
  - SNRR (Non-Relevant & Exploration): 0.5226
……
• q20 = “Philadelphia NYC train”, observation = NRT
  - SRT (Relevant & Exploitation): 0.0291
  - SRR (Relevant & Exploration): 0.7837
  - SNRT (Non-Relevant & Exploitation): 0.0081
  - SNRR (Non-Relevant & Exploration): 0.1790
• q21 = “Philadelphia NYC bus”, observation = NRT
  - SRT (Relevant & Exploitation): 0.0304
  - SRR (Relevant & Exploration): 0.8126
  - SNRT (Non-Relevant & Exploitation): 0.0066
  - SNRR (Non-Relevant & Exploration): 0.1505
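JOINT OPTIMIZATION: WIN-WIN
• The long-term reward function for the search engine agent
$$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u) \max_{a} Q_{se}(b', a)$$
• The long-term reward function for the user agent
$$\begin{aligned}
Q_u(b, a_u) &= R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) \\
&= P(q_t \mid d) + \gamma \sum_{a} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})
\end{aligned}$$
• Joint optimization
$$a_{se} = \arg\max_{a} \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$$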
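The joint optimization step picks the search engine action that maximizes the sum of the two long-term rewards. The sketch below assumes both Q-values have already been computed and are indexed by candidate search engine action; treating the user's value as depending on the candidate action (through the documents that action would return) is an assumption, as are the action names and all numbers.

```python
def pick_search_engine_action(q_se, q_u):
    """Return the action a maximizing Q_se(b, a) + Q_u(b, a_u).

    q_se: {action: Q_se(b, a)}
    q_u:  {action: user value given the documents that action would return}
    """
    return max(q_se, key=lambda a: q_se[a] + q_u[a])

# Made-up values for three hypothetical candidate actions.
q_se = {"increase_weights": 0.40, "decrease_weights": 0.35, "prf": 0.30}
q_u = {"increase_weights": 0.10, "decrease_weights": 0.25, "prf": 0.05}
best = pick_search_engine_action(q_se, q_u)
```

Here the action with the highest search engine value alone is not chosen: adding the user's value shifts the choice, which is the point of optimizing jointly.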
EXPERIMENTS
• Evaluate on TREC 2012 and 2013 Session Tracks
• The session logs contain
  - session topic
  - user queries
  - previously retrieved URLs, snippets
  - user clicks, dwell time, etc.
• Task: retrieve 2,000 documents for the last query in each session
• The evaluation is based on the whole session.
  - A document related to any query in the session is a good document
• Datasets
  - ClueWeb09 CatB
  - ClueWeb12 CatB
  - spam documents are removed
  - duplicated documents are removed
ACTIONS
• increasing weights of the added terms by a factor of x = 1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2;
• decreasing weights of the added terms by a factor of y = 0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95;
• QCM, proposed in Guan et al. SIGIR’13;
• Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
• directly using the query in the current iteration to perform retrieval;
• combining all queries in a session and weighting them equally.
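The two term-weighting actions can be enumerated as a set of candidate weighted queries, one per factor. This is a minimal sketch assuming a query is a `{term: weight}` dictionary; that representation and the function names are illustrative, while the factor lists are copied from the slide.

```python
INCREASE_FACTORS = [1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2.0]
DECREASE_FACTORS = [0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95]

def reweight_added_terms(term_weights, added_terms, factor):
    """Return a copy of term_weights with the added terms scaled by factor."""
    added = set(added_terms)
    return {t: w * factor if t in added else w for t, w in term_weights.items()}

def candidate_queries(term_weights, added_terms):
    """One reweighted query per factor: 8 increases followed by 8 decreases."""
    return [(f, reweight_added_terms(term_weights, added_terms, f))
            for f in INCREASE_FACTORS + DECREASE_FACTORS]

candidates = candidate_queries({"dulles": 1.0, "airport": 1.0}, ["airport"])
```

Each candidate would then be scored by the reward functions, and the best one kept as the search engine action.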
SEARCH ACCURACY
• Search accuracy on TREC 2012 Session Track
  [Results: TREC 2012 Session Track]
  - Win-win outperforms most retrieval algorithms on TREC 2012.
• Search accuracy on TREC 2013 Session Track
  [Results: TREC 2013 Session Track]
  - Systems in TREC 2012 perform better than in TREC 2013.
  - Many relevant documents are not included in the ClueWeb12 CatB collection.
  - Win-win outperforms all retrieval algorithms on TREC 2013.
  - It is highly effective in session search.
IMMEDIATE SEARCH ACCURACY
[Figure panels: TREC 2012 Session Track, TREC 2013 Session Track]
• Original run: top returned documents provided by TREC log data
• Win-win’s immediate search accuracy is better than the Original at every iteration
• Win-win's immediate search accuracy increases as the number of search iterations increases
Conclusions
• A novel session search framework
• Model the interactions between user and search engine as a dual-agent stochastic game
• Able to perform efficient optimization
  - a finite discrete set of states and actions
• Jointly search for the goal in a trial-and-error manner