Win-Win Search: Dual-Agent Stochastic Game in Session Search (SIGIR 2014)

WIN-WIN SEARCH: DUAL-AGENT STOCHASTIC GAME IN SESSION SEARCH

Jiyun Luo, Sicong Zhang, Grace Hui Yang

Department of Computer Science, Georgetown University

jl1749, [email protected] [email protected]

AGE OF EMPIRE

A NEW PERSPECTIVE TO LOOK AT SEARCH

[Figure: a user with an information need, the documents observed so far, and the documents still to explore]

Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.

WHY DO USERS MAKE CERTAIN MOVES?

Markov Chain of Decision-Making States

RELATED WORK

- queries suitable for personalization [Teevan et al. SIGIR’08]
- task types [Kanoulas et al. TREC’12]
- roles of task stage and task type [Liu et al. SIGIR’10]
- session query changes [Guan et al. SIGIR’13]
- user intentions and attention [Carterette et al. CIKM’11]
- user click model [Craswell et al. SIGIR’07]
- page re-ranking [Jin et al. WWW’13]
- search topics [Jones et al. CIKM’08]
- ads selection using POMDP [Yuan et al. CIKM’12]

Our work is a retrieval model, not a user study.

OUR SOLUTION

Try to find an optimal solution through a sequence of dynamic interactions.

Trial and Error: learn from repeated, varied attempts, which are continued until success.

TRIAL AND ERROR

- q1 – "dulles hotels"
- q2 – "dulles airport"
- q3 – "dulles airport location"
- q4 – "dulles metrostop"

RECAP – CHARACTERISTICS OF DYNAMIC IR

- Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
- Temporal dependency
- Overall goal

WHAT IS A DESIRABLE MODEL FOR DYNAMIC IR

- Model interactions, which means it needs to have placeholders for actions;
- Model the information need hidden behind user queries and other interactions;
- Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
- Represent Markov properties to handle the temporal dependency.

A model in a Trial and Error setting will do!

A Markov Model will do!

WIN-WIN SEARCH

- Two agents work together to fulfill the information need
- Dual-agent stochastic game
- Partially Observable Markov Decision Process
- Joint optimization
- To achieve win-win

A tuple (S, T, A, R, γ, O, Θ, B):

- S: state space
- T: transition matrix
- A: action space (Au, Ase, Σu, Σse)
- R: reward function (Ru, Rse)
- γ: discount factor, 0 < γ ≤ 1
- O: observation set (Ωu, Ωse); an observation is a symbol emitted according to a hidden state.
- Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a).
- B: belief space; a belief is a probability distribution over hidden states.

Name | Symbol | Meaning
state | S | the four hidden decision states
user action | Au | add/remove/keep query terms
search engine action | Ase | increase/decrease/keep term weights, adjust search techniques, etc.
message from user to search engine | Σu | clicked and SAT clicked documents
message from search engine to user | Σse | top k returned documents
user's observation | Ωu | observations that the user makes from the world
search engine's observation | Ωse | observations that the search engine makes from the world and from the user
user reward | Ru | relevant information the user gains from reading the documents
search engine reward | Rse | nDCG that the search engine gains by returning documents
belief state | B | belief states generated from the belief updater and shared by both agents
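The search engine reward Rse above is an nDCG score. As a rough illustration of how such a reward could be computed for one ranked list, here is a minimal Python sketch. The exponential gain `2^rel - 1`, the `log2` discount, and the cutoff `k=10` are common conventions assumed here, not details given in the slides, and the ideal ranking is built only from the labels of the returned list, which is a simplification.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(gains: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the same labels sorted in descending order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels for a returned list, in rank order.
reward_se = ndcg([3, 0, 2, 1], k=10)
```

A perfectly ordered list scores 1.0, and a list with no relevant documents scores 0.0.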

STATES (S)

- SRT: Relevant & Exploitation
- SRR: Relevant & Exploration
- SNRT: Non-Relevant & Exploitation
- SNRR: Non-Relevant & Exploration

Example query changes:

- scooter price ⟶ scooter stores
- collecting old US coins ⟶ selling old US coins
- Philadelphia NYC travel ⟶ Philadelphia NYC train
- Boston tourism ⟶ NYC tourism

ACTIONS (Au, Ase, Σu, Σse)

- User action (Au)
  - add query terms (+Δq)
  - remove query terms (-Δq)
  - keep query terms (qtheme)
- Search engine action (Ase)
  - increase term weights
  - decrease term weights
  - keep term weights
  - adjust search techniques, etc.
- Message from the user (Σu)
  - clicked documents
  - SAT clicked documents
- Message from the search engine (Σse)
  - top k returned documents
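The three user actions can be read off two consecutive queries. The following Python sketch shows one way to do that, assuming lowercase whitespace tokenization (the slides do not say how queries are tokenized); `query_change` is a hypothetical helper name.

```python
from typing import List, Tuple


def query_change(prev_query: str, curr_query: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a query reformulation into added, removed, and kept terms."""
    prev_terms = prev_query.lower().split()
    curr_terms = curr_query.lower().split()
    added = [t for t in curr_terms if t not in prev_terms]    # +Δq
    removed = [t for t in prev_terms if t not in curr_terms]  # -Δq
    theme = [t for t in curr_terms if t in prev_terms]        # q_theme
    return added, removed, theme


# Query pair taken from the "dulles" session shown earlier in the deck.
added, removed, theme = query_change("dulles hotels", "dulles airport")
```

For this pair, the added terms are `["airport"]`, the removed terms are `["hotels"]`, and the theme is `["dulles"]`.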

DUAL-AGENT STOCHASTIC GAME

1. At iteration t, the user agent takes action $a_u^t$ (query change).
2. The search engine picks the best action $a_{se}^t$ to search.
3. The search engine returns document set $D_t$ as message $\Sigma_{se}^t$.
4. The user agent examines $D_t$ and sends clicks as feedback messages $\Sigma_u^t$.
5. The user agent again takes action $a_u^{t+1}$ (query change).
6. The world moves into iteration t + 1.
7. The loop continues.

Messages are essentially documents that an agent thinks are relevant.
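The interaction loop of the dual-agent game can be sketched as a simple driver. This is only a skeleton under stated assumptions: `run_session`, `search`, and `click` are hypothetical names, and the two callables stand in for the search engine agent and the user agent, whose real behavior is not specified at this level of detail in the slides.

```python
from typing import Callable, Dict, List


def run_session(
    user_queries: List[str],
    search: Callable[[str, List[Dict]], List[str]],
    click: Callable[[str, List[str]], List[str]],
) -> List[Dict]:
    """Run the loop: query change, search action, returned documents, clicks."""
    history: List[Dict] = []
    for t, query in enumerate(user_queries, start=1):  # user action at iteration t
        docs = search(query, history)                  # search engine acts, returns D_t
        clicks = click(query, docs)                    # user feedback message
        history.append({"t": t, "query": query, "docs": docs, "clicks": clicks})
    return history


# Toy stand-ins for the two agents, purely for illustration.
log = run_session(
    ["dulles hotels", "dulles airport"],
    search=lambda q, h: [q + " doc"],
    click=lambda q, d: d[:1],
)
```

The history passed to `search` is what lets the search engine condition its action on earlier queries, returned documents, and clicks.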

OBSERVATION FUNCTION (O)

$O(s, a, \omega)$: probability of making observation ω after taking action a and landing in state s.

e.g., the probability of making observation ω after taking action a and landing in state $S_{RT}$ is

$$O(S_{RT}, a, \omega) = O(S_{Rel}, a, \omega)\, O(S_{Exploitation}, a, \omega)$$
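The factorization for SRT suggests treating the relevance dimension and the exploration dimension independently. A small Python sketch of that idea follows. Extending the same product form to the other three states with complementary probabilities is an assumption made here for illustration, and `joint_observation` is a hypothetical helper name.

```python
from typing import Dict


def joint_observation(p_rel: float, p_exploit: float) -> Dict[str, float]:
    """Combine per-dimension observation probabilities into four-state values."""
    return {
        "S_RT": p_rel * p_exploit,                # Relevant & Exploitation
        "S_RR": p_rel * (1 - p_exploit),          # Relevant & Exploration
        "S_NRT": (1 - p_rel) * p_exploit,         # Non-Relevant & Exploitation
        "S_NRR": (1 - p_rel) * (1 - p_exploit),   # Non-Relevant & Exploration
    }


# Hypothetical per-dimension probabilities.
obs = joint_observation(p_rel=0.8, p_exploit=0.6)
```

Under this assumption the four values always sum to 1, so they can be used directly as observation likelihoods in a belief update.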

OBSERVATION FUNCTION (O)

- Intuition: Relevant or Non-Relevant? $s_t$ is likely to be
  - Relevant if $\exists d \in D_{t-1}$ such that d is SAT clicked
  - Non-Relevant otherwise
- Observation function

$$O(s_t = \text{Rel}, \Sigma_u, \omega_t = \text{Rel}) \propto P(s_t = \text{Rel} \mid \omega_t = \text{Rel})\, P(\omega_t = \text{Rel} \mid \Sigma_u)$$

- $P(s_t = \text{Rel} \mid \omega_t = \text{Rel})$ and $P(\omega_t = \text{Rel} \mid \Sigma_u)$ are estimated from
  - log data
  - TREC ground truth

$$P(\omega_t = \text{Rel} \mid \Sigma_u) = \frac{\#\text{ of observed relevance}}{\#\text{ of observations}}, \qquad P(s_t = \text{Rel} \mid \omega_t = \text{Rel}) = \frac{\#\text{ of observed true relevance}}{\#\text{ of observed relevance}}$$
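The relevance intuition (the state is likely Relevant if some document in the previous result list was SAT clicked) can be sketched in Python as below. The 30-second dwell-time threshold for a SAT click is an assumption commonly used in the literature, not a value stated on the slide; `Click` and `observe_relevance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

SAT_DWELL_SECONDS = 30.0  # assumed threshold for a satisfied (SAT) click


@dataclass
class Click:
    doc_id: str
    dwell_seconds: float


def observe_relevance(
    prev_results: Iterable[str],
    clicks: List[Click],
    sat_dwell: float = SAT_DWELL_SECONDS,
) -> str:
    """Return "Relevant" if any document in D_{t-1} received a SAT click."""
    returned = set(prev_results)
    sat_clicked = any(
        c.doc_id in returned and c.dwell_seconds >= sat_dwell for c in clicks
    )
    return "Relevant" if sat_clicked else "Non-Relevant"


# Hypothetical previous result list and click log.
observation = observe_relevance(["d1", "d2"], [Click("d1", 45.0)])
```

Counting how often such observations agree with ground-truth relevance would give the ratio estimates described above.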

OBSERVATION FUNCTION (O)

- Intuition: Exploration or Exploitation? $s_t$ is likely to be
  - Exploration if ($+\Delta q_t \neq \emptyset$ and $+\Delta q_t \notin D_{t-1}$) or ($+\Delta q_t = \emptyset$ and $-\Delta q_t \neq \emptyset$)
  - Exploitation if ($+\Delta q_t \neq \emptyset$ and $+\Delta q_t \in D_{t-1}$) or ($+\Delta q_t = \emptyset$ and $-\Delta q_t = \emptyset$)
- Observation function

$$O(s_t = \text{Exploration}, a_u = +\Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = \text{Exploration}) \propto P(s_t = \text{Exploration} \mid \omega_t = \text{Exploration})\, P(\omega_t = \text{Exploration} \mid +\Delta q_t, D_{t-1})$$

- $P(s_t = \text{Exploration} \mid \omega_t = \text{Exploration})$ and $P(\omega_t = \text{Exploration} \mid +\Delta q_t, D_{t-1})$ are estimated from
  - log data
  - human judgment

$$P(\omega_t = \text{Exploration} \mid +\Delta q_t, D_{t-1}) = \frac{\#\text{ of observed exploration due to add terms}}{\#\text{ of observations due to add terms}}, \qquad P(s_t = \text{Exploration} \mid \omega_t = \text{Exploration}) = \frac{\#\text{ of observed true exploration}}{\#\text{ of observed exploration}}$$
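The exploration/exploitation intuition can be written as a small decision rule. In this Python sketch, "added terms appear in D_{t-1}" is interpreted as "at least one added term occurs in the text of the previously returned documents", which is an assumption, as is whitespace tokenization; `observe_exploration` is a hypothetical name.

```python
from typing import List


def observe_exploration(added: List[str], removed: List[str], prev_docs: List[str]) -> str:
    """Classify a query change as "Exploration" or "Exploitation"."""
    vocab = {tok for doc in prev_docs for tok in doc.lower().split()}
    if added:
        # Added terms already seen in D_{t-1} suggest the user is digging deeper.
        seen = any(term.lower() in vocab for term in added)
        return "Exploitation" if seen else "Exploration"
    # No added terms: removing terms widens the search, no change keeps it.
    return "Exploration" if removed else "Exploitation"


# Hypothetical previous result snippet for the "dulles" session.
prev_docs = ["Dulles airport location and parking"]
label = observe_exploration(["location"], [], prev_docs)
```

Here `location` already occurs in the previous results, so the change is labeled Exploitation.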

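BELIEF UPDATES (B)

- At every search iteration the belief state b is updated when a new observation is obtained.

$$
\begin{aligned}
b_{t+1}(s_j) &= P(s_j \mid \omega_t, a_t, b_t) \\
&= \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} \\
&= \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}
\end{aligned}
$$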
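The belief update is a standard Bayes filter step: predict with the transition probabilities, weight by the observation likelihood, and normalize. A minimal Python sketch follows, with the transition and observation values passed in as plain dictionaries. All numbers in the example are made up for illustration, and `update_belief` is a hypothetical name.

```python
from typing import Dict

STATES = ("S_RT", "S_RR", "S_NRT", "S_NRR")


def update_belief(
    belief: Dict[str, float],
    transition: Dict[str, Dict[str, float]],
    observation_prob: Dict[str, float],
) -> Dict[str, float]:
    """Compute b_{t+1}(s_j) from b_t, P(s_j | s_i, a_t, b_t) and O(s_j, a_t, w_t)."""
    unnormalized = {
        sj: observation_prob[sj] * sum(transition[si][sj] * belief[si] for si in STATES)
        for sj in STATES
    }
    evidence = sum(unnormalized.values())  # plays the role of P(w_t | a_t, b_t)
    if evidence == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: v / evidence for s, v in unnormalized.items()}


# Illustrative inputs: uniform prior, identity transition, made-up likelihoods.
prior = {s: 0.25 for s in STATES}
stay = {si: {sj: 1.0 if si == sj else 0.0 for sj in STATES} for si in STATES}
likelihood = {"S_RT": 0.48, "S_RR": 0.32, "S_NRT": 0.12, "S_NRR": 0.08}
posterior = update_belief(prior, stay, likelihood)
```

With a uniform prior and an identity transition, the posterior is proportional to the observation likelihoods, which makes the effect of a single observation easy to inspect.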

BELIEF UPDATES (B)

TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?

[Figure: belief over the four hidden states (e.g. SRT, Relevant & Exploitation), starting from q0 and updated after each query. The four belief values at each step are listed in the order they appear on the slide.]

Query | Observation | Belief values
q1 = “best US destinations” | NRR | 0.1784, 0.1135, 0.2838, 0.4243
q2 = “distance New York Boston” | RT | 0.0005, 0.0068, 0.0715, 0.9212
q3 = “maps.bing.com” | NRT | 0.0151, 0.4347, 0.0276, 0.5226
…… | |
q20 = “Philadelphia NYC train” | NRT | 0.0291, 0.7837, 0.0081, 0.1790
q21 = “Philadelphia NYC bus” | NRT | 0.0304, 0.8126, 0.0066, 0.1505

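JOINT OPTIMIZATION: WIN-WIN

- The long-term reward function for the search engine agent

$$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u) \max_{a} Q_{se}(b', a)$$

- The long-term reward function for the user agent

$$
\begin{aligned}
Q_u(b, a_u) &= R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) \\
&= P(q_t \mid d) + \gamma \sum_{a} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})
\end{aligned}
$$

- Joint optimization

$$a_{se} = \arg\max_{a} \left( Q_{se}(b, a) + Q_u(b, a_u) \right)$$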
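The joint optimization step picks the search engine action that maximizes the sum of the two long-term rewards. A minimal Python sketch of that selection is below. It assumes both values have already been computed per candidate action (here the user's value is evaluated on the documents each candidate action would return, which is an interpretation, not something the slide states), and all numbers and the name `select_search_action` are hypothetical.

```python
from typing import Dict


def select_search_action(q_se: Dict[str, float], q_u: Dict[str, float]) -> str:
    """Return the action a maximizing Q_se(b, a) + Q_u(b, a_u)."""
    if not q_se:
        raise ValueError("no candidate search engine actions")
    # Actions missing from q_u contribute no user value in this sketch.
    return max(q_se, key=lambda action: q_se[action] + q_u.get(action, 0.0))


# Made-up values for three candidate actions.
q_se = {"increase_weights": 0.40, "qcm": 0.35, "prf": 0.30}
q_u = {"increase_weights": 0.10, "qcm": 0.25, "prf": 0.05}
best = select_search_action(q_se, q_u)
```

In this example the action with the highest search engine value alone is not chosen, because the combined value favors `qcm`; that is the sense in which the optimization is joint.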

EXPERIMENTS

- Evaluate on TREC 2012 and 2013 Session Tracks
- The session logs contain
  - session topic
  - user queries
  - previously retrieved URLs, snippets
  - user clicks, dwell time, etc.
- Task: retrieve 2,000 documents for the last query in each session
- The evaluation is based on the whole session.
  - A document related to any query in the session is a good document
- Datasets
  - ClueWeb09 CatB
  - ClueWeb12 CatB
  - spam documents are removed
  - duplicated documents are removed

ACTIONS

- increasing weights of the added terms by a factor of x = 1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2;
- decreasing weights of the added terms by a factor of y = 0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95;
- QCM, proposed in Guan et al. SIGIR’13;
- Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant;
- directly using the query in the current iteration to perform retrieval;
- combining all queries in a session and weighting them equally.
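The first two actions amount to scaling the weights of the added terms by a fixed factor. A minimal Python sketch of that operation follows, assuming a query is represented as a term-to-weight dictionary (a representation chosen here for illustration, not one given in the slides); `adjust_weights` is a hypothetical name.

```python
from typing import Dict, Iterable

INCREASE_FACTORS = (1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2.0)
DECREASE_FACTORS = (0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95)


def adjust_weights(
    weights: Dict[str, float], terms: Iterable[str], factor: float
) -> Dict[str, float]:
    """Return a copy of the query weights with the given terms scaled by factor."""
    target = set(terms)
    return {t: w * factor if t in target else w for t, w in weights.items()}


# Hypothetical query "dulles airport location" with "location" newly added.
query = {"dulles": 1.0, "airport": 1.0, "location": 1.0}
boosted = adjust_weights(query, ["location"], factor=1.25)
```

Each factor in the two lists corresponds to one discrete search engine action, which keeps the action space finite.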

SEARCH ACCURACY

- Search accuracy on TREC 2012 Session Track

[Results: TREC 2012 Session Track]

- Win-win outperforms most retrieval algorithms on TREC 2012.

SEARCH ACCURACY

- Search accuracy on TREC 2013 Session Track

[Results: TREC 2013 Session Track]

- Win-win outperforms all retrieval algorithms on TREC 2013.
  - It is highly effective in Session Search.
- Systems in TREC 2012 perform better than in TREC 2013.
  - Many relevant documents are not included in the ClueWeb12 CatB collection.

IMMEDIATE SEARCH ACCURACY

[Figures: TREC 2012 Session Track; TREC 2013 Session Track]

- Original run: top returned documents provided by TREC log data
- Win-win's immediate search accuracy is better than the Original at every iteration
- Win-win's immediate search accuracy increases as the number of search iterations increases

Conclusions

- A novel session search framework
- Model the interactions between user and search engine as a dual-agent stochastic game
- Able to perform efficient optimization
  - a finite discrete set of states and actions
- Jointly search for the goal in a trial-and-error manner

THANK YOU

[email protected]
