
Page 1

Putting Query Representation and Understanding in Context:

ChengXiang Zhai

Department of Computer Science

University of Illinois at Urbana-Champaign

A Decision-Theoretic Framework for Optimal Interactive Retrieval through Dynamic User Modeling

Including joint work with Xuehua Shen and Bin Tan

Page 3

Query must be put in a context


Jaguar search: Mac OS? Car? Animal?

• What queries did the user type in before this query?
• What documents were just viewed by this user?
• What documents were skipped by this user?
• What other users looked for similar information?
• ...

Page 4

Context helps query understanding


Suppose we know:

1. The previous query = "racing cars" vs. "Apple OS"
2. "car" occurs far more frequently than "Apple" in pages browsed by the user in the last 20 days
3. The user just viewed an "Apple OS" document

[Figure: each context clue shifts the interpretation of "jaguar" among Car, Software, and Animal]

Page 5

Questions
• How can we model a query in a context-sensitive way? Generalize query representation to a user model.
• How can we model the dynamics of user information needs? Dynamic updating of user models.
• How can we put query representation into a retrieval framework to improve search? A framework for optimal interactive retrieval.


Page 6

Rest of the talk: UCAIR Project

1. A decision-theoretic framework

2. Statistical language models for implicit feedback (personalized search without extra user effort)

3. Open challenges


Page 7


UCAIR Project
• UCAIR = User-Centered Adaptive IR
 – user modeling ("user-centered")
 – search context modeling ("adaptive")
 – interactive retrieval
• Implemented as a personalized search agent that
 – sits on the client side (owned by the user)
 – integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
 – collaborates with other agents
 – goes beyond search toward task support

Page 8


Main Idea: Putting the User in the Center!

[Figure: instead of one search engine serving N users, a personalized search agent on the client side sits between one user and N sources (Web search engine, email, desktop files), observing the current query ("java"), viewed Web pages, and query history. A search agent can know about a particular user very well.]

Page 9

1. A Decision-Theoretic Framework for Optimal Interactive Retrieval


Page 10

IR as Sequential Decision Making

User (information need) vs. System (model of information need)

• A1: user enters a query → system decides which documents to present, and how to present them
• Ri: results (i = 1, 2, 3, ...) → user decides which documents to view
• A2: user views a document → system decides which part of the document to show, and how
• R′: document content → user decides whether to view more
• A3: user clicks the "Back" button → ...

Page 11

Retrieval Decisions

User U: A1, A2, ..., At−1, At
System: R1, R2, ..., Rt−1
History: H = {(Ai, Ri)}, i = 1, ..., t−1
Document collection: C

Given U, C, At, and H, choose the best Rt from r(At), the set of all possible responses to At.

Examples:
• At = query "jaguar" → r(At) = all possible rankings of C; Rt = the best ranking for the query
• At = click on the "Next" button → r(At) = all possible rankings of unseen docs; Rt = the best ranking of unseen docs

Page 12

A Risk Minimization Framework

Observed: user U, interaction history H, current user action At, document collection C
Inferred user model: M = (θU, S, ...), with information need θU and seen docs S
All possible responses: r(At) = {r1, ..., rn}
Loss function: L(ri, At, M)

The optimal response r* is the one with minimum loss, i.e., minimum Bayes risk:

$$R_t = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM$$

Page 13


A Simplified Two-Step Decision-Making Procedure

• Approximate the Bayes risk by the loss at the mode of the posterior distribution
• Two-step procedure
 – Step 1: Compute an updated user model M* based on the currently available information
 – Step 2: Given M*, choose a response to minimize the loss function

$$
\begin{aligned}
R_t &= \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM \\
    &\approx \arg\min_{r \in r(A_t)} L(r, A_t, M^*)\, P(M^* \mid U, H, A_t, C) \\
    &= \arg\min_{r \in r(A_t)} L(r, A_t, M^*) \\
\text{where } M^* &= \arg\max_M P(M \mid U, H, A_t, C)
\end{aligned}
$$
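To make the procedure concrete, here is a minimal Python sketch. The names `infer_user_model`, `candidate_responses`, and `loss` are hypothetical placeholders standing in for the posterior inference P(M | U, H, At, C), the decision space r(At), and the loss L(r, At, M); the framework itself does not prescribe their internals.

```python
# Minimal sketch of the simplified two-step procedure.
# infer_user_model, candidate_responses, and loss are hypothetical callables.

def two_step_decision(user, history, action, collection,
                      infer_user_model, candidate_responses, loss):
    # Step 1: point estimate at the posterior mode,
    # M* = argmax_M P(M | U, H, A_t, C)
    m_star = infer_user_model(user, history, action, collection)
    # Step 2: pick the response minimizing loss under M*,
    # R_t = argmin_{r in r(A_t)} L(r, A_t, M*)
    return min(candidate_responses(action, collection),
               key=lambda response: loss(response, action, m_star))
```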

Page 14

Optimal Interactive Retrieval

[Figure: the interaction loop between the user and the IR system over collection C. The user issues A1; the system infers M*1 = argmax P(M1 | U, H, A1, C) and returns R1 minimizing L(r, A1, M*1); the user issues A2; the system updates to M*2 = argmax P(M2 | U, H, A2, C) and returns R2 minimizing L(r, A2, M*2); and so on with A3, ...]

Page 15


Refinement of Risk Minimization

• r(At): decision space (At-dependent)
 – r(At) = all possible subsets of C (document selection)
 – r(At) = all possible rankings of docs in C
 – r(At) = all possible rankings of unseen docs
 – r(At) = all possible subsets of C + summarization strategies
• M: user model
 – essential component: θU = user information need
 – S = seen documents
 – n = "topic is new to the user"
• L(Rt, At, M): loss function
 – generally measures the utility of Rt for a user modeled as M
 – often encodes retrieval criteria (e.g., using M to select a ranking of docs)
• P(M | U, H, At, C): user model inference
 – often involves estimating a unigram language model θU

Page 16


Case 1: Context-Insensitive IR

– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– p(M | U, H, At, C) = p(θU | Q)

$$L(r_i, A_t, M) = L\big((d_1, \ldots, d_N), \theta_U\big) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \,\|\, \theta_{d_i})$$

Since $p(\text{viewed} \mid d_1) \ge p(\text{viewed} \mid d_2) \ge \cdots$, the optimal ranking $R_t$ is given by ranking documents in ascending order of $D(\theta_U \,\|\, \theta_d)$.
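As a sketch, this ranking rule can be implemented directly: given unigram models as word-to-probability dicts, rank documents by ascending KL divergence from the query model. The constant `eps` here is a stand-in for proper smoothing (e.g., Dirichlet), which the slide does not specify.

```python
import math

def kl_divergence(theta_u, theta_d, eps=1e-12):
    # D(theta_U || theta_d) = sum_w p(w|theta_U) * log(p(w|theta_U) / p(w|theta_d));
    # eps guards against zero document probabilities (placeholder for real smoothing).
    return sum(p * math.log(p / (theta_d.get(w, 0.0) + eps))
               for w, p in theta_u.items() if p > 0.0)

def rank_documents(theta_u, doc_models):
    # Optimal ranking under the loss above: ascending divergence from theta_U.
    return sorted(doc_models, key=lambda d: kl_divergence(theta_u, doc_models[d]))

# Example:
# theta_u = {"jaguar": 0.5, "car": 0.5}
# doc_models = {"d1": {"jaguar": 0.3, "car": 0.4}, "d2": {"jaguar": 0.1, "mac": 0.6}}
# rank_documents(theta_u, doc_models)  ->  doc ids, best match first
```

Cases 2 and 3 below reuse the same rule; only the estimation of θU (from Q and H) and the candidate set (unseen docs) change.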

Page 17


Case 2: Implicit Feedback

– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

The loss function and optimal ranking are the same as in Case 1: rank documents in ascending order of D(θU ∥ θd). The difference is that θU is now estimated from both the query Q and the history H.

Page 18


Case 3: General Implicit Feedback

– At = "enter a query Q", or click the "Back" or "Next" button
– r(At) = all possible rankings of unseen docs in C
– M = (θU, S), where S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

Again the loss has the same form as in Case 1; the optimal response ranks the unseen documents (those in C but not in S) in ascending order of D(θU ∥ θd).

Page 19


Case 4: User-Specific Result Summary

– At = "enter a query Q"
– r(At) = {(D, η)}, D ⊆ C, |D| = k, η ∈ {"snippet", "overview"}
– M = (θU, n), n ∈ {0, 1}: "topic is new to the user"
– p(M | U, H, At, C) = p(θU, n | Q, H), with mode M* = (θ*, n*)

$$L(r_i, A_t, M) = L(D_i, \eta_i, \theta^*, n^*) = L(D_i, \theta^*) + L(\eta_i, n^*) = \sum_{d \in D_i} D(\theta^* \,\|\, \theta_d) + L(\eta_i, n^*)$$

Summary-type loss L(ηi, n*):

                  n* = 1    n* = 0
ηi = snippet        1         0
ηi = overview       0         1

Decision: choose the k most relevant docs; if the topic is new (n* = 1), give an overview summary, otherwise a regular snippet summary.
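A small sketch of the resulting decision rule, reusing the hypothetical `rank_documents` from the Case-1 sketch above; the function name and signature are illustrative, not from the original system.

```python
def result_with_summary(theta_star, n_star, doc_models, k):
    # Document part: the k docs closest to theta* minimize the summed divergence.
    top_k = rank_documents(theta_star, doc_models)[:k]
    # Summary part: read off the zero-loss cell of the table
    # (overview iff the topic is new to the user, n* = 1).
    eta = "overview" if n_star == 1 else "snippet"
    return top_k, eta
```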

Page 20

2. Statistical Language Models for Implicit Feedback

(personalized search without extra user effort)


Page 21


Risk Minimization for Implicit Feedback

– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

As in Case 2, the optimal ranking orders documents in ascending order of D(θU ∥ θd). The remaining problem: we need to estimate a context-sensitive LM θU from Q and H.

Page 22


Estimate a Context-Sensitive LM

User query history: Q1 (e.g., "Apple software"), Q2, ..., Qk (e.g., "Jaguar")
User clickthrough: C1 = {C1,1, C1,2, C1,3, ...}, C2 = {C2,1, C2,2, C2,3, ...}, ... (e.g., "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, ...")

User model:

$$p(w \mid \theta_k) = p(w \mid Q_k, Q_1, \ldots, Q_{k-1}, C_1, \ldots, C_{k-1}) = \,?$$

Page 23

Short-term vs. long-term implicit feedback
• Short-term implicit feedback
 – context = current retrieval session
 – past queries in the context are closely related to the current query
 – clickthroughs reflect the user's current interests
• Long-term implicit feedback
 – context = all search interaction history
 – not all past queries/clickthroughs are related to the current query


Page 24


"Bayesian interpolation" for short-term implicit feedback

Average the user's query history Q1, ..., Qk−1 and clickthrough history C1, ..., Ck−1 into history models:

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i), \qquad p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

Then combine them with the current query Qk via a Dirichlet prior:

$$p(w \mid \theta_k) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$$

Intuition: trust the current query Qk more if it is longer.
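A sketch of the interpolation in Python, assuming past queries and clickthroughs are already represented as unigram models (word-to-probability dicts); the function names are illustrative, not from the original system.

```python
from collections import Counter

def average_model(models):
    # p(w | H) = (1/(k-1)) * sum_i p(w | model_i); empty history -> empty model.
    avg = Counter()
    if not models:
        return avg
    for m in models:
        for w, p in m.items():
            avg[w] += p / len(models)
    return avg

def bayes_int(query_terms, past_queries, past_clicks, mu=0.2, nu=5.0):
    # p(w|theta_k) = [c(w,Q_k) + mu*p(w|H_Q) + nu*p(w|H_C)] / (|Q_k| + mu + nu)
    h_q, h_c = average_model(past_queries), average_model(past_clicks)
    counts = Counter(query_terms)                 # c(w, Q_k)
    denom = len(query_terms) + mu + nu            # |Q_k| + mu + nu
    vocab = set(counts) | set(h_q) | set(h_c)
    return {w: (counts[w] + mu * h_q[w] + nu * h_c[w]) / denom for w in vocab}
```

Because the history pseudo-counts μ and ν stay fixed while |Qk| grows with query length, a longer query automatically dominates the estimate, matching the intuition above.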

Page 25

Overall Effect of Search Context

Query      | FixInt (α=0.1, β=1.0) | BayesInt (μ=0.2, ν=5.0) | OnlineUp (μ=5.0, ν=15.0) | BatchUp (μ=2.0, ν=15.0)
           | MAP     pr@20         | MAP     pr@20           | MAP     pr@20            | MAP     pr@20
Q3         | 0.0421  0.1483        | 0.0421  0.1483          | 0.0421  0.1483           | 0.0421  0.1483
Q3+HQ+HC   | 0.0726  0.1967        | 0.0816  0.2067          | 0.0706  0.1783           | 0.0810  0.2067
Improve    | 72.4%   32.6%         | 93.8%   39.4%           | 67.7%   20.2%            | 92.4%   39.4%
Q4         | 0.0536  0.1933        | 0.0536  0.1933          | 0.0536  0.1933           | 0.0536  0.1933
Q4+HQ+HC   | 0.0891  0.2233        | 0.0955  0.2317          | 0.0792  0.2067           | 0.0950  0.2250
Improve    | 66.2%   15.5%         | 78.2%   19.9%           | 47.8%   6.9%             | 77.2%   16.4%

• Short-term context helps the system improve retrieval accuracy
• BayesInt is better than FixInt; BatchUp is better than OnlineUp


Page 26

Using Clickthrough Data Only

BayesInt (μ=0.0, ν=5.0)

Query    | MAP     | pr@20
Q3       | 0.0421  | 0.1483
Q3+HC    | 0.0766  | 0.2033
Improve  | 81.9%   | 37.1%
Q4       | 0.0536  | 0.1930
Q4+HC    | 0.0925  | 0.2283
Improve  | 72.6%   | 18.1%

Clickthrough is the major contributor.

Performance on unseen docs:

Query    | MAP     | pr@20
Q3       | 0.0331  | 0.125
Q3+HC    | 0.0661  | 0.178
Improve  | 99.7%   | 42.4%
Q4       | 0.0442  | 0.165
Q4+HC    | 0.0739  | 0.188
Improve  | 67.2%   | 13.9%

Query    | MAP     | pr@20
Q3       | 0.0421  | 0.1483
Q3+HC    | 0.0521  | 0.1820
Improve  | 23.8%   | 23.0%
Q4       | 0.0536  | 0.1930
Q4+HC    | 0.0620  | 0.1850
Improve  | 15.7%   | -4.1%

Snippets for non-relevant docs are still useful!

Page 27

Mixture model with dynamic weighting for long-term implicit feedback

[Figure: each past session Si (i = 1, ..., t−1), with its query qi, result docs Di, and clickthrough Ci, yields a session model θSi. The session models are mixed with weights λ1, ..., λt−1 into a history model θH, which is in turn interpolated with the current query model θq (weights 1−λq and λq) to form θq,H. The weights {λ} are selected to maximize P(Dt | θq,H), using the EM algorithm.]

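A sketch of the weight-fitting step, under the usual simplifying assumption that the component models (past session models plus the current query model) are held fixed and only the mixture weights λ are learned by EM from the word counts of Dt; variable names are illustrative, and the paper's exact variant may differ.

```python
def em_mixture_weights(doc_counts, components, iters=50, floor=1e-12):
    # doc_counts: word -> count in the current docs D_t.
    # components: list of fixed unigram models (word -> probability).
    k = len(components)
    lam = [1.0 / k] * k                      # uniform initialization
    for _ in range(iters):
        expected = [0.0] * k
        for w, c in doc_counts.items():
            # E-step: posterior that a token of w came from component i.
            joint = [lam[i] * max(components[i].get(w, 0.0), floor)
                     for i in range(k)]
            z = sum(joint)
            for i in range(k):
                expected[i] += c * joint[i] / z
        # M-step: renormalize expected token counts into new weights.
        total = sum(expected)
        lam = [e / total for e in expected]
    return lam
```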

Page 28

Results: Different Individual Search Models

recurring queries ≫ fresh queries

combination ≈ clickthrough > docs > query, contextless


Page 29

Results: Different Weighting Schemes for Overall History Model

hybrid ≈ EM > cosine > equal > contextless


Page 30


3. Open Challenges

• What is a query?
• How to collect as much context information as possible without infringing on user privacy?
• How to store and organize the collected context information?
• How to accurately interpret/exploit context information?
• How to formally represent the evolving information need of a user?
• How to optimize search results for an entire session?
• What is the right architecture (client-side, server-side, or a client-server combination)?

Page 31

References

• Framework
 – Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit User Modeling for Personalized Search. In Proceedings of CIKM 2005, pp. 824-831.
 – ChengXiang Zhai and John Lafferty. A Risk Minimization Framework for Information Retrieval. Information Processing and Management, 42(1), Jan. 2006, pp. 31-55.
• Short-term implicit feedback
 – Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-Sensitive Information Retrieval Using Implicit Feedback. In Proceedings of SIGIR 2005, pp. 43-50.
• Long-term implicit feedback
 – Bin Tan, Xuehua Shen, and ChengXiang Zhai. Mining Long-Term Search History to Improve Search Accuracy. In Proceedings of KDD 2006, pp. 718-723.


Page 32

Thank You!
