
Diversifying Search Results

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong

Search Labs, Microsoft Research

WSDM, February 10, 2009


Ambiguity and Diversification

• Many queries are ambiguous
  – "Barcelona" (City? Football team? Movie?)
  – "Michael Jordan" (which one?)

[Images: Michael I. Jordan and Michael J. Jordan]

How best to answer ambiguous queries?

• Use context, make suggestions, …

• Under the premise of returning a single (ordered) set of results, how best to diversify the search results so that a user will find something useful?


Intuition behind Our Approach

• Analyze click logs for classifying queries and docs

• Maximize the probability that the average user will find a relevant document in the retrieved results

• Use the analogy of marginal utility to determine whether to include more results from an already covered category


Outline

• Problem formulation

• Theoretical analysis

• Metrics to measure diversity

• Experiments


Assumptions

• A taxonomy (categorization of intents) C
  – For each query q, let P(c | q) denote the distribution over intents
  – $\sum_{c \in C} P(c \mid q) = 1$

• Quality assessment of documents at the intent level
  – For each doc d, let V(d | q, c) denote the probability of the doc satisfying the intent
  – Conditional independence across docs:

    $1 - V(d_1, d_2 \mid q, c) = \bigl(1 - V(d_1 \mid q, c)\bigr)\bigl(1 - V(d_2 \mid q, c)\bigr)$

    (e.g., with V(d₁ | q, c) = 0.9 and V(d₂ | q, c) = 0.5, the pair satisfies the intent with probability 1 − 0.1 × 0.5 = 0.95)

• Users are interested in finding at least one satisfying document


Problem Statement

DIVERSIFY(k)

• Given a query q, a set of documents D, a distribution P(c | q), quality estimates V(d | q, c), and an integer k

• Find a set of docs S ⊆ D with |S| = k that maximizes

  $P(S \mid q) \;=\; \sum_{c} P(c \mid q) \Bigl(1 - \prod_{d \in S} \bigl(1 - V(d \mid q, c)\bigr)\Bigr)$

  (the sum over c captures the multiple intents; the $1 - \prod$ term is the probability of finding at least one relevant doc)

• P(S | q) is interpreted as the probability that the set S is relevant to the query over all possible intents
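A minimal sketch of this objective in Python (function and variable names such as `p_relevant`, `p_intent`, and `quality` are illustrative, not from the paper):

```python
from typing import Dict, List

def p_relevant(S: List[str],
               p_intent: Dict[str, float],
               quality: Dict[str, Dict[str, float]]) -> float:
    """P(S | q): probability that at least one doc in S satisfies the
    user's intent, in expectation over the intent distribution P(c | q).

    p_intent[c]   -- P(c | q); values should sum to 1
    quality[d][c] -- V(d | q, c); missing entries are treated as 0
    """
    total = 0.0
    for c, p_c in p_intent.items():
        # Under conditional independence, the probability that NO doc
        # in S satisfies intent c is the product of the (1 - V) terms.
        p_none = 1.0
        for d in S:
            p_none *= 1.0 - quality.get(d, {}).get(c, 0.0)
        total += p_c * (1.0 - p_none)
    return total
```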


Discussion of Objective

• Makes explicit use of taxonomy
  – In contrast, similarity-based: [CG98], [CK06], [RKJ08]

• Captures both diversification and doc relevance
  – In contrast, coverage-based: [Z+05], [C+08], [V+08]

• Specific form of “loss minimization” [Z02], [ZL06]

• “Diminishing returns” for docs w/ the same intent

• Objective is order-independent
  – Assumes that all users read k results
  – May want to optimize $\sum_k P(k)\, P(S_k \mid q)$, where $S_k$ is the top-k results


Outline

• Problem formulation

• Theoretical analysis

• Metrics to measure diversity

• Experiments


Properties of the Objective

• DIVERSIFY(k) is NP-hard
  – Reduction from Max-Cover

• No single ordering that will optimize for all k

• Can we make use of “diminishing returns”?


A Greedy Algorithm

• Intent distribution: P(R | q) = 0.8, P(B | q) = 0.2

[Worked example reconstructed from the slide: five candidate docs with per-intent quality V(d | q, c); the greedy marginal gain is g(d | q, c) = U(c | q) · V(d | q, c), where U(c | q) starts at P(c | q) and is discounted after each pick.]

  Doc   Intent   V(d | q, c)   g, round 1      g, round 2      g, round 3
  d1    B        0.4           0.08            0.08 ← picked   —
  d2    R        0.9           0.72 ← picked   —               —
  d3    R        0.5           0.40            0.04            0.04
  d4    R        0.4           0.32            0.03            0.03
  d5    B        0.4           0.08            0.08            0.05 ← picked

  After each pick, U(c | q) for the covered intent shrinks by the factor (1 − V(d | q, c)):
  U(R | q): 0.8 → 0.08 (after d2); U(B | q): 0.2 → 0.12 (after d1) → 0.07 (after d5)

• Actually produces an ordered set of results S

• Results not proportional to intent distribution

• Results not according to (raw) quality

• Better results ⇒ fewer need to be shown
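A short Python sketch of this greedy selection (a reconstruction of the procedure described on the slide, not the authors' code; `greedy_diversify`, `p_intent`, and `quality` are illustrative names reusing the encoding from the objective sketch above):

```python
from typing import Dict, List

def greedy_diversify(k: int,
                     docs: List[str],
                     p_intent: Dict[str, float],
                     quality: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedily pick k docs, each round taking the doc with the highest
    marginal gain g(d | q, c) = U(c | q) * V(d | q, c), then discounting
    U for the covered intent by (1 - V)."""
    u = dict(p_intent)               # U(c | q), initialized to P(c | q)
    remaining = list(docs)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def gain(d: str) -> float:
            return sum(u[c] * quality.get(d, {}).get(c, 0.0) for c in u)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        for c in u:                  # diminishing returns for covered intents
            u[c] *= 1.0 - quality.get(best, {}).get(c, 0.0)
    return selected

# The worked example from the slide: P(R|q) = 0.8, P(B|q) = 0.2
p_intent = {"R": 0.8, "B": 0.2}
quality = {"d1": {"B": 0.4}, "d2": {"R": 0.9}, "d3": {"R": 0.5},
           "d4": {"R": 0.4}, "d5": {"B": 0.4}}
print(greedy_diversify(3, ["d1", "d2", "d3", "d4", "d5"],
                       p_intent, quality))   # ['d2', 'd1', 'd5']
```

The U(c | q) update is what produces the diminishing returns: once d2 covers intent R well, d3's raw quality of 0.5 no longer beats the 0.4-quality B docs.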


Formal Claims

Lemma 1. P(S | q) is submodular.
  – Same intuition as diminishing returns
  – For sets of documents S ⊆ T and a document d,

    $P(S \cup \{d\} \mid q) - P(S \mid q) \;\ge\; P(T \cup \{d\} \mid q) - P(T \mid q)$

Theorem 1. The greedy solution is a (1 − 1/e) approximation of the optimum.
  – Consequence of Lemma 1 and [NWF78]

Theorem 2. The greedy solution is optimal when each document can satisfy only one category.
  – Relative quality of docs does not change
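Lemma 1 can also be sanity-checked numerically. A brute-force check of the inequality over the toy instance from the greedy slide (illustrative only; a check on one instance is of course not a proof):

```python
import itertools

def p_relevant(S, p_intent, quality):
    """P(S | q) under the conditional-independence assumption."""
    total = 0.0
    for c, p_c in p_intent.items():
        p_none = 1.0
        for d in S:
            p_none *= 1.0 - quality.get(d, {}).get(c, 0.0)
        total += p_c * (1.0 - p_none)
    return total

p_intent = {"R": 0.8, "B": 0.2}
quality = {"d1": {"B": 0.4}, "d2": {"R": 0.9}, "d3": {"R": 0.5},
           "d4": {"R": 0.4}, "d5": {"B": 0.4}}
docs = set(quality)

# For every nested pair S ⊆ T and every doc d outside T, the marginal
# gain of adding d to S must be at least its marginal gain on T.
for T_size in range(1, len(docs) + 1):
    for T in itertools.combinations(sorted(docs), T_size):
        for S_size in range(T_size + 1):
            for S in itertools.combinations(T, S_size):
                for d in docs - set(T):
                    gain_S = (p_relevant(S + (d,), p_intent, quality)
                              - p_relevant(S, p_intent, quality))
                    gain_T = (p_relevant(T + (d,), p_intent, quality)
                              - p_relevant(T, p_intent, quality))
                    assert gain_S >= gain_T - 1e-12
print("Lemma 1 inequality holds on all nested pairs of this instance")
```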


Outline

• Problem formulation

• Theoretical analysis

• Metrics to measure diversity

• Experiments


How to Measure Success?

• Many metrics for relevance
  – Normalized discounted cumulative gain at k (NDCG@k) [JK00]
  – Mean average precision at k (MAP@k)
  – Mean reciprocal rank (MRR)

• Some metrics for diversity
  – Maximal marginal relevance (MMR) [CG98]
  – Nugget-based instantiation of NDCG [C+08]

• Want a metric that can take into account both relevance and diversity


Generalizing Relevance Metrics

• Take expectation over distribution of intents
  – Interpretation: how will the average user feel?

• Consider NDCG@k
  – Classic:

    $\mathrm{DCG}(S; k) = \sum_{j=1}^{k} f(\mathrm{relevance}(S_j)) / \mathrm{discount}(j)$

    $\mathrm{NDCG}(S; k) = \mathrm{DCG}(S; k) / \max_R \mathrm{DCG}(R; k)$

  – NDCG-IA depends on intent distribution and intent-specific NDCG:

    $\mathrm{NDCG\text{-}IA}(S; k) = \sum_{c} P(c \mid q)\, \mathrm{NDCG}(S; k \mid c)$
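A sketch of NDCG-IA in Python, assuming the instantiation stated later under Experiment Detail (f(rel) = 2^rel, discount(j) = 1 + log₂ j); `ndcg_ia`, `rel`, and the per-intent relevance encoding are illustrative:

```python
import math
from typing import Dict, List

def dcg(rels: List[float], k: int) -> float:
    """DCG@k with f(rel) = 2**rel and discount(j) = 1 + log2(j),
    where j is the 1-based rank."""
    return sum((2.0 ** rels[i]) / (1.0 + math.log2(i + 1))
               for i in range(min(k, len(rels))))

def ndcg_ia(ranked: List[str], k: int,
            p_intent: Dict[str, float],
            rel: Dict[str, Dict[str, float]]) -> float:
    """NDCG-IA@k: intent-weighted average of per-intent NDCG@k.
    rel[c][d] is the graded relevance of doc d under intent c."""
    total = 0.0
    for c, p_c in p_intent.items():
        rels = [rel.get(c, {}).get(d, 0.0) for d in ranked]
        # Ideal ordering for intent c: its own relevances, descending.
        ideal = sorted(rel.get(c, {}).values(), reverse=True)
        max_dcg = dcg(ideal, k)
        if max_dcg > 0:
            total += p_c * dcg(rels, k) / max_dcg
    return total
```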


Outline

• Problem formulation

• Theoretical analysis

• Metrics to measure diversity

• Experiments


Setup

• 10,000 queries randomly sampled from logs
  – Queries classified according to ODP (level 2) [F+08]
  – Keep only queries with at least two intents (~900)

• Top 50 results from Live, Google, and Yahoo!

• Documents are rated on a 5-pt scale
  – >90% of docs have ratings
  – Docs w/o ratings are assigned a random grade according to the distribution of rated documents

[Histogram: Query Count (y-axis, 0–600) by Category Count (x-axis, 2–10)]


Experiment Detail

• Documents are classified using a Rocchio classifier
  – Assumes that each doc belongs to only one category

• Quality scores of documents are estimated based on textual and link features of the webpage
  – Our approach is agnostic to how quality is determined
  – Can be interpreted as a re-ordering of search results that takes into account ambiguities in queries

• Evaluation using generalized NDCG, MAP, and MRR
  – $f(\mathrm{relevance}(d)) = 2^{\mathrm{rel}(d)}$; $\mathrm{discount}(j) = 1 + \log_2 j$
  – Take P(c | q) as ground truth


NDCG-IA

[Bar chart: NDCG-IA@3, NDCG-IA@5, NDCG-IA@10 (y-axis: NDCG-IA value, up to 0.30) for Diverse vs. Engine 1, Engine 2, Engine 3]


MAP-IA and MRR-IA

[Bar chart: MAP-IA@3, MAP-IA@5, MAP-IA@10 (y-axis: MAP-IA value, up to 0.70) for Diverse vs. Engine 1, Engine 2, Engine 3]

[Bar chart: MRR-IA@3, MRR-IA@5, MRR-IA@10 (y-axis: MRR-IA value, up to 0.70) for Diverse vs. Engine 1, Engine 2, Engine 3]


Evaluation using Mechanical Turk

• Created two types of HITs on Mechanical Turk
  – Query classification: workers are asked to choose among three interpretations
  – Document rating (under the given interpretation)

• Two additional evaluations
  – MT classification + current ratings
  – MT classification + MT document ratings


[Bar chart: MAP-IA@3, MAP-IA@5, MAP-IA@10 under the Mechanical Turk evaluation (y-axis: MAP-IA value, up to 0.60) for Diverse vs. Engine 1, Engine 2, Engine 3]

[Bar chart: NDCG-IA@3, NDCG-IA@5, NDCG-IA@10 under the Mechanical Turk evaluation (y-axis: NDCG-IA value, up to 0.25) for Diverse vs. Engine 1, Engine 2, Engine 3]

[Bar chart: MRR-IA@3, MRR-IA@5, MRR-IA@10 under the Mechanical Turk evaluation (y-axis: MRR-IA value, up to 0.60) for Diverse vs. Engine 1, Engine 2, Engine 3]


Concluding Remarks

• Theoretical approach to diversification supported by empirical evaluation

• What to show is a function of both intent distribution and quality of documents
  – Less is needed when quality is high

• There is additional flexibility in our approach
  – Not tied to any taxonomy
  – Can make use of context as well


Future Work

• When is it right to diversify?
  – Users have certain expectations about the workings of a search engine

• What is the best way to diversify?
  – Evaluate approaches beyond diversifying the retrieved results

• Metrics that capture both relevance and diversity
  – Some preliminary work suggests that there will be certain trade-offs to make


Thanks

{rakesha, sreenig, alanhal, sieong}@microsoft.com