Diversifying Search Results
WSDM 2009
Intelligent Database Systems Lab.
School of Computer Science & Engineering
Seoul National University
Center for E-Business Technology, Seoul National University, Seoul, Korea
Presented by Sung Eun Park, 1/25/2011
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong (Microsoft Research)
Copyright 2010 by CEBT
Contents
Introduction
Intuition
Preliminaries
Model
Problem Formulation
Complexity
Greedy algorithm
Evaluation
Measure
Empirical analysis
Introduction
Ambiguity and diversification
For ambiguous queries, diversification may help users find at least one relevant document
Ex) The other day, we were trying to find the meaning of the word "왕건".
– In the context of "우와 저거 진짜 왕건이다" ("Wow, that's a real whopper"), where "왕건" is slang for something big
– But the search results were all about King Wang Geon (왕건), the founder of Goryeo
[Images: King Wang Geon vs. "왕건" as slang for a big thing]
Preliminaries
Problem Formulation
A document d fails to satisfy a user who issues query q with the intended category c
Multiple intents: a query may have several plausible intended categories
Objective: the probability that at least one document in the result set satisfies the user's intended category
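Written out (reconstructed from the definitions on this slide: P(c | q) is the intent distribution and V(d | q, c) the probability that document d satisfies intent c), the objective to maximize over result sets S is:

```latex
P(S \mid q) = \sum_{c} P(c \mid q)\left(1 - \prod_{d \in S}\bigl(1 - V(d \mid q, c)\bigr)\right)
```

For each intent: one minus the probability that every selected document fails it, weighted by how likely that intent is.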
Complexity
A Greedy Algorithm
Let R(q) be the top k documents selected by some classical ranking algorithm for the target query. The algorithm reorders R(q) to maximize the objective P(S | q).
Input: k, q, C, D, P(c | q), V(d | q, c)
Output: set of documents S
[Worked example figure: per-document V(d | q, c) values, the marginal utilities g(d | q, c), the intent utilities U(R | q) = 0.8 and U(B | q) = 0.2 shrinking by (1 − V(d* | q, c)) after each pick, and the resulting greedy selection S]
• Produces an ordered set of results
• Results not proportional to intent distribution
• Results not according to (raw) quality
Greedy Algorithm (IA-SELECT)
Input: k, q, C, D, P(c | q), V (d | q, c)
Output: set of documents S
When documents may belong to multiple categories, IA-SELECT is no longer guaranteed to be optimal. (Note that this problem is NP-hard.)
S ← ∅
∀c ∈ C, U(c | q) ← P(c | q)
while |S| < k do
  for d ∈ D do
    g(d | q, c) ← Σc U(c | q) V(d | q, c)
  end for
  d* ← argmax g(d | q, c)
  S ← S ∪ {d*}
  ∀c ∈ C, U(c | q) ← (1 − V(d* | q, c)) U(c | q)
  D ← D \ {d*}
end while
return S
Marginal Utility
U(c | q): probability that intent c is still unsatisfied (initialized to P(c | q))
g(d | q, c): current probability that d satisfies the remaining intents of q
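The greedy loop above can be sketched directly in Python; the data-structure choices (dicts keyed by category and by (doc, category) pairs) are my own, not from the paper:

```python
from typing import Dict, List, Tuple

def ia_select(
    k: int,
    categories: List[str],
    docs: List[str],
    p: Dict[str, float],                  # P(c | q): intent distribution of the query
    v: Dict[Tuple[str, str], float],      # V(d | q, c): prob. that doc d satisfies intent c
) -> List[str]:
    """Greedy IA-SELECT: repeatedly pick the document with the highest
    marginal utility against the still-unsatisfied intents."""
    u = dict(p)                # U(c | q): prob. that intent c is still unsatisfied
    remaining = list(docs)
    selected: List[str] = []
    while remaining and len(selected) < k:
        # g(d | q, c) summed over categories: expected gain of adding d.
        best = max(remaining, key=lambda d: sum(
            u[c] * v.get((d, c), 0.0) for c in categories))
        selected.append(best)
        # Discount each intent by the chance the chosen doc already satisfied it.
        for c in categories:
            u[c] *= 1.0 - v.get((best, c), 0.0)
        remaining.remove(best)
    return selected
```

Note how a document that serves a high-probability intent wins the first round, after which that intent's utility shrinks and documents serving other intents become competitive.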
Classical IR Measures(1)
1. Doc 1, rel=3
2. Doc 2, rel=3
3. Doc 3, rel=2
4. Doc 4, rel=0
5. Doc 5, rel=1
6. Doc 6, rel=2
Result Doc Set
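A graded relevance list like the one above is the standard input to DCG/NDCG, which the paper later extends to NDCG-IA; the slide does not name the measure, so treat this as an illustrative sketch (using the linear-gain DCG variant):

```python
import math

def dcg(rels):
    """Discounted cumulative gain: rel at rank i discounted by log2(i + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

The slide's list [3, 3, 2, 0, 1, 2] scores below 1.0 because the rel=0 document sits above two relevant ones.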
Classical IR Measures(2)
RR, MRR
Navigational Search / Question Answering
– A need for a few high-ranked results
Reciprocal Rank
– How far is the answer document from rank 1?
– Example) answer at rank 2 → RR = 1/2 = 0.5
Mean Reciprocal Rank
– Mean of the RR over the query test set
1. Doc N
2. Doc P
3. Doc N
4. Doc N
5. Doc N
Result Doc Set
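The definitions above in code (a result list is a sequence of booleans, True = relevant):

```python
def reciprocal_rank(results):
    """RR: 1/rank of the first relevant result, 0 if none is retrieved."""
    for rank, relevant in enumerate(results, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(result_lists):
    """MRR: mean of RR over a query test set."""
    return sum(reciprocal_rank(r) for r in result_lists) / len(result_lists)
```

The slide's list (N, P, N, N, N) gives RR = 1/2 = 0.5.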
Classical IR Measures(3)
MAP
Average Precision
– (1.00 + 1.00 + 0.75 + 0.67 + 0.38) / 6 = 0.633
– Precision at each relevant document's rank, summed and divided by the total number of relevant documents (relevant documents that are never retrieved contribute 0)
Mean Average Precision
– Average of the average-precision values over a set of queries
– MAP = (AP1 + AP2 + ... + APn) / (# of queries)
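The AP/MAP computation, sketched in Python (binary relevance; total_relevant counts all relevant documents for the query, retrieved or not):

```python
def average_precision(results, total_relevant):
    """AP: precision@rank at each relevant rank, divided by the total number
    of relevant documents; unretrieved relevant documents contribute 0."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(results, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0

def mean_average_precision(runs):
    """MAP: average AP over queries; runs = [(results, total_relevant), ...]."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)
```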
Evaluation Measure
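The paper generalizes each classical measure to an intent-aware version (NDCG-IA, MAP-IA, MRR-IA) by taking the expectation over the intent distribution: Metric-IA(q) = Σc P(c | q) · Metric(results judged w.r.t. c). A minimal sketch; the function and argument names are illustrative, not from the paper:

```python
from typing import Callable, Dict, Sequence

def intent_aware(
    metric: Callable[[Sequence[float]], float],
    p_intent: Dict[str, float],                  # P(c | q)
    rels_by_intent: Dict[str, Sequence[float]],  # ranked list judged w.r.t. each intent
) -> float:
    """Metric-IA: expectation of a classical metric over the query's intents."""
    return sum(p * metric(rels_by_intent[c]) for c, p in p_intent.items())
```

Any of the measures from the previous slides can be plugged in as `metric`, as long as the result list is re-judged per intent.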
Empirical Evaluation
10,000 queries randomly sampled from logs
Queries classified according to ODP (level 2)
Keep only queries with at least two intents (~900)
Top 50 results from Live, Google, and Yahoo!
Documents are rated on a 5-pt scale; >90% of docs have ratings
Docs without ratings are assigned a random grade according to the distribution of rated documents
[Diagram: a query classifier maps each query to ODP category intents; documents are judged per (query, category) pair against a proprietary repository of human judgments]
Results
NDCG-IA
MAP-IA and MRR-IA
Evaluation using Mechanical Turk
Sample 200 queries from the dataset used in Experiment 1
[Task layout: the query shown together with its candidate categories category1, category2, category3]
Workers first choose the category they most closely associate with the given query
1. Doc 1, rel=?
2. Doc 2, rel=?
3. Doc 3, rel=?
4. Doc 4, rel=?
5. Doc 5, rel=?
Result Doc Set
Judge the corresponding results with respect to the chosen category using the same 4-point scale
Evaluation using Mechanical Turk
Conclusion
Studied how best to diversify results in the presence of ambiguous queries
Provided a greedy algorithm for the objective with good approximation guarantees (the objective is submodular, so greedy achieves a (1 − 1/e)-approximation)
Q&A
Thank you