Diversifying Search Results

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong
Search Labs, Microsoft Research

WSDM ’09




Outline

• Introduction
• Problem Formulation
• A Greedy Algorithm for DIVERSIFY(K)
• Performance Metrics
• Evaluation
• Conclusions

Introduction

• Minimize the risk of dissatisfaction of the average user

• Assume that there exist
  – a taxonomy
  – a model of user intents

• Consider both the relevance of the documents and diversity of the search result

• Trade off relevance against diversity

Problem Formulation

• Showing results for each category in proportion to the percentage of users interested in that category may perform poorly

• Example: Flash
  – Technology: 0.6

Problem Formulation

• The objective is defined over an unordered set of results
• Our algorithm is also designed to generate an ordering of results rather than just a set of results
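For reference, the set objective behind DIVERSIFY(k) can be written as below. This is a sketch of the formulation as we understand it from the accompanying paper. The symbols R(q) (the candidate documents for query q) and V(d|q,c) (the quality of document d for q under category c, in [0, 1]) do not appear on these slides, so treat the notation as an assumption.

```latex
\[
\text{DIVERSIFY}(k):\quad
\max_{S \subseteq R(q),\ |S| = k}\;
\sum_{c} p(c \mid q)\,
\Bigl(1 - \prod_{d \in S}\bigl(1 - V(d \mid q, c)\bigr)\Bigr)
\]
```

The value depends only on which documents are in S, not on the order in which they are listed.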

Problem Formulation

• DIVERSIFY(k) is NP-hard
• The optimal set for DIVERSIFY(k-1) need not be a subset of the optimal set for DIVERSIFY(k)
• Example:
  – p(c1|q) = p(c2|q) = 0.5
  – DIVERSIFY(1): d1, d2, d3
  – DIVERSIFY(2): d2, d3, d1
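The non-subset property can be checked by brute force on a tiny instance. The sketch below assumes the objective is the probability that at least one selected document satisfies the user, and it uses hypothetical quality values (0.6, 1.0) that are not taken from the slides. Only p(c1|q) = p(c2|q) = 0.5 comes from the example above.

```python
from itertools import combinations

def objective(docs, intent_probs, doc_values):
    """Probability that at least one document in `docs` satisfies the user."""
    total = 0.0
    for category, prob in intent_probs.items():
        miss = 1.0  # chance that every selected document fails this intent
        for doc in docs:
            miss *= 1.0 - doc_values[doc].get(category, 0.0)
        total += prob * (1.0 - miss)
    return total

def best_set(k, intent_probs, doc_values):
    """Exhaustively find the best k-subset (exponential; tiny inputs only)."""
    candidates = combinations(sorted(doc_values), k)
    return max(candidates, key=lambda s: objective(s, intent_probs, doc_values))

# Hypothetical instance: d1 is moderately good for both intents,
# d2 and d3 are each perfect for exactly one intent.
INTENTS = {"c1": 0.5, "c2": 0.5}
VALUES = {
    "d1": {"c1": 0.6, "c2": 0.6},
    "d2": {"c1": 1.0},
    "d3": {"c2": 1.0},
}

print(best_set(1, INTENTS, VALUES))  # ('d1',)
print(best_set(2, INTENTS, VALUES))  # ('d2', 'd3')
```

With these values the best single document is d1, yet the best pair is {d2, d3}, which does not contain d1.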

A Greedy Algorithm for DIVERSIFY(K)
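A minimal sketch of the greedy selection (called IA-Select later in this deck), assuming the usual formulation: keep a residual weight U(c) for each category, initialised to p(c|q); repeatedly pick the document with the largest sum of U(c) · V(d|q,c); then discount each category the chosen document covers by (1 − V(d|q,c)). The function name, argument layout, and numbers are illustrative assumptions, not the authors' code.

```python
def ia_select(k, intent_probs, doc_values):
    """Greedily pick up to k documents, returned in selection order.

    intent_probs: category -> p(c|q)
    doc_values:   document -> {category: V(d|q,c) in [0, 1]}
    """
    residual = dict(intent_probs)   # U(c): intent mass not yet satisfied
    remaining = dict(doc_values)    # candidates not yet selected
    selected = []
    while len(selected) < k and remaining:
        def gain(doc):
            # Marginal utility of doc given what is already selected.
            return sum(residual.get(c, 0.0) * v
                       for c, v in remaining[doc].items())
        best = max(remaining, key=gain)
        selected.append(best)
        # Discount every category that the chosen document covers.
        for category, value in remaining.pop(best).items():
            if category in residual:
                residual[category] *= 1.0 - value
    return selected

# Hypothetical data: c1 is the dominant intent, d1 and d2 both serve it.
INTENTS = {"c1": 0.7, "c2": 0.3}
VALUES = {
    "d1": {"c1": 0.9},
    "d2": {"c1": 0.8},
    "d3": {"c2": 0.6},
}

print(ia_select(2, INTENTS, VALUES))  # ['d1', 'd3']
```

Once d1 covers most of c1, the residual weight of c1 drops to 0.07, so the second slot goes to d3 even though c1 is the more popular intent. The selection order also serves as the ranking.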

Performance Metrics

• NDCG, MRR, and MAP do not take into account the value of diversification

• Intent-aware measure example:
  – p(c2|q) >> p(c1|q)
  – d1 is Excellent for c1 (but unrelated to c2)
  – d2 is Good for c2 (but unrelated to c1)
  – Classical IR metrics: d1, d2
  – Intent-aware measures: d2, d1

Intent Aware Measure
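One way to make a classical metric intent-aware is to compute it separately for each category and average the results weighted by p(c|q). The sketch below does this for NDCG and reproduces the d1/d2 example. The numeric grades (Excellent = 4, Good = 3), the probabilities 0.1 and 0.9, and the use of the raw grade as the gain are all assumptions made for illustration.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))

def ndcg(ranking, grades, k):
    """NDCG@k for a single intent; `grades` maps document -> grade."""
    gains = [grades.get(doc, 0.0) for doc in ranking[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_ia(ranking, intent_probs, grades_by_intent, k):
    """Intent-aware NDCG: per-intent NDCG weighted by p(c|q)."""
    return sum(prob * ndcg(ranking, grades_by_intent.get(c, {}), k)
               for c, prob in intent_probs.items())

# Hypothetical numbers for the slide's example: c2 is far more likely.
INTENTS = {"c1": 0.1, "c2": 0.9}
GRADES = {
    "c1": {"d1": 4},  # d1 is Excellent for c1, unrelated to c2
    "c2": {"d2": 3},  # d2 is Good for c2, unrelated to c1
}

score_d1_first = ndcg_ia(["d1", "d2"], INTENTS, GRADES, k=2)
score_d2_first = ndcg_ia(["d2", "d1"], INTENTS, GRADES, k=2)
print(score_d2_first > score_d1_first)  # True
```

Under these assumptions the ordering d2, d1 scores about 0.96 and d1, d2 about 0.67, so the intent-aware measure prefers the document that serves the more likely intent first.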

Evaluation

• Evaluate our approach against three commercial search engines

• Conduct three sets of experiments
• The experiments differ in how the distributions of intents and the relevance of the documents are obtained

Experiment 1

• Obtain the distributions of intents for both queries and documents via standard classifiers

• Obtain the relevance of documents from a proprietary repository of human judgments to which we have been granted access

• Dataset: 10,000 random queries with the top 50 documents

• Many of the documents in the top 10 for each query have human judgments

Experiment 1

• Sample about 900 queries with
  – at least two categories
  – a significant fraction of associated documents that have human judgments

Experiment 1

Experiment 2

• Obtain the distributions of intents for queries and the document relevance using the Amazon Mechanical Turk platform

• Sample 200 queries from the dataset with
  – at least three categories

• Submit these queries, along with the three most likely categories as estimated by the classifier and the top five results produced by IA-Select, to the Turks

Experiment 2

Experiment 3

• IA-Select: uses p(c|q) obtained from the Amazon Mechanical Turk platform

• Metrics: p(c|q) and document relevance are the same as those used in Experiment 1

Conclusions

• Provide a greedy algorithm with good approximation guarantees

• To evaluate the effectiveness of our approach, we proposed generalizations of well-studied metrics that take into account the intents of the users

• Our approach outperforms results produced by commercial search engines over all of the metrics
