Diversifying Search Results Rakesh Agrawal Sreenivas Gollapudi Search Labs Search Labs Microsoft Research Microsoft Research [email protected] [email protected] Alan Halverson Samuel Ieong Search Labs Search Labs Microsoft Research Microsoft Research [email protected] [email protected] WSDM ’09

Diversifying Search Results



Page 1: Diversifying Search Results

Diversifying Search Results

Rakesh Agrawal, Search Labs, Microsoft Research, [email protected]
Sreenivas Gollapudi, Search Labs, Microsoft Research, [email protected]
Alan Halverson, Search Labs, Microsoft Research, [email protected]
Samuel Ieong, Search Labs, Microsoft Research, [email protected]

WSDM ’09

Page 2: Diversifying Search Results

Outline

• Introduction
• Problem Formulation
• A Greedy Algorithm for DIVERSIFY(K)
• Performance Metrics
• Evaluation
• Conclusions

Page 3: Diversifying Search Results

Introduction

• Minimize the risk of dissatisfaction of the average user

• Assume that there exist
  – a taxonomy
  – a model of user intents

• Consider both the relevance of the documents and diversity of the search result

• Trade off relevance against diversity

Page 4: Diversifying Search Results

Problem Formulation

• Choosing the number of results to show for each category in proportion to the percentage of users interested in that category may perform poorly

• Example: Flash
  – Technology: 0.6

Page 5: Diversifying Search Results

Problem Formulation

• Non-order
• Our algorithm is also designed to generate an ordering of results rather than just a set of results

Page 6: Diversifying Search Results

Problem Formulation

• DIVERSIFY(k) is NP-hard
• The documents optimal for DIVERSIFY(k-1) need not be a subset of the documents optimal for DIVERSIFY(k)
• Example:
  – p(c1|q) = p(c2|q) = 0.5
  – DIVERSIFY(1): d1, d2, d3
  – DIVERSIFY(2): d2, d3, d1
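The non-subset property can be checked by brute force on a toy instance. The sketch below is illustrative only: the slide gives just p(c1|q) = p(c2|q) = 0.5, so the per-document values in `V` and the objective in `satisfaction` (the probability that at least one selected document satisfies the average user) are assumptions, not values taken from the slides.

```python
from itertools import combinations

# From the slide: both intents are equally likely.
P_CATEGORY = {"c1": 0.5, "c2": 0.5}

# Assumed (hypothetical) values: V[d][c] is the probability that
# document d satisfies a user whose intent is category c.
V = {
    "d1": {"c1": 0.6, "c2": 0.6},
    "d2": {"c1": 1.0, "c2": 0.0},
    "d3": {"c1": 0.0, "c2": 1.0},
}


def satisfaction(docs):
    """Assumed objective: chance that at least one document in docs
    satisfies the average user."""
    total = 0.0
    for category, p_category in P_CATEGORY.items():
        miss = 1.0  # probability that every selected document fails
        for doc in docs:
            miss *= 1.0 - V[doc][category]
        total += p_category * (1.0 - miss)
    return total


def best_set(k):
    """Exhaustively search all k-subsets for the highest objective."""
    return max(combinations(sorted(V), k), key=satisfaction)


print(best_set(1), best_set(2))
```

Under these assumed values, the single best document is d1, while the best pair is d2 and d3, so the optimal set for k = 1 is not contained in the optimal set for k = 2.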

Page 7: Diversifying Search Results

A Greedy Algorithm for DIVERSIFY(K)
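A minimal sketch of a greedy selection of this kind, assuming each document has a per-category value and that picking a document reduces the remaining weight of every intent it covers. The function name `greedy_diversify`, the data layout, and the example numbers are hypothetical; this is not necessarily the authors' exact procedure.

```python
def greedy_diversify(p_category, value, k):
    """Return up to k documents, picking at each step the document
    with the largest marginal utility given what is already chosen.

    p_category: {category: p(c|q)}
    value: {doc: {category: value of doc for that category, in [0, 1]}}
    """
    remaining_mass = dict(p_category)  # intent weight not yet satisfied
    candidates = set(value)
    ranking = []
    while candidates and len(ranking) < k:

        def gain(doc):
            # Marginal utility: value weighted by still-unsatisfied intent.
            return sum(remaining_mass[c] * v for c, v in value[doc].items())

        best = max(sorted(candidates), key=gain)
        ranking.append(best)
        candidates.remove(best)
        # Discount the intents that the chosen document already covers.
        for c, v in value[best].items():
            remaining_mass[c] *= 1.0 - v
    return ranking


# Hypothetical example: two strong documents for the dominant intent c1
# and one document for the minority intent c2.
p = {"c1": 0.7, "c2": 0.3}
docs = {"a": {"c1": 0.9}, "b": {"c1": 0.8}, "c": {"c2": 0.7}}
print(greedy_diversify(p, docs, 3))
```

In this example the second pick is `c` rather than `b`: once `a` has mostly covered c1, the minority intent c2 offers the larger marginal gain. The selection order doubles as a ranking, which matches the goal of producing an ordering rather than just a set.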

Page 8: Diversifying Search Results

Performance Metrics

• NDCG, MRR, and MAP do not take into account the value of diversification

• Intent Aware Measure example:
  – p(c2|q) >> p(c1|q)
  – d1 is Excellent for c1 (but unrelated to c2)
  – d2 is Good for c2 (but unrelated to c1)
  – Classical IR metrics: d1, d2
  – Intent aware measures: d2, d1
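The example can be reproduced numerically. The sketch below assumes an intent-aware NDCG computed as the p(c|q)-weighted average of per-category NDCG, and a "classical" NDCG that scores each document by its best label regardless of intent. The probabilities (0.1 and 0.9) and the numeric gains (Excellent = 3, Good = 2) are hypothetical choices made only for illustration.

```python
import math

# Assumed numbers: the slide only states p(c2|q) >> p(c1|q).
P_CATEGORY = {"c1": 0.1, "c2": 0.9}
# Assumed gains: Excellent = 3, Good = 2, unrelated = 0.
GAIN = {"d1": {"c1": 3, "c2": 0}, "d2": {"c1": 0, "c2": 2}}


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def classical_ndcg(ranking):
    # Intent-agnostic view: each document counts with its best label.
    return ndcg([max(GAIN[d].values()) for d in ranking])


def intent_aware_ndcg(ranking):
    # Weight the per-category NDCG by how likely each intent is.
    return sum(
        p * ndcg([GAIN[d][c] for d in ranking]) for c, p in P_CATEGORY.items()
    )


for order in (["d1", "d2"], ["d2", "d1"]):
    print(order, round(classical_ndcg(order), 3), round(intent_aware_ndcg(order), 3))
```

With these assumed numbers, the classical score prefers d1, d2 because Excellent outranks Good, while the intent-aware score prefers d2, d1 because most users hold intent c2.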

Page 9: Diversifying Search Results

Intent Aware Measure

Page 10: Diversifying Search Results

Evaluation

• Evaluate our approach against three commercial search engines

• Conduct three sets of experiments
• The experiments differ in how the distributions of intents and the relevance of the documents are obtained

Page 11: Diversifying Search Results

Experiment 1

• The distributions of intents for both queries and documents are obtained via standard classifiers

• The relevance of documents is obtained from a proprietary repository of human judgments that we have been granted access to

• Dataset : 10,000 random queries with top 50 documents

• Many of the top 10 documents for each query have been assigned human judgments

Page 12: Diversifying Search Results

Experiment 1

• Sample about 900 queries, each with
  – at least two categories
  – a significant fraction of associated documents that have human judgments

Page 13: Diversifying Search Results

Experiment 1

Page 14: Diversifying Search Results

Experiment 2

• Obtain the distributions of intents for queries and the document relevance using the Amazon Mechanical Turk platform

• Sample 200 queries from the dataset
  – at least three categories

• Submit these queries, along with the three most likely categories as estimated by the classifier and the top five results produced by IA-Select, to the Turks

Page 15: Diversifying Search Results

Experiment 2

Page 16: Diversifying Search Results

Experiment 3

• IA-Select: p(c|q) is obtained from the Amazon Mechanical Turk platform

• Metrics: p(c|q) and the document relevance are the same as those used in Experiment 1

Page 17: Diversifying Search Results

Conclusions

• Provide a greedy algorithm with good approximation guarantees

• To evaluate the effectiveness of our approach, we proposed generalizations of well-studied metrics that take into account the intentions of the users

• Our approach outperforms results produced by commercial search engines over all of the metrics