
Learning to Rank for Information Retrieval

Tie-Yan Liu

Lead Researcher

Microsoft Research Asia


Outline

• Ranking in Information Retrieval

• Learning to Rank

– Introduction to Learning to Rank

– Pointwise approach

– Pairwise approach

– Listwise approach

• Summary


Introduction to Ranking in IR


Inside Search Engine

[Architecture diagram of a search engine. Offline part: Crawler, Pages, Web Page Parser, Links & Anchors, Web Graph Builder, Web Graph, Link Analysis, Link Map, Page Ranks, Index Builder, Inverted Index, Page & Site Statistics, Cached Pages. Online part: User Interface, Query, Caching, Relevance Ranking.]


Ranking in Information Retrieval

[Diagram: given a query q (e.g., q = {WWW 2008}), the ranking model scores the documents d_1, d_2, …, d_n retrieved from the document index (here, pages such as www2008.org and related URLs) and outputs a ranking of documents.]


Widely-used Judgments

• Binary judgment

– Relevant vs. Irrelevant

• Multi-level ratings

– Perfect > Excellent > Good > Fair > Bad

• Pairwise preferences

– Document A is more relevant than document B with respect to query q


Evaluation Measures

• MRR (Mean Reciprocal Rank)

– For query q_i, let r_i be the rank position of its first relevant document

– MRR: average of 1/r_i over all queries

• WTA (Winners Take All)

– If the top-ranked document is relevant: 1; otherwise: 0

– Averaged over all queries

• MAP (Mean Average Precision)

• NDCG (Normalized Discounted Cumulative Gain)


MAP

• Precision at position n

• Average precision

• MAP: averaged over all queries in the test set

P@n = #{relevant documents in top n results} / n

AP = Σ_n ( P@n · I{document n is relevant} ) / #{relevant documents}

Example (three relevant documents, retrieved at positions 1, 3, and 5):

AP = (1/1 + 2/3 + 3/5) / 3 ≈ 0.76
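For concreteness, a minimal sketch of AP and MAP in Python; binary relevance labels in ranked order are assumed, and all relevant documents are assumed to appear in the list:

```python
def average_precision(rels):
    """rels: 0/1 relevance labels of the ranked results, top first."""
    hits, precisions = 0, []
    for n, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / n)        # P@n, counted only at relevant positions
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """MAP: AP averaged over all queries in the test set."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

print(round(average_precision([1, 0, 1, 0, 1]), 2))   # the worked example above: 0.76
```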


NDCG

• NDCG at position n:

• NDCG averaged for all queries in test set

N(n) = Z_n · Σ_{j=1}^{n} (2^{r(j)} − 1) / log(1 + j)
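A minimal sketch of this computation; r(j) is the graded label of the document at position j, Z_n is taken from the ideal ordering, and log base 2 is used as a common choice:

```python
import math

def dcg_at_n(rels, n):
    """rels: graded relevance labels r(j) in ranked order, top first."""
    return sum((2 ** r - 1) / math.log2(1 + j)         # gain discounted by log of position
               for j, r in enumerate(rels[:n], start=1))

def ndcg_at_n(rels, n):
    ideal = dcg_at_n(sorted(rels, reverse=True), n)    # Z_n: normalizer from the ideal ranking
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

print(ndcg_at_n([3, 2, 3, 0, 1, 2], 5))
```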


More about Evaluation Measures

• Query-level

– Averaged over all queries.

– The measure is bounded, and every query contributes equally to the averaged measure.

• Position

– A position discount is used in most of these measures: errors at top positions cost more.

• Rating

– Different ratings may correspond to different gains.


Learning to Rank for IR

Overview

Pointwise Approach

Pairwise Approach

Listwise Approach


Conventional IR Models

Similarity-based models: Boolean model, Vector space model, Latent Semantic Indexing

Probabilistic models: BM25 model, Language model for IR

Hyperlink-based models: HITS, PageRank

…


Problems with Conventional Ranking Models

• For a particular model

– Parameter tuning is difficult, especially when there are many parameters to tune.

• For comparison between two models

– Given a test set, when one model is over-tuned (over-fitting) while the other is not, how can we compare them?

• For a collection of models

– There are hundreds of models proposed in the literature, and it is non-trivial to combine them.


Machine Learning Can Help

• Machine learning is an effective tool

– To automatically tune parameters

– To combine multiple sources of evidence

– To avoid over-fitting (regularization framework, structural risk minimization, etc.)


Framework of Learning to Rank for IR

[Framework diagram. Training data: queries q^(1), …, q^(m); each query q^(i) has documents d_1^(i), …, d_{n_i}^(i) with labels (multi-level ratings such as 4, 2, 1 and 5, 3, 2). The Learning System takes this training data and outputs a ranking model f(q, d, w). At test time, the Ranking System applies the model to a new query q and its documents d_1, …, d_n, producing a score f(q, d_i, w) for each document and hence a ranked list.]
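To make the test-time path concrete, here is a minimal sketch of scoring and ranking with a linear model f(q, d, w) = <w, x(q, d)>; the feature vectors and weights below are placeholder values:

```python
import numpy as np

def rank_documents(w, feature_vectors):
    """Score documents of one query with f(q, d, w) = <w, x(q, d)> and sort by descending score.
    feature_vectors: one feature vector x(q, d) per candidate document."""
    scores = np.asarray(feature_vectors) @ w
    order = np.argsort(-scores)                 # document indices, best first
    return order, scores[order]

w = np.array([0.6, 0.3, 0.1])                   # produced by the Learning System (toy values)
X = np.array([[0.9, 0.2, 0.5],                  # x(q, d1)
              [0.1, 0.8, 0.3],                  # x(q, d2)
              [0.4, 0.4, 0.9]])                 # x(q, d3)
print(rank_documents(w, X))
```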


Connection with Conventional Machine Learning

• With certain assumptions, learning to rank can be transformed (reduced) to conventional machine learning problems

– Regression (assume labels to be scores)

– Classification (assume labels to have no orders)

– Pairwise classification (assume document pairs to be independent of each other)


Difference from Conventional Machine Learning

• Unique properties of ranking in IR

– Role of query

• Query determines the logical structure of the ranking problem.

• Loss function should be defined on ranked lists w.r.t. a query, since IR measures are computed at query level.

– Relative order is important

• No need to predict category, or value of f(x)

– Position

• Top-ranked objects are more important


Categorization of Learning to Rank Methods

Pointwise Approach, reduced to classification or regression: Discriminative model for IR (SIGIR 2004), McRank (NIPS 2007), …

Pairwise Approach, reduced to classification or regression: Ranking SVM (ICANN 1999), RankBoost (JMLR 2003), LDM (SIGIR 2005), RankNet (ICML 2005), Frank (SIGIR 2007), GBRank (SIGIR 2007), QBRank (NIPS 2007), MPRank (ICML 2007), …

Pairwise Approach, treated as a new problem: IRSVM (SIGIR 2006), MHR (SIGIR 2007), …

Listwise Approach, treated as a new problem: AdaRank (SIGIR 2007), SVM-MAP (SIGIR 2007), SoftRank (LR4IR 2007), GPRank (LR4IR 2007), CCA (SIGIR 2007), LambdaRank (NIPS 2006), RankCosine (IP&M 2007), ListNet (ICML 2007), ListMLE (ICML 2008), …


Learning to Rank for IR

Overview

Pointwise Approach

Pairwise Approach

Listwise Approach


Overview of Pointwise Approach

• Reduce ranking to regression or classification on single documents.

• Examples:

– Discriminative model for IR.

– McRank: learning to rank using multi-class classification and gradient boosting.


Discriminative Model for IR (R. Nallapati, SIGIR 2004)

• Features extracted for each document

• Treat relevant documents as positive examples, irrelevant documents as negative examples.

• Use SVM or MaxEnt to perform discriminative learning.


McRank (P. Li, et al. NIPS 2007)

• Loss in terms of DCG is upper bounded (although loosely) by the multi-class classification error

• Use gradient boosting to perform multi-class classification

– K trees per boosting iteration

– Tree outputs converted to probabilities using logistic function

– Convert classification to ranking:


Discussions

• Assumption

– Relevance is absolute and query-independent.

• However, in practice, relevance is query-dependent

– Relevance labels are assigned by comparing the documents associated with the same query.

– An irrelevant document for a popular query might have higher term frequency than a relevant document for a rare query.


Learning to Rank for IR

Overview

Pointwise Approach

Pairwise Approach

Listwise Approach


Overview of Pairwise Approach

• No longer assume absolute relevance.

• Reduce ranking to classification on document pairs (i.e. pairwise classification) w.r.t. the same query.

• Examples

– RankNet, Frank, RankBoost, Ranking SVM, etc.

Transform: a query q^(i) with documents d_1^(i), d_2^(i), …, d_n^(i) labeled 5, 3, 2, … becomes a set of document pairs (d_1^(i), d_2^(i)), (d_1^(i), d_3^(i)), (d_2^(i), d_3^(i)), … with preferences

5 > 3, 5 > 2, 3 > 2
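A minimal sketch of this transformation (graded labels per query are assumed):

```python
from itertools import combinations

def to_preference_pairs(docs, labels):
    """Return (preferred, other) document pairs from graded labels for one query."""
    pairs = []
    for (di, yi), (dj, yj) in combinations(zip(docs, labels), 2):
        if yi > yj:
            pairs.append((di, dj))      # di should be ranked above dj
        elif yj > yi:
            pairs.append((dj, di))
    return pairs

print(to_preference_pairs(["d1", "d2", "d3"], [5, 3, 2]))
# [('d1', 'd2'), ('d1', 'd3'), ('d2', 'd3')]  -> 5>3, 5>2, 3>2
```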


RankNet (C. Burges, et al. ICML 2005)

• Target posteriors:

• Modeled posteriors:

• Define

• New loss function: cross entropy (CE loss)

• Model output probabilities using logistic function:
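A sketch of the RankNet cross-entropy loss as described in Burges et al. (2005): the score difference o_ij = f(x_i) − f(x_j) is mapped to a modeled posterior by the logistic function and compared to the target posterior with cross entropy:

```python
import math

def ranknet_loss(s_i, s_j, p_bar):
    """Cross-entropy (CE) loss for one document pair.
    s_i, s_j: model scores f(x_i), f(x_j); p_bar: target posterior that i should rank above j."""
    o_ij = s_i - s_j                          # score difference
    p_ij = 1.0 / (1.0 + math.exp(-o_ij))      # modeled posterior via the logistic function
    return -p_bar * math.log(p_ij) - (1 - p_bar) * math.log(1 - p_ij)

print(ranknet_loss(2.0, 1.0, 1.0))   # correctly ordered pair -> small loss
print(ranknet_loss(1.0, 2.0, 1.0))   # inverted pair -> larger loss
```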


RankNet (2)

• Use Neural Network as model, and gradient descent as algorithm, to optimize the cross-entropy loss

• Evaluate on single documents: output a relevance score for each document w.r.t. a new query


RankBoost (Y. Freund, et al. JMLR 2003)

• Given: initial distribution D_1 over the document pairs

• For t = 1, …, T:

– Train a weak learner using distribution D_t

– Get weak ranking h_t: X → R

– Choose α_t ∈ R

– Update:

D_{t+1}(x_0, x_1) = D_t(x_0, x_1) · exp( α_t (h_t(x_0) − h_t(x_1)) ) / Z_t

where Z_t = Σ_{x_0, x_1} D_t(x_0, x_1) · exp( α_t (h_t(x_0) − h_t(x_1)) )

• Output the final ranking: H(x) = Σ_{t=1}^{T} α_t h_t(x)


Use AdaBoost to perform pairwise classification
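A minimal sketch of the distribution update and the final ranker; choosing α_t and training the weak learner are left abstract:

```python
import math

def rankboost_update(D, h, alpha):
    """One RankBoost distribution update over crucial pairs (x0, x1), where x1 should rank above x0.
    D: dict mapping pairs to weights; h: weak ranking function; alpha: chosen weight alpha_t."""
    new_D = {(x0, x1): w * math.exp(alpha * (h(x0) - h(x1))) for (x0, x1), w in D.items()}
    Z = sum(new_D.values())                      # normalizer Z_t
    return {pair: w / Z for pair, w in new_D.items()}

def final_ranker(weak_rankers, alphas):
    """H(x) = sum_t alpha_t * h_t(x)."""
    return lambda x: sum(a * h(x) for a, h in zip(alphas, weak_rankers))

# toy usage: items are document ids, the weak ranker scores them via a lookup
D0 = {("d2", "d1"): 0.5, ("d3", "d1"): 0.3, ("d3", "d2"): 0.2}   # (lower, higher) crucial pairs
h1 = {"d1": 1.0, "d2": 0.2, "d3": 0.5}.get
print(rankboost_update(D0, h1, alpha=0.5))
```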


Ranking SVM (R. Herbrich, et al. ICANN 1999)

min_{w, ξ} (1/2)·||w||² + C·Σ_i ξ_i

s.t. z_i · ⟨w, x_i^(1) − x_i^(2)⟩ ≥ 1 − ξ_i, ξ_i ≥ 0

f(x; ŵ) = ⟨ŵ, x⟩

Use SVM to perform pairwise classification

x^(1) − x^(2) is used as the learning instance

Use SVM to perform binary classification on these instances to learn the model parameter w

Use w for testing
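A rough sketch of this recipe with scikit-learn's linear SVM (assumed available); the toy feature matrix, graded labels, and query ids are placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC   # scikit-learn is assumed to be installed

def make_pairwise_data(X, y, qid):
    """Build difference vectors x^(1) - x^(2) and labels z from graded labels within each query."""
    diffs, signs = [], []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        for a in idx:
            for b in idx:
                if y[a] > y[b]:                       # a is preferred to b
                    diffs.append(X[a] - X[b]); signs.append(1)
                    diffs.append(X[b] - X[a]); signs.append(-1)
    return np.array(diffs), np.array(signs)

# toy data: one query, three documents with graded labels 5, 3, 2
X = np.array([[0.9, 0.1], [0.5, 0.4], [0.2, 0.3]])
y = np.array([5, 3, 2])
qid = np.array([1, 1, 1])

Xp, zp = make_pairwise_data(X, y, qid)
w = LinearSVC(C=1.0).fit(Xp, zp).coef_.ravel()        # learned model parameter w
print(X @ w)                                          # f(x; w) = <w, x>, used to rank documents
```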


Other Work in this Category

• Frank: A Fidelity Loss for Information Retrieval, (M. Tsai, T. Liu, et al. SIGIR 2007)

• Linear discriminant model for information retrieval (LDM), (J. Gao, H. Qi, et al. SIGIR 2005)

• A Regression Framework for Learning Ranking Functions using Relative Relevance Judgments (GBRank), (Z. Zheng, H. Zha, et al. SIGIR 2007)

• A General Boosting Method and its Application to Learning Ranking Functions for Web Search (QBRank), (Z. Zheng, H. Zha, et al. NIPS 2007)

• Magnitude-preserving Ranking Algorithms (MPRank), (C. Cortes, M. Mohri, et al. ICML 2007)


Discussions

• Progress made as compared to the pointwise approach

– No longer assume absolute relevance

• However,

– Unique properties of ranking in IR have not been fully modeled.


Problems with Pairwise Approach (1)

• Number of instance pairs varies according to query


Problems with Pairwise Approach (2)

• Negative effects of making errors on top positions

d: definitely relevant, p: partially relevant, n: not relevant

ranking 1: p d p n n n n (one wrong pair) Worse
ranking 2: d p n p n n n (one wrong pair) Better


Problems with Pairwise Approach (3)

• Different rank pairs correspond to different ranking criteria

– Perfect:

• Uniqueness

• Domain knowledge

• Authority

– Excellent / Good:

• Content match

– Bad:

• Detrimental


As a Result, …

• Users only see ranked list of documents, but not document pairs

• It is not clear how the pairwise loss correlates with IR evaluation measures, which are defined at the query level.

[Figure: pairwise loss vs. (1 − NDCG@5) on a TREC dataset.]


Incremental Solution to Problems (1)(2)

IRSVM: Change the Loss of Ranking SVM (Y. Cao, J. Xu, T. Liu, et al. SIGIR 2006)

Implicit position discount: larger weight for more critical pairs

Query-level normalizer


Incremental Solution to Problems (3)

Multi-Hyperplane Ranker (T. Qin, T. Liu, et al. SIGIR 2007)

• Suppose there are k-level ratings

– m=k*(k-1)/2 different types of document pairs: • (rank 1, rank 2), (rank 1, rank 3), (rank 2, rank 3), …

• Divide-and-conquer approach

– Train m hyperplanes using Ranking SVM, with each hyperplane only focusing on one type of document pairs.

– Combine ranking results given by these hyperplanes to compose final ranked list, by means of rank aggregation.


More Fundamental Solution

Listwise Approach to Ranking

• Directly operate on ranked lists

– Directly optimize IR evaluation measures

• AdaRank,

• SVM-MAP,

• SoftRank,

• RankGP, …

– Define listwise loss functions

• RankCosine,

• ListNet,

• ListMLE, …


Learning to Rank for IR

Overview

Pointwise Approach

Pairwise Approach

Listwise Approach


Overview of Listwise Approach

• Instead of reducing ranking to regression and classification, perform learning directly on document list.

– Treats ranked lists as learning instances.

• Two major categories

– Directly optimize IR evaluation measures

– Define listwise loss functions


Directly Optimize IR Evaluation Measures

• It is a natural idea to directly optimize what is used to evaluate the ranking results.

• However, it is non-trivial.

– Evaluation measures such as NDCG are usually non-smooth and non-differentiable.

– It is challenging to optimize such objective functions, since most of the optimization techniques were developed to work with smooth and differentiable cases.


Tackle the Challenges

• Use optimization technologies designed for smooth and differentiable problems

– Fit the optimization of evaluation measure into the logic of existing optimization technique (AdaRank)

– Optimize a smooth and differentiable upper bound of evaluation measure (SVM-MAP)

– Soften (approximate) the evaluation measure so as to make it smooth and differentiable, and easy to optimize (SoftRank)

• Use optimization technologies designed for non-smooth and non-differentiable problems

– Genetic programming (RankGP)


AdaRank (J. Xu, et al. SIGIR 2007)

• Use Boosting to perform the optimization.

• Embed IR evaluation measure in the distribution update of Boosting.
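A rough sketch of an AdaRank-style boosting loop, as an illustration of the two bullets above rather than the paper's exact pseudocode; evaluate(scorer, q) stands for the embedded IR measure (e.g., MAP or NDCG in [0, 1]) and weak_rankers is an assumed pool of candidate scoring functions:

```python
import math

def adarank(queries, weak_rankers, evaluate, T=50):
    """weak_rankers: scoring functions h(q, d); evaluate(scorer, q): IR measure of scorer on query q."""
    P = [1.0 / len(queries)] * len(queries)                      # query weight distribution
    ensemble = []                                                # (alpha, weak ranker) pairs

    def f(q, d):                                                 # current combined ranker
        return sum(a * h(q, d) for a, h in ensemble)

    for _ in range(T):
        # choose the weak ranker doing best on the currently highly weighted queries
        h = max(weak_rankers,
                key=lambda h: sum(p * evaluate(h, q) for p, q in zip(P, queries)))
        num = sum(p * (1 + evaluate(h, q)) for p, q in zip(P, queries))
        den = sum(p * (1 - evaluate(h, q)) for p, q in zip(P, queries))
        alpha = 0.5 * math.log(num / den)                        # weak ranker weight
        ensemble.append((alpha, h))
        # re-weight queries: those the ensemble still ranks poorly get larger weight
        expo = [math.exp(-evaluate(f, q)) for q in queries]
        P = [e / sum(expo) for e in expo]
    return f
```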


Theoretical Analysis

• Training error will be continuously reduced during learning phase when <1


Practical Solution

• In practice, we cannot guarantee <1, therefore, AdaRank might not converge.

• Solution: when a weak ranker is selected, we test whether <1, if not, the second best weak ranker is selected.


SVM-MAP (Y. Yue, et al. SIGIR 2007)

• Use SVM framework for optimization

• Incorporate IR evaluation measures in the listwise constraints.

• Sum of slacks ∑ξ upper bounds 1-E(y,y’).

min_{w, ξ} (1/2)·||w||² + (C/N)·Σ_i ξ_i

s.t. ∀ y' ≠ y: F(q, y, d) ≥ F(q, y', d) + ( 1 − E(y', y) ) − ξ


Challenges in SVM-MAP

• For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front, e.g.,

• An incorrect labeling would be any other ranking, e.g.,

• Exponential number of rankings, thus an exponential number of constraints!


Structural SVM Training

• STEP 1: Perform optimization on only current working set of constraints.

• STEP 2: Use the model learned in STEP 1 to find the most violated constraint from the exponential set of constraints.

– This step can also be very fast, since MAP is invariant to the order of documents within a relevance class.

• STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set, add it to the working set.

Steps 1-3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005]


SoftRank (M. Taylor, et al. LR4IR 2007)

• Add additive noise to the ranking result, and thus make NDCG “soft” enough for optimization.


Define the probability of a document being ranked at a particular position


From Scores to Rank Distribution

• Consider the score s_j of each document as a random variable

• Probability that doc_i is ranked before doc_j

• Expected rank

• Rank Distribution pj(r)
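A minimal sketch of the first two quantities under a Gaussian score assumption (each score s_j ~ N(mean_j, sigma²)); the full SoftRank rank-distribution recursion is more involved and is omitted here:

```python
import math

def pairwise_prob(mean_i, mean_j, sigma):
    """pi_ij: probability that doc i scores higher than doc j when each score has Gaussian noise."""
    z = (mean_i - mean_j) / (math.sqrt(2.0) * sigma)       # score difference ~ N(mean_i - mean_j, 2*sigma^2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_ranks(means, sigma=1.0):
    """Expected rank of each document: 1 plus the probabilities of being beaten by the others."""
    n = len(means)
    return [1.0 + sum(pairwise_prob(means[j], means[i], sigma)
                      for j in range(n) if j != i)
            for i in range(n)]

print(expected_ranks([2.0, 1.0, 0.5]))
```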


Soft NDCG

• From hard to soft

• Neural networks for optimization


RankGP (J. Yeh, et al. LR4IR 2007)

• Define ranking function as a tree

• Use genetic programming to learn the optimal tree

• GP is widely used to optimize non-smooth, non-differentiable objectives.

• Crossover, mutation, reproduction, tournament selection.

• Fitness function = IR evaluation measure.


Other Work in this Category

• Learning to Rank with Nonsmooth Cost Functions , (C.J.C. Burges, R. Ragno, et al. NIPS 2006 )

• A combined component approach for finding collection-adapted ranking functions based on genetic programming (CCA), (H. Almeida, M. Goncalves, et al. SIGIR 2007).


Discussions

• Progress has been made as compared to pointwise and pairwise approaches

• Still many unsolved problems

– Is the upper bound in SVM-MAP tight enough?

– Is SoftNDCG highly correlated with NDCG?

– Is there any theoretical justification of the correlation between the implicit loss function used in LambdaRank and NDCG?

– Multiple measures are not necessarily consistent with each other: which one should be optimized?


Define Listwise Loss Function

• Indirectly optimizing IR evaluation measures

– Consider properties of ranking for IR.

• Examples

– RankCosine: use cosine similarity between score vectors as the loss function

– ListNet: use KL divergence between permutation probability distributions as the loss function

– ListMLE: use the negative log likelihood of the ground-truth permutation as the loss function

• Theoretical analysis on listwise loss functions


RankCosine (T. Qin, T. Liu, et al. IP&M 2007)

• Loss function: defined from the cosine similarity between the score vector produced by the ranking function and the score vector of the ground truth

• Properties

– Insensitive to number of documents / document pairs per query

– Emphasizes correct ranking at the top, by setting high scores for top positions in the ground truth.

– Bounded:


Optimization

• Additive model as ranking function

• Loss over queries

• Gradient descent for optimization


More Investigation on Loss Functions

• A simple example:

– function f: f(A)=3, f(B)=0, f(C)=1 → ranking ACB

– function h: h(A)=4, h(B)=6, h(C)=3 → ranking BAC

– ground truth g: g(A)=6, g(B)=4, g(C)=3 → ranking ABC

• Question: which function is closer to the ground truth?

– Based on pointwise similarity: sim(f,g) < sim(g,h).

– Based on pairwise similarity: sim(f,g) = sim(g,h).

– Based on cosine similarity between score vectors f: <3,0,1>, g: <6,4,3>, h: <4,6,3>: sim(f,g) = 0.85 < sim(g,h) = 0.93, so h looks closer.

• However, according to position discount, f should be closer to g!


Permutation Probability Distribution

• Question:

– How to represent a ranked list?

• Solution

– Ranked list ↔ Permutation probability distribution

– More informative representation for a ranked list: permutations and ranked lists have a 1-1 correspondence.


Luce Model: Defining Permutation Probability

• Probability of permutation 𝜋 is defined as

• Example:

P_s(π) = Π_{j=1}^{n} [ φ(s_{π(j)}) / Σ_{k=j}^{n} φ(s_{π(k)}) ]

P_f(ABC) = [ φ(f(A)) / ( φ(f(A)) + φ(f(B)) + φ(f(C)) ) ] · [ φ(f(B)) / ( φ(f(B)) + φ(f(C)) ) ] · [ φ(f(C)) / φ(f(C)) ]

– First factor: P(A ranked No.1)

– Second factor: P(B ranked No.2 | A ranked No.1) = P(B ranked No.1) / (1 − P(A ranked No.1))

– Third factor: P(C ranked No.3 | A ranked No.1, B ranked No.2)
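A minimal sketch of the Luce-model permutation probability in Python (φ = exp is assumed, as in ListNet):

```python
import math

def permutation_probability(scores, perm, phi=math.exp):
    """Luce-model probability of ranking `perm` (a list of item indices) given per-item scores."""
    p = 1.0
    for j in range(len(perm)):
        numer = phi(scores[perm[j]])
        denom = sum(phi(scores[perm[k]]) for k in range(j, len(perm)))   # items not yet placed
        p *= numer / denom
    return p

g = {"A": 6.0, "B": 4.0, "C": 3.0}              # the slide's ground truth g
s = [g["A"], g["B"], g["C"]]
print(permutation_probability(s, [0, 1, 2]))    # P(ABC): largest
print(permutation_probability(s, [2, 1, 0]))    # P(CBA): smallest
```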


Properties of Luce Model

• Nice properties

– Continuous, differentiable, and convex w.r.t. score s.

– Suppose g(A) = 6, g(B)=4, g(C)=3, P(ABC) is largest and P(CBA) is smallest; for the ranking based on f: ABC, swap any two, probability will decrease, e.g., P(ABC) > P(ACB)

– Translation invariant, when φ(s) = exp(s)

– Scale invariant, when φ(s) = a·s


Distance between Ranked Lists

Using KL divergence to measure the difference between the permutation probability distributions (with φ = exp):

dis(f,g) = 0.46

dis(g,h) = 2.56


ListNet Algorithm (Z. Cao, T. Qin, T. Liu, et al. ICML 2007)

• Loss function = KL-divergence between two permutation probability distributions (φ = exp)

• Model = Neural Network

• Algorithm = Gradient Descent

L(y, s(w)) = − Σ_{q∈Q} Σ_{j=1}^{n} P_y(j) · log P_{s(w)}(j), with P_s(j) = exp(s_j) / Σ_{k=1}^{n} exp(s_k) and s_j = ⟨w, X_j⟩ (top-1 form; the general top-k form is analogous)
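A minimal sketch of the top-1 ListNet loss and its gradient for a linear scoring function; the toy query and step size are placeholders:

```python
import numpy as np

def listnet_top1_loss(y_scores, z_scores):
    """Cross entropy between the top-1 distributions induced by ground-truth scores y and model scores z."""
    p_y = np.exp(y_scores) / np.exp(y_scores).sum()
    p_z = np.exp(z_scores) / np.exp(z_scores).sum()
    return float(-(p_y * np.log(p_z)).sum())

def listnet_gradient(X, y_scores, w):
    """Gradient of the top-1 loss w.r.t. a linear scoring model z_j = <w, X_j>."""
    z = X @ w
    p_y = np.exp(y_scores) / np.exp(y_scores).sum()
    p_z = np.exp(z) / np.exp(z).sum()
    return X.T @ (p_z - p_y)                    # softmax cross-entropy gradient

# gradient descent for one query (toy numbers)
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
y = np.array([5.0, 3.0, 2.0])                   # graded labels mapped to ground-truth scores
w = np.zeros(2)
for _ in range(100):
    w -= 0.1 * listnet_gradient(X, y, w)
print(listnet_top1_loss(y, X @ w))
```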


Experimental Results

[Figure: training performance on a TREC dataset, pairwise (RankNet) vs. listwise (ListNet); the listwise loss is more correlated with the evaluation measure.]


Limitations of ListNet

• Too complex in practice.

• Performance of ListNet depends on the mapping function from ground truth label to scores

– Ground truth are permutations (ranked list) or group of permutations.

– Ground truth labels are not naturally scores.


Solution

• Never assume ground truth to be scores.

• Given the ranking model, compute the likelihood of a ground truth permutation, and maximize it to learn the model parameter.


Likelihood Loss

• Top-k Likelihood

• Regularized Likelihood Loss


ListMLE Algorithm (F. Xia, T. Liu, et al. ICML 2008)

• Loss function = likelihood loss

• Model = Neural Network

• Algorithm = Stochastic Gradient Descent
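A minimal sketch of the (unregularized, full-permutation) likelihood loss, i.e. the negative log-likelihood of the ground-truth permutation under the Luce model with φ = exp:

```python
import numpy as np

def listmle_loss(scores, perm):
    """Negative log-likelihood of the ground-truth permutation under the Luce model (phi = exp).
    scores: model scores for the documents; perm: document indices in ground-truth order."""
    s = np.asarray(scores)[list(perm)]
    loss = 0.0
    for j in range(len(s)):
        loss += np.log(np.exp(s[j:]).sum()) - s[j]    # -log( exp(s_j) / sum_{k>=j} exp(s_k) )
    return float(loss)

print(listmle_loss([2.0, 1.0, 0.5], [0, 1, 2]))   # scores agree with ground truth -> smaller loss
print(listmle_loss([2.0, 1.0, 0.5], [2, 1, 0]))   # scores disagree -> larger loss
```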

[Figure: experimental results on the TD2004 dataset.]


Summary

• Ranking is a new application, and also a new machine learning problem.

• Learning to rank: from the pointwise and pairwise approaches to the listwise approach.

• Many open questions.


[Timeline figure, 2000 to 2008, with methods arranged along a pointwise / pairwise / listwise axis: Discriminative IR model, McRank; Ranking SVM, RankBoost, RankNet, FRank, LDM, GBRank, MPRank, QBRank, IRSVM, MHR; AdaRank, SVM-MAP, SoftRank, RankCosine, RankGP, CCA, LambdaRank, SRA, ListNet, ListMLE; plus Relational Ranking and Rank Refinement. (* papers published by our group)]

Page 71: Learning to Rank for Information Retrieval to Rank for Information Retrieval Tie-Yan Liu Lead Researcher Microsoft Research Asia 4/23/2008 Tie-Yan Liu @ Renmin University 1 Outline

• R. Herbrich, T. Graepel, et al. Support Vector Learning for Ordinal Regression, ICANN 1999.

• T. Joachims, Optimizing Search Engines Using Clickthrough Data, KDD 2002.

• Y. Freund, R. Iyer, et al. An Efficient Boosting Algorithm for Combining Preferences, JMLR 2003.

• R. Nallapati, Discriminative model for information retrieval, SIGIR 2004.

• J. Gao, H. Qi, et al. Linear discriminant model for information retrieval, SIGIR 2005.

• C.J.C. Burges, T. Shaked, et al. Learning to Rank using Gradient Descent, ICML 2005.

• I. Tsochantaridis, T. Joachims, et al. Large Margin Methods for Structured and Interdependent Output Variables, JMLR, 2005.

• F. Radlinski, T. Joachims, Query Chains: Learning to Rank from Implicit Feedback, KDD 2005.

• I. Matveeva, C.J.C. Burges, et al. High accuracy retrieval with multiple nested ranker, SIGIR 2006.

• C.J.C. Burges, R. Ragno, et al. Learning to Rank with Nonsmooth Cost Functions, NIPS 2006.

• Y. Cao, J. Xu, et al. Adapting Ranking SVM to Information Retrieval, SIGIR 2006.

• Z. Cao, T. Qin, et al. Learning to Rank: From Pairwise to Listwise Approach, ICML 2007.

References (2)

• T. Qin, T.-Y. Liu, et al, Multiple hyperplane Ranker, SIGIR 2007.

• T.-Y. Liu, J. Xu, et al. LETOR: Benchmark dataset for research on learning to rank for information retrieval, LR4IR 2007.

• M.-F. Tsai, T.-Y. Liu, et al. FRank: A Ranking Method with Fidelity Loss, SIGIR 2007.

• Z. Zheng, H. Zha, et al. A General Boosting Method and its Application to Learning Ranking Functions for Web Search, NIPS 2007.

• Z. Zheng, H. Zha, et al, A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgment, SIGIR 2007.

• M. Taylor, J. Guiver, et al. SoftRank: Optimising Non-Smooth Rank Metrics, LR4IR 2007.

• J.-Y. Yeh, J.-Y. Lin, et al. Learning to Rank for Information Retrieval using Genetic Programming, LR4IR 2007.

• T. Qin, T.-Y. Liu, et al, Query-level Loss Function for Information Retrieval, Information Processing and Management, 2007.

• X. Geng, T.-Y. Liu, et al, Feature Selection for Ranking, SIGIR 2007.

• Y. Yue, T. Finley, et al. A Support Vector Method for Optimizing Average Precision, SIGIR 2007.


References (3)

• Y.-T. Liu, T.-Y. Liu, et al, Supervised Rank Aggregation, WWW 2007.

• J. Xu and H. Li, A Boosting Algorithm for Information Retrieval, SIGIR 2007.

• F. Radlinski, T. Joachims. Active Exploration for Learning Rankings from Clickthrough Data, KDD 2007.

• P. Li, C. Burges, et al. McRank: Learning to Rank Using Classification and Gradient Boosting, NIPS 2007.


• C. Cortes, M. Mohri, et al. Magnitude-preserving Ranking Algorithms, ICML 2007.

• H. Almeida, M. Goncalves, et al. A combined component approach for finding collection-adapted ranking functions based on genetic programming, SIGIR 2007.

• T. Qin, T.-Y. Liu, et al, Learning to Rank Relational Objects and Its Application to Web Search, WWW 2008.

• R. Jin, H. Valizadegan, et al. Ranking Refinement and Its Application to Information Retrieval, WWW 2008.

• F. Xia. T.-Y. Liu, et al. Listwise Approach to Learning to Rank – Theory and Algorithm, ICML 2008.

• Y. Lan, T.-Y. Liu, et al. On Generalization Ability of Learning to Rank Algorithms, ICML 2008.


Advertisement

• SIGIR 2008 workshop on learning to rank

– http://research.microsoft.com/users/LR4IR-2008/

– Submission due: May 16.

• SIGIR 2008 tutorial on learning to rank

– Include more new algorithms

– Include more theoretical discussions


Advertisement (2)

• LETOR: Benchmark Dataset for Research on Learning to Rank

– http://research.microsoft.com/users/LETOR/

– T. Liu, J. Xu, et al. LR4IR 2007



[email protected]

http://research.microsoft.com/users/tyliu/