
Pranking with Ranking
Koby Crammer and Yoram Singer

Lecture: Dudu Yanay

The Problem

Input: Each instance is associated with a rank or a rating, i.e. an integer from '1' to 'k'.

Goal: To find a rank-prediction rule which assigns to each instance a rank that is as close as possible to the instance's true rank.

Similar problems:
◦ Classification.
◦ Regression.

Natural Setting For…

Information Retrieval.

Collaborative filtering: Predict a user's rating on new items (books, movies, etc.) given the user's past ratings of similar items.

Possible Solutions

To cast the rating problem as a regression problem.

To reduce a total order into a set of preferences over pairs.
◦ Time consuming, since it might require increasing the sample size from $n$ to $O(n^2)$.

Online Algorithm (Littlestone 1988):
◦ Each prediction can be computed in polynomial time.
◦ If the problem is separable, then after a polynomial number of mistakes the learner no longer makes a mistake.

Let's try another approach…

[Animation of the online learning model: in each round the teacher presents an instance $x_i$ to the learner, the learner predicts $h_i(x_i)$, and the teacher answers yes/no according to whether $h_i(x_i) = f(x_i)$. Animation from Nader Bshouty's course.]

The PERCEPTRON algorithm

[Animation of the Perceptron in the plane: the weight vector $(w_1, w_2)$ is updated whenever a point is misclassified. Animation from Nader Bshouty's course.]

The PERCEPTRON algorithm

1) $w \leftarrow 0$; $i \leftarrow 1$;
2) Get $(x_i, y_i)$.
3) Predict $z = \mathrm{sign}(w^T x_i)$.
4) If Mistake ($z \neq y_i$)
   4.2) $w \leftarrow w + y_i x_i$;
   4.3) $i \leftarrow i + 1$;
5) Goto 2.

A slide from Nader Bshouty's course.
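A minimal sketch of this loop in Python with NumPy; the data matrix `X`, the labels `y` in {-1, +1}, and the number of passes are hypothetical placeholders.

```python
import numpy as np

def perceptron(X, y, passes=10):
    """Perceptron: predict sign(w.x) and update w on every mistake."""
    w = np.zeros(X.shape[1])            # 1) w <- 0
    for _ in range(passes):             # repeated passes stand in for "Goto 2"
        for x_i, y_i in zip(X, y):      # 2) get (x_i, y_i)
            z = np.sign(w @ x_i)        # 3) predict z = sign(w^T x_i)
            if z != y_i:                # 4) on a mistake
                w = w + y_i * x_i       # 4.2) w <- w + y_i x_i
    return w
```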

PERCEPTRON mistake bound: If there exists $w^*$ with $\|w^*\|_2 = 1$ such that $|w^{*T} x| \ge \gamma$ for every example $x$, and $\|x\|_2 \le R$, then

$\#\text{Mistakes} \le \left(\frac{R}{\gamma}\right)^2$.

[Illustration: points of norm at most $R$ separated by the hyperplane defined by $w$ with margin $\gamma$ on both sides.]

A slide from Nader Bshouty's Course.

PRank algorithm - The model

Input: A sequence $(x^1, y^1), (x^2, y^2), \ldots, (x^t, y^t), \ldots$
◦ $x^i \in \mathbb{R}^n$, $y^i \in \{1, 2, \ldots, k\}$ with ">" as the order relation.

Output: A ranking rule $H_{w,b} : \mathbb{R}^n \rightarrow \{1, 2, \ldots, k\}$ where:
◦ $w \in \mathbb{R}^n$.
◦ $b = (b_1, b_2, \ldots, b_{k-1}, b_k)$ with $b_1 \le b_2 \le \ldots \le b_{k-1} \le b_k = \infty$.
◦ $H_{w,b}(x) = \min_{r \in \{1, 2, \ldots, k\}} \{ r : w \cdot x - b_r < 0 \}$.

Ranking loss after T rounds is $\sum_{t=1}^{T} |\hat{y}^t - y^t|$, where $y^t$ is the TRUE rank of the instance in round 't' and $\hat{y}^t = H_{w,b}(x^t)$.
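A small sketch of the prediction rule $H_{w,b}$ in Python with NumPy; here `b` is assumed to hold only the finite thresholds $b_1, \ldots, b_{k-1}$, with $b_k = \infty$ left implicit.

```python
import numpy as np

def predict_rank(w, b, x):
    """H_{w,b}(x) = min{ r : w.x - b_r < 0 }, with b_k = +infinity implicit."""
    score = w @ x
    for r, b_r in enumerate(b, start=1):   # r = 1, ..., k-1
        if score - b_r < 0:
            return r
    return len(b) + 1                       # score cleared every finite threshold: rank k
```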

PRank algorithm - The update rule

Given an input instance-rank pair $(x, y)$, if $H_{w,b}(x) \neq y$ then we want:
◦ $w \cdot x > b_r$ for $r = 1, \ldots, y-1$.
◦ $w \cdot x < b_r$ for $r = y, \ldots, k-1$.

Let's represent the above inequalities by the TRUE rank vector $(y_1, \ldots, y_{k-1}) = (+1, \ldots, +1, -1, \ldots, -1)$, where $y_r = +1$ if $y > r$ and $y_r = -1$ otherwise. Then

$H_{w,b}(x) \neq y \iff \exists r : y_r (w \cdot x - b_r) \le 0$.

PRank algorithm - The update rule

Given an input instance-rank pair $(x, y)$, if $H_{w,b}(x) \neq y$ then $\exists r : y_r (w \cdot x - b_r) \le 0$.

So, let's "move" the values of $w \cdot x$ and $b_r$ towards each other:
◦ $b_r \leftarrow b_r - y_r$.
◦ $w \leftarrow w + \left(\sum_r y_r\right) x$, where the sum is only over the indices 'r' for which there was a prediction error, i.e., $y_r (w \cdot x - b_r) \le 0$.
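A sketch of this update step in Python with NumPy; `b` again holds only the finite thresholds, and the function is meant to be called on rounds where a mistake occurred.

```python
import numpy as np

def prank_update(w, b, x, y):
    """Move w.x and the erroneous thresholds b_r towards each other."""
    r = np.arange(1, len(b) + 1)              # r = 1, ..., k-1
    y_r = np.where(y > r, 1.0, -1.0)          # the TRUE rank vector
    wrong = y_r * (w @ x - b) <= 0            # thresholds with a prediction error
    tau = np.where(wrong, y_r, 0.0)
    return w + tau.sum() * x, b - tau
```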

The update rule - Illustration

[Illustration: the thresholds split the real line into the ranks 1-5; the predicted rank and the correct interval for $w \cdot x$ are marked.]

The PRank algorithm

1) $w \leftarrow 0$; $b_r \leftarrow 0$ for $r = 1, 2, \ldots, k-1$ (and $b_k = \infty$); $t \leftarrow 1$;
2) Get $(x^t, y^t)$.
3) Predict $z = \min \{ r : w \cdot x^t - b_r < 0 \}$.
4) If Mistake ($z \neq y^t$)
   4.1) For $r = 1, \ldots, k-1$: if $y^t > r$ then $y^t_r \leftarrow +1$, else $y^t_r \leftarrow -1$.   (Building the TRUE rank vector)
   4.2) For $r = 1, \ldots, k-1$: if $(w \cdot x^t - b_r) y^t_r \le 0$ then $\tau^t_r \leftarrow y^t_r$, else $\tau^t_r \leftarrow 0$.   (Checking which threshold prediction is wrong)
   4.3) $w \leftarrow w + \left(\sum_r \tau^t_r\right) x^t$;   (Updating the hypothesis)
   4.4) $b_r \leftarrow b_r - \tau^t_r$ for $r = 1, \ldots, k-1$;
   4.5) $t \leftarrow t + 1$;
5) Goto 2.
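A compact end-to-end sketch of this online loop in Python with NumPy; the input `stream` of instance-rank pairs, `n_features`, and `k` are hypothetical placeholders, and the loop also accumulates the ranking loss $\sum_t |\hat{y}^t - y^t|$.

```python
import numpy as np

def prank(stream, n_features, k):
    """Online PRank over a stream of (x, y) pairs with y in {1, ..., k}."""
    w = np.zeros(n_features)
    b = np.zeros(k - 1)                            # finite thresholds b_1..b_{k-1}; b_k = +inf implicit
    rank_loss = 0
    for x, y in stream:                            # 2) get (x^t, y^t)
        scores = w @ x - b
        below = np.flatnonzero(scores < 0)
        z = below[0] + 1 if below.size else k      # 3) predict min{ r : w.x - b_r < 0 }
        rank_loss += abs(z - y)
        if z != y:                                 # 4) mistake
            r = np.arange(1, k)
            y_r = np.where(y > r, 1.0, -1.0)       # 4.1) TRUE rank vector
            tau = np.where(y_r * scores <= 0, y_r, 0.0)   # 4.2) erroneous thresholds
            w = w + tau.sum() * x                  # 4.3) update w
            b = b - tau                            # 4.4) update the thresholds
    return w, b, rank_loss
```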

PRank Analysis – Consistent Hypothesis

First, we need to show that the output hypothesis of PRank is acceptable. Meaning, if $(w^f, b^f)$ is the final ranking rule, then $b^f_1 \le b^f_2 \le \ldots \le b^f_{k-1}$.

Proof – By induction: Since the initialization of the thresholds is such that $b^0_1 \le b^0_2 \le \ldots \le b^0_{k-1}$, it suffices to show that the claim holds inductively.

Lemma 1 (Order Preservation): Let $w^t$ and $b^t$ be the current ranking rule, where $b^t_1 \le b^t_2 \le \ldots \le b^t_{k-1}$, and let $(x^t, y^t)$ be an instance-rank pair fed to PRank on round 't'. Denote by $w^{t+1}$ and $b^{t+1}$ the resulting ranking rule after the update of PRank. Then $b^{t+1}_1 \le b^{t+1}_2 \le \ldots \le b^{t+1}_{k-1}$.

Lemma 1 – Proof

[Illustration of the two cases: in Option 1 the predicted rank lies on one side of the correct interval, and in Option 2 on the other side. In both cases all erroneous thresholds are shifted by 1 in the same direction, so the order of the thresholds is preserved.]

PRank Analysis – Mistake bound

Theorem 2: Let $(x^1, y^1), \ldots, (x^T, y^T)$ be an input sequence for PRank, where $x^t \in \mathbb{R}^n$ and $y^t \in \{1, \ldots, k\}$. Denote $R^2 = \max_t \|x^t\|^2$. Assume that there is a ranking rule $v^* = (w^*, b^*)$ with $b^*_1 \le b^*_2 \le \ldots \le b^*_{k-1}$ and $w^*$ of unit norm that classifies the entire sequence correctly with margin $\gamma = \min_{t,r} \{ (w^* \cdot x^t - b^*_r) y^t_r \} > 0$. Then the rank loss of the algorithm, $\sum_{t=1}^{T} |\hat{y}^t - y^t|$, is at most $\frac{(k-1)(R^2 + 1)}{\gamma^2}$.

Experiments

Comparison between:
◦ PRank.
◦ MultiClass Perceptron – MCP.
◦ Widrow-Hoff (online regression) – WH.

Datasets:
◦ Synthetic.
◦ EachMovie.

Synthetic Dataset

Randomly generated points $x = (x_1, x_2) \in [0,1]^2$, drawn uniformly at random.

Each point was assigned a rank $y \in \{1, \ldots, 5\}$ according to:
◦ $y = \max_r \{ r : 10 (x_1 - 0.5)(x_2 - 0.5) + \varepsilon > b_r \}$, where $b = (-\infty, -1, -0.1, 0.25, 1)$ and $\varepsilon \sim N(0, 0.125)$ is noise.

Generated 100 sequences of instance-rank pairs, each of length 7000.
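A sketch of how such a sequence could be generated in Python with NumPy, under the threshold vector and noise level reconstructed above; the function name and the choice of treating 0.125 as a standard deviation are assumptions.

```python
import numpy as np

def synthetic_sequence(length=7000, rng=None):
    """Generate one sequence of instance-rank pairs from the synthetic model above."""
    rng = np.random.default_rng() if rng is None else rng
    b = np.array([-np.inf, -1.0, -0.1, 0.25, 1.0])       # b_1..b_5, with b_1 = -inf
    X = rng.uniform(0.0, 1.0, size=(length, 2))           # points from [0,1]^2
    score = 10.0 * (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + rng.normal(0.0, 0.125, size=length)
    y = np.array([np.flatnonzero(s > b).max() + 1 for s in score])   # y = max{ r : score > b_r }
    return X, y
```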

EachMovie Dataset

Collaborative filtering dataset. Contains ratings of movies provided by 61,265 people.

6 possible ratings: 0, 0.2, 0.4, 0.6, 0.8, 1. Only people with at least 100 ratings were considered.

One person was chosen at random to provide the TRUE rank, and the other people's ratings were used as features (mapped to -0.5, -0.3, -0.1, 0.1, 0.3, 0.5).

EachMovie Dataset – cont'

Batch setting: Ran PRank over the training data as an online algorithm and used its last hypothesis to rank the unseen data.

Thank You

PERCEPTRON Theorem

If $\|w^*\|_2 = 1$, $|w^{*T} x| \ge \gamma$ for every example $x$, and $\|x\|_2 \le R$, then $\#\text{Mistakes} \le (R/\gamma)^2$.

Proof: We bound $\cos(\theta_t) = \frac{w^{*T} w_t}{\|w_t\|}$, where $w_t$ is the weight vector after $t$ mistakes.

On each mistake, on an example $a_i$ with label $y_i$:

$w^{*T} w_{i+1} = w^{*T}(w_i + y_i a_i) = w^{*T} w_i + y_i (w^{*T} a_i) = w^{*T} w_i + |w^{*T} a_i| \ge w^{*T} w_i + \gamma$,

since $w^*$ classifies $a_i$ correctly. Starting from $w = 0$, after $t$ mistakes $w^{*T} w_t \ge t\gamma$.

Also,

$\|w_{i+1}\|_2^2 = \|w_i + y_i a_i\|_2^2 = \|w_i\|_2^2 + \|a_i\|_2^2 + 2 y_i (w_i^T a_i) \le \|w_i\|_2^2 + \|a_i\|_2^2 \le \|w_i\|_2^2 + R^2$,

since a mistake means $y_i (w_i^T a_i) \le 0$. Hence after $t$ mistakes $\|w_t\|_2^2 \le t R^2$.

Combining the two bounds:

$1 \ge \cos(\theta_t) = \frac{w^{*T} w_t}{\|w_t\|} \ge \frac{t\gamma}{\sqrt{t}\,R} = \frac{\sqrt{t}\,\gamma}{R}$,

and therefore $t \le (R/\gamma)^2$, i.e., $\#\text{Mistakes} \le (R/\gamma)^2$.