EigenRank: A Ranking-Oriented Approach to Collaborative Filtering

IDS Lab. Seminar

Spring 2009

강 민 석 [email protected]

May 21st, 2009

Nathan N. Liu & Qiang Yang

SIGIR 2008

Center for E-Business Technology, Seoul National University, Seoul, Korea

Contents

Introduction

Related Work

Rating Oriented Collaborative Filtering

Ranking Oriented Collaborative Filtering

Experiments

Conclusions


Copyright 2009 by CEBT

Introduction

Recommender Systems

Content-based filtering

Analyze content information associated with items and users

E.g., product descriptions, user profiles, etc.

Represent users and items using a set of features

Collaborative filtering

Does NOT require content information about items

Assumes that a user is interested in items preferred by other, similar users

[Figure: content-based filtering (a shirt described by features such as color (red, blue, black), brand, and size) vs. collaborative filtering (User A and User B, each linked to Items 1-3)]


Introduction

Collaborative Filtering Application Scenarios

Rating prediction

Presents one individual item at a time, with a predicted rating

Top-N recommended items

Presents an ordered list of the top-N recommended items

[Figure: rating prediction (MovieLens) vs. top-N list (Amazon)]


Introduction

Motivation

Most CF methods adopt a rating-oriented approach

Predict potential ratings first, then rank items by them

Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness

Example

            | Item i | Item j | Squared error
True rating | 3      | 4      |
Predicted 1 | 2      | 5      | $(2-3)^2 + (5-4)^2 = 2$
Predicted 2 | 4      | 3      | $(4-3)^2 + (3-4)^2 = 2$

The two prediction algorithms have the same error, but “Predicted 2” ranks the items in the wrong order

Most existing methods predict ratings without considering the user’s preferences between pairs of items
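The example on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `squared_error` and `same_order` are made up here and do not come from the paper.

```python
def squared_error(truth, predicted):
    """Sum of squared differences between true and predicted ratings."""
    return sum((p - t) ** 2 for t, p in zip(truth, predicted))


def same_order(truth, predicted):
    """True if the prediction keeps the relative order of the two items."""
    return (truth[0] - truth[1]) * (predicted[0] - predicted[1]) > 0


true_ratings = (3, 4)   # (item i, item j)
predicted_1 = (2, 5)
predicted_2 = (4, 3)

error_1 = squared_error(true_ratings, predicted_1)
error_2 = squared_error(true_ratings, predicted_2)
order_1 = same_order(true_ratings, predicted_1)
order_2 = same_order(true_ratings, predicted_2)

print(error_1, error_2)  # both predictions have the same squared error
print(order_1, order_2)  # only the first one ranks j above i, as the user does
```

Both predictions are equally good by squared error, yet only the first preserves the user's ordering of the two items.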


Introduction

Overview

Ranking-oriented approach to CF

Directly addresses the item ranking problem

Without the intermediate step of rating prediction

Contribution

Similarity measure for two users’ rankings

Kendall rank correlation coefficient

Methods for producing item rankings

Greedy order algorithm, random walk model

[Diagram: rating prediction → rank items]

Contents

Introduction

Related Work

Neighborhood-based Approach

Model-based Approach



Experiments

Conclusions




User-based Model

Estimate unknown ratings of a target user based on the ratings of neighboring users, using user-user similarity

Difficulties in User-based Model

Raw ratings may contain biases

E.g., some users tend to give high ratings.

Remedy: use user-specific means

User-item ratings data is sparse

Remedies: dimensionality reduction, data-smoothing methods

User u | Item   | User v
4      | Item A | 2
5      | Item B | 2
5      | Item C | 1
5      | Item D | 4
4      | Item E | 3
5      | Item F | 2
4.67   | Mean   | 2.33
0.52   | Stdev  | 1.03
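The means and standard deviations in the table can be reproduced as follows. This sketch uses Python's standard `statistics` module and assumes the slide reports the sample standard deviation; `mean_centered` is a hypothetical helper showing what "use user-specific means" could look like.

```python
from statistics import mean, stdev

ratings_u = {"A": 4, "B": 5, "C": 5, "D": 5, "E": 4, "F": 5}
ratings_v = {"A": 2, "B": 2, "C": 1, "D": 4, "E": 3, "F": 2}


def mean_centered(ratings):
    """Subtract the user's own mean so that generous and strict raters become comparable."""
    mu = mean(list(ratings.values()))
    return {item: r - mu for item, r in ratings.items()}


mean_u = mean(list(ratings_u.values()))
mean_v = mean(list(ratings_v.values()))
stdev_u = stdev(list(ratings_u.values()))  # sample standard deviation (n - 1)
stdev_v = stdev(list(ratings_v.values()))
centered_u = mean_centered(ratings_u)

print(round(mean_u, 2), round(stdev_u, 2))
print(round(mean_v, 2), round(stdev_v, 2))
```

User u rates everything high with little spread, while User v rates lower with more spread, which is the bias the slide refers to.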




Item-based Model

Similar to the user-based model, but uses item-item similarity

Less sensitive to the sparsity problem

# of items < # of users

Higher accuracy while allowing more efficient computations

Sarwar et al., 2001

[Figure: item-based model (Amazon)]


Model-based Approach

Use observed user-item ratings to train a compact model

Predict ratings via the model instead of directly manipulating the data

Algorithms

Clustering methods

Aspect models

Bayesian networks

Learning to Rank

Rank items represented in some feature space

Methods try to

Learn an item scoring function

Learn a classifier for classifying item pairs

Contents

Introduction

Related Work


Similarity Measure

Rating Prediction


Experiments

Conclusions



Rating-based Similarity Measures

Pearson Correlation Coefficient

Similarity between two users

Normalizes ratings using each user's average rating

Vector Similarity

Another measure of user-user similarity

View each user as a vector of ratings

Similarity is the cosine of the angle between the two vectors

Item-Item similarity

Adjusted cosine similarity is the most effective
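The formulas for these measures did not survive on the slide, so the following is a minimal sketch of the textbook definitions rather than the exact variants used in the paper. It assumes each user is a dict from item to rating, that the Pearson mean is taken over all of a user's ratings, and that sums run over co-rated items; `pearson` and `cosine` are illustrative names.

```python
from math import sqrt


def pearson(ru, rv):
    """Pearson correlation over co-rated items, centered on each user's mean."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mu_u = sum(ru.values()) / len(ru)
    mu_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den = sqrt(sum((ru[i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def cosine(ru, rv):
    """Cosine of the angle between the two users' rating vectors."""
    common = set(ru) & set(rv)
    num = sum(ru[i] * rv[i] for i in common)
    den = sqrt(sum(r * r for r in ru.values())) * sqrt(sum(r * r for r in rv.values()))
    return num / den if den else 0.0


user_1 = {"A": 2, "B": 3, "C": 4}
user_2 = {"A": 3, "B": 4, "C": 5}

pcc = pearson(user_1, user_2)
vs = cosine(user_1, user_2)
print(pcc, vs)
```

Because Pearson centers each user on their own mean, two users whose ratings differ only by a constant offset come out as perfectly correlated.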



Rating Prediction

User-based Model

Select the set of k users most similar to the target user

Compute the weighted average of their ratings

Item-based Model

Similar to the user-based model

Uses the set of k items most similar to item i
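A user-based prediction along these lines might look like the sketch below. It assumes the common mean-centered weighted average (the slide's own formula is not preserved), and the function name, the `candidates` structure, and the sample numbers are all hypothetical.

```python
def predict_rating(target_mean, candidates, item, k=2):
    """Predict the target user's rating of `item` from the k most similar raters.

    candidates: list of (similarity_to_target, ratings_dict) pairs.
    """
    rated = [(sim, ratings) for sim, ratings in candidates if item in ratings]
    top_k = sorted(rated, key=lambda pair: pair[0], reverse=True)[:k]
    num = den = 0.0
    for sim, ratings in top_k:
        neighbor_mean = sum(ratings.values()) / len(ratings)
        num += sim * (ratings[item] - neighbor_mean)  # deviation from the neighbor's mean
        den += abs(sim)
    return target_mean + num / den if den else target_mean


candidates = [
    (1.0, {"X": 5, "Y": 3}),
    (0.5, {"X": 2, "Y": 4}),
    (0.1, {"X": 1, "Y": 5}),  # least similar; excluded when k=2
]
prediction = predict_rating(3.0, candidates, "X", k=2)
print(prediction)
```

The item-based variant has the same shape, with the roles of users and items swapped.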


Contents

Introduction

Related Work



Similarity Measure – Kendall Rank Correlation Coefficient

Preference Functions – Greedy Order & Random Walk Model

Experiments

Conclusions



Similarity Measure

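Motivation

PCC and VS are rating-based measures

In the ranking-based approach, similarity is determined by users’ preferences over items.

E.g., for Users 1 and 2 the rating values differ, but their preferences are very close.

       | Item A | Item B | Item C | Ranking
User 1 | 2      | 3      | 4      | C > B > A
User 2 | 3      | 4      | 5      | C > B > A

Rating diff between the two users: 3

Kendall Rank Correlation Coefficient

$\tau = \dfrac{N_s - N_d}{{}_nC_2}$, where $N_s$ is the number of item pairs with the same preference, $N_d$ is the number of item pairs with different preference, and ${}_nC_2 = \dfrac{n(n-1)}{2}$ is the total number of item pairs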
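As a rough illustration, the Kendall rank correlation between two users can be computed by counting item pairs on which they agree and disagree. This sketch assumes the plain tau definition over co-rated items, with tied pairs counted as neither agreeing nor disagreeing; the paper's exact variant may differ, and `kendall_similarity` is an illustrative name.

```python
from itertools import combinations


def kendall_similarity(ru, rv):
    """(same-preference pairs - different-preference pairs) / total pairs."""
    common = sorted(set(ru) & set(rv))
    n = len(common)
    if n < 2:
        return 0.0
    same = different = 0
    for i, j in combinations(common, 2):
        product = (ru[i] - ru[j]) * (rv[i] - rv[j])
        if product > 0:
            same += 1        # both users prefer the same item of the pair
        elif product < 0:
            different += 1   # the users disagree on this pair
    return (same - different) / (n * (n - 1) / 2)


user_1 = {"A": 2, "B": 3, "C": 4}
user_2 = {"A": 3, "B": 4, "C": 5}
user_3 = {"A": 5, "B": 4, "C": 3}

sim_12 = kendall_similarity(user_1, user_2)
sim_13 = kendall_similarity(user_1, user_3)
print(sim_12, sim_13)
```

Users 1 and 2 never give the same rating to an item, yet they agree on every pair, so their similarity is maximal; User 3 has the reverse order and gets the minimum.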


Preference Functions

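Modeling a user’s preference function

Given two items i and j, which item is more preferable, and by how much?

$\Psi(i, j) > 0$ means item i is more preferable than item j

$|\Psi(i, j)|$ indicates the strength of preference

Characteristics

For the same item: $\Psi(i, i) = 0$

Anti-symmetric: $\Psi(i, j) = -\Psi(j, i)$

NOT transitive: $\Psi(i, j) > 0$ and $\Psi(j, k) > 0$ do not imply $\Psi(i, k) > 0$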



Derive Preference Function

Key challenge: obtain preferences for item pairs that the target user has NOT rated.

Use the same idea as neighborhood-based CF

Find the set of neighbors of the target user who have rated both items
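One way to turn this idea into code is a similarity-weighted average of the neighbors' rating differences. The formula is not shown on the slide, so treat this as a sketch under that assumption; it also assumes positive similarities, and the name `preference` and the sample data are hypothetical.

```python
def preference(i, j, neighbors):
    """Estimate the target user's preference for item i over item j.

    neighbors: list of (similarity_to_target, ratings_dict) pairs.
    Only neighbors who rated BOTH items contribute.
    """
    num = den = 0.0
    for sim, ratings in neighbors:
        if i in ratings and j in ratings:
            num += sim * (ratings[i] - ratings[j])
            den += sim
    return num / den if den else 0.0


neighbors = [
    (0.8, {"i": 5, "j": 3}),   # similar neighbor who prefers i
    (0.2, {"i": 2, "j": 4}),   # less similar neighbor who prefers j
    (0.9, {"i": 4}),           # rated only one of the two items, so ignored
]

psi_ij = preference("i", "j", neighbors)
psi_ji = preference("j", "i", neighbors)
print(psi_ij, psi_ji)
```

A positive value means the neighborhood, weighted by similarity, prefers i over j, and swapping the arguments flips the sign.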




Produce Ranking

Given the preference function, we want to obtain a ranking of items.

The ranking should agree with the pairwise preferences as much as possible

Ranking ρ: a ranking of the items in item set I

$\rho(i) > \rho(j)$: item i is ranked higher than item j

Value function $V^{\Psi}(\rho) = \sum_{i, j : \rho(i) > \rho(j)} \Psi(i, j)$

Measures how consistent ρ is with the preference function Ψ

Our goal is to find the ranking ρ that maximizes the value function

Optimal solution

NP-complete problem: use a greedy algorithm


Greedy Order Algorithm

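Motivation

Find an approximately optimal ranking

Algorithm

Input: item set I, preference function Ψ

Output: ranking

Steps

1. Compute a potential value for each item i; it is higher when more items are less preferred than i

2. Find the item with the highest potential and assign it the highest remaining rank

3. Remove that item, then iterate

Complexity is $O(n^2)$; the value of the resulting ranking is at least half of the optimal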
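The greedy procedure could be sketched as follows. The slide's pseudocode image is missing, so this follows the general description only: the potential of an item is assumed to be its outgoing preferences minus its incoming ones, and `greedy_order`, `prefs`, and `psi` are illustrative names.

```python
def greedy_order(items, psi):
    """Return items from highest to lowest rank using a greedy potential."""
    remaining = set(items)
    # Potential of i: how strongly i is preferred over the other remaining items.
    potential = {
        i: sum(psi(i, j) - psi(j, i) for j in remaining) for i in remaining
    }
    ranking = []
    while remaining:
        # Sorting first makes tie-breaking deterministic.
        top = max(sorted(remaining), key=potential.__getitem__)
        ranking.append(top)
        remaining.remove(top)
        for i in remaining:
            # Drop the contribution of the removed item from each potential.
            potential[i] += psi(top, i) - psi(i, top)
    return ranking


prefs = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 2.0}


def psi(i, j):
    """Anti-symmetric preference function built from the toy table above."""
    if (i, j) in prefs:
        return prefs[(i, j)]
    if (j, i) in prefs:
        return -prefs[(j, i)]
    return 0.0


ranking = greedy_order(["C", "A", "B"], psi)
print(ranking)
```

Each round scans the remaining items once, which gives the quadratic cost mentioned on the slide.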


Random Walk Model for Item Ranking

Random Walk based on User Preferences

Motivation

Some users rated i > j and others rated j > k, but only a few rated all three of i, j, k

We want to infer the preference between i and k (implicit relationships)

Use multi-step random walks

Markov chain model

At each step the system may change its state from the current state to another state according to a probability distribution. The changes of state are called transitions … (Wikipedia)

Google PageRank

Random walk on Web pages based on hyperlinks

The surfer randomly picks a hyperlink

The stationary distribution is used as the PageRank

Model for item ranking

Similarly, there are implicit links between two items

A less preferred item j links to a more preferred item i

Transition probability

The stationary distribution is used for item ranking

[Diagram: page → (link) → page; item → (preference) → item]



Random Walk based on User Preferences

Transition probability

Probability of moving from the current item i to another item j

Higher for items j that are more preferred than i

Depends on the user’s preference function

Why the exp function? It keeps the probabilities non-negative
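A transition probability with these properties can be obtained by exponentiating the preference values and normalizing, as in the sketch below. The slide's formula is not preserved, so the softmax-style form and the inclusion of the self-transition (with weight exp(0) = 1) are assumptions; `transition_row` and the toy `psi` are illustrative.

```python
from math import exp


def transition_row(i, items, psi):
    """Probabilities of moving from item i to each item j, proportional to exp(psi(j, i))."""
    weights = {j: exp(psi(j, i)) for j in items}  # exp keeps every weight positive
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


prefs = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 2.0}


def psi(i, j):
    """Anti-symmetric toy preference function."""
    if (i, j) in prefs:
        return prefs[(i, j)]
    if (j, i) in prefs:
        return -prefs[(j, i)]
    return 0.0


items = ["A", "B", "C"]
row_from_c = transition_row("C", items, psi)
print(row_from_c)
```

Starting from the least preferred item C, the walk is most likely to move to A, the item most strongly preferred over C.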



Compute the Item Rankings

Think of the PageRank algorithm you may know

We can use matrix notation

P: transition matrix

Each entry of P is a transition probability

State probabilities: the probability of being at item i after t walking steps

Obtain these probabilities using the power iteration method for computing the principal eigenvector

Stationary probabilities

Does it work?

Existence and uniqueness of the stationary distribution are guaranteed if P is irreducible

Entries of P are all non-negative
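Power iteration on a small transition matrix can be sketched as below. This is a generic illustration rather than the paper's implementation: P is assumed to be row-stochastic (row i holds the probabilities of leaving item i), and the function name, tolerance, and iteration cap are arbitrary choices.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Repeatedly apply one walking step until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step: new probability of item j is the sum over i of pi[i] * P[i][j].
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


P = [
    [0.5, 0.5],
    [0.2, 0.8],
]
pi = stationary_distribution(P)
print(pi)
```

For this two-state chain the balance condition 0.5 * pi[0] = 0.2 * pi[1] gives the stationary distribution (2/7, 5/7), which the iteration approaches.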




Personalization Vector (teleport)

To avoid reducibility of the stochastic matrix (Brin and Page, 1998)

Revised transition matrix

PageRank

The Web surfer sometimes “teleports” to other pages.

Teleports according to the probability distribution defined by the personalization vector v

ε controls how often the surfer teleports rather than following hyperlinks.

Our model

Uses a similar idea to define the personalization vector

Teleport to items with high ratings more often

Unrated items have equal probabilities
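The teleport adjustment might be sketched as follows. Two assumptions are made because the slide's formulas are missing: the revised matrix is taken to be ε·P + (1 − ε)·(teleport to v), so that ε weights the ordinary walk (swap the two weights if the opposite convention is intended), and unrated items are given the user's mean rating before normalizing, which is only one way to give them equal probabilities. All names are illustrative.

```python
def personalization_vector(items, ratings):
    """Teleport distribution: proportional to rating, equal share for unrated items."""
    default = sum(ratings.values()) / len(ratings)  # assumed pseudo-rating for unrated items
    raw = [ratings.get(item, default) for item in items]
    total = sum(raw)
    return [x / total for x in raw]


def with_teleport(P, v, epsilon):
    """Mix the walk with teleportation: epsilon * P + (1 - epsilon) * v for every row."""
    n = len(P)
    return [
        [epsilon * P[i][j] + (1 - epsilon) * v[j] for j in range(n)]
        for i in range(n)
    ]


items = ["A", "B", "C"]
v = personalization_vector(items, {"A": 5, "B": 1})  # C is unrated

# A reducible chain: every item only ever stays where it is.
P = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
revised = with_teleport(P, v, epsilon=0.85)
print(v)
print(revised)
```

After mixing, every entry of the matrix is strictly positive, so every item can reach every other item and the chain is no longer reducible.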


Contents

Introduction

Related Work



Experiments

Conclusions



Experiments

Issues

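1. Is the ranking-oriented approach better than the rating-oriented one?

2. Which is better, the greedy order algorithm or the random walk model?

3. Is the ranking-oriented similarity measure (Kendall’s) more effective?

[Diagram: experimental design. Similarity measures (Pearson’s / vector similarity vs. Kendall’s rank similarity) are crossed with approaches (rating-oriented: user / item; ranking-oriented: greedy, random walk); markers 1 to 3 indicate the comparison that addresses each issue]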


Experiments

Data Sets

Two movie ratings data sets

EachMovie and Netflix

Users who rated more than 40 different movies

10,000 users for training

100 users for parameter tuning

500 users for testing

Evaluation Protocol

For each user in the test set,

50% of the ratings for model construction

50% held out for evaluation

             | EachMovie       | Netflix
# of ratings | 2.8 M → ?       | 100 M → ?
# of users   | 72,000 → 10,600 | 480,000 → 10,600
# of movies  | 1,628           | 18,000 → 2,000
Rating scale | 1 to 6          | 1 to 5
Density      | 6.1 %           | 6.6 %


Evaluation Metric

Which metric to use?

Rating-oriented CF

MAE (Mean Absolute Error) and RMSE (Root Mean Square Error)

Focus on the difference between the true rating and the predicted rating

Ranking-oriented CF

Our emphasis is on improving item rankings.

NDCG (Normalized Discounted Cumulative Gain)

Evaluated over the top-k items on the ranked list

Discounting factor: increases with position in the ranking
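NDCG over the top-k items could be computed as in the sketch below. The slide's formula is not preserved, so this assumes the widely used form with gain 2^r − 1 and discount log(1 + position); the logarithm base cancels in the normalization. Function names are illustrative.

```python
from math import log2


def dcg_at_k(ratings_in_ranked_order, k):
    """Discounted cumulative gain of the first k items of a ranked list."""
    return sum(
        (2 ** r - 1) / log2(1 + position)
        for position, r in enumerate(ratings_in_ranked_order[:k], start=1)
    )


def ndcg_at_k(ratings_in_ranked_order, k):
    """DCG divided by the DCG of the ideal (descending-rating) ordering."""
    ideal = dcg_at_k(sorted(ratings_in_ranked_order, reverse=True), k)
    return dcg_at_k(ratings_in_ranked_order, k) / ideal if ideal else 0.0


# True ratings of the held-out items, listed in the order the system ranked them.
perfect = ndcg_at_k([5, 4, 3], 3)
reversed_list = ndcg_at_k([3, 4, 5], 3)
top_1 = ndcg_at_k([3, 4, 5], 1)
print(perfect, reversed_list, top_1)
```

Placing a low-rated item first is penalized most, because the discount grows with position and the top of the list carries the largest weight.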


Impact of Parameters

Impact of Neighborhood Size

The size of the neighborhood affects performance

Result

As the neighborhood size increases, NDCG increases up to a size of 100, because with more neighbors the preference function becomes more accurate

Beyond 100, NDCG starts to decrease, because many non-similar users are included


Impact of Parameters

Impact of ε

How does the frequency of the “teleport” operation affect performance?

Result

When ε ↑, NDCG ↑

But ε should NOT be too large (0.8~0.9)


Comparisons with Other Algorithms


Issues

1. Is the ranking-oriented approach better than the rating-oriented one?

2. Which is better, the greedy order algorithm or the random walk model?

3. Is the ranking-oriented similarity measure (Kendall’s) more effective?

Comparison

4 rating-oriented settings, 6 ranking-oriented settings

Approach | Method      | PCC   | VS   | KRCC
Rating   | User        | UPCC  | UVS  |
Rating   | Item        | IPCC  | IVS  |
Ranking  | Greedy      | GOPCC | GOVS | GOKRCC
Ranking  | Random Walk | RWPCC | RWVS | RWKRCC


Comparisons with Other Algorithms

Result

The ranking-oriented approach is better than the rating-oriented approach by about 8.8% on NDCG@1

The random walk model outperformed all the rating-oriented methods

The random walk model is slightly better than the greedy order algorithm

The Kendall rank correlation coefficient is more effective for the ranking-oriented approach

Conclusion

Ranking-oriented Framework for CF

Item ranking without rating prediction as an intermediate step

Extend neighborhood-based CF by identifying preferences

Two methods for computing item ranking

Greedy order algorithm

Random walk model

[Diagram: similarity measure → preference function → greedy order / random walk model]


Thank you~
