EigenRank: A Ranking-Oriented Approach to Collaborative Filtering
IDS Lab. Seminar
Spring 2009
Kang Min-seok (강민석), [email protected]
May 21st, 2009
Nathan N. Liu & Qiang Yang
SIGIR 2008
Center for E-Business Technology, Seoul National University, Seoul, Korea
Contents
Introduction
Related Work
Rating Oriented Collaborative Filtering
Ranking Oriented Collaborative Filtering
Experiments
Conclusions
Copyright 2009 by CEBT
Introduction
Recommender Systems
Content-based filtering
Analyzes content information associated with items and users, e.g. product descriptions, user profiles, etc.
Represents users and items using a set of features
Collaborative filtering
Does NOT require content information about items
Assumes that a user is interested in items preferred by other similar users
[Figure: content-based filtering (a shirt described by features such as color: red, blue, black; brand; size) vs. collaborative filtering (User A and User B, each with Items 1 to 3)]
Introduction
Collaborative Filtering Application Scenario
Rating prediction
Presents one individual item at a time with a predicted rating
Top-N recommended items
Presents an ordered list of the top-N recommended items
[Figures: rating prediction (MovieLens); top-N list (Amazon)]
Introduction
Motivation
Most CF methods adopt a rating-oriented approach
Predict potential ratings first, then rank items by them
Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness
Example
              Item i   Item j   Squared error
True rating   3        4
Predicted 1   2        5        (2-3)^2 + (5-4)^2 = 2
Predicted 2   4        3        (4-3)^2 + (3-4)^2 = 2
The two prediction algorithms have the same error, but "Predicted 2" ranks the two items in the wrong order
Most existing methods predict ratings without considering the user's preferences between pairs of items
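The example can be checked numerically. The sketch below is only an illustration (the function names are made up here, not taken from the paper): it computes the squared error of each prediction and tests whether the prediction keeps the true order of items i and j.

```python
def squared_error(true, pred):
    """Sum of squared differences between true and predicted ratings."""
    return sum((p - t) ** 2 for t, p in zip(true, pred))


def same_order(true, pred):
    """True if the prediction orders the two items the same way as the truth."""
    return (true[0] - true[1]) * (pred[0] - pred[1]) > 0


true_rating = (3, 4)   # (item i, item j)
predicted_1 = (2, 5)
predicted_2 = (4, 3)

for name, pred in [("Predicted 1", predicted_1), ("Predicted 2", predicted_2)]:
    print(name, squared_error(true_rating, pred), same_order(true_rating, pred))
```

Both predictions have the same squared error, yet only the first one preserves the order of the two items.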
Introduction
Overview
Ranking-oriented approach to CF
Directly addresses the item ranking problem
Without the intermediate step of rating prediction
Contribution
Similarity measure for two users' rankings
Kendall rank correlation coefficient
Methods for producing item rankings
Greedy order algorithm, random walk model
[Diagram: rating prediction, rank items]
Contents
Introduction
Related Work
Neighborhood-based Approach
Model-based Approach
Rating Oriented Collaborative Filtering
Ranking Oriented Collaborative Filtering
Experiments
Conclusions
Neighborhood-based Approach
User-based Model
Estimate unknown ratings of a target user based on the ratings of neighboring users, using user-user similarity
Difficulties in User-based Model
Raw ratings may contain biases
E.g. some users tend to give high ratings
Remedy: use user-specific means
User-item ratings data is sparse
Remedies: dimensionality reduction, data-smoothing methods
Item     User u   User v
Item A   4        2
Item B   5        2
Item C   5        1
Item D   5        4
Item E   4        3
Item F   5        2
Mean     4.67     2.33
Stdev    0.52     1.03
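A minimal sketch of the user-specific mean correction, using the ratings of users u and v from the table. It assumes the table reports the sample standard deviation, which reproduces the listed values; `mean_centered` is an illustrative helper name.

```python
from statistics import mean, stdev

ratings_u = {"A": 4, "B": 5, "C": 5, "D": 5, "E": 4, "F": 5}
ratings_v = {"A": 2, "B": 2, "C": 1, "D": 4, "E": 3, "F": 2}


def mean_centered(ratings):
    """Subtract the user's own mean rating from each of the user's ratings."""
    mu = mean(list(ratings.values()))
    return {item: r - mu for item, r in ratings.items()}


for name, ratings in [("u", ratings_u), ("v", ratings_v)]:
    values = list(ratings.values())
    print(name, round(mean(values), 2), round(stdev(values), 2))

centered_u = mean_centered(ratings_u)
centered_v = mean_centered(ratings_v)
```

After centering, a rating of 4 from the generous user u becomes negative, while a rating of 4 from the stricter user v becomes clearly positive.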
Neighborhood-based Approach
Item-based Model
Similar to the user-based model, but uses item-item similarity
Less sensitive to the sparsity problem
# of items < # of users
Higher accuracy while allowing more efficient computations (Sarwar et al., 2001)
[Figure: item-based model (Amazon)]
Model-based Approach
Model-based Approach
Use observed user-item ratings to train a compact model
Predict ratings via the model instead of directly manipulating the data
Algorithms
Clustering methods
Aspect models
Bayesian networks
Learning to Rank
Rank items represented in some feature space
Methods try to
Learn an item scoring function
Learn a classifier for classifying item pairs
Contents
Introduction
Related Work
Rating Oriented Collaborative Filtering
Similarity Measure
Rating Prediction
Ranking Oriented Collaborative Filtering
Experiments
Conclusions
Rating-based Similarity Measures
Pearson Correlation Coefficient
Similarity between two users
Normalizes ratings using each user's average rating
Vector Similarity
Another measure of user-user similarity
View each user as a vector of ratings
Cosine of the angle between the two vectors
Item-Item similarity
Adjusted cosine similarity is the most effective
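The formulas on this slide did not survive extraction, so the following is a sketch of the two user-user measures under stated assumptions rather than the slide's exact definitions: the Pearson correlation is computed over co-rated items using each user's mean on those items, and the cosine uses co-rated items in the dot product and all of a user's ratings in the norm. The sample users are u and v from the neighborhood-based example.

```python
from math import sqrt


def pearson(ru, rv):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(ru) & set(rv))
    if len(common) < 2:
        return 0.0
    mu_u = sum(ru[i] for i in common) / len(common)
    mu_v = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den = sqrt(sum((ru[i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def vector_similarity(ru, rv):
    """Cosine of the angle between the two users' rating vectors."""
    common = set(ru) & set(rv)
    num = sum(ru[i] * rv[i] for i in common)
    den = sqrt(sum(r * r for r in ru.values())) * sqrt(sum(r * r for r in rv.values()))
    return num / den if den else 0.0


user_u = {"A": 4, "B": 5, "C": 5, "D": 5, "E": 4, "F": 5}
user_v = {"A": 2, "B": 2, "C": 1, "D": 4, "E": 3, "F": 2}
print(round(pearson(user_u, user_v), 3), round(vector_similarity(user_u, user_v), 3))
```

For these two users the cosine is high because all ratings are positive, while the mean-centered Pearson value is slightly negative, which illustrates why normalization matters.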
Rating Prediction
User-based Model
Select the set of k most similar users
Compute the weighted average of their ratings
Item-based Model
Similar to the user-based model
Uses the set of k items most similar to item i
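A minimal sketch of the user-based prediction step. It assumes a common variant in which the target user's mean is adjusted by the similarity-weighted average of the neighbors' mean-centered ratings, normalized by the sum of absolute similarities; the slide's own formula is not available, and all numbers below are made up.

```python
def predict_user_based(target_mean, neighbors, k):
    """Predict a rating from the k most similar neighbors.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples
    for users who rated the target item.
    """
    top_k = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    weight_sum = sum(abs(sim) for sim, _, _ in top_k)
    if weight_sum == 0:
        return target_mean  # no usable neighbors: fall back to the user's mean
    offset = sum(sim * (r - mu) for sim, r, mu in top_k) / weight_sum
    return target_mean + offset


neighbors = [(0.9, 5, 4.0), (0.5, 3, 3.5), (0.1, 1, 3.0)]  # made-up values
prediction = predict_user_based(target_mean=3.0, neighbors=neighbors, k=2)
print(round(prediction, 3))
```

The item-based variant has the same shape, with item-item similarities and the target user's ratings of the k most similar items in place of the neighbors' ratings.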
Contents
Introduction
Related Work
Rating Oriented Collaborative Filtering
Ranking Oriented Collaborative Filtering
Similarity Measure – Kendall Rank Correlation Coefficient
Preference Functions – Greedy Order & Random Walk Model
Experiments
Conclusions
Similarity Measure
Motivation
PCC and VS are rating-based measures
In the ranking-based approach, similarity is determined by users' preferences over items
E.g. for users 1 and 2, the rating values are different, but the preferences are very close
Kendall Rank Correlation Coefficient
         Item A   Item B   Item C   Ranking
User 1   2        3        4        C > B > A
User 2   3        4        5        C > B > A
The ratings differ, but the rankings are identical
Each of the nC2 = n(n-1)/2 item pairs shows either the same preference or a different preference between the two users
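A sketch of a Kendall-style similarity between two users. The exact formula from the slide is lost, so this assumes the form 1 - 4 * (number of discordant pairs) / (n(n-1)) over the n co-rated items, with tied pairs not counted as discordant. `user_3` is a hypothetical user added for contrast.

```python
from itertools import combinations


def kendall_similarity(ru, rv):
    """Kendall rank correlation over the items rated by both users."""
    common = sorted(set(ru) & set(rv))
    n = len(common)
    if n < 2:
        return 0.0
    # A pair is discordant when the two users order it in opposite ways.
    discordant = sum(
        1
        for i, j in combinations(common, 2)
        if (ru[i] - ru[j]) * (rv[i] - rv[j]) < 0
    )
    return 1 - 4 * discordant / (n * (n - 1))


user_1 = {"A": 2, "B": 3, "C": 4}
user_2 = {"A": 3, "B": 4, "C": 5}
user_3 = {"A": 5, "B": 4, "C": 3}  # hypothetical user with the reverse order
print(kendall_similarity(user_1, user_2), kendall_similarity(user_1, user_3))
```

Users 1 and 2 get the maximum similarity even though no rating value matches, because only the relative order of items is compared.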
Preference Functions
Modeling a user's preference function
Given two items i and j, which item is more preferable, and by how much?
Ψ(i, j) > 0 means item i is more preferable than item j
|Ψ(i, j)| indicates the strength of preference
Characteristics
For the same item : Ψ(i, i) = 0
Anti-symmetric : Ψ(i, j) = -Ψ(j, i)
NOT transitive : Ψ(i, j) > 0 and Ψ(j, k) > 0 do not imply Ψ(i, k) > 0
Preference Functions
Derive Preference Function
The key challenge is to obtain preferences for item pairs that have NOT been rated
Use the same idea as neighborhood-based CF
Find the set of neighbors of the target user who have rated both items
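A sketch of how such a preference value could be derived from neighbors. It assumes a similarity-weighted average of the neighbors' rating differences between the two items and positive similarities; the slide's formula is missing, and the neighbor data below is made up.

```python
def preference(i, j, neighbors):
    """Estimate the preference of item i over item j for a target user.

    neighbors: list of (similarity, ratings_dict) pairs for similar users.
    Only neighbors who rated both i and j contribute.
    """
    if i == j:
        return 0.0
    both = [(s, r) for s, r in neighbors if i in r and j in r]
    weight_sum = sum(s for s, _ in both)
    if weight_sum == 0:
        return 0.0  # no evidence either way
    return sum(s * (r[i] - r[j]) for s, r in both) / weight_sum


neighbors = [  # made-up similarities and ratings
    (0.8, {"A": 5, "B": 3}),
    (0.2, {"A": 2, "B": 4, "C": 1}),
    (0.5, {"B": 4, "C": 2}),
]
print(preference("A", "B", neighbors), preference("B", "A", neighbors))
```

By construction the result is zero for identical items and changes sign when the two items are swapped.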
Preference Functions
Produce Ranking
Given a preference function, we want to obtain a ranking of items
The ranking should agree with the pairwise preferences as much as possible
Ranking ρ : ranking of the items in item set I
ρ(i) > ρ(j) : item i is ranked higher than item j
Value function
Measures how consistent ρ is with the preference function Ψ
Our goal is to find the ranking ρ that maximizes the value function
Optimal solution
Finding it is an NP-complete problem : use a greedy algorithm
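A sketch of the value function, assuming it sums Ψ(i, j) over every pair in which i is ranked above j. The brute-force search over all permutations is shown only to make the cost of the exact solution visible; the preference table is hypothetical.

```python
from itertools import permutations


def ranking_value(order, psi):
    """Sum psi(i, j) over all pairs where i is ranked above j.

    order: items listed from most to least preferred.
    """
    return sum(
        psi(order[a], order[b])
        for a in range(len(order))
        for b in range(a + 1, len(order))
    )


def best_ranking_brute_force(items, psi):
    """Try every permutation: feasible only for a handful of items."""
    return max(permutations(items), key=lambda order: ranking_value(order, psi))


# Hypothetical anti-symmetric preferences: A over B, B over C, A over C.
table = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 2.0}


def psi(i, j):
    return table.get((i, j), 0.0) - table.get((j, i), 0.0)


best = best_ranking_brute_force(["C", "B", "A"], psi)
print(best, ranking_value(best, psi))
```

With n items there are n! candidate rankings, so exhaustive search stops being practical after roughly ten items.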
Greedy Order Algorithm
Motivation
Find an approximately optimal ranking
Algorithm
Input : item set I, preference function Ψ
Output : ranking ρ
Compute a potential value for each item i; it is higher when more items are less preferred than i
Find the highest-ranked item (the one with the highest potential)
Remove the highest one, then iterate
Complexity is O(n^2); the value of the resulting ranking is at least half of the optimal value
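The steps above can be sketched as follows. The pseudo-code image from the slide is not available, so this assumes the usual greedy ordering scheme in which an item's potential is the preference it holds over the remaining items minus the preference they hold over it; the preference table is hypothetical.

```python
def greedy_order(items, psi):
    """Return items from most to least preferred using the greedy heuristic."""
    remaining = list(items)
    # Potential: preference given to other items minus preference received.
    potential = {
        i: sum(psi(i, j) - psi(j, i) for j in remaining if j != i)
        for i in remaining
    }
    order = []
    while remaining:
        top = max(remaining, key=lambda i: potential[i])
        order.append(top)
        remaining.remove(top)
        for i in remaining:
            # Drop the removed item's contribution from each potential.
            potential[i] += psi(top, i) - psi(i, top)
    return order


# Hypothetical anti-symmetric preferences: A over B, B over C, A over C.
table = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 2.0}


def psi(i, j):
    return table.get((i, j), 0.0) - table.get((j, i), 0.0)


print(greedy_order(["C", "B", "A"], psi))
```

Each of the n rounds updates at most n potentials, which gives the quadratic running time.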
Random Walk Model for Item Ranking
Random Walk based on User Preferences
Motivation
Some users rated i > j, others rated j > k, but only a few rated all three items i, j, k
We want to infer the preference between i and k (implicit relationships)
Use multi-step random walks
Markov chain model
"At each step the system may change its state from the current state to another state according to a probability distribution. The changes of state are called transitions …" (Wikipedia)
Google PageRank
Random walk on Web pages based on hyperlinks
The surfer randomly picks a hyperlink
The stationary distribution is used as the PageRank score
Model for item ranking
Similarly, there are implicit links between two items
A less preferred item j links to a more preferred item i
Transition probability
The stationary distribution is used for item ranking
[Diagram: page → page via link; item → item via preference]
Random Walk Model for Item Ranking
Random Walk based on User Preferences
Transition probability
Probability of switching from the current item i to another item j
Higher for items that are more preferred than i
Depends on the user's preference function
Why the exp function? It keeps the probabilities non-negative
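A sketch of how transition probabilities could be built from the preference function. The slide's formula is lost, so this assumes a softmax form in which the probability of moving from i to j is proportional to exp(Ψ(j, i)), normalized over all items including i itself; the preference table is hypothetical.

```python
from math import exp


def transition_matrix(items, psi):
    """Row i holds the probabilities of moving from item i to each item j."""
    matrix = []
    for i in items:
        # exp() makes every weight positive, even when psi(j, i) is negative.
        weights = [exp(psi(j, i)) for j in items]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Hypothetical anti-symmetric preferences: A over B, B over C, A over C.
table = {("A", "B"): 1.0, ("B", "C"): 0.5, ("A", "C"): 2.0}


def psi(i, j):
    return table.get((i, j), 0.0) - table.get((j, i), 0.0)


items = ["A", "B", "C"]
P = transition_matrix(items, psi)
for item, row in zip(items, P):
    print(item, [round(p, 3) for p in row])
```

From the least preferred item C, most of the probability mass flows to A, then B, which matches the idea that less preferred items link to more preferred ones.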
Random Walk Model for Item Ranking
Compute the Item Rankings
Recall the PageRank algorithm
We can use matrix notation
P : transition matrix
Each entry of P is a transition probability
Probability of being at item i after t walking steps
Define the vector of these probabilities over all items
Compute these probabilities with the power iteration method for finding an eigenvector
Stationary probabilities
Does it work?
Existence and uniqueness are guaranteed if P is irreducible (the entries of P are all non-negative)
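A minimal power-iteration sketch in plain Python. It assumes P is row-stochastic (each row sums to 1) and repeatedly applies the update "new distribution = old distribution times P" until the values stop changing; the 3-item matrix is hypothetical.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat pi <- pi P until pi stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical 3-item chain; every entry is positive, so it is irreducible.
P = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]
pi = stationary_distribution(P)
ranking = sorted(range(len(pi)), key=lambda i: pi[i], reverse=True)
print([round(p, 3) for p in pi], ranking)
```

Sorting the items by their stationary probability gives the final ranking; the stationary vector is the left eigenvector of P for eigenvalue 1.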
Random Walk Model for Item Ranking
Personalization Vector (teleport)
To avoid reducibility of the stochastic matrix (Brin and Page, 1998)
Revised transition matrix
PageRank
The Web surfer sometimes "teleports" to other pages
Teleports according to the probability distribution defined by the personalization vector v
ε controls how often the surfer teleports rather than following hyperlinks
Our model
A similar idea is used to define the personalization vector
Teleport to items with high ratings more often
Unrated items have equal probabilities
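A sketch of the teleport step under explicit assumptions, since the revised-matrix formula is missing from the extracted text. It assumes the usual PageRank damping form, ε · P + (1 − ε) · (teleport to v), so ε here weights the original walk; swap the weights if the opposite convention is intended. The personalization vector is also an assumption: rated items are weighted by their rating, and every unrated item gets the same hypothetical default weight.

```python
def personalization_vector(items, ratings, unrated_weight=1.0):
    """Teleport distribution: rated items by rating, unrated items all equal."""
    weights = [ratings.get(i, unrated_weight) for i in items]
    total = sum(weights)
    return [w / total for w in weights]


def add_teleport(P, v, eps):
    """Mix the walk with teleportation: eps * P + (1 - eps) * v for every row."""
    n = len(P)
    return [[eps * P[i][j] + (1 - eps) * v[j] for j in range(n)] for i in range(n)]


items = ["A", "B", "C"]
v = personalization_vector(items, {"A": 5, "B": 2})  # C is unrated

# A reducible chain: each item only loops back to itself.
P = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
P_revised = add_teleport(P, v, eps=0.85)
print(v, [[round(p, 3) for p in row] for row in P_revised])
```

After mixing, every entry is strictly positive, so the chain is irreducible and the power iteration has a unique stationary distribution to converge to.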
Contents
Introduction
Related Work
Rating Oriented Collaborative Filtering
Ranking Oriented Collaborative Filtering
Experiments
Conclusions
Experiments
Issues
1. Is the ranking-oriented approach better than the rating-oriented one?
2. Which is better, the greedy order algorithm or the random walk model?
3. Is the ranking-oriented similarity measure (Kendall's) more effective?
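[Diagram: similarity measures (Pearson's / vector similarity, Kendall's rank similarity) combined with rating-oriented methods (user / item) and ranking-oriented methods (greedy, random walk), with labels 1, 2, 3 referring to the issues above]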
Experiments
Data Sets
Two movie ratings data sets
EachMovie and Netflix
Users who rated more than 40 different movies
10,000 users for training
100 users for parameter tuning
500 users for testing
Evaluation Protocol
For each user in the test set,
50% of the ratings for model construction
50% of the ratings as hold-out data for evaluation
               EachMovie          Netflix
# of ratings   2.8 M → ?          100 M → ?
# of users     72,000 → 10,600    480,000 → 10,600
# of movies    1,628              18,000 → 2,000
Rating scale   1 to 6             1 to 5
Density        6.1 %              6.6 %
Evaluation Metric
Which metric to use?
Rating-oriented CF
MAE (Mean Absolute Error) and RMSE (Root Mean Square Error)
Focus on the difference between true rating and predicted rating
Ranking-oriented CF
Our emphasis is on improving item rankings
NDCG (Normalized Discounted Cumulative Gain)
Evaluated over the top-k items on the ranked list
Discounting factor: increases with position in the ranking
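A sketch of NDCG at k for a single user. The slide's formula is missing, so this assumes the common form with gain 2^r − 1 and discount log2(1 + p) at position p, normalized by the score of the ideal ordering; the sample ratings are made up.

```python
from math import log2


def dcg_at_k(ratings_in_ranked_order, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(
        (2 ** r - 1) / log2(1 + p)
        for p, r in enumerate(ratings_in_ranked_order[:k], start=1)
    )


def ndcg_at_k(ratings_in_ranked_order, k):
    """DCG normalized by the DCG of the ideal (descending-rating) order."""
    ideal = dcg_at_k(sorted(ratings_in_ranked_order, reverse=True), k)
    return dcg_at_k(ratings_in_ranked_order, k) / ideal if ideal else 0.0


# True ratings of held-out items, listed in the order a model ranked them.
good = ndcg_at_k([5, 3, 4, 1], k=3)
bad = ndcg_at_k([1, 3, 4, 5], k=3)
print(round(good, 3), round(bad, 3))
```

Because the discount grows with position, placing a highly rated item first matters much more than getting the lower positions right.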
Impact of Parameters
Impact of Neighborhood Size
How does the size of the neighborhood affect performance?
Result
When neighborhood size ↑, NDCG ↑ until about 100, because with more neighbors the preference function is more accurate
But NDCG starts to decrease when the size exceeds 100, due to many non-similar users
Impact of Parameters
Impact of ε
How does the frequency of the "teleport" operation affect performance?
Result
When ε ↑, NDCG ↑
But ε should NOT be too big (0.8~0.9)
Comparisons with Other Algorithms
Issues
1. Is the ranking-oriented approach better than the rating-oriented one?
2. Which is better, the greedy order algorithm or the random walk model?
3. Is the ranking-oriented similarity measure (Kendall's) more effective?
Comparison
4 rating-oriented settings, 6 ranking-oriented settings
                        PCC     VS     KRCC
Rating    User          UPCC    UVS
Rating    Item          IPCC    IVS
Ranking   Greedy        GOPCC   GOVS   GOKRCC
Ranking   Random Walk   RWPCC   RWVS   RWKRCC
Comparisons with Other Algorithms
Result
Ranking-oriented approaches are better than rating-oriented ones by about 8.8% on NDCG@1
The random walk model outperformed all the rating-oriented methods
The random walk model is slightly better than greedy order
The Kendall rank correlation coefficient is more effective for ranking-oriented methods
Conclusion
Ranking-oriented Framework for CF
Item ranking without rating prediction as an intermediate step
Extends neighborhood-based CF by identifying preferences
Two methods for computing item ranking
Greedy order algorithm
Random walk model
[Diagram: similarity measure, preference function, then greedy order or random walk model]
Thank you~