
Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets


Page 1: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Fabio Aiolli

University of Padova (Italy)


Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

16/10/2013

ACM RecSys 2013, Hong Kong

Page 2: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Abstract

Very large datasets: n users, m items, both in the order of millions

Top-N type of prediction

Implicit feedback: only information about what people have already rated

Efficiency:

Efficient MB-like scoring function tailored to implicit feedback that avoids the computation of the whole m x m (n x n) similarity matrix

Effectiveness:

Asymmetric similarity matrix

Asymmetric scoring function

Calibration

Ranking Aggregation


Page 3: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

The MSD Challenge @kaggle

Very large scale music recommendation challenge

Predict which songs a user will listen to, given the listening history of the user

Based on the MSD (Million Song Dataset), a freely available collection of metadata for one million contemporary songs

The challenge was actually based on a subset (the Taste Profile Subset) of more than 48 million (user, song) rating pairs. The data covers about 1.2 million users and more than 380,000 songs

The user-song matrix is very sparse (density 0.01%)

153 teams participated

We had the full listening history for about 1M users, plus half of the listening history for 110K users, for which we were required to predict the missing half


Page 4: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Why not use Matrix Factorization?

MF is recognized as a state-of-the-art technique in CF, but…

Model building is very expensive

The regression setting does not exactly match the implicit feedback setting

Gradient descent issues: local minima and slow convergence rate

Too many parameters to optimize ((n + m) × k), a very sparse matrix, and no prior knowledge used -> overfitting

MF based solutions provided by the organizers at the beginning of the challenge, and MF based entries by other teams, showed really poor results on this task.


Page 5: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Memory-based Models

In standard memory based NN models, the entire rating matrix R is used to generate a prediction

Pros:

Prediction is performed on-the-fly and no model has to be built

Independent predictions, which can be easily parallelized!

Cons:

Only a few external parameters (lack of flexibility)

Needs the complete computation of similarities for every user-user (or item-item) pair in order to compute the NNs

Page 6: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Memory-based Collaborative Filtering

We first define a modified version of the standard MB model, tailored to CF with implicit feedback, which uses rated information only

User based and item based scoring functions are defined below. q represents a locality parameter whose role is similar to taking the NNs: a larger q corresponds to considering fewer nearest neighbors. Note that, as in the MF case, the score can be written as a dot product between a user vector and an item vector.
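In the paper's notation (U_i is the set of users who rated item i; I_u is the set of items rated by user u), a reconstruction of the two scoring functions:

```latex
% User based and item based scoring with locality parameter q
\text{User based: } \hat{h}_{ui} \;=\; \sum_{v \in \mathcal{U}_i} s(u,v)^{\,q}
\qquad
\text{Item based: } \hat{h}_{ui} \;=\; \sum_{j \in \mathcal{I}_u} s(i,j)^{\,q}
% As in MF, the user based score is a dot product
% h_ui = <w_u, x_i>, with (w_u)_v = s(u,v)^q and
% x_i the binary column of R for item i.
```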

For each user, the N top-scoring items are recommended. User based: only U similarity computations (U = average number of users per item). Item based: only I similarity computations (I = average number of items per user).
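A minimal Python sketch of the item based scoring above; `sims_to` is a hypothetical helper that returns the nonzero similarities of a rated item (only items co-rated with it), so the full m × m similarity matrix is never materialized:

```python
import heapq
from collections import defaultdict

def recommend(user_items, sims_to, q=3, N=500):
    """Item based top-N scoring with locality parameter q.

    user_items : set of item ids rated by the user
    sims_to(j) : hypothetical helper returning {i: s_ij} for the
                 items i co-rated with j (its nonzero similarities)
    """
    scores = defaultdict(float)
    for j in user_items:
        for i, s in sims_to(j).items():
            if i not in user_items:      # rank unseen items only
                scores[i] += s ** q      # larger q ~ fewer effective NNs
    # the N items with the highest aggregated score
    return heapq.nlargest(N, scores, key=scores.get)
```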


Page 7: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Asymmetric Cosine based CF

Given two variables and their (binary) vector representations, we define an asymmetric cosine (AsymC) similarity, parametrized by α

AsymC has a probabilistic interpretation as an asymmetric product of conditionals
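For binary item vectors x_i and x_j (U_i is the set of users who rated item i), the AsymC similarity and its reading as a product of conditionals:

```latex
% Asymmetric cosine similarity, alpha in [0, 1]
S_\alpha(i,j)
\;=\; \frac{x_i^\top x_j}{(x_i^\top x_i)^{\alpha}\,(x_j^\top x_j)^{1-\alpha}}
\;=\; \frac{|\mathcal{U}_i \cap \mathcal{U}_j|}{|\mathcal{U}_i|^{\alpha}\,|\mathcal{U}_j|^{1-\alpha}}
\;=\; \hat{P}(j \mid i)^{\alpha}\,\hat{P}(i \mid j)^{1-\alpha}
```

Setting α = 1/2 recovers the standard cosine similarity.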


Page 8: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Locality effect: Item-based

Item based, mAP@500 as a function of the locality parameter q:

q    IS (α=0)   IS (α=1/2)
1    0.12224    0.16439
2    0.16581    0.16214
3    0.17144    0.15587
4    0.17004    0.15021
5    0.16830    0.14621


Page 9: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Locality effect: User-based

User based, mAP@500 as a function of the locality parameter q:

q    US (α=0)   US (α=1/2)
3    0.12479    0.12532
4    0.13289    0.13779
5    0.13400    0.14355
6    0.13187    0.14487
7    0.12878    0.14352


Page 10: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

AsymC similarity effect

[Figures: mAP@500 as a function of α, item based and user based]

Page 11: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Asymmetric Scoring function

Unfortunately, the norm of the weight vector is inefficient to compute exactly; whenever the number of items is very large, we suggest estimating it from the data

User based (the item based form swaps the roles of users and items)
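A hedged reconstruction of the user based form, assuming that β normalizes the score in the same asymmetric-cosine fashion that α normalizes similarities; w_u is the user's weight vector (entries s(u,v)^q), x_i the binary profile of item i, and ‖w_u‖ is the norm that the slide proposes to estimate from data:

```latex
% User based asymmetric scoring, beta in [0, 1]
% (a reconstruction under the assumptions stated above)
\hat{h}_{ui} \;=\; \frac{\langle w_u, x_i \rangle}{\|w_u\|^{2\beta}\,\|x_i\|^{2(1-\beta)}}
```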


Page 12: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

AsymC scoring effect on user based recommendation


US, α=0.5:

q    mAP@500    best β   mAP@500 (asym. scoring)
1    0.07679    0.3      0.14890
2    0.10436    0.5      0.15801
3    0.12532    0.6      0.16132
4    0.13779    0.7      0.16229
5    0.14355    0.8      0.16152
6    0.14487    0.9      0.15975
7    0.14352    0.9      0.15658


Page 13: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Calibration

Analysis of the predicted scores of items that are actually rated in the training set

Different items could be mapped onto different scales


Page 14: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Calibration

The scores are calibrated by a simple piece-wise linear function


type   uncalibrated        calibrated
IS     mAP@500 = 0.1773    mAP@500 = 0.1811*
US     mAP@500 = 0.1623    mAP@500 = 0.1649
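As an illustration only, a minimal Python sketch of piece-wise linear calibration; the knot values are hypothetical placeholders, not the ones behind the results above:

```python
import numpy as np

# Hypothetical knots: raw-score break points and their calibrated targets.
knots_raw = np.array([0.0, 0.2, 0.5, 1.0])
knots_cal = np.array([0.0, 0.4, 0.8, 1.0])

def calibrate(scores):
    """Map raw scores through the piece-wise linear function defined
    by the knots, so that scores of different items become comparable."""
    return np.interp(scores, knots_raw, knots_cal)
```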


Page 15: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Ranking Aggregation

Assuming that the strategies are precision oriented, meaning that each one tends to make good recommendations for the songs on which it is more confident, and that different strategies are diverse and can recommend different songs, then aggregating different rankings can improve the results

Aggregating item-based and user-based strategies:

Stochastic aggregation: recommended items are chosen stochastically from the lists

Linear aggregation: recommended items are chosen based on a combination of their scores in the different lists

Borda aggregation: recommended items are chosen based on a variant of the Borda Count algorithm

More details in the paper!
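For the Borda variant, a minimal sketch assuming plain Borda points (an item in position p of a list of length L gets L - p points; the paper uses a variant of this):

```python
def borda_aggregate(rankings, N=500):
    """Aggregate ranked item lists with Borda counts and return the
    N items with the most points."""
    points = {}
    for ranking in rankings:
        L = len(ranking)
        for p, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (L - p)
    return sorted(points, key=points.get, reverse=True)[:N]
```

For example, borda_aggregate([item_based_list, user_based_list]) merges the item based and user based recommendations of the previous slides.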


Page 16: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Ranking aggregation results


Linear aggregation weights w for IS (α=0.15, q=3) and US (α=0.3, q=5):

w_IS   w_US   mAP@500
0.0    1.0    0.14098
0.1    0.9    0.14813
0.2    0.8    0.15559
0.3    0.7    0.16248
0.4    0.6    0.16859
0.5    0.5    0.17362
0.6    0.4    0.17684
0.7    0.3    0.17870
0.8    0.2    0.17896
0.9    0.1    0.17813
1.0    0.0    0.17732


Page 17: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Final MSD Challenge Results


RANK   TEAM NAME             mAP@500
1      Aio                   0.17910
2      Learner               0.17196
3      Nohair                0.15892
4      Team Ubuntu           0.15695
5      TheMiner              0.15639
…      …                     …
135    Songs by Popularity   0.02079
151    Random                0.00002


Page 18: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Final discussion

The best ranked teams all used approaches based on CF

The 2nd ranked team used an approach similar to ours to create a set of features for a learning-to-rank algorithm

The 5th ranked team used the Absorption algorithm by YouTube (graph based, random walks) to get their best public score

Based on my own and other participants' opinions and experiments:

Metadata did not help (the implicit information contained in the user histories is much richer than the explicit information in the metadata)

Matrix factorization did not help either

Additional experiments on the MovieLens1M dataset can be found in the paper

In the future we want to study how to exploit rich metadata information, especially in cold start settings


Page 19: Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

Thank you! Questions are welcome

The MSD competition (info and data): http://www.kaggle.com/c/msdchallenge

Python code I used for the challenge can be found at http://www.math.unipd.it/~aiolli/CODE/MSD/