IIR 2016, Venice, Italy

Daniel Valcarce, Javier Parapar, Álvaro Barreiro (@dvalcarce, @jparapar, @AlvaroBarreiroG)

Information Retrieval Lab (@IRLab_UDC), University of A Coruña, Spain

Computing Neighbourhoods with Language Models in a Collaborative Filtering Scenario [IIR '16 Slides]


Outline

1. Introduction to Recommender Systems

2. Neighbourhood-based Methods

3. Computing Neighbourhoods

4. Language Models for Neighbourhoods

5. Experiments

6. Conclusions and Future Directions


INTRODUCTION TO RECOMMENDER SYSTEMS


Recommender Systems

Recommender systems provide personalised suggestions for items that may be of interest to the users.

Top-N Recommendation: create a ranking of the N most relevant items for each user.

Different approaches:

# Content-based: exploit item descriptions to recommend items similar to those the target user liked in the past.

# Collaborative filtering: rely on user feedback, such as ratings or clicks, to generate recommendations.

# Hybrid: combine content-based and collaborative filtering approaches.


Collaborative Filtering

Collaborative Filtering (CF) exploits feedback from users:

# Explicit: ratings or reviews.

# Implicit: clicks or purchases.

Two main families of CF methods:

# Model-based: learn a model from the data and use it for recommendation.

# Neighbourhood-based (or memory-based): compute recommendations directly from part of the ratings.


NEIGHBOURHOOD-BASED METHODS


Neighbourhood-based Methods

Two perspectives:

# User-based: recommend items liked by users who share interests with you.

# Item-based: recommend items similar to those you liked. Similarity between items is computed from the users they have in common (not from the content!).


Weighted Sum Recommender (WSR)

Very simple but effective approach (Valcarce et al., ECIR 2016).

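WSR computes a weighted sum of the ratings in the neighbourhood. Weights are calculated using cosine similarity.

Item-based version (WSR-IB), where J_i is the neighbourhood of item i:

$$\hat{r}_{u,i} = \sum_{j \in J_i} \operatorname{cosine}(i, j)\, r_{u,j} \tag{1}$$

User-based version (WSR-UB), where V_u is the neighbourhood of user u:

$$\hat{r}_{u,i} = \sum_{v \in V_u} \operatorname{cosine}(u, v)\, r_{v,i} \tag{2}$$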

The computation of neighbourhoods is crucial!
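
As a minimal sketch of the user-based weighted sum (Equation 2), assuming a small dense rating matrix in which 0 means "not rated" and assuming the neighbourhood is simply the k most cosine-similar users. The names `cosine_matrix`, `wsr_ub`, `R` and `k` are illustrative and do not come from the slides.

```python
import numpy as np

def cosine_matrix(R):
    """Pairwise cosine similarity between the rows (users) of R."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty profiles
    unit = R / norms
    return unit @ unit.T

def wsr_ub(R, u, k):
    """Score every item for user u with the user-based weighted sum."""
    sims = cosine_matrix(R)[u].copy()
    sims[u] = -np.inf                          # a user is not their own neighbour
    neighbours = np.argsort(-sims)[:k]         # V_u: the k most similar users
    scores = sims[neighbours] @ R[neighbours]  # sum over v of cosine(u, v) * r_{v,i}
    scores[R[u] > 0] = -np.inf                 # skip items the user already rated
    return scores

# Toy data (made up): 3 users x 4 items.
R = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 3.0, 0.0],
    [0.0, 0.0, 4.0, 5.0],
])
scores = wsr_ub(R, u=0, k=1)
print(int(np.argmax(scores)))  # index of the top recommended item
```

Sorting the scores in descending order and keeping the first N entries gives a top-N recommendation list for the user.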


COMPUTING NEIGHBOURHOODS


Computing Neighbourhoods with k-NN algorithm

The effectiveness of neighbourhood-based methods relies largely on how neighbours are computed.

The most common approach is to compute the k nearest neighbours (k-NN algorithm) using a pairwise similarity.

# The most common similarities are Pearson’s correlation coefficient or cosine similarity.

# Cosine provides important improvements over Pearson’s correlation coefficient (Cremonesi et al., RecSys 2010).

Let’s study cosine similarity from the perspective of Information Retrieval.


Cosine Similarity and the Vector Space Model

| Recommendation | Information Retrieval |
| --- | --- |
| Target user | Query |
| Rest of users | Documents |
| Items | Terms |

Under this scheme, using cosine similarity for finding neighbours is equivalent to searching in the Vector Space Model.

If we swap users and items, we can derive an analogous item-based approach.

We can use sophisticated search techniques for finding neighbours!
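
The analogy can be sketched with a toy inverted index: each item plays the role of a term whose posting list holds the users who rated it, and the target user's profile is the query. This is only an illustration under assumed data; the function names and the ratings are hypothetical, and a real search engine would add compression, pruning and so on.

```python
import math
from collections import defaultdict

def build_inverted_index(ratings):
    """Map each item (term) to its posting list of (user, rating) pairs."""
    index = defaultdict(list)
    for user, profile in ratings.items():
        for item, rating in profile.items():
            index[item].append((user, rating))
    return index

def search_neighbours(ratings, index, target, k):
    """Rank the other users by cosine similarity to the target user's profile."""
    norm = {u: math.sqrt(sum(r * r for r in p.values())) for u, p in ratings.items()}
    dot = defaultdict(float)
    # Term-at-a-time scoring: only users sharing an item with the query are touched.
    for item, q_rating in ratings[target].items():
        for user, rating in index[item]:
            if user != target:
                dot[user] += q_rating * rating
    scored = [(user, d / (norm[user] * norm[target])) for user, d in dot.items()]
    return sorted(scored, key=lambda pair: -pair[1])[:k]

# Made-up ratings: user -> {item: rating}.
ratings = {
    "ana": {"i1": 5, "i2": 3, "i4": 1},
    "ben": {"i1": 4, "i4": 1},
    "cho": {"i1": 1, "i2": 1, "i4": 5},
    "dev": {"i3": 5, "i4": 4},
}
index = build_inverted_index(ratings)
neighbours = search_neighbours(ratings, index, target="ana", k=2)
print(neighbours)
```

Swapping the roles of users and items in `ratings` yields the item-based variant with the same code.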


LANGUAGE MODELS FOR NEIGHBOURHOODS


Language Models

Statistical language models are a state-of-the-art framework for document retrieval.

Documents are ranked according to their posterior probability given the query:

$$p(d \mid q) = \frac{p(q \mid d)\, p(d)}{p(q)} \overset{\text{rank}}{=} p(q \mid d)\, p(d)$$

The query likelihood, p(q|d), is based on a unigram model:

$$p(q \mid d) = \prod_{t \in q} p(t \mid d)^{c(t,d)}$$

The document prior, p(d), is usually considered uniform.


Language Models for Finding Neighbourhoods (I)

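Information Retrieval:

$$p(d \mid q) \overset{\text{rank}}{=} p(d) \prod_{t \in q} p(t \mid d)^{c(t,d)}$$

User-based collaborative filtering:

$$p(v \mid u) \overset{\text{rank}}{=} p(v) \prod_{i \in I_u} p(i \mid v)^{r_{v,i}}$$

Item-based collaborative filtering:

$$p(j \mid i) \overset{\text{rank}}{=} p(j) \prod_{u \in U_i} p(u \mid j)^{r_{u,j}}$$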
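
A minimal sketch of the user-based ranking formula above, computed in log space and assuming a uniform prior p(v). The matrix `P` of estimates p(i|v) is hypothetical, hand-written data, assumed to be strictly positive. The exponent is exposed as a parameter because formulations differ on whether the candidate's or the target's ratings act as counts; by default the code follows the formula as written here, with r_{v,i}.

```python
import numpy as np

def lm_neighbour_scores(R, P, u, exponents=None):
    """Log of prod_{i in I_u} p(i|v)^e[v, i] for every user v (uniform p(v)).

    By default e[v, i] = r_{v,i}, the exponent written in the formula above.
    """
    if exponents is None:
        exponents = R
    rated_by_u = R[u] > 0                      # I_u: items rated by the target user
    # Work in log space so that long products do not underflow.
    log_scores = (exponents[:, rated_by_u] * np.log(P[:, rated_by_u])).sum(axis=1)
    log_scores[u] = -np.inf                    # the target user is not a candidate
    return log_scores

# Made-up ratings: 3 users x 3 items, 0 means "not rated".
R = np.array([
    [5.0, 0.0, 3.0],
    [4.0, 2.0, 0.0],
    [0.0, 5.0, 1.0],
])
# Hypothetical, strictly positive estimates of p(i|v); each row sums to 1.
P = np.array([
    [0.60, 0.05, 0.35],
    [0.60, 0.30, 0.10],
    [0.10, 0.70, 0.20],
])
scores = lm_neighbour_scores(R, P, u=0)
print(scores)
```

The uniform prior only adds a constant to every score, so it is dropped. Passing `np.tile(R[u], (len(R), 1))` as `exponents` uses the target user's ratings r_{u,i} as counts instead.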


Language Models for Finding Neighbourhoods (II)

User-based collaborative filtering:

$$p(v \mid u) \overset{\text{rank}}{=} p(v) \prod_{i \in I_u} p(i \mid v)^{r_{v,i}}$$

We assume a multinomial distribution over the count of ratings. The maximum likelihood estimate (MLE) is:

$$p_{\mathrm{mle}}(i \mid v) = \frac{r_{v,i}}{\sum_{j \in I_v} r_{v,j}}$$

However, it suffers from sparsity. We need smoothing!
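
To make the sparsity issue concrete, here is a small sketch of the maximum likelihood estimate on made-up ratings, assuming a dense matrix in which 0 means "not rated". The function name `p_mle` is illustrative.

```python
import numpy as np

def p_mle(R):
    """Row-wise maximum likelihood estimate: p(i|v) = r_{v,i} / sum_j r_{v,j}."""
    totals = R.sum(axis=1, keepdims=True)
    # Users without any rating keep an all-zero row instead of dividing by zero.
    return np.divide(R, totals, out=np.zeros_like(R), where=totals > 0)

# Made-up ratings: 2 users x 4 items.
R = np.array([
    [5.0, 0.0, 3.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
])
P = p_mle(R)
print(P)
```

Every unrated item receives probability exactly 0, and with typical rating matrices most entries are unrated.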


Smoothing Methods for Language Models

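Absolute Discounting (AD)

$$p_{\delta}(i \mid u) = \frac{\max(r_{u,i} - \delta, 0) + \delta\, |I_u|\, p(i \mid C)}{\sum_{j \in I_u} r_{u,j}}$$

Jelinek-Mercer (JM)

$$p_{\lambda}(i \mid u) = (1 - \lambda)\, \frac{r_{u,i}}{\sum_{j \in I_u} r_{u,j}} + \lambda\, p(i \mid C)$$

Dirichlet Priors (DP)

$$p_{\mu}(i \mid u) = \frac{r_{u,i} + \mu\, p(i \mid C)}{\mu + \sum_{j \in I_u} r_{u,j}}$$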
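
The three estimators can be sketched with NumPy as follows. The slides do not define the collection model p(i|C); this sketch assumes it is item i's share of all rating mass, and it also assumes that every user has at least one rating and that δ does not exceed the smallest rating. All function names are illustrative.

```python
import numpy as np

def collection_model(R):
    """p(i|C), assumed here to be item i's share of the total rating mass."""
    return R.sum(axis=0) / R.sum()

def absolute_discounting(R, delta):
    totals = R.sum(axis=1, keepdims=True)          # sum over j of r_{u,j}
    n_rated = (R > 0).sum(axis=1, keepdims=True)   # |I_u|
    return (np.maximum(R - delta, 0) + delta * n_rated * collection_model(R)) / totals

def jelinek_mercer(R, lam):
    totals = R.sum(axis=1, keepdims=True)
    return (1 - lam) * R / totals + lam * collection_model(R)

def dirichlet_priors(R, mu):
    totals = R.sum(axis=1, keepdims=True)
    return (R + mu * collection_model(R)) / (mu + totals)

# Made-up ratings: 3 users x 3 items, 0 means "not rated".
R = np.array([
    [5.0, 0.0, 3.0],
    [4.0, 2.0, 0.0],
    [0.0, 5.0, 1.0],
])
for name, P in [
    ("AD", absolute_discounting(R, delta=0.5)),
    ("JM", jelinek_mercer(R, lam=0.2)),
    ("DP", dirichlet_priors(R, mu=10.0)),
]:
    print(name, P.round(3))
```

Under these assumptions each row still sums to 1, and unrated items now receive a small non-zero probability taken from the collection model.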


EXPERIMENTS


Experimental settings

Baselines:

# Pearson’s correlation coefficient

# RM1Sim: user-based similarity (Bellogín et al., RecSys ’13)

# Cosine similarity

Our similarities are Language Models using:

# Absolute Discounting smoothing

# Jelinek-Mercer smoothing

# Dirichlet Priors smoothing


Parameter Sensitivity of WSR-UB on MovieLens 100k

[Figure: nDCG@10 (y-axis) against the smoothing parameter (x-axes: µ from 0 to 10k; λ and δ from 0 to 1). Curves: Pearson, Cosine, RM1Sim (λ), LM-Absolute Discounting (δ), LM-Jelinek-Mercer (λ), LM-Dirichlet Priors (µ).]


Parameter Sensitivity of WSR-IB on R3-Yahoo!

[Figure: nDCG@10 (y-axis) against the smoothing parameter (x-axes: µ from 10^0 to 10^6; λ and δ from 0 to 1). Curves: Pearson, Cosine, LM-Absolute Discounting (δ), LM-Jelinek-Mercer (λ), LM-Dirichlet Priors (µ).]


Precision (nDCG@10)

| Algorithm | ML 100k | ML 1M | R3-Yahoo! | LibraryThing |
| --- | --- | --- | --- | --- |
| NNCosNgbr | 0.1427 | 0.1042 | 0.0138 | 0.0550 |
| PureSVD | 0.3595^a | 0.3499^ac | 0.0198^a | 0.2245^a |
| Cosine-WSR | 0.3899^ab | 0.3430^a | 0.0274^ab | 0.2476^ab |
| LM-DP-WSR | 0.4017^abc | 0.3585^abc | 0.0271^ab | 0.2464^ab |
| LM-JM-WSR | 0.4013^abc | 0.3622^abcd | 0.0276^ab | 0.2537^abcd |

Table: Values of precision in terms of normalised discounted cumulative gain at 10. Statistical significance is superscripted (Wilcoxon two-sided p < 0.01). Pink = best algorithm. Blue = not significantly different to the best.


Diversity (Gini@10)

| Algorithm | ML 100k | ML 1M | R3-Yahoo! | LibraryThing |
| --- | --- | --- | --- | --- |
| Cosine-WSR | 0.0549 | 0.0400 | 0.0902 | 0.1025 |
| LM-DP-WSR | 0.0659 | 0.0435 | 0.1557 | 0.1356 |
| LM-JM-WSR | 0.0627 | 0.0435 | 0.1034 | 0.1245 |

Table: Values of the complement of the Gini index at 10. Pink = best algorithm.


Novelty (MSI@10)

| Algorithm | ML 100k | ML 1M | R3-Yahoo! | LibraryThing |
| --- | --- | --- | --- | --- |
| Cosine-WSR | 11.0579 | 12.4816 | 21.1968 | 41.1462 |
| LM-DP-WSR | 11.5219 | 12.8040 | 25.9647 | 46.4197 |
| LM-JM-WSR | 11.3921 | 12.8417 | 21.7935 | 43.5986 |

Table: Values of novelty in terms of Mean Self Information at 10. Pink = best algorithm.


CONCLUSIONS AND FUTURE DIRECTIONS


Conclusions

Statistical language models are a powerful tool for computing neighbourhoods in a collaborative filtering scenario. Combined with WSR, language models:

# Provide highly accurate recommendations.

# Improve novelty and diversity figures compared to cosine.

# Have low computational complexity.


Future work

Explore other probability distributions:

# Multivariate Bernoulli.

# Multivariate Poisson.

Evaluate the use of inverted indexes to compute neighbourhoods:

# Efficiency.

# Scalability.


THANK YOU!

@dvalcarce

http://www.dc.fi.udc.es/~dvalcarce