Big Data Analytics
Recommender Systems Part II

Lucas Rego Drumond
Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim, Germany


Outline

1. Review
2. More on factorization models
   2.1 Adding bias terms
3. SVD++
4. Item Prediction
5. From Recommender Systems to Graphs
   5.1 Recommender Systems as a link prediction problem
   5.2 Link Prediction Approaches


1. Review


Formalization

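• $U$: set of users
• $I$: set of items
• Ratings data $D \subseteq U \times I \times \mathbb{R}$

The rating data $D$ are typically represented as a sparse matrix $R \in \mathbb{R}^{|U| \times |I|}$.

[Figure: sparse rating matrix with users as rows and items as columns]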


Example

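            Titanic (t)   Matrix (m)   The Godfather (g)   Once (o)
Alice (a)        4            -               2               5
Bob (b)          -            4               3               -
John (j)         -            4               -               3

("-" marks a missing rating.)

• Users $U := \{\text{Alice}, \text{Bob}, \text{John}\}$
• Items $I := \{\text{Titanic}, \text{Matrix}, \text{The Godfather}, \text{Once}\}$
• Ratings data $D := \{(\text{Alice}, \text{Titanic}, 4), (\text{Bob}, \text{Matrix}, 4), \ldots\}$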


User Based Recommender - Prediction Function

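$$\hat{r}(u, i) := \bar{r}_u + \frac{\sum_{v \in N_u} \mathrm{sim}(u, v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in N_u} |\mathrm{sim}(u, v)|}$$

Where:

• $\bar{r}_u$ is the average rating of user $u$
• $\mathrm{sim}$ is a similarity function used to compute the neighborhood $N_u$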
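The user-based prediction function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_user_based`, the similarity matrix `sim` and the choice of taking the `k` most similar users who rated item `i` as the neighborhood are hypothetical and not prescribed by the slides. The ratings are those of the Alice/Bob/John example (columns: Titanic, Matrix, The Godfather, Once).

```python
import numpy as np

def predict_user_based(R, sim, u, i, k=2):
    """Predict the rating of user u for item i.

    R   : ratings matrix with np.nan for missing entries
    sim : user-user similarity matrix (assumed to be given)
    k   : neighborhood size (an assumption of this sketch)
    """
    user_means = np.nanmean(R, axis=1)  # average rating of each user
    # Candidate neighbors: other users who actually rated item i.
    candidates = [v for v in range(R.shape[0])
                  if v != u and not np.isnan(R[v, i])]
    # Neighborhood N_u: the k candidates most similar to u.
    neighbors = sorted(candidates, key=lambda v: -abs(sim[u, v]))[:k]
    denom = sum(abs(sim[u, v]) for v in neighbors)
    if denom == 0:
        return user_means[u]  # fall back to the user's average
    num = sum(sim[u, v] * (R[v, i] - user_means[v]) for v in neighbors)
    return user_means[u] + num / denom

# Rows: Alice, Bob, John.
R = np.array([[4.0, np.nan, 2.0, 5.0],
              [np.nan, 4.0, 3.0, np.nan],
              [np.nan, 4.0, np.nan, 3.0]])
# Hypothetical similarities, chosen only for illustration.
sim = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.8],
                [0.2, 0.8, 1.0]])
pred = predict_user_based(R, sim, u=1, i=3)  # Bob's rating for Once
print(pred)
```

The item-based variant follows the same pattern with the roles of users and items exchanged.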


Item Based Recommender - Prediction Function

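$$\hat{r}(u, i) := \bar{r}_i + \frac{\sum_{j \in N_i} \mathrm{sim}(i, j)\,(r_{uj} - \bar{r}_j)}{\sum_{j \in N_i} |\mathrm{sim}(i, j)|}$$

Where:

• $\bar{r}_i$ is the average rating of item $i$
• $\mathrm{sim}$ is a similarity function used to compute the neighborhood $N_i$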


Factorization models

• Each item $i \in I$ is associated with a latent feature vector $q_i \in \mathbb{R}^k$
• Each user $u \in U$ is associated with a latent feature vector $p_u \in \mathbb{R}^k$
• Each entry in the original matrix can be estimated by

$$\hat{r}(u, i) = p_u^\top q_i = \sum_{f=1}^{k} p_{u,f}\, q_{i,f}$$


Example

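            Titanic (t)   Matrix (m)   The Godfather (g)   Once (o)
Alice (a)        4            -               2               5
Bob (b)          -            4               3               -
John (j)         -            4               -               3

[Figure: the rating matrix $R$ (users Alice, Bob, John by items T, M, G, O) is approximated by the product of the user factor matrix $P$ and the transposed item factor matrix $Q^\top$, i.e. $R \approx P\,Q^\top$]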

2. More on factorization models

2.1 Adding bias terms

Biased Matrix Factorization

• Specific users tend to have specific rating behavior
    • Some users may tend to give higher (or lower) ratings
• The same can be said about items
• This can be easily modeled through bias terms for users $b_u$ and for items $b_i$ in the prediction function:

$$\hat{r}(u, i) = b_u + b_i + p_u^\top q_i$$

• Additionally, a global bias $g$ can be added:

$$\hat{r}(u, i) = g + b_u + b_i + p_u^\top q_i$$
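As a minimal sketch, the biased prediction function translates directly into NumPy. The function name `predict_biased_mf` and all parameter values below are hypothetical; in practice the biases and factors would be learned from the rating data.

```python
import numpy as np

def predict_biased_mf(g, b_user, b_item, P, Q, u, i):
    """Return g + b_u + b_i + p_u^T q_i for user u and item i."""
    return g + b_user[u] + b_item[i] + P[u] @ Q[i]

# Hypothetical parameters for 2 users, 2 items and k = 2 latent features.
g = 3.5                               # global bias
b_user = np.array([0.2, -0.3])        # user biases b_u
b_item = np.array([0.5, -0.1])        # item biases b_i
P = np.array([[1.0, 0.5],
              [0.2, -0.4]])           # user factors p_u (one row per user)
Q = np.array([[0.4, -0.2],
              [0.1, 0.3]])            # item factors q_i (one row per item)

pred = predict_biased_mf(g, b_user, b_item, P, Q, u=0, i=0)
print(pred)
```

Setting `g`, `b_user` and `b_item` to zero recovers the plain factorization model from the review section.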


Effect of the Biases

Source: Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.


3. SVD++


Integrating Implicit feedback

• In many situations we have information about items that the user has consumed but did not evaluate:
    • Videos watched
    • Products bought
    • Webpages visited
    • ...
• The set of items $\mathcal{N}(u)$ consumed by a user $u$ (rated or not) provides useful information about the tastes of the user


SVD++

SVD++ (Koren 2008) incorporates information about implicit feedback into the user factors. User factors are represented as:

$$p_u + \frac{1}{\sqrt{|\mathcal{N}(u)|}} \sum_{j \in \mathcal{N}(u)} v_j$$

The prediction function is then written as:

$$\hat{r}_{ui} := b_u + b_i + q_i^\top \left( p_u + \frac{1}{\sqrt{|\mathcal{N}(u)|}} \sum_{j \in \mathcal{N}(u)} v_j \right)$$

Where:

• $v_j \in \mathbb{R}^k$ are item latent vectors used to construct the user profile.
• $\mathcal{N}(u)$ is the set of items consumed by the user $u$.
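The SVD++ prediction function can be sketched as follows. The function name `predict_svdpp`, the dictionary `consumed` (mapping each user to the list of item indices in N(u)) and all numbers are hypothetical; the handling of a user with an empty N(u), where the implicit term is simply dropped, is an assumption of this sketch and not something the slides specify.

```python
import numpy as np

def predict_svdpp(b_user, b_item, P, Q, V, consumed, u, i):
    """Return b_u + b_i + q_i^T (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} v_j)."""
    items_u = consumed[u]  # N(u): indices of items consumed by user u
    if items_u:
        implicit = V[items_u].sum(axis=0) / np.sqrt(len(items_u))
    else:
        # Assumption: without implicit feedback only p_u is used.
        implicit = np.zeros(P.shape[1])
    return b_user[u] + b_item[i] + Q[i] @ (P[u] + implicit)

# Hypothetical parameters: 2 users, 4 items, k = 2.
b_user = np.array([0.1, -0.2])
b_item = np.array([0.3, 0.0, -0.1, 0.2])
P = np.array([[0.5, 0.1], [0.2, 0.4]])
Q = np.array([[0.3, 0.2], [0.1, 0.6], [0.4, -0.2], [0.0, 0.5]])
V = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.3], [0.2, -0.1]])
consumed = {0: [0, 2, 3], 1: []}

pred = predict_svdpp(b_user, b_item, P, Q, V, consumed, u=0, i=1)
print(pred)
```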


SVD++ Performance

Dataset: Netflix
Measure: RMSE

Model    50 factors   100 factors   200 factors
MF       0.9046       0.9025        0.9009
SVD++    0.8952       0.8924        0.8911

Source: Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. KDD 2008.


4. Item Prediction


Item Prediction

Which will be the next items to be consumed by a user?


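Formalization

• $U$: set of users
• $I$: set of items
• Positive implicit feedback data $D \subseteq U \times I \times \{1\}$

The only information available is $\mathcal{N}(u)$, the set of items the user $u$ has interacted with.

[Figure: sparse interaction matrix with users as rows and items as columns]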


Considerations

• We do not know whether a user liked an item or not (how they rated it)
• The only information we have is which items the user has bought, watched, clicked, ...
• The task is to predict which items the user will interact with next
• We can assume that items already evaluated ($i \in \mathcal{N}(u)$) are preferred over the not evaluated ones ($i \notin \mathcal{N}(u)$)


Item Prediction Task

Assuming that items already evaluated are preferred over the not evaluated ones:

$$i >_u j \iff i \in \mathcal{N}(u) \text{ and } j \notin \mathcal{N}(u)$$

Given a dataset $D_S \subseteq U \times I \times I$:

$$D_S := \{(u, i, j) \mid i \in \mathcal{N}(u) \land j \notin \mathcal{N}(u)\}$$

For each user, find a total order $>_u$ over the items $j \notin \mathcal{N}(u)$ that reflects the user's preferences.
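To make the definition of the triple set concrete, the sketch below enumerates it for the Alice/Bob/John interactions used elsewhere in these slides. The function name `build_triples` and the dictionary `consumed` are hypothetical. Full enumeration is only practical for toy data, since each user contributes |N(u)| · (|I| − |N(u)|) triples; on larger data one would typically sample triples instead.

```python
def build_triples(consumed, items):
    """Enumerate all (u, i, j) with i in N(u) and j not in N(u)."""
    return [(u, i, j)
            for u, items_u in consumed.items()
            for i in sorted(items_u)
            for j in sorted(set(items) - items_u)]

# Items: t = Titanic, m = Matrix, g = The Godfather, o = Once.
items = ["t", "m", "g", "o"]
# N(u) for Alice (a), Bob (b) and John (j).
consumed = {"a": {"t", "g", "o"},
            "b": {"m", "g"},
            "j": {"m", "o"}}

triples = build_triples(consumed, items)
for triple in triples[:3]:
    print(triple)
```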


Item Prediction Approach

• Learn a model $\hat{r} : U \times I \to \mathbb{R}$
• Sort items according to the scores predicted by the model such that:

$$i >_u j \iff \hat{r}(u, i) > \hat{r}(u, j)$$

In a probabilistic setting, let $\Theta$ be the model parameters; then

$$p(i >_u j \mid \Theta) := \sigma(y_{uij})$$

Where:

• $\sigma(x) := \frac{1}{1 + e^{-x}}$
• $y_{uij} := \hat{r}(u, i) - \hat{r}(u, j)$
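The two ingredients of this approach, ranking by score and the pairwise preference probability, can be illustrated as follows. The scores in `scores` are made-up values standing in for the output of a learned model; `sigmoid` is written in a numerically stable form, which is an implementation choice of this sketch.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def preference_probability(score_i, score_j):
    """p(i >_u j | Theta) = sigma(y_uij) with y_uij = r(u, i) - r(u, j)."""
    return sigmoid(score_i - score_j)

# Hypothetical scores r(u, .) of one user for three items.
scores = {"t": 0.3, "g": 1.2, "o": -0.5}

# Sorting by score yields the personalized ranking for this user.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
print(preference_probability(scores["g"], scores["t"]))
```

Note that p(i >_u j) + p(j >_u i) = 1 by construction, because σ(x) + σ(−x) = 1.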


Bayesian Personalized Ranking (BPR)

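The maximum a posteriori (MAP) estimator:

$$\arg\max_{\Theta}\; p(\Theta \mid >_u), \qquad p(\Theta \mid >_u) \propto p(>_u \mid \Theta)\, p(\Theta)$$

Prior:

$$p(\Theta) := N(0, \Sigma_\Theta)$$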


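The Bayesian Personalized Ranking Optimization Criterion (BPR-Opt)

$$\begin{aligned}
\text{BPR-Opt} &:= \ln \prod_{u \in U} p(\Theta \mid >_u) \\
&= \ln \prod_{u \in U} p(>_u \mid \Theta)\, p(\Theta) \\
&= \ln \prod_{(u,i,j) \in D_S} \sigma(y_{uij})\, p(\Theta) \\
&= \sum_{(u,i,j) \in D_S} \ln \sigma(y_{uij}) + \ln p(\Theta) \\
&= \sum_{(u,i,j) \in D_S} \ln \sigma(y_{uij}) - \lambda \|\Theta\|^2
\end{aligned}$$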
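The last step, from the log-prior to the regularization term, can be made explicit. As a sketch, assume an isotropic Gaussian prior $\Sigma_\Theta = \tau^2 I$ (the variance $\tau^2$ is a symbol introduced here for illustration). Then

$$\ln p(\Theta) = -\frac{1}{2\tau^2}\,\|\Theta\|^2 + \text{const},$$

where the constant does not depend on $\Theta$ and can be dropped from the optimization. Under this assumption the regularization constant is $\lambda = \frac{1}{2\tau^2}$: a narrow prior (small $\tau^2$) corresponds to strong regularization.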


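Optimizing a factorization model for BPR

Model:

$$\hat{r}(u, i) = p_u^\top q_i = \sum_{f=1}^{k} p_{u,f}\, q_{i,f}$$

Objective (to be maximized):

$$L := \sum_{(u,i,j) \in D_S} \ln \sigma(y_{uij}) - \lambda \|\Theta\|^2$$

Gradients (contribution of a single triple $(u, i, j)$, using $\frac{d}{dy} \ln \sigma(y) = 1 - \sigma(y) = \frac{e^{-y}}{1 + e^{-y}}$):

$$\frac{\partial\, \text{BPR-Opt}}{\partial \theta} = \frac{e^{-y_{uij}}}{1 + e^{-y_{uij}}} \cdot \frac{\partial}{\partial \theta} y_{uij} - 2\lambda\theta$$

$$\frac{\partial}{\partial \theta} y_{uij} = \begin{cases}
(q_{if} - q_{jf}) & \text{if } \theta = p_{uf} \\
p_{uf} & \text{if } \theta = q_{if} \\
-p_{uf} & \text{if } \theta = q_{jf}
\end{cases}$$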
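A finite-difference check is a cheap way to confirm the sign and scale of these gradients. The sketch below considers the contribution of a single triple, ln σ(y_uij) − λ(‖p_u‖² + ‖q_i‖² + ‖q_j‖²), whose derivative with respect to y_uij is 1 − σ(y_uij) = e^(−y)/(1 + e^(−y)) > 0 and whose regularizer contributes −2λθ. All function names and the random test vectors are hypothetical.

```python
import numpy as np

def objective(p_u, q_i, q_j, lam):
    """ln sigma(y_uij) - lam * ||theta||^2 for one triple (u, i, j)."""
    y = p_u @ (q_i - q_j)
    reg = lam * (p_u @ p_u + q_i @ q_i + q_j @ q_j)
    return -np.log1p(np.exp(-y)) - reg  # ln sigma(y) = -ln(1 + e^(-y))

def gradients(p_u, q_i, q_j, lam):
    """Analytic gradients with respect to p_u, q_i and q_j."""
    y = p_u @ (q_i - q_j)
    w = 1.0 / (1.0 + np.exp(y))  # equals e^(-y) / (1 + e^(-y))
    return (w * (q_i - q_j) - 2 * lam * p_u,
            w * p_u - 2 * lam * q_i,
            -w * p_u - 2 * lam * q_j)

def numeric_grad_p(p_u, q_i, q_j, lam, eps=1e-6):
    """Central finite differences of the objective with respect to p_u."""
    grad = np.zeros_like(p_u)
    for f in range(len(p_u)):
        step = np.zeros_like(p_u)
        step[f] = eps
        grad[f] = (objective(p_u + step, q_i, q_j, lam)
                   - objective(p_u - step, q_i, q_j, lam)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
p_u, q_i, q_j = rng.normal(size=(3, 4))  # three random vectors with k = 4
lam = 0.1

analytic = gradients(p_u, q_i, q_j, lam)[0]
numeric = numeric_grad_p(p_u, q_i, q_j, lam)
print(np.max(np.abs(analytic - numeric)))
```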


Stochastic Gradient Descent Algorithm

procedure LearnBPR
    input: D_S^Train, λ, α, Σ
    (p_u)_{u∈U} ∼ N(0, Σ)
    (q_i)_{i∈I} ∼ N(0, Σ)
    repeat
        for (u, i, j) ∈ D_S^Train do                ▷ in a random order
            for f ∈ 1, …, k do                      ▷ y_uij is computed from the current parameters
                p_uf ← p_uf + α ( e^{−y_uij} / (1 + e^{−y_uij}) · (q_if − q_jf) − 2λ p_uf )
                q_if ← q_if + α ( e^{−y_uij} / (1 + e^{−y_uij}) · p_uf − 2λ q_if )
                q_jf ← q_jf + α ( e^{−y_uij} / (1 + e^{−y_uij}) · (−p_uf) − 2λ q_jf )
            end for
        end for
    until convergence
    return P, Q
end procedure
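The procedure can be turned into a short NumPy sketch. It is an illustration under several assumptions rather than a reference implementation: a fixed number of epochs stands in for the convergence test, the prior covariance is taken to be isotropic with standard deviation `sigma`, the updates are vectorized over the k features, and each step moves in the direction that increases ln σ(y_uij) − λ‖Θ‖². The helper `bpr_log_likelihood` and all default values are hypothetical.

```python
import numpy as np

def learn_bpr(triples, n_users, n_items, k=8, lam=0.01, alpha=0.05,
              n_epochs=200, sigma=0.1, seed=0):
    """Learn user factors P and item factors Q from (u, i, j) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, sigma, size=(n_users, k))
    Q = rng.normal(0.0, sigma, size=(n_items, k))
    triples = list(triples)
    for _ in range(n_epochs):  # assumption: fixed budget instead of "until convergence"
        for idx in rng.permutation(len(triples)):  # random order
            u, i, j = triples[idx]
            # Copy the current values so all three updates use the same y_uij.
            p_u, q_i, q_j = P[u].copy(), Q[i].copy(), Q[j].copy()
            y = p_u @ (q_i - q_j)
            w = 1.0 / (1.0 + np.exp(y))  # e^(-y) / (1 + e^(-y))
            P[u] += alpha * (w * (q_i - q_j) - 2 * lam * p_u)
            Q[i] += alpha * (w * p_u - 2 * lam * q_i)
            Q[j] += alpha * (-w * p_u - 2 * lam * q_j)
    return P, Q

def bpr_log_likelihood(P, Q, triples):
    """Mean of ln sigma(y_uij) over the given triples (without regularizer)."""
    ys = np.array([P[u] @ (Q[i] - Q[j]) for u, i, j in triples])
    return float(np.mean(-np.log1p(np.exp(-ys))))

# Toy data: users a=0, b=1, j=2; items t=0, m=1, g=2, o=3.
consumed = {0: {0, 2, 3}, 1: {1, 2}, 2: {1, 3}}
triples = [(u, i, j) for u, items_u in consumed.items()
           for i in sorted(items_u) for j in sorted(set(range(4)) - items_u)]

P0, Q0 = learn_bpr(triples, 3, 4, n_epochs=0)  # parameters at initialization
P, Q = learn_bpr(triples, 3, 4)
print(bpr_log_likelihood(P0, Q0, triples), bpr_log_likelihood(P, Q, triples))
```

Because the same seed is used, calling `learn_bpr` with `n_epochs=0` returns exactly the initial parameters, which makes it easy to compare the fit before and after training.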

5. From Recommender Systems to Graphs

Link Prediction

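[Figure: a graph with vertices 1, 2, 3 and 4. The observed edges carry the labels y(1, 2), y(4, 2) and y(3, 2); the labels y(2, 4) = ? and y(3, 1) = ? are unknown and are to be predicted.]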


Link Prediction - Formalization

Given a graph $G := (V, E)$ where

• $V$ is a set of vertices
• $E \subseteq V \times V$ is a set of edges

predict the most likely edges $E^* \not\subseteq E$


Link Prediction - Examples

There are many applications for link prediction models:

• Finding friends in social networks
• Recommender systems
• Predicting protein interactions
• Predicting links between web pages
• ...

5.1 Recommender Systems as a link prediction problem

Recommender System Graph

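            Titanic (t)   Matrix (m)   The Godfather (g)   Once (o)
Alice (a)        4            -               2               5
Bob (b)          -            4               3               -
John (j)         -            4               -               3

[Figure: bipartite graph with user vertices a, b, j and item vertices t, g, o, m. Each observed rating is a labeled edge: $r_{at} = 4$, $r_{ag} = 2$, $r_{ao} = 5$, $r_{bm} = 4$, $r_{bg} = 3$, $r_{jm} = 4$, $r_{jo} = 3$.]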


Recommender Systems - Rating Prediction

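            Titanic (t)   Matrix (m)   The Godfather (g)   Once (o)
Alice (a)        4            -               2               5
Bob (b)          -            4               3               -
John (j)         -            4               -               3

[Figure: the same bipartite graph with the labeled edges $r_{at} = 4$, $r_{ag} = 2$, $r_{ao} = 5$, $r_{bm} = 4$, $r_{bg} = 3$, $r_{jm} = 4$, $r_{jo} = 3$. Rating prediction asks for the label of a missing edge, here $r_{bo} = ?$.]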


Recommender Systems - Item Prediction

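            Titanic (t)   Matrix (m)   The Godfather (g)   Once (o)
Alice (a)        1            ?               1               1
Bob (b)          ?            1               1               ?
John (j)         ?            1               ?               1

[Figure: bipartite graph with user vertices a, b, j and item vertices t, g, o, m. The observed interactions are edges; the five unknown entries ("?") are the candidate edges to be predicted.]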

5.2 Link Prediction Approaches

Link Prediction Approaches

Given a graph $G := (V, E)$,

• Determine a scoring function $s : V \times V \to \mathbb{R}$
• The scores should reflect the likelihood that there is a link between the two vertices
• Rank possible pairs of vertices according to their scores
• Two basic streams of approaches:
    • Compute the scores from graph statistics
    • Learn a scoring function from the data


Link Prediction Approaches

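Let $k_v$ be the degree of node $v$ and $\mathcal{N}(v)$ the set of neighbors of a node:

$$\mathcal{N}(v) := \{u \mid (u, v) \in E \lor (v, u) \in E\}$$

Approaches based on graph statistics:

• Common Neighbors: $s_{CN}(v, u) := |\mathcal{N}(v) \cap \mathcal{N}(u)|$
• Salton Index: $s_{Salton}(v, u) := \frac{|\mathcal{N}(v) \cap \mathcal{N}(u)|}{\sqrt{k_v \times k_u}}$
• Jaccard Index: $s_{Jaccard}(v, u) := \frac{|\mathcal{N}(v) \cap \mathcal{N}(u)|}{|\mathcal{N}(v) \cup \mathcal{N}(u)|}$
• Adamic-Adar Index: $s_{AA}(v, u) := \sum_{z \in \mathcal{N}(v) \cap \mathcal{N}(u)} \frac{1}{\log k_z}$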


Link Prediction Approaches - Examples

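[Figure: graph with vertices v1, …, v5. The neighbor sets used below correspond to the edges v1-v2, v1-v3, v1-v5, v2-v5, v3-v4 and v4-v5.]

Finding possible links for v4:

Common Neighbors: $s_{CN}(v, u) := |\mathcal{N}(v) \cap \mathcal{N}(u)|$

$$s_{CN}(v_1, v_4) = |\{v_5, v_3, v_2\} \cap \{v_3, v_5\}| = 2$$

$$s_{CN}(v_2, v_4) = |\{v_1, v_5\} \cap \{v_3, v_5\}| = 1$$

Salton Index: $s_{Salton}(v, u) := \frac{|\mathcal{N}(v) \cap \mathcal{N}(u)|}{\sqrt{k_v \times k_u}}$

$$s_{Salton}(v_1, v_4) = \frac{|\{v_5, v_3, v_2\} \cap \{v_3, v_5\}|}{\sqrt{3 \times 2}} = 0.8165$$

$$s_{Salton}(v_2, v_4) = \frac{|\{v_1, v_5\} \cap \{v_3, v_5\}|}{\sqrt{2 \times 2}} = 0.5$$

Jaccard Index: $s_{Jaccard}(v, u) := \frac{|\mathcal{N}(v) \cap \mathcal{N}(u)|}{|\mathcal{N}(v) \cup \mathcal{N}(u)|}$

$$s_{Jaccard}(v_1, v_4) = \frac{|\{v_5, v_3, v_2\} \cap \{v_3, v_5\}|}{|\{v_5, v_3, v_2\} \cup \{v_3, v_5\}|} = 0.6667$$

$$s_{Jaccard}(v_2, v_4) = \frac{|\{v_1, v_5\} \cap \{v_3, v_5\}|}{|\{v_1, v_5\} \cup \{v_3, v_5\}|} = 0.3333$$

Adamic-Adar: $s_{AA}(v, u) := \sum_{z \in \mathcal{N}(v) \cap \mathcal{N}(u)} \frac{1}{\log k_z}$ (natural logarithm)

$$s_{AA}(v_1, v_4) = \frac{1}{\log 2} + \frac{1}{\log 3} = 2.3529$$

$$s_{AA}(v_2, v_4) = \frac{1}{\log 3} = 0.9102$$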
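The four indices are easy to reproduce in code. The sketch below rebuilds the example graph from the neighbor sets used in the calculations (the edge list is derived from those sets, since the drawing itself is not part of the text) and evaluates every index for the two candidate links of v4. The function names are hypothetical, and the natural logarithm is used because it reproduces the Adamic-Adar values above.

```python
import math

# Edge list derived from the neighbor sets in the worked example.
edges = [("v1", "v2"), ("v1", "v3"), ("v1", "v5"),
         ("v2", "v5"), ("v3", "v4"), ("v4", "v5")]

def build_neighbors(edges):
    """Map every vertex to the set of its neighbors (undirected graph)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

N = build_neighbors(edges)

def common_neighbors(v, u):
    return len(N[v] & N[u])

def salton(v, u):
    return len(N[v] & N[u]) / math.sqrt(len(N[v]) * len(N[u]))

def jaccard(v, u):
    return len(N[v] & N[u]) / len(N[v] | N[u])

def adamic_adar(v, u):
    # A common neighbor z has degree >= 2, so log k_z is always positive.
    return sum(1.0 / math.log(len(N[z])) for z in N[v] & N[u])

for v in ("v1", "v2"):
    print(v, common_neighbors(v, "v4"),
          round(salton(v, "v4"), 4),
          round(jaccard(v, "v4"), 4),
          round(adamic_adar(v, "v4"), 4))
```

All four indices rank the link (v1, v4) above (v2, v4) in this example.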


Link Prediction Approaches - Learning a Scoring Function

• Any item recommendation approach could be used here.
• Factorization models: factorize the adjacency matrix of the graph
• Associate each vertex $v$ with latent factors $\varphi(v) \in \mathbb{R}^k$
• Scoring function:

$$s(u, v) = \varphi(u)^\top \varphi(v)$$
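Once latent factors are available, scoring and ranking candidate links takes only a few lines. In the sketch below the factor matrix `Phi` (one row per vertex) is filled with made-up values; in practice it would be learned, for example with a BPR-style criterion on the adjacency matrix. The function name `rank_candidate_links` and the treatment of the graph as undirected are assumptions of this illustration.

```python
import numpy as np

def rank_candidate_links(Phi, edges):
    """Rank all vertex pairs that are not yet edges by s(u, v) = phi(u)^T phi(v)."""
    S = Phi @ Phi.T  # score matrix for all pairs of vertices
    n = Phi.shape[0]
    existing = {frozenset(e) for e in edges}
    candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                  if frozenset((u, v)) not in existing]
    # Highest score first.
    return sorted(candidates, key=lambda e: -S[e])

# Hypothetical latent factors for 4 vertices with k = 2.
Phi = np.array([[1.0, 0.0],
                [0.8, 0.2],
                [0.0, 1.0],
                [0.3, 0.7]])
edges = [(0, 2)]  # the only observed edge in this toy graph

ranked = rank_candidate_links(Phi, edges)
print(ranked)
```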
