
Page 1: New Directions in Mahout's Recommenders

New Directions in Mahout’s Recommenders
Sebastian Schelter, Apache Software Foundation
Recommender Systems Get-together Berlin

Page 2: New Directions in Mahout's Recommenders

New Directions?

Mahout in Action is the prime source of information for using Mahout in practice.

As it is more than two years old, it is missing a lot of recent developments.

This talk describes what has been added to the recommenders of Mahout since then.

Page 3: New Directions in Mahout's Recommenders

Single machine recommenders

Page 4: New Directions in Mahout's Recommenders

MyMediaLite, a scientific library of recommender system algorithms

Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner.

Page 5: New Directions in Mahout's Recommenders

New recommenders and factorizers

BiasedItemBasedRecommender, item-based kNN with user-item-bias estimation
Koren: Factor in the Neighbors: Scalable and Accurate Collaborative Filtering, TKDD ’09

RatingSGDFactorizer, biased matrix factorization
Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09

SVDPlusPlusFactorizer, SVD++
Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08

ALSWRFactorizer, matrix factorization using Alternating Least Squares
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08

Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
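
A minimal sketch of plugging one of these factorizers into a recommender (file name, user ID and hyperparameters are made up; constructor signatures may differ slightly across Mahout versions):

DataModel model = new FileDataModel(new File("ratings.csv"));
// biased matrix factorization with 10 latent features, 20 training iterations
Factorizer factorizer = new RatingSGDFactorizer(model, 10, 20);
Recommender recommender = new SVDRecommender(model, factorizer);
// top-5 recommendations for user 42
List<RecommendedItem> topItems = recommender.recommend(42L, 5);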

Page 6: New Directions in Mahout's Recommenders

Batch Item-Similarities on a single machine

Simple but powerful way to deploy Mahout: Use item-based collaborative filtering with periodically precomputed item similarities.

Mahout now supports multithreaded item similarity computation on a single machine for data sizes that don’t require a Hadoop-based solution.

DataModel dataModel = new FileDataModel(new File("movielens.csv"));
ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
ItemBasedRecommender recommender =
    new GenericItemBasedRecommender(dataModel, similarity);
BatchItemSimilarities batch =
    new MultithreadedBatchItemSimilarities(recommender, k);
batch.computeItemSimilarities(numThreads, maxDurationInHours,
    new FileSimilarItemsWriter(resultFile));

Page 7: New Directions in Mahout's Recommenders

Parallel processing

Page 8: New Directions in Mahout's Recommenders

Collaborative Filtering

idea: infer recommendations from patterns found in the historical user-item interactions

data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews), represented in the interaction matrix A

        item1   · · ·   item3   · · ·
user1     3     · · ·     4     · · ·
user2     −     · · ·     4     · · ·
user3     5     · · ·     1     · · ·
· · ·   · · ·   · · ·   · · ·   · · ·

row a_i denotes the interaction history of user i

we target use cases with millions of users and hundreds of millions of interactions

Page 9: New Directions in Mahout's Recommenders

MapReduce

• paradigm for data-intensive parallel processing
• data is partitioned in a distributed file system
• computation is moved to the data
• system handles distribution, execution, scheduling, failures
• fixed processing pipeline where the user specifies two functions:

map : (k1, v1) → list(k2, v2)
reduce : (k2, list(v2)) → list(v2)

[Figure: MapReduce data flow. Input partitions in the DFS are read by map tasks, their output is shuffled to reduce tasks, and the results are written back to the DFS.]

Page 10: New Directions in Mahout's Recommenders

Scalable neighborhood methods

Page 11: New Directions in Mahout's Recommenders

Neighborhood Methods

Item-Based Collaborative Filtering is one of the most deployed CF algorithms, because:

• simple and intuitively understandable
• additionally gives non-personalized, per-item recommendations (people who like X might also like Y)
• recommendations for new users without model retraining
• comprehensible explanations (we recommend Y because you liked X)

Page 12: New Directions in Mahout's Recommenders

Cooccurrences

start with a simplified view: imagine the interaction matrix A was binary

→ we look at cooccurrences only

item similarity computation becomes matrix multiplication

r_i = (AᵀA) a_i

scale-out of the item-based approach reduces to finding an efficient way to compute the item similarity matrix

S = AᵀA
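
A tiny worked example (illustrative numbers, not from the talk): for a binary A with three users and two items,

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}
\qquad
S = A^\top A = \begin{pmatrix} 2 & 2 \\ 2 & 3 \end{pmatrix}

S_12 = 2 because items 1 and 2 cooccur in two interaction histories; user 2 with a_2 = (0, 1)ᵀ gets the scores r_2 = S a_2 = (2, 3)ᵀ.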

Page 13: New Directions in Mahout's Recommenders

Parallelizing S = AᵀA

standard approach of computing item cooccurrences requires random access to both users and items

foreach item f do
  foreach user i who interacted with f do
    foreach item j that i also interacted with do
      S_fj = S_fj + 1

→ not efficiently parallelizable on partitioned data

row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned A

S = AᵀA = Σ_{i∈A} a_i a_iᵀ

mappers compute the outer products of rows of A, emit the results row-wise, reducers sum these up to form S
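
A condensed sketch of this pass (simplified, not the actual RowSimilarityJob code; Vector, VectorWritable and nonZeroes() follow the 0.8-era Mahout math API):

// mapper: for a user row a_i, emit row f of the outer product a_i a_iᵀ
// for every non-zero entry f of a_i
class OuterProductMapper
    extends Mapper<IntWritable, VectorWritable, IntWritable, VectorWritable> {
  @Override
  protected void map(IntWritable user, VectorWritable row, Context ctx)
      throws IOException, InterruptedException {
    Vector a = row.get();
    for (Vector.Element e : a.nonZeroes()) {
      ctx.write(new IntWritable(e.index()), new VectorWritable(a.times(e.get())));
    }
  }
}

// reducer: sum up the partial rows to form row f of S = Σ_i a_i a_iᵀ
class RowSumReducer
    extends Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {
  @Override
  protected void reduce(IntWritable item, Iterable<VectorWritable> partials, Context ctx)
      throws IOException, InterruptedException {
    Vector sum = null;
    for (VectorWritable partial : partials) {
      sum = (sum == null) ? new RandomAccessSparseVector(partial.get()) : sum.plus(partial.get());
    }
    ctx.write(item, new VectorWritable(sum));
  }
}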

Page 14: New Directions in Mahout's Recommenders

Parallel similarity computation

real datasets are not binary and we want to use a variety of similarity measures, e.g. Pearson correlation

express similarity measures by 3 canonical functions, which can be efficiently embedded into the computation (cf. VectorSimilarityMeasure)

• preprocess adjusts an item rating vector

f = preprocess(f)    j = preprocess(j)

• norm computes a single number from the adjusted vector

n_f = norm(f)    n_j = norm(j)

• similarity computes the similarity of two vectors from the norms and their dot product

S_fj = similarity(dot_fj, n_f, n_j)
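
A simplified rendition of this contract (the real VectorSimilarityMeasure interface in Mahout has a few more methods; bin is the binarization helper from the next slide):

interface SimilarityMeasure {
  Vector preprocess(Vector v);       // adjust an item's rating vector
  double norm(Vector v);             // condense the adjusted vector into a single number
  double similarity(double dot, double normF, double normJ); // combine dot product and norms
}

// the Jaccard coefficient of the next slide, expressed in this scheme
class JaccardMeasure implements SimilarityMeasure {
  public Vector preprocess(Vector v) { return bin(v); }  // 1 where rated, 0 elsewhere
  public double norm(Vector v) { return v.norm(1); }     // number of users who rated the item
  public double similarity(double dot, double normF, double normJ) {
    return dot / (normF + normJ - dot);
  }
  private static Vector bin(Vector v) {
    Vector b = v.like();
    for (Vector.Element e : v.nonZeroes()) {
      b.set(e.index(), 1);
    }
    return b;
  }
}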

Page 15: New Directions in Mahout's Recommenders

Example: Jaccard coefficient

• preprocess binarizes the rating vectors

f = (3, −, 5)ᵀ    j = (4, 4, 1)ᵀ

f = bin(f) = (1, 0, 1)ᵀ    j = bin(j) = (1, 1, 1)ᵀ

• norm computes the number of users that rated each item

n_f = ‖f‖₁ = 2    n_j = ‖j‖₁ = 3

• similarity finally computes the Jaccard coefficient from the norms and the dot product of the vectors

jaccard(f, j) = |f ∩ j| / |f ∪ j| = dot_fj / (n_f + n_j − dot_fj) = 2 / (2 + 3 − 2) = 2/3

Page 16: New Directions in Mahout's Recommenders

Implementation in Mahout

o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob
computes the top-k pairwise similarities for each row of a matrix using some similarity measure

o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
computes the top-k similar items per item using RowSimilarityJob

o.a.m.cf.taste.hadoop.item.RecommenderJob
computes recommendations and similar items using RowSimilarityJob

Page 17: New Directions in Mahout's Recommenders

MapReduce pass 1

• data partitioned by items (row-partitioned Aᵀ)
• invokes preprocess and norm for each item vector
• transposes input to form A

[Figure: MapReduce pass 1. Mappers read the row-partitioned Aᵀ (pointing from items to users); via combine, shuffle and reduce the pass emits the item "norms" and the binarized A (pointing from users to items).]

Page 18: New Directions in Mahout's Recommenders

MapReduce pass 2

• data partitioned by users (row-partitioned A)
• computes dot products of columns
• loads norms and invokes similarity
• implementation contains several optimizations (sparsification, exploiting symmetry and thresholds)

[Figure: MapReduce pass 2. Mappers emit partial dot products from the rows of the binarized A; combine, shuffle and reduce sum them up and, together with the item "norms", produce the upper triangle of "AᵀA" holding the item similarities.]

Page 19: New Directions in Mahout's Recommenders

Cost of the algorithm

major cost in our algorithm is the communication in the second MapReduce pass: for each user, we have to process the square of the number of his interactions

S = Σ_{i∈A} a_i a_iᵀ

→ cost is dominated by the densest rows of A (the users with the highest number of interactions)

distribution of interactions per user is usually heavy-tailed → a small number of power users with a disproportionately high number of interactions drastically increases the runtime

• if a user has more than p interactions, only use a random sample of size p of his interactions (sketched below)
• saw negligible effect on prediction quality for moderate p
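
A minimal sketch of this down-sampling (not Mahout's actual code; each interaction is kept with probability p/n, so the sample has size p only in expectation):

// cap a user's interaction history at roughly p entries
Vector sampleInteractions(Vector userRow, int p, Random random) {
  int n = userRow.getNumNondefaultElements();
  if (n <= p) {
    return userRow;
  }
  Vector sampled = userRow.like(); // empty vector of the same type and size
  for (Vector.Element e : userRow.nonZeroes()) {
    if (random.nextDouble() < (double) p / n) { // keep with probability p/n
      sampled.set(e.index(), e.get());
    }
  }
  return sampled;
}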

Page 20: New Directions in Mahout's Recommenders

Scalable Neighborhood Methods: Experiments

Setup

• 26 machines running Java 7 and Hadoop 1.0.4
• two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine

Results

Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), 26 machines, similarity computation takes less than 40 minutes

Page 21: New Directions in Mahout's Recommenders

Scalable matrix factorization

Page 22: New Directions in Mahout's Recommenders

Latent factor models: idea

interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies)

these factors are in general not obvious; we might be able to think of some of them, but it’s hard to estimate their impact on the interactions

need to infer those so-called latent factors from the interaction data

Page 23: New Directions in Mahout's Recommenders

low-rank matrix factorization

approximately factor A into the product of two rank-r feature matrices U and M such that A ≈ UM.

U models the latent features of the users, M models the latent features of the items

the dot product u_iᵀ m_j in the latent feature space predicts the strength of the interaction between user i and item j

to obtain a factorization, minimize the regularized squared error over the observed interactions, e.g.:

\min_{U,M} \sum_{(i,j) \in A} (a_{ij} - u_i^\top m_j)^2 + \lambda \Big( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 \Big)

Page 24: New Directions in Mahout's Recommenders

Alternating Least Squares

ALS rotates between fixing U and M. When U is fixed, the system recomputes M by solving a least-squares problem per item, and vice versa.

easy to parallelize, as all users (and vice versa, all items) can be recomputed independently

additionally, ALS is able to solve non-sparse models from implicit data

[Figure: A (u × i) ≈ U (u × k) × M (k × i)]
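
For reference, the closed-form per-user update from Zhou et al. (with M fixed; I_i is the set of items user i interacted with, n_{u_i} = |I_i|, E the identity matrix, and a_{i,I_i} user i's interactions restricted to I_i):

u_i = \left( M_{I_i} M_{I_i}^\top + \lambda \, n_{u_i} E \right)^{-1} M_{I_i} \, a_{i,I_i}^\top

Recomputing the item vectors with U fixed is symmetric.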

Page 25: New Directions in Mahout's Recommenders

Implementation in Mahout

o.a.m.cf.taste.hadoop.als.ParallelALSFactorizationJob
computes a factorization using Alternating Least Squares, has different solvers for explicit and implicit data
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08

Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08

o.a.m.cf.taste.hadoop.als.FactorizationEvaluator
computes the prediction error of a factorization on a test set

o.a.m.cf.taste.hadoop.als.RecommenderJob
computes recommendations from a factorization

Page 26: New Directions in Mahout's Recommenders

Scalable Matrix Factorization: Implementation

Recompute the user feature matrix U using a broadcast-join:

1. run a map-only job using multithreaded mappers
2. load the item-feature matrix M into memory from HDFS to share it among the individual mappers
3. mappers read the interaction histories of the users
4. multithreaded: solve a least-squares problem per user to recompute its feature vector (see the sketch after the figure below)

[Figure: each machine runs a map-side hash-join plus recomputation; the user histories A are forwarded locally, the item features M are broadcast to all machines, and the output is the user features U]
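
A condensed sketch of the recomputation mapper (simplified; loadItemFeatures and the configuration keys are hypothetical, and the solver call stands in for the actual code in o.a.m.cf.taste.hadoop.als):

class SolveUserFeaturesMapper
    extends Mapper<IntWritable, VectorWritable, IntWritable, VectorWritable> {

  private OpenIntObjectHashMap<Vector> M; // broadcast item features
  private double lambda;                  // regularization constant
  private int numFeatures;

  @Override
  protected void setup(Context ctx) throws IOException, InterruptedException {
    // load the broadcast item-feature matrix from HDFS into memory, once per mapper
    M = loadItemFeatures(ctx.getConfiguration());
    lambda = ctx.getConfiguration().getFloat("lambda", 0.065f);
    numFeatures = ctx.getConfiguration().getInt("numFeatures", 20);
  }

  @Override
  protected void map(IntWritable userID, VectorWritable history, Context ctx)
      throws IOException, InterruptedException {
    // collect the feature vectors of the items this user interacted with
    List<Vector> featureVectors = new ArrayList<Vector>();
    for (Vector.Element e : history.get().nonZeroes()) {
      featureVectors.add(M.get(e.index()));
    }
    // solve one regularized least-squares problem to recompute u_i
    Vector u = AlternatingLeastSquaresSolver.solve(
        featureVectors, history.get(), lambda, numFeatures);
    ctx.write(userID, new VectorWritable(u));
  }
}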

Page 27: New Directions in Mahout's Recommenders

Scalable Matrix Factorization: Experiments

Setup

• 26 machines running Java 7 and Hadoop 1.0.4
• two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
• configured Hadoop to reuse JVMs, ran multithreaded mappers

Results

Yahoo Songs dataset (700M datapoints), 26 machines, a single iteration (two map-only jobs) takes less than 2 minutes

Page 28: New Directions in Mahout's Recommenders

Thanks for listening!
Follow me on twitter at http://twitter.com/sscdotopen

Join Mahout’s mailing lists at http://s.apache.org/mahout-lists

picture on slide 3 by Tim Abbott, http://www.flickr.com/photos/theabbott/
picture on slide 21 by Crimson Diabolics, http://crimsondiabolics.deviantart.com/