Building a Recommender System in PySpark

Recommender Systems with Apache Spark's ALS Function


Page 1: Recommender Systems with Apache Spark's ALS Function

Building a Recommender System in PySpark

Page 2: Recommender Systems with Apache Spark's ALS Function

Will Johnson
- Uline
- DePaul

LearnByMarketing.com

Page 3: Recommender Systems with Apache Spark's ALS Function

AGENDA
- RecSys
  * Basics
  * MF
  * Evaluation
  * Advanced
- PySpark
  * Basics
  * ALS

Page 4: Recommender Systems with Apache Spark's ALS Function
Page 5: Recommender Systems with Apache Spark's ALS Function
Page 6: Recommender Systems with Apache Spark's ALS Function

User Based Collaborative Filtering

[Figure: user-item rating grid. Ratings shown: 4.5, 4.0, 5.0, 4.5, 3.0, 4.0, 2.0, 1.0, 2.0, 1.5, 4.5]

Page 7: Recommender Systems with Apache Spark's ALS Function

User Based Collaborative Filtering

[Figure: the same rating grid with one additional value, 3.8. Ratings shown: 4.5, 4.0, 5.0, 4.5, 3.0, 4.0, 3.8, 2.0, 1.0, 2.0, 1.5, 4.5]
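
One way to read the two slides above: a missing rating is estimated from the ratings of similar users. The following is a minimal sketch in plain Python, assuming cosine similarity over co-rated items and a similarity-weighted average. The toy ratings and the names `cosine_sim` and `predict_user_based` are illustrative and do not come from the talk.

```python
from math import sqrt

# Toy ratings: {user: {item: rating}}. Illustrative values only.
ratings = {
    "ann": {"A": 4.5, "B": 4.0, "C": 5.0},
    "bob": {"A": 4.5, "B": 3.0, "C": 4.0, "D": 4.0},
    "cat": {"A": 1.0, "B": 2.0, "C": 1.5, "D": 2.0},
}

def cosine_sim(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_sim(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Estimate ann's rating for item D, which she has not rated.
prediction = predict_user_based(ratings, "ann", "D")
```

Because ann's ratings resemble bob's more than cat's, the estimate lands closer to bob's rating for D. Production systems usually mean-center ratings first; that step is left out here for brevity.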

Page 8: Recommender Systems with Apache Spark's ALS Function

Item Based Collaborative Filtering

Page 9: Recommender Systems with Apache Spark's ALS Function

Item Based Collaborative Filtering
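
Item-based collaborative filtering compares items instead of users: a user's unknown rating for an item is estimated from that user's own ratings of similar items. The sketch below is a plain-Python illustration under the same assumptions as before (cosine similarity, similarity-weighted average). The data and the names `item_vector` and `predict_item_based` are hypothetical.

```python
from math import sqrt

# Toy ratings: {user: {item: rating}}. Illustrative values only.
ratings = {
    "ann": {"A": 5.0, "B": 4.5, "C": 1.0},
    "bob": {"A": 4.0, "B": 4.0, "C": 2.0},
    "cat": {"A": 4.5, "C": 1.5},
}

def item_vector(ratings, item):
    """Collect {user: rating} for one item (a column of the rating matrix)."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine_sim(a, b):
    """Cosine similarity between two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict_item_based(ratings, user, item):
    """Weight the user's own ratings by each rated item's similarity to `item`."""
    target = item_vector(ratings, item)
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = cosine_sim(target, item_vector(ratings, other_item))
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

# Estimate cat's rating for item B, which she has not rated.
prediction = predict_item_based(ratings, "cat", "B")
```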

Page 10: Recommender Systems with Apache Spark's ALS Function

Matrix Factorization

Page 11: Recommender Systems with Apache Spark's ALS Function

Matrix Factorization
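
Matrix factorization approximates the rating matrix as the product of a user-factor matrix and an item-factor matrix. Alternating least squares (ALS) fits it by fixing one side and solving a least-squares problem for the other, then swapping. The sketch below is a deliberately simplified, single-machine illustration: it uses rank 1 so each least-squares step has a scalar closed form. The toy data, the regularization value `lam`, and the function names are assumptions; Spark's ALS is distributed, supports higher rank, and scales its regularization differently.

```python
# Toy observed ratings as (user, item, rating). Illustrative values only.
observed = [
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 4.0), (1, 2, 2.0),
    (2, 1, 2.0), (2, 2, 1.0),
]
n_users, n_items = 3, 3
lam = 0.1  # regularization strength (assumed value)

def objective(p, q):
    """Regularized squared error over the observed ratings."""
    err = sum((r - p[u] * q[i]) ** 2 for u, i, r in observed)
    reg = lam * (sum(x * x for x in p) + sum(x * x for x in q))
    return err + reg

def als_rank1(iterations=10):
    p = [1.0] * n_users  # one latent factor per user
    q = [1.0] * n_items  # one latent factor per item
    history = [objective(p, q)]
    for _ in range(iterations):
        # Fix item factors; solve each user's 1-D ridge regression exactly.
        for u in range(n_users):
            num = sum(r * q[i] for uu, i, r in observed if uu == u)
            den = lam + sum(q[i] ** 2 for uu, i, r in observed if uu == u)
            p[u] = num / den
        # Fix user factors; solve each item's 1-D ridge regression exactly.
        for i in range(n_items):
            num = sum(r * p[u] for u, ii, r in observed if ii == i)
            den = lam + sum(p[u] ** 2 for u, ii, r in observed if ii == i)
            q[i] = num / den
        history.append(objective(p, q))
    return p, q, history

p, q, history = als_rank1()
```

Each half-step minimizes the objective exactly for one side, so the values in `history` never increase. A predicted rating for user `u` and item `i` is then `p[u] * q[i]`.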

Page 12: Recommender Systems with Apache Spark's ALS Function

Evaluation

RMSE = \sqrt{\frac{\sum (\text{Predicted} - \text{Actual})^2}{n}}

Precision_u = \frac{|hits_u|}{|RecoSet_u|}

Recall_u = \frac{|hits_u|}{|TestSet_u|}

Expert Review: Novelty, Context
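
The three metrics on this slide can be sketched in a few lines of plain Python. The function names `rmse` and `precision_recall` and the sample values are illustrative assumptions. Here a "hit" is an item that appears both in the recommended set and in the user's held-out test set.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def precision_recall(recommended, held_out):
    """Per-user precision and recall from a recommended set and a test set."""
    hits = set(recommended) & set(held_out)
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(held_out) if held_out else 0.0
    return precision, recall

# Three (predicted, actual) pairs with errors 0.5, -1.0 and 0.0.
error = rmse([(3.5, 3.0), (4.0, 5.0), (2.0, 2.0)])

# Four recommendations, three held-out items, two of them in common.
precision, recall = precision_recall({"A", "B", "C", "D"}, {"B", "D", "E"})
```

With two hits out of four recommendations and three held-out items, precision is 2/4 and recall is 2/3.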

Page 13: Recommender Systems with Apache Spark's ALS Function
Page 14: Recommender Systems with Apache Spark's ALS Function
Page 15: Recommender Systems with Apache Spark's ALS Function
Page 16: Recommender Systems with Apache Spark's ALS Function
Page 17: Recommender Systems with Apache Spark's ALS Function

CRISP-DM

Page 18: Recommender Systems with Apache Spark's ALS Function

Data Understanding

movielens = sc.textFile("../in/ml-100k/u.data")

Page 19: Recommender Systems with Apache Spark's ALS Function

Data Understanding

movielens.first()
# u'196\t242\t3\t881250949'

movielens.count()
# 100,000

Page 20: Recommender Systems with Apache Spark's ALS Function

Data Understanding

clean_data = movielens.map(lambda x: x.split('\t'))

# Average rating
rate = clean_data.map(lambda y: int(y[2]))
rate.mean()  # 3.529863

# Number of distinct users
users = clean_data.map(lambda y: int(y[0]))
users.distinct().count()  # 943

# Number of distinct movies
clean_data.map(lambda y: int(y[1])).distinct().count()  # 1,682
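
For reference, each line of `u.data` holds four tab-separated fields: user id, item id, rating, and timestamp. The helper below is a small plain-Python sketch of the same parsing outside Spark. The name `parse_rating_line` is hypothetical.

```python
def parse_rating_line(line):
    """Split one tab-separated u.data line into typed fields."""
    user, item, rating, timestamp = line.split('\t')
    return int(user), int(item), float(rating), int(timestamp)

# The first record shown on the earlier slide.
record = parse_rating_line('196\t242\t3\t881250949')
```

The slides only use the first three fields; the timestamp is ignored when building the model.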

Page 21: Recommender Systems with Apache Spark's ALS Function

Data Preparation

from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

mls = movielens.map(lambda l: l.split('\t'))
ratings = mls.map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))

# Each element is now a Rating, e.g.:
# Rating(user=196, product=242, rating=3.0)

Page 22: Recommender Systems with Apache Spark's ALS Function

Data Preparation

train, test = ratings.randomSplit([0.7, 0.3], 7856)  # 70/30 split, seed 7856

train.count()  # 70,005
test.count()   # 29,995

train.cache()
test.cache()

Page 23: Recommender Systems with Apache Spark's ALS Function

Modeling

rank = 5            # number of latent factors
numIterations = 10  # number of ALS iterations

# Create the model on the training data
model = ALS.train(train, rank, numIterations)

Page 24: Recommender Systems with Apache Spark's ALS Function

Modeling / Evaluation

model.userFeatures()

model.productFeatures()
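
Both calls return RDDs of (id, factor vector) pairs, and in a matrix factorization model a predicted rating is the dot product of a user's vector with a product's vector. The sketch below illustrates that arithmetic in plain Python. The factor values are made up (length 5 to match `rank = 5`), and `predict_rating` is a hypothetical helper, not part of the Spark API.

```python
# Hypothetical rank-5 factor vectors; real ones come from the trained model.
user_factors = {196: [0.8, -0.2, 1.1, 0.4, 0.3]}
product_factors = {242: [1.2, 0.1, 0.9, 0.5, -0.4]}

def predict_rating(user_id, product_id):
    """Dot product of the user's and the product's latent factor vectors."""
    u = user_factors[user_id]
    p = product_factors[product_id]
    return sum(a * b for a, b in zip(u, p))

score = predict_rating(196, 242)
```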

Page 25: Recommender Systems with Apache Spark's ALS Function

Modeling / Evaluation

# For product X, find N users to sell to
model.recommendUsers(242, 100)

# For user Y, find N products to promote
model.recommendProducts(196, 10)

# Predict a single product for a single user
model.predict(196, 242)

Page 26: Recommender Systems with Apache Spark's ALS Function

Modeling / Evaluation

# Predict multiple users and multiple products
# Pre-processing: keep only (user, product) pairs, e.g. (196, 242)
pred_input = train.map(lambda x: (x[0], x[1]))

# Lots of predictions
# Returns Rating(user, product, rating) records, e.g.
# Rating(user=894, product=1560, rating=3.845)
pred = model.predictAll(pred_input)

Page 27: Recommender Systems with Apache Spark's ALS Function

Evaluation

User  Item  Actual  Pred
196   242   3.0     3.91
186   302   3.0     3.29
22    377   1.0     1.09
244   51    2.0     3.66
298   474   4.0     4.11

TRAINING RMSE: 0.763

Page 28: Recommender Systems with Apache Spark's ALS Function

Evaluation

# Organize the data to make (user, product) the key, e.g. ((196, 242), 3.0)
true_reorg = train.map(lambda x: ((x[0], x[1]), x[2]))
pred_reorg = pred.map(lambda x: ((x[0], x[1]), x[2]))

# Do the actual join; each element looks like ((582, 1014), (4.0, 3.397))
true_pred = true_reorg.join(pred_reorg)

from math import sqrt
MSE = true_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
RMSE = sqrt(MSE)  # Results in 0.7629908117414474
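
The key-then-join pattern can be tried without a cluster. The sketch below stands in plain Python dictionaries for the RDDs and uses only the five sample rows from the earlier Actual/Pred table, so its result is not the full training RMSE. Variable names mirror the slide, and the dictionary-based join is an illustration rather than how Spark executes it.

```python
from math import sqrt

# (user, product, rating) triples: the five sample rows from the table.
actual = [(196, 242, 3.0), (186, 302, 3.0), (22, 377, 1.0),
          (244, 51, 2.0), (298, 474, 4.0)]
predicted = [(196, 242, 3.91), (186, 302, 3.29), (22, 377, 1.09),
             (244, 51, 3.66), (298, 474, 4.11)]

# Key both sides by (user, product), as the map() calls do.
true_reorg = {(u, p): r for u, p, r in actual}
pred_reorg = {(u, p): r for u, p, r in predicted}

# Inner join on the shared keys: {key: (actual, predicted)}.
true_pred = {k: (true_reorg[k], pred_reorg[k])
             for k in true_reorg if k in pred_reorg}

mse = sum((a - b) ** 2 for a, b in true_pred.values()) / len(true_pred)
rmse = sqrt(mse)
```

On these five rows the error is dominated by the (244, 51) pair, where the prediction misses by 1.66.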

Page 29: Recommender Systems with Apache Spark's ALS Function

Evaluation

from math import sqrt

# Same steps as for the training data, applied to the held-out test set
test_input = test.map(lambda x: (x[0], x[1]))
pred_test = model.predictAll(test_input)

test_reorg = test.map(lambda x: ((x[0], x[1]), x[2]))
pred_reorg = pred_test.map(lambda x: ((x[0], x[1]), x[2]))
test_pred = test_reorg.join(pred_reorg)

test_MSE = test_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
test_RMSE = sqrt(test_MSE)

TEST RMSE: 1.0145

Page 30: Recommender Systems with Apache Spark's ALS Function

CRISP-DM

Page 31: Recommender Systems with Apache Spark's ALS Function

RECAP

RecSys are Nearest Neighbors or MF Based

ALS is Implemented in Spark

Page 32: Recommender Systems with Apache Spark's ALS Function

RECAP

rank = 5
numIterations = 10

# Create the model on the training data
model = ALS.train(train, rank, numIterations)

# Lots of predictions
pred = model.predictAll(pred_input)

# Examine model features
model.productFeatures()

# Save your model!
model.save(sc, "../out/ml-model")

Page 33: Recommender Systems with Apache Spark's ALS Function

Questions?

LearnByMarketing.com