Recommender System Experiments with MyMediaLite

Or: Everything you always wanted to know about offline experiments* (*but were afraid to ask)

Zeno Gantner <[email protected]>

Nokia Location & Commerce, Berlin




HERE Maps by Nokia … in Berlin

● ca. 800 people
● HERE Maps platform
  – mobile apps
    ● HERE Drive
    ● HERE Maps
    ● HERE Transit (public transport)
  – customers
    ● Yahoo Maps
    ● Bing Maps
    ● major car companies: BMW, VW, Toyota, ...


HERE Maps by Nokia … in Berlin

Maps Search Team
● #bbuzz regulars
● 3 of us contributed to Lucene 4.3.0 ;-)

http://2011.berlinbuzzwords.de/content/improving-search-ranking-through-ab-tests-case-study
http://2012.berlinbuzzwords.de/sessions/efficient-scoring-lucene
http://2012.berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop
http://2012.berlinbuzzwords.de/sessions/relevance-optimization-check-candidate-lists
https://issues.apache.org/jira/browse/LUCENE-4930
https://issues.apache.org/jira/browse/LUCENE-4571


(C) Paul L. Dineen; license: CC by; source http://www.flickr.com/photos/pauldineen/4529216647/sizes/o/in/photostream/


+ = ?


Data + Software/Algorithms = ???

(c) Joon Han, license: CC by-sa 3.0, source: http://en.wikipedia.org/wiki/File:Groundhog_day_tip_top_bistro.jpg
(c) Diliff; license CC by-3.0

Real-world deployments


Data mining competitions


Research


+ = ?


RecSys Experiments with MyMediaLite

1. Interaction Data

2. Baseline Methods

3. Apples and Oranges

4. Metrics

5. Hyperparameter Tuning

6. Reproducibility


Running Example: MyMediaLite

● RecSys toolkit and evaluation framework
● written in C#/Mono
● C#, Python, Ruby, F#
● 2 Java ports (RapidMiner plugin)
● regular releases (every 2-3 months) since 2010

● simple
● choice
● free
● documented
● tested

http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite


Running Example: MyMediaLite

command-line tools
● rating_prediction
● item_recommendation

Find all examples here:

http://github.com/zenogantner/mml-eval-examples


1. Interaction Data

Explicit feedback

Not always there.

Implicit feedback
● views
● clicks
● purchases

Often positive-only.


1. Interaction Data

User ID   Item ID   Timestamp
196       242       881250949
186       302       891717742
22        377       878887116
244       51        880606923
...       ...       ...

item_recommendation --training-file=F1 --test-file=F2

● IDs can be (almost) arbitrary strings.
● The timestamp column is optional; alternative timestamp format: yyyy-mm-dd.
● Separator: whitespace, tab, comma, or ::
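A minimal Python sketch of reading one line in the format described above. The function name `parse_interaction` and the choice to read yyyy-mm-dd dates as midnight UTC are assumptions made for illustration; this is not MyMediaLite's own parser.

```python
import re
from datetime import datetime, timezone

# Separators named on the slide: whitespace, tab, comma, "::"
_SEPARATOR = re.compile(r"::|[,\s]+")
_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def parse_interaction(line):
    """Return (user_id, item_id, timestamp or None) for one input line."""
    fields = [f for f in _SEPARATOR.split(line.strip()) if f]
    if len(fields) < 2:
        raise ValueError(f"expected at least a user ID and an item ID: {line!r}")
    user_id, item_id = fields[0], fields[1]  # IDs are kept as strings
    timestamp = None  # the third column is optional
    if len(fields) > 2:
        raw = fields[2]
        if _DATE.fullmatch(raw):
            # Assumption: a yyyy-mm-dd date means midnight UTC on that day.
            day = datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)
            timestamp = int(day.timestamp())
        else:
            timestamp = int(raw)  # Unix time in seconds
    return user_id, item_id, timestamp
```

Keeping IDs as strings mirrors the note that IDs can be (almost) arbitrary strings; an ID that itself contains a separator character would break this sketch.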


Random Splits

item_recommendation … --test-ratio=0.25

Shuffle and split:

Simple, but:
● Does not take temporal trends into account.
● Does not use all data for testing.
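What "shuffle and split" amounts to, as a small Python sketch. The function `random_split` and its arguments are hypothetical names for illustration; how MyMediaLite implements `--test-ratio` internally is not shown in the slides.

```python
import random

def random_split(interactions, test_ratio=0.25, seed=1):
    """Shuffle the interactions and hold out a fraction of them for testing."""
    rng = random.Random(seed)       # private generator, so the split is repeatable
    shuffled = list(interactions)   # copy: leave the caller's data untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test
```

Every interaction ends up in exactly one of the two sets, and timestamps play no role, which is why temporal trends are ignored.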


k-fold Cross-Validation

item_recommendation … --cross-validation=4

Shuffle and split:

● Uses each data point for evaluation.
● Does not take temporal trends into account.
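A sketch of k-fold cross-validation in plain Python, assuming the usual scheme: shuffle once, cut into k folds, and let each fold serve as the test set exactly once. `k_fold_splits` is a hypothetical helper, not part of MyMediaLite.

```python
import random

def k_fold_splits(interactions, k=4, seed=1):
    """Yield k (train, test) pairs; each interaction is tested exactly once."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    # Fold i takes every k-th element starting at position i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Results are usually reported as the average of the metric over the k test folds.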


Chronological Splits

rating_prediction … --chronological-split=0.25

rating_prediction … --chronological-split=01/01/2002

Sort chronologically and split:

● Use the past to predict the “future”.
● Takes trends in the data into account.
  – time of day, day of week
  – season
  – trending products
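Both variants of the chronological split, sketched in Python on (user, item, timestamp) tuples. The function names are hypothetical, and the sketch assumes that a ratio means "hold out the most recent share of the data" and that a date means "everything from that point on is test data".

```python
def chronological_split(interactions, test_ratio=0.25):
    """Train on the oldest interactions, test on the most recent ones."""
    ordered = sorted(interactions, key=lambda x: x[2])  # sort by timestamp
    n_test = int(round(len(ordered) * test_ratio))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def split_at(interactions, cutoff):
    """Train on everything before `cutoff`, test on everything from it on."""
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test

# The four rows from the interaction data slide.
rows = [
    ("196", "242", 881250949),
    ("186", "302", 891717742),
    ("22", "377", 878887116),
    ("244", "51", 880606923),
]
train, test = chronological_split(rows, test_ratio=0.25)
```

With these four rows, the single held-out interaction is the newest one (user 186, item 302).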


(c) Serolillo, license: CC by 2.5


2. Baseline Methods

Why compare against baselines?
● Absolute numbers have no meaning.
  – … well, at least here.
  – Relative numbers may also have no meaning.
    ● … if you compare to the wrong things.

Good baselines:
● the strongest solution that is still simple
● the existing solution
● standard solutions
  – coll. filtering: kNN, vanilla matrix factorization


2. Baseline Methods

item_recommendation … --recommender=Random

item_recommendation … --recommender=MostPopular

item_recommendation … --recommender=MostPopularByAttributes --item-attributes=ARTISTS

Item recommendation baselines:
● random
● popular items (by attribute/category)
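The idea behind a most-popular baseline fits in a few lines of Python. This is an illustrative sketch with a hypothetical function `most_popular`; it is not the code of MyMediaLite's MostPopular recommender, and the tie-breaking rule is an arbitrary choice.

```python
from collections import Counter

def most_popular(train, k=5, exclude=()):
    """Recommend the k items with the most interactions in the training data.

    `train` is an iterable of (user_id, item_id) pairs; `exclude` holds items
    that should not be recommended, e.g. those the user has already seen.
    """
    counts = Counter(item for _user, item in train)
    excluded = set(exclude)
    # Most frequent first; ties broken by item ID to keep the order stable.
    ranked = sorted(counts, key=lambda item: (-counts[item], str(item)))
    return [item for item in ranked if item not in excluded][:k]
```

Every user gets the same ranking (apart from the excluded items), which is what makes this a non-personalized baseline.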


(c) Michael Collins; license: CC by-2.0


3. Apples and Oranges

Always check if you measure on the same splits.

It happens quite often … e.g. this ICML 2013 paper:


3. Apples and Oranges

● On chronological splits of the Netflix dataset, matrix factorization (“SVD”) models usually do not reach an RMSE below 0.9 (lower is better).
● Chronological splits can be much harder than random splits!

Lessons:
● Baselines are important – they can also help us to “debug” experiments.
● Do not compare between simple splits and chronological splits.
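For reference, RMSE (root mean squared error) is the rating prediction measure quoted above. A small Python sketch, with `rmse` as a hypothetical helper name:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(actual))
```

Two RMSE values are only comparable if they were computed on the same test ratings; a number from a random split says little about a number from a chronological split.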


(c) Pastorius; license: CC by 3.0; source: http://commons.wikimedia.org/wiki/File:Plastic_tape_measure.jpg


4. Metrics

What is the right metric?
● Know your goal.
  – It always depends on what you want to achieve.
  – What to measure?
● Criticize your metrics.
  – They may ignore important aspects of your problem.
  – They are just approximations of user behavior.
● Eyeball the results.
  – Your metrics may fail to catch WTF results.

http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/


4. Metrics

item_recommendation ... --measures="prec@5,NDCG"

Precision at k
● number of “correct” items in the top k results, divided by k
● The choice of k is specific to your application.
● very simple
● easy to understand and explain

More ranking measures: NDCG, MAP, ERR
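Of the other ranking measures, NDCG is the one used in the command above. The sketch below assumes binary relevance (an item is either “correct” or not) and a log2 position discount, which is a common textbook definition; MyMediaLite's exact definition may differ in details.

```python
import math

def ndcg(ranked_items, relevant, k=None):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(relevant)
    top = ranked_items if k is None else ranked_items[:k]
    # A hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(top) if item in relevant)
    # Ideal ranking: all relevant items placed at the very top.
    ideal_hits = min(len(relevant), len(top))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Unlike precision at k, NDCG rewards placing correct items near the top of the list, not just somewhere within it.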


4. Metrics

Precision at k

recommendations   precision at 4
bad               0
good              1
bad               0
bad               0
bad               --
good              --
bad               --

precision at 4 = 1/4
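The same calculation as a Python sketch, using hypothetical item IDs i1 to i7 for the seven recommendations in the example, with the second and sixth marked as good.

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    relevant = set(relevant)
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k

ranked = ["i1", "i2", "i3", "i4", "i5", "i6", "i7"]
good = {"i2", "i6"}
p4 = precision_at_k(ranked, good, 4)  # 0.25: only i2 is within the top 4
```

The good item at position 6 does not count, because only the first k = 4 positions are inspected.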


5. Hyperparameter Tuning

item_recommendation … --recommender=WRMF --recommender-options="reg=0.01 alpha=2"

● Hyperparameters, e.g.
  – regularization to control overfitting
  – learning rate (for gradient descent methods)
  – stopping criterion
● You have to do it. Also for your baselines.
● Don't get too fancy.
  – Grid search will do it in most cases.
● More advanced:
  – Nelder-Mead/Simplex
  – Particle swarm optimization


5. Hyperparameter Tuning

rating_prediction … --search-hp

Grid search
● simple
● brute force
● embarrassingly parallel

“A practical guide to SVM classification”

http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
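Grid search in its simplest form, as a Python sketch. `grid_search` and `toy_score` are hypothetical names; `toy_score` only stands in for "train on the training split with these settings and return a validation score", and the parameter names reg and alpha are borrowed from the WRMF example. This is not how `--search-hp` is implemented.

```python
import itertools

def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_score).

    `evaluate` maps a parameter dict to a validation score (higher is better).
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical objective with its optimum at reg=0.01, alpha=2.
    return -abs(params["reg"] - 0.01) - abs(params["alpha"] - 2)

best, score = grid_search(toy_score, {"reg": [0.001, 0.01, 0.1], "alpha": [1, 2, 4]})
```

Each call to `evaluate` is independent of the others, which is why the loop is embarrassingly parallel. Tune on a validation split, not on the final test data.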


6. Reproducible Experiments

item_recommendation … --random-seed=1

Random seed
● “random” splitting
● training initialization
● debugging
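Why a fixed seed helps, shown on training initialization. The sketch assumes a factor model whose parameters start as small Gaussian random numbers; `init_factors` is a hypothetical helper, not MyMediaLite code.

```python
import random

def init_factors(n_rows, n_factors, seed, stdev=0.1):
    """Initialize a factor matrix with small random values, reproducibly."""
    rng = random.Random(seed)  # private generator seeded explicitly
    return [[rng.gauss(0.0, stdev) for _ in range(n_factors)]
            for _ in range(n_rows)]

run_a = init_factors(3, 2, seed=1)
run_b = init_factors(3, 2, seed=1)  # same seed: identical starting point
run_c = init_factors(3, 2, seed=2)  # different seed: different starting point
```

With identical starting points and identical splits, two runs that still differ point to a real change in code or data, which is what makes the seed useful for debugging.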


6. Reproducible Experiments

item_recommendation … --random-seed=1

Besides random seed:
● Put everything in version control.
  – data, software
  – scripts and configuration
● Use build tools like make for automation.
  – Knows when to re-run your data preprocessing steps.

http://bitaesthetics.com/posts/make-for-data-scientists.html


6. Reproducible Experiments

item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"

Re-use evaluation code.

Create predictions using external software. Use MyMediaLite for evaluation.


6. Reproducible Experiments

item_recommendation … --recommender=ExternalItemRecommender --recommender-options="prediction_file=FILE"

Why re-use evaluation code?
● Evaluation protocols (splitting+candidate selection+metrics) are not easy to get right.
● Ensures comparability.
  – more configuration kept fixed => less risk of accidental differences
● Laziness!


(c) by Caucas; license: CC by-nc-nd 2.0; source: http://www.flickr.com/photos/thecaucas/2597813380/sizes/o/


Summary
1. Split your data appropriately.
2. Do not compare apples and oranges.
3. Compare against simple and strong baselines.
4. Precision at k is a metric that is easy to explain.
5. Grid search is a simple method for hyperparameter tuning.
6. Make your experiments reproducible.
7. MyMediaLite can help you with some of these things ;-). Try it out!


http://github.com/zenogantner/mml-eval-examples
http://mymedialite.net/
http://github.com/zenogantner/MyMediaLite

(c) Michael Sauers; license CC by-nc-sa 2.0