conor-duke
Nov 2016- PyCon Dublin 2016 Fabrikatyr – Factorisation Machines
Fabrikatyr Analytics
Uncover tangible truths amidst the noise of modern media
Recommendation service using Factorisation Machines
PyCon Dublin - 2016
@Conr
@fabrikatyr
Agenda
● The Problem
● Factorisation machines as a method
● Getting the data right
● Modelling and Deployment
● Further Research
Business Problem
Increase user engagement by displaying content which is both personalised and interesting
Content can be user generated or taken from a 3rd-party content provider
Users and Communities
● A user can be in MANY communities
● Behaviours are consistent across communities; content consumption is not
Interesting content will generate
● Likes
● Comments
● Shares
Content can be ‘EverGreen’
Optimisation Problem
How do we select a method to deploy which can answer the challenges?
● General recommender: accuracy is important, but the goal is to generate recommendations which are consumed
● Speed & Scalability: the system needs to respond quickly to trends and topics across communities
● Sparse data set: there are lots of ‘hidden’ behaviours; measurable engagement is low, so the data set is very sparse
Factorisation Machines appeared to be the method which answered the challenge

Method                    Accuracy          Speed       Sparsity
Factorisation Machines    General accuracy  Quick       Designed for it
Collaborative Filter      Too accurate      Suitable    Suitable
Support Vector Machines   Too accurate      Suitable    Unsuitable
Random Forest / CART      General accuracy  Unsuitable  Unsuitable
Factorisation machines as a method
Factorisation Machine - The Equation
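The equation on these slides was an image and did not survive extraction. As a stand-in, the block below gives the standard second-order factorisation machine model from Rendle's 2010 paper; it is an assumption that this is the form the slides showed. Here n is the number of feature columns, k is the number of latent factors, w_0 is the global bias, w_i are the per-feature weights and v_i is the k-dimensional factor vector of feature i.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}

% The pairwise term can be computed in O(kn) instead of O(kn^2):
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
```

Because each interaction weight is a dot product of two factor vectors rather than a free parameter, the model can estimate interactions between features that rarely or never co-occur, which is why it suits sparse data.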
Limits of Factorisation Machines
● Need to understand your features as the model
● Not good with ‘dense’ data with binary outcomes
● Relatively newer method, but supported by most languages
● General model, so predictions are also general
Getting the data right
The data from the systems needs to be examined and structured before executing the model
3 groups of information:
● Users
● Content
● Context
Time was NOT a feature
Not all USER behaviours are important when using a generalised model
Important
● User behaviour
● Engagement
● Count of Community membership
Unimportant
● Is the user an Admin / Moderator
● Has the User ‘logged-in’
Content needs to be given ‘Context’ to be worked with effectively
Engagement
● Did the user ‘Like’ the content
● Did the user ‘comment’ on the content
● Did the user ‘share’ the content
Keywords
● Which keywords does the content have?
The general behaviour is that a set of users and content generate most of the activity
Final result was a ‘wide’ dataset per user event with many columns
● Each time a user either saw content or engaged with it, a row must be added to the data set
● Keywords, likes, etc. all receive a 1 or a 0 for ALL the events
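The ‘wide’ layout described above can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the talk: the event log, the IDs, the keywords and the `build_wide_rows` helper are all hypothetical. Each event becomes one row, and every user, content item, keyword and engagement flag becomes a 0/1 column.

```python
# Hypothetical event log: one entry each time a user saw a piece of content.
# All user IDs, content IDs and keywords below are made up for illustration.
events = [
    {"user": "u1", "content": "c1", "keywords": ["python", "data"],
     "liked": True, "commented": False, "shared": False},
    {"user": "u2", "content": "c1", "keywords": ["python", "data"],
     "liked": False, "commented": False, "shared": False},  # seen, no engagement
    {"user": "u1", "content": "c2", "keywords": ["sport"],
     "liked": True, "commented": False, "shared": True},
]

ENGAGEMENT = ("liked", "commented", "shared")


def build_wide_rows(events):
    """Return (columns, rows): one 0/1 row per event, one column per indicator."""
    users = sorted({e["user"] for e in events})
    contents = sorted({e["content"] for e in events})
    keywords = sorted({k for e in events for k in e["keywords"]})
    columns = (
        [f"user={u}" for u in users]
        + [f"content={c}" for c in contents]
        + [f"kw={k}" for k in keywords]
        + list(ENGAGEMENT)
    )
    rows = []
    for e in events:
        # Collect the names of every column that should be 1 for this event.
        active = {f"user={e['user']}", f"content={e['content']}"}
        active |= {f"kw={k}" for k in e["keywords"]}
        active |= {name for name in ENGAGEMENT if e[name]}
        rows.append([1 if col in active else 0 for col in columns])
    return columns, rows


columns, rows = build_wide_rows(events)
for row in rows:
    print(row)
```

Most cells in such a table are 0, so at realistic scale the rows would normally be held in a sparse structure. When fitting a model, the engagement column being predicted would be split off as the target rather than kept as an input feature.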
Modelling and Deployment
We used an application which could deploy our Python model at scale
Turi Predictive Services supports model predictions, hosting and managing machine learning models as low-latency RESTful services.
Turi was acquired by Apple Inc. for $200 million.
Domino Data Lab is an alternative.
SFrame versus Pandas - works for Factorisation Machines
SFrame is a scalable, out-of-core dataframe, which allows you to work with datasets that are larger than the amount of RAM on your system.
Similar to Spark RDD
Solution Architecture
[Diagram. Components: Content store, Guest data, Factorisation Machine model, Scoring Engine & Recommendations]
Balance between online / offline calculations
[Diagram: split into (Offline/batch) and (Online) parts. Components: Machine Learning Service, Content store, URLs, Guest behaviour, Guest data, Factorisation Machine model, Content consumption, Guest classification, Scoring Engine & Recommendations, Content URL, C# content server, Model weights]
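One way to read the offline/online split is that the batch side fits the model and exports its weights, while the online scoring engine only applies those weights to rank candidate content for a guest. The sketch below illustrates the online half under that assumption. The weights, feature indices, URLs and the `fm_score` and `recommend` helpers are all hypothetical, and the scoring uses the standard second-order factorisation machine form with the linear-time pairwise term.

```python
def fm_score(x, w0, w, V):
    """Second-order factorisation machine score for a sparse feature vector.

    x  : dict mapping feature index -> value (non-zero entries only)
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of per-feature factor vectors, each of length k
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Linear-time identity: 0.5 * ((sum v x)^2 - sum (v x)^2) per factor.
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def recommend(guest_features, candidates, w0, w, V, top_n=2):
    """Rank candidate content URLs for one guest, best score first."""
    scored = []
    for url, content_features in candidates.items():
        x = {**guest_features, **content_features}
        scored.append((fm_score(x, w0, w, V), url))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_n]]


# Hypothetical weights, as if exported by the offline/batch training job.
# Feature indices: 0 = user=u1, 1 = kw=python, 2 = kw=sport
w0 = 0.1
w = [0.0, 0.2, 0.1]
V = [[1.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]

guest = {0: 1.0}  # the guest is u1
candidates = {    # made-up content URLs with their keyword features
    "/articles/python-tips": {1: 1.0},
    "/articles/match-report": {2: 1.0},
}

print(recommend(guest, candidates, w0, w, V))
```

Keeping training offline means the online path does only a small amount of arithmetic per candidate, which fits the low-latency requirement of a content server.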
Further analysis
● Injecting content into the model
● Consumption tracking using A/B testing
● Presentation bias: does rank affect consumption?
Fabrikatyr Analytics
Uncover tangible truths amidst the noise of modern media
Thank you - Any Questions?
PyCon Dublin - 2016
@Conr
@fabrikatyr
References
● http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
● https://github.com/ibayer/fastFM
● www.libfm.org
● https://github.com/zhengruifeng/spark-libFM
● https://github.com/scikit-learn-contrib/polylearn