
Recommendations in Drupal (Drupal DevDays Barcelona 2012)


This is a presentation on making content and user recommendations with Drupal, Apache Mahout and other machine learning technologies, given at Drupal DevDays Barcelona 2012.


Drupal Developer Days Barcelona 2012.06.16

Personalisation and Recommendations using Drupal

• Keywords:
– Personalisation
– Recommendations
– Scalable machine learning
– Predictions
– Similarity
– Data Mining
– Big Data
– Trend Spotting
– Clustering


Kendra Initiative

• Mission
– Foster an Open Distributed Marketplace for Digital Media
• EU funded
– P2P-Next: http://www.p2p-next.org
– SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN): http://www.saracen-p2p.eu


Deliverables

• Kendra Signpost
– Metadata interoperability, mapping and transformation

• Smart Filters
– Portable preferences and filters

• Kendra Social, Kendra Hub
– Social networking management tools

• Standards work
– OpenSocial extension
– Social API (see the "Abstracting Social Networking functionality in Drupal" sprint)

• Kendra Match
– Searching and recommendation


Components

• Drupal Recommender API module
• Recommender helper modules
• async_command module
• Apache Mahout or a cloud service
• Hadoop cluster (optional)


Industry Examples

• Amazon
• Netflix
• Spotify, Pandora
• Facebook, LinkedIn
• OkCupid
• iTunes: Genius (the App Store, not so much)


Machine learning

• Collaborative Filtering
– AKA recommender engines

• Clustering
• Classification


Collaborative Filtering

• Input: preference data
• Output: predictions
• Preference = <uid1, (nid1 or uid2), w1>
– w1 = signed integer representing the weight of the uid1-nid1 or uid1-uid2 correlation (affinity)

• Prediction = <uid1, (nid1 or uid2), w2>
– w2 = float representing the strength of the uid1-nid1 or uid1-uid2 correlation
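The tuple notation maps directly onto code. Below is a minimal user-based sketch in plain Python (not Mahout's API): preferences are `(uid, nid, w)` tuples, and a prediction for a node is the average of other users' weights for it, weighted by each user's cosine similarity to the target user. The sample data, the choice of cosine similarity and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical preference tuples <uid, nid, w>; w is a signed integer affinity.
PREFERENCES = [
    (1, 101, 5), (1, 102, 3),
    (2, 101, 4), (2, 102, 3), (2, 103, 5),
    (3, 101, -2), (3, 103, 1),
]

def by_user(prefs):
    """Group preference tuples into {uid: {nid: weight}}."""
    table = defaultdict(dict)
    for uid, nid, w in prefs:
        table[uid][nid] = w
    return table

def cosine(a, b):
    """Cosine similarity between two sparse {nid: weight} vectors."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[n] * b[n] for n in common)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict(prefs, uid, nid):
    """Return a prediction tuple <uid, nid, w2> with a float score, or None."""
    users = by_user(prefs)
    num = den = 0.0
    for other, items in users.items():
        if other == uid or nid not in items:
            continue
        sim = cosine(users[uid], items)
        if sim <= 0:
            continue  # design choice: only positively similar users contribute
        num += sim * items[nid]
        den += sim
    return (uid, nid, num / den) if den else None
```

Here `predict(PREFERENCES, 1, 103)` gives a score of 5.0 (up to floating-point rounding): user 2 is the only user with positive similarity to user 1 who has a weight for node 103, and user 3 is skipped because its similarity is negative.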


Enter Mahout

• Apache Mahout is a scalable machine learning library that supports large data sets.

• Became an Apache top-level project in spring 2010
• Grew out of the Apache Lucene project (the basis for Apache Solr)
• Merged with the Taste collaborative filtering project


Use Cases

• Recommendation mining
• Clustering
• Classification
• Frequent itemset mining


Out-of-the-box algorithms

• Recommendation
– User-based recommender
– Item-based recommender
– Slope-One recommender
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using parallel matrix factorisation

• Clustering
– Canopy Clustering
– K-Means Clustering
– Fuzzy K-Means
– Mean Shift Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering

• Model combination
– Naive Bayes algorithm
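Of the recommenders listed, Slope-One is the easiest to show in a few lines. The sketch below is a plain-Python weighted Slope-One, not Mahout's implementation: it averages the rating differences between each pair of items over the users who rated both, then predicts a rating by adding those average differences to the user's own ratings, weighting each term by how many users supported it. The data and function names are illustrative assumptions.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}}.

    Returns (dev, count) keyed by (i, j): dev[i, j] is the mean of r_i - r_j
    over users who rated both items, count[i, j] is how many such users exist.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for items in ratings.values():
        for i, ri in items.items():
            for j, rj in items.items():
                if i != j:
                    diff_sum[i, j] += ri - rj
                    count[i, j] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, count

def slope_one_predict(ratings, user, target):
    """Weighted Slope-One prediction of `user`'s rating for `target`, or None."""
    dev, count = slope_one_deviations(ratings)
    num = den = 0.0
    for j, rj in ratings[user].items():
        pair = (target, j)
        if j == target or pair not in dev:
            continue
        num += (dev[pair] + rj) * count[pair]
        den += count[pair]
    return num / den if den else None

# Classic two-user example: bob's predicted rating for "b" is 2.0 + (1.5 - 1.0).
EXAMPLE = {"alice": {"a": 1.0, "b": 1.5}, "bob": {"a": 2.0}}
```

`slope_one_predict(EXAMPLE, "bob", "b")` returns 2.5. This sketch recomputes the deviation table on every call for brevity; a real deployment would build it once offline and reuse it.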


Hadoop

• Provides distributed processing across a cluster of machines
• Not trivial to set up
• Not yet supported by the Recommender API (issue #1206840)


Recommender API

• Drupal 7 (alpha) and Drupal 6 (beta) versions
• Can run on the same server as the Apache web server or on a remote server
• Java helper program (was PHP)
• Uses JDBC and the Java Persistence API (JPA)
• Drupal helper modules


Recommender API helper modules

• Browsing History Recommender
• OG Similar groups module
• Ubercart Products Recommender
• Fivestar Recommender
• Points Voting Recommender
• Flag Recommender


Asynchronous operation

• async_command module
– Talks to Mahout
– Typically run via cron

• Results are stored directly in the Drupal database
– Recommender tables
– Via JDBC
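The cron-driven flow described above (read preferences, compute predictions, write them straight back into the Drupal database) can be sketched with Python's built-in `sqlite3` standing in for the Drupal database and the Java helper's JDBC connection. The table names `preferences` and `predictions` and their columns are hypothetical, not the Recommender API's real schema, and the scoring step is a deliberately trivial placeholder (each unseen node scored by its mean weight) where Mahout would do the real work.

```python
import sqlite3
from collections import defaultdict

def compute_predictions(prefs):
    """Placeholder model: score every node a user has not seen by its mean weight."""
    totals, counts, seen = defaultdict(float), defaultdict(int), defaultdict(set)
    for uid, nid, w in prefs:
        totals[nid] += w
        counts[nid] += 1
        seen[uid].add(nid)
    means = {nid: totals[nid] / counts[nid] for nid in totals}
    return [(uid, nid, score)
            for uid in seen
            for nid, score in means.items()
            if nid not in seen[uid]]

def run_batch(conn):
    """One cron-style pass: read preferences, recompute, replace the predictions table."""
    prefs = conn.execute("SELECT uid, nid, weight FROM preferences").fetchall()
    rows = compute_predictions(prefs)
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM predictions")
        conn.executemany(
            "INSERT INTO predictions (uid, nid, score) VALUES (?, ?, ?)", rows)
    return len(rows)

# Demo with an in-memory database and hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (uid INTEGER, nid INTEGER, weight INTEGER)")
conn.execute("CREATE TABLE predictions (uid INTEGER, nid INTEGER, score REAL)")
conn.executemany("INSERT INTO preferences VALUES (?, ?, ?)",
                 [(1, 10, 4), (2, 10, 2), (2, 20, 5)])
inserted = run_batch(conn)
```

Replacing the whole predictions table inside one transaction means Drupal blocks and views never read a half-written result set; rerunning the batch replaces rows rather than appending duplicates.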


Hosting Solutions

• Self-hosted: all-in-one (web server, database server, recommender server); has its pros and cons

• Recommender API Cloud Service (looking for beta testers)

• Amazon Elastic MapReduce (EMR)


Installing Mahout

• Prerequisites:
– Dedicated VM if possible
– Linux, Mac OS X Leopard 10.5.6 or later, or Windows (Cygwin)
– Java JDK 1.6
– Maven 2.0.11 or higher (maven.apache.org)


• Building
– Follow the instructions at https://cwiki.apache.org/MAHOUT/buildingmahout.html
• Use Maven to build the examples


• Testing: GroupLens
– On a single 2GHz server:
• 100K ratings (1000 users, 1700 items): 9 minutes
• 1M ratings (6000 users, 4000 items): 12 hours
• 10M ratings (72,000 users, 10,000 items): fuggedaboutit
– Using 6 concurrent 2GHz processing units:
• 100K ratings (1000 users, 1700 items): 2 minutes
• 1M ratings (6000 users, 4000 items): 2 hours
• 10M ratings (72,000 users, 10,000 items): 11 days 20 hours
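The GroupLens timings listed above can be turned into speedup and parallel-efficiency figures. This calculation uses only the numbers on the slide; the 10M run has no single-server time, so it is compared against the 1M run on the same six units instead.

```python
# Timings from the slide, in minutes: (single 2GHz server, 6 concurrent 2GHz units)
TIMINGS = {
    "100K ratings": (9, 2),
    "1M ratings": (12 * 60, 2 * 60),
}
UNITS = 6
TEN_M_PARALLEL = (11 * 24 + 20) * 60  # 10M ratings on 6 units: 11 days 20 hours

def speedup(serial, parallel):
    """How many times faster the parallel run was."""
    return serial / parallel

def efficiency(serial, parallel, units=UNITS):
    """Fraction of ideal linear speedup achieved on `units` processors."""
    return speedup(serial, parallel) / units

for name, (serial, parallel) in TIMINGS.items():
    print(f"{name}: speedup {speedup(serial, parallel):.1f}x, "
          f"efficiency {efficiency(serial, parallel):.0%}")

growth = TEN_M_PARALLEL / TIMINGS["1M ratings"][1]
print(f"1M -> 10M ratings on {UNITS} units: {growth:.0f}x longer")
```

This prints a 4.5x speedup (75% efficiency) for 100K ratings, 6.0x (100%) for 1M ratings, and shows that the 10M run took 142x longer than the 1M run for 10x the ratings, so on these figures run time grows far faster than the data does.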


Installing Recommender API

• See http://drupal.org/node/1207634
• Configuration
– The database settings in sites/all/modules/async_command/config.properties should match those in settings.php
• Download and enable async_command
• Check /admin/config/search/recommender/admin
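A small sanity check for the configuration step: parse config.properties and compare its database entries with the values from settings.php. The property key names in `KEY_MAP` are hypothetical placeholders (check the real names in your copy of config.properties); the Drupal side uses the standard Drupal 7 `$databases` keys (`host`, `database`, `username`, `password`), passed in as a plain dict. The parser handles only a simple subset of the Java `.properties` format.

```python
# Hypothetical mapping: config.properties key -> Drupal 7 $databases key.
KEY_MAP = {
    "db.host": "host",
    "db.name": "database",
    "db.user": "username",
    "db.password": "password",
}

def parse_properties(text):
    """Parse key=value / key:value lines; skip '#' and '!' comments.

    Simplified: no line continuations, escapes or whitespace-only separators.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        # The first '=' or ':' separates key from value (so JDBC URLs survive).
        sep = min((i for i in (line.find("="), line.find(":")) if i != -1), default=-1)
        if sep == -1:
            continue
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props

def mismatches(props, drupal_db):
    """Return {property_key: (properties_value, settings_php_value)} where they differ."""
    return {
        key: (props.get(key), drupal_db.get(drupal_key))
        for key, drupal_key in KEY_MAP.items()
        if props.get(key) != drupal_db.get(drupal_key)
    }
```

An empty result from `mismatches` means the two files agree on every mapped setting; a missing key on either side shows up as `None` in the returned pair.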


Usage

• Making recommendations
– User-user
– User-item
– Item-item

• Predictions and similarity scores feed back into Drupal
– Blocks
– Views
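Item-item recommendations (the kind a "similar content" block would display) come from comparing items by the users who interacted with them. A minimal sketch, assuming binary interactions such as a flag or a view, and cosine similarity between the sets of users; the data and function names are illustrative, not the Recommender API's.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (uid, nid) interactions, e.g. flags or page views.
INTERACTIONS = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a"), (3, "c"), (4, "c")]

def item_users(interactions):
    """Build {nid: set of uids who interacted with it}."""
    table = defaultdict(set)
    for uid, nid in interactions:
        table[nid].add(uid)
    return table

def similar_items(interactions, nid, top_n=3):
    """Rank other items by cosine similarity of their binary user sets."""
    users = item_users(interactions)
    target = users.get(nid, set())
    scores = []
    for other, other_users in users.items():
        if other == nid:
            continue
        overlap = len(target & other_users)
        if overlap:
            scores.append((overlap / sqrt(len(target) * len(other_users)), other))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best first, ties broken by id
    return [(other, score) for score, other in scores[:top_n]]
```

For item "a", item "b" ranks first (two shared users) ahead of "c" (one shared user). Because item similarities change slowly, a list like this is a natural candidate for precomputing in the cron run and exposing through a block or view.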


Case study: Data Mining and Recommendations in SARACEN

• SARACEN: http://www.saracen-p2p.eu/
• Feedback loop to measure the subjective quality of the recommendations
– Limited set of data, small user base
– The API provides an initial set of recommended videos
– The user can then watch a recommended video
– The user’s actions are incorporated into their implicit profile, which feeds back to the Recommender API
– The Recommender API generates new predictions based on the complete set of implicit profile metadata
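The feedback loop can be illustrated with a toy co-occurrence recommender: recommend videos that co-occur with the user's history in other users' histories, let the user watch one, fold that action into the implicit profile, and recompute. The data, the co-occurrence scoring and the names are a hypothetical illustration of the loop, not SARACEN's implementation.

```python
from collections import Counter

# Hypothetical viewing histories of other users: user -> set of video ids.
HISTORIES = {
    "u1": {"v1", "v2", "v3"},
    "u2": {"v1", "v4"},
    "u3": {"v2", "v3", "v5"},
}

def recommend(profile, histories, top_n=2):
    """Score unseen videos by how strongly their viewers' histories overlap the profile."""
    scores = Counter()
    for watched in histories.values():
        overlap = len(profile & watched)
        for video in watched - profile:
            scores[video] += overlap
    ranked = sorted((v for v, s in scores.items() if s > 0),
                    key=lambda v: (-scores[v], v))
    return ranked[:top_n]

profile = {"v1"}                        # initial implicit profile
first = recommend(profile, HISTORIES)   # initial set of recommended videos
profile.add(first[0])                   # user watches the top recommendation
second = recommend(profile, HISTORIES)  # new predictions from the updated profile
```

The first call returns `["v2", "v3"]`; after the user watches v2, the second call returns `["v3", "v4"]`, because both u1 and u3 now share at least one video with the profile and both contain v3.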


SARACEN: Prototype


Recommender data sources

• Explicit data
– SARACEN account data, including location and language
– Linked accounts and profiles
• e.g. Facebook user profile, “likes”, connections, metadata

• Implicit data
– Activity history recorded during the user’s sessions
– Searches
– Shared content
– Viewed content
– Albums (media containers)
– Content ratings
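Implicit signals have to be turned into the signed-weight preference tuples `<uid, nid, w>` from the collaborative filtering slide before a recommender can use them. A minimal sketch; the event types and their weights are assumptions chosen for illustration, not values used by SARACEN, and simply adding a rating to the implicit weights is a crude choice made for brevity.

```python
from collections import defaultdict

# Hypothetical weights per implicit action; in practice these need tuning.
EVENT_WEIGHTS = {"view": 1, "share": 3, "add_to_album": 2}

# Hypothetical activity log: (uid, nid, action, value); value is used only for ratings.
EVENTS = [
    (1, 10, "view", None), (1, 10, "view", None), (1, 10, "share", None),
    (1, 20, "rate", 4),
    (2, 10, "add_to_album", None),
    (2, 30, "unknown", None),
]

def to_preferences(events):
    """Aggregate activity events into sorted preference tuples <uid, nid, w>."""
    weights = defaultdict(int)
    for uid, nid, action, value in events:
        if action == "rate":
            weights[uid, nid] += value          # score taken from a content rating
        elif action in EVENT_WEIGHTS:
            weights[uid, nid] += EVENT_WEIGHTS[action]
        # any other action type is ignored
    return sorted((uid, nid, w) for (uid, nid), w in weights.items())
```

`to_preferences(EVENTS)` yields `[(1, 10, 5), (1, 20, 4), (2, 10, 2)]`: two views and a share add up to 5 for node 10, and the unknown action is dropped.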


Scalability

• Hadoop is not needed if:
– The number of users is orders of magnitude larger than the number of items
– Users browse anonymously most of the time
– Few users log in and need personalised recommendations
– The item churn rate is relatively low
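One way to see why the item count matters more than the user count: an item-based recommender keeps a similarity table with one entry per pair of items, so the table grows with the square of the number of items regardless of how many users there are. The count below uses the GroupLens sizes from the installation slides; treating pair counts as a proxy for cost is a simplification.

```python
def pair_count(n):
    """Number of unordered pairs among n entities: n(n-1)/2."""
    return n * (n - 1) // 2

GROUPLENS = {  # (users, items) from the GroupLens test sets
    "100K": (1000, 1700),
    "1M": (6000, 4000),
    "10M": (72000, 10000),
}

for name, (users, items) in GROUPLENS.items():
    print(f"{name}: {pair_count(users):,} user pairs vs {pair_count(items):,} item pairs")
```

For the 10M set, with about seven times as many users as items, there are 2,591,964,000 user pairs but only 49,995,000 item pairs, roughly 50 times fewer; the gap widens further when users outnumber items by orders of magnitude.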


Worth Considering

• Decreased transparency
• Decreased serendipity
• Sleep deprivation


Resources: Recommender API

• http://drupal.org/project/recommender
• http://recommenderapi.com/cloud
• https://cwiki.apache.org/confluence/display/MAHOUT


Resources: Mahout

• http://mahout.apache.org/
• Mahout in Action
– http://www.manning.com/owen/
– ISBN 9781935182689
• The Optimality of Naive Bayes, Harry Zhang
• http://aws.amazon.com/elasticmapreduce/


Acknowledgements

• Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
– http://www.saracen-p2p.eu
– Funded within the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 248474


Questions/comments…

• Kendra Initiative
– @kendra
– http://www.kendra.org.uk
– https://github.com/kendrainitiative

• Klokie Grossfeld
– @klokie
– http://www.linkedin.com/in/klokie

• Daniel Harris
– @dahacouk
