Recommendation engines: Matching items to users

Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011

Jobin [email protected]

Who am I?

• Architect @ Flytxt (Big Data Analytics & Automation)

• Passionate about data, distributed computing and machine learning

• Previously

• Virtualization & Cloud Lifecycle Management (BMC)

• Designed and implemented a cloud lifecycle management interface at BMC

• Large-Scale Data Centre Automation (AOL)

• Implemented a centralized data center management framework for AOL

• Workflow Systems & Automation (Accenture)

• Implemented a service management suite for various customers

Session Agenda!


• Recommendation Engines – What's the big deal?

• Conceptual Overview

• Collaborative Filtering

• Engineering Challenges

• Apache Mahout

• Getting your recommender to production

• Q&A

What's the big deal?

Ooh Ads too!

Big deal?

[Figure: Content Publishers, Advertisers and an Ad Network deliver content and ads to Users; ML Algorithms (user behavior modelling, maximization criteria) recommend the best ads.]

BTW, what was the challenge?

User base: 2 billion+ users worldwide

Content base: 12.51 billion+ indexed pages

Advertiser base: millions of active advertisers

Real-time nature: responses in < 200 ms

Multi-objective optimization problem

Noisy data

Recommendation Engines: Overview

A specific type of information filtering system technique that attempts to recommend information items or social elements that are likely to be of interest to the user.

Technologies that can help us sift through all the available information to predict products or services that could be interesting to us.

Applying knowledge discovery techniques to the problem of making personalized recommendations for information, products or services, usually during a live interaction.

Do we need a crystal ball to predict?

We all have opinions/tastes which we express as our likes or dislikes.

Our tastes follow some patterns.

We tend to like things which are similar to things we already like (e.g. songs).

We tend to like things which are liked by people who are similar to us (e.g. movies).
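The first pattern ("things similar to things we already like") is the idea behind item-based filtering. A minimal sketch, assuming a hypothetical in-memory table of song ratings and cosine similarity between item columns (users who did not rate an item count as 0); neither the data nor the choice of measure comes from the slides:

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"song_a": 5, "song_b": 4, "song_c": 1},
    "bob":   {"song_a": 4, "song_b": 5},
    "carol": {"song_b": 2, "song_c": 5},
}

def item_vector(item):
    """One column of the ratings table: user -> rating for `item`."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(v1, v2):
    """Cosine similarity; users missing from a vector count as 0."""
    dot = sum(v1[u] * v2[u] for u in v1.keys() & v2.keys())
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def similar_items(item, k=2):
    """Rank the other items by similarity to `item`, most similar first."""
    target = item_vector(item)
    all_items = {i for r in ratings.values() for i in r}
    scores = [(other, cosine(target, item_vector(other)))
              for other in all_items if other != item]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

print(similar_items("song_a"))
```

Here `song_b` ranks first for `song_a` because alice and bob rated both highly, so it is the natural suggestion for someone who likes `song_a`.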

From fancy research to mainstream

Collaborative Filtering

Problem: We have U users and I items in the system. A user Uk needs to be recommended a set of m items that he has not yet picked but is likely to be interested in.

Solution :

Maintain a database of users’ ratings of a variety of items.

For a given user, find other users whose ratings strongly correlate with the current user's ratings (the user neighborhood).

Recommend items rated highly by these similar users, but not rated by the current user.

E.g. Amazon, Flipkart, etc.
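The three solution steps above can be sketched as a small user-based collaborative filter. This is an illustration under assumptions, not a production design: the ratings are made up, Pearson correlation is used as the similarity (the slide says "strongly correlate" but names no measure), only positively correlated users enter the neighborhood, and a prediction is the similarity-weighted average of the neighbors' ratings.

```python
from math import sqrt

# Hypothetical ratings database: user -> {item: rating on a 1-5 scale}
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
    "u4": {"i1": 5, "i2": 3, "i3": 4, "i5": 5},
}

def pearson(a, b):
    """Pearson correlation of two users over the items both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings[a][i] for i in common]
    ys = [ratings[b][i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def neighborhood(user, k=2):
    """The k users whose ratings correlate most strongly (and positively) with `user`."""
    sims = [(other, pearson(user, other)) for other in ratings if other != user]
    positive = [s for s in sims if s[1] > 0]
    return sorted(positive, key=lambda s: s[1], reverse=True)[:k]

def recommend(user, m=2, k=2):
    """Top-m items not yet rated by `user`, scored by similarity-weighted neighbor ratings."""
    scores, weights = {}, {}
    for other, sim in neighborhood(user, k):
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores}
    return sorted(predicted.items(), key=lambda p: p[1], reverse=True)[:m]

print(recommend("u1"))
```

For `u1`, the neighborhood is `u4` (identical ratings) and `u2`, while `u3` is excluded because its ratings are negatively correlated; the candidates are therefore `i5` (from `u4`) and `i4` (from `u2`).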

Utility Matrix

A matrix of values representing each user's level of affinity for each item. It is typically sparse: most users have rated only a few items.

The recommendation engine needs to predict the values of the empty cells from the available ones.

The denser the matrix, the better the quality of the recommendations.

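User | i1  | i2  | i3  | i4  | i5
u1   |     | r12 |     | r14 | r15
u2   | r21 | r22 |     |     | r25
u3   |     | r32 |     | r34 |
u4   |     |     | r43 |     | r45

(rij: rating given by user ui to item ij; blank cells are unknown)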
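Because most cells are empty, a utility matrix is usually stored sparsely, keeping only the known (user, item) ratings. A minimal sketch that encodes the table above this way; the numeric rating values are made up, since the slide only shows placeholders such as r12. It also computes the matrix density and lists the cells a recommender would have to predict.

```python
# The utility matrix from the table, stored sparsely: only known ratings are kept.
# The numeric values are hypothetical stand-ins for r12, r14, ...
known = {
    ("u1", "i2"): 4, ("u1", "i4"): 2, ("u1", "i5"): 5,
    ("u2", "i1"): 3, ("u2", "i2"): 5, ("u2", "i5"): 1,
    ("u3", "i2"): 4, ("u3", "i4"): 3,
    ("u4", "i3"): 5, ("u4", "i5"): 2,
}
users = ["u1", "u2", "u3", "u4"]
items = ["i1", "i2", "i3", "i4", "i5"]

def density(known, users, items):
    """Fraction of cells in the users x items matrix that hold a rating."""
    return len(known) / (len(users) * len(items))

def empty_cells(known, users, items):
    """The (user, item) cells the recommender has to predict."""
    return [(u, i) for u in users for i in items if (u, i) not in known]

print(density(known, users, items))            # 10 of 20 cells are known
print(len(empty_cells(known, users, items)))   # 10 cells left to predict
```

Storage then grows with the number of ratings rather than with users × items, which matters once the matrix has millions of rows and columns.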

Engineering Challenges

Massive data volume: how do I deal with TBs of raw data to build my recommendations?

Hadoop and MapReduce shine here!

How can I make it work in 'real time'?

Batch pre-computation, with the results stored in HBase, can help!

Will my solution scale? Soon my user base is going to double!

Sure, you can make it scale!
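The batch pre-compute idea splits the work into an offline job that computes every user's recommendations and an online path that does only a key lookup. A minimal sketch of that split; a plain dict stands in for the HBase table (row key = user id), and `toy_recommend` is a hypothetical placeholder for whatever recommender the batch job runs. No HBase client API is shown or implied.

```python
# Stand-in for an HBase table: row key = user id, value = precomputed recommendations.
recommendation_store = {}

def batch_precompute(all_users, recommend_fn, n=10):
    """Offline batch job: compute and store the top-n recommendations for every user."""
    for user in all_users:
        recommendation_store[user] = recommend_fn(user)[:n]

def serve(user, fallback=()):
    """Online path: a single key lookup, no model computation at request time."""
    return recommendation_store.get(user, list(fallback))

# Hypothetical recommender used by the batch job.
def toy_recommend(user):
    return [f"item-{user}-{k}" for k in range(20)]

batch_precompute(["u1", "u2"], toy_recommend, n=3)
print(serve("u1"))  # ['item-u1-0', 'item-u1-1', 'item-u1-2']
```

The design trade-off: responses stay fast because the request path never touches the model, but recommendations are only as fresh as the last batch run.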

Engineering Challenges

Do I need a cloud based infrastructure?

Depends!

Hadoop-compatible machine learning library?

Mahout can help!

How can I represent/transform my input data appropriately?

Pig/Hive might help! If not, MapReduce is always there!
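When Pig or Hive do not fit, the input transformation can be written as MapReduce by hand. A minimal in-process simulation of the map, shuffle and reduce phases, turning raw activity logs into (user, item, preference) triples. The log format `user_id,item_id,action` and the action weights are hypothetical assumptions; on a real cluster the same map and reduce logic would run under Hadoop (for example via Hadoop Streaming), with the framework performing the shuffle.

```python
from collections import defaultdict

# Hypothetical raw log format: "user_id,item_id,action"
raw_logs = [
    "u1,i1,view", "u1,i1,buy", "u2,i1,view",
    "u1,i2,view", "u2,i3,buy", "malformed line",
]

# Hypothetical weighting of implicit actions into a preference score.
ACTION_WEIGHT = {"view": 1, "buy": 5}

def map_phase(line):
    """Map: parse one log line and emit ((user, item), weight); skip bad lines."""
    parts = line.split(",")
    if len(parts) == 3 and parts[2] in ACTION_WEIGHT:
        user, item, action = parts
        yield (user, item), ACTION_WEIGHT[action]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the weights into one (user, item, preference) triple."""
    user, item = key
    return user, item, sum(values)

pairs = (kv for line in raw_logs for kv in map_phase(line))
triples = sorted(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(triples)
# [('u1', 'i1', 6), ('u1', 'i2', 1), ('u2', 'i1', 1), ('u2', 'i3', 5)]
```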

Apache Mahout Overview

Scalable machine learning library

Core algorithms for clustering, classification and batch-based collaborative filtering, implemented on top of Hadoop

A few popular algorithms: k-means, fuzzy k-means, Canopy clustering, LDA, etc.

Vibrant community support.

Used by Adobe, Yahoo!, Amazon, AOL, Flytxt… (the list goes on)

[email protected]






Taking Recommendation Engines to production

Analyzing the input data: what kind of information can I collect from users?

Selecting the appropriate recommender (e.g. user-based, item-based)

Strategy for recommending to anonymous (or first-time) users

Strategy for distributed computing: modeling the problem as MapReduce

Choosing the deployment model

Monitoring the system
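For anonymous or first-time users there is no history to correlate, so a personalized recommender has nothing to work with. One common strategy (an assumption here; the slide does not say which strategy to use) is to fall back to the most popular items. A minimal sketch with a hypothetical interaction log; `personalized` stands for any recommender function that takes a user id.

```python
from collections import Counter

# Hypothetical interaction history: (user, item) pairs.
history = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"),
           ("u3", "i1"), ("u3", "i3"), ("u4", "i2")]

def most_popular(n):
    """Items ranked by how many interactions they received."""
    counts = Counter(item for _, item in history)
    return [item for item, _ in counts.most_common(n)]

def recommend_for(user, personalized, n=2):
    """Personalized results for known users; popularity fallback for anonymous ones."""
    known_users = {u for u, _ in history}
    if user in known_users:
        return personalized(user)[:n]
    return most_popular(n)

print(recommend_for("new_visitor", personalized=lambda u: []))  # ['i1', 'i2']
```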

Conclusion

Very popular field of research and implementation

More and more products and services are leveraging the concept

From fancy research to live production systems at scale

Making people's lives easier by assisting them in making decisions

Some more concepts…

Concept of similarity: distance measures, etc.

Pearson Correlation

User neighborhood computation
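For reference, one common textbook formulation of these concepts, written with r_ui for the rating of user u on item i (as in the utility matrix). Assumptions: I_uv is the set of items rated by both u and v, I_v the items rated by v, and the means are taken over I_uv (some variants use each user's full rating history instead). The Euclidean-based similarity 1/(1 + d) is one common way to turn a distance into a similarity, not the only one.

```latex
% Pearson correlation over co-rated items
\mathrm{sim}_{P}(u,v) =
  \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2}\,
        \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}}

% Similarity derived from Euclidean distance
\mathrm{sim}_{E}(u,v) = \frac{1}{1 + \sqrt{\sum_{i \in I_{uv}} (r_{ui} - r_{vi})^2}}

% User neighborhood: the k most similar users
N_k(u) = \text{the } k \text{ users } v \neq u \text{ with the largest } \mathrm{sim}(u,v)

% Predicted rating for an unrated item i
\hat{r}_{ui} =
  \frac{\sum_{v \in N_k(u),\, i \in I_v} \mathrm{sim}(u,v)\, r_{vi}}
       {\sum_{v \in N_k(u),\, i \in I_v} \mathrm{sim}(u,v)}
```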


THANK YOU

Contact: [email protected]

http://www.flytxt.com/community/


