Recommender Systems

♣ Application areas

In the Social Web

Even more …
♣ Personalized search
♣ "Computational advertising"
About the speaker
♣ Ramzi Alqrainy - a well-recognized expert in the Artificial Intelligence and Information Retrieval fields in the Middle East; an active researcher and technology blogger with a focus on information retrieval.
Agenda
♣ What are recommender systems for? - Introduction
♣ How do they work (Part I)? - Collaborative Filtering
♣ How to measure their success? - Evaluation techniques
♣ How do they work (Part II)? - Content-based Filtering, Knowledge-Based Recommendations, Hybridization Strategies
♣ Advanced topics - Explanations, Human decision making
Why use Recommender Systems?
♣ Value for the customer
- Find things that are interesting
- Narrow down the set of choices
- Help me explore the space of options
- Discover new things
- Entertainment
- …
♣ Value for the provider
- Additional and probably unique personalized service for the customer
- Increase trust and customer loyalty
- Increase sales, click-through rates, conversion, etc.
- Opportunities for promotion, persuasion
- Obtain more knowledge about customers
- …
Real-world check
♣ Myths from industry
- Amazon.com generates X percent of their sales through the recommendation lists (30 < X < 70)
- Netflix (DVD rental and movie streaming) generates X percent of their sales through the recommendation lists (30 < X < 70)
♣ There must be some value in it
- See recommendations of groups, jobs, or people on LinkedIn
- Friend recommendation and ad personalization on Facebook
- Song recommendation at last.fm
- News recommendation at Forbes.com (plus 37% CTR)
♣ Academia
- A few studies exist that show the effect (increased sales, changes in sales behavior)
Problem domain
♣ Recommendation systems (RS) help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, persuasion, …)

"RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online." [Xiao & Benbasat, MISQ, 2007]

♣ Different system designs / paradigms
- Based on availability of exploitable data
- Implicit and explicit user feedback
- Domain characteristics
Recommender systems
♣ RS seen as a function [AT05]
♣ Given:
- User model (e.g., ratings, preferences, demographics, situational context)
- Items (with or without a description of item characteristics)
♣ Find:
- A relevance score, used for ranking
♣ Finally:
- Recommend items that are assumed to be relevant
♣ But:
- Remember that relevance might be context-dependent
- Characteristics of the list itself might be important (diversity)
Paradigms of recommender systems
♣ Recommender systems reduce information overload by estimating relevance
♣ Personalized recommendations
♣ Collaborative: "Tell me what's popular among my peers"
♣ Content-based: "Show me more of the same that I've liked"
♣ Knowledge-based: "Tell me what fits based on my needs"
♣ Hybrid: combinations of various inputs and/or composition of different mechanisms
Recommender systems: basic techniques

Collaborative
- Pros: no knowledge-engineering effort, serendipity of results, learns market segments
- Cons: requires some form of rating feedback, cold start for new users and new items

Content-based
- Pros: no community required, comparison between items possible
- Cons: content descriptions necessary, cold start for new users, no surprises

Knowledge-based
- Pros: deterministic recommendations, assured quality, no cold start, can resemble a sales dialogue
- Cons: knowledge engineering effort to bootstrap, basically static, does not react to short-term trends
Collaborative Filtering (CF)
♣ The most prominent approach to generate recommendations
- Used by large, commercial e-commerce sites
- Well understood; various algorithms and variations exist
- Applicable in many domains (books, movies, DVDs, …)
♣ Approach
- Use the "wisdom of the crowd" to recommend items
♣ Basic assumption and idea
- Users give ratings to catalog items (implicitly or explicitly)
- Customers who had similar tastes in the past will have similar tastes in the future
User-based nearest-neighbor collaborative filtering (1)
♣ The basic technique:
- Given an "active user" (Alice) and an item i not yet seen by Alice
- The goal is to estimate Alice's rating for this item, e.g., by
  ♣ finding a set of users (peers) who liked the same items as Alice in the past and who have rated item i
  ♣ using, e.g., the average of their ratings to predict whether Alice will like item i
  ♣ doing this for all items Alice has not seen and recommending the best-rated

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
User-based nearest-neighbor collaborative filtering (2)
♣ Some first questions
- How do we measure similarity?
- How many neighbors should we consider?
- How do we generate a prediction from the neighbors' ratings?

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
Measuring user similarity
♣ A popular similarity measure in user-based CF: Pearson correlation

  sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

  where a, b are users; r_{a,p} is the rating of user a for item p; P is the set of items rated by both a and b; and \bar{r}_a, \bar{r}_b are the users' average ratings. Possible similarity values range between -1 and 1.

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3     sim = 0.85
User2      4      3      4      3      5     sim = 0.70
User3      3      3      1      5      4
User4      1      5      5      2      1     sim = -0.79
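The similarity values on this slide can be reproduced with a short sketch (the rating matrix is the one from the slide; function and variable names are my own):

```python
from math import sqrt

# None marks an item the user has not rated
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(a, b):
    # P: indices of items rated by both users
    common = [i for i, (x, y) in enumerate(zip(ratings[a], ratings[b]))
              if x is not None and y is not None]
    # mean ratings over the co-rated items
    ra = sum(ratings[a][i] for i in common) / len(common)
    rb = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - ra) * (ratings[b][i] - rb) for i in common)
    den = sqrt(sum((ratings[a][i] - ra) ** 2 for i in common)) * \
          sqrt(sum((ratings[b][i] - rb) ** 2 for i in common))
    return num / den

print(round(pearson("Alice", "User1"), 2))  # 0.85
print(round(pearson("Alice", "User2"), 2))  # 0.71 (0.70 on the slide after rounding)
print(round(pearson("Alice", "User4"), 2))  # -0.79
```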
Pearson correlation
♣ Takes differences in rating behavior into account

[Chart: ratings of Alice, User1, and User4 on Item1–Item4, illustrating their different rating scales]

♣ Works well in usual domains, compared with alternative measures such as cosine similarity
Making predictions
♣ A common prediction function:

  pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} |sim(a, b)|}

♣ Calculate whether the neighbors' ratings for the unseen item p are higher or lower than their average
♣ Combine the rating differences, using the similarity as a weight
♣ Add/subtract the neighbors' bias from the active user's average and use this as a prediction
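The prediction function above can be sketched as follows (rating matrix from the earlier slides; the neighbor set N here is just User1 and User2, chosen for illustration):

```python
from math import sqrt

ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
}

def pearson(a, b):
    # Pearson correlation over co-rated items (as on the similarity slide)
    common = [i for i, (x, y) in enumerate(zip(ratings[a], ratings[b]))
              if x is not None and y is not None]
    ra = sum(ratings[a][i] for i in common) / len(common)
    rb = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - ra) * (ratings[b][i] - rb) for i in common)
    den = sqrt(sum((ratings[a][i] - ra) ** 2 for i in common)) * \
          sqrt(sum((ratings[b][i] - rb) ** 2 for i in common))
    return num / den

def predict(active, item, neighbors):
    rated = [r for r in ratings[active] if r is not None]
    r_a = sum(rated) / len(rated)                 # active user's average rating
    num = den = 0.0
    for b in neighbors:                           # neighbors must have rated `item`
        sim = pearson(active, b)
        r_b = sum(ratings[b]) / len(ratings[b])   # neighbor's average rating
        num += sim * (ratings[b][item] - r_b)     # similarity-weighted deviation
        den += abs(sim)
    return r_a + num / den

print(round(predict("Alice", 4, ["User1", "User2"]), 2))  # 4.87
```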
Making recommendations
♣ Making predictions is typically not the ultimate goal
♣ Usual approach (in academia)
- Rank items based on their predicted ratings
♣ However
- This might lead to the inclusion of (only) niche items
- In practice: also take item popularity into account
♣ Approaches
- "Learning to rank": optimize according to a given rank evaluation metric (see later)
Improving the metrics / prediction function
♣ Not all neighbor ratings might be equally "valuable"
- Agreement on commonly liked items is not as informative as agreement on controversial items
- Possible solution: give more weight to items that have a higher variance
♣ Value of the number of co-rated items
- Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
♣ Case amplification
- Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1
♣ Neighborhood selection
- Use a similarity threshold or a fixed number of neighbors
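The two weighting adjustments can be sketched in a few lines (the threshold of 50 co-rated items and the amplification exponent 2.5 are common illustrative choices, not values fixed by the slides):

```python
def significance_weighted(sim, n_corated, threshold=50):
    # Linearly damp the similarity when few items were co-rated.
    return sim * min(n_corated, threshold) / threshold

def case_amplified(sim, rho=2.5):
    # Emphasize neighbors whose similarity is close to 1; keep the sign.
    return abs(sim) ** rho * (1 if sim >= 0 else -1)

# With only 4 co-rated items, a similarity of 0.85 is damped to 0.068
print(significance_weighted(0.85, 4))
# Amplification shrinks moderate similarities, leaves sim = 1 untouched
print(case_amplified(0.85), case_amplified(1.0))
```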
Memory-based and model-based approaches
♣ User-based CF is said to be "memory-based"
- The rating matrix is directly used to find neighbors / make predictions
- Does not scale for most real-world scenarios
- Large e-commerce sites have tens of millions of customers and millions of items
♣ Model-based approaches
- Based on an offline pre-processing or "model-learning" phase
- At run-time, only the learned model is used to make predictions
- Models are updated / re-trained periodically
- Large variety of techniques used
- Model-building and updating can be computationally expensive
Item-based collaborative filtering
♣ Basic idea:
- Use the similarity between items (and not users) to make predictions
♣ Example:
- Look for items that are similar to Item5
- Take Alice's ratings for these items to predict the rating for Item5

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
The cosine similarity measure
♣ Produces better results in item-to-item filtering
- For some datasets; no consistent picture in the literature
♣ Ratings are seen as vectors in n-dimensional space
♣ Similarity is calculated based on the angle between the vectors
♣ Adjusted cosine similarity
- Takes average user ratings into account; transforms the original ratings
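Adjusted cosine similarity between two items can be sketched as follows (same rating matrix as before; Alice is left out because she has not rated Item5, and each user's mean over all their ratings is subtracted before the cosine is taken):

```python
from math import sqrt

# Users who have rated both items of interest
ratings = {
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def adjusted_cosine(i, j):
    num = den_i = den_j = 0.0
    for user, r in ratings.items():
        mean = sum(r) / len(r)          # subtract the user's average rating
        di, dj = r[i] - mean, r[j] - mean
        num += di * dj
        den_i += di ** 2
        den_j += dj ** 2
    return num / (sqrt(den_i) * sqrt(den_j))

# Similarity of Item5 (index 4) to Item1 (index 0)
print(round(adjusted_cosine(4, 0), 2))  # 0.8
```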
Pre-processing for item-based filtering
♣ Item-based filtering does not solve the scalability problem by itself
♣ Pre-processing approach by Amazon.com (in 2003)
- Calculate all pairwise item similarities in advance
- The neighborhood to be used at run-time is typically rather small, because only items the user has rated are taken into account
- Item similarities are supposed to be more stable than user similarities
♣ Memory requirements
- Up to N² pairwise similarities to be memorized (N = number of items) in theory
- In practice, this is significantly lower (items with no co-ratings)
- Further reductions possible
  ♣ Minimum threshold for co-ratings (items that are rated by at least n users)
  ♣ Limit the size of the neighborhood (might affect recommendation accuracy)
More on ratings
♣ Pure CF-based systems rely only on the rating matrix
♣ Explicit ratings
- Most commonly used (1-to-5 or 1-to-7 Likert response scales)
- Research topics
  ♣ "Optimal" granularity of scale; indication that a 10-point scale is better accepted in the movie domain
  ♣ Multidimensional ratings (multiple ratings per movie)
- Challenge
  ♣ Users are not always willing to rate many items; sparse rating matrices
  ♣ How to stimulate users to rate more items?
♣ Implicit ratings
- Clicks, page views, time spent on some page, demo downloads, …
- Can be used in addition to explicit ones; question of correctness of interpretation
Data sparsity problems
♣ Cold start problem
- How to recommend new items? What to recommend to new users?
♣ Straightforward approaches
- Ask/force users to rate a set of items
- Use another method (e.g., content-based, demographic, or simply non-personalized) in the initial phase
♣ Alternatives
- Use better algorithms (beyond nearest-neighbor approaches)
- Example:
  ♣ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
  ♣ Assume "transitivity" of neighborhoods
Example algorithms for sparse datasets
♣ Recursive CF
- Assume there is a very close neighbor n of u who, however, has not rated the target item i yet
- Idea:
  ♣ Apply the CF method recursively and predict a rating for item i for the neighbor
  ♣ Use this predicted rating instead of the rating of a more distant direct neighbor

         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      ?     sim = 0.85 — predict this rating for User1 first
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
Graph-based methods
♣ "Spreading activation" (sketch)
- Idea: use paths of lengths > 3 to recommend items
- Length 3: recommend Item3 to User1
- Length 5: Item1 also recommendable
More model-based approaches
♣ A plethora of different techniques proposed in recent years, e.g.:
- Matrix factorization techniques, statistics
  ♣ singular value decomposition, principal component analysis
- Association rule mining
  ♣ compare: shopping basket analysis
- Probabilistic models
  ♣ clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
- Various other machine learning approaches
♣ Costs of pre-processing
- Usually not discussed
- Incremental updates possible?
Matrix factorization
♣ SVD: M_k = U_k × Σ_k × V_k^T

U_k        Dim1    Dim2
Alice      0.47   -0.30
Bob       -0.44    0.23
Mary       0.70   -0.06
Sue        0.31    0.93

V_k^T (one column per item)
Dim1      -0.44  -0.57   0.06   0.38   0.57
Dim2       0.58  -0.66   0.26   0.18  -0.36

Σ_k        Dim1   Dim2
Dim1       5.63   0
Dim2       0      3.23

♣ Prediction: r̂_ui = r̄_u + U_k(Alice) × Σ_k × V_k^T(EPL) = 3 + 0.84 = 3.84
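A rank-k SVD prediction of this kind can be sketched with NumPy. The rating matrix below is hypothetical (the slide does not show the original matrix M for Alice, Bob, Mary, and Sue); only the mechanics match the slide: factorize the mean-centered matrix, truncate to k dimensions, and add the user mean back for a prediction.

```python
import numpy as np

# Hypothetical user x item rating matrix (rows: users, cols: items)
M = np.array([
    [5.0, 3.0, 4.0, 4.0, 3.0],
    [3.0, 1.0, 2.0, 3.0, 3.0],
    [4.0, 3.0, 4.0, 3.0, 5.0],
    [1.0, 5.0, 5.0, 2.0, 1.0],
])

user_means = M.mean(axis=1, keepdims=True)
# SVD of the mean-centered matrix: M - means = U x diag(s) x Vt
U, s, Vt = np.linalg.svd(M - user_means, full_matrices=False)

k = 2                                   # keep the two strongest latent dimensions
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

def predict(u, i):
    # r̂_ui = r̄_u + U_k(u) x Σ_k x V_k^T(i)
    return float(user_means[u, 0] + Uk[u] @ Sk @ Vtk[:, i])

print(round(predict(0, 4), 2))
```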
Association rule mining
♣ Commonly used for shopping behavior analysis
- Aims at the detection of rules such as "If a customer purchases baby food, then he also buys diapers in 70% of the cases"
♣ Association rule mining algorithms
- Can detect rules of the form X => Y (e.g., baby food => diapers) from a set of sales transactions D = {t1, t2, …, tn}
- Measures of quality: support, confidence
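Support and confidence for a candidate rule X => Y can be computed directly from the transaction set (the toy transactions below are made up for illustration):

```python
# Toy transaction database D; each transaction is a set of purchased items.
D = [
    {"baby-food", "diapers", "milk"},
    {"baby-food", "diapers"},
    {"baby-food", "beer"},
    {"diapers", "beer"},
]

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in D) / len(D)

def confidence(x, y):
    # Of the transactions containing X, the fraction that also contain Y.
    return support(x | y) / support(x)

print(support({"baby-food", "diapers"}))       # 0.5
print(confidence({"baby-food"}, {"diapers"}))  # 2 of 3 baby-food baskets
```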
Probabilistic methods
♣ Basic idea (simplistic version for illustration):
- Given the user/item rating matrix
- Determine the probability that user Alice will like an item i
- Base the recommendation on these probabilities
♣ Calculation of rating probabilities based on Bayes' theorem
- How probable is rating value "1" for Item5, given Alice's previous ratings?
- Corresponds to the conditional probability P(Item5=1 | X), where
  ♣ X = Alice's previous ratings = (Item1=1, Item2=3, Item3= …)
- Can be estimated based on Bayes' theorem
♣ Usually more sophisticated methods are used
- Clustering
- pLSA, …
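The simplistic Bayes estimate can be sketched in code. The tiny training matrix and the naive conditional-independence assumption are illustrative only; real systems add smoothing and use richer models:

```python
# Each row: one user's ratings for (Item1, Item2, Item5).
# We estimate P(Item5 = c | X) up to normalization via Bayes' theorem.
train = [
    (1, 3, 1),
    (1, 3, 1),
    (2, 3, 2),
    (1, 2, 1),
]

def p_class(c):
    # Prior P(Item5 = c)
    return sum(r[2] == c for r in train) / len(train)

def p_feature(idx, value, c):
    # Likelihood P(Item_idx = value | Item5 = c), naive counting (no smoothing)
    in_class = [r for r in train if r[2] == c]
    return sum(r[idx] == value for r in in_class) / len(in_class)

def posterior(c, x):
    # Unnormalized P(Item5 = c | X), assuming conditionally independent items
    p = p_class(c)
    for idx, value in enumerate(x):
        p *= p_feature(idx, value, c)
    return p

x = (1, 3)  # the active user rated Item1 = 1, Item2 = 3
print(posterior(1, x), posterior(2, x))
```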
Summarizing recent methods
♣ Recommendation is concerned with learning from noisy observations (x, y): a function f(x) = ŷ has to be determined such that Σ (ŷ - y)² is minimal
♣ A variety of different learning strategies have been applied to estimate f(x)
- Non-parametric neighborhood models
- MF models, SVMs, Neural Networks, Bayesian Networks, …
Collaborative Filtering Issues
♣ Pros:
- Well understood, works well in some domains, no knowledge engineering required
♣ Cons:
- Requires a user community, sparsity problems, no integration of other knowledge sources, no explanation of results
♣ What is the best CF method?
- In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)
♣ How to evaluate the prediction quality?
- MAE / RMSE: What does an MAE of 0.7 actually mean?
- Serendipity: not yet fully understood
♣ What about multi-dimensional ratings?
Recommender Systems in e-Commerce
♣ One Recommender Systems research question
- What should be in that list?
♣ Another question, both in research and practice
- How do we know that these are good recommendations?
♣ This might lead to …
- What is a good recommendation?
- What is a good recommendation strategy?
- What is a good recommendation strategy for my business?

"These have been in stock for quite a while now …"
What is a good recommendation? What are the measures in practice?
♣ Total sales numbers
♣ Promotion of certain items
♣ Click-through rates
♣ Interactivity on the platform
♣ Customer return rates
♣ Customer satisfaction and loyalty
♣ …

You have questions and we have answers.