Behavioral Data Mining to Produce Novel and Serendipitous Friend Recommendations in a Social Bookmarking System
Barcelona, October 21st, 2015
Ludovico Boratto, Salvatore Carta, Matteo Manca
Introduction [email protected]
Social Media Systems (SMS)
Internet based applications:
• public or semi-public profile;
• a list of other users with whom they share a connection (Social Network);
• view and traverse their list of connections and those made by others within the system;
user-centered design
Social Bookmarking Systems
• Allow users to use keywords (tags) to describe web pages that are of interest for them
• Help to organize and share the resources with other users in the network
• In the example, 177 Delicious users saved this bookmark after it was shared by this user
[Figure: a bookmark links a user, a web page, and a set of tags]
Social Media Systems
user-centered design
Social Interaction Overload problem
http://www.personalizemedia.com/garys-social-media-count/
Social Interaction Overload
Solution: Social Recommender Systems (information filtering in social media systems)
• Graph Analysis
• Content Mining
Social User Recommender Systems classification
State of the art
• Systems based on the analysis of social graphs (example: “People you may know” on Facebook)
• Systems that analyze the interactions of the users with the content of the system (example: TF-IDF) [Chen et al., 2009]
• Hybrid systems [Hannon et al., 2010]
SRS - Information Filtering limitations
Graph Analysis
• Scalability issues;
• Memory limitations (few features can be exploited by graph algorithms).
Content Mining
• Complex algorithms in this specific domain (for example, TF-IDF vectors);
• Problems updating the user preferences in order to produce up-to-date recommendations.
SRS - Information Filtering limitations
Serendipity Problem: Recommended items too similar to those already considered.
[Figure: items already considered by the user and the recommended item]
SRS - Information Filtering limitations
• Serendipity problem;
• Scalability issues;
• Memory limitations (few features can be exploited by graph algorithms);
• Complex algorithms (for example, TF-IDF vectors);
• Problems updating the user preferences in order to produce recommendations.
Mining of the user behavior
[Figure: example of user behavior, with resources R1 and R2 tagged with tag1 and tag2, and resources R3 and R4 liked]
Users with similar interests use similar tags and save the same bookmarks
Note how users with similar interests have similar tagging behaviors
Mine user behavior and exploit these phenomena on a large scale, to connect similar users
Intuition
Example: 5 of the 363 users who saved the previous bookmark also saved the following one
Our proposal
• Friend recommender system for the social bookmarking domain
• Main features:
  – Use of a limited amount of information
    • Only the bookmarks tagged by a user
    • No social graph
  – Mining done with algorithms specifically designed to operate in a social context
• Outcome:
  – Reduced computational cost
  – Ability to update the interests of the users quickly and frequently
  – Improved recommendation accuracy
Algorithm [email protected]
Algorithm
① Tag-based user profiling;
② Resource-based user profiling;
③ Tag-based similarity computation;
④ User interest (ui) computation;
⑤ Recommendations selection.
Tag based user profile
• Each user is represented by a vector (v_u1, v_u2, v_u3, ..., v_un)
• v_ui is the relative frequency of each tag t_i ∈ T (T = set of tags in the SBS)
[Figure: tags used by an example user (apple, iPhone, tablet, iPad), with t_j = apple and v_uj =25]
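A minimal sketch of the tag-based profile described on this slide. It assumes that "relative frequency" means the number of times the user applied a tag divided by the user's total number of tag assignments; the function name `tag_profile` and the sample data are hypothetical.

```python
from collections import Counter

def tag_profile(tag_assignments, vocabulary):
    """Relative frequency of each tag in `vocabulary` for one user.

    Assumption: frequency = uses of the tag / all tag assignments of the user.
    """
    counts = Counter(tag_assignments)
    total = sum(counts.values())
    if total == 0:
        return [0.0 for _ in vocabulary]
    return [counts[tag] / total for tag in vocabulary]

# Hypothetical data: T is the tag vocabulary, user_tags the user's assignments.
vocabulary = ["apple", "iPad", "iPhone", "tablet"]
user_tags = ["apple", "iPhone", "apple", "tablet", "apple", "apple", "iPad", "tablet"]
profile = tag_profile(user_tags, vocabulary)
print(dict(zip(vocabulary, profile)))
```

Because the profile only needs per-user tag counts, it can be refreshed incrementally whenever a user adds a bookmark.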
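Resource based user profile
Each user is represented by a binary vector (v_u1, v_u2, v_u3, ..., v_un):
\[ v_{uj} = \begin{cases} 1 & \text{if the resource } r_j \text{ was bookmarked by } u \\ 0 & \text{otherwise} \end{cases} \]
[Figure: resource r1 (tagged "iPhone", "apple") gives v_u1 = 1; resource r2 gives v_u2 = 0]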
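The binary profile can be sketched in a few lines. The function name `resource_profile` and the resource identifiers are hypothetical; the example mirrors the slide's case, where the user bookmarked r1 but not r2.

```python
def resource_profile(bookmarked, all_resources):
    """Binary vector: 1 if the user bookmarked the resource, 0 otherwise."""
    bookmarked = set(bookmarked)
    return [1 if resource in bookmarked else 0 for resource in all_resources]

# Hypothetical data matching the slide's figure.
all_resources = ["r1", "r2"]
binary_profile = resource_profile({"r1"}, all_resources)
print(binary_profile)
```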
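User Similarity Computation
Step 3 – Tag-based similarity computation
The tag-based user profiles of user u (v_u1, v_u2, v_u3, v_u4, …, v_un) and user m (v_m1, v_m2, v_m3, v_m4, …, v_mn) are compared with Pearson's correlation:
\[ ts(u,m) = \frac{\sum_{i \in T_{um}} (v_{ui} - \bar{v}_u)(v_{mi} - \bar{v}_m)}{\sqrt{\sum_{i \in T_{um}} (v_{ui} - \bar{v}_u)^2}\,\sqrt{\sum_{i \in T_{um}} (v_{mi} - \bar{v}_m)^2}} \]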
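A rough sketch of the Pearson-based tag similarity named on this slide. Two points are assumptions rather than statements from the slides: T_um is taken to be the set of tags used by both users, and each user's average is computed over that user's own profile, so it can be precomputed per user. Profiles are hypothetical sparse dictionaries mapping a tag to its relative frequency.

```python
from math import sqrt

def tag_similarity(profile_u, profile_m):
    """Pearson-style correlation restricted to the tags both users used."""
    common = sorted(set(profile_u) & set(profile_m))  # assumed meaning of T_um
    if not common:
        return 0.0
    # Per-user averages depend on one profile only, so they can be cached.
    mean_u = sum(profile_u.values()) / len(profile_u)
    mean_m = sum(profile_m.values()) / len(profile_m)
    numerator = sum((profile_u[t] - mean_u) * (profile_m[t] - mean_m) for t in common)
    norm_u = sqrt(sum((profile_u[t] - mean_u) ** 2 for t in common))
    norm_m = sqrt(sum((profile_m[t] - mean_m) ** 2 for t in common))
    if norm_u == 0 or norm_m == 0:
        return 0.0  # correlation is undefined; treat as no similarity
    return numerator / (norm_u * norm_m)

# Hypothetical profiles.
user_u = {"apple": 0.5, "tablet": 0.3, "wine": 0.2}
user_m = {"apple": 0.6, "tablet": 0.3, "car": 0.1}
similar = tag_similarity(user_u, user_m)
identical = tag_similarity(user_u, user_u)
disjoint = tag_similarity(user_u, {"car": 1.0})
print(similar, identical, disjoint)
```

This brute-force version compares one pair at a time; it does not implement the support-based upper bound of [Xiong et al., KDD04].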
• The average value has a strong influence (small changes do not influence the coefficient), so it can be computed offline
• An efficient algorithm that exploits a support-based upper bound has been developed [Xiong et al., KDD04]
User interest computation
Step 4 – Resource-based similarity computation
User interest (ui) towards another user: percentage of common bookmarks
ui = % common bookmarks, so ui(u,m) ≠ ui(m,u)
Example: user u saved b1, b2, b3, b4; user m saved b1, b2
• ui(u,m) = (2/4) * 100 = 50%
• ui(m,u) = (2/2) * 100 = 100%
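The asymmetry of the user interest can be checked with a short sketch. The function name `user_interest` is hypothetical; the bookmark sets reproduce the slide's example.

```python
def user_interest(bookmarks_u, bookmarks_m):
    """Percentage of u's bookmarks that m also saved (not symmetric)."""
    if not bookmarks_u:
        return 0.0
    return len(bookmarks_u & bookmarks_m) / len(bookmarks_u) * 100

bookmarks_u = {"b1", "b2", "b3", "b4"}
bookmarks_m = {"b1", "b2"}
ui_u_m = user_interest(bookmarks_u, bookmarks_m)
ui_m_u = user_interest(bookmarks_m, bookmarks_u)
print(ui_u_m, ui_m_u)  # 50.0 100.0
```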
Step 5 – Recommendations selection
Similarities between each pair of users:
User  User  ts    ui1   ui2
u1    u2    ts12  ui12  ui21
u1    …     …     …     …
u1    un    ts1n  ui1n  uin1
u2    u3    ts23  ui23  ui32
…     …     …     …     …
CS(u_i) = set of users to recommend to u_i:
every u_j | ts(u_i, u_j) > α AND (ui(u_i, u_j) > β OR ui(u_j, u_i) > β)
• Combine the previously computed similarities in a quick and efficient way
  – Are the similarities above a threshold?
  – Yes → rank the users per similarity and recommend
Experimental Framework [email protected]
Dataset
• Delicious dataset distributed for the HetRec 2011 workshop
  – 1867 users
  – 69226 URLs
  – 7668 bi-directional user relations
  – 53388 tags
  – 437593 tag assignments [user, tag, URL]
  – 104799 bookmarks [user, URL]
• Preprocessing: removed users that used fewer than 5 tags and fewer than 5 URLs.
Behavioral analysis
• the behavior of two users in a social bookmarking system is related both by the use of the tags and by the use of the resources;
• the use of tags represents a stronger form of connection with respect to the amount of common resources between two users.
[Figure: tags and resources of two example users; several tags are shared (apple, mac, screen), while only one resource (R1) is shared]
Experiments
• Evaluation of the accuracy
  – Comparison with a state-of-the-art approach
  – How many recommendations are correct?
• Evaluation of the percentage of satisfied users
• Evaluation of the novelty and serendipity
  – How many resources of the recommended users are novel or serendipitous?
Experiments: evaluation of the accuracy
• Precision: ratio of correct recommendations among all recommendations
\[ \text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \]
True positive = correct recommendation
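A small sketch of the precision computation. The slides do not spell out when a recommendation counts as correct; this example assumes it is correct when the two users are already connected in the dataset's user relations. The function name and the sample pairs are hypothetical.

```python
def precision(recommendations, friendships):
    """Share of (target, recommended) pairs that match an existing relation.

    Assumption: a recommendation is a true positive when the pair appears
    among the bi-directional user relations of the dataset.
    """
    friends = {frozenset(pair) for pair in friendships}
    if not recommendations:
        return 0.0
    true_positives = sum(
        1 for target, recommended in recommendations
        if frozenset((target, recommended)) in friends
    )
    return true_positives / len(recommendations)

# Hypothetical recommendations and ground-truth relations.
recommendations = [("u1", "u2"), ("u1", "u3"), ("u2", "u4"), ("u3", "u4")]
friendships = [("u2", "u1"), ("u3", "u4")]
score = precision(recommendations, friendships)
print(score)  # 0.5
```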
Experiments: metrics
Evaluation of the satisfied users
Percentage of satisfied users: subset of users who received a correct recommendation with respect to the set of those who received a recommendation
• % satisfied users = #Y / #X * 100
X = set of users who received a recommendation
Y = subset of users who received a correct recommendation
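The sets X and Y can be derived from a list of recommendations as sketched below. As in any evaluation of this kind, a notion of "correct" is needed; this example assumes a recommendation is correct when the pair is already connected in the dataset. Names and data are hypothetical.

```python
def satisfied_users_percentage(recommendations, friendships):
    """#Y / #X * 100, with X and Y as defined on the slide."""
    friends = {frozenset(pair) for pair in friendships}
    # X: every user who received at least one recommendation.
    received = {target for target, _ in recommendations}
    # Y: users with at least one correct recommendation (assumed ground truth).
    satisfied = {
        target for target, recommended in recommendations
        if frozenset((target, recommended)) in friends
    }
    if not received:
        return 0.0
    return len(satisfied) / len(received) * 100

# Hypothetical data: four users receive recommendations, two get a correct one.
recommendations = [("u1", "u2"), ("u1", "u3"), ("u2", "u4"), ("u3", "u4"), ("u4", "u1")]
friendships = [("u2", "u1"), ("u3", "u4")]
percentage = satisfied_users_percentage(recommendations, friendships)
print(percentage)  # 50.0
```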
Experiments: evaluation of the novelty and serendipity
• Novelty: how many recommended items were unknown to the target user who receives the recommendations
\[ \text{novelty} = \frac{\#\bigcup N(u_t)}{\#\bigcup R(u_t)}, \quad \forall u_t \in Y \]
R(u_i) = resources of the users u_i recommended to u_t
N(u_t) = R(u_i) \ R(u_t)
R(u_t) = resources of the target user u_t
Y is the subset of users for which a correct recommendation was produced
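A sketch of the novelty ratio that follows the slide's formula literally: the numerator counts the union of the novel sets N(u_t), the denominator the union of the target users' own resources R(u_t), both taken over the users in Y. How the unions are taken is an assumption of this reading; the function name and the data are hypothetical.

```python
def novelty(resources, recommended_to, satisfied):
    """Literal reading of the slide: #union of N(u_t) / #union of R(u_t)."""
    novel, own = set(), set()
    for target in satisfied:
        target_resources = resources[target]
        # R(u_i): resources of all users recommended to this target.
        recommended_resources = set().union(
            *(resources[user] for user in recommended_to[target])
        )
        # N(u_t) = R(u_i) \ R(u_t): resources the target has not saved yet.
        novel |= recommended_resources - target_resources
        own |= target_resources
    return len(novel) / len(own) if own else 0.0

# Hypothetical data: one satisfied target user with two recommended users.
resources = {
    "ut": {"r1", "r2", "r3", "r4"},
    "ua": {"r1", "r5"},
    "ub": {"r2", "r6", "r7"},
}
recommended_to = {"ut": ["ua", "ub"]}
novelty_value = novelty(resources, recommended_to, satisfied=["ut"])
print(novelty_value)  # 0.75
```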
Experiments: evaluation of the novelty and serendipity
• Serendipity: how surprising the successful recommendations are
\[ \text{serendipity} = \frac{\#\bigcup B(u_t)}{\#\bigcup R(u_t)}, \quad \forall u_t \in Y \]
R(u_i) = resources of the users u_i recommended to u_t
N(u_t) = R(u_i) \ R(u_t)
R(u_t) = resources of the target user u_t
B(u_t) = serendipitous (highly dissimilar) resources: all resources r_i such that ∀ r_t ∈ R(u_t), sim(r_i, r_t) < 0.5
Y is the subset of users for which a correct recommendation was produced
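The set B(u_t) can be sketched as a filter over candidate resources. The slides do not define sim; this example assumes a Jaccard similarity over the tags attached to each resource, which is purely an illustrative choice. The function names and the tag data are hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Assumed resource similarity: overlap of the two tag sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def serendipitous(candidates, target_resources, resource_tags,
                  sim=jaccard, threshold=0.5):
    """B(u_t): candidates dissimilar (sim < threshold) to every target resource."""
    return {
        candidate for candidate in candidates
        if all(
            sim(resource_tags[candidate], resource_tags[owned]) < threshold
            for owned in target_resources
        )
    }

# Hypothetical tags per resource.
resource_tags = {
    "rt1": {"apple", "mac"},
    "rt2": {"pizza", "pasta"},
    "ra": {"apple", "mac", "tech"},
    "rb": {"car", "water"},
}
surprising = serendipitous({"ra", "rb"}, {"rt1", "rt2"}, resource_tags)
print(surprising)  # {'rb'}
```

Here "ra" shares two of three tags with "rt1" (similarity about 0.67), so only "rb" counts as serendipitous.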
Strategy
Every u_j | ts(u_i, u_j) > α AND (ui(u_i, u_j) > β OR ui(u_j, u_i) > β)
Experiments repeated with different values of α and β:
α = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
β = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
Results [email protected]
Results: evaluation of the accuracy
[Figure: precision values with respect to the user interest ui, with a line for each value of the tag-based similarity ts]
• Higher similarities lead to more accurate recommendations
• When users have similar tagging behaviors (ts > 0.4), precision is almost 1, even when user interest is low (ui > 0.1)
• This is confirmed by the experiment in which we avoid considering the tag-based similarity (ts = 0), where precision drops
Results: evaluation of the recommendations
[Figure: precision values with respect to the user interest ui, with a line for each value of the tag-based similarity ts]
• N.B.: no pair of users has a user interest ui > 0.5
  – Users have almost 50% of bookmarks not saved by any other user
  – Space to produce serendipitous friend recommendations (recommendation of friends with novel and unexpected bookmarks)
Results: evaluation of the recommendations
[Figure: precision values with respect to the tag-based similarity ts, with a line for each value of the user interest ui]
• The line in which user interest is not considered (ui = 0) leads to a low precision
• This means that the interest of a user towards another user has an influence on the accuracy of the recommendations
• Even a small user interest leads to very accurate recommendations
Results: evaluation of the satisfied users
• Divided the precision range into 0.1 intervals
• Selected the value that leads to the highest percentage of satisfied users
• When moving from a 0.53 to a 0.65 precision (0.12 improvement), the number of satisfied users increases by almost 20%
• The same happens in the 0.75–0.8 range
• These results are extremely useful in the design of a system
• Take-home message: with a small improvement in terms of precision you can strongly improve user satisfaction
Results: evaluation of the novelty and serendipity
Novelty and serendipity for each precision interval:
Interval     Precision  Novelty  Serendipity
[0.0–0.1)    0.03       0.96     0.92
[0.1–0.2)    0.12       0.93     0.81
[0.2–0.3)    -          -        -
[0.3–0.4)    0.36       0.90     0.65
[0.4–0.5)    -          -        -
[0.5–0.6)    0.53       0.89     0.54
[0.6–0.7)    0.65       0.83     0.69
[0.7–0.8)    0.75       0.74     0.59
[0.8–0.9)    0.88       0.79     0.61
[0.9–1)      0.97       0.79     0.53
[1]          1.00       0.67     0.47
Conclusions [email protected]
Conclusions and future work
A friend recommender system for the social bookmarking domain to link users with similar interests
• reduced use of the available information
• no use of complex algorithms
Results:
• High precision even using only the tags and the bookmarks used by users
• Users with similar tagging behaviors are good candidates for friend recommendations
• Even a small percentage of shared bookmarks (user interest) leads to very accurate recommendations
Future work:
• Extend the study with other datasets
Bibliography [email protected]
Bibliography
• [Facebook] F. Ratiu, “Facebook: people you may know,” May 2008. [Online]. Available: https://blog.facebook.com/blog.php?post=15610312130
• [Twitter] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “WTF: The who to follow service at Twitter,” in Proceedings of the WWW 2013 Conference, 2013.
• [Chen et al., 2009] J. Chen, W. Geyer, C. Dugan, M. Muller, and I. Guy, “Make new friends, but keep the old: recommending people on social networking sites,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2009.
• [Hannon et al., 2010] J. Hannon, M. Bennett, and B. Smyth, “Recommending Twitter users to follow using content and collaborative filtering approaches,” in Proceedings of the fourth ACM conference on Recommender systems, 2010.
• [Salton et al., 1975] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Commun. ACM, 1975.
• [Xiong et al., KDD04] H. Xiong, S. Shekhar, P. N. Tan, and V. Kumar, “Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, August 2004.
Questions?
Thank you for your attention!
[Published in the Information Systems Frontiers journal (Springer US)]