Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
Charalampos Chelmis, Viktor K Prasanna [email protected]
MSM 2013, Paris, France

Workshop paper from the Modeling Social Media 2013 (MSM 2013) workshop at the Hypertext 2013 conference, presented in Paris, France, on May 1, 2013.

Overview

• Introduction
• Structure of Tripartite Graphs
• Generative Models of Tripartite Graphs
• Social Link Classification Schemes
• Evaluation
• Conclusion

Introduction

• Social Networking is used for
  − Content organization
  − Content sharing
• Multiple media types
• Users' activities
  − Reveal interests and tastes
  − Hidden structure
• Description of Resources
  − Text
  − Tags / Hashtags
• Social Annotation
  − Collective characterization of resources
  − Use of synonyms for similar resources
  − Same keywords for different resources

Research Questions

• How to address issues of synonymy and polysemy?
  − Deal with space size explosion
• How to discover emergent structure in online tagging systems?
  − Hidden topics
• How to capture users’ latent interests?
  − Which subjects is a user mostly interested in?
  − Which users have similar interests?
• How to model the process of social generation of annotations?
  − How to capture the semantics of collaboration
• Why is this useful?
  − Recommend people
  − Recommend tags / resources
  − Clustering
  − …

Structure of Tripartite Graphs

• Set of actors (e.g. users) A = {a_1, ..., a_k}
• Set of concepts (e.g. tags) C = {c_1, ..., c_l}
• Set of resources (e.g. photos) R = {r_1, ..., r_m}
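
The three node sets can be illustrated with a minimal sketch, assuming annotations arrive as (actor, concept, resource) triples. The function name build_tripartite and the toy triples are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def build_tripartite(annotations):
    """Collect the node sets A, C, R and a per-actor view of the hyperedges.

    annotations: iterable of (actor, concept, resource) triples.
    """
    actors, concepts, resources = set(), set(), set()
    by_actor = defaultdict(list)  # actor -> list of (concept, resource)
    for actor, concept, resource in annotations:
        actors.add(actor)
        concepts.add(concept)
        resources.add(resource)
        by_actor[actor].append((concept, resource))
    return actors, concepts, resources, dict(by_actor)


# Hypothetical toy data: two users tagging two artists.
toy_annotations = [
    ("alice", "rock", "artist_1"),
    ("alice", "indie", "artist_2"),
    ("bob", "rock", "artist_2"),
]
A, C, R, by_actor = build_tripartite(toy_annotations)
```

Keeping the raw triples around, rather than only pairwise edges, preserves which tag was applied to which resource by which user.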

Reducing the Tripartite Graph to Bipartite Structures

• The User-Concept Model
  − Users are modeled based on their tag usage
  − φ denotes the matrix of topic distributions
    − multinomial distribution over N concepts
    − T topics being drawn independently
  − θ: the matrix of user-specific mixture weights for these T topics
  − Captures users’ latent interests
  − Ignores resources
  − Ignores the social aspect of tagging
• The User-Resource Model
  − Resources become vocabulary terms
  − Tags are ignored
  − Ignores the social aspect of tagging
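
The two reductions can be sketched as follows, assuming each user is treated as a "document" whose words are either tags (User-Concept view) or resources (User-Resource view). The function name project and the toy triples are hypothetical. Fitting φ and θ to these counts is not shown, since the slides do not specify the inference procedure.

```python
from collections import Counter, defaultdict


def project(annotations, keep):
    """Reduce (user, tag, resource) triples to per-user bag-of-words counts.

    keep="tag" drops resources (User-Concept view);
    keep="resource" drops tags (User-Resource view).
    """
    index = {"tag": 1, "resource": 2}[keep]
    docs = defaultdict(Counter)
    for triple in annotations:
        docs[triple[0]][triple[index]] += 1
    return {user: dict(counts) for user, counts in docs.items()}


# Hypothetical toy data.
toy_annotations = [
    ("alice", "rock", "artist_1"),
    ("alice", "rock", "artist_2"),
    ("alice", "indie", "artist_2"),
    ("bob", "jazz", "artist_3"),
]
user_concept = project(toy_annotations, keep="tag")
user_resource = project(toy_annotations, keep="resource")
```

Either projection discards one node set entirely, which is the information loss the slide lists for each model.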

The User-Resource-Concept Model

• Topic-based representation
• Model both resources & users’ interests
• Multiple users may annotate resource r
• For each tag a user is chosen uniformly at random
• Each user is associated with a distribution over latent topics θ
• A topic is chosen from a distribution over topics specific to that user
• The tag is generated from the chosen topic
  − φ_t: probability distribution of tags for topic t
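
The generative steps listed for this model can be sketched as a sampling routine, assuming θ and φ are already known. The function name sample_tags, the toy distributions, and the fixed seed are hypothetical; this illustrates only the forward process, not how the parameters are learned.

```python
import random


def sample_tags(resource_users, theta, phi, n_tags, seed=0):
    """Sample n_tags annotations for one resource.

    resource_users: users who annotate the resource.
    theta: {user: [P(topic | user), ...]}.
    phi: list of {tag: P(tag | topic)}, one dict per topic.
    Returns a list of (user, topic, tag) triples.
    """
    rng = random.Random(seed)
    topic_ids = list(range(len(phi)))
    samples = []
    for _ in range(n_tags):
        user = rng.choice(resource_users)  # user chosen uniformly at random
        topic = rng.choices(topic_ids, weights=theta[user])[0]  # user-specific topic
        vocab = list(phi[topic])
        weights = [phi[topic][tag] for tag in vocab]
        tag = rng.choices(vocab, weights=weights)[0]  # tag from the chosen topic
        samples.append((user, topic, tag))
    return samples


# Hypothetical parameters: two users, two topics.
toy_theta = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
toy_phi = [{"rock": 0.7, "indie": 0.3}, {"jazz": 0.6, "blues": 0.4}]
toy_samples = sample_tags(["alice", "bob"], toy_theta, toy_phi, n_tags=5)
```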

Recommendation

• Tag Recommendation
  − Automatic annotation enhancement
  − Search improvement
• Clustering
  − Community detection
  − Organization of resources/tags in categories
• Navigation and Visualization
  − Social browsing
• Next we focus on recommending people

Social Link Recommendation Using Latent Semantics & Network Structure

• Classification Based on Latent Interests
  − Measure “tastes” distance with respect to latent topics distribution
  − Pointwise squared distance between feature vectors of users u and v:
    $F(u,v) = [(\Theta(1,u) - \Theta(1,v))^2, \ldots, (\Theta(k,u) - \Theta(k,v))^2]$
    − F(u,v) = 0 => u, v have identical distributions
    − F(u,v) > 0 => distributions diverge
  − Other measures to consider
    − Kullback-Leibler (KL) divergence
    − Cosine similarity
• Objective: Minimize the distance between linked users
• Focus on topical homophily
  − Ignore network effects
• Prior work uses network proximity as indicator of link formation
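
The pointwise squared distance and the two alternative measures can be sketched as below, assuming each user's topic distribution is a plain list of probabilities. The function names and the eps smoothing constant in the KL divergence are assumptions made for illustration.

```python
import math


def squared_diff_features(theta_u, theta_v):
    """Feature vector F(u, v): one squared difference per latent topic."""
    return [(a - b) ** 2 for a, b in zip(theta_u, theta_v)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps (an assumed constant) guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical topic distributions for two users over two topics.
theta_u = [0.8, 0.2]
theta_v = [0.2, 0.8]
features = squared_diff_features(theta_u, theta_v)
```

Note that KL divergence is asymmetric, so a symmetric variant may be preferable when the pair (u, v) is unordered.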

Social Link Recommendation Using Latent Semantics & Network Structure

• Latent Topics & Local Structure
  − CN(u,v) = common neighbors between users u and v
    − Simplicity and computational efficiency
  − $F(u,v) = [\sigma(u,v), CN(u,v)]$
  − Latent topics similarity:
    $\sigma(u,v) = \frac{\sum_t \Theta(t,u)\,\Theta(t,v)}{\sqrt{\sum_t \Theta(t,u)^2}\,\sqrt{\sum_t \Theta(t,v)^2}}$
• Latent Topics & Global Structure
  − SD(u,v) = shortest distance between users u and v
  − $F(u,v) = [\sigma(u,v), SD(u,v)]$
• Non separable training set => inefficient classifiers
• Aggregation Strategy
  − Reduce the number of training samples
  − Produce more efficient classifiers
  − Average latent similarity of user pairs with k common neighbors, where k_p is the number of common neighbors of pair p:
    $\sigma_{avg}(k) = \frac{1}{|\{p : k_p = k\}|} \sum_{p : k_p = k} \sigma(p)$
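
The structural features and the aggregation strategy can be sketched as below. This assumes an undirected friendship graph stored as a dict of neighbor sets, which is a simplification of the directed relations in the dataset. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def common_neighbors(adj, u, v):
    """CN(u, v): number of neighbors shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def shortest_distance(adj, u, v):
    """SD(u, v): hop count found by breadth-first search, or None if unreachable."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == v:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def aggregate_by_common_neighbors(samples):
    """Average sigma over all pairs that share the same number of common neighbors.

    samples: iterable of (sigma, k) pairs, one per user pair.
    Returns {k: sigma_avg(k)}, i.e. one training point per distinct k.
    """
    buckets = defaultdict(list)
    for sigma, k in samples:
        buckets[k].append(sigma)
    return {k: sum(values) / len(values) for k, values in buckets.items()}


# Hypothetical undirected toy graph.
toy_adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
toy_avg = aggregate_by_common_neighbors([(1.0, 1), (0.5, 1), (0.25, 0)])
```

The aggregation collapses many user pairs into one point per value of k, which is how the number of training samples shrinks.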

Experimental Analysis

• Objectives
  − Ability to uncover subliminal collective knowledge
  − Evaluate performance of “people” recommendation
• Setting
  − 2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
• Real-world Dataset
  − Last.fm online music system
    − social relationships
    − tagging information
    − music artist listening information
  − Statistics
    − 1,892 users
    − 25,434 directed user friend relations
    − 17,632 artists (UR model vocabulary size)
    − 92,834 user-listened-artist relations
    − 11,946 unique tags (UC and URC vocabulary size)
    − 186,479 annotations (tuples <user, tag, artist>)

Sample Topics

[Figure: sample topics]

Predictive Power

• Evaluate ability to predict tags/resources on new users
  − Perplexity
• Split dataset into two disjoint sets
  − 90% for training
• Lower perplexity indicates better generalization
• URC better overall
  − Exploits more information
• UC
  − Organizes tags in “clusters”
• UR
  − Inferior quality due to noise
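
Perplexity on held-out data can be sketched as below, using the standard definition exp(−(1/N) Σ log p(w | u)) with p(w | u) = Σ_t θ(u,t) φ(t,w). The sketch assumes θ for the held-out users has already been inferred, which the slides do not detail. The function name and the floor constant are hypothetical.

```python
import math


def perplexity(test_docs, theta, phi, floor=1e-12):
    """Perplexity of held-out tokens under a topic model.

    test_docs: {user: [token, ...]} held-out tags or resources.
    theta: {user: [P(topic | user), ...]}.
    phi: list of {token: P(token | topic)}, one dict per topic.
    floor: assumed lower bound that avoids log(0) for unseen tokens.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for user, tokens in test_docs.items():
        for token in tokens:
            p = sum(theta[user][t] * phi[t].get(token, 0.0) for t in range(len(phi)))
            log_likelihood += math.log(max(p, floor))
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


# Hypothetical single-topic model that is uniform over four tags.
toy_phi = [{"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}]
toy_theta = {"u": [1.0]}
toy_perplexity = perplexity({"u": ["a", "b", "c", "d"]}, toy_theta, toy_phi)
```

A model that is uniform over V tokens has perplexity V, so lower values indicate that the model concentrates probability on the tokens that actually occur.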

Recommendation of Social Ties

• Split dataset into two disjoint sets
  − 10%, 25%, 50%, 75% for training, rest for testing
• Evaluation process
  − Randomly sample 12,716 pairs of users
  − 50% true links, 50% negative samples
  − Compute similarity of user pairs
  − Sort user pairs in decreasing order of similarity
  − Add links between users with highest similarity
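
The ranking step of the evaluation process can be sketched as below, assuming similarity scores for the sampled pairs are already computed. The function names, the cutoff n_links, and the toy scores are hypothetical; the slides do not state how many top pairs are turned into links.

```python
def recommend_links(scored_pairs, n_links):
    """Return the n_links user pairs with the highest similarity.

    scored_pairs: list of ((u, v), similarity).
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n_links]]


def precision_recall(predicted, true_links):
    """Precision and recall of predicted links against the known true links."""
    predicted, true_links = set(predicted), set(true_links)
    hits = len(predicted & true_links)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true_links) if true_links else 0.0
    return precision, recall


# Hypothetical scores: two true links and two negative samples.
toy_scores = [
    (("a", "b"), 0.9),
    (("a", "c"), 0.2),
    (("b", "c"), 0.8),
    (("c", "d"), 0.4),
]
toy_true = {("a", "b"), ("c", "d")}
toy_predicted = recommend_links(toy_scores, n_links=2)
toy_precision, toy_recall = precision_recall(toy_predicted, toy_true)
```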

Recommendation of Social Ties

• Latent Topics & Shortest Distance
  − Aggregates the training similarity values of all true links into a single point
  − Least effective
• Ensemble achieves best precision
• Overfitting for training size > 50%
• Recall drops as dataset size increases

[Figure panels: Latent Topics & Local Structure; Latent Topics; Ensemble]

How about High Class Imbalance?

• In social media, number of true links << absent links
• High performance for both classes
  − True negatives easier to classify correctly
  − Degradation in performance for true positives
• Reasonable results for practical purposes

[Figure panels: Latent Topics & Local Structure; Latent Topics; Ensemble]

Comparison to Tag-based similarity metrics

• Baselines
  − Cosine Similarity (CS)
  − Maximal Information Path (MIP)
• Evaluation Criterion
  − Area under the receiver-operating characteristic curve (AUC)
• Baselines AUC
  − Computed over the complete dataset
  − Biases the evaluation in favor of the baselines
  − CS AUC = 0.6087
  − MIP AUC = 0.6256
• Same evaluation process as before
• Compute performance lift
  − % change over best performing baseline
  − Positive % denotes improvement
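
AUC and the performance lift can be sketched as below. AUC is computed here by direct pairwise comparison, which is equivalent to the area under the ROC curve but quadratic in the number of samples. The function names are hypothetical; the two baseline values are the ones reported on the slide.

```python
def auc(scores_pos, scores_neg):
    """Probability that a true link outscores a negative pair (ties count half)."""
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def performance_lift(auc_value, baseline_aucs):
    """Percent change over the best performing baseline; positive means improvement."""
    best = max(baseline_aucs)
    return 100.0 * (auc_value - best) / best


# Baseline AUCs from the slide: Cosine Similarity and Maximal Information Path.
BASELINE_AUCS = [0.6087, 0.6256]

# Hypothetical similarity scores for true links and negative samples.
toy_auc = auc([0.9, 0.8], [0.1, 0.85])
toy_lift = performance_lift(toy_auc, BASELINE_AUCS)
```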

Comparison to Tag-based similarity metrics

• Not all schemes can beat the baseline
  − For 10% training data ≤10% AUC loss
  − But, significant speedup due to minimal training dataset
• Latent Topics & Local Structure Scheme consistently better

[Figure: x-axis: Training dataset size; panels: Latent Topics & Local Structure; Latent Topics]

Concluding Remarks

• Three generative models of tripartite graphs in social tagging systems
• Modeling of users’ interests in a latent space over resources and metadata
• Limitations
  − Ignore several aspects of real-world annotation process, such as topic correlation and user interaction
• Achieve great performance in the recommendation task
  − Accurate predictors of social ties in conjunction with structural evidence
  − Proposed aggregation strategy to reduce number of training samples
• Future work
  − Incorporate other types of resources
  − Automatically identify most discriminative latent topics and discard uninformative resources and metadata

Thank you!

• Questions? [email protected]