Modeling Relationship Strength in Online Social Networks

Rongjing Xiang: Purdue University
Jennifer Neville: Purdue University
Monica Rogati: LinkedIn

WWW 2010

Presenter: Chenghui REN
Supervisors: Dr Ben Kao, Prof David Cheung

Why do we care about Relationship Strength?

• Various aspects of online social networks (OSNs) are based on relationship strength:
– Link prediction
  • Suggesting new people with top relationship strength to users
– Item recommendation
  • Items may be groups to join, articles to read…
– Newsfeeds
  • Real-time updates about status change, activities, new posts…
– People search
– …

What has been done on Relationship Strength?

• Previous work analyzing OSNs has focused on binary friendship relations
– E.g., friends or not

• Low cost of link formation leads to networks with different relationship strengths
– E.g., close friends and acquaintances

• Treating all relationships as equal increases the level of noise in a learned model and degrades performance.

Problem

• Typically, an OSN contains:
– Profiles
– Interaction activities

Goal: propose a method to infer a continuous-valued relationship strength for each link, based on the two factors above

Roadmap

• Motivation
• Latent Variable Model
• Experimental Evaluation
• Conclusions

Latent Variable Model: Introduction

• Homophily is common in OSNs
– People tend to form ties with other people who have similar characteristics
– The stronger the tie, the higher the similarity

• Relationship strength is modeled as a hidden effect of nodal profile similarities
– E.g., the schools and companies the users attended
– E.g., the online groups they joined
– E.g., the geographic locations they belong to

• Relationship strength is modeled as a hidden cause of user interactions
– E.g., profile viewing activities
– E.g., picture tagging

Model: Introduction (Cont’d)

[Diagram: Profile attributes (visible) have an effect on Relationship strength (invisible), which is a cause of User interactions (visible)]

Model: Introduction (Cont’d)

Goal: Estimate the latent relationship strength z to maximize the overall observed data likelihood

Figure 1: Graphical model representation of the general relationship strength model

Model Specification

[Diagram: Profile attributes (visible) affect Relationship strength (invisible)]

First model this part

Model Specification (Cont’d)

A Gaussian distribution is used to model the conditional probability of z given the profile similarities.
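The equation from this slide is not reproduced here. As a minimal sketch of what a Gaussian conditional of this kind could look like, the snippet below assumes that the mean of z is a weighted sum of a similarity vector s (one entry per profile similarity measure) and that the variance v is fixed. The names `s`, `w`, `v`, the function name, and all numeric values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def log_p_z_given_profiles(z, s, w, v=0.5):
    """Log-density of a Gaussian over z with mean w . s and variance v.

    Assumption: the mean is linear in the similarity vector s.
    """
    mean = float(np.dot(w, s))
    return -0.5 * np.log(2.0 * np.pi * v) - (z - mean) ** 2 / (2.0 * v)

# Hypothetical similarity vector for one pair of users (three measures)
s = np.array([1.0, 0.0, 1.0])
# Hypothetical weights on the three similarity measures
w = np.array([0.4, 0.3, 0.2])

# The density peaks where z equals the predicted mean w . s
print(log_p_z_given_profiles(0.6, s, w))
print(log_p_z_given_profiles(0.0, s, w))
```

Under this assumed form, pairs with more similar profiles get a higher expected relationship strength whenever the corresponding weights are positive.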

Model Specification (Cont’d)

[Diagram: Relationship strength (invisible) is a cause of User interactions (visible)]

Then model this part

Model Specification (Cont’d)

A logistic function is used to model the conditional probability of y given u.

Figure 2: Graphical model representation of the specific instantiation
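A logistic conditional of this kind can be sketched as follows. The snippet assumes that u stacks the auxiliary variables a for one interaction type together with the latent strength z, and that a bias term b is added; this composition of u, the function name, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def p_interaction(z, a, theta, b=0.0):
    """P(y = 1 | u) under a logistic model, with u = [a, z] (assumed layout)."""
    u = np.append(a, z)                     # auxiliary variables followed by z
    return 1.0 / (1.0 + np.exp(-(np.dot(theta, u) + b)))

a = np.array([0.2])                         # hypothetical auxiliary variable
theta = np.array([-0.5, 2.0])               # hypothetical weights; last entry multiplies z

print(p_interaction(0.1, a, theta))         # weak tie
print(p_interaction(0.9, a, theta))         # strong tie
```

With a positive weight on z, a stronger relationship makes the interaction more likely, which is the "hidden cause" reading of the model.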

Model Specification (Cont’d)

To avoid over-fitting, L2 regularizers are put on the parameters w and θ; these can be regarded as Gaussian priors.
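The link between L2 regularization and Gaussian priors can be written out as below. The coefficient names λ_w and λ_θ are assumed here for illustration; the slide's own notation is not preserved in this transcript.

```latex
P(w) \propto \exp\!\left(-\tfrac{\lambda_w}{2}\, w^\top w\right), \qquad
P(\theta) \propto \exp\!\left(-\tfrac{\lambda_\theta}{2}\, \theta^\top \theta\right)

-\log P(w) = \tfrac{\lambda_w}{2}\, \lVert w \rVert_2^2 + \text{const}, \qquad
-\log P(\theta) = \tfrac{\lambda_\theta}{2}\, \lVert \theta \rVert_2^2 + \text{const}
```

Maximizing the log-posterior under these zero-mean Gaussian priors is therefore the same as maximizing the log-likelihood minus the two squared-norm penalties.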

Model Inference

Two ways to estimate a latent variable model

[Diagram: one approach is marked "Accepted!" and the other is marked "Future work"]

Model Inference (Cont’d)
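The content of this slide is not preserved. As a rough illustration of the stated goal (estimating z to maximize the observed data likelihood), the sketch below runs plain gradient ascent on z for a single pair of users while holding all other parameters fixed. It assumes the Gaussian and logistic forms sketched earlier in the deck; the optimizer, step size, and every numeric value are assumptions, and this is not claimed to be the procedure used by the authors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_joint(z, mean, v, y, theta_z, offset):
    """Terms of the log joint that depend on z, for one pair of users."""
    p = sigmoid(theta_z * z + offset)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -(z - mean) ** 2 / (2.0 * v) + log_lik

def estimate_z(mean, v, y, theta_z, offset, lr=0.1, steps=200):
    """Gradient ascent on z with all other parameters held fixed."""
    z = mean                                 # start from the profile-based prediction
    for _ in range(steps):
        p = sigmoid(theta_z * z + offset)
        grad = -(z - mean) / v + np.sum((y - p) * theta_z)
        z += lr * grad
    return z

y = np.array([1.0, 1.0])                     # both interaction types observed
theta_z = np.array([1.5, 0.8])               # hypothetical weights on z
offset = np.array([-1.0, -0.5])              # hypothetical auxiliary terms plus bias
z_hat = estimate_z(mean=0.6, v=0.5, y=y, theta_z=theta_z, offset=offset)
print(z_hat)
```

Because both interactions are observed and their weights on z are positive, the estimate moves above the profile-based mean: the interactions count as extra evidence of a strong tie.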

Roadmap

• Motivation
• Latent Variable Model
• Experimental Evaluation
• Conclusions

Experimental Evaluation

Dataset: Purdue Facebook data
#nodes: 4,500
#links: 144,712

Three profile similarity measures:

Two types of user interactions:

Auxiliary variables:
– #people whose wall i has posted on
– #people i has tagged in pictures

Experimental Evaluation (Cont’d)

• Use the proposed latent variable model to estimate the relationship strengths for the 144,712 pairs of users

How to evaluate the estimated weighted graph? Apply it in a number of collective classification tasks:

– Gender: Male?
– Relationship status: Single?
– Political views: Conservative?
– Religious views: Christian?

Classification Algorithm

• Gaussian Random Field Model
– Autocorrelation is present in the graph
– Information is propagated from the labeled portion of the graph to infer the values for unlabeled nodes

• Vary the proportion of labeled nodes in the graph from 30% to 90%

• Measure the resulting classification rankings using area under the ROC curve
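The propagation step can be illustrated with the harmonic solution that is commonly used with Gaussian random fields: each unlabeled node's score is the weighted average of its neighbors' scores. Whether the experiments used exactly this formulation is an assumption; the function name and the toy graph are hypothetical.

```python
import numpy as np

def harmonic_scores(W, labeled_idx, labels):
    """Propagate labels over a weighted graph W via the harmonic solution."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]      # unlabeled-unlabeled block
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]        # unlabeled-labeled weights
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(labels, dtype=float))
    return unlabeled_idx, f_u

# Toy path graph 0 - 1 - 2 - 3 with unit weights; node 0 is positive, node 3 negative
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

idx, scores = harmonic_scores(W, labeled_idx=[0, 3], labels=[1.0, 0.0])
print(idx, scores)

# Strengthening the tie between nodes 0 and 1 pulls node 1 toward node 0's label
W_strong = W.copy()
W_strong[0, 1] = W_strong[1, 0] = 3.0
_, scores_strong = harmonic_scores(W_strong, labeled_idx=[0, 3], labels=[1.0, 0.0])
print(scores_strong)
```

The second call shows why edge weights matter for this classifier: a stronger link lets a labeled neighbor contribute more to the inferred value of an unlabeled node.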

ROC curve

x-axis: False positive rate
y-axis: True positive rate

The larger the area under the ROC curve, the better the classifier ranks positive examples above negative ones
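The area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. A small sketch of that pairwise computation follows; the function name and the example scores are hypothetical.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]       # every positive minus every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Perfect ranking: every positive scores above every negative
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))

# One positive (0.3) is ranked below one negative (0.8)
print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; sorting-based implementations are preferable for large graphs.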

Comparisons to Six Graphs

• Four observed graphs
– Friendship graph
– Top-friend graph
– Wall graph
– Picture graph

• Two additional graphs
– Profile-similarity graph, which weights each link by
– Interaction-count graph, which sums the links in the wall

Results

Collective classification performance on various Facebook graphs

Curves for the wall graph and the picture graph lie well below the other curves and are therefore omitted

Roadmap

• Motivation
• Latent Variable Model
• Experimental Evaluation
• Conclusions

Conclusions

• A latent variable model was proposed to estimate relationship strength in OSNs

• The weighted graph formed by the estimated relationship strengths gives rise to higher autocorrelation and better classification

• The model can facilitate many graph learning and social network mining tasks

Q&A

Thanks!