Suggesting Friends Using the Implicit Social Graph
Maayan Roth1, Assaf Ben-David1, David Deutscher2, Guy Fisher1, Ilan Horn2, Ari Leichtberg2, Naty Leiser2, Yossi Matias1, Ron Merom1
1Google, Inc., Tel Aviv, Israel 2Google, Inc., Haifa, Israel
SIGKDD 2010
2010. 11. 01.
Summarized and Presented by Kim Chung Rim, IDS Lab., Seoul National University
Copyright 2010 by CEBT
Contents
Introduction
Problem Definition
Concept Definition
Goal
Various Score Measuring Algorithms
Experiment
Applications
Don’t Forget Bob!
Got the Wrong Bob?
Conclusion & Discussion
Introduction
Group communication is prevalent
10% of e-mails are sent to more than one recipient, and 4% of e-mails are sent to 5 or more recipients.
Within enterprise domains, 40% of e-mails are sent to more than one recipient, and nearly 10% of e-mails are sent to 5 or more recipients.
User studies show that users tend to communicate repeatedly with the same groups of contacts.
Problem Definition
However, users do not take the time to create and maintain custom contact groups.
Creating groups manually is tedious and time-consuming.
Even when users do create contact groups, group membership is likely to change over time.
Goal
The goal of this paper is to
Introduce the concept of the implicit social graph
Suggest a metric that quantifies the interaction between a user and a contact group
Present a friend-suggestion algorithm that assists users in the creation of custom contact groups
Evaluate the friend-suggestion algorithm
Apply the friend-suggestion algorithm in practical applications
Concept Definition – Implicit Social Graph
A graph in which
each node is an email address
each edge has a weight and a direction (incoming or outgoing mail)
each edge joins a set of nodes (a group of contacts)
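One way to picture this structure in code is sketched below. The class and field names (`Hyperedge`, `sender`, `recipients`, `weight`) are illustrative assumptions and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One directed, weighted hyperedge (hypothetical representation)."""
    sender: str                 # node the edge leaves (an email address)
    recipients: frozenset[str]  # nodes the edge enters (the contact group)
    weight: float               # edge weight, e.g. an interaction score

    def nodes(self) -> frozenset[str]:
        """All nodes joined by this hyperedge, sender included."""
        return self.recipients | {self.sender}


# A single mail from v1 to the group {v2, v3, v4}.
edge = Hyperedge("v1", frozenset({"v2", "v3", "v4"}), weight=1.0)
```

Unlike an ordinary graph edge, one hyperedge here connects a sender to a whole recipient group at once.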
[Figure: a directed, weighted hypergraph over nodes v1 to v6]
Concept Definition – Egocentric Network
The sub-hypergraph composed of all the edges leading into or out of a single user node
No friend-of-friend hyperedges are considered
Each hyperedge defines an implicit group
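A minimal sketch of extracting an egocentric network, assuming each hyperedge is stored as a `(sender, recipients)` pair. The function names, and the choice to define an implicit group as the edge's members other than the user, are assumptions made for illustration.

```python
def egocentric_network(edges, user):
    """Keep only the hyperedges that leave or enter `user`."""
    return [(s, r) for (s, r) in edges if s == user or user in r]


def implicit_groups(edges, user):
    """One implicit group per kept hyperedge: its members other than `user`."""
    groups = set()
    for sender, recipients in egocentric_network(edges, user):
        groups.add(frozenset(recipients | {sender}) - {user})
    return groups


edges = [
    ("u", frozenset({"a", "b"})),  # sent by u
    ("a", frozenset({"u", "c"})),  # received by u
    ("a", frozenset({"b", "c"})),  # friend-of-friend traffic, ignored
]
groups_of_u = implicit_groups(edges, "u")
```

The third edge never touches `u`, so it contributes nothing to the egocentric network.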
[Figure: example egocentric network; node labels v1 to v6]
Concept Definition – Interactions Rank
Interactions Rank
A metric for computing the weight of a hyperedge
The weight has to satisfy the following criteria
Frequency
– groups with frequent interactions are more important
Recency
– group importance changes over time, so recent interactions count for more than old ones
Direction
– interactions that the user initiates are more significant than interactions that the user does not initiate
Concept Definition – Interactions Rank
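Interactions Rank (IR)
$IR = \omega_{out} \sum_{i \in I_{out}} (1/2)^{(t_{now} - t(i)) / \lambda} + \sum_{i \in I_{in}} (1/2)^{(t_{now} - t(i)) / \lambda}$
$I_{out}$ : the set of outgoing interactions
$I_{in}$ : the set of incoming interactions
$t_{now}$ : current time
$t(i)$ : timestamp of an interaction $i$
$\lambda$ : half-life
$\omega_{out}$ : relative importance of outgoing vs. incoming interactions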
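The quantities listed above can be combined as in the sketch below. It assumes each interaction contributes an exponentially decayed term, (1/2) raised to its age divided by the half-life, and that outgoing interactions are scaled by a single weight. The function and parameter names are hypothetical.

```python
def interactions_rank(outgoing, incoming, now, half_life, w_out):
    """Decayed interaction count for one implicit group.

    outgoing, incoming: timestamps of the interactions in each direction
    now, half_life:     in the same time unit as the timestamps
    w_out:              weight of an outgoing relative to an incoming interaction
    """
    def decay(timestamp):
        # An interaction that is one half-life old counts half as much.
        return 0.5 ** ((now - timestamp) / half_life)

    return (w_out * sum(decay(t) for t in outgoing)
            + sum(decay(t) for t in incoming))


# One mail sent just now, one mail received exactly one half-life ago.
score = interactions_rank(outgoing=[10.0], incoming=[0.0],
                          now=10.0, half_life=10.0, w_out=2.0)
```

With these inputs the outgoing mail contributes 2.0 and the older incoming mail contributes 0.5, which covers all three criteria: frequency (a sum), recency (the decay), and direction (the weight).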
Core Routine of Friend Suggest
Returns a set of scores for contacts
S : the seed, a small set of contacts
G : the set of the user's contact groups
g : one group in G, a set of contacts with whom the user u has interacted
F : the scores assigned to the contacts, each in [0,1]
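The routine itself is not reproduced in these notes, so the following is only a sketch of a loop consistent with the symbols S, G, g, and F. It assumes each group carries a precomputed IR value, that a pluggable `update_score` function returns the amount to add to a contact's score, and that normalization into [0,1] is left out. All names are hypothetical.

```python
def expand_seed(groups, seed, update_score):
    """Score every non-seed contact appearing in the user's groups.

    groups:       iterable of (group, ir) pairs, where group is a frozenset
    seed:         set of contacts the user has already chosen (S)
    update_score: callable(contact, seed, group, ir) -> score increment
    """
    scores = {}  # F: contact -> accumulated score
    for group, ir in groups:
        for contact in group:
            if contact in seed:
                continue  # seed members are never suggested
            scores[contact] = scores.get(contact, 0.0) + update_score(
                contact, seed, group, ir)
    return scores


groups = [(frozenset({"a", "b"}), 1.0), (frozenset({"a", "c"}), 2.0)]
# Example scoring rule: count a group's IR only if it overlaps the seed.
scores = expand_seed(groups, {"a"},
                     lambda c, s, g, ir: ir if g & s else 0.0)
```

Keeping the scoring rule as a parameter lets the same loop be reused with each of the scoring functions.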
Scoring Functions – base functions
Intersecting Group Count
Counts the groups that contain contact c and also intersect the seed S
Does not consider the IR values of the groups
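As a rough illustration, the count can be written in one expression. The function name and the representation of groups as frozensets are assumptions.

```python
def intersecting_group_count(contact, seed, groups):
    """Number of groups that contain `contact` and share a member with `seed`."""
    return sum(1 for g in groups if contact in g and g & seed)


groups = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})]
count_b = intersecting_group_count("b", {"a"}, groups)
```

Only the first group both contains `b` and overlaps the seed `{a}`, so a rarely used group counts exactly as much as a heavily used one.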
Scoring Functions – base functions
Top Contact Score
For each contact, sums the IR values of all implicit groups containing that contact
Ignores the seed and always suggests the top-ranked contacts
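A short sketch of this baseline, assuming groups are given as `(group, ir)` pairs with precomputed IR values. The names are hypothetical.

```python
def top_contact_score(contact, groups):
    """Sum of the IR values of every group containing `contact`; no seed involved."""
    return sum(ir for g, ir in groups if contact in g)


groups = [(frozenset({"a", "b"}), 1.5), (frozenset({"b", "c"}), 2.0)]
contacts = {"a", "b", "c"}
ranking = sorted(contacts, key=lambda c: top_contact_score(c, groups),
                 reverse=True)
```

Because the seed never enters the computation, the ranking is the same no matter which recipients the user has already typed.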
Scoring Functions
Intersecting Group Score
Sums the IR values of all implicit groups that contain contact c and have a non-empty intersection with the seed set
Captures every context in which contact c exchanged emails with, or was a co-recipient alongside, at least one seed member
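A sketch of this score under the same assumed representation, groups as `(group, ir)` pairs, with a hypothetical function name:

```python
def intersecting_group_score(contact, seed, groups):
    """Sum the IR of groups containing `contact` that overlap `seed`."""
    return sum(ir for g, ir in groups if contact in g and g & seed)


groups = [
    (frozenset({"a", "c"}), 1.0),  # overlaps seed {a}
    (frozenset({"b", "c"}), 4.0),  # no seed member, ignored
    (frozenset({"a", "c", "d"}), 0.5),
]
score_c = intersecting_group_score("c", {"a"}, groups)
```

The heavily used group `{b, c}` adds nothing here because it shares no member with the seed.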
Scoring Functions
Intersection Weighted Score
The more contacts of g that also appear in S, the more similar g is to the seed
Following this intuition, Intersection Weighted Score multiplies IR by a constant k and by the size of the intersection of g and S
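The sketch below follows the description on this slide literally: each qualifying group contributes IR times k times the overlap size. The function name, the `(group, ir)` representation, and the example value of k are assumptions.

```python
def intersection_weighted_score(contact, seed, groups, k):
    """Sum of IR * k * |g & seed| over the groups that contain `contact`."""
    return sum(ir * k * len(g & seed) for g, ir in groups if contact in g)


groups = [
    (frozenset({"a", "b", "c"}), 1.0),  # overlaps seed {a, b} in 2 contacts
    (frozenset({"a", "c"}), 2.0),       # overlaps seed {a, b} in 1 contact
]
score_c = intersection_weighted_score("c", {"a", "b"}, groups, k=2.0)
```

Groups with an empty overlap contribute zero automatically, so no separate intersection test is needed.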
Evaluation
Methodology
10,000 email interactions with between 3 and 25 recipients were randomly sampled
All sampled interactions were sent by active users
– An active user has at least 5 implicit groups and sent at least one email in the 7 days before the sampled interaction
Each recipient list is treated as a group of contacts that the user clustered implicitly
From each recipient list, a few addresses are sampled as the seed, and the suggestions are tested on how well they recreate the remaining addresses
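The seed-and-holdout split in the last step might look like the sketch below. The function name and the use of a seeded `random.Random` for reproducibility are assumptions; how seed sizes were chosen in the actual study is not stated here.

```python
import random


def split_recipients(recipients, seed_size, rng):
    """Split one recipient list into a seed and the held-out contacts."""
    # Sort first so the sample is reproducible for a given rng state.
    seed = set(rng.sample(sorted(recipients), seed_size))
    held_out = set(recipients) - seed
    return seed, held_out


rng = random.Random(0)
recipients = {"a", "b", "c", "d", "e"}
seed, held_out = split_recipients(recipients, seed_size=2, rng=rng)
```

The held-out contacts serve as ground truth: a good suggestion algorithm should recover them from the seed alone.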
Evaluation metric
Precision & Recall
Precision is the percentage of correct suggestions out of the total number of contacts suggested for each seed group
Recall is the percentage of correct suggestions out of the total number of email recipients who were not already members of the seed group
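Both metrics can be computed for a single test case as sketched below. The function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def precision_recall(suggested, recipients, seed):
    """Precision and recall of one suggestion list, as fractions in [0, 1]."""
    suggested = set(suggested)
    target = set(recipients) - set(seed)  # recipients not already in the seed
    correct = suggested & target
    precision = len(correct) / len(suggested) if suggested else 0.0
    recall = len(correct) / len(target) if target else 0.0
    return precision, recall


# Seed {a, b}; the real recipients also included c and d.
p, r = precision_recall(suggested=["c", "x"],
                        recipients={"a", "b", "c", "d"},
                        seed={"a", "b"})
```

Here one of the two suggestions is correct and one of the two missing recipients is recovered, so both values are 0.5.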
Results
Applications: Don’t Forget Bob!
Don’t Forget Bob uses the Friend Suggest algorithm
Once the user has added at least two contact addresses, that user’s egocentric network is fetched from the implicit social graph
Friend Suggest generates up to 4 contacts that best expand the seed set of existing contacts.
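The flow described above can be sketched as follows, assuming a hypothetical `score_fn` that maps a seed set to a dictionary of contact scores (standing in for Friend Suggest). The alphabetical tie-break and the exclusion of zero-score contacts are also assumptions.

```python
def suggest_recipients(seed, score_fn, min_seed=2, limit=4):
    """Return up to `limit` suggested contacts once the seed is large enough."""
    if len(seed) < min_seed:
        return []  # too few recipients typed so far
    scores = score_fn(seed)
    candidates = [(c, s) for c, s in scores.items()
                  if s > 0 and c not in seed]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda cs: (-cs[1], cs[0]))
    return [c for c, _ in candidates[:limit]]


def fake_scores(seed):
    # Stand-in for Friend Suggest, with fixed scores for illustration.
    return {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.0, "e": 0.7, "f": 0.3}


suggestions = suggest_recipients({"s1", "s2"}, fake_scores)
```

With a single recipient in the seed the function returns an empty list, matching the two-address threshold.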
Applications: Got The Wrong Bob?
Got The Wrong Bob is implemented to fix autocompletion errors
For each contact c_i in the current recipient list L, Wrong Bob excludes c_i and builds a new seed set from the remaining recipients
When Friend Suggest can restore c_i from that seed, Wrong Bob does not look for a replacement
However, when c_i cannot be restored, Wrong Bob searches for a replacement for c_i
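A sketch of the leave-one-out loop described above. It assumes two hypothetical callables: `suggest(seed)`, standing in for Friend Suggest and returning the contacts it proposes, and `similar(contact)`, returning the contacts that autocompletion could have confused with the given one. How the real system picks replacement candidates is not specified here.

```python
def wrong_bob(recipients, suggest, similar):
    """Map each suspicious recipient to a proposed replacement."""
    replacements = {}
    for contact in recipients:
        seed = set(recipients) - {contact}  # leave this contact out
        suggested = suggest(seed)
        if contact in suggested:
            continue  # the other recipients "restore" it, so it fits
        for candidate in similar(contact):
            if candidate in suggested:
                replacements[contact] = candidate
                break
    return replacements


team = {"alice", "carol", "bob.j"}  # a group the user mails often


def suggest(seed):
    return team - seed if team & seed else set()


def similar(contact):
    return {"bob.s": ["bob.j"], "bob.j": ["bob.s"]}.get(contact, [])


fixes = wrong_bob(["alice", "carol", "bob.s"], suggest, similar)
```

In this toy case `bob.s` is never suggested alongside `alice` and `carol`, while the similarly named `bob.j` is, so `bob.j` is proposed as the replacement.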
Conclusion & Discussion
Introduced the implicit social graph and Interactions Rank
Defined the Friend Suggest algorithm
Proposed two applications of the Friend Suggest algorithm
The approach is applicable to other types of communication