Link Recommendation in P2P Social Networks
Yusuf Aytaş, Hakan Ferhatosmanoğlu, Özgür Ulusoy
Bilkent University, Ankara, Turkey
VLDB WOSS 2012
Outline
• Introduction
• Motivation for P2P Social Networks
• Link Recommendation
• P2P Top-k Common Neighbor
• Experiments
• Discussion
• Future Work
Introduction
• Social networks are mostly based on centralized infrastructure (“fat server thin client”).
• However, P2P infrastructure is a natural alternative for social networks.
• Centralized infrastructure has several problems.
Problems with Centralized Systems
• Privacy: Social network providers can misuse users’ data.
• Censorship: Social network providers can censor what users share.
• Scalability: All data sits with the provider, although it could be distributed over the network.
• These problems can be avoided in P2P social networks.
Advantages of P2P Systems
• Data can be maintained by the peers themselves; no separate server is needed.
• Each user can define their own level of privacy.
• Misuse of both linkage and user data is prevented.
• Accordingly, a significant amount of research is needed on algorithms and systems for P2P social networks.
P2P Social Network Challenges
• Algorithm Perspective
  – Distributed graph algorithms
  – P2P Performance
• Systems Perspective
  – Storage
  – Robustness
  – Security
• SOWHOO: Our open source implementation
  – https://github.com/yusufaytas/sowhoo
Social Network Algorithms on P2P Environment
• In a P2P Social Network, peers have limited information about the network.
• Known algorithms, such as those for link prediction, community detection, and information diffusion, should be revisited.
• Efficiency of the overlay network should be taken into account as well as algorithm accuracy.
• In this context, we propose a new approach: “Link Recommendation”.
Problem Background
• Common Neighbor: A node is more likely to interact with another node if the number of their shared neighbors is high.
• Top-K Query Processing: Finding the k objects that have the highest scores.
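Sorted list S1:
Id   S1
a    0.9
d    0.85
e    0.83
h    0.75
.    .

Sorted list S2:
Id   S2
e    0.96
f    0.84
b    0.83
d    0.56
.    .

Combined view:
Id   S1     S2
a    0.9    -
e    0.83   0.96
d    0.85   0.56
f    -      0.84
b    -      0.83
h    0.75   -

Remaining cell values (placement not recoverable): 0.23, 0.34, 0.41, 0.27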
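The two building blocks above can be illustrated with a minimal sketch. It assumes an undirected graph stored as adjacency sets; the toy graph and the function names `common_neighbor_scores` and `top_k` are illustrative and are not taken from the authors' implementation.

```python
def common_neighbor_scores(graph, node):
    """Count shared neighbors between `node` and every node it is not linked to."""
    scores = {}
    for neighbor in graph[node]:
        for candidate in graph[neighbor]:
            # Skip the node itself and nodes it is already linked to.
            if candidate == node or candidate in graph[node]:
                continue
            scores[candidate] = scores.get(candidate, 0) + 1
    return scores


def top_k(scores, k):
    """Return the k (candidate, score) pairs with the highest scores.

    Ties are broken by candidate id so the result is deterministic.
    """
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy graph (undirected, so every edge appears in both sets).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B", "E", "F"},
    "D": {"A", "F"},
    "E": {"B", "C"},
    "F": {"C", "D"},
}

scores_for_a = common_neighbor_scores(graph, "A")
best_for_a = top_k(scores_for_a, 1)
```

Here node A shares two neighbors with E (B and C) and two with F (C and D), so both are equally strong candidates under plain Common Neighbor; the node weights discussed later are what would separate them.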
Problem Background
• Zhang proposed a Common Neighbor algorithm (NCNP) to predict links in a distributed graph.
• Kermarrec proposed a distributed social graph embedding algorithm (SocS) for link prediction.
• We consider P2P environment settings.
• Our approach uses P2P Top-k retrieval to enhance performance.
• Scoring methods improve the network overlay.
Link Recommendation
• Link recommendation: suggesting new links by considering both neighborhood information and network performance.
• To measure both social information and the P2P network, we use node scoring.
• We adapted Common Neighbors to a distributed environment using Fagin’s Algorithm (FA) and the Threshold Algorithm (TA).
Link Recommendation (Cont’d)
[Figure: example graph; surviving labels: 2, 23, 9, 5]
Node Scoring
• Node Importance
• Reputation Scoring
• P2P Systems Measures
• Composite Measures
  – Trusted Centrality
  – Available Authority
• Our weighting strategy may suggest friendships that improve the P2P topology.
Top-K Common Neighbor
[Figure: example network with nodes A, B, C, D, E, F]
• Node A requests a new recommended node.
• Each node returns a recommended node.
• Node A evaluates the returned nodes and terminates if the algorithm converges.
Top-K FA and TA Common Neighbor
• The Top-K FA Common Neighbor algorithm stops once it has received k recommended nodes from all neighbors.
  – It generally ends up in the worst-case scenario.
• The Top-K TA Common Neighbor algorithm stops once it has k recommended nodes whose scores exceed the (approximated) threshold.
  – The threshold is recalculated at each iteration.
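The difference between the two stopping rules can be sketched with the classic, centralized forms of Fagin's Algorithm and the Threshold Algorithm. This is not the authors' P2P variant (which approximates the threshold): it assumes every list holds every object, uses a plain sum as the aggregate, and counts only sorted accesses. The data and the names `fagin_top_k` and `threshold_top_k` are illustrative.

```python
def _rank(totals, k):
    """Sort by descending total score, breaking ties by object id."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


def fagin_top_k(lists, k):
    """FA: read all lists in parallel until k objects were seen in every list."""
    lookup = [dict(lst) for lst in lists]   # random access by object id
    seen_in = {}                            # object id -> number of lists it appeared in
    depth = 0
    while depth < len(lists[0]):
        for lst in lists:
            obj, _ = lst[depth]
            seen_in[obj] = seen_in.get(obj, 0) + 1
        depth += 1
        complete = sum(1 for count in seen_in.values() if count == len(lists))
        if complete >= k:
            break
    # Score every object seen so far, then keep the best k.
    totals = {obj: sum(table[obj] for table in lookup) for obj in seen_in}
    return _rank(totals, k), depth * len(lists)


def threshold_top_k(lists, k):
    """TA: stop as soon as k objects score at least the current threshold."""
    lookup = [dict(lst) for lst in lists]
    totals = {}
    depth = 0
    while depth < len(lists[0]):
        threshold = 0                       # sum of the scores read at this depth
        for lst in lists:
            obj, score = lst[depth]
            threshold += score
            if obj not in totals:           # random access for the full score
                totals[obj] = sum(table[obj] for table in lookup)
        depth += 1
        best = _rank(totals, k)
        if len(best) == k and best[-1][1] >= threshold:
            break
    return _rank(totals, k), depth * len(lists)


# Hypothetical integer scores, each list sorted in descending order.
s1 = [("a", 100), ("b", 90), ("c", 20), ("d", 10), ("e", 5), ("f", 1)]
s2 = [("d", 100), ("e", 90), ("f", 20), ("a", 15), ("b", 10), ("c", 5)]

fa_result, fa_accesses = fagin_top_k([s1, s2], 2)
ta_result, ta_accesses = threshold_top_k([s1, s2], 2)
```

On this input both functions return the same top-2 answer, but TA stops one round earlier because the threshold drops below the second-best score before any object has appeared in both lists.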
Setup For Experiments
• Synthetic and real data
• For real data:
  – Gnutella (6301 nodes and 20777 edges)
  – Wikipedia (7115 nodes and 103689 edges)
• For synthetic data, we implemented:
  – Uniformly distributed model,
  – Small world model of Watts and Strogatz,
  – Clustering model of Holme and Kim.
• We plan to use data from SOWHOO.
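As an illustration of one of the synthetic models, the following is a minimal sketch of the standard Watts and Strogatz small-world construction. It is not the authors' generator; the function name `watts_strogatz` and the parameter values are assumptions chosen for the example, and it expects `k` to be even and smaller than `n`.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Build a ring lattice of n nodes, each linked to its k nearest
    neighbors, then rewire each lattice edge with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}

    # Step 1: ring lattice, k/2 neighbors on each side.
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)

    # Step 2: rewire the far end of each lattice edge with probability p.
    for offset in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + offset) % n
            if rng.random() >= p:
                continue
            # New endpoints must avoid self-loops and duplicate edges.
            choices = [u for u in range(n) if u != v and u not in adj[v]]
            if not choices:
                continue
            new = rng.choice(choices)
            adj[v].discard(w)
            adj[w].discard(v)
            adj[v].add(new)
            adj[new].add(v)
    return adj


lattice = watts_strogatz(20, 4, 0.0, seed=1)      # p = 0 keeps the regular ring
small_world = watts_strogatz(20, 4, 0.2, seed=1)  # some edges become shortcuts
```

Rewiring replaces one edge with another, so the number of edges stays at n·k/2 regardless of p; only the path lengths and clustering change.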
Experiments (Performance)
• We evaluated the algorithms’ efficiency as number of interactions vs. number of edges.
• An interaction (access) retrieves recommended node information, i.e. weight and address, from a peer.
• Weights were assigned to the network both globally and locally, following power-law and uniform distributions.
• A global weight is a single value per node that does not change with the observing node. Local weights are assigned by each node and differ from node to node.
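The distinction between global and local weights can be sketched as follows. This is one possible reading, assuming that each peer assigns local weights to its own neighbors and that power-law weights are drawn from a Pareto distribution; the exponent 2.0, the toy graph, and all function names are illustrative assumptions.

```python
import random


def make_sampler(rng, distribution):
    """Return a zero-argument function that draws one weight."""
    if distribution == "uniform":
        return rng.random                        # values in [0, 1)
    if distribution == "power-law":
        # Pareto samples are >= 1 and heavy-tailed; alpha = 2.0 is an assumption.
        return lambda: rng.paretovariate(2.0)
    raise ValueError(f"unknown distribution: {distribution}")


def global_weights(graph, rng, distribution="uniform"):
    """One weight per node, identical from every peer's point of view."""
    draw = make_sampler(rng, distribution)
    return {node: draw() for node in sorted(graph)}


def local_weights(graph, rng, distribution="uniform"):
    """Each peer draws its own weight for each of its neighbors."""
    draw = make_sampler(rng, distribution)
    return {
        node: {neighbor: draw() for neighbor in sorted(graph[node])}
        for node in sorted(graph)
    }


# Hypothetical three-node path graph.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
rng = random.Random(42)

global_uniform = global_weights(toy_graph, rng, "uniform")
local_power_law = local_weights(toy_graph, rng, "power-law")
```

With global weights, B has the same weight whether A or C looks it up; with local weights, `local_power_law["A"]["B"]` and `local_power_law["C"]["B"]` are drawn independently and will generally differ.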
Top-K TA vs. Top-K FA
Experiments (Accuracy)
• We evaluated the algorithms by comparing their recommended nodes against regular Common Neighbor as the baseline.
• They also need to be evaluated using:
  – Rank of recommended nodes.
  – Sum of weights for recommended nodes.
• Performance measure (ω) for the accuracy and efficiency trade-off: [formula not preserved]
Top-K TA vs. Top-K FA
SOWHOO
• We are building a P2P Social Network application to test our algorithms.
[Figure: network diagram with two nodes labeled Super Peer]
SOWHOO (Cont’d)
• SOWHOO has three layers: application layer, system layer, and network layer.
• The application layer handles user requests and provides the user interface.
• The system layer provides mechanisms such as pub/sub and notify/update.
• The network layer provides the messaging infrastructure between peers.
[Figure: the Application, System, and Network layers]
Discussion
• We presented ongoing work on Link Recommendation.
• We proposed the P2P Top-K FA and TA Common Neighbor algorithms to find recommended links for a node.
• P2P Top-K TA is significantly better than P2P Top-K FA Common Neighbor in terms of efficiency.
• We also presented weighting methods and proposed combined weights.
Future Work
• We are planning to improve the Top-K TA Common Neighbor algorithm into Top-K TA Common Neighbor+.
• We will test our algorithms against the accuracy measures we have discussed.
• We are planning to complete the implementation of SOWHOO.
• We will test our algorithms on data generated by SOWHOO.