Presented by: Maria Stylianou
Coworker: Anis Uddin
Supervisor: Šarūnas Girdzijauskas
KTH - Royal Institute of Technology
Implementation of Distributed Systems
December 6th, 2012
Scaling Online Social Networks (OSNs)
2
Outline
● Motivation
● Current Algorithms
– SPAR
– JA-BE-JA
● Contributions
– Challenges
– Solution
● Evaluation & Conclusions
4
Online Social Networks: “Pandora's box”
Motivation-Algorithms-Contribution-Evaluation
Source: http://technorati.com/social-media/article/social-networks-theyre-what-every-local/
5
Online Social Networks: easy to maintain...
Source: http://mastersofmedia.hum.uva.nl/2009/09/14/a-review-of-taken-out-of-context/
6
Online Social Networks: ...or not!
Source: http://mastersofmedia.hum.uva.nl/2009/09/14/a-review-of-taken-out-of-context/
8
Scaling Approaches
Vertical Scaling
● Full Replication
● Data Locality
● But:
– Expensive
– Saturation
Horizontal Scaling
● Adding servers
● Clean & Disjoint Partitions
● But:
– Not applicable in OSNs (the social graph cannot be split into clean, disjoint partitions)
→ Inefficient
10
Existing 'Solutions' for OSNs
Relational Databases
Key-Value Stores
→ Inefficient
12
SPAR
Social Partitioning & Replication middleware
● Transparent OSN scalability
● Data Locality and Load Balancing, which avoid performance bottlenecks
● Fault Tolerance
● Stability
● Replication Overhead Minimization
13
SPAR
Events
● Nodes – Add/Remove
● Edges – Add/Remove
● Servers – Add/Remove
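For the node-addition event, a minimal Python sketch follows. It assumes the policy described in the original SPAR paper as we understand it: the new node's master goes to the server currently holding the fewest masters (load balancing), and K replicas go to randomly chosen other servers (fault tolerance). `Placement`, `masters_on` and `add_node` are hypothetical names, not part of any SPAR code base.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Placement:
    """Hypothetical bookkeeping of where each node's master and replicas live."""
    servers: list
    master: dict = field(default_factory=dict)    # node -> server holding its master
    replicas: dict = field(default_factory=dict)  # node -> set of servers with a replica

    def masters_on(self, server):
        """Number of masters currently stored on `server`."""
        return sum(1 for s in self.master.values() if s == server)


def add_node(p, node, k, rng=random):
    """Place a new node: master on the least-loaded server, k replicas elsewhere."""
    # Load balancing: min() returns the first server among those with the fewest masters.
    target = min(p.servers, key=p.masters_on)
    p.master[node] = target
    # Fault tolerance: k replicas on distinct servers other than the master's server.
    others = [s for s in p.servers if s != target]
    p.replicas[node] = set(rng.sample(others, min(k, len(others))))
    return target


placement = Placement(servers=["M1", "M2", "M3"])
for n in range(1, 7):
    add_node(placement, n, k=0)
print(placement.master)
```

Adding nodes 1 to 6 to three empty servers in order places two masters on each server, because ties go to the first server in the list.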
14
SPAR Algorithm
Create Edge (1,6)
[Figure: nodes 1 to 6 spread over servers M1, M2 and M3 as master nodes, with replica nodes marked by a prime (e.g. 5')]
15
SPAR Algorithm
Create Edge (1,6)
C1: Create 6' in M1, create 1' in M3
[Figure: same placement, with new replicas 6' on M1 and 1' on M3]
16
SPAR Algorithm
Create Edge (1,6)
C2: Move 1 to M3
[Figure: master 1 moved to M3; replicas 1', 2', 3' and 4' shown]
17
SPAR Algorithm
Create Edge (1,6)
C3: Move 6 to M1
[Figure: master 6 moved to M1]
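The three configurations C1 to C3 can be compared mechanically. The Python sketch below assumes K = 0 and simply recomputes the replica count for the whole graph, rather than the incremental bookkeeping a real SPAR deployment would do; it then keeps whichever configuration needs the fewest replicas. The function names and the load-balance rule (a master may only move to a server with fewer masters) are our own illustrative assumptions.

```python
def replica_servers(graph, master, node):
    """Servers needing a replica of `node` so that the master of every neighbour
    finds `node` locally (SPAR's local-semantics requirement)."""
    return {master[nbr] for nbr in graph[node]} - {master[node]}


def total_replicas(graph, master):
    """Replicas required by local semantics alone (K = 0 assumed)."""
    return sum(len(replica_servers(graph, master, v)) for v in graph)


def masters_on(master, server):
    return sum(1 for s in master.values() if s == server)


def add_edge(graph, master, u, v):
    """Insert edge (u, v) and apply the configuration needing the fewest replicas.

    C1 keeps both masters in place, C2 moves u's master to v's server and C3
    moves v's master to u's server. Assumed load-balance check: a move is only
    considered when the destination hosts fewer masters than the source.
    """
    graph[u].add(v)
    graph[v].add(u)
    su, sv = master[u], master[v]
    options = {"C1: no move": dict(master)}
    if masters_on(master, sv) < masters_on(master, su):
        options[f"C2: move {u}"] = {**master, u: sv}
    if masters_on(master, su) < masters_on(master, sv):
        options[f"C3: move {v}"] = {**master, v: su}
    # min() keeps the first option on ties, so C1 wins unless a move is strictly better.
    label = min(options, key=lambda name: total_replicas(graph, options[name]))
    master.update(options[label])
    return label


g = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}, 5: {6}, 6: {5}}
m = {1: "M1", 2: "M1", 3: "M1", 4: "M1", 5: "M2", 6: "M3"}
print(add_edge(g, m, 1, 6), total_replicas(g, m))
```

In this toy graph, with nodes 1 to 4 mastered on M1, 5 on M2 and 6 on M3, creating edge (1,6) keeps C1 with 4 replicas in total, because moving 1 to M3 would require 6 (node 1's neighbours 2, 3 and 4 would then need replicas on M3 as well).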
18
JA-BE-JA
● Distributed Partitioning Algorithm
● K-way Partitioning
● Load Balancing
● Gossip Learning
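JA-BE-JA's core step can be sketched as a swap test between two nodes. The version below follows our reading of the JA-BE-JA paper: each node holds a color (its partition), d_p(c) counts p's neighbours with color c, and p and q exchange colors when d_p(c_q)^α + d_q(c_p)^α exceeds d_p(c_p)^α + d_q(c_q)^α. The default α = 2 and all function names are assumptions. Because nodes only exchange colors, partition sizes never change, which keeps the K partitions balanced.

```python
def same_color_degree(graph, color, node, c):
    """d_node(c): how many neighbours of `node` currently have color c."""
    return sum(1 for nbr in graph[node] if color[nbr] == c)


def should_swap(graph, color, p, q, alpha=2.0, temperature=1.0):
    """JA-BE-JA-style test: swap colors of p and q if their combined
    same-color degree (raised to alpha) increases."""
    cp, cq = color[p], color[q]
    if cp == cq:
        return False
    swapped = {**color, p: cq, q: cp}  # evaluate on a copy; handles p-q adjacency
    old = (same_color_degree(graph, color, p, cp) ** alpha
           + same_color_degree(graph, color, q, cq) ** alpha)
    new = (same_color_degree(graph, swapped, p, cq) ** alpha
           + same_color_degree(graph, swapped, q, cp) ** alpha)
    return new * temperature > old


def edge_cut(graph, color):
    """Energy to minimise: number of edges whose endpoints have different colors."""
    return sum(1 for u in graph for v in graph[u] if color[u] != color[v]) // 2


# Two triangles {1,2,3} and {4,5,6} joined by edge 3-4, badly colored.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
color = {1: "R", 2: "R", 3: "B", 4: "R", 5: "B", 6: "B"}
print(edge_cut(graph, color))  # 5
if should_swap(graph, color, 3, 4):
    color[3], color[4] = color[4], color[3]
print(edge_cut(graph, color))  # 1
```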
19
JA-BE-JA - Policies
● Sampling
– Local: select neighbors
– Random: select from a random walk
– Hybrid: local & random
● Swapping
– Energy Function: reach its minimum
– Simulated Annealing: escape from local optima
Source: http://socialnetworking.lovetoknow.com/Growth_of_Online_Social_Networking_in_Business
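The sampling policies and the annealing schedule can be combined in a small single-process simulation. This is a sketch under assumptions, not the JA-BE-JA implementation: in the real algorithm each node runs this loop locally with gossip, whereas here one loop visits all nodes. Following our reading of the paper, a node takes the candidate with the highest new utility among those where new × T > old, and T cools linearly from T0 down to 1. The hybrid policy tries neighbours first and falls back to random-walk samples. The values t0 = 2.0, delta = 0.05, walk length 3 and 3 samples are illustrative choices.

```python
import random


def degree_in(graph, color, node, c):
    """Number of neighbours of `node` that currently have color c."""
    return sum(1 for nbr in graph[node] if color[nbr] == c)


def utilities(graph, color, p, q, alpha):
    """(old, new) utility of p and q before and after swapping their colors."""
    cp, cq = color[p], color[q]
    swapped = {**color, p: cq, q: cp}  # copy, so p-q adjacency is counted correctly
    old = degree_in(graph, color, p, cp) ** alpha + degree_in(graph, color, q, cq) ** alpha
    new = degree_in(graph, swapped, p, cq) ** alpha + degree_in(graph, swapped, q, cp) ** alpha
    return old, new


def random_walk(graph, start, length, rng):
    """End point of a short random walk from `start` (random sampling policy)."""
    node = start
    for _ in range(length):
        nbrs = list(graph[node])
        if not nbrs:
            break
        node = rng.choice(nbrs)
    return node


def candidate_pools(graph, p, policy, rng, walk_len=3, samples=3):
    """Local: neighbours. Random: walk end points. Hybrid: neighbours first."""
    if policy in ("local", "hybrid"):
        yield list(graph[p])
    if policy in ("random", "hybrid"):
        yield [random_walk(graph, p, walk_len, rng) for _ in range(samples)]


def find_partner(graph, color, p, policy, temperature, alpha, rng):
    for pool in candidate_pools(graph, p, policy, rng):
        best, highest = None, 0.0
        for q in pool:
            if q == p or color[q] == color[p]:
                continue
            old, new = utilities(graph, color, p, q, alpha)
            # Simulated annealing: temperature > 1 also accepts slightly worse swaps.
            if new * temperature > old and new > highest:
                best, highest = q, new
        if best is not None:
            return best
    return None


def edge_cut(graph, color):
    """Energy to minimise: edges whose endpoints have different colors."""
    return sum(1 for u in graph for v in graph[u] if color[u] != color[v]) // 2


def ja_be_ja(graph, color, rounds=50, t0=2.0, delta=0.05, alpha=2.0,
             policy="hybrid", seed=0):
    rng = random.Random(seed)
    temperature = t0
    for _ in range(rounds):
        for p in list(graph):
            q = find_partner(graph, color, p, policy, temperature, alpha, rng)
            if q is not None:
                color[p], color[q] = color[q], color[p]  # swap keeps partition sizes
        temperature = max(1.0, temperature - delta)  # linear cooling down to T = 1
    return color


graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
start = {1: "R", 2: "R", 3: "B", 4: "R", 5: "B", 6: "B"}
stuck = ja_be_ja(graph, dict(start), policy="local", t0=1.0)
print(edge_cut(graph, stuck))  # 4
result = ja_be_ja(graph, dict(start))
print(edge_cut(graph, result))
```

On this toy graph the local policy without annealing (t0 = 1.0) stops at an edge cut of 4, although the balanced split {1,2,3} / {4,5,6} cuts only one edge. This is the kind of local optimum the random sampling and simulated-annealing options are meant to escape.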
21
Challenges
SPAR
● Global View requirement
● Replication Overhead
● Partition Manager → Single Point of Failure
22
Our Solution
SPAR & JA-BE-JA
● Global View requirement → replaced by a Local View
● Replication Overhead
● Partition Manager as a Single Point of Failure → replaced by a Distributed Partition Manager
23
Our Solution (wait for it...)
[Diagram: Client Requests → SPAR → Data Store Servers]
24
Our Solution
[Diagram: Client Requests → SPAR & JA-BE-JA → Data Store Servers]
26
Implementation
● SPAR
● SPAR-JA
This is SPAR-JA!
27
Datasets
● Facebook Graphs by the Stanford Network Analysis Project (SNAP)
– fcbk-1: 150 nodes, ~3,000 edges
– fcbk-2: 224 nodes, ~6,000 edges
– fcbk-3: 786 nodes, ~60,000 edges
Source: http://snap.stanford.edu/
28
Datasets
● Synthesized Graphs, using our own Graph Generator
– 1,000 nodes, degree 10 (10,000 edges)
[Figure: visualizations of the Randomized, Clustered and Highly Clustered graphs]
Graph visualization tool: https://gephi.org/
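The slides do not describe how the authors' graph generator works. Purely as a hypothetical sketch, graphs with 1,000 nodes and 10 new edges per node (10,000 edges, matching the experiments) and a tunable amount of clustering could be produced by splitting nodes into communities and keeping each new edge inside the node's community with probability `p_intra`. The community count and the mapping of synth-r, synth-c and synth-hc to `p_intra` values are assumptions.

```python
import random


def synth_graph(n=1000, degree=10, communities=10, p_intra=0.9, seed=0):
    """Hypothetical generator: each node adds `degree` new undirected edges, and
    each edge stays inside the node's community with probability p_intra."""
    rng = random.Random(seed)
    # Community c contains the nodes v with v % communities == c.
    members = [list(range(c, n, communities)) for c in range(communities)]
    edges = set()
    for u in range(n):
        added = 0
        while added < degree:
            # Pick a partner inside u's community, or anywhere in the graph.
            pool = members[u % communities] if rng.random() < p_intra else range(n)
            v = rng.choice(pool)
            edge = (min(u, v), max(u, v))
            if v != u and edge not in edges:
                edges.add(edge)
                added += 1
    return edges


def intra_fraction(edges, communities=10):
    """Share of edges whose endpoints belong to the same community."""
    return sum(1 for u, v in edges if u % communities == v % communities) / len(edges)


# Assumed mapping of the three synthesized datasets to p_intra values.
for name, p in [("synth-r", 0.0), ("synth-c", 0.6), ("synth-hc", 0.95)]:
    g = synth_graph(p_intra=p)
    print(name, len(g), round(intra_fraction(g), 2))
```

Every node contributes exactly `degree` distinct edges, so each graph has n × degree edges; only the share of intra-community edges changes between the randomized and highly clustered variants.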
29
Experiments: Replication Overhead on Different Datasets
Synthesized graphs (10,000 edges):
– synth-r: Randomized
– synth-c: Clustered
– synth-hc: Highly Clustered
Facebook graphs:
– fcbk-1: ~3,000 edges
– fcbk-2: ~6,000 edges
– fcbk-3: ~60,000 edges
Setup: K (fault-tolerance replicas) = 0, 4 servers
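Replication overhead is not defined on these slides. Assuming it means the average number of replicas per node, it can be computed from a master placement as below. The sketch combines SPAR's local-semantics rule (a replica of a node on every other server hosting a neighbour's master) with the K-replica fault-tolerance floor. The function name and the toy placement are illustrative.

```python
def replication_overhead(graph, master, k=0):
    """Average replicas per node: a replica on every other server hosting the
    master of one of the node's neighbours (local semantics), and at least k
    replicas per node for fault tolerance."""
    total = 0
    for v, nbrs in graph.items():
        needed = {master[u] for u in nbrs} - {master[v]}
        total += max(k, len(needed))
    return total / len(graph)


graph = {1: {2, 3, 4, 6}, 2: {1}, 3: {1}, 4: {1}, 5: {6}, 6: {1, 5}}
master = {1: "M1", 2: "M1", 3: "M1", 4: "M1", 5: "M2", 6: "M3"}
for k in (0, 2):
    print(k, round(replication_overhead(graph, master, k), 3))
# 0 0.667
# 2 2.0
```

With K = 2 the fault-tolerance floor dominates in this toy placement: every node needs exactly two replicas, so placement quality no longer changes the result here.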
30
Experiments: Replication Overhead vs Replication Factor
[Plot panels: K=0 and K=2]
31
Experiments: Replication Overhead of Both Algorithms
Fault tolerance: K=2
Dataset synth-hc: highly clustered synthesized graph, 10,000 edges
32
Experiments: Replication Overhead of Both Algorithms
Fault tolerance: K=2
Dataset fcbk-3: third Facebook graph, ~60,000 edges
33
Conclusions
● SPAR + JA-BE-JA = SPAR-JA
– Highly clustered nodes
– Achieves fault tolerance by default
– Better than SPAR on highly clustered graphs
● Future Work
– More datasets
– Bigger datasets