Scaling Online Social Networks (OSNs)


The final presentation of a semester project. Course: Implementation of Distributed Systems (KTH Royal Institute of Technology)


Presented by: Maria Stylianou
Coworker: Anis Uddin

Supervisor: Šarūnas Girdzijauskas

KTH - Royal Institute of Technology
Implementation of Distributed Systems

December 6th, 2012


Outline

● Motivation
● Current Algorithms
– SPAR
– JA-BE-JA
● Contributions
– Challenges
– Solution
● Evaluation & Conclusions


Online Social Networks

“Pandora's box”

Motivation-Algorithms-Contribution-Evaluation

Source: http://technorati.com/social-media/article/social-networks-theyre-what-every-local/

Online Social Networks

Easy to maintain...

Source: http://mastersofmedia.hum.uva.nl/2009/09/14/a-review-of-taken-out-of-context/

Online Social Networks

...or not!

Source: http://mastersofmedia.hum.uva.nl/2009/09/14/a-review-of-taken-out-of-context/

Scaling Approaches

Vertical Scaling
● Full Replication
● Data Locality
● But:
– Expensive
– Saturation

Horizontal Scaling
● Adding servers
● Clean & Disjoint Partitions
● But:
– Not applicable in OSNs

Scaling Approaches → Inefficient

Existing 'Solutions' for OSNs

● Relational Databases
● Key-Value Stores

Existing 'Solutions' for OSNs → Inefficient


SPAR

Social Partitioning & Replication middleware

● Transparent OSN scalability
● Data Locality
● Load Balancing
→ avoids performance bottlenecks
● Fault Tolerance
● Stability
● Replication Overhead Minimization

SPAR: Events

● Nodes – Add/Remove
● Edges – Add/Remove
● Servers – Add/Remove

SPAR Algorithm

Create Edge (1,6)

[Figure: nodes 1 to 6 placed on servers M1, M2 and M3. Legend: Master Node, Replica Node (primed labels such as 1' and 6').]

Candidate configurations shown for the new edge:
– C1: Create 6' in M1, create 1' in M3
– C2: Move 1 to M3 (replicas highlighted in the figure: 1', 2', 3', 4')
– C3: Move 6 to M1
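The three options on these slides (C1: keep both masters and add replicas, C2: move 1 to M3, C3: move 6 to M1) can be compared by counting the replicas each one would require. The Python sketch below is only an illustration under stated assumptions: the edge list is hypothetical rather than the graph in the figure, the fault-tolerance factor is K=0, load balancing is ignored, and the names `replica_count` and `compare_configurations` are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def replica_count(masters: Dict[int, str], edges: Iterable[Edge]) -> int:
    """Count replicas needed so every master has all its neighbors locally."""
    replicas = set()  # (node, server) pairs: a replica of `node` on `server`
    for u, v in edges:
        if masters[u] != masters[v]:
            replicas.add((u, masters[v]))  # v's server needs a copy of u
            replicas.add((v, masters[u]))  # u's server needs a copy of v
    return len(replicas)


def compare_configurations(
    masters: Dict[int, str], edges: List[Edge], u: int, v: int
) -> Dict[str, int]:
    """Replica cost of the three options when the edge (u, v) is created."""
    new_edges = edges + [(u, v)]
    candidates = {
        "C1": dict(masters),                 # keep masters, add replicas
        "C2": {**masters, u: masters[v]},    # move u to v's server
        "C3": {**masters, v: masters[u]},    # move v to u's server
    }
    return {name: replica_count(m, new_edges) for name, m in candidates.items()}


# Hypothetical toy graph (NOT the one drawn on the slides).
masters = {1: "M1", 2: "M1", 3: "M1", 4: "M1", 5: "M2", 6: "M3"}
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (5, 6)]

costs = compare_configurations(masters, edges, 1, 6)
best = min(costs, key=costs.get)
print(costs, "-> fewest replicas:", best)
```

With this toy graph the counts are 6, 7 and 3, so C3 needs the fewest replicas; a different graph can favor a different option.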

JA-BE-JA

● Distributed Partitioning Algorithm
● K-way Partitioning
● Load Balancing
● Gossip Learning

JA-BE-JA - Policies

● Sampling
– Local: select neighbors
– Random: select from random walk
– Hybrid: Local & Random

● Swapping
– Energy Function: reach minimum
– Simulated Annealing: escape from local optima

Source: http://socialnetworking.lovetoknow.com/Growth_of_Online_Social_Networking_in_Business
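The swapping policy can be illustrated with a short Python sketch. It assumes that the energy being minimized is the number of edges whose endpoints lie in different partitions, and that two nodes exchange partitions when the exchange gives them more same-partition neighbors; a temperature factor above 1 stands in for simulated annealing by also accepting some non-improving swaps. The function names, the exponent `alpha`, the cooling step `delta`, and the toy graph are assumptions for illustration, not code from the project.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]
Colors = Dict[Hashable, str]


def neighbors_with_color(node, color: str, graph: Graph, colors: Colors) -> int:
    """Count the neighbors of `node` currently assigned to partition `color`."""
    return sum(1 for n in graph[node] if colors[n] == color)


def edge_cut(graph: Graph, colors: Colors) -> int:
    """Energy of the partitioning: edges whose endpoints differ in color."""
    cut = sum(1 for u in graph for v in graph[u] if colors[u] != colors[v])
    return cut // 2  # symmetric adjacency lists see each edge twice


def try_swap(p, q, graph: Graph, colors: Colors,
             temperature: float = 1.0, alpha: float = 2.0) -> bool:
    """Exchange the colors of p and q if that looks beneficial.

    Swapping colors (instead of moving one node) keeps partition sizes
    unchanged, which is how load balance is preserved.
    """
    cp, cq = colors[p], colors[q]
    if cp == cq:
        return False
    old = (neighbors_with_color(p, cp, graph, colors) ** alpha
           + neighbors_with_color(q, cq, graph, colors) ** alpha)
    new = (neighbors_with_color(p, cq, graph, colors) ** alpha
           + neighbors_with_color(q, cp, graph, colors) ** alpha)
    if new * temperature > old:
        colors[p], colors[q] = cq, cp
        return True
    return False


def cool(temperature: float, delta: float = 0.1) -> float:
    """Lower the temperature each round, never below 1 (no more uphill moves)."""
    return max(1.0, temperature - delta)


# Toy graph: two edges, each initially split across partitions A and B.
graph = {0: [1], 1: [0], 2: [3], 3: [2]}
colors = {0: "A", 1: "B", 2: "B", 3: "A"}

cut_before = edge_cut(graph, colors)
swapped = try_swap(1, 3, graph, colors)
cut_after = edge_cut(graph, colors)
print(cut_before, swapped, cut_after)
```

The partner `q` would come from one of the sampling policies above (a neighbor, a node reached by a random walk, or a mix of both); the sketch leaves that choice to the caller.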


Challenges

SPAR:
● Global View requirement
● Replication Overhead
● Partition Manager → Single Point of Failure

Our Solution

SPAR & JA-BE-JA:
● Global View requirement → Local View
● Replication Overhead
● Partition Manager (Single Point of Failure) → Distributed Partition Manager

Our Solution (wait for it...)

[Diagram labels: Client Requests, SPAR, Data Store Servers]

Our Solution

[Diagram labels: Client Requests, SPAR & JA-BE-JA, Data Store Servers]


Implementation

● SPAR
● SPAR-JA

This is SPARJA!

Datasets

● Facebook Graphs by Stanford Network Analysis Project
– #nodes: 150, #edges: ~3000
– #nodes: 224, #edges: ~6000
– #nodes: 786, #edges: ~60000

Source: http://snap.stanford.edu/

Datasets

● Synthesized Graphs
– using our own Graph Generator
– #nodes: 1000, #degree: 10
– types: Randomized, Clustered, Highly Clustered

Graph Visualization Tool: https://gephi.org/
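The slides do not describe how the graph generator works. As a purely hypothetical illustration of how randomized and clustered graphs of this kind could be produced, the Python sketch below lets each node pick `degree` neighbors, choosing from its own cluster with probability `p_intra`. Every name and parameter here (`generate_graph`, `clusters`, `p_intra`, `seed`) is an assumption, and duplicate picks are merged, so the edge count can fall below nodes × degree.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def generate_graph(n: int = 1000, degree: int = 10, clusters: int = 10,
                   p_intra: float = 0.9, seed: int = 0):
    """Return (edges, cluster_of) for an undirected synthetic graph.

    p_intra = 0.0 gives a randomized graph; values near 1.0 give a
    highly clustered one. Hypothetical scheme, not the authors' generator.
    """
    rng = random.Random(seed)
    cluster_of: List[int] = [i % clusters for i in range(n)]
    members = [[i for i in range(n) if cluster_of[i] == c]
               for c in range(clusters)]
    edges: Set[Edge] = set()
    for u in range(n):
        for _ in range(degree):
            # Pick inside the node's own cluster or anywhere in the graph.
            pool = members[cluster_of[u]] if rng.random() < p_intra else range(n)
            v = rng.choice(pool)
            if v != u:  # skip self-loops
                edges.add((min(u, v), max(u, v)))
    return edges, cluster_of


def intra_cluster_fraction(edges: Set[Edge], cluster_of: List[int]) -> float:
    """Share of edges whose two endpoints are in the same cluster."""
    return sum(cluster_of[u] == cluster_of[v] for u, v in edges) / len(edges)


randomized, groups = generate_graph(n=200, p_intra=0.0, seed=1)
highly_clustered, _ = generate_graph(n=200, p_intra=0.95, seed=1)
print(intra_cluster_fraction(randomized, groups),
      intra_cluster_fraction(highly_clustered, groups))
```

The resulting edge list can be exported and inspected in a visualization tool such as Gephi to check that the clusters look as intended.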

Experiments: Replication Overhead on Different Datasets

Synthesized Graphs (10000 edges)
– synth-r: Randomized
– synth-c: Clustered
– synth-hc: Highly Clustered

Facebook Graphs
– fcbk-1: ~3000 edges
– fcbk-2: ~6000 edges
– fcbk-3: ~60000 edges

#k-replicas: 0 (fault tolerance), #Servers: 4
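The slides do not spell out how replication overhead is measured. One common definition, assumed here, is the average number of replicas per node: each node needs a replica on every other server that hosts the master of one of its neighbors, and at least K replicas in total for fault tolerance. The Python sketch below computes that figure for a toy placement; the function name, the placement, and the edge list are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def replication_overhead(masters: Dict[int, str], edges: Iterable[Edge],
                         servers: List[str], k: int = 0) -> float:
    """Average replicas per node (assumed definition, see lead-in)."""
    replica_servers = {node: set() for node in masters}
    for u, v in edges:
        if masters[u] != masters[v]:
            replica_servers[u].add(masters[v])  # copy of u next to v
            replica_servers[v].add(masters[u])  # copy of v next to u
    other_servers = len(servers) - 1  # a replica never shares the master's server
    total = 0
    for placed in replica_servers.values():
        # Locality replicas count toward the K required for fault tolerance.
        total += min(max(len(placed), k), other_servers)
    return total / len(masters)


# Toy placement on 4 servers (hypothetical data).
servers = ["S1", "S2", "S3", "S4"]
masters = {1: "S1", 2: "S1", 3: "S2", 4: "S3"}
edges = [(1, 2), (2, 3)]

overhead_k0 = replication_overhead(masters, edges, servers, k=0)
overhead_k2 = replication_overhead(masters, edges, servers, k=2)
print(overhead_k0, overhead_k2)  # 0.5 2.0
```

Under this definition, a node that already has K or more locality replicas pays nothing extra for fault tolerance.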

Experiments: Replication Overhead vs Replication Factor

[Chart legend: K=0, K=2]

Experiments: Replication Overhead on both algorithms

Fault Tolerance: K=2

synth-hc:
– Highly Clustered
– Synthesized Graph
– 10000 edges

Experiments: Replication Overhead on both algorithms

Fault Tolerance: K=2

fcbk-3:
– 3rd Facebook graph
– 60,000 edges

Conclusions

● SPAR + JA-BE-JA = SPAR-JA
– Highly clustered nodes
– Achieves fault tolerance 'by default'
– Better than SPAR when the graph is highly clustered
● Future Work
– More datasets
– Bigger datasets
