
Decentralized and Dynamic Community Formation in P2P Networks & Performance of Community Based Caching

Chepchumba S. Limo

May 6, 2015


Committee Members:

Anura Jayasumana (Advisor)

Liuqing Yang

Christos Papadopoulos

MSc. Thesis Defense

Overview

• Introduction

• Motivation and Problem Statement

• Dynamic Group Discovery Algorithm

• Simulation Implementation

• Case Studies and Results

• Conclusion and Future Work


Introduction: Peer-to-Peer Networks


• Example of overlay network

• File transfer P2P networks

– Lookup/resource discovery

• P2P messaging

– PUT <key, value>

– GET(key) → value

• GET(key) → node ID

• GET(key) → data

• Mitigate resource discovery

– Distributed Hash Tables (DHTs)

– Caching Schemes

Key    Value
123    Jack & Jill
A34    Avengers
BC5    Spiderman
24F    Beyoncé

GET(123) → Jack & Jill
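The PUT/GET interface on this slide can be illustrated with a minimal single-process sketch. The class name `ToyStore` is hypothetical, and a real DHT would spread the table across many nodes rather than hold it in one dictionary.

```python
class ToyStore:
    """Single-process stand-in for a DHT's PUT/GET interface (illustrative only)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # PUT <key, value>: store the value under the key.
        self._table[key] = value

    def get(self, key):
        # GET(key): return the stored value, or None when the key is unknown.
        return self._table.get(key)


store = ToyStore()
for key, value in [("123", "Jack & Jill"), ("A34", "Avengers"),
                   ("BC5", "Spiderman"), ("24F", "Beyoncé")]:
    store.put(key, value)

print(store.get("123"))  # prints: Jack & Jill
```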

Introduction: Caching


• Distributed Hash Tables (DHT)

– Efficient

– Highly scalable

– Self organizing

• Caching

– Favors resources that are popular relative to the entire network

– But traffic is modeled by Zipf’s distribution

Introduction: Caching


• A subset of nodes that share similar interests is said to form a community

• Communities exist naturally

• Community Based Caching (CBC) algorithm proposed

– Exploits existence of communities when caching

– More nodes benefit from caching

Introduction: Communities


[Figure: communities in a structured P2P network (Chord) and in unstructured P2P networks]

Introduction: Communities


• In file sharing P2P networks:

– Node_A interested in music and software

– Node_B interested in music and movies

– Node_A and Node_B form music community (community appears)

– Node_C with music and software interest joins network

– Node_C should join music community with Node_A and Node_B (community grows)

– Node_A, Node_B, Node_C leave network (community disappears)

• Properties of communities:

1. Naturally occurring

2. Dynamic

3. Nodes/users can belong to multiple communities

4. Nodes/users can join/leave communities at will


Motivation


1. Community Based Caching (CBC) algorithm tested under limiting conditions, i.e.:

i. Static community assignment

ii. Nodes/users couldn’t change membership

iii. Nodes limited to being members of only one community

iv. Community membership based on websites queried (arguably weak similarity)

Motivation


2. Limitations of existing dynamic community formation algorithms:

i. Centralized node for maintenance

ii. Complicated computations

iii. Additional messaging to establish community membership

iv. Limited to being members of one community at a time

3. Basis for establishing similarities for community formation:

i. Websites queried – weak

ii. Personal interests

iii. Acquired interests

Problem Statement


• Motivation summary:

i. CBC tested under stringent conditions

ii. Existing algorithms have limitations

iii. Consider other bases of community formation

• Contribution

– Decentralized community discovery algorithm

• Considers community properties, i.e., naturally occurring, dynamic, members of multiple communities, and join/leave communities at will

• Overcomes limitations of existing algorithms

• Utilizes existing PUT and GET messages – no additional messaging needed

– Special key generation technique

• Disseminates group information

– Test CBC under more realistic conditions

• Network with churn


Dynamic Group Discovery (DGD)


• Key has embedded meta-data on the type of resource it represents

• No additional security risk added if algorithm is publicly known



• Key generation

– Last 12 bits of the key are needed for the three levels of group identification:

• Level 1 (mandatory): general classification, e.g., music, movies, etc.

• Level 2 (optional): specify geographical location, e.g., U.S.A., Canada, etc.

• Level 3 (optional): specify genre, e.g., comedy, jazz, etc.

Key               Group ID Level 1  Group ID Level 2  Group ID Level 3  Final Key
0123456789abcdef  music => 1        Canada => 2       hip-hop => 3      0123456789abcdef 123
a123456789bcdef0  music => 1        N/A => 0          blues => 9        a123456789bcdef0 109
b123456789bcdef0  movies => 3       USA => 3          comedy => 6       b123456789bcdef0 236
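The key layout in the table can be sketched as follows. This is only an illustration under stated assumptions: each level is taken to occupy one hex digit (4 bits, 12 bits in total), 0 stands for an unspecified optional level, and the three digits are appended directly to the base key. The function names `append_group_id` and `extract_group_id` are hypothetical.

```python
def append_group_id(key_hex, level1, level2=0, level3=0):
    """Append three 4-bit group ID levels (12 bits) to a hex key string."""
    for level in (level1, level2, level3):
        if not 0 <= level <= 0xF:
            raise ValueError("each level must fit in 4 bits")
    return key_hex + format(level1, "x") + format(level2, "x") + format(level3, "x")


def extract_group_id(final_key_hex):
    """Return (level1, level2, level3) read from the last three hex digits."""
    return tuple(int(digit, 16) for digit in final_key_hex[-3:])


# First row of the table: music => 1, Canada => 2, hip-hop => 3.
final_key = append_group_id("0123456789abcdef", level1=1, level2=2, level3=3)
print(final_key)                    # 0123456789abcdef123
print(extract_group_id(final_key))  # (1, 2, 3)
```

Because the group ID sits at a fixed position in the key, any node that forwards a message can read it without extra messaging.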



• Built on top of structured P2P

– Guaranteed performance compared to unstructured

– Chord used



• Goal of DGD is to allow community formation in structured P2P networks



• Establish group interest

– Personal interests

• Two thresholds used:

– λ = threshold for personal interests

– μ = threshold for acquired interests

• λ << μ

void forward(key, msg, nextHop*)
{
    if msg type = GET
    {
        extract group ID from key;
        if group ID is in group ID finger table and finger is not pointing to me // is there interest
        {
            if hops < log₂(N)/2
            {
                set nextHop using entries in group ID finger table;
            }
            else
            {
                use Chord to set nextHop;
            }
        }
        else
        {
            keep track of specific GET message;
            use Chord to set nextHop;
            if specific GET messages received λ OR μ times
                send FIND GROUP request;
        }
    }
}

Group ID  Finger To Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1
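The forwarding rule on this slide can be sketched in a few lines. This is a hedged illustration, not the simulator code: the `Node` class, `chord_next_hop`, and `find_group_requests` are hypothetical names, ordinary Chord routing is reduced to a placeholder, and the first usable finger is chosen arbitrarily. It also assumes that a group ID whose finger points back to the node itself marks a personal interest (threshold λ) and that any other group ID is a candidate acquired interest (threshold μ).

```python
import math
from collections import Counter

LAMBDA = 2   # personal-interest threshold (value used in the case studies)
MU = 20      # acquired-interest threshold (value used in the case studies)


class Node:
    def __init__(self, name, network_size):
        self.name = name
        self.network_size = network_size
        self.group_fingers = {}        # group ID -> list of member node names
        self.get_counts = Counter()    # group ID -> GETs seen without a usable finger
        self.find_group_requests = []  # group IDs for which FIND GROUP was sent

    def chord_next_hop(self, key):
        # Placeholder for ordinary Chord routing.
        return "chord-successor"

    def forward_get(self, key, hops):
        group_id = key[-3:]  # last 12 bits, written as three hex digits
        members = self.group_fingers.get(group_id, [])
        fingers = [m for m in members if m != self.name]
        if fingers:
            # Use group fingers only during the first half of the expected path.
            if hops < math.log2(self.network_size) / 2:
                return fingers[0]
            return self.chord_next_hop(key)
        # No usable finger: count the GET and fall back to Chord.
        self.get_counts[group_id] += 1
        threshold = LAMBDA if self.name in members else MU
        if self.get_counts[group_id] == threshold:
            self.find_group_requests.append(group_id)
        return self.chord_next_hop(key)


node = Node("A", network_size=2000)
node.group_fingers = {"120": ["A"], "122": ["X", "Y", "Z"]}
print(node.forward_get("0123456789abcdef122", hops=2))  # X
print(node.forward_get("0123456789abcdef122", hops=6))  # chord-successor
```

With 2,000 nodes the hop bound log₂(N)/2 is about 5.5, so a GET for group 122 follows a group finger at hop 2 but reverts to Chord at hop 6.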



• Maintaining group ID finger table

– Limited resources

• σ = max number of fingers

Group ID  Finger To Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1

void handle_FINDGROUP_response(groupID, finger)
{
    if <group ID, finger> pair already exists // done to avoid duplicates
        return;
    if group ID finger table is at capacity
    {
        if one least used finger can be identified
            delete it;
        else // i.e. multiple fingers with same low frequency-of-use number
            pick one at random and delete;
    }
    add new found finger;
}

σ = 3
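The eviction rule above can be sketched as a least-frequently-used replacement with a random tie-break. The slide does not state whether σ limits fingers per group ID or per table, so this sketch simply applies the capacity to whatever list of entries it is given. The function name, the list-of-lists layout, and the starting frequency of 0 for a new finger are assumptions for illustration.

```python
import random


def handle_findgroup_response(table, group_id, finger, capacity, rng=random):
    """table: list of [group_id, finger, frequency_of_use] entries, edited in place."""
    if any(g == group_id and f == finger for g, f, _ in table):
        return  # pair already present: avoid duplicates
    if len(table) >= capacity:
        lowest = min(freq for _, _, freq in table)
        candidates = [entry for entry in table if entry[2] == lowest]
        # A single candidate is removed directly; ties are broken at random.
        table.remove(rng.choice(candidates))
    table.append([group_id, finger, 0])


# Fingers for group 122 from the table, with capacity 3.
fingers_122 = [["122", "node X", 0], ["122", "node Y", 7], ["122", "node Z", 9]]
handle_findgroup_response(fingers_122, "122", "node W", capacity=3)
print([finger for _, finger, _ in fingers_122])  # ['node Y', 'node Z', 'node W']
```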


Simulation Implementation


• OverSim

– Flexible network simulation framework

– Popular event-driven simulator for P2P networks

• Keys and queries generated external to the simulator

– Able to indirectly control community size and symmetry

• Keys and queries generation:

1. Determine desired symmetry and size

2. Generate random keys with desired symmetry

3. Sort keys based on level 1 identification

4. Assign Zipf’s α parameter per community

5. Generate queries
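Steps 3 to 5 above can be sketched as follows. This is an illustrative reconstruction, not the generator used for the thesis: the function names, the fixed seed, and the choice to rank keys by their position in each community's list are assumptions. It relies on keys whose last three hex digits carry the group ID, with level 1 as the first of those digits.

```python
import random


def group_by_level1(final_keys):
    """Step 3: bucket keys by their level 1 digit (third hex digit from the end)."""
    groups = {}
    for key in final_keys:
        groups.setdefault(key[-3], []).append(key)
    return groups


def zipf_weights(n, alpha):
    """Unnormalized Zipf weights: the key at rank r gets weight 1 / r**alpha."""
    return [1.0 / rank ** alpha for rank in range(1, n + 1)]


def generate_queries(keys_by_community, alpha_by_community, queries_per_community, seed=0):
    """Steps 4 and 5: draw queries per community using that community's alpha."""
    rng = random.Random(seed)
    queries = {}
    for community, keys in keys_by_community.items():
        weights = zipf_weights(len(keys), alpha_by_community[community])
        queries[community] = rng.choices(keys, weights=weights, k=queries_per_community)
    return queries


keys = group_by_level1(["0123456789abcdef123", "a123456789bcdef0109",
                        "b123456789bcdef0236"])
queries = generate_queries(keys, {"1": 1.0, "2": 0.8}, queries_per_community=5)
print(sorted(keys))  # ['1', '2']
```

A larger α concentrates a community's queries on its top-ranked keys, which is what makes caching popular keys worthwhile.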



void handle_PUT_event()
{
    key = read from key file;
    if key is unspecified // i.e. all keys have been read from key file
    {
        schedule GET event;
    }
    else
    {
        extract group ID information from key;
        add entry to group ID finger table;
        create PUT message and send it out to network;
        schedule next PUT event;
    }
}



void handle_GET_event()
{
    select group ID file to read from; // based on personal interest
    query = read next key from query file;
    if query is unspecified // i.e. all keys have been read from query file
    {
        return; // do nothing
    }
    else
    {
        create GET message with query;
        send out GET message;
        schedule next GET event;
    }
}



void handle_REMOVE_request(groupID, finger, maxHops, curHops)
{
    if <group ID, finger> pair exists
        delete entry;
    if curHops < maxHops
    {
        curHops++;
        forward message to all nodes in group ID finger table;
    }
}
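The hop-limited REMOVE propagation can be sketched on a small in-memory network. This is an illustration under assumptions: each node's table is modeled as a set of (group ID, finger) pairs, the message is forwarded to every member still listed in the table regardless of group, and nodes that have already left are skipped. The `nodes` dictionary and function name are hypothetical.

```python
def handle_remove_request(nodes, node_name, group_id, finger, max_hops, cur_hops=0):
    """nodes: dict mapping node name -> set of (group_id, finger) pairs."""
    table = nodes[node_name]
    table.discard((group_id, finger))  # delete the entry if it exists
    if cur_hops < max_hops:
        # Forward to every other node this node still lists as a group member.
        for _, member in sorted(table):
            if member in nodes and member != node_name:
                handle_remove_request(nodes, member, group_id, finger,
                                      max_hops, cur_hops + 1)


# Node X has left; A learns of it and spreads the removal up to two hops away.
nodes = {
    "A": {("122", "X"), ("122", "Y")},
    "Y": {("122", "X"), ("122", "Z")},
    "Z": {("122", "X")},
}
handle_remove_request(nodes, "A", "122", "X", max_hops=2)
print(nodes)  # {'A': {('122', 'Y')}, 'Y': {('122', 'Z')}, 'Z': set()}
```

The hop limit bounds the extra traffic: with `max_hops=1` in the same example, node Z would keep its stale entry for X.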


Case 1: Varying Number of Nodes


• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally among groups 3 to 9

• Maximum group ID finger table size per node = 160

• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities
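The key distribution above works out to concrete counts as follows. Because 20% of 40,000 keys (8,000) does not divide evenly among seven groups, this sketch hands the leftover keys to the lowest-numbered groups; that tie-handling and the function name `keys_per_group` are assumptions, not something the slides specify.

```python
TOTAL_KEYS = 40_000


def keys_per_group(total_keys):
    """Split keys 40% / 40% / 20%, the last share spread over groups 3 to 9."""
    counts = {1: total_keys * 40 // 100, 2: total_keys * 40 // 100}
    remaining = total_keys - counts[1] - counts[2]
    base, extra = divmod(remaining, 7)
    for offset, group in enumerate(range(3, 10)):
        # The first `extra` small groups absorb one leftover key each.
        counts[group] = base + (1 if offset < extra else 0)
    return counts


counts = keys_per_group(TOTAL_KEYS)
print(counts[1], counts[2], counts[3], counts[9])  # 16000 16000 1143 1142
```

Each large community therefore holds roughly fourteen times as many keys as a small one, which is the asymmetry the case study exercises.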

Case 1: Varying Number of Nodes


Case 2: Varying Community Size


• 2,000 nodes

• 40,000 keys divided into 2 sections, i.e., 80% in section 1 and 20% in section 2

– Run 1: one community in section 1 and eight communities in section 2

– Run 2: two communities in section 1 and seven communities in section 2

– Run 3: …


• λ = 2

• μ = 20

• σ = 3

Case 2: Varying Community Size


Case 3: Introducing Churn


• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally among groups 3 to 9


• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities

Case 3: Introducing Churn



Conclusion


1. Decentralized dynamic group discovery algorithm

2. Key generation with embedded group ID information

3. Improve lookup performance for queries resolved using cache data

• Stronger community basis i.e. personal and acquired interests

• Without churn

4. Easy implementation of dynamic group discovery

• Utilizes already existing messages

• Additional computation – extracting group ID information

5. Great potential for robust caching solution

Future Work


1. Optimize entries in group ID finger table

• Consider distance of new finger relative to other fingers in other tables

2. Consider location of next hop of group member

• Applicable for structured P2P networks

3. DGD needs to know exactly how many nodes are in the network

• if finger not pointing to me and hops < log₂(N)/2

• Find solution to determine number of nodes in network with churn

4. Introduce churn in measurable manner

• Better characterize DGD’s performance

5. Test DGD with other types of P2P networks

Thank you!

• Dr. Jayasumana

• Liuqing Yang

• Christos Papadopoulos

• Friends and co-workers at CSU and Dot Hill

• Family


QUESTIONS?

