Decentralized and Dynamic Community Formation in P2P Networks & Performance of Community Based Caching
Chepchumba S. Limo
May 6, 2015
Committee Members:
Anura Jayasumana (Advisor)
Liuqing Yang
Christos Papadopoulos
MSc. Thesis Defense
Overview
• Introduction
• Motivation and Problem Statement
• Dynamic Group Discovery Algorithm
• Simulation Implementation
• Case Studies and Results
• Conclusion and Future Work
Introduction: Peer-to-Peer Networks
• Example of overlay network
• File transfer P2P networks
– Lookup/resource discovery
• P2P messaging
– PUT <key, value>
– GET(key) → value
  • GET(key) → node ID
  • GET(key) → data
• Mitigate resource discovery cost
– Distributed Hash Tables (DHTs)
– Caching schemes

Key    Value
123    Jack & Jill
A34    Avengers
BC5    Spiderman
24F    Beyoncé

Example: GET(123) → Jack & Jill
Introduction: Caching
• Distributed Hash Tables (DHT)
– Efficient
– Highly scalable
– Self organizing
• Caching
– Favor popular resources relative to entire network
– But traffic is modeled by Zipf’s distribution
Introduction: Caching
• A subset of nodes that share similar interests is said to form a community
• Communities exist naturally
• Community Based Caching (CBC) algorithm proposed
– Exploits existence of communities when caching
– More nodes benefit from caching
Introduction: Communities
[Figure: communities in a structured P2P network (Chord) and in unstructured P2P networks]
Introduction: Communities
• In file sharing P2P networks:
– Node_A interested in music and software
– Node_B interested in music and movies
– Node_A and Node_B form music community (community appears)
– Node_C with music and software interest joins network
– Node_C should join music community with Node_A and Node_B (community grows)
– Node_A, Node_B, Node_C leave network (community disappears)
• Properties of communities:
1. Naturally occurring
2. Dynamic
3. Nodes/users can belong to multiple communities
4. Nodes/users can join/leave communities at will
Motivation
1. Community Based Caching (CBC) algorithm was tested only under limiting conditions:
i. Static community assignment
ii. Nodes/users could not change membership
iii. Nodes limited to membership of only one community
iv. Community membership based on websites queried (arguably weak similarity)
Motivation
2. Limitations of existing dynamic community formation algorithms:
i. Centralized node for maintenance
ii. Complicated computations
iii. Additional messaging to establish community membership
iv. Nodes limited to membership of one community at a time
3. Basis for establishing similarity for community formation:
i. Websites queried (weak)
ii. Personal interests
iii. Acquired interests
Problem Statement
• Motivation summary:
i. CBC tested under stringent conditions
ii. Existing algorithms have limitations
iii. Consider other bases for community formation
• Contribution
– Decentralized community discovery algorithm
  • Considers community properties, i.e., naturally occurring, dynamic, members of multiple communities, and join/leave communities at will
  • Overcomes limitations of existing algorithms
  • Utilizes already existing PUT and GET messages; no additional messaging needed
– Special key generation technique
  • Disseminates group information
– Test CBC under more realistic conditions
  • Network with churn
Dynamic Group Discovery (DGD)
• Key has embedded meta-data on the type of resource it represents
• No additional security risk added if algorithm is publicly known
Dynamic Group Discovery (DGD)
• Key generation
– Last 12 bits of the key carry the three levels of group identification:
  • Level 1 (mandatory): general classification, e.g., music, movies
  • Level 2 (optional): geographical location, e.g., U.S.A., Canada
  • Level 3 (optional): genre, e.g., comedy, jazz

Key               Group ID Level 1  Group ID Level 2  Group ID Level 3  Final Key
0123456789abcdef  music => 1        Canada => 2       hip-hop => 3      0123456789abcdef 123
a123456789bcdef0  music => 1        N/A => 0          blues => 9        a123456789bcdef0 109
b123456789bcdef0  movies => 3       USA => 3          comedy => 6       b123456789bcdef0 236
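
The key layout above can be sketched in Python. This is only an illustration under stated assumptions: each level is taken to occupy 4 of the 12 bits (one hexadecimal digit), which matches the three-digit group IDs in the table, and the final key is written without the separating space. The function names `make_final_key` and `extract_group_id` are hypothetical and do not come from the thesis.

```python
LEVEL_BITS = 4  # assumption: one hex digit per level, 3 levels = 12 bits


def make_final_key(base_key_hex, level1, level2=0, level3=0):
    """Append a three-level group ID to a hexadecimal base key."""
    if not 1 <= level1 <= 0xF:
        raise ValueError("level 1 is mandatory and must fit in 4 bits")
    for level in (level2, level3):
        if not 0 <= level <= 0xF:
            raise ValueError("optional levels must fit in 4 bits (0 = N/A)")
    group_id = (level1 << 2 * LEVEL_BITS) | (level2 << LEVEL_BITS) | level3
    # Three hex digits are appended, so the group ID is the last 12 bits.
    return f"{base_key_hex}{group_id:03x}"


def extract_group_id(final_key_hex):
    """Read the three levels back from the last 12 bits of a key."""
    group_id = int(final_key_hex, 16) & 0xFFF
    level1 = (group_id >> 2 * LEVEL_BITS) & 0xF
    level2 = (group_id >> LEVEL_BITS) & 0xF
    level3 = group_id & 0xF
    return level1, level2, level3
```

For example, `make_final_key("0123456789abcdef", 1, 2, 3)` yields the first row's final key, and `extract_group_id` recovers `(1, 2, 3)` from it using only bit masking, which is the extra computation a forwarding node would need.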
Dynamic Group Discovery (DGD)
• Built on top of structured P2P
– Guaranteed performance compared to unstructured
– Chord used
Dynamic Group Discovery (DGD)
• Goal of DGD is to allow community formation in structured P2P networks
Dynamic Group Discovery (DGD)
• Establish group interest
– Personal interests
• Two thresholds used:
– λ = threshold for personal interests
– μ = threshold for acquired interests
• λ << μ
void forward(key, msg, nextHop*)
{
    if msg type == GET
    {
        extract group ID from key;
        if group ID is in group ID finger table and finger is not pointing to me // is there interest
        {
            if hops < (log2 N)/2
            {
                set nextHop using entries in group ID finger table;
            }
            else
            {
                use Chord to set nextHop;
            }
        }
        else
        {
            keep track of specific GET message;
            use Chord to set nextHop;
            if specific GET messages received λ OR μ times
                send FIND GROUP request;
        }
    }
}
Group ID  Finger To Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1
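
The forwarding decision can be sketched in Python as follows. This is a hedged illustration, not the thesis implementation: the class `DgdNode`, the method `forward_get`, and the flag `originated_here` are hypothetical names. It assumes that λ applies to GETs the node issues itself (personal interest) and μ to GETs it merely forwards (acquired interest), that GETs are counted per group ID, and that the first usable finger is chosen, since the slides do not say how a finger is picked.

```python
import math
from collections import Counter


class DgdNode:
    def __init__(self, node_id, num_nodes, lam, mu):
        self.node_id = node_id
        self.num_nodes = num_nodes          # N, assumed to be known
        self.lam = lam                      # threshold for personal interests
        self.mu = mu                        # threshold for acquired interests
        self.group_fingers = {}             # group ID -> list of member node IDs
        self.own_gets = Counter()           # GETs this node issued, per group ID
        self.forwarded_gets = Counter()     # GETs this node forwarded, per group ID
        self.find_group_requests = []       # group IDs a FIND GROUP was sent for

    def forward_get(self, group_id, hops, chord_next_hop, originated_here=False):
        """Return the next hop for a GET carrying the given group ID."""
        fingers = [f for f in self.group_fingers.get(group_id, [])
                   if f != self.node_id]    # ignore fingers pointing to me
        if fingers:
            if hops < math.log2(self.num_nodes) / 2:
                return fingers[0]           # assumption: first finger is used
            return chord_next_hop
        # No usable finger: count the GET and fall back to Chord routing.
        counts, limit = ((self.own_gets, self.lam) if originated_here
                         else (self.forwarded_gets, self.mu))
        counts[group_id] += 1
        if counts[group_id] == limit:
            self.find_group_requests.append(group_id)
        return chord_next_hop
```

With N = 1024 the hop limit is (log2 N)/2 = 5, so a GET that has already travelled five hops is routed by plain Chord even when a group finger exists.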
Dynamic Group Discovery (DGD)
• Maintaining group ID finger table
– Limited resources
• σ = max number of fingers
void handle_FINDGROUP_response(groupID, finger)
{
    if <group ID, finger> pair already exists // done to avoid duplicates
        return;
    if group ID finger table is at capacity
    {
        if one least used finger can be identified
            delete it;
        else // i.e., multiple fingers with the same lowest frequency of use
            pick one at random and delete;
    }
    add newly found finger;
}
Example: σ = 3
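
A minimal Python sketch of this replacement policy is given below. It assumes that σ caps the number of fingers kept per group ID, which is one reading of the slides, and that a new finger starts with a use count of 0. The function name `handle_find_group_response` and the nested-dictionary table layout are hypothetical.

```python
import random


def handle_find_group_response(table, group_id, finger, sigma=3, rng=random):
    """Insert a finger for group_id, evicting the least used one if full.

    table maps group ID -> {finger node ID: frequency of use}.
    """
    fingers = table.setdefault(group_id, {})
    if finger in fingers:
        return                      # pair already exists, avoid duplicates
    if len(fingers) >= sigma:       # assumption: capacity is per group ID
        lowest = min(fingers.values())
        candidates = [f for f, uses in fingers.items() if uses == lowest]
        # A unique least used finger is deleted; ties are broken at random.
        del fingers[rng.choice(candidates)]
    fingers[finger] = 0             # assumption: new fingers start unused
```

Using the group 122 rows from the table (node X: 0, node Y: 7, node Z: 9) with σ = 3, a response naming a hypothetical node W would evict node X, the only finger with the lowest use count.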
Simulation Implementation
• OverSim
– Flexible network simulation framework
– Popular event driven simulator for P2P networks
• Keys and queries generated external to the simulator
– Able to indirectly control community size and symmetry
• Keys and queries generation:
1. Determine desired symmetry and size
2. Generate random keys with desired symmetry
3. Sort keys based on level 1 identification
4. Assign Zipf’s α parameter per community
5. Generate queries
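
Steps 3 to 5 can be sketched in Python as follows. This is an illustration under assumptions, not the generator used in the thesis: keys are hexadecimal strings whose last three digits hold the group ID, the popularity of the key at rank r within a community is taken to be proportional to 1/r^α, and the function names and the `seed` parameter are hypothetical.

```python
import random


def group_keys_by_level1(keys):
    """Step 3: bucket keys by the level 1 digit of their 12-bit group ID."""
    groups = {}
    for key in keys:
        level1 = (int(key, 16) >> 8) & 0xF
        groups.setdefault(level1, []).append(key)
    return groups


def zipf_weights(num_keys, alpha):
    """Unnormalized Zipf weights: rank r gets weight 1 / r**alpha."""
    return [1.0 / rank ** alpha for rank in range(1, num_keys + 1)]


def generate_queries(keys_by_community, alpha_by_community, count, seed=0):
    """Steps 4 and 5: draw `count` queries per community, Zipf distributed."""
    rng = random.Random(seed)
    queries = {}
    for community, keys in keys_by_community.items():
        weights = zipf_weights(len(keys), alpha_by_community[community])
        # The position of a key in its list is used as its popularity rank.
        queries[community] = rng.choices(keys, weights=weights, k=count)
    return queries
```

Because each community gets its own α, popularity is skewed within a community rather than across the whole network, which is the situation community based caching is meant to exploit.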
Simulation Implementation
void handle_PUT_event()
{
    key = read next key from key file;
    if key is unspecified // i.e., all keys have been read from key file
    {
        schedule GET event;
    }
    else
    {
        extract group ID information from key;
        add entry to group ID finger table;
        create PUT message and send it out to network;
        schedule next PUT event;
    }
}
Simulation Implementation
void handle_GET_event()
{
    select group ID file to read from; // based on personal interest
    query = read next key from query file;
    if query is unspecified // i.e., all keys have been read from query file
    {
        return; // do nothing
    }
    else
    {
        create GET message with query;
        send out GET message;
        schedule next GET event;
    }
}
Simulation Implementation
void handle_REMOVE_request(groupID, finger, maxHops, curHops)
{
    if <group ID, finger> pair exists
        delete entry;
    if curHops < maxHops
    {
        curHops++;
        forward message to all nodes in group ID finger table;
    }
}
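
The hop-limited propagation of a REMOVE request can be sketched in Python as a small in-memory simulation. The assumptions are mine: the message is forwarded only to the remaining fingers of the same group ID (the slides say "all nodes in group ID finger table" without narrowing it), nodes that are no longer reachable are skipped, and message delivery is modeled as a direct recursive call. The name `handle_remove_request` and the `tables` layout are hypothetical.

```python
def handle_remove_request(tables, node, group_id, finger, max_hops, cur_hops=0):
    """Delete <group_id, finger> at `node`, then forward while hops remain.

    tables maps node ID -> {group ID: set of finger node IDs}.
    """
    fingers = tables[node].setdefault(group_id, set())
    fingers.discard(finger)                 # delete the entry if it exists
    if cur_hops < max_hops:
        for neighbor in sorted(fingers):    # sorted() also copies the set
            if neighbor in tables:          # skip nodes that have left
                handle_remove_request(tables, neighbor, group_id, finger,
                                      max_hops, cur_hops + 1)
```

The hop counter bounds how far the request spreads, so with `max_hops=1` only the receiving node and its direct group fingers drop the stale entry.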
Case 1: Varying Number of Nodes
• Between 500 and 10,000 nodes
• 40,000 keys used with following distribution:
– 40% group 1
– 40% group 2
– 20% shared equally among groups 3 to 9
• Maximum group ID finger table per node = 160
• λ = 2
• μ = 20
• σ = 3
Asymmetrical communities
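
The key distribution above works out as follows in a short Python sketch. The split of 40,000 keys is from the slide; how the keys that do not divide evenly among groups 3 to 9 are handed out is not stated, so giving the leftovers to the lowest-numbered groups is an assumption, and `keys_per_group` is a hypothetical name.

```python
def keys_per_group(total_keys=40_000):
    """Return the number of keys assigned to each of the nine groups."""
    counts = {1: total_keys * 40 // 100, 2: total_keys * 40 // 100}
    small_groups = list(range(3, 10))               # groups 3 to 9
    remaining = total_keys - sum(counts.values())   # the 20% share
    base, extra = divmod(remaining, len(small_groups))
    for index, group in enumerate(small_groups):
        # Assumption: leftover keys go to the lowest-numbered groups.
        counts[group] = base + (1 if index < extra else 0)
    return counts
```

Groups 1 and 2 each receive 16,000 keys, while the remaining 8,000 keys are spread over seven groups at roughly 1,143 keys each, so the two large communities are about fourteen times the size of the small ones.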
Case 1: Varying Number of Nodes
Case 2: Varying Community Size
• 2,000 nodes
• 40,000 keys divided into two sections, i.e., 80% in section 1 and 20% in section 2
– Run 1: one community in section 1 and eight communities in section 2
– Run 2: two communities in section 1 and seven communities in section 2
– Run 3: …
• Maximum group ID finger table per node = 160
• λ = 2
• μ = 20
• σ = 3
Case 2: Varying Community Size
Case 3: Introducing Churn
• Between 500 and 10,000 nodes
• 40,000 keys used with following distribution:
– 40% group 1
– 40% group 2
– 20% shared equally among groups 3 to 9
• Maximum group ID finger table per node = 160
• λ = 2
• μ = 20
• σ = 3
Asymmetrical communities
Case 3: Introducing Churn
Conclusion
1. Decentralized dynamic group discovery algorithm
2. Key generation with embedded group ID information
3. Improved lookup performance for queries resolved using cached data
• Stronger community basis, i.e., personal and acquired interests
• Without churn
4. Easy implementation of dynamic group discovery
• Utilizes already existing messages
• Additional computation: extracting group ID information
5. Great potential for robust caching solution
Future Work
1. Optimize entries in group ID finger table
• Consider distance of new finger relative to other fingers in other tables
2. Consider location of next hop of group member
• Applicable for structured P2P networks
3. DGD needs to know exactly how many nodes are in the network
• if finger not pointing to me and hops < (log2 N)/2
• Find a way to determine the number of nodes in a network with churn
4. Introduce churn in a measurable manner
• Better characterize DGD’s performance
5. Test DGD with other types of P2P networks
Thank you!
• Dr. Jayasumana
• Liuqing Yang
• Christos Papadopoulos
• Friends and co-workers at CSU and Dot Hill
• Family
QUESTIONS?