Towards Efficient Load Balancing in Structured P2P Systems
Yingwu Zhu, Yiming Hu
University of Cincinnati
Outline
• Motivation and Preliminaries
• Load balancing scheme
• Evaluation
Why Load Balancing?
• Structured P2P systems, e.g., Chord, Pastry:
– Object IDs and node IDs are produced by a uniform hash function.
– This results in an O(log N) imbalance in the number of objects stored at each node (see the short simulation after this slide).
• Skewed distribution of node capacity:
– Nodes should carry loads proportional to their capacities.
• Other problems: different object sizes, non-uniform distribution of object IDs.
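To make the O(log N) claim concrete, here is a quick, self-contained simulation (not from the paper; all names and parameters are illustrative): node names and object names are hashed uniformly onto a Chord-like ring, each object is stored at its successor node, and the busiest node ends up with several times the average number of objects.

```python
# Toy consistent-hashing simulation of the load imbalance described above.
import hashlib
import bisect

def ring_id(name: str, bits: int = 32) -> int:
    """Hash a name uniformly onto a ring of 2**bits identifiers."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

nodes = sorted(ring_id(f"node-{i}") for i in range(1024))    # N = 1024 node IDs
counts = [0] * len(nodes)
for j in range(100_000):                                     # M = 100,000 object IDs
    key = ring_id(f"object-{j}")
    owner = bisect.bisect_left(nodes, key) % len(nodes)      # successor node stores the key
    counts[owner] += 1

avg = sum(counts) / len(counts)
print(f"avg load {avg:.1f}, max load {max(counts)}, imbalance x{max(counts) / avg:.1f}")
```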
Virtual Servers (VS)
• First introduced in Chord/CFS.
• A VS is responsible for a contiguous region of the ID space.
• A node can host multiple VSs.
[Figure: a Chord ring with nodes A, B, and C, each hosting multiple virtual servers.]
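As a minimal illustration of this slide, the sketch below models a physical node hosting several virtual servers, each responsible for a contiguous region of the identifier space. The class and field names (VirtualServer, PhysicalNode) are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualServer:
    start: int        # first identifier this VS is responsible for (inclusive)
    end: int          # last identifier this VS is responsible for (inclusive)
    load: int = 0     # e.g., number of objects (or bytes) stored

    def owns(self, key: int) -> bool:
        """True if `key` falls inside this VS's contiguous ID region."""
        return self.start <= key <= self.end

@dataclass
class PhysicalNode:
    capacity: int     # maximum load the node is willing to take
    servers: List[VirtualServer] = field(default_factory=list)

    @property
    def load(self) -> int:
        # a node's load is the sum of its virtual servers' loads
        return sum(vs.load for vs in self.servers)

# node A hosts three virtual servers scattered around a toy 0..999 ring
node_a = PhysicalNode(capacity=80, servers=[
    VirtualServer(0, 99, load=30),
    VirtualServer(500, 599, load=20),
    VirtualServer(900, 999, load=11),
])
print(node_a.load, node_a.servers[1].owns(555))    # 61 True
```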
Virtual Server Reassignment
• A virtual server is the basic unit of load movement, allowing load to be transferred between nodes.
• L – load, T – target load.
[Figure: three nodes A, B, and C on the Chord ring with target loads T=50, T=15, and T=35 and current loads L=45, L=41, and L=3; one node is marked heavy because its load exceeds its target. The virtual servers shown carry loads of 30, 20, 11, 3, 10, and 15.]
Virtual Server Reassignment (cont.)
• A virtual server is the basic unit of load movement, allowing load to be transferred between nodes.
• L – load, T – target load.
[Figure: the same three nodes after reassignment; the heavy node has shed virtual servers to a light node, so every node's load is now at or below its target (T=50, T=15, T=35). See the sketch after this slide.]
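The sketch below illustrates the reassignment idea on toy numbers: a heavy node sheds whole virtual servers to a light node until its load drops to its target. The greedy largest-fit rule used here is only one plausible heuristic for this slide's example, not necessarily the paper's exact policy.

```python
def reassign(heavy_vs, heavy_target, light_vs, light_target):
    """Move virtual servers (represented by their loads) from a heavy node's
    list to a light node's list until the heavy node is at or below target."""
    while sum(heavy_vs) > heavy_target:
        spare = light_target - sum(light_vs)         # room left on the light node
        fits = [l for l in heavy_vs if l <= spare]   # virtual servers that still fit
        if not fits:
            break                                    # nothing fits; stop here
        best = max(fits)                             # move the largest VS that fits
        heavy_vs.remove(best)
        light_vs.append(best)

# a heavy node (T=15) hosting VS loads 30 and 11, and a light node (T=35) hosting 3
heavy, light = [30, 11], [3]
reassign(heavy, 15, light, 35)
print(heavy, light)    # [11] [3, 30] -> heavy node now at 11 <= 15, light node at 33 <= 35
```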
Advantages of Virtual Servers
• Flexible: load is moved in the unit of a virtual server.
• Simple:
– VS movement is supported by all structured P2P systems.
– It is simulated by a leave operation followed by a join operation.
Current Load Balancing Solutions
• Some use the concept of virtual servers.
• However, they:
– either ignore the heterogeneity of node capacities,
– or transfer load without considering proximity relationships between nodes,
– or both.
Goals
• Goals:
– Keep each node's load below its target load (the maximum load a node is willing to take).
– Let high-capacity nodes take more load.
– Perform load balancing in a proximity-aware manner, minimizing the overhead of load movement (bandwidth usage) and allowing faster, more efficient load balancing.
• Load: depends on the particular P2P system, e.g., storage, network bandwidth, or CPU cycles.
Assumptions
• Nodes in the system are cooperative.
• There is only one bottlenecked resource, e.g., storage or network bandwidth.
• The load of each virtual server is stable over the timescale when load balancing is performed.
Overview of Design
• Step 1: Load balancing information (LBI) aggregation, e.g., load and capacity info.
• Step 2: Node classification, e.g., into heavy, light, and neutral nodes.
• Step 3: Virtual server assignment (VSA).
• Step 4: Virtual server transferring (VST).
• Proximity-aware load balancing:
– VSA is proximity-aware.
LBI Aggregation and Node Classification
• Relies on a fully decentralized, self-repairing, and fault-tolerant K-nary tree built on top of a DHT (distributed hash table).
• Each K-nary tree node is planted on a DHT node.
• <L, C, Lmin> represents a subtree's total load, total capacity, and the minimum virtual-server load, respectively.
[Figure: bottom-up aggregation – leaves <12,10,2>, <15,8,3>, <20,10,5>, <15,20,4> are combined into internal nodes <27,18,2> and <35,30,4>, and finally into the root <62,48,2>.]
LBI Aggregation and Node Classification (cont.)
• The root's aggregate <62, 48, 2> is disseminated back down the tree to every node.
• Each node i computes its target load Ti = (L/C) * Ci and classifies itself as heavy if its load exceeds Ti, and light otherwise (a small sketch follows).
[Figure: with L/C = 62/48, the leaves <12,10,2>, <15,8,3>, <20,10,5>, <15,20,4> classify themselves as light, heavy, heavy, and light, respectively.]
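The sketch below reproduces the aggregation and classification on the figure's numbers: <L, C, Lmin> tuples are combined bottom-up, the root's totals give the system-wide load/capacity ratio, and each node compares its load against Ti = (L/C) * Ci. The explicit in-memory tree and the helper names are simplifying assumptions; in the real system each tree vertex lives on a DHT node.

```python
def aggregate(node):
    """Return the <L, C, Lmin> tuple for the subtree rooted at `node`.
    Leaves are physical nodes: dicts with 'load', 'capacity', and 'min_vs'
    (the load of their lightest virtual server)."""
    if "children" not in node:                       # leaf: a physical node
        return node["load"], node["capacity"], node["min_vs"]
    parts = [aggregate(c) for c in node["children"]]
    return (sum(p[0] for p in parts),                # total load L
            sum(p[1] for p in parts),                # total capacity C
            min(p[2] for p in parts))                # smallest virtual-server load

# the four leaves from the figure: <12,10,2>, <15,8,3>, <20,10,5>, <15,20,4>
leaves = [
    {"load": 12, "capacity": 10, "min_vs": 2},
    {"load": 15, "capacity": 8,  "min_vs": 3},
    {"load": 20, "capacity": 10, "min_vs": 5},
    {"load": 15, "capacity": 20, "min_vs": 4},
]
root = {"children": [{"children": leaves[:2]}, {"children": leaves[2:]}]}

L, C, Lmin = aggregate(root)                         # (62, 48, 2), as in the figure
for leaf in leaves:
    target = (L / C) * leaf["capacity"]              # Ti = (L/C) * Ci
    label = "heavy" if leaf["load"] > target else "light"
    print(leaf["load"], round(target, 1), label)
```

Running this prints light, heavy, heavy, light for the four leaves, matching the labels in the figure.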
Virtual Server Assignment
[Figure: heavy nodes (H1, H2, ..., Hm) report the virtual servers they wish to shed and light nodes (L1, ..., Ln) report their spare capacities as VSA information; rendezvous points pair them using a best-fit heuristic, and any unpaired VSA information is forwarded to a final rendezvous point. VSA happens earlier between logically closer nodes. A best-fit sketch follows.]
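An illustrative best-fit pairing step at a rendezvous point might look like the sketch below: each virtual server a heavy node wants to shed is matched to the light node whose spare capacity fits it most tightly, and whatever cannot be paired is forwarded onward, eventually to the final rendezvous point. The function and data layout are illustrative assumptions, not the paper's code.

```python
def best_fit_pair(heavy_vs, light_nodes):
    """heavy_vs: list of (vs_id, load) a heavy node wants to shed.
    light_nodes: dict mapping light node id -> spare capacity.
    Returns (assignments, unpaired), where assignments is a list of (vs_id, node_id)."""
    assignments, unpaired = [], []
    for vs_id, load in sorted(heavy_vs, key=lambda x: -x[1]):   # largest VS first
        fitting = [(spare, nid) for nid, spare in light_nodes.items() if spare >= load]
        if not fitting:
            unpaired.append((vs_id, load))           # forward to the next rendezvous point
            continue
        _, nid = min(fitting)                        # tightest (best) fit
        assignments.append((vs_id, nid))
        light_nodes[nid] -= load                     # that spare capacity is now used
    return assignments, unpaired

vs_to_shed = [("V11", 12), ("V12", 7), ("V21", 20)]
spare = {"L1": 15, "Ln": 10}
print(best_fit_pair(vs_to_shed, spare))
# ([('V11', 'L1'), ('V12', 'Ln')], [('V21', 20)]) -> V21 stays unpaired
```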
Virtual Server Assignment
• DHT identifier-space-based VSA:
– VSA happens earlier between logically closer nodes.
– It is proximity-ignorant, because nodes that are logically close in the DHT are not necessarily physically close to each other.
[Figure: heavy nodes (H1, H2) and light nodes (L1–L4) with virtual servers V1–V3; nodes drawn in the same color are physically close, yet the identifier-space-based assignment pairs nodes regardless of physical proximity.]
Proximity-Aware VSA
• Nodes in the same colors are physically close to each other.
• H – heavy node, L – light node, Vi – virtual server.
• Virtual servers are assigned between physically close nodes.
[Figure: the same heavy nodes (H1, H2) and light nodes (L1–L4); this time virtual servers V1–V3 are reassigned only between physically close nodes.]
Proximity-Aware VSA
• Use landmark clustering to generate proximity information, e.g., landmark vectors.
• Use space-filling curves (e.g., the Hilbert curve) to map landmark vectors to Hilbert numbers, which serve as DHT keys (see the sketch below).
• Heavy and light nodes put their VSA information into the underlying DHT under the resulting keys, aligning physical closeness with logical closeness.
• Each virtual server reports the VSA information that is mapped into its responsible region, rather than its own node's VSA information.
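A hedged sketch of the "landmark vector to DHT key" step follows. The paper uses a Hilbert curve; to keep the example short, this sketch substitutes a Z-order (Morton) code, a simpler space-filling mapping with the same intent: nodes with similar landmark vectors, i.e., physically close nodes, obtain numerically close keys. Parameter names and values are illustrative.

```python
def landmark_key(latencies_ms, bits_per_dim=8, max_ms=256.0):
    """Quantize per-landmark RTTs and bit-interleave them into one integer key."""
    # 1. quantize each coordinate of the landmark vector to bits_per_dim bits
    coords = [min(int(v / max_ms * (1 << bits_per_dim)), (1 << bits_per_dim) - 1)
              for v in latencies_ms]
    # 2. interleave the coordinates' bits, most significant bit first (Z-order)
    key = 0
    for bit in range(bits_per_dim - 1, -1, -1):
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key

# two nodes with similar RTTs to three landmarks get nearby keys;
# a node with very different RTTs gets a distant key
print(landmark_key([20, 35, 50]), landmark_key([22, 33, 52]), landmark_key([180, 200, 90]))
```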
Proximity-Aware Virtual Server Assignment
[Figure: the same rendezvous-point pairing as before, but now driven by the proximity-derived DHT keys, so VSA happens earlier between physically closer nodes; unpaired VSA information is again forwarded to a final rendezvous point.]
Experimental Setup
• A K-nary tree built on top of a DHT (Chord), with k = 2 and k = 8.
• Two node-capacity distributions:
– Gnutella-like capacity profile (5 capacity levels).
– Zipf-like capacity profile.
• Two load distributions for virtual servers:
– Gaussian and Pareto distributions.
• Two transit-stub topologies (5,000 nodes each): "ts5k-large" and "ts5k-small".
High-Capacity Nodes Carry More Load
[Figure: results for a Gaussian load distribution with a Gnutella-like capacity profile.]
High-Capacity Nodes Carry More Load
[Figure: results for a Pareto load distribution with a Zipf-like capacity profile.]
Proximity-Aware Load Balancing
[Figure: CDFs of the moved-load distribution in ts5k-large, shown for (a) a Gaussian load distribution with a Gnutella-like capacity profile and (b) a Pareto load distribution with a Zipf-like capacity profile.]
More load is moved over shorter distances by proximity-aware load balancing.
Benefit of Proximity-Aware Scheme
• Load movement cost: C = Σ_d LM(d) · d, where LM(d) denotes the load moved over a distance of d hops.
• Benefit: B = (C_ignorant - C_aware) / C_ignorant, i.e., the fraction of movement cost saved by the proximity-aware scheme relative to a proximity-ignorant one (a small worked example follows).
• Results:
– For ts5k-large: B = 37-65%.
– For ts5k-small: B = 11-20%.
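A small worked example of these metrics using made-up numbers: the movement cost weights each unit of moved load by the number of hops it travels, and the benefit is the relative cost reduction of the proximity-aware scheme over a proximity-oblivious one.

```python
def movement_cost(lm):
    """lm maps hop distance d -> load LM(d) moved over that distance."""
    return sum(d * load for d, load in lm.items())

oblivious = {2: 10, 8: 40, 14: 50}   # most load travels far
aware     = {2: 45, 8: 40, 14: 15}   # most load stays nearby

c_obl, c_aw = movement_cost(oblivious), movement_cost(aware)
benefit = (c_obl - c_aw) / c_obl
print(c_obl, c_aw, f"benefit = {benefit:.0%}")   # 1040 620 benefit = 40%
```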
Other Results
• Quantified the overhead of K-nary tree construction:
– Link stress and node stress.
• The latencies of LBI aggregation and VSA are bounded by O(log N).
• The effect of the pairing threshold at rendezvous points.
Conclusions
• Current load balancing approaches using virtual servers have limitations:
– They either ignore node capacity heterogeneity,
– or transfer load without considering proximity relationships between nodes,
– or both.
• Our solution:
– A fully decentralized, self-repairing, and fault-tolerant K-nary tree is built on top of DHTs for performing load balancing.
– Nodes carry load in proportion to their capacities.
– It is the first work to address the load balancing issue in a proximity-aware manner, thereby minimizing the overhead of load movement and allowing more efficient load balancing.
Questions?