Structured Overlays - Self-organization and Scalability @ SASO 2009

by

Anwitaman Datta – Nanyang Technological University, Singapore – [email protected]

Ali Ghodsi – UC Berkeley, USA – [email protected]

Part I

The P2P paradigm - A brief introduction

Outline

• The P2P paradigm
  – History and philosophy
• P2P in the realm of distributed systems
  – Concepts
    • Decentralization
    • Self-organization
    • Overlays
• Resource location problem in the large
  – Structured overlay networks
  – Unstructured overlay networks

The P2P paradigm

P2P is more than just Pirate-to-Pirate file-sharing & distributing illegal copies!

The P2P paradigm: Application Perspective

Sharing resources in large-scale networks

• knowledge
• bandwidth
• storage
• processing
• content

[Figure label: Homo sapiens]

The P2P paradigm: Systems Perspective

• Centralized solution undesirable or unattainable
• Exploit resources at the edge
  - no dedicated infrastructure/servers
  - peers act as both clients and servers (servent)
• Autonomous participants
  - large scale
  - dynamic system and workload
  - source of unpredictability
    - e.g., correlated failures
• Lack of global control or knowledge
  - rely on self-organization

P2P is just distributed systems

• So where does P2P fit in the realm of distributed systems?

  "A collection of (probably heterogeneous) automata whose distribution is transparent to the user so that the system appears as one local machine. This is in contrast to a network, where the user is aware that there are several machines, and their location, storage replication, load balancing and functionality is not transparent."
  [http://foldoc.org/index.cgi?distributed+system]

  – In its loosest sense, a distributed system is any system with several nodes and a network between them

Acknowledgement: The following discussion of how the P2P paradigm fits in the realm of distributed systems is inspired by J. Kangasharju's take on the same issue.

P2P in the realm of distributed systems

• The definition (which represents the traditional view of distributed systems) implies a managed and controlled entity which acts as a single, logical system
  – Often also relies on dedicated infrastructure
• In contrast, P2P is decentralized and is not controlled or managed. P2P uses individually unreliable autonomous participants and generally relies on self-organization.
  – Still, ideally, the system should provide some overall reliability guarantees

P2P in the realm of distributed systems

• Grid
  – "Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations." - Ian Foster
• Note that a Grid is generally centralized

P2P in the realm of distributed systems

• Ad-hoc networks
  – A wireless ad hoc network is a decentralized wireless network. The network is ad hoc because each node is willing to forward data for other nodes, and so the determination of which nodes forward data is made dynamically based on the network connectivity. This is in contrast to wired networks in which routers perform the task of routing. It is also in contrast to managed (infrastructure) wireless networks, in which a special node known as an access point manages communication among other nodes.
  – Can be seen as a `kind of´ peer-to-peer network
    • Though often very different research communities are involved, and the problems and functionalities in focus are also very different.

Self-organization

• Self-organizing systems common in nature
  – Physics, biology, ecology, economics, sociology, cybernetics
  – Microscopic (local) interactions
  – Limited information, individual decisions
    • Distribution of control => decentralization
      – Symmetry in roles/peer-to-peer
  – Emergence of macroscopic (global) properties
    • Resilience
      – Fault tolerance as well as recovery
      – Adaptivity

Part II

Resource discovery in the large - Structured overlay basics

Structured overlays/Distributed hash tables

what it is

What’s a Distributed Hash Table?

• An ordinary hash table, which is distributed
• Every node provides a lookup operation
  – Given a key: return the associated value
• Nodes keep routing pointers
  – If item not found locally, route to another node

Key        Value
Anwitaman  Singapore
Ali        Berkeley
Alberto    Trento
Kurt       Kassel
Ozalp      Bologna
Randy      Berkeley

Why’s that interesting?

• Characteristic properties
  – Self-management in the presence of joins/leaves/failures
    • Routing information
    • Data items
  – Scalability
    • Number of nodes can be huge
    • Number of items can be huge

short interlude

applications

Name-based communication Pattern

• Map node names to location
  – Can store all kinds of contact information
    • Mediator peers for NAT hole punching
    • Profile information
• Used this way by:
  – Host Identity Protocol (HIP)
  – P2P Session Initiation Protocol (P2PSIP)
  – Wuala
  – Internet Indirection Infrastructure (i3)

Key      Value
anwita   130.237.32.51
ali      193.10.64.99
alberto  18.7.22.83
ozalp    128.178.50.12
…        …

[Figure: the key/value table is spread over nodes A, B, C and D]

Global File System

• Similar to DFS (e.g. NFS, AFS)
  – But files/metadata stored in directory
  – E.g. Wuala, WheelFS…
• What is new?
  – Application logic self-managed
    • Add/remove servers on the fly
    • Automatic failure handling
    • Automatic load-balancing
  – No manual configuration for these ops

Key        Value
/home/...  130.237.32.51
/usr/…     193.10.64.99
/boot/…    18.7.22.83
/etc/…     128.178.50.12
…          …

[Figure: the key/value table is spread over nodes A, B, C and D]

P2P Proxy

• A distributed web proxy/cache
  – Every node in the LAN runs a DHT client
• Browsing for a page:
  – Check DHT
    • If page exists locally download from peer
  – Otherwise, fetch and cache
• Seamlessly add/remove workstations
  – No central servers
• Example:
  – Squirrel

Key       Value
www.s...  130.237.32.51
www2…     193.10.64.99
www3…     18.7.22.83
cs.edu    128.178.50.12
…         …

[Figure: the key/value table is spread over nodes A, B, C and D]

P2P Web Servers

• Distributed Web Server
  – Pages stored in the directory
• What is new?
  – Application logic self-managed
    • Automatically load-balances
    • Add/remove servers on the fly
    • Automatically handles failures
• Example:
  – CoralCDN

Key       Value
www.s...  130.237.32.51
www2      193.10.64.99
www3      18.7.22.83
cs.edu    128.178.50.12
…         …

[Figure: the key/value table is spread over nodes A, B, C and D]

Access Layers for DHTs

• A relational view of the DHT (PIER)
  – Use SQL to fetch data
  – Standard operations (projection, selection, equi-join)

  select name, salary
  from emp, sal
  where emp.id = sal.f_id

  Key       Value
  www.s...  130.237.32.51
  www2      193.10.64.99
  www3      18.7.22.83
  cs.edu    128.178.50.12
  …         …

• Approximate Matching (CUBIT)
  – Get k items with keys most similar to given key

  get("arwitanam",1):
  ("anwita":"130")

  Key        Value
  anwita...  130
  alberto    223
  ali        141
  ozalp      221
  …          …

[Figure: both key/value tables are spread over nodes A, B, C and D]

towards DHT construction

consistent hashing

Hash tables

• Ordinary hash tables
  – put(key,value)
    • Store <key,value> in bucket (hash(key) mod 7)
  – get(key)
    • Fetch <key,v> s.t. <key,v> is in bucket (hash(key) mod 7)

Buckets: 0 1 2 3 4 5 6
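
The put/get rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `bucket_of` and the character-sum hash are assumptions made so that the example is reproducible, not something the slides prescribe.

```python
NUM_BUCKETS = 7  # the slide uses hash(key) mod 7

def bucket_of(key):
    # Hypothetical stand-in for hash(key) mod 7: sums character codes
    # so that the result is the same on every run.
    return sum(ord(c) for c in key) % NUM_BUCKETS

buckets = [dict() for _ in range(NUM_BUCKETS)]

def put(key, value):
    # Store <key, value> in bucket (hash(key) mod 7).
    buckets[bucket_of(key)][key] = value

def get(key):
    # Fetch the value stored under key from the same bucket, or None.
    return buckets[bucket_of(key)].get(key)

put("Ali", "Berkeley")
put("Kurt", "Kassel")
```

Both operations touch exactly one bucket, which is what makes it tempting to turn each bucket into a server.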

DHT by mimicking Hash Tables

• Let each bucket be a server
  – n servers means n buckets
• Problem
  – How do we remove or add buckets?
  – A single bucket change requires re-shuffling a large fraction of items
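
To get a feeling for how large that fraction is, the following sketch counts how many items change bucket when the number of buckets grows from 7 to 8 under plain `hash mod n` placement. The integers 0..9999 stand in for hash values; that choice and the function name are assumptions for illustration.

```python
def moved_fraction(hashes, old_n, new_n):
    # An item moves if its bucket index differs under the two moduli.
    moved = sum(1 for h in hashes if h % old_n != h % new_n)
    return moved / len(hashes)

hashes = list(range(10_000))           # stand-in for hash values
fraction = moved_fraction(hashes, 7, 8)
print(f"{fraction:.0%} of the items change bucket")
```

An item keeps its bucket only when its hash leaves the same remainder modulo 7 and modulo 8, which happens for roughly one hash in eight, so roughly seven out of eight items move.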

Consistent Hashing Idea

• Logical name space, called the identifier space, consisting of identifiers {0,1,2,…, N-1}
• Identifier space is a logical ring modulo N
• Every node picks a random identifier
• Example:
  – Space N=16 {0,…,15}
  – Five nodes a, b, c, d, e
    • a picks 6
    • b picks 5
    • c picks 0
    • d picks 11
    • e picks 2

[Figure: identifier ring 0..15 with nodes at 0, 2, 5, 6 and 11]

Definition of Successor

• The successor of an identifier is the first node met going in clockwise direction starting at the identifier
• Example (nodes at 0, 2, 5, 6 and 11)
  – succ(12)=0
  – succ(15)=0
  – succ(6)=6

[Figure: identifier ring 0..15 with nodes at 0, 2, 5, 6 and 11]
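
A minimal sketch of the successor function, assuming the node set {0, 2, 5, 6, 11} that the later Chord examples use (for instance "11's successor is succ(12)=0"). The names `N`, `NODES` and `succ` are chosen here for illustration.

```python
from bisect import bisect_left

N = 16                            # size of the identifier space
NODES = sorted([0, 2, 5, 6, 11])  # assumed node identifiers

def succ(ident):
    # First node met going clockwise, starting at ident itself.
    i = bisect_left(NODES, ident % N)
    # Running past the largest node wraps around to the smallest one.
    return NODES[i % len(NODES)]
```

With this node set, `succ(6)` is 6 because a node is its own successor, and both `succ(12)` and `succ(15)` wrap around to node 0.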

Where to store items?

• Use globally known hash function, H
• Each item <key,value> gets the identifier H(key)
• Store item at successor of H(key)
  – Term: node is responsible for item k
• Example
  – H("Anwitaman")=12
  – H("Ali")=2
  – H("Alberto")=9
  – H("Ozalp")=14

Key        Value
Anwitaman  Singapore
Ali        Berkeley
Alberto    Trento
Kurt       Kassel
Ozalp      Bologna

[Figure: identifier ring 0..15 with the items placed at their responsible nodes]
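
The placement rule can be traced with the hash values given in the example. The sketch below assumes the node set {0, 2, 5, 6, 11} from the Chord examples and uses a lookup table `H` in place of a real hash function; both are illustrative assumptions.

```python
from bisect import bisect_left

N = 16
NODES = sorted([0, 2, 5, 6, 11])   # assumed node identifiers

# Hash values as given on the slide (a table stands in for H).
H = {"Anwitaman": 12, "Ali": 2, "Alberto": 9, "Ozalp": 14}

def succ(ident):
    i = bisect_left(NODES, ident % N)
    return NODES[i % len(NODES)]

def responsible_node(key):
    # An item is stored at the successor of its identifier H(key).
    return succ(H[key])

placement = {key: responsible_node(key) for key in H}
```

Under these assumptions "Ali" lands on node 2, "Alberto" on node 11, and both "Anwitaman" and "Ozalp" wrap around to node 0.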

Consistent Hashing: Summary

• Scalable
  – Each node stores avg D/n items (for D total items, n nodes)
  – Reshuffle on avg D/n items for every join/leave/failure
• Everybody knows everybody
  – Akamai works this way
  – Amazon Dynamo too
• Load balancing
  – W.h.p. O(log n) imbalance
  – Eliminate imbalance by having each server "simulate" log(n) random buckets
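
The limited reshuffling can be checked on the running example. The sketch below assumes the node set {0, 2, 5, 6, 11} and a hypothetical new node with identifier 9; it lists the identifiers whose responsible node changes when that node joins.

```python
from bisect import bisect_left

N = 16

def owner(ident, nodes):
    # Responsible node = successor of the identifier on the ring.
    ring = sorted(nodes)
    return ring[bisect_left(ring, ident % N) % len(ring)]

before = [0, 2, 5, 6, 11]   # assumed node identifiers
after = before + [9]        # hypothetical joining node

moved = [i for i in range(N) if owner(i, before) != owner(i, after)]
```

Only the identifiers between the new node's predecessor and the new node itself change owner (here 7, 8 and 9, which move from node 11 to node 9); every other identifier stays where it was.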

towards DHT construction

reducing neighbors

Where to point (Chord)?

• Each node points to its successor
  – The successor of a node p is succ(p+1)
  – Known as a node's succ pointer
• Each node points to its predecessor
  – First node met in anti-clockwise direction starting at p-1
  – Known as a node's pred pointer
• Example
  – 0's successor is succ(1)=2
  – 2's successor is succ(3)=5
  – 5's successor is succ(6)=6
  – 6's successor is succ(7)=11
  – 11's successor is succ(12)=0

DHT Lookup

• To lookup a key k
  – Calculate H(k)
  – Follow succ pointers until item k is found
• Example
  – Lookup "Alberto" at node 2
    • H("Alberto")=9
    • Traverse nodes: 2, 5, 6, 11 (BINGO)
    • Return "Trento" to initiator

Key        Value
Anwitaman  Singapore
Ali        Berkeley
Alberto    Trento
Kurt       Kassel
Ozalp      Bologna

[Figure: identifier ring 0..15 with nodes at 0, 2, 5, 6 and 11]
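
The traversal in the example can be reproduced with a short sketch. The succ pointers are taken from the Chord example (0→2→5→6→11→0); the helper names `in_interval` and `lookup` are assumptions for illustration, and a node is taken to be responsible for the identifiers in (pred, node].

```python
N = 16
SUCC = {0: 2, 2: 5, 5: 6, 6: 11, 11: 0}    # succ pointers from the example
PRED = {v: k for k, v in SUCC.items()}     # matching pred pointers

def in_interval(x, a, b):
    # True if x lies in the ring interval (a, b].
    x, a, b = x % N, a % N, b % N
    if a < b:
        return a < x <= b
    return x > a or x <= b                 # interval wraps past 0

def lookup(start, key_id):
    # Follow succ pointers until reaching the node responsible for key_id.
    node, path = start, [start]
    while not in_interval(key_id, PRED[node], node):
        node = SUCC[node]
        path.append(node)
    return node, path

node, path = lookup(2, 9)                  # H("Alberto") = 9, starting at node 2
```

With only succ pointers the number of hops grows linearly with the number of nodes.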

towards DHT construction

handling joins/leaves/failures

Dealing with failures

• Each node keeps a successor-list
  – Pointer to f closest successors
    • succ(p+1)
    • succ(succ(p+1)+1)
    • succ(succ(succ(p+1)+1)+1)
    • ...
• Rule: If successor fails
  – Replace with closest alive successor
• Rule: If predecessor fails
  – Set pred to nil
• Set f=log(n)
  – With failure probability 0.5, w.h.p. not all nodes in the list fail: the probability that all f of them fail is (1/2)^(log2 n) = 1/n
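
As a quick numeric check of the last bullet, the sketch below evaluates the probability that every entry of a successor-list of length f = log2(n) fails, assuming independent failures with probability 0.5. The function name is chosen here for illustration.

```python
import math

def all_successors_fail_probability(n, p_fail=0.5):
    # Successor-list length f = log2(n); all f entries must fail independently.
    f = math.log2(n)
    return p_fail ** f

prob = all_successors_fail_probability(1024)   # f = 10 successors
```

For n = 1024 this gives (1/2)^10, i.e. 1/1024 = 1/n, so the chance of a node losing its whole successor-list shrinks as the system grows.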

Handling Dynamism

• Periodic stabilization used to make pointers eventually correct
  – Try pointing succ to closest alive successor
  – Try pointing pred to closest alive predecessor

Periodically at node p:

1. set v := succ.pred
2. if v ≠ nil and v is in (p,succ]
3.    set succ := v
4. send a notify(p) to succ

When receiving notify(q) at node p:

1. if pred = nil or q is in (pred,p]
2.    set pred := q
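
The two rules above can be turned into a small runnable sketch. The class `Node`, the helper `between` and the three-node scenario (a node 13 joining between 11 and 15 on a ring of size 16) are assumptions made for illustration; the interval tests follow the slide's (p,succ] and (pred,p] conventions.

```python
N = 16

def between(x, a, b):
    # True if x is in the ring interval (a, b] modulo N.
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.succ = self      # a lone node is its own successor
        self.pred = None

    def stabilize(self):
        # Periodically at node p (steps 1-4 on the slide).
        v = self.succ.pred
        if v is not None and between(v.id, self.id, self.succ.id):
            self.succ = v
        self.succ.notify(self)

    def notify(self, q):
        # When receiving notify(q) at node p.
        if self.pred is None or between(q.id, self.pred.id, self.id):
            self.pred = q

# Two-node ring 11 <-> 15, then node 13 joins with succ := lookup(13) = 15.
n11, n13, n15 = Node(11), Node(13), Node(15)
n11.succ, n11.pred = n15, n15
n15.succ, n15.pred = n11, n11
n13.succ = n15

n13.stabilize()   # 15 learns that 13 is its new predecessor
n11.stabilize()   # 11 sees 15.pred = 13 and adopts 13 as successor
```

After one stabilization round at the joining node and one at its predecessor, the ring reads 11 → 13 → 15 → 11 with matching pred pointers.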

Handling joins

• When new node n joins
  – Find n's successor with lookup(n)
  – Set succ to n's successor
  – Stabilization fixes the rest

[Figure: ring segment with nodes 11, 13 and 15, shown next to the periodic stabilization rules from the previous slide]

Handling leaves

• When n leaves
  – Just disappear (like failure)
• When pred detected failed
  – Set pred to nil
• When succ detected failed
  – Set succ to closest alive in successor list

[Figure: ring segment with nodes 11, 13 and 15, shown next to the periodic stabilization rules from the "Handling Dynamism" slide]

Speeding up lookups with fingers

• If only pointer to succ(p+1) is used
  – Worst case lookup time is n, for n nodes
• Improving lookup time (binary search)
  – Point to succ(p+1)
  – Point to succ(p+2)
  – Point to succ(p+4)
  – Point to succ(p+8)
  – …
  – Point to succ(p+2^((log N)-1))
• Distance to the destination is always halved, so a lookup takes log hops
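
A sketch of finger tables and greedy finger routing, assuming the ring of size 16 with nodes {0, 2, 5, 6, 11} from the earlier examples. The function names and the "closest preceding finger" selection are written here for illustration and simplify what a full Chord implementation does.

```python
from bisect import bisect_left

N = 16                             # identifier space size (a power of two)
NODES = sorted([0, 2, 5, 6, 11])   # assumed node identifiers

def succ(ident):
    i = bisect_left(NODES, ident % N)
    return NODES[i % len(NODES)]

def dist(a, b):
    # Clockwise distance from a to b on the ring.
    return (b - a) % N

def fingers(p):
    # Finger i points to succ(p + 2**i), for i = 0 .. log2(N) - 1.
    return [succ(p + 2 ** i) for i in range(N.bit_length() - 1)]

def lookup(start, key_id):
    node, path = start, [start]
    while True:
        if dist(node, key_id) == 0:
            return node, path                       # key equals the node id
        nxt = succ(node + 1)
        if dist(node, key_id) <= (dist(node, nxt) or N):
            path.append(nxt)                        # key in (node, succ]
            return nxt, path
        # Otherwise jump to the farthest finger that does not pass the key.
        preceding = [f for f in fingers(node)
                     if 0 < dist(node, f) < dist(node, key_id)]
        node = max(preceding, key=lambda f: dist(node, f))
        path.append(node)

owner, path = lookup(2, 9)
```

Starting at node 2, the lookup for identifier 9 now visits 2, 6, 11 instead of the four nodes 2, 5, 6, 11 needed with succ pointers alone.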

Handling Dynamism of Fingers and SList

• Node p periodically:
  – Update fingers
    • Lookup p+2^1, p+2^2, p+2^3, …, p+2^((log N)-1)
  – Update successor-list
    • slist := trunc(succ · succ.slist)
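
The successor-list rule reads as "prepend my successor to my successor's list, then truncate". A minimal sketch, where the function name and the explicit length parameter `f` are assumptions for illustration:

```python
def update_slist(succ, succ_slist, f):
    # slist := trunc(succ · succ.slist): prepend succ, keep the first f entries.
    return ([succ] + list(succ_slist))[:f]

# Node 0 has successor 2, whose own successor-list is [5, 6, 11].
slist_of_0 = update_slist(2, [5, 6, 11], f=3)
```

Each node only talks to its immediate successor, yet the lists converge because every successor-list is built from the next node's list.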

Chord: Summary

• Lookup hops is logarithmic in n
  – Fast routing/lookup like in a dictionary
• Routing table size is logarithmic in n
  – Few nodes to ping

Reliable Routing

• Iterative lookup
  – Generally slow (handling NATs, firewalls)
  – Reliability easy to achieve
    • Initiator in full control
• Recursive lookup
  – Generally fast (use established links)
  – Several ways to do reliability
    • End-to-end timeouts
    • Any node timeouts
      – Difficult to determine timeout value
• Transitive lookup
  – Reliability: end-to-end timeouts

Replication of items

• Successor-list replication (Chord, Pastry)
  – Idea: replicate nodes
    • If node p responsible for set of items K
    • Replicate K on p's immediate successors
• Symmetric Replication (DKS)
  – Idea: replicate identifiers
    • Items with key 0,16,32,48 equivalent
    • Whoever is responsible for 0, also stores 16,32,48
    • Whoever is responsible for 16, also stores 0,32,48
    • …
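
The equivalence classes in the symmetric replication example are consistent with an identifier space of size 64 and a replication degree of 4, where replicas are spaced N/f apart. Both values are assumptions inferred from the keys 0, 16, 32, 48; the sketch below is an illustration, not the DKS implementation.

```python
N = 64   # assumed identifier space size
F = 4    # assumed replication degree

def replica_ids(key_id):
    # Identifiers equivalent to key_id: spaced N/F apart around the ring.
    step = N // F
    return [(key_id + j * step) % N for j in range(F)]
```

`replica_ids(0)` yields 0, 16, 32, 48, and starting from 16 yields the same set in rotated order, so the node responsible for any one of these identifiers stores the items of all four.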

towards proximity awareness

plaxton-mesh (PRR), pastry/tapestry

Plaxton Mesh [PRR]

• Identifiers represented with radix/base k
  – We use k=16, hexadecimal radix
  – Ring size N is a large power of k, e.g. 16^40

Plaxton Mesh (2)

• Additional routing table on top of ring
• Routing table construction by example
  – Node 3a7f keeps following routing table
• Kleene star * for wildcards
  – Flexibility to choose proximate neighbors
• Invariant: row i of any node listed in row i is interchangeable with our own row i

Row 1: 0* 1* 2* self 4* 5* 6* 7* 8* 9* a* b* c* d* e* f*
Row 2: 30* 31* 32* 33* 34* 35* 36* 37* 38* 39* self 3b* 3c* 3d* 3e* 3f*
Row 3: 3a0* 3a1* 3a2* 3a3* 3a4* 3a5* 3a6* self 3a8* 3a9* 3aa* 3ab* 3ac* 3ad* 3ae* 3af*
Row 4: 3a70* 3a71* 3a72* 3a73* 3a74* 3a75* 3a76* 3a77* 3a78* 3a79* 3a7a* 3a7b* 3a7c* 3a7d* 3a7e* self

Plaxton Routing

• To route from 1234 to abcd:
  1. 1234 uses rt row 1: jump to a*, e.g. a999
  2. a999 uses rt row 2: jump to ab*, e.g. ab11
  3. ab11 uses rt row 3: jump to abc*, e.g. abc0
  4. abc0 uses rt row 4: jump to abcd
• Routing terminates in log(N) hops
  – In practice log(n), where N is id size and n is number of nodes
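
The four routing steps can be replayed with a small prefix-routing sketch. The nested dictionary `TABLES` holds only the routing table entries that the example needs and is an illustrative assumption; rows are indexed from 0 in the code, so row 0 corresponds to "rt row 1" above.

```python
def shared_prefix_len(a, b):
    # Number of leading digits that a and b have in common.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# TABLES[node][row][digit] = a neighbor sharing `row` digits with node
# and having `digit` as its next digit (hypothetical entries).
TABLES = {
    "1234": {0: {"a": "a999"}},
    "a999": {1: {"b": "ab11"}},
    "ab11": {2: {"c": "abc0"}},
    "abc0": {3: {"d": "abcd"}},
}

def route(source, target):
    # Each hop fixes one more digit of the target's prefix.
    node, path = source, [source]
    while node != target:
        row = shared_prefix_len(node, target)
        node = TABLES[node][row][target[row]]
        path.append(node)
    return path

path = route("1234", "abcd")
```

Every hop extends the matched prefix by one digit, so the number of hops is bounded by the number of digits in an identifier.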

Pastry

• Leaf set
  – Successor-list in both directions
  – Periodically gossiped to all leaves, O(n^2) [Bamboo]
• Plaxton-mesh on top of ring
  – Failures in routing table
    • Get replacement from any node on same row
• Routing
  1) Route directly to responsible node in leaf set, otherwise
  2) Route to closer (prefix) node, otherwise
  3) Route on ring

Routing Table Initialization

• How does a new node initialize RT?
  – New node looks up its own id
  – At step i, copy row i of the node visited at that step
    • Good if latencies are symmetric
  – Example: Assume new node abcd knows 1234
    1. 1234 uses rt row 1: jump to a*, e.g. a999
    2. a999 uses rt row 2: jump to ab*, e.g. ab11
    3. ab11 uses rt row 3: jump to abc*, e.g. abc0
    4. abc0 uses rt row 4: jump to abcd

constant number of neighbors

De Bruijn graphs, Koorde

Even less routing info…

• How much routing state is necessary?
• Moore bound from graph theory
  – Assume each node has k neighbors
  – How many nodes (at most) reachable in d hops?
  – 0 hops: 1
  – 1 hop: 1+k
  – 2 hops: 1+k+k(k-1)
  – 3 hops: 1+k+k(k-1)+k(k-1)^2
  – d hops: 1+k∑_{i=0}^{d-1}(k-1)^i = 1+k((k-1)^d-1)/(k-2)

Moore bound

• Given k pointers per node
  – In d hops maximum n nodes are reachable
  – n ≤ 1+k((k-1)^d-1)/(k-2)
  – Solve d as a function of n
  – d ≥ log_{k-1}[(n(k-2)+2)/k] ≈ log_k n
• In DHTs, each node has k=log(n) neighbors
  – d ≈ log_{log n} n = log n/log(log n)
• So, optimally, for n nodes, with log(n) pointers we should reach everyone in log(n)/log(log n) hops
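
The bound is easy to evaluate numerically. The sketch below computes the Moore bound as the sum 1 + k·∑(k-1)^i and searches for the smallest hop count d that can cover n nodes; the function names are chosen here for illustration.

```python
def moore_bound(k, d):
    # Max nodes reachable within d hops when every node has k neighbors:
    # 1 + k * sum_{i=0}^{d-1} (k-1)^i
    return 1 + k * sum((k - 1) ** i for i in range(d))

def min_hops(n, k):
    # Smallest d such that the Moore bound allows n nodes.
    d = 0
    while moore_bound(k, d) < n:
        d += 1
    return d

hops = min_hops(1024, 10)   # n = 1024 nodes, k = log2(n) = 10 pointers
```

For n = 1024 and k = 10 the bound permits as few as 4 hops, close to log n / log log n ≈ 3 and well below the log n = 10 hops of a Chord-style lookup.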

Optimal Graphs

• De Bruijn graphs provide our bounds
• Example k=2
  – Consider each node's identifier in binary
  – Each node i should know 2 neighbors:
    • 2i (mod N)
    • 2i+1 (mod N)
• Example:
  – Node 011011 knows 110110 and 110111

Routing in De Bruijn Graphs

• Example
  – k=2, n=2^3=8
• Routing
  – Main idea:
    • Each hop shifts in one final digit (left-to-right)
  – E.g. node 110 wants to find 011
    • 110 jumps to 100 [010]
    • 100 jumps to 001 [010]
    • 001 jumps to 011 [011]
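
The shifting idea can be traced with a short sketch for k=2. It assumes every identifier in {0,…,2^bits-1} is an actual node (the complete De Bruijn graph); the function name is chosen here for illustration.

```python
def de_bruijn_route(source, target, bits):
    # Route in a complete binary De Bruijn graph with 2**bits nodes.
    n = 1 << bits
    node, path = source, [source]
    for shift in range(bits - 1, -1, -1):
        top_bit = (target >> shift) & 1        # next target digit, left to right
        node = ((node << 1) | top_bit) % n     # neighbor 2i or 2i+1 (mod N)
        path.append(node)
    return path

path = de_bruijn_route(0b110, 0b011, bits=3)
labels = [format(p, "03b") for p in path]
```

The path visits 110, 100, 001 and 011, matching the three jumps in the example: after as many hops as there are digits, the source identifier has been shifted out completely.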

Routing in De Bruijn Graphs

• Lookup algorithm at node m
  – Initially kshift=k (key to lookup)
  – All operations (<<) mod N

Making a DHT of De Bruijn Graphs

• With d=2 pointers we get log(N) hops, where N is id space size (2^160)
  – How to achieve log(n), n=number of nodes?
• Main idea
  – Route on an imaginary 2^160 graph
    • Invariant: go to predecessor of imaginary node
  – Store pointer called d to predecessor of 2i

Koorde DHT

• Algorithm at node m
  – i is imaginary node
    • Initially i=m.successor
  – Initially k=kshift

2-hop Lemma

• The number of hops is w.h.p. at most 3log(N)
  – i.e. we need 2 succ traversals per De Bruijn hop
• When at m=predecessor(i)
  – Jump to 2m
  – Traverse successor to reach predecessor(2i)
    • (2i-2m)/N fraction of space, with n(2i-2m)/N nodes
    • On average i-m = N/n
    • So n(2N/n)/N = 2 nodes traversed

QED

Koorde works! (2)

• Still O(log N), how to get O(log n)?
• Use flexibility in i parameter
  – Can set i to any node in the range between m and m.succ
  – Set low bits of i to maximize number of final digits

O(log n) Hops Koorde Theorem

• Distance between m and i
  – On avg N/n
  – W.h.p. the distance is more than N/n^2
  – Number of low-order bits in range
    • log(N)-2log(n) bits can be set arbitrarily
• Need to route total log(N) bits
  – log(N)-2log(n) already done
  – 2log(n) bits needed to be shifted

QED

architecture of structured overlays

a formal view of DHTs

2General Architecture for DHTs

• Metric space S with distance function d
  – d(x,y) ≥ 0
  – d(x,x) = 0
  – d(x,y) = 0 ⇔ x = y
  – d(x,z) ≤ d(x,y) + d(y,z)
  – d(x,y) = d(y,x) (not always)
• E.g.:
  – d(x,y) = y – x (mod N) (Chord)
  – d(x,y) = x xor y (Kademlia)
  – d(x,y) = sqrt( (x_1-y_1)^2 + … + (x_d-y_d)^2 ) (CAN)
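The three example distance functions can be written down directly. The following is a minimal sketch; the function names and the small ring size are illustrative assumptions, not part of any of the systems' actual code. It also shows why symmetry is marked "not always": the Chord distance is directional.

```python
import math

def chord_distance(x: int, y: int, ring_size: int) -> int:
    """Clockwise distance from x to y on a ring of ring_size identifiers."""
    return (y - x) % ring_size

def kademlia_distance(x: int, y: int) -> int:
    """XOR distance between two identifiers (symmetric)."""
    return x ^ y

def can_distance(x, y) -> float:
    """Euclidean distance between two points of a d-dimensional space."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

N = 16  # assumed toy identifier space
print(chord_distance(3, 14, N), chord_distance(14, 3, N))  # 11 5 (not symmetric)
print(kademlia_distance(3, 14), kademlia_distance(14, 3))  # 13 13 (symmetric)
print(can_distance((0, 0), (3, 4)))                        # 5.0
```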


Graph Embedding

• Embed a virtual graph for routing
  – Powers of 2 (Chord)
  – Plaxton mesh (Pastry/Tapestry)
  – Hypercube
  – De Bruijn (Koorde, DH)
  – Butterfly (Viceroy)
• A node responsible for many virtual identifiers
  – E.g., Chord nodes are responsible for all virtual ids between node id and predecessor

[Figure: "Abstracting a tree": a binary trie with leaf paths 000, 001, 010, 011, 100, 101, 110, 111 held by peers A to H; "Actual connectivity graph": the links among peers A to H.]

• Structural replication
  – Multiple peers responsible for the same key-space
  – Multiple routes resolving same prefix

P-Grid (EPFL)

Query at A for 010 - A forwards it to D

[Figure: the trie and connectivity graph of peers A to H, highlighting the step from A (000) to D (011).]

Query forwarding in P-Grid

Query at A for 010 - D forwards it to C, which has the answers!

[Figure: the trie and connectivity graph of peers A to H, highlighting the step from D (011) to C (010).]

Query forwarding in P-Grid
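The forwarding steps in this example can be sketched as prefix routing over a binary trie. This is a minimal sketch, not P-Grid's implementation: the `Peer` class, the `refs` table layout (one list of references per trie level) and the particular references chosen for each peer are assumptions made for illustration.

```python
from typing import Dict, List

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits on which a and b agree."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

class Peer:
    def __init__(self, name: str, path: str, refs: Dict[int, List[str]]):
        self.name = name
        self.path = path  # key-space partition this peer is responsible for
        # refs[l]: peers that share the first l bits of path and differ at bit l
        self.refs = refs

def route(peers: Dict[str, Peer], start: str, key: str) -> List[str]:
    """Forward the query until it reaches a peer whose path is a prefix of key."""
    hops = [start]
    current = peers[start]
    while not key.startswith(current.path):
        level = common_prefix_len(current.path, key)
        current = peers[current.refs[level][0]]  # any reference at this level works
        hops.append(current.name)
    return hops

# Hypothetical routing tables for part of the eight-peer example.
peers = {
    "A": Peer("A", "000", {0: ["E"], 1: ["D"], 2: ["B"]}),
    "B": Peer("B", "001", {0: ["E"], 1: ["C"], 2: ["A"]}),
    "C": Peer("C", "010", {0: ["E"], 1: ["A"], 2: ["D"]}),
    "D": Peer("D", "011", {0: ["E"], 1: ["A"], 2: ["C"]}),
    "E": Peer("E", "100", {0: ["A"], 1: ["A"], 2: ["A"]}),
}
print(route(peers, "A", "010"))  # ['A', 'D', 'C']
```

Each hop resolves at least one more bit of the key, so the number of hops is bounded by the length of the longest path in the trie.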

[Figure: peers z and x with paths 0 and 1; a new node y arrives.]

New node y wants to join the network.

Nodes y and z negotiate to repartition the key-space into 00 and 01 (alternatively, they could have decided to be replicas).

Node joining in P-Grid

• Multiple peers can also decide to be replicas of the same partition
  – Structural replication (a.k.a. zone overloading)
  – Different kind of replication than in Chord


Churn: membership dynamics (peers leave and re-join). Peers rejoin with dynamic IP addresses.

You may want to reconnect with the same peer:
• Social/trust networks …
• Storage systems … (returning with content)

Self-referential directory:
• P-Grid stores the directory (logical ID <-> IP address)
• Routing is based on the logical address (and the cached IP address)
• In case of failure (if the local cache does not work), look up the IP address in the directory, again by routing on the logical address

P-Grid’s Self-referential directory and overlay maintenance

[Figure: example P-Grid network of peers 1 to 14, showing each peer's routing table and the directory entries it stores. Legend: stale cache entry, up-to-date cache entry, peer presently online, peer presently offline.]

This toy example uses a 4-bit representation of the ID as the corresponding key. Information about peer 4 is stored corresponding to key 0100 at peers 5 and 14.

[Aberer04]

[Figure: the same example P-Grid network of peers 1 to 14, with the following query trace.]

query(01*) @ 7
… query(0101) @ 7 (for stale entry 5, cycle -> abort)
… query(1110) @ 7 (for stale entry 14, forward to 12 or 13)
… query(1110) @ 12 (is offline)
… query(1110) @ 13 (for stale entry 2)
…… query(0010) @ 13 (forward to 5)
…… query(0010) @ 5 (forward to 7)
…… query(0010) @ 7 (forward to 9)
…… query(0010) @ 9 (new entry for 2 found!)
… query(1110) @ 2 (new entry for 14 found!)
query(01*) @ 14 (finally)


• Encountering unusable routes triggers queries recursively
• Recursive queries heal the network
  - A family of more efficient and adaptive overlay maintenance schemes (than proactive approaches)
  - Two extremes (of this family): Correction on Use, Correction on Failure

• System operates at a dynamic equilibrium

Self-healing recursive queries


At steady-state: effects of churn and self-healing cancel out
• Churn => ID-to-IP changes (unusable routing entries)
• Healing => makes routes usable again

Dynamic equilibrium under churn


Steady state: The probability distribution of the number of stale references does not change. We can obtain the repair cost and routing performance (latency / message cost) corresponding to this steady state.

[Figure: chain of states with 0, 1, 2, …, r stale references; ID changes and repairs move a peer between the states.]

r references (redundancy) per routing level per peer

Dynamic equilibrium under churn


Contour map of cost/resilience trade-offs

Dynamic equilibrium under churn

Comparison of maintenance mechanisms based on degree of laziness
• Breakdown of the lazy mechanism

Dynamic equilibrium under churn

Analogous to Bamboo’s empirical experience of positive feedbacks!

Reactive strategies

Taxonomy of route maintenance mechanisms (circa 2004)


Prevention is better than cure

• Predictive and proactive strategies for routing table maintenance
  – Kademlia
    • Like P-Grid but uses XOR metric for routing
  – Accordion
    • Like Chord but exploiting properties of algebraic small-world networks


Kademlia

• Note similarity of topology with P-Grid - but uses different (XOR) routing mechanism

[Maymounkov02]


XOR routing


Reducing the effect of churn

Empirical observation from Gnutella trace: Probability of remaining online for another hour (y-axis) as a function of uptime (x-axis in minutes).

• Least-recently-seen eviction policy for each k-bucket, but never evicts live nodes
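The eviction policy can be sketched as follows. This is a minimal sketch of a single k-bucket; the `KBucket` class and the `ping` callback are hypothetical names used for illustration, and a real implementation would probe asynchronously.

```python
from collections import OrderedDict
from typing import Callable, List

class KBucket:
    def __init__(self, k: int, ping: Callable[[int], bool]):
        self.k = k
        self.ping = ping  # hypothetical liveness probe: True if the node answers
        # Insertion order tracks recency: first key = least recently seen.
        self.nodes: "OrderedDict[int, None]" = OrderedDict()

    def seen(self, node_id: int) -> None:
        """Update the bucket after hearing from node_id."""
        if node_id in self.nodes:
            self.nodes.move_to_end(node_id)      # refresh: now most recently seen
        elif len(self.nodes) < self.k:
            self.nodes[node_id] = None           # room left: just add it
        else:
            oldest = next(iter(self.nodes))
            if self.ping(oldest):
                self.nodes.move_to_end(oldest)   # live node is never evicted
            else:
                del self.nodes[oldest]           # dead: replace with the newcomer
                self.nodes[node_id] = None

    def members(self) -> List[int]:
        return list(self.nodes)

alive = {1, 2}
bucket = KBucket(k=2, ping=lambda n: n in alive)
for n in (1, 2, 3):
    bucket.seen(n)
print(bucket.members())  # [2, 1]: node 3 is dropped because node 1 answered the ping
```

Preferring old, live contacts matches the empirical observation above: the longer a node has been up, the more likely it is to stay up.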


Proactive route maintenance

Based on: Small-world distribution
• $\Pr[\text{neighbor is } x \text{ away in ID space}] \propto \frac{1}{x}$ [Kleinberg00]
• Guarantees poly-log n lookup hops
• Allows smooth expansion of routing table

Small-world distribution is flexible - useful for only long distance routes

Accordion [Li05]


Main idea: Evicting stale entries efficiently

• Delete proactively before a lookup times out

• Pinging uses bandwidth inefficiently

• Predict each entry’s Pr(alive)

• Delete entries with Pr(alive) < threshold

Proactive route maintenance


Analytic results

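[Figure: analytic curve of avg lookup hops + timeouts (y-axis) against the threshold x in "Delete entries with Pr(alive) < x" (x-axis), with the "delete lazily" and "delete aggressively" ends of the axis and the best threshold marked.]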

Choosing best deletion threshold


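U: known uptime
A: time since last contacted

[Figure: timeline with the points "joined", "last contacted" and "now"; U spans joined to last contacted, A spans last contacted to now.]

• With Pareto session time: $\Pr(\text{alive}) = \Pr(\text{lifetime} > U + A \mid \text{lifetime} > U) = \frac{U}{U + A}$
• Delete entry if $\frac{U}{U + A}$ < threshold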

Predicting routing entry liveness
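The liveness estimate can be sketched as below. This is a minimal sketch, not Accordion's code: the function and parameter names are assumptions. For a Pareto lifetime with shape parameter `shape`, the conditional survival probability is (U / (U + A)) ** shape; `shape = 1.0` gives the ratio U / (U + A) used on this slide.

```python
from typing import Dict, Tuple

def pr_alive(uptime: float, since_contact: float, shape: float = 1.0) -> float:
    """Pr(lifetime > U + A | lifetime > U) under a Pareto lifetime distribution.

    uptime        -- U, how long the node was known to be up at last contact
    since_contact -- A, time elapsed since that last contact
    shape         -- Pareto shape parameter (assumed; 1.0 matches the slide)
    Valid for U at or above the Pareto minimum lifetime.
    """
    if uptime <= 0:
        return 0.0
    return (uptime / (uptime + since_contact)) ** shape

def prune(entries: Dict[str, Tuple[float, float]], threshold: float) -> Dict[str, Tuple[float, float]]:
    """Keep only routing entries whose estimated Pr(alive) reaches the threshold."""
    return {node: (u, a) for node, (u, a) in entries.items()
            if pr_alive(u, a) >= threshold}

table = {"a": (60.0, 20.0), "b": (10.0, 30.0)}  # node -> (U, A), say in minutes
print(pr_alive(60.0, 20.0))   # 0.75
print(prune(table, 0.5))      # {'a': (60.0, 20.0)}
```

An entry with a long known uptime survives a long silence, while a recently joined node is dropped quickly, without any pinging.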

[Figure: performance (avg lookup latency, msec) versus cost (bandwidth budget, bytes/node/sec).]

Evaluation: performance/cost tradeoff


Comparing with parameterized DHTs

[Figure: avg lookup latency (msec) versus avg bandwidth consumed (bytes/node/sec).]

[Figure: avg lookup latency (msec) versus avg bandwidth consumed (bytes/node/sec).]

Convex hull outlines best tradeoffs


Lowest latency for varying churn

[Figure: avg lookup latency (msec) versus median node session time (hours), for a fixed budget and variable churn.]

• Accordion has lowest latency at low churn
• Accordion’s latency increases slightly at high churn


Accordion stays within budget

[Figure: avg bandwidth (bytes/node/sec) versus median node session time (hours), for a fixed budget and variable churn.]

• Other protocols’ bandwidth increases with churn


Conclusions
• Reactive strategies
  – Redundancy can be exploited
    • To determine the degree of laziness
    • Trade-off between cost/resilience
  – May lead to catastrophic failures under high churn (particularly for a lazy reactive strategy)
    • e.g., because of positive feedback
• Proactive strategies
  – Reduce the chance of catastrophic failure
    • At the cost of continuous bandwidth usage
      – Sometimes unnecessarily
• Most maintenance strategies ignore the fact that persistent IDs may be useful
  • E.g., they do not look into the storage maintenance costs that need to be carried out as a collateral


Bootstrapping structured overlays

Part III


Issues:
– Properties of the resulting overlay
  • Load-balance, proximity, …
– Bootstrapping mechanisms
  • Sequential, Parallelized
    - some implicit centralization
  • Decentralized
– Cost and overheads
  • Construction cost & latency, …

Bootstrapping structured overlays


In the beginning, there was …

Trivia:

The term "bootstrapping" alludes to a German legend about Baron Münchhausen, who claimed to have been able to lift himself out of a swamp by pulling himself up by his own hair. In later versions of the legend, he used his own boot straps to pull himself out of the sea which gave rise to the term bootstrapping. The term is believed to have entered computer jargon during the early 1950s by way of Heinlein's short story By His Bootstraps first published in 1941. (from Wikipedia)

Bootstrapping


Load-balancing in DHTs

Load balancing in peer-to-peer (P2P) systems is a mechanism to spread various kinds of load, like storage, access and message forwarding, among participating peers in order to achieve a fair or optimal utilization of contributed resources such as storage and bandwidth. For example:

– System with N homogeneous nodes
– The load is optimally balanced when
  • Load of each node is around 1/N of the total load.

Bootstrapping overlays

While bootstrapping an overlay network, we need to ensure good load-balancing characteristics.


Load-balancing in DHTs

A First step: DHT

Use uniform hashing

The basic idea: Generate keys for each object to be stored by applying uniform (consistent) hashing (e.g. SHA-1)

• The keys are then uniformly distributed over the key-space

Assign peers to a part of the key-space by also applying (the same) hashing to, say, the peers’ IP address*

• Peers are then distributed uniformly over the key-space

This was expected to achieve load-balance
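The basic idea can be sketched as follows. This is a minimal sketch assuming Chord-style assignment (a key belongs to the first peer whose identifier is equal to or follows the key on the ring); the function names and the example addresses are illustrative.

```python
import bisect
import hashlib
from typing import List, Tuple

def ring_id(value: str) -> int:
    """Map a string (document name or IP address) to a 160-bit identifier."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")

def build_ring(peer_addresses: List[str]) -> List[Tuple[int, str]]:
    """Place peers on the ring by hashing their addresses; sort by identifier."""
    return sorted((ring_id(addr), addr) for addr in peer_addresses)

def responsible_peer(ring: List[Tuple[int, str]], key: str) -> str:
    """Return the first peer at or after the key's identifier, wrapping around."""
    ids = [pid for pid, _ in ring]
    idx = bisect.bisect_left(ids, ring_id(key))
    return ring[idx % len(ring)][1]

ring = build_ring(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for doc in ("alpha.txt", "beta.txt", "gamma.txt"):
    print(doc, "->", responsible_peer(ring, doc))
```

Both keys and peers land at uniformly random points of the identifier space, but the intervals between consecutive peers are not of equal length, which is one reason the expected balance does not materialize.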

* Hashing was also expected to provide security in the original design of Chord, etc.


• Analysis of distribution of data
• Example
  – Parameters
    • 4,096 nodes
    • 500,000 documents
  – Optimum
    • ~122 documents per node

Optimal distribution of documents across nodes

Load-balancing in DHTs

[Rieche06]


• Number of nodes storing no document
  – Parameters
    • 4,096 nodes
    • 100,000 to 1,000,000 documents
  – Some nodes w/o any load

Load-balancing in DHTs

Something’s wrong! What? Why??


Balls into bins analogy

• n number of intervals (bins)
  – Intervals of equal size
• m number of items (balls)
• Sequentially choose a bin randomly for each ball
  – A bin is hit with probability p = 1/n
• The number of balls in a bin is then given by the binomial distribution
  – Binomial distribution: $p_b(\text{load} = i) = \binom{m}{i} \left(\frac{1}{n}\right)^{i} \left(1 - \frac{1}{n}\right)^{m-i}$
  – Standard deviation: $\sigma_b = \sqrt{\frac{m}{n}\left(1 - \frac{1}{n}\right)}$


A quick example (balls into bins)

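Using Mathematica

In[61]:= Array[a, 50]; Do[a[i] = 0, {i, 1, 50}]; Do[x = Ceiling[50 Random[]]; a[x] = a[x] + 1, {j, 1, 1000}]; Histogram[Array[a, 50], FrequencyData -> True]

[Figure: histogram of the number of balls in each of the 50 bins after 1,000 throws, with the expected value marked.]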
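The same experiment can be sketched in Python, together with the standard deviation predicted by the binomial model. This is a minimal sketch; the function names and the seed are arbitrary choices, and the observed minimum and maximum vary with the seed.

```python
import math
import random
from typing import List

def throw_balls(m: int, n: int, rng: random.Random) -> List[int]:
    """Throw m balls into n equally likely bins; return the load of each bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

def binomial_std(m: int, n: int) -> float:
    """Standard deviation of a single bin's load: sqrt(m/n * (1 - 1/n))."""
    return math.sqrt(m / n * (1 - 1 / n))

rng = random.Random(2009)  # arbitrary seed, for repeatability only
loads = throw_balls(1000, 50, rng)
print("expected load per bin:", 1000 / 50)
print("predicted std dev:", round(binomial_std(1000, 50), 2))
print("observed min/max load:", min(loads), max(loads))
```

With 1,000 balls and 50 bins the expected load is 20 and the predicted standard deviation is about 4.43, so bins that deviate from the mean by several balls are normal even with perfectly uniform hashing.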


Load-balancing in DHTs: Virtual Servers

• Each node is responsible for several intervals
  – "Virtual server"
• Example
  – Chord

[Figure: Chord ring whose intervals are assigned to the physical nodes A, B and C.]

Increase the effective “n” by having many virtual peers for the same physical computer [Rao03]

$\sigma_b = \sqrt{\frac{m}{n}\left(1 - \frac{1}{n}\right)}$


• Each node is responsible for several intervals
  – log (n) virtual servers
• Load balancing
  – Different possibilities to change servers
    • One-to-one
    • One-to-many
    • Many-to-many
  – Copy of an interval is like removing and inserting a node in a DHT

Virtual Server

Load stealing/shedding

[Figure: ring of light (L) and heavy (H) nodes.]

• One-to-One
  – Light node picks a random ID
  – Contacts the node x responsible for it
  – Accepts load if x is heavy
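One step of the one-to-one scheme can be sketched as below. This is a minimal sketch under stated assumptions: load is moved as whole virtual servers, a node counts as heavy when its total load exceeds a common `target`, and the light node only accepts a virtual server that keeps it at or below that target. The names `try_steal` and `one_to_one_step` are hypothetical, and picking a random node stands in for "pick a random ID and route to the responsible node".

```python
import random
from typing import Dict, List, Optional

def try_steal(servers: Dict[str, List[int]], light: str, probed: str,
              target: int) -> Optional[int]:
    """Move one virtual server from probed to light if probed is heavy.

    servers maps a node name to the loads of its virtual servers.
    Returns the load of the moved virtual server, or None if nothing moved.
    """
    if probed == light or sum(servers[probed]) <= target:
        return None  # probed node is not heavy: nothing to do
    spare = target - sum(servers[light])
    fitting = [v for v in servers[probed] if v <= spare]
    if not fitting:
        return None  # no virtual server fits without overloading the light node
    moved = max(fitting)
    servers[probed].remove(moved)
    servers[light].append(moved)
    return moved

def one_to_one_step(servers: Dict[str, List[int]], light: str, target: int,
                    rng: random.Random) -> Optional[int]:
    """Light node probes one randomly chosen node and tries to take load from it."""
    probed = rng.choice(sorted(servers))
    return try_steal(servers, light, probed, target)

servers = {"L": [2], "H": [5, 4, 3]}
print(try_steal(servers, "L", "H", target=7))  # 5
print(servers)                                 # {'L': [2, 5], 'H': [4, 3]}
```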

Load stealing/shedding

[Figure: light nodes L1 to L5, heavy nodes H1 to H3, and directories D1 and D2.]

• One-to-Many
  – Light nodes report their load information to directories
  – Heavy node H gets this information by contacting a directory
  – H contacts the light node which can accept the excess load

Load stealing/shedding

[Figure: heavy nodes H1 to H3, directories D1 and D2, and light nodes L1 to L5.]

• Many-to-Many
  – Many heavy and light nodes rendezvous at each step
  – Directories periodically compute the transfer schedule and report it back to the nodes, which then do the actual transfer


• Advantages
  – Easy shifting of load
    • Whole Virtual Servers are shifted
  – Can be extended for heterogeneous environments*
    • More virtual servers for a resource rich node
• Disadvantages
  – Increased administrative and message overheads
    • Maintenance of all Finger-Tables
  – Much load is shifted
  – Much more overlay traffic

Load-balancing in DHTs: Virtual Servers

* [Godfrey05]


• Idea
  – One hash function for all nodes
    • h0
  – Multiple hash functions for data
    • h1, h2, h3, …hd
• Two options
  – Data is stored at one node
  – Data is stored at one node & other nodes store a pointer

Load-balancing in DHTs: Power of 2 choices

[Byers03]


• Inserting Data
  – Results of all hash functions are calculated
    • h1(x), h2(x), h3(x), …hd(x)
  – Data is stored on the retrieved node with the lowest load
  – Alternative
    • Other nodes store a pointer

Load-balancing in DHTs: Power of 2 choices


• Retrieving
  – Without pointers
    • Results of all hash functions are calculated
    • Request all of the possible nodes in parallel
    • One node will answer
  – With pointers
    • Request only one of the possible nodes.
    • Node can forward the request directly to the final node
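Insertion and retrieval without pointers can be sketched as below. This is a minimal sketch with several simplifications that are assumptions of the example: nodes are addressed by index instead of through h0 on a ring, the i-th hash function is SHA-1 salted with i, load is the number of stored items, and the "parallel" requests are issued in a simple loop.

```python
import hashlib
from typing import Any, List, Optional

def h(i: int, key: str, num_nodes: int) -> int:
    """i-th hash function: SHA-1 salted with the index i (illustrative choice)."""
    digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes

class DChoiceStore:
    def __init__(self, num_nodes: int, d: int = 2):
        self.d = d
        self.nodes = [dict() for _ in range(num_nodes)]

    def candidates(self, key: str) -> List[int]:
        """Nodes given by h1(key) ... hd(key)."""
        return [h(i, key, len(self.nodes)) for i in range(1, self.d + 1)]

    def insert(self, key: str, value: Any) -> int:
        cands = self.candidates(key)
        for n in cands:                      # overwrite if the key already exists
            if key in self.nodes[n]:
                self.nodes[n][key] = value
                return n
        target = min(cands, key=lambda n: len(self.nodes[n]))  # lowest load wins
        self.nodes[target][key] = value
        return target

    def lookup(self, key: str) -> Optional[Any]:
        """Without pointers: ask every candidate node; at most one holds the key."""
        for n in self.candidates(key):
            if key in self.nodes[n]:
                return self.nodes[n][key]
        return None

store = DChoiceStore(num_nodes=8, d=2)
for i in range(100):
    store.insert(f"doc{i}", i)
print([len(node) for node in store.nodes])  # per-node load
print(store.lookup("doc42"))                # 42
```

The pointer variant would additionally leave a small record at the other candidate nodes at insert time, so that a lookup needs to contact only one of them.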

Load-balancing in DHTs: Power of 2 choices


• Advantages
  – Simple
  – Generic randomized algorithm
• Disadvantages (with the specific realization)
  – Message overhead at inserting data
  – With pointers
    • Additional administration of pointers
      – More load
      – More adverse effect of churn
  – Without pointers
    • Message overhead at every search

Load-balancing in DHTs: Power of 2 choices


A quick example (power of two choices)

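Using Mathematica

In[67]:= Array[b, 50]; Do[b[i] = 0, {i, 1, 50}]; Do[x = Ceiling[50 Random[]]; y = Ceiling[50 Random[]]; If[b[x] > b[y], b[y] = b[y] + 1, b[x] = b[x] + 1], {j, 1, 1000}]; Histogram[Array[b, 50], FrequencyData -> True]

[Figure: histogram of the number of balls in each of the 50 bins after 1,000 throws with two choices per ball, with the expected value marked.]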
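The two-choice experiment can be sketched in Python next to the single-choice baseline, so that the spread of the two can be compared. This is a minimal sketch; the function names and the seed are arbitrary, and the printed numbers depend on the seed.

```python
import random
from typing import List

def one_choice(m: int, n: int, rng: random.Random) -> List[int]:
    """Each ball goes into one uniformly random bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

def two_choices(m: int, n: int, rng: random.Random) -> List[int]:
    """Each ball picks two random bins and goes into the less loaded one."""
    loads = [0] * n
    for _ in range(m):
        x, y = rng.randrange(n), rng.randrange(n)
        if loads[x] > loads[y]:
            loads[y] += 1
        else:
            loads[x] += 1
    return loads

rng = random.Random(2009)  # arbitrary seed, for repeatability only
single = one_choice(1000, 50, rng)
double = two_choices(1000, 50, rng)
print("one choice : min", min(single), "max", max(single))
print("two choices: min", min(double), "max", max(double))
```

Both runs have the same expected load of 20 per bin; the second choice typically narrows the gap between the lightest and the heaviest bin considerably.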

So far ...

• Bootstrapping DHTs
  – Uniform key distributions
  – …
  – Peers joined the network quasi-sequentially
    • The network was partitioned incrementally
• Next
  – Non-uniform keys
  – Parallelized construction

[Figure: CAN network construction (d=2): the space is split incrementally as peers 1 to 4 join.]


Beyond DHTs: Data-oriented overlays

Preserve ordering information, as occurring in natural language (say).

Needs more sophisticated (storage) load-balancing mechanisms to support range-partitioned data.

Uniform hashing (used in DHTs) destroys ordering information!

Resource → Key: What is a suitable function? Depends on the application needs!

Figure courtesy Sarunas


Beyond DHTs: Data-oriented overlays

• Complex queries
  – Approximate or similarity queries
    • DHTs can only support exact search
  – Range queries
  – etc.
    • e.g., Skyline queries
• Overlay supporting arbitrarily skewed load-distributions
  – DHTs are just a special case


Parallelized construction of overlays

• Shortcomings of sequential construction
  – Implicitly assumes some coordinator
    • Implicit centralization
  – Slow
    • Since peers join one by one
• Parallelized construction
  – Faster
  – Analogous to (re-)indexing a new attribute in a DB
    • Can be useful for recovery from catastrophic failures


Parallelized construction of overlays

[Figure: skewed load-distribution over the key-space, with peers 1 to 8.]

• Given
  – A mechanism to meet other random peers
    • e.g., an existing unstructured overlay
  – A parameter p
    • Determined according to the load-skew

[Aberer05]


Distributed proportional partitioning:
- p fraction of peers take one half of the space (partition 0)
- 1-p fraction of peers take the other half (partition 1)
- Needed for partitioning the key-space in a granularity adaptive to load-skew

[Figure: skewed load-distribution over a key-space split into partitions 0 and 1, with peers 1 to 8 and p = 0.75.]

Parallelized construction of overlays


Referential integrity:
- Each peer needs to know some peer from the complementary partition
- Needed for overlay routing
- This constraint necessitates a non-trivial algorithm (in order to reduce communication cost during overlay construction)

[Figure: skewed load-distribution with peers 1 to 8 split between partitions 0 and 1. Each of them needs to know 7 or 8, and vice versa.]

Parallelized construction of overlays


Markov partitioning process
- peers are decided 0/1 or undecided
- each undecided peer interacts with some random peer
- which has decided 0/1 or is still undecided

[Figure: skewed load-distribution with peers 1 to 8 split between partitions 0 and 1; annotations: "knows 6", "knows 1", "vice versa".]

Parallelized construction of overlays

[Aberer05]


Used recursively: Partitions are repartitioned (using appropriate parameters). A load-balanced overlay is formed.

[Figure: skewed load-distribution with peers 1 to 8 assigned to partitions 00, 010, 011 and 1.]

Several other practical issues
- local estimates of parameter p
- replication factor (re-)balancing

Now we can build a load-balanced overlay in a parallelized manner for rather arbitrary load-skews.

Parallelized construction of overlays


• Advantages
  – Parallelized and fast construction
  – No need of coordination
    • Since no need of sequential joins
  – Load-balancing for arbitrary load-skews
• Disadvantages (with the specific realization)
  – Complex
    • Algorithm design, analysis and implementation
    • Needs partial global information
      – e.g., parameter choices (based on sampling)

Distributed proportional partitioning


Other parallelized construction mechanism: Sorting peer-IDs to build a ring

• Pairing and merging virtual trees
  – Pair nodes randomly
    • By probing potential successors and accepting/rejecting probes
    • Paired nodes act as virtual supernode
    • Repeat the process
  – Needs a mechanism to merge such virtual trees

[Figure: pairing of peers such as (1, 7) and (5, 9) into virtual nodes, and the merging tree that yields the sorted peers.]

[Angluin05]


Other parallelized construction mechanism: Sorting peer-IDs to build a ring

• Gossip based mechanism
  – Nodes start with random subsets
    • Leaf-set: Maintain a constant number of nodes
      – Arrange them as potential predecessors/successors
        » Ideally equal number of each
    • Gossip leaf-set information with nodes it knows
      – E.g., its current leaf-set nodes (may also include past ones)
  – Refine information & repeat process

Node 7 gossips its leaf-set with nodes it knows (including node 9), and each node refreshes its leaf-set.

node: leaf-set
7: 4, 5, 9, 10
9: 6, 8, 12, 14
…

after gossip

node: recalculated leaf-set
7: 5, 6, 8, 9
9: 7, 8, 10, 12
…

Gradually converges to form a sorted list [Montresor05]
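One gossip exchange can be sketched as follows. The function names, the ring size of 16 and the choice of two predecessors plus two successors are assumptions made to reproduce the slide's example; this is not the actual protocol of [Montresor05].

```python
def refresh_leaf_set(node, candidates, half=2, ring_size=16):
    """Keep the `half` closest successors and `half` closest predecessors."""
    others = set(candidates) - {node}
    # Closest successors: smallest clockwise distance from `node`.
    succ = sorted(others, key=lambda x: (x - node) % ring_size)[:half]
    # Closest predecessors among the rest: smallest counter-clockwise distance.
    rest = others - set(succ)
    pred = sorted(rest, key=lambda x: (node - x) % ring_size)[:half]
    return sorted(succ + pred)


def gossip(a, b, leaf_sets, **kwargs):
    """Nodes a and b exchange leaf-sets and each recomputes its own."""
    merged_a = set(leaf_sets[a]) | set(leaf_sets[b]) | {b}
    merged_b = set(leaf_sets[b]) | set(leaf_sets[a]) | {a}
    leaf_sets[a] = refresh_leaf_set(a, merged_a, **kwargs)
    leaf_sets[b] = refresh_leaf_set(b, merged_b, **kwargs)


leaf_sets = {7: [4, 5, 9, 10], 9: [6, 8, 12, 14]}
gossip(7, 9, leaf_sets)
print(leaf_sets)  # {7: [5, 6, 8, 9], 9: [7, 8, 10, 12]}
```

Repeating such exchanges with changing partners moves every leaf-set towards the node's true ring neighbours.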


Sorting peer IDs

• Advantages
  – Parallelized and fast construction
  – No need for coordination
    • Since there is no need for sequential joins
  – Relatively simple (no global information)
  – Gossip based mechanism is robust against churn during the sorting process
• Disadvantages (with the specific realizations)
  – Do not take into account load-balancing issues
  – Not directly applicable for systems using structural replication/zone over-loading
  – Just builds the basic ring, but not the long range links
    • Though not complicated to build once the ring is in place
  – Pairing & merging mechanism is vulnerable to churn during the sorting process


So far …

Bootstrapping issues:
– Load-balance
– How?
  • Sequential
  • Parallelized

All of this assumes (implicitly) that any peer can potentially meet any other peer, i.e., that the peers are already part of one connected network and then build a single structured overlay composed of all these peers.


Bootstrapping in unstructured networks

[Figure from http://www.tellagate.com/kojima/blog/: clusters A, B, C and X; Network 1 and Network 2, each formed over time, are connected by a join]

Merger is accomplished trivially & transparently

The network can thus grow by organic merger of smaller (originally isolated) networks, allowing decentralized bootstrapping of Gnutella-like unstructured overlays


Merging structured overlays

• Overlay merger
  – Needed for decentralized bootstrapping
  – Needed for recovery from partitioning
• Ignored in P2P literature!
  – Lack of experience with real deployments
  – Focus on other issues like churn
• Trivial in unstructured and super-peer networks
  – Merger of index is a standard DB issue

[Datta07]


Merging structured overlays

• Overlay merger
  – Correctness of routing
    • Maintain routing table
  – Correct and complete key binding
    • Ship data to responsible peer(s)
      – Replica synchronization

For locating the desired data/content, both are essential!


Merging structured overlays

• Merging tree topology with structural replication (e.g., P-Grid) has significantly different challenges than merging ring based networks.
  – Merging P-Grid networks transparently is much simpler algorithmically.
• So we use this case to illustrate the idea, the challenges, important metrics …


When peers from different networks meet

• If they have the same path
  – Synchronize replicas
• If one has a strict prefix path
  – Extend path and routing table, synchronize replica
• Stimulate new interactions
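The two cases can be sketched in a few lines. This is one possible reading of the slide, not the algorithm of [Datta07]: it assumes that the peer with the shorter path adopts the longer path, copies the routing entries for the new levels and then becomes a replica. The `Peer` class, `meet` and all field names are hypothetical; the case of diverging paths and the stimulation of new interactions are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    path: str                                    # binary key-space prefix
    keys: set = field(default_factory=set)       # binary key strings held
    routing: dict = field(default_factory=dict)  # level -> name of a known peer


def meet(a, b):
    """Handle a meeting of two peers; return keys that must be shipped elsewhere.

    Returns None when the paths diverge (a case not covered in this sketch).
    """
    if a.path != b.path:
        short, long_ = sorted((a, b), key=lambda p: len(p.path))
        if not long_.path.startswith(short.path):
            return None
        # Strict prefix (assumed handling): extend the routing table and the path.
        for level in range(len(short.path), len(long_.path)):
            if level in long_.routing:
                short.routing[level] = long_.routing[level]
        short.path = long_.path
    # Same path now: synchronize replicas on the keys the path is responsible for.
    union = a.keys | b.keys
    mine = {k for k in union if k.startswith(a.path)}
    a.keys, b.keys = set(mine), set(mine)
    return union - mine  # keys to ship to the responsible peer(s)


a = Peer("a", "0", {"000", "011"}, {0: "x"})
b = Peer("b", "01", {"010"}, {0: "y", 1: "z"})
to_forward = meet(a, b)
print(a.path, sorted(a.keys), to_forward)
```

In the example, peer a specializes from path 0 to path 01, both peers end up holding the keys 010 and 011, and key 000 has to be shipped to a peer responsible for it.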


When peers from different networks meet

• Keys continue to be accessible to peers
  – Replica synchronization needed to access keys from the other network

The merger process should be transparent. At application level, all the keys which were once accessible continue to be accessible to individual users (unless the application deletes them).


When peers from different networks meet

- This transparency may be violated until the replicas are actually synchronized!
- Ideally, if we could determine when the sync process is completed throughout the network, and could distinguish peers from originally different networks in the meanwhile, then we could retain transparency.
- Instead of detecting global completion, use a heuristic time out (once local sync is completed).


Parameter space

• 3 axes
  – Network sizes
  – Duplicate content in the original unmerged networks
  – Heuristic parameter (time out)


Important performance metrics

• Recall (over time) Ri/j
  – Metric from information retrieval
    • Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.
  – Ri/i should always be 1 for the merger to be transparent to applications
• Volume of data transferred
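Recall can be computed directly from its definition. In the sketch below, Ri/j is read as the recall that peers originally from network i achieve on keys originally stored in network j; that reading, the function name `recall` and the sample keys are assumptions made for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant keys that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # convention assumed here: nothing to find, nothing missed
    return len(relevant & set(retrieved)) / len(relevant)


# Keys originally stored in network 1 and in network 2.
keys_net1 = {"k1", "k2"}
keys_net2 = {"k3", "k4"}

# What a peer from network 1 can retrieve part-way through the merger.
found = {"k1", "k2", "k3"}

r_1_1 = recall(found, keys_net1)  # must stay at 1 for a transparent merger
r_1_2 = recall(found, keys_net2)  # grows towards 1 as replicas synchronize
print(r_1_1, r_1_2)
```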


Recall


Volume of data transferred


Volume of data transferred

Merger of two networks with originally arbitrarily different key-space partitions

Increases when there is more common data!
- The allotment of peers’ key-space changes, so …


Volume of data transferred


Recall (with worst choice of timeout)


Concluding remarks

Part IV


DHTs

• Characteristic property
  – Self-manage responsibilities in the presence of:
    • Node joins
    • Node leaves
    • Node failures
    • Load-imbalance
    • Replicas
• Basic structure of DHTs
  – Metric space
  – Embed graph with efficient search algorithm
  – Let each node simulate many virtual nodes


The future of DHTs

• DHTs automatically handle
  – Replication, faults, load-balancing, joins, leaves, …
• One-size-fits-all?
  – Need dynamically auto-tunable DHTs
  – Different applications have different needs
• Stronger guarantees
  – Consistency models
  – Transactions
  – Access layers


To P2P or not to P2P

• P2P vs. dedicated infrastructure
  – Is it technically feasible to realize everything using P2P?
    • Maybe: it’s an open issue
      – Unlikely, in terms of performance
      – Harder to guarantee reliability (no one is fully accountable)


To P2P or not to P2P

• P2P vs. dedicated infrastructure
  – Shall P2P be preferred whenever technically possible?
    • Don’t think so …
      – A matter of risk/cost vs. benefit trade-offs


To P2P or not to P2P

• P2P vs. dedicated infrastructure
  – How about scalability?
    • Again, depends
      – With enough money, client-server can scale in many cases
        » e.g., Google
      – Network resource consuming applications like content distribution may scale better using a P2P approach than client-server


To summarize …

• P2P makes sense if:
  – Budget/resource is limited
    • Dedicated infrastructure is unsustainable or makes less economic sense
  – Wide interest and relevance
    • To form a critical mass of users contributing resources
  – Trust between participants is reasonably ‘high’
    • What’s ‘high’ depends on the application
  – Rate of change is manageable
    • E.g., membership dynamics is not ‘too high’
  – Criticality is ‘low’
    • Since it is harder to guarantee reliability or QoS in P2P
    • E.g., Skype’s disclaimer states it’s not for making emergency calls!


To summarize …

• P2P systems exhibit the following characteristics:
  – Autonomy from central servers
  – Use of edge resources
    • Instead of dedicated infrastructure
  – Intermittent connectivity
  – Reliance on self-organizing mechanisms using limited (locally available) information
    • No global coordination and control
      – Unlike other distributed systems like Grid


References

[Aberer05] Indexing data-oriented overlay networks. Karl Aberer, Anwitaman Datta, Manfred Hauswirth, Roman Schmidt (VLDB 2005)

[Angluin05] Fast construction of overlay networks. D. Angluin, J. Aspnes, J. Chen, Y. Wu and Y. Yin (SPAA 2005)

[Byers03] Simple Load Balancing for Distributed Hash Tables. J. Byers, J. Considine, and M. Mitzenmacher (IPTPS 2003)

[Datta07] Merging Intra-Planetary Index Structures: Decentralized Bootstrapping of Overlays. Anwitaman Datta (SASO 2007)

[Ghodsi06] Distributed k-ary System: Algorithms for Distributed Hash Tables. Ali Ghodsi, Dissertation, KTH Royal Institute of Technology, Sweden, 2006

[Godfrey05] Heterogeneity and Load Balance in Distributed Hash Tables. P. Brighten Godfrey and Ion Stoica (INFOCOM 2005)

[Montresor05] Chord on Demand. A. Montresor, M. Jelasity and O. Babaoglu (P2P 2005)

[Rao03] Load Balancing in Structured P2P Systems. A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, I. Stoica (IPTPS 2003)

[Rieche06] Reliability and Load-Balancing in DHTs. Simon Rieche, Klaus Wehrle, Heiko Niedermayer, Stefan Götz. In: Ralf Steinmetz, Klaus Wehrle (Eds.): Peer-to-Peer Systems and Applications

[Xu03] On the Fundamental Tradeoffs between Routing Table Size and Network Diameter in Peer-to-Peer Networks. J. Xu, A. Kumar and X. Yu (JSAC 2003)


[Aberer04] Efficient, self-contained handling of identity in Peer-to-Peer systems. Karl Aberer, Anwitaman Datta, Manfred Hauswirth. IEEE Transactions on Knowledge and Data Engineering (TKDE) 16(7), 2004.

[Kleinberg00] The small-world phenomenon: An algorithmic perspective. J. Kleinberg. Proc. 32nd ACM Symposium on Theory of Computing (STOC) 2000.

[Li05] Bandwidth-efficient Management of DHT Routing Tables. Jinyang Li, Jeremy Stribling, Robert Morris, and M. Frans Kaashoek. Usenix Symposium on Networked Systems Design and Implementation (NSDI) 2005.

[Rhea04] Handling Churn in a DHT. Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Proceedings of the USENIX Annual Technical Conference, June 2004.

[Maymounkov02] Kademlia: A Peer-to-peer Information System Based on the XOR Metric. Petar Maymounkov and David Mazières (IPTPS 2002)