Tapestry: Decentralized Routing and Location
Ben Y. Zhao, CS Division, U. C. Berkeley
Slides are modified and presented by Kyoungwon Suh


Page 1: Tapestry: Decentralized  Routing and Location

Tapestry: Decentralized Routing and Location

Ben Y. Zhao

CS Division, U. C. Berkeley

Slides are modified and presented by Kyoungwon Suh

Page 2: Tapestry: Decentralized  Routing and Location


Outline

Problems facing wide-area applications

Tapestry Overview

Mechanisms and protocols

Preliminary Evaluation

Related and future work

Page 3: Tapestry: Decentralized  Routing and Location


Motivation
Shared storage systems need a data location/routing mechanism
– Finding a peer in a scalable way is a difficult problem
– Efficient insertion and retrieval of content in a large distributed storage infrastructure
Existing solutions
– Centralized: expensive to scale, less fault tolerant, vulnerable to DoS attacks (e.g. Napster, DNS, SDS)
– Flooding: not scalable (e.g. Gnutella)

Page 4: Tapestry: Decentralized  Routing and Location


Key: Location and Routing
Hard problem:
– Locating and messaging to resources and data
Approach: wide-area overlay infrastructure:
– Scalable, dynamic, fault-tolerant, load-balancing

Page 5: Tapestry: Decentralized  Routing and Location


Decentralized Hierarchies
Centralized hierarchies
– Each higher-level node responsible for locating objects in a greater domain
Decentralize: create a tree for object O (really!)
– Object O has its own root and subtree
– Server on each level keeps a pointer to the nearest object in its domain
– Queries search up in the hierarchy

[Figure: a tree rooted at the node with Root ID = O; directory servers track 2 replicas]

Page 6: Tapestry: Decentralized  Routing and Location


What is Tapestry?
A prototype of a decentralized, scalable, fault-tolerant, adaptive location and routing infrastructure (Zhao, Kubiatowicz, Joseph et al., U.C. Berkeley)
Network layer of the OceanStore global storage system
Suffix-based hypercube routing
– Core system inspired by the Plaxton algorithm (Plaxton, Rajaraman, Richa, SPAA '97)
Core API (sketched below):
– publishObject(ObjectID, [serverID])
– sendmsgToObject(ObjectID)
– sendmsgToNode(NodeID)
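As a rough sketch, the core API above can be pictured as methods on a node-local object. The class shape, the parameter defaults, and the msg argument are assumptions for illustration only, not the actual Tapestry interface.

```python
# Hypothetical sketch of the three core API calls named above; signatures are
# inferred from the slide, everything else is an invented illustration.

class TapestryNode:
    def publishObject(self, object_id, server_id=None):
        """Advertise that server_id (default: this node) stores object_id,
        routing toward the object's root and leaving location pointers."""
        raise NotImplementedError

    def sendmsgToObject(self, object_id, msg=b""):
        """Route msg toward the object's root until a location pointer is
        found, then deliver it to a nearby replica."""
        raise NotImplementedError

    def sendmsgToNode(self, node_id, msg=b""):
        """Route msg by incremental suffix matching, one digit per hop."""
        raise NotImplementedError
```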

Page 7: Tapestry: Decentralized  Routing and Location


Incremental Suffix Routing
Namespace (nodes and objects)
– Large enough to avoid collisions (~2^160?) (namespace size N → IDs of log2(N) bits)
Insert object:
– Hash the object into the namespace to get its ObjectID
– For (i = 0; i < log2(N); i += j) { // define hierarchy
  - j is the digit size in bits (j = 4 → hex digits)
  - Insert an entry into the nearest node that matches on the last i bits
  - When no match is found, pick the node matching (i – j) bits with the highest ID value, and terminate
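To make the suffix arithmetic concrete, here is a toy helper (invented for illustration, not Tapestry code) that counts how many trailing hex digits two IDs share; this is the quantity the loop above grows by one digit per level.

```python
# Toy illustration of suffix matching over hex digits (j = 4 bits per digit).

def suffix_match_len(a: str, b: str) -> int:
    """Number of trailing hex digits shared by IDs a and b."""
    n = 0
    for da, db in zip(reversed(a), reversed(b)):
        if da != db:
            break
        n += 1
    return n

print(suffix_match_len("43FE", "13FE"))  # 3 -- both end in "3FE"
print(suffix_match_len("43FE", "993E"))  # 1 -- both end in "E"
```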

Page 8: Tapestry: Decentralized  Routing and Location


Routing to Object
Lookup object
– Traverse the same relative nodes as insert, except search for an entry at each node
– For (i = 0; i < log2(N); i += n): search for an entry in the nearest node matching on the last i bits
Each object maps to a hierarchy defined by a single root
– f(ObjectID) = RootID
Publish / search both route incrementally to the root
Root node = f(O) is responsible for “knowing” the object’s location
(A toy publish/lookup sketch follows below.)
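The two halves above (publish leaves pointers along the path to the root; lookup walks the same path until it hits one) can be illustrated with a toy sketch. Everything here is invented for illustration: the node list, the hash-based root choice, and the straight-line stand-in for routing are not Tapestry's actual mechanisms.

```python
# Toy sketch of publish/lookup sharing a path to the object's root.

import hashlib

NODES = ["43FE", "13FE", "ABFE", "239E", "73FE"]   # toy 4-hex-digit IDs
pointers = {n: {} for n in NODES}                  # node -> {object: server}

def root_of(obj: str) -> str:
    """Toy f(ObjectID) = RootID: a hash deterministically picks the root."""
    h = int(hashlib.sha1(obj.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def path_to_root(src: str, obj: str) -> list:
    """Stand-in for incremental suffix routing: walk the list to the root."""
    i, j = NODES.index(src), NODES.index(root_of(obj))
    return NODES[i:j + 1] if i <= j else NODES[j:i + 1][::-1]

def publish(obj: str, server: str) -> None:
    for node in path_to_root(server, obj):         # leave a pointer per hop
        pointers[node][obj] = server

def lookup(src: str, obj: str):
    for node in path_to_root(src, obj):            # stop at the first pointer
        if obj in pointers[node]:
            return pointers[node][obj]
    return None                                    # never hit: the root knows

publish("fileA", "43FE")
print(lookup("73FE", "fileA"))                     # -> 43FE
```

Because both walks end at the same root, a lookup is guaranteed to intersect the publish path at the root at the latest; locality pointers let it stop much earlier.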

Page 9: Tapestry: Decentralized  Routing and Location


Object Location: Randomization and Locality

Page 10: Tapestry: Decentralized  Routing and Location


Tapestry Mesh: incremental suffix-based routing

[Figure: a mesh of nodes with 4-hex-digit IDs (0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, 0x423E, 0x79FE, 0x23FE, 0x73FF, 0x555E, 0x035E, 0x44FE, 0x9990, 0xF990, 0x993E, 0x04FE); edges are labeled with routing levels 1-4, and each hop matches one more suffix digit on the way to 0x43FE.]

Page 11: Tapestry: Decentralized  Routing and Location


Contribution of this work

Plaxton algorithm limitations:
– Global-knowledge algorithms
– Root node vulnerability
– Lack of adaptability

Tapestry:
– Distributed algorithms (dynamic node insertion, dynamic root mapping)
– Redundancy in location and routing
– Fault-tolerance protocols
– Self-configuring / adaptive
– Support for mobile objects

Application infrastructure

Page 12: Tapestry: Decentralized  Routing and Location


Dynamic Insertion Example

[Figure: a Tapestry mesh of nodes with 5-hex-digit IDs (0x243FE, 0x913FE, 0x0ABFE, 0x71290, 0x5239E, 0x973FE, 0x779FE, 0xA23FE, 0xB555E, 0xC035E, 0x244FE, 0x09990, 0x4F990, 0x6993E, 0x704FE); a NEW node 0x143FE joins through gateway 0xD73FF, building its routing levels 1-4 hop by hop toward nodes sharing its suffix.]

Page 13: Tapestry: Decentralized  Routing and Location


Fault-tolerant Location
Minimized soft-state vs. explicit fault recovery
Multiple roots
– Objects hashed w/ small salts → multiple names/roots
– Queries and publishing utilize all roots in parallel
– P(finding a reference despite a partition) = 1 – (1/2)^n, where n = # of roots
Soft-state periodic republish
– 50 million files/node, daily republish, b = 16, N = 2^160, 40 B/msg → worst-case update traffic: 156 kb/s
– Expected traffic w/ 2^40 real nodes: 39 kb/s
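A quick numeric check of the multiple-roots formula above, assuming (as the slide's formula does) that each salted root independently remains reachable with probability 1/2 under a partition:

```python
# P(finding a reference under partition) = 1 - (1/2)^n for n salted roots.
for n in range(1, 6):
    print(f"{n} root(s): P = {1 - 0.5 ** n:.4f}")
```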

Page 14: Tapestry: Decentralized  Routing and Location


Fault-tolerant Routing
Detection:
– Periodic probe packets between neighbors
Handling:
– Each entry in the routing map has 2 alternate nodes
– Second-chance algorithm for intermittent failures
– Long-term failures: alternates found via routing tables
Protocols:
– First Reachable Link Selection (sketched below)
– Proactive Duplicate Packet Routing
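A minimal sketch of how First Reachable Link Selection over a primary-plus-two-alternates routing entry might look; the data layout and names below are assumptions for illustration, not the actual protocol code.

```python
# Sketch: each routing-map entry keeps a primary and 2 alternates (per the
# slide); forwarding uses the first link that appears alive.

from dataclasses import dataclass

@dataclass
class RouteEntry:
    primary: str
    alternates: tuple          # 2 alternate nodes per entry

def first_reachable(entry: RouteEntry, is_alive):
    """Return the first live link among primary and alternates."""
    for node in (entry.primary, *entry.alternates):
        if is_alive(node):     # in practice: driven by periodic probes
            return node
    return None                # all three failed; trigger repair

entry = RouteEntry("0x43FE", ("0x13FE", "0x73FE"))
print(first_reachable(entry, lambda n: n != "0x43FE"))  # -> 0x13FE
```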

Page 15: Tapestry: Decentralized  Routing and Location


Simulation Environment
Implemented Tapestry routing as a packet-level simulator
Delay is measured in terms of network hops
Does not model the effects of cross traffic or queuing delays
Four topologies: AS, MBone, GT-ITM, TIERS

Page 16: Tapestry: Decentralized  Routing and Location


Results: Location Locality

Measuring effectiveness of locality pointers (TIERS 5000)

[Figure: RDP vs. object distance (TI5000); x-axis: object distance (3-25), y-axis: RDP (0-20); two series: "Locality Pointers" and "No Pointers".]

Page 17: Tapestry: Decentralized  Routing and Location


Results: Stability via Redundancy

Parallel queries on multiple roots. Aggregate bandwidth measures the bandwidth used for soft-state republishing (1/day) plus the bandwidth used by requests arriving at a rate of 1/s.

[Figure: "Retrieving Objects with Multiple Roots"; x-axis: # of roots utilized (1-5); left y-axis: latency (hop units, 0-90); right y-axis: aggregate bandwidth (kb/s, 0-70); series: average latency and aggregate bandwidth per object.]

Page 18: Tapestry: Decentralized  Routing and Location


Related Work
Content Addressable Networks (CAN)
– Ratnasamy et al. (ACIRI / UCB)
Chord
– Stoica, Morris, Karger, Kaashoek, Balakrishnan (MIT / UCB)
Pastry
– Druschel and Rowstron (Rice / Microsoft Research)

Page 19: Tapestry: Decentralized  Routing and Location


Strong Points
Designed a system based on a theoretically proven idea (the Plaxton algorithm)
A fully decentralized and scalable solution to the deterministic location and routing problem

Page 20: Tapestry: Decentralized  Routing and Location


Weaknesses / Improvements
Substantially complicated
– Especially, the dynamic node insertion algorithm is non-trivial, and each insertion takes a non-negligible amount of time
– What happens when many nodes attempt to insert at the same time?
Where to put the “root” node for a given object
– Needs a universal hashing function
– Possible to place the “root” near expected clients dynamically?

Page 21: Tapestry: Decentralized  Routing and Location


Questions / Suggestions?

Page 22: Tapestry: Decentralized  Routing and Location


Backup Slides Follow…

Page 23: Tapestry: Decentralized  Routing and Location


Routing to Nodes
Example: octal digits, 2^18 namespace, routing from 005712 to 627510

Route: 005712 → 340880 → 943210 → 834510 → 387510 → 727510 → 627510, with each hop matching one more trailing digit of the destination.

Neighbor map for node "5712" (octal), one row per routing level; entries are indexed by digit 0-7, "x" is a wildcard, and the node's own ID fills its own slot:

Routing level 1:  xxx0  xxx1  5712  xxx3  xxx4  xxx5  xxx6  xxx7
Routing level 2:  xx02  5712  xx22  xx32  xx42  xx52  xx62  xx72
Routing level 3:  x012  x112  x212  x312  x412  x512  x612  5712
Routing level 4:  0712  1712  2712  3712  4712  5712  6712  7712

[Figure: each node on the route (005712, 340880, 943210, 834510, 387510, 727510, 627510) is drawn with its own 8-entry neighbor-map row, digits 0-7; the sketch below re-enacts the route.]
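The figure's path can be re-enacted with a few lines of toy code. This is illustrative only: the node set comes from the example above, and where real Tapestry would pick the nearest qualifying neighbor, this toy picks the one with the fewest matching digits just to reproduce the figure's hop sequence.

```python
# Toy re-enactment of the octal routing example: at hop i the message moves
# to a node matching the destination on at least i trailing digits.

NODES = ["005712", "340880", "943210", "387510",
         "834510", "727510", "627510"]

def suffix_match(a: str, b: str) -> int:
    """Number of trailing (octal) digits a and b share."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def route(src: str, dest: str) -> list:
    path, node = [src], src
    for level in range(1, len(dest) + 1):
        if node == dest:
            break
        candidates = [n for n in NODES if suffix_match(n, dest) >= level]
        node = min(candidates, key=lambda n: suffix_match(n, dest))
        if node != path[-1]:
            path.append(node)
    return path

print(" -> ".join(route("005712", "627510")))
# 005712 -> 340880 -> 943210 -> 834510 -> 387510 -> 727510 -> 627510
```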

Page 24: Tapestry: Decentralized  Routing and Location


Dynamic Insertion
Operations necessary for node N to become fully integrated:
Step 1: Build up N’s routing maps
– Send messages to each hop along the path from the gateway to the current node N’ that best approximates N
– The ith hop along the path sends its ith-level route table to N
– N optimizes those tables where necessary
Step 2: Send a notify message via acked multicast to nodes with null entries for N’s ID; set up forwarding pointers
Step 3: Each notified node issues a republish message for relevant objects
Step 4: Remove forwarding pointers after one republish period
Step 5: Notify local neighbors to modify paths to route through N where appropriate

Page 25: Tapestry: Decentralized  Routing and Location


Dynamic Root Mapping
Problem: choosing a root node for every object
– Deterministic over network changes
– Globally consistent
Assumptions
– All nodes with the same matching suffix contain the same null/non-null pattern in the next level of their routing maps
– Requires: consistent knowledge of nodes across the network

Page 26: Tapestry: Decentralized  Routing and Location


Plaxton Solution
Given a desired ID N:
– Find the set S of existing network nodes n matching the most suffix digits with N
– Choose Si = the node in S with the highest-valued ID
Issues:
– The mapping must be generated statically using global knowledge
– It must be kept as hard state in order to operate in a changing environment
– The mapping is not well distributed; many nodes get no mappings

Page 27: Tapestry: Decentralized  Routing and Location


Tapestry Solution
Globally consistent distributed algorithm:
– Attempt to route to the desired ID Ni
– Whenever a null entry is encountered, choose the next “higher” non-null pointer entry
– If the current node S holds the only non-null pointer in the rest of the route map, terminate the route: f(N) = S
Assumes:
– Routing maps across the network are up to date
– Null/non-null properties are identical at all nodes sharing the same suffix
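The "next higher non-null entry" rule lends itself to a short sketch; the list-based routing level and the names below are invented for illustration, not the real routing-map structure.

```python
# Sketch of the surrogate-root rule: when the desired digit's entry is null,
# take the next "higher" non-null entry, wrapping around the digit space.

def next_non_null(route_level: list, digit: int):
    """route_level[d] is the neighbor for digit d, or None if null."""
    base = len(route_level)
    for offset in range(base):
        entry = route_level[(digit + offset) % base]
        if entry is not None:
            return entry
    return None  # empty level: the current node terminates the route

# octal level with nulls; desired digit 5 falls through to digit 7's entry
level = [None, "x1", None, None, None, None, None, "x7"]
print(next_non_null(level, 5))  # -> x7
```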

Page 28: Tapestry: Decentralized  Routing and Location


Analysis
Globally consistent deterministic mapping
– Null entry → no node in the network with that suffix
– Consistent map → identical null entries across the route maps of nodes w/ the same suffix

Additional hops compared to the Plaxton solution:
– Reduces to the coupon collector problem, assuming a random distribution
– With n·ln(n) + cn entries, P(all coupons collected) = 1 − e^(−c)
– For n = b, c = b − ln(b): with b^2 nodes left, P(all digits covered) = 1 − b·e^(−b); failure probability b·e^(−b) ≈ 1.8 × 10^(−6) for b = 16
– # of additional hops ≈ log_b(b^2) = 2

A distributed algorithm with minimal additional hops
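A quick numeric check of the constants above (b = 16, as in the earlier republish estimate):

```python
import math

b = 16
print(b * math.exp(-b))     # ~1.8e-6: failure probability b * e^(-b)
print(math.log(b ** 2, b))  # 2.0: additional hops log_b(b^2)
```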