
Chord and CFS

Philip Skov Knudsen (skov@diku.dk)

Niels Teglsbo Jensen (teglsbo@diku.dk)

Mads Lundemann (thenox@diku.dk)

Distributed hash table

• Stores values at nodes

• Hash function

• Name -> hash key; the name can be any string or byte array

• Note: the article uses the terms key and ID inconsistently

• Chord

• CFS

Chord

A Scalable Peer-to-peer Lookup Protocol for Internet Applications

Chord purpose

• Map keys to nodes

• (Compared to Freenet: No anonymity)

Goals

• Load balance

• Decentralization

• Scalability

• Availability

• Flexible naming

Consistent hashing
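
A minimal sketch of the idea, assuming SHA-1 and the full 160-bit identifier circle as in the paper (the node addresses and file name below are made up):

    import hashlib

    M = 160  # Chord uses SHA-1, so IDs live on a circle of 2^160 points

    def chord_id(name: bytes) -> int:
        """Hash an arbitrary string/byte array onto the identifier circle."""
        return int.from_bytes(hashlib.sha1(name).digest(), "big")

    def successor(node_ids, key):
        """A key belongs to the first node whose ID equals or follows it."""
        ring = sorted(node_ids)
        for n in ring:
            if n >= key:
                return n
        return ring[0]  # wrap around the circle

    nodes = [chord_id(addr.encode()) for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
    print(successor(nodes, chord_id(b"some-file")))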

Simple network topology

Efficient network topology

Lookup algorithm
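
A single-process sketch of the O(log N) lookup, assuming each node keeps a finger table where finger[i] points at the successor of (n + 2^i) mod 2^M; plain method calls stand in for the RPCs a real deployment would use, and M = 6 is a toy identifier size:

    M = 6  # toy identifier space: 64 IDs

    def in_open(x, a, b):
        """True if x lies strictly between a and b on the circle."""
        return a < x < b if a < b else (x > a or x < b)

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self      # maintained by join/stabilize
            self.finger = [self] * M   # finger[i]: successor of (id + 2^i) mod 2^M

        def find_successor(self, key):
            # A key in (self, successor] is owned by our successor.
            if in_open(key, self.id, self.successor.id) or key == self.successor.id:
                return self.successor
            # Otherwise forward to the closest finger preceding the key.
            return self.closest_preceding_node(key).find_successor(key)

        def closest_preceding_node(self, key):
            for f in reversed(self.finger):
                if in_open(f.id, self.id, key):
                    return f
            return self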

Node joining

Example: ring {21, 32}, node 26 joins (replayed in the code sketch below):

26.join(friend) -> 26.successor = 32

26.stabilize -> 32.notify(26) -> 32.predecessor = 26

21.stabilize -> 21.successor = 26 -> 26.notify(21) -> 26.predecessor = 21
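
A sketch of those three steps, using a second toy Node class that walks successors linearly instead of using fingers (method calls again stand in for RPCs):

    def between(x, a, b):
        """True if x is in the circle interval (a, b]."""
        return a < x <= b if a < b else (x > a or x <= b)

    def between_open(x, a, b):
        """True if x is in the circle interval (a, b)."""
        return a < x < b if a < b else (x > a or x < b)

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self
            self.predecessor = None

        def find_successor(self, key):
            node = self  # linear walk; real Chord uses fingers
            while not between(key, node.id, node.successor.id):
                node = node.successor
            return node.successor

        def join(self, friend):
            """Join via any known node; only the successor is set here."""
            self.predecessor = None
            self.successor = friend.find_successor(self.id)

        def stabilize(self):
            """Periodic: adopt our successor's predecessor if it sits between us."""
            x = self.successor.predecessor
            if x is not None and between_open(x.id, self.id, self.successor.id):
                self.successor = x
            self.successor.notify(self)

        def notify(self, candidate):
            """candidate believes it may be our predecessor."""
            if self.predecessor is None or between_open(
                    candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    # Replaying the slide: ring {21, 32}, then node 26 joins via 32.
    n21, n32 = Node(21), Node(32)
    n21.successor = n32; n32.successor = n21
    n21.predecessor = n32; n32.predecessor = n21

    n26 = Node(26)
    n26.join(n32)     # 26.successor = 32
    n26.stabilize()   # -> 32.notify(26), so 32.predecessor = 26
    n21.stabilize()   # 21 sees 32.predecessor = 26, sets 21.successor = 26,
                      # then 26.notify(21) sets 26.predecessor = 21
    assert n21.successor is n26 and n26.successor is n32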

Preventing lookup failure

• Each node keeps a successor list of length r

• Disregarding network failures

• Assume each node fails within one stabilization period with probability p

• A node loses connectivity only if all r successors fail, i.e. with probability p^r
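
A quick numeric illustration of that bound, taking p = 0.5 (the pessimistic case where half the nodes fail in one stabilization period); the r values are arbitrary:

    # All r successors must fail within one period for a node to lose
    # its ring connectivity; that happens with probability p^r.
    p = 0.5
    for r in (1, 2, 4, 8, 16):
        print(f"r = {r:2d}  ->  p^r = {p**r:.6f}")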

Path lengths from simulation

Probability density function for path length in a network of 2^12 nodes.

Path lengths with varying N

Load balance

Nodes: 10^4, keys: 5*10^5

Virtual servers

10^4 nodes and 10^6 keys
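
A sketch of the virtual-server idea, assuming each physical server simply joins the ring under several derived IDs (the suffix scheme and the count of 8 are made up):

    import hashlib

    def virtual_ids(address: str, count: int = 8):
        """One physical server appears on the ring as `count` virtual servers."""
        return [
            int.from_bytes(hashlib.sha1(f"{address}#{i}".encode()).digest(), "big")
            for i in range(count)
        ]

    # Each ID claims its own arc of the circle, so a server's share of the
    # keys concentrates around the mean instead of varying by an O(log N)
    # factor, as the simulation above shows.
    ring = sorted(vid for addr in ("a", "b", "c") for vid in virtual_ids(addr))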

Resilience to failed nodes

In a network of 1000 nodes

Latency stretch

In a network of 2^16 nodes: c = Chord latency, i = IP latency, stretch = c / i

CFS

Wide-area cooperative storage

Purpose

• Distributed cooperative file system

System design

File system using DHash

Block placement

Tick mark: block ID

Square: server responsible for the ID (in Chord)

Circles: servers holding replicas

Triangles: servers receiving a copy of the block to cache
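
A sketch of the placement rule in the figure, assuming server IDs are plain integers on the circle and k replicas (k = 3 here is arbitrary):

    import hashlib
    from bisect import bisect_right

    def block_id(data: bytes) -> int:
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def placement(server_ids, data, k=3):
        """Responsible server = successor of the block ID; replicas sit on
        the next k servers along the circle, as in the figure."""
        ring = sorted(server_ids)
        i = bisect_right(ring, block_id(data)) % len(ring)
        return ring[i], [ring[(i + j) % len(ring)] for j in range(1, k + 1)]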

Availability

• r servers hold replicas of each block

• The server responsible for the ID detects failed replica servers (repair sketched below)

• If the server responsible for the ID fails, the first replica server takes over

• That replica server detects the failure when Chord stabilizes

• Replica servers are found in the successor list
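
A sketch of the repair step; has_block and send_block are hypothetical stand-ins for the DHash RPCs:

    def maintain_replicas(block, successor_list, has_block, send_block, k=3):
        """Run by the server responsible for the block's ID: make sure the
        first k nodes of its Chord successor list still hold a replica."""
        for server in successor_list[:k]:
            if not has_block(server, block):
                send_block(server, block)  # re-replicate after a failure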

Persistence

• Each server promises to keep a copy of a block available for at least an agreed-on interval

• Publishers can ask for extensions

• This applies to replicas, not to cached copies

• The server responsible for the ID also relays extension requests to the servers holding replicas (lease bookkeeping sketched below)
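
A sketch of the promised-interval bookkeeping, assuming a simple in-memory lease table; the interval length is a made-up value, not from the paper:

    import time

    LEASE_SECONDS = 24 * 3600   # illustrative agreed-on interval

    leases = {}                 # block ID -> expiry time (replicas only;
                                # cached copies carry no promise)

    def store(block_id):
        leases[block_id] = time.time() + LEASE_SECONDS

    def extend(block_id):
        """Publisher asks to keep the block longer; relayed to replicas."""
        if block_id in leases:
            leases[block_id] = time.time() + LEASE_SECONDS

    def drop_expired():
        now = time.time()
        for b in [b for b, t in leases.items() if t < now]:
            del leases[b]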

Load balancing

• Consistent hashing

• Virtual servers

• Caching

Preventing flooding

• Each CFS server limits any one IP address to a fixed percentage of its storage (sketched below)

• The percentage might be lowered as more nodes enter the network

• Clients with dynamic IP addresses can circumvent the limit
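
A sketch of the per-IP quota check; the capacity and fraction are made-up numbers:

    CAPACITY = 10 * 2**30    # illustrative server capacity, bytes
    FRACTION = 0.001         # illustrative per-IP share; may shrink as
                             # the network grows

    usage = {}               # client IP -> bytes currently stored

    def accept_block(ip: str, size: int) -> bool:
        if usage.get(ip, 0) + size > CAPACITY * FRACTION:
            return False     # over quota; note dynamic IPs can evade this
        usage[ip] = usage.get(ip, 0) + size
        return True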

Efficiency

• Efficient lookups using Chord

• Prefetching

• Server selection

Conclusion

• Efficient

• Scalable

• Available

• Load-balanced

• Decentralized

• Persistent

• Prevents flooding