Distributed Systems: Concepts and Design
Chapter 10: Peer-to-Peer Systems
Bruce Hammer, Steve Wallis, Raymond Ho
10.1: Introduction
Peer-to-Peer Systems
Data and computational resources are contributed by many hosts
Objective: balance network traffic and reduce the load on primary hosts
Management requires knowledge of all hosts: their accessibility (distance in number of hops), availability and performance
Peer-to-peer systems exploit existing naming, routing, data replication and security techniques in new ways
10.1: Introduction
Goal of Peer-to-Peer Systems
Sharing data and resources on a very large scale
'Applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence' (Shirky 2000)
Uses data and computing resources available in personal computers and workstations
10.1: Introduction
Characteristics of Peer-to-Peer Systems
Each computer contributes resources
All nodes have the same functional capabilities and responsibilities
No centrally administered system
Offers a limited degree of anonymity
An algorithm for placing and accessing data balances the workload and ensures availability without adding undue overhead
10.1: Introduction
Evolution of Peer-to-Peer Systems
Napster: music downloads; a central index returned the addresses of peers holding a file
Freenet, Gnutella, Kazaa and BitTorrent: more sophisticated, with greater scalability, anonymity and fault tolerance
Pastry, Tapestry, CAN, Chord, Kademlia: peer-to-peer middleware
10.1: Introduction
Evolution (continued)
Immutable files (music, video), identified by GUIDs (Globally Unique Identifiers)
Middleware to provide better routing algorithms and to react to outages
Evolution toward mutable files
Application within one company's intranet
10.2: Napster and its Legacy
Napster
Provided a means for users to share music files, primarily MP3s
Launched in 1999; attracted several million users
Not fully peer-to-peer: central servers maintained lists of connected systems and the files they provided, while actual file transfers took place directly between machines
Proved the feasibility of a service built on hardware and data owned by ordinary Internet users
10.2: Napster and its Legacy
Figure: Napster's method of locating a file. A central Napster server holds the index.
1. A client sends a file location request to the Napster server
2. The server returns a list of peers offering the file
3. The client sends a file request to one of those peers
4. The peer delivers the file
5. The client reports its new copy to the server, which updates the index
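To make the five steps concrete, here is a minimal Python sketch of a Napster-style central index; only the lookup is centralized, transfers happen directly between peers. The names (CentralIndex, register, locate) are illustrative, not Napster's actual protocol.

```python
# Napster-style lookup sketch: a central index maps file names to the
# peers that hold them, while transfers happen peer-to-peer.

class CentralIndex:
    def __init__(self):
        self.files = {}   # file name -> set of peer addresses

    def register(self, filename, peer_addr):      # step 5: index update
        self.files.setdefault(filename, set()).add(peer_addr)

    def locate(self, filename):                   # steps 1-2: request/reply
        return sorted(self.files.get(filename, set()))

index = CentralIndex()
index.register("song.mp3", "198.51.100.7:6699")
peers = index.locate("song.mp3")   # client then fetches directly (steps 3-4)
print(peers)
```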
10.2: Napster and its Legacy
BitTorrent
Designed and implemented in 2001
The next generation after Napster: true peer-to-peer (P2P) distribution
Designed to handle large files, e.g. WAV, DVD and FLAC content (a CD image runs to several hundred megabytes)
Files are split into pieces; after the initial pieces are transferred from the seed, pieces are exchanged directly from client to client, so the original seeder only needs to send out one complete copy of the file for all the clients to receive one
The tracker URL is hosted at a BitTorrent site, e.g. Trader's Den
10.2: Napster and its Legacy
BitTorrent (continued)
Many BitTorrent clients exist, e.g. Vuze
Trackers keep track of seeders and leechers
Torrent file: contains metadata about the files to be shared and about the tracker
Tracker: coordinates the file distribution and tells each peer which other peers to download pieces of the file from
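As a rough illustration of the tracker's coordinating role, here is a hedged Python sketch: the tracker stores no file data, it only returns a subset of known peers for a torrent. The names and the announce behavior are simplifications, not the real BitTorrent tracker protocol.

```python
# Tracker sketch: peers announce themselves per torrent (keyed here by a
# stand-in info_hash) and receive a random subset of other peers to
# exchange pieces with.
import random

class Tracker:
    def __init__(self):
        self.swarm = {}   # info_hash -> set of peer addresses

    def announce(self, info_hash, peer_addr, want=4):
        """Register a peer and return some other peers in the swarm."""
        peers = self.swarm.setdefault(info_hash, set())
        others = list(peers - {peer_addr})
        peers.add(peer_addr)
        return random.sample(others, min(want, len(others)))

tracker = Tracker()
tracker.announce("abc123", "192.0.2.1:6881")         # seeder joins
print(tracker.announce("abc123", "192.0.2.2:6881"))  # leecher gets peer list
```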
10.3: Peer-to-Peer Middleware
Peer-to-Peer Middleware
Provides mechanisms for accessing data resources anywhere in the network
Functional requirements:
Simplify the construction of services across many hosts in a wide network
Add and remove resources at will
Add and remove new hosts at will
The interface offered to application programmers should be simple and independent of the types of distributed resource
10.3: Peer-to-Peer Middleware
Peer-to-Peer Middleware (continued)
Non-functional requirements:
Global scalability
Load balancing
Optimization for local interactions between neighboring peers
Accommodation of highly dynamic host availability
Security of data in an environment with heterogeneous trust
Anonymity, deniability and resistance to censorship
10.3: Peer-to-Peer Middleware
Peer-to-Peer Middleware (continued)
Global scalability, dynamic host availability, and load sharing and balancing across large numbers of computers pose major design challenges
Design of the middleware layer:
Knowledge of the locations of objects must be distributed throughout the network
Replication is used to achieve this
10.4: Routing Overlays
Routing Overlays
Sub-systems (APIs) within the peer-to-peer middleware, responsible for locating nodes and objects
They implement a routing mechanism in the application layer, separate from any other routing mechanism such as IP routing
They ensure that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination
10.4: Routing Overlays
GUIDs
'Pure' names or opaque identifiers: they reveal nothing about the locations of the objects
The building blocks for routing overlays
Computed from all or part of the state of the object, using a function that delivers a value very likely to be unique; uniqueness is then checked against all other GUIDs
Not human-readable
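As a concrete illustration, here is a minimal Python sketch of GUID generation under the assumptions used later for Pastry: a 128-bit identifier derived from a secure hash (SHA-1, truncated) of the object's name or state. The function name is illustrative.

```python
# GUID generation sketch: hash the object's state to get an opaque,
# location-independent, effectively unique 128-bit identifier.
import hashlib

def make_guid(state: bytes) -> int:
    """Derive a 128-bit GUID from object state using SHA-1 (truncated)."""
    digest = hashlib.sha1(state).digest()      # 160-bit secure hash
    return int.from_bytes(digest[:16], "big")  # keep 128 bits

guid = make_guid(b"example-object-name")
print(f"{guid:032x}")   # printed as 32 hex digits
```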
10.4: Routing Overlays
Tasks of a routing overlay
A client submits a request including the object's GUID; the routing overlay routes the request to a node at which a replica of the object resides
A node introduces a new object by computing its GUID and announcing it to the routing overlay
Clients can remove an object
Nodes may join and leave the service
10.4: Routing Overlays
Types of routing overlay
DHT (Distributed Hash Table): the GUID is computed from all or part of the state of the object, and the object is stored at the node whose identifier is closest to the hash value
DOLR (Distributed Object Location and Routing): a layer over the DHT that maps GUIDs to the addresses of the nodes holding the objects; the address of the host holding an object is announced to the overlay using the publish() operation, so objects can remain where they are
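The contrast between the two interface styles can be sketched as abstract interfaces. The method names below (put/get/remove for the DHT, publish/unpublish/send_to_obj for the DOLR) follow conventions commonly used in descriptions of structured overlays; they are an illustration, not any specific library's API.

```python
# Interface sketch: a DHT moves data to GUID-chosen nodes; a DOLR leaves
# data where it is and routes messages to wherever it was published.
from abc import ABC, abstractmethod

class DHT(ABC):
    """Data items are stored at nodes chosen by their GUID."""
    @abstractmethod
    def put(self, guid: int, data: bytes): ...
    @abstractmethod
    def get(self, guid: int) -> bytes: ...
    @abstractmethod
    def remove(self, guid: int): ...

class DOLR(ABC):
    """Objects stay on their hosts; the overlay maps GUID -> host."""
    @abstractmethod
    def publish(self, guid: int): ...     # announce: this host holds guid
    @abstractmethod
    def unpublish(self, guid: int): ...
    @abstractmethod
    def send_to_obj(self, msg: bytes, guid: int): ...
```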
10.5: Overlay Case Studies: Pastry, Tapestry
Both Pastry and Tapestry adopt the prefix-routing approach
Pastry has a straightforward but effective design; it is the message-routing infrastructure deployed in applications such as PAST, an archival file system
10.5: Overlay Case Studies: Pastry, Tapestry
Tapestry is the basis for the OceanStore storage system; it has a more complex architecture than Pastry because it aims to support a wider range of locality approaches
10.5: Overlay Case Studies: Pastry, Tapestry
Let’s talk about Pastry
10.5: Overlay Case Studies: Pastry, Tapestry
Pastry
A routing overlay with the common characteristics listed above
All nodes and objects are assigned 128-bit GUIDs
A node's GUID is computed by applying a secure hash function, such as SHA-1, to the public key with which each node is provided
10.5: Overlay Case Studies: Pastry, Tapestry
For objects such as files, the GUID is computed by applying a secure hash function to the object's name or to some part of the object's stored state
The resulting GUIDs have the usual properties of secure hash values: they are randomly distributed in the range 0 to 2^128 - 1
In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps
10.5: Overlay Case Studies: Pastry, Tapestry
If a GUID identifies a currently active node, the message is delivered to that node; otherwise it is delivered to the active node whose GUID is numerically closest to it
Active nodes take responsibility for processing requests addressed to all objects in their numerical neighborhood
Each routing step involves the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is 'closer' to its destination
10.5: Overlay Case Studies: Pastry, Tapestry
The real transport of a message across the Internet between two Pastry nodes may require a substantial number of IP hops
Pastry uses a locality metric based on network distance in the underlying network to select appropriate neighbors when setting up the routing tables used at each node
10.5: Overlay Case Studies: Pastry, Tapestry
Participating hosts are fully self-organizing: a new node obtains the data needed to construct its routing table and other required state from existing members in O(log N) messages, where N is the number of hosts participating in the overlay
When a node fails, the remaining nodes can detect its absence and cooperatively reconfigure to reflect the required changes in the routing structure
10.5: Overlay Case Studies: Pastry, Tapestry
Pastry Routing Algorithm
The algorithm involves the use of a routing table at each node to route messages efficiently; it is described in two stages:
Stage 1: a simplified form that routes messages correctly but inefficiently, without a routing table
Stage 2: the full algorithm, which uses the routing table to route a request to any node in O(log N) messages
10.5: Overlay Case Studies: Pastry, Tapestry
Stage 1:
Each active node stores a leaf set: a vector L of size 2l holding the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest to its own on either side (l above and l below)
Leaf sets are maintained by Pastry as nodes join and leave
10.5: Overlay Case Studies: Pastry, Tapestry
Even after a node failure, leaf sets are corrected within a short time, provided failures remain within a defined maximum rate
The GUID space is treated as circular: GUID 0's lower neighbor is 2^128 - 1
10.5: Overlay Case Studies: Pastry, Tapestry
Stage 1: circular routing alone is correct but inefficient
Figure: node A (GUID 65A1FC) receives message M with destination address D (D46A1C); the message is forwarded via leaf sets around the circular GUID space until it reaches the active node whose GUID is numerically closest to D
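A minimal Python sketch of this stage-1 rule, assuming each node knows only its leaf set; names such as next_hop are illustrative. Applied repeatedly, each hop moves numerically closer to the destination (D46A1C in the figure), so delivery is correct, but a message may take O(N) hops around the ring.

```python
# Stage-1 routing sketch: forward to the known GUID numerically closest
# to the destination on the circular 128-bit identifier space.
SPACE = 2**128

def circular_distance(a: int, b: int) -> int:
    """Distance between two GUIDs measured the shorter way round the ring."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)

def next_hop(own_guid: int, leaf_set: list, dest: int) -> int:
    """Pick the next hop from this node and its leaf set (stage 1 only)."""
    candidates = [own_guid] + list(leaf_set)
    return min(candidates, key=lambda g: circular_distance(g, dest))
```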
10.5: Overlay Case Studies: Pastry, Tapestry
Stage 2: the full Pastry algorithm
Efficient routing is achieved with the aid of routing tables
Each Pastry node maintains a tree-structured routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of 2^128 possible GUID values
10.5: Overlay Case Studies: Pastry, Tapestry
Figure: structure of a routing table. GUIDs are viewed as strings of hexadecimal digits; row p of the table holds entries for nodes whose GUIDs share a p-digit prefix with the local node's GUID, one entry per possible value of the (p+1)th digit
10.5: Overlay Case Studies: Pastry, Tapestry
Routing a message with the aid of the routing table: the message can be delivered in roughly log16(N) hops (about 5 hops for N = 10^6 nodes), since each routing-table hop matches at least one more hexadecimal digit of the destination GUID
10.5: Overlay Case Studies: Pastry, Tapestry
Figure: Pastry's routing algorithm (the slide shows the pseudocode)
10.5: Overlay Case Studies: Pastry, Tapestry
Host integration
New nodes use a joining protocol to acquire their routing table and leaf set contents
They then notify other nodes of the changes those nodes must make to their own tables
10.5: Overlay Case Studies: Pastry, Tapestry
Host failure or departure
Nodes in the Pastry infrastructure may fail or depart without warning
A node is considered failed when its immediate neighbours can no longer communicate with it
The leaf sets that contain the failed node's GUID must then be repaired
10.5: Overlay Case Studies: Pastry, Tapestry
Locality
The Pastry routing structure is highly redundant: there are many candidate nodes for each routing table entry
The locality metric is used to compare candidates, and the closest available node is chosen
This mechanism cannot produce globally optimal routings because the available information is not comprehensive
10.5: Overlay Case Studies: Pastry, Tapestry
Fault tolerance
Nodes send 'heartbeat' messages to their neighbors to detect failures
Clients use an 'at-least-once' delivery mechanism, repeating requests several times in the absence of a response; this allows Pastry a longer time window to detect and repair node failures
Randomness is introduced into the routing algorithm to work around failed or misbehaving nodes
10.5: Overlay Case Studies: Pastry, Tapestry
Dependability
Additional dependability measures and some performance optimizations in the host management algorithms were included in an updated version, called MSPastry, by the authors
10.5: Overlay Case Studies: Pastry, Tapestry
Evaluation work
MSPastry was used to evaluate the impact of the host join/leave rate and the associated dependability mechanisms on performance and dependability
10.5: Overlay Case Studies: Pastry, Tapestry
Tapestry
Implements a distributed hash table and routes messages to nodes based on the GUIDs associated with resources, using prefix routing in a manner similar to Pastry
Its API conceals the distributed hash table from applications behind a Distributed Object Location and Routing (DOLR) interface
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Squirrel web cache
OceanStore file store
Ivy file system
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Squirrel
The SHA-1 secure hash function is applied to the URL of each cached object to produce a 128-bit Pastry GUID
Based on the end-to-end argument, the authors note that the HTTPS protocol should be used to achieve a much stronger guarantee for those interactions that require it
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Squirrel implementation
The node whose GUID is numerically closest to the GUID of an object becomes that object's home node, responsible for holding any cached copy of the object
Client nodes cache web objects both locally and, as home nodes, on behalf of other clients
A client requests a fresh copy of an object from its home node if there is no copy in its local cache
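A minimal sketch of this placement rule, assuming a Pastry-style 128-bit GUID space; url_guid and home_node are illustrative names.

```python
# Squirrel placement sketch: the GUID of a web object is the (truncated)
# SHA-1 of its URL, and the live node with the numerically closest GUID
# becomes its home node.
import hashlib

def url_guid(url: str) -> int:
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:16], "big")

def home_node(object_guid: int, node_guids: list) -> int:
    """The node whose GUID is numerically closest holds the cached copy."""
    return min(node_guids, key=lambda n: abs(n - object_guid))

nodes = [url_guid(f"node-{i}") for i in range(8)]   # stand-in node GUIDs
print(hex(home_node(url_guid("http://example.com/index.html"), nodes)))
```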
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Evaluation of Squirrel
Measured: the reduction in total external bandwidth used, the latency perceived by users for access to web objects, and the computational and storage load imposed on client nodes
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
OceanStore file store
Aims to provide a very large scale, incrementally scalable, persistent storage facility for mutable data objects
Long-term persistence and reliability over constantly changing networks and computing resources
Intended uses: an NFS-like file service, electronic mail hosting, database sharing, and persistent storage of large numbers of data objects
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
A prototype called Pond was built to validate the OceanStore design and compare its performance with traditional approaches
Pond uses the Tapestry routing overlay mechanism to place blocks of data at nodes distributed throughout the Internet and to dispatch requests to them
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
OceanStore/Pond storage organization
Data objects are analogous to files, with their data stored in a set of blocks
Each object represents an ordered sequence of immutable versions
Three types of identifier are used in the store:
BGUID: the secure hash of a data block
VGUID: the BGUID of the root block of a version
AGUID: uniquely identifies all the versions of an object
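A hedged sketch of how these identifiers compose, assuming hash-addressed blocks: every block is named by its hash (BGUID) and a version is named by the BGUID of its root block (VGUID). The root-block layout here is invented for illustration; it is not OceanStore's actual format.

```python
# Identifier sketch: immutable, hash-addressed versions. Any change to
# the data produces a new root block and therefore a new VGUID.
import hashlib, json

def bguid(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()       # block named by its hash

def make_version(blocks: list, prev_vguid=None) -> str:
    # The root block lists the BGUIDs of the version's data blocks.
    root = json.dumps({"blocks": [bguid(b) for b in blocks],
                       "prev": prev_vguid}).encode()
    return bguid(root)                           # VGUID = BGUID of root

v1 = make_version([b"hello ", b"world"])
v2 = make_version([b"hello ", b"again"], prev_vguid=v1)  # new version
print(v1 != v2)   # True: changed data yields a new VGUID
```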
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Performance of OceanStore/Pond
Pond is a prototype intended to prove the feasibility of a scalable peer-to-peer file service
It was evaluated against several purpose-designed benchmarks, including the Andrew benchmark, using a simple emulation of an NFS client and server
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Conclusions on OceanStore/Pond
When operating over a wide-area network, it substantially exceeds NFS performance for reading, and is within a factor of three of NFS for updating files and directories
LAN results are substantially worse
Overall, the results suggest that an Internet-scale peer-to-peer file service would be an effective solution for the distribution of files that do not change very rapidly
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Ivy file system
A read/write file system that supports multiple readers and writers
Implemented over an overlay routing layer and a distributed hash-addressed data store
Emulates a Sun NFS server
Stores the state of files as logs and scans the logs to reconstruct the files
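A minimal sketch of the log-based idea: updates are appended as immutable records, and a file's current contents are reconstructed by scanning the log. The record format below is illustrative, not Ivy's actual log structure.

```python
# Log-structured file state sketch: writes append records; reads replay
# the log from oldest to newest to rebuild one file's bytes.
from dataclasses import dataclass

@dataclass
class LogRecord:
    filename: str
    offset: int
    data: bytes

def reconstruct(log: list, filename: str) -> bytes:
    """Replay the log in order to rebuild the named file's contents."""
    content = bytearray()
    for rec in log:
        if rec.filename != filename:
            continue
        end = rec.offset + len(rec.data)
        if end > len(content):
            content.extend(b"\x00" * (end - len(content)))
        content[rec.offset:end] = rec.data
    return bytes(content)

log = [LogRecord("a.txt", 0, b"hello world"), LogRecord("a.txt", 6, b"ivy  ")]
print(reconstruct(log, "a.txt"))   # b'hello ivy  '
```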
10.6: Application Case Studies: Squirrel, OceanStore, Ivy
Ivy file system (continued)
Issues resolved so that files can be hosted on partially trusted or unreliable machines:
Maintenance of consistent file metadata
Partial trust between participants and vulnerability to misbehavior
Continued operation during network partitions
10.7: Summary
Napster: immutable data, unsophisticated routing
Current systems: mutable data, routing overlays, sophisticated algorithms
Support within the Internet or a company intranet
Distributed computing (e.g. SETI@home)
10.7: Summary
Benefits of Peer-to-Peer Systems
Ability to exploit unused resources (storage, processing) in the host computers
Scalability to support large numbers of clients and hosts, with load balancing of network links and host computer resources
The self-organizing properties of the middleware platforms reduce costs
10.7: Summary
Weaknesses of Peer-to-Peer Systems
Costly for the storage of mutable data compared with a trusted, centralized service
Cannot yet guarantee anonymity to hosts
10: Peer-to-Peer Systems
Questions? Comments?