
Page 1: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Peer-to-Peer Filesystems

Slides originally created by Tom Roeder

Page 2: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Announcements

Welcome back from Spring break!

We are in the final stretch
  Homework 4 and Project 4 design doc this week
  Homework 5 and 6 due in April
  Project 5 and 6 due in April and beginning of May

Prelim II, Thursday April 26th??? Final, Wednesday May 17th

Page 3: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Goals for Today

What is P2P?

P2P file sharing systems
  Case study: Napster

Overlays
  Structured
  Unstructured

Distributed hash tables (DHTs)
  Hash table on top of a structured overlay
  Case studies: PAST, CFS, OpenDHT
  Content-Addressable Storage (CAS)

Page 4: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Nature of P2P Systems

P2P: communicating peers in the system, normally an overlay in the network
Hot topic because of Napster, Gnutella, etc.

In some sense, P2P is older than the name: many protocols used symmetric interactions; not everything is client-server

What's the real definition? No one has a good one yet; it depends on what you want to fit in the class

Page 5: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Nature of P2P Systems

Standard definition: symmetric interactions between peers, no distinguished server

Minimally: is the Web a P2P system? We don't want to say that it is, but it is under this definition: I can always run a server if I want, so there is no asymmetry

There must be more structure than this; let's try again

Page 6: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Nature of P2P Systems

Recent definition
  No distinguished initial state
  Each server has the same code
  Servers cooperate to handle requests
  Clients don't matter: the servers are the P2P system

Try again: is the Web P2P? No, not under this def: servers don’t interact

Is the Google server farm P2P? Depends on how it’s set up? Probably not.

Page 7: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Case study: Napster
File sharing service created by Shawn Fanning in 1999
Flat FS: single-level filesystem with no hierarchy

Multiple files can have the same name

All storage done at the edges:
  Hosts export a set of files stored locally
  Each host registers with a centralized directory

  Keepalive messages check for connectivity
  The centralized directory is notified of the file names exported by each host

File lookup (see the sketch below):
  Client sends a request to the central directory
  The directory server returns up to 100 matching results to the client
  Client pings each host, computes RTTs, and displays the results
  Client transfers the file from the closest host

File transfers are peer-to-peer; the central directory is not involved
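A minimal sketch of that lookup flow, assuming a hypothetical directory object with a query() method and host objects with ip, port, and fetch(); the real Napster wire protocol is not shown in these slides.

```python
import socket
import time


def lookup_and_fetch(directory, filename, max_results=100):
    """Query the central directory, ping the returned hosts, fetch from the closest."""
    # 1. Ask the central directory which hosts export a matching file.
    hosts = directory.query(filename, max_results=max_results)

    # 2. Ping each host and measure its round-trip time (RTT).
    rtts = {}
    for host in hosts:
        start = time.monotonic()
        try:
            with socket.create_connection((host.ip, host.port), timeout=2.0):
                rtts[host] = time.monotonic() - start
        except OSError:
            continue  # host unreachable; skip it

    if not rtts:
        raise RuntimeError("no reachable host exports %r" % filename)

    # 3. Transfer the file peer-to-peer from the closest host;
    #    the directory plays no part in the transfer itself.
    closest = min(rtts, key=rtts.get)
    return closest.fetch(filename)
```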

Page 8: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Architecture

[Diagram: hosts H1, H2, H3 reach Napster.com through the network and a firewall; an IP sprayer/redirector spreads incoming connections across Napster directory servers 1-3]

Page 9: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Protocol

[Diagram, step 1: host H1 registers with a Napster directory server: "I have 'metallica / enter sandman'"]

Page 10: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Protocol

[Diagram, step 2: a querying host asks the directory "who has metallica?"; the directory replies "check H1, H2"]

Page 11: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Protocol

[Diagram, step 3: the querying host pings H1 and H2 to measure round-trip times]

Page 12: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Protocol

[Diagram, step 4: the querying host transfers the file directly from the closest host; the directory servers are not involved in the transfer]

Page 13: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Napster Discussion

Issues:
  Centralized file location directory
  Load balancing
  Reliance on keepalive messages
  Scalability

Success: ability to create and foster an online community
  Built-in ethics
  Built-in faults
  Communication medium

Had up to 40 million users in June 2000! The real number may have been lower: 26.4 million by February 2001

Ultimately found to be an illegal service

Page 14: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Other P2P File Sharing Systems

Napster has a central database! Removing it will make regulating file transfers harder

Freenet, Gnutella, Kazaa … all are decentralized
Freenet: anonymous, files encrypted

  So it is not known which files are stored locally or which files are being searched for

Kazaa: allows parallel downloads; (Bit)Torrents for faster download

Legality. Are there any good legal uses for P2P systems?

Page 15: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Overlays

P2P systems possess some degree of self-organization: each node finds its peers and helps maintain the system structure

Two types of overlays connect most P2P systems:

Unstructured
  No infrastructure set up for routing
  Random walks, flood search (see the sketch below)

Structured
  Small-world phenomenon: Kleinberg
  Set up enough structure to get fast routing
  We will see O(log n); for special tasks, can get O(1)
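A minimal sketch of TTL-limited flood search over an unstructured overlay, assuming hypothetical Node objects with `neighbors` and `files` attributes; this is a local simulation of the idea, not a networked implementation.

```python
def flood_search(node, filename, ttl=4, seen=None):
    """Return the set of nodes storing `filename`, flooding up to `ttl` hops away."""
    seen = set() if seen is None else seen
    if node in seen:
        return set()
    seen.add(node)

    hits = {node} if filename in node.files else set()
    if ttl == 0:
        return hits

    # Forward the query to every neighbor with a decremented TTL; visiting every
    # neighbor is what makes flooding expensive compared to structured routing.
    for neighbor in node.neighbors:
        hits |= flood_search(neighbor, filename, ttl - 1, seen)
    return hits
```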

Page 16: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Overlays: Unstructured

From Gribble: a common unstructured overlay
Look at the connectivity: there is more structure than it seems at first

Page 17: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Overlays: Unstructured

Gossip: a state synchronization technique
  Instead of forced flooding, share state
  Do so infrequently, with one neighbor at a time
  Original insight from epidemic theory

Convergence of state is reasonably fast with high probability for almost all nodes: good probabilistic guarantees

Trivial to implement (see the sketch below)
Saves bandwidth and energy
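A minimal sketch of push-pull gossip (anti-entropy), assuming at least two nodes, each with a `state` dict mapping key to a (version, value) pair; the names and merge rule are illustrative, not tied to any particular system.

```python
import random


def merge(local_state, remote_state):
    """Keep the newer version of every key."""
    for key, (version, value) in remote_state.items():
        if key not in local_state or local_state[key][0] < version:
            local_state[key] = (version, value)


def gossip_round(nodes):
    """Each node exchanges state with one randomly chosen peer per round."""
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        merge(peer.state, node.state)   # push local state to the peer
        merge(node.state, peer.state)   # pull the peer's state back (push-pull)
```

Run gossip_round repeatedly; with high probability the states of almost all nodes converge after a logarithmic number of rounds.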

Page 18: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Overlays: Structured

Need to build up long-distance pointers
Think of routing within levels of a namespace, e.g. a namespace of 10-digit base-4 numbers such as 0112032101
Then you can hop between levels to find other nodes

This is the most common structure imposed

Page 19: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Distributed Hash Tables

One way to do this structured routing:
  Assign each node an id from an id space, e.g. 128 bits: a salted SHA-1 hash of its IP address
  Build up a ring: circular hashing assigns nodes into this space

Value diversity of neighbors: even coverage of the space, less chance of attack?

Page 20: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Distributed Hash Tables Why “hash tables”?

Store named objects by hash code
Route each object to the nearest node id in the space
Key idea: nodes and objects share the id space (see the sketch below)

How do you find an object without its name? Close names don’t help because of hashing

Cost of churn? In most P2P apps, many joins and leaves

Cost of freeloaders?
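A minimal sketch of the shared id space: node ids come from a salted SHA-1 hash of the node's IP address, object ids from a hash of the object's name, and each object lands on the node whose id is numerically closest on the ring. The salt value and 128-bit truncation are illustrative assumptions.

```python
import hashlib

ID_BITS = 128


def node_id(ip_address, salt=b"node-salt"):
    """128-bit node id from a salted SHA-1 hash of the IP address."""
    digest = hashlib.sha1(salt + ip_address.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def object_id(name):
    """128-bit object id from a SHA-1 hash of the object's name."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def responsible_node(obj_id, node_ids):
    """Pick the node whose id is numerically closest on the circular id space."""
    ring = 1 << ID_BITS

    def circular_distance(nid):
        d = abs(nid - obj_id)
        return min(d, ring - d)

    return min(node_ids, key=circular_distance)


# Example: three nodes, one object.
nodes = [node_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
print(responsible_node(object_id("metallica/enter sandman.mp3"), nodes))
```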

Page 21: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Distributed Hash Tables

Dangers:
  Sybil attacks: one node becomes many
  Id attacks: an attacker can place its node wherever it likes
Solutions are hard to come by:
  Crypto puzzles / money for ids?
  Certification of routing and storage?

Many routing frameworks in this spirit, very popular in the late '90s and early '00s: Pastry, Tapestry, CAN, Chord, Kademlia

Page 22: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Applications of DHTs

Almost anything that involves routing:
  Illegal file sharing: the obvious application
  Backup/storage filesystems
  P2P DNS

Good properties:
  O(log N) hops to find an id (but how good is this?)
  Non-fate-sharing id neighbors
  Random distribution of objects to nodes

Page 23: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Pastry: Node state
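A rough sketch of the three pieces of per-node state Pastry keeps (routing table, leaf set, neighborhood set); the sizes and the plain dict/list representation are illustrative, not Pastry's actual layout.

```python
from dataclasses import dataclass, field

DIGITS = 32            # id length in digits, e.g. 128-bit ids in base 16 (b = 4)
BASE = 16              # 2^b possible values per digit
LEAF_SET_SIZE = 16     # typical size of the leaf set |L|
NEIGHBORHOOD_SIZE = 32 # typical size of the neighborhood set |M|


@dataclass
class PastryNodeState:
    node_id: str                                          # e.g. "65a1fc..." in base 16
    routing_table: dict = field(default_factory=dict)     # (prefix length, next digit) -> node id
    leaf_set: list = field(default_factory=list)          # numerically closest node ids
    neighborhood_set: list = field(default_factory=list)  # nodes close in network distance

    def route_entry(self, row, digit):
        """Entry for ids sharing `row` prefix digits with us and differing at `digit`."""
        return self.routing_table.get((row, digit))
```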

Page 24: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Pastry: Node Joins

Find another, geographically nearby node
Hash the IP address to get a Pastry id
Route a join message to this id
  Get routing-table rows from each hop and from the destination
  Select the neighborhood set from the nearby node
  Get the leaf set from the destination
Give this info back to the nodes so they can add you

Assuming the Pastry ring is well set up, this procedure will give good parameters

Page 25: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Pastry: Node Joins

Consider what happens starting from one node:
  Node 0 bootstraps itself
  The next node to arrive adds itself and adds node 0
  Neighborhood information will be bad for a while

  Need a good way to discover network proximity; this is a current research problem

On node leaves, do the reverse
If a node leaves suddenly, the failure must be detected; the detecting node removes it from its tables

Page 26: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Pastry: Routing

The key idea: grow the common prefix (see the sketch below)
  Given an object id, try to send to a node whose id has at least one more digit in common with it
  If not possible, send to a node that is numerically closer
  If not possible, then you are the destination

Gives O(log N) hops
  Each step gets closer to the destination
  Guaranteed to converge
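A minimal sketch of that routing rule over hexadecimal node ids; the routing-table and leaf-set shapes here are simplified stand-ins, not Pastry's exact data structures.

```python
def shared_prefix_len(a, b):
    """Number of leading digits the two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(local_id, key, routing_table, leaf_set):
    """Return the id of the next node to forward `key` to, or None if we are the root."""
    p = shared_prefix_len(local_id, key)

    # 1. Prefer a node whose id shares at least one more prefix digit with the key.
    candidate = routing_table.get((p, key[p])) if p < len(key) else None
    if candidate is not None:
        return candidate

    # 2. Otherwise pick a known node that is numerically closer to the key than we are.
    known = [n for n in leaf_set if n is not None]
    closer = [n for n in known
              if abs(int(n, 16) - int(key, 16)) < abs(int(local_id, 16) - int(key, 16))]
    if closer:
        return min(closer, key=lambda n: abs(int(n, 16) - int(key, 16)))

    # 3. No closer node is known: this node is the destination for the key.
    return None
```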

Page 27: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

How Does Routing/Lookup Work?

[Diagram: a lookup from a source node routes through nodes whose ids match successively longer prefixes of the lookup id (0…, 10…, 110…, 111…) before the response returns to the source]

Assign ids to nodes
Map hash values to the node with the closest id
The leaf set is the successors and predecessors: all that's needed for correctness
The routing table matches successively longer prefixes: allows efficient lookups

Page 28: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

PAST: Pastry Filesystem

Now a simple filesystem follows (see the sketch below):
  To get a file, hash its name and look it up in Pastry
  To store a file, store it in Pastry under that id

Punt on metadata/discovery:
  Can implement directories as files
  Then you just need to know the name of the root

Shown to give reasonable utilization of storage space
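A minimal sketch of that file interface, assuming a hypothetical `dht` object with a route(key) method returning the responsible node, and nodes that expose store/fetch; these names are placeholders, not the actual PAST API (which also mixes the owner's key and a salt into the file id).

```python
import hashlib


def file_id(name):
    """PAST names files by a hash; here, simply SHA-1 of the name string."""
    return hashlib.sha1(name.encode()).hexdigest()


def past_insert(dht, name, data):
    key = file_id(name)
    node = dht.route(key)   # Pastry routes to the node whose id is closest to the key
    node.store(key, data)
    return key


def past_lookup(dht, name):
    key = file_id(name)
    return dht.route(key).fetch(key)
```

Directories can themselves be stored as files that map entry names to file ids, so a client only needs to know the name of the root directory.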

Page 29: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

PAST: File Replication

Since any one node might fail, replicate
  Uses the neighbor set for k-way storage
  Keeps the same file at each neighbor
  Diversity of neighbors helps avoid fate sharing

Certification (see the sketch below):
  Each node signs a certificate saying that it stored the file
  The client will retry storage if it does not get enough certificates
  Gives OK guarantees
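A rough sketch of k-way replication with storage receipts, assuming each replica node's store() returns a signed certificate object with a verify() method; the certificate format and retry policy are illustrative, not PAST's.

```python
def replicate(file_key, data, leaf_set, k, max_attempts=3):
    """Store `data` on the k closest leaf-set members; retry if too few certificates."""
    for _ in range(max_attempts):
        certificates = []
        for node in leaf_set[:k]:
            cert = node.store(file_key, data)  # node signs a receipt saying it stored the file
            if cert is not None and cert.verify():
                certificates.append(cert)
        if len(certificates) >= k:
            return certificates                # enough signed receipts: the insert succeeded
    raise RuntimeError("could not obtain %d storage certificates for %s" % (k, file_key))
```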

Page 30: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

PAST: Tradeoffs

No explicit FS structure:
  Could build any sort of system by storing files
  Basically a variable-sized block storage mechanism
  This buys simplicity at the cost of optimization

Speed vs. storage
  See Beehive for this tradeoff: it makes the tradeoff an explicit formula that can be tuned

Ease of use vs. security
  Hashes make file discovery non-transparent

Page 31: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Rationale and Validation

Backing up on other systems:
  No fate sharing
  Automatic backup just by storing the file

But:
  Cost much higher than a regular filesystem
  Incentives: why should I store your files?
  How is this better than tape backup?
  How is this affected by churn/freeloaders?
  Will anyone ever use it?

Page 32: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

PAST: comparison to CFS

CFS: a filesystem built on Chord/DHash
Pastry is from MSR; Chord/DHash is from MIT
Very similar routing and storage

Page 33: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

PAST: comparison to CFS

PAST stores whole files, CFS stores blocks
  Thus CFS can use space in a more fine-grained way
  But lookup could take much longer: each block fetch must go through routing
  CFS claims ftp-like speed
  Could imagine much faster transfers by getting blocks in parallel (see the sketch below), so routing is what slows them down
  Remember: the hops here are overlay hops, not Internet hops

Load balancing in CFS: predictable storage requirements per file per node
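A minimal sketch of the parallel block fetch suggested above, assuming a hypothetical `dht` with lookup(key) returning the responsible node, an inode-like list of block keys, and blocks stored as bytes; these names are placeholders, not the real CFS/DHash interface.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(dht, block_key):
    """Each block lookup pays the full overlay routing cost (O(log N) overlay hops)."""
    return dht.lookup(block_key).get(block_key)


def fetch_file(dht, block_keys, parallelism=8):
    """Fetch all blocks of a file concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        blocks = list(pool.map(lambda k: fetch_block(dht, k), block_keys))
    return b"".join(blocks)
```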

Page 34: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

DHT Deployment Today

[Diagram: each application deploys its own DHT over IP connectivity: CFS (MIT) over Chord, PAST (MSR/Rice) over Pastry, OStore (UCB) over Tapestry, pSearch (HP) over CAN, Coral (NYU) over Kademlia, i3 (UCB) over Chord, Antiquity (UCB) over Bamboo, PIER (UCB) over Bamboo]

Every application deploys its own DHT (DHT as a library)

Page 35: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Why so many DHT implementations?

Distributed Hash Table:
  A peer-to-peer algorithm offering a put/get interface
  An associative map for peer-to-peer applications

More generally, DHTs provide lookup functionality:
  Map application-provided hash values to nodes (just as local hash tables map hashes to memory locations)
  Put/get is then constructed above lookup (see the sketch below)

Many proposed applications: file sharing, end-system multicast, aggregation trees
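A minimal sketch of building put/get above a lookup primitive; the only assumption is a `lookup` callable that maps a hash value to the responsible node, and an in-memory dict stands in for each node's local storage.

```python
import hashlib


class DHTNode:
    def __init__(self):
        self.table = {}                  # this node's share of the key space

    def put(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)


def dht_put(lookup, key, value):
    """put/get are thin wrappers: lookup finds the node, which stores locally."""
    h = hashlib.sha1(key.encode()).hexdigest()
    lookup(h).put(h, value)


def dht_get(lookup, key):
    h = hashlib.sha1(key.encode()).hexdigest()
    return lookup(h).get(h)
```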

Page 36: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

DHT Deployment Tomorrow?

[Diagram: the same applications (CFS, PAST, OStore, PIER, pSearch, Coral, i3, Antiquity) all run over a single shared DHT layer, which provides indirection over IP connectivity]

OpenDHT: one DHT, shared across applications (DHT as a service)

Page 37: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Issues

Faster lookup
  Cornell's Kelips and Beehive DHTs achieve 1-hop lookups

Churn
  When nodes come and go, some algorithms perform repairs that involve disruptive overheads
  For example, CFS and PAST are at risk of copying tons of data to maintain replication levels!

Robustness to network partitioning
  Chord, for example, can develop a split-brain problem!

Legal uses

Page 38: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

Conclusions

Tradeoffs are critical
  Why are you using P2P to begin with?
  What sort of security/anonymity guarantees do you need?

DHT applications: think of a good one and become famous

PAST
  Caches whole files
  Saves some routing overhead
  Harder to implement a true filesystem on top

CFS
  Easier space management, since it stores fixed-size blocks
  But possibly higher management overheads

OpenDHT
  All DHTs have the same put/get interface but different implementations
  Use one shared implementation for all applications

Page 39: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

References

A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001.

A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility", ACM Symposium on Operating Systems Principles (SOSP'01), Banff, Canada, October 2001.

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, ACM SIGCOMM 2001, San Deigo, CA, August 2001, pp. 149-160.

Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica, Wide-area cooperative storage with CFS, ACM SOSP 2001, Banff, October 2001.

Page 40: Peer-to-Peer Filesystems Slides originally created by Tom Roeder

References

Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems", Proceedings of Multimedia Computing and Networking 2002 (MMCN '02), San Jose, CA, January 2002.

Jon Kleinberg, "The Small-World Phenomenon: An Algorithmic Perspective", Proceedings of the 32nd ACM Symposium on Theory of Computing (STOC), 2000.

C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, Newport, Rhode Island, pages 311-320, June 1997.

Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu, "OpenDHT: A Public DHT Service and Its Uses", Proceedings of ACM SIGCOMM, Philadelphia, Pennsylvania, pages 73-84, August 2005.

Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao, and John Kubiatowicz, "Pond: The OceanStore Prototype", Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST), San Francisco, California, pages 1-14, March 2003.