Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools)
Stefan Saroiu, P. Krishna Gummadi, Steven Gribble
University of Washington
Slide 2
Peer-to-Peer Frenzy
- Both research and industrial excitement: CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy
- Basic premise:
  - wide-area, distributed system
  - voluntary, ad-hoc, dynamic
  - home-user peers exchange information (mostly large files)
- Many proposals, yet nobody knows the participating peers' characteristics and behavior
Slide 3
Napster & Gnutella
[Figure: side-by-side diagrams of Napster (servers at napster.com) and Gnutella. Legend: P = peer, S = server, Q = query, R = response, D = file download]
Slide 4
Methodology
2 stages:
1. Periodically crawl Gnutella/Napster to discover peers and their metadata
2. Feed output from the crawl into measurement tools:
   - bottleneck bandwidth: SProbe
   - latency: SProbe
   - peer availability: LF
   - degree of content sharing: Napster crawler
Slide 5
Network Bandwidth Scenarios
- Network measurements
- Dynamic server/peer selection
- P2P overlay formation or application-level multicast
- Placement of content replicas
Slide 6
Network Bandwidth
1. Throughput: number of bytes transferred during a fixed interval of time
2. Available bandwidth: the maximum attainable throughput of a newly started flow
3. Bottleneck bandwidth: the maximum throughput ideally obtained across the slowest link
- Hard to measure: throughput, available bandwidth
- Easier to measure: bottleneck bandwidth
Slide 7
One-Packet Model
[Plot: traversal time of 1 probing packet vs. packet size]
$\text{slope} = 1 / \text{bandwidth}_{\text{bottleneck}}$
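The one-packet relationship can be sketched in a few lines: fit a straight line to (packet size, traversal time) samples and invert the slope. This is only an illustrative sketch, not SProbe or any of the listed tools; the function name, the least-squares fit, and the sample format are assumptions made here.

```python
def estimate_bandwidth_one_packet(samples):
    """Estimate bottleneck bandwidth in bits/s from one-packet samples.

    samples: list of (packet_size_bytes, traversal_time_seconds) pairs.
    Hypothetical helper: fits traversal_time = latency + slope * size by
    ordinary least squares, then uses bandwidth = 1 / slope.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples with different sizes")
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    size_variance = sum((size - mean_size) ** 2 for size, _ in samples)
    if size_variance == 0:
        raise ValueError("all packets have the same size; slope is undefined")
    covariance = sum((size - mean_size) * (time - mean_time)
                     for size, time in samples)
    slope = covariance / size_variance  # seconds per byte
    if slope <= 0:
        raise ValueError("non-positive slope; samples are too noisy")
    return 8.0 / slope  # 8 bits per byte, so this is bits per second


if __name__ == "__main__":
    # Synthetic samples: 10 ms fixed latency over a 1 Mbps bottleneck.
    synthetic = [(s, 0.010 + s * 8 / 1_000_000) for s in (100, 500, 1000, 1500)]
    print(estimate_bandwidth_one_packet(synthetic))
```

Real measurements are noisy, so practical tools typically filter samples (for example, keeping the minimum traversal time per packet size) before fitting.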
Slide 8
Packet-Pair Model
- time dispersion is inversely proportional to the bottleneck bandwidth
$t = \text{size}_{\text{packet}} / \text{bandwidth}_{\text{bottleneck}}$
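As a worked instance of the packet-pair relationship, the sketch below turns a measured dispersion into a bandwidth estimate. The function names are hypothetical, and taking the median over several pairs is an assumption made here as a simple guard against cross traffic; it is not taken from the slides.

```python
import statistics


def packet_pair_bandwidth(packet_size_bytes, dispersion_seconds):
    """Bottleneck bandwidth in bits/s: packet size divided by time dispersion."""
    if packet_size_bytes <= 0 or dispersion_seconds <= 0:
        raise ValueError("packet size and dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_seconds


def estimate_from_pairs(packet_size_bytes, dispersions):
    """Combine several packet-pair measurements.

    Assumption: the median dispersion is used so that a few pairs delayed
    by cross traffic do not dominate the estimate.
    """
    usable = [d for d in dispersions if d > 0]
    if not usable:
        raise ValueError("no usable dispersion measurements")
    return packet_pair_bandwidth(packet_size_bytes, statistics.median(usable))


if __name__ == "__main__":
    # 1500-byte packets arriving 12 ms apart suggest roughly 1 Mbps.
    print(packet_pair_bandwidth(1500, 0.012))
    print(estimate_from_pairs(1500, [0.012, 0.012, 0.030]))
```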
Slide 9
Vital Properties of an Ideal Tool
- Accurate
- Fast: 1 min/measurement is too slow
- Scalable: flooding the network will not work
- Works in uncooperative environments: can't deploy software at both endpoints
Slide 10
Properties of an Ideal Tool
- Active: existing traffic might not be suitable
- TCP/UDP based: ICMP is heavily filtered
- Cross-traffic resilient: should detect and give up in the face of cross traffic
- Works on asymmetric paths
- Flexible to bandwidth changes
- Controlled evaluations
Slide 11
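Current Tools
[Table: desired properties (rows) vs. tools (columns); the per-cell marks are not recoverable]
- Tools: pathchar, pchar, clink, bprobe, pathrate, Nettimer, SProbe
- Desired properties: Accurate, Fast, Uncooperative Environments *, Scalable, TCP/UDP, Active, Cross-traffic *, Asymmetric, Bandwidth changes, Controlled Evaluations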
Slide 12
SProbe Uses TCP Tricks
- From local host to remote host
- No cooperation needed
[Diagram labels: Local, Remote, SYN packet, RST packet]
Slide 24
SProbe Uses TCP Tricks
- From remote host to local host
- Involuntary cooperation of the application layer
[Diagram labels: Local, Remote (Web), HTTP Get request, Data packet, ACK (last data packet)]
Slide 25
SProbe's Accuracy
Slide 26
Slide 27
More SProbe
- Bottleneck bandwidth
- Latency
- Availability (LF): send a SYN packet, then receive:
  - SYN/ACK: host active
  - RST: host inactive, but online
  - nothing: host offline
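The availability rule above maps each possible reply to a host state. The sketch below shows only that mapping, under stated assumptions: the names `HostState` and `classify_syn_reply` are hypothetical, and the reply is passed in as a string (or `None` for a timeout) instead of being read from a real raw-socket probe.

```python
from enum import Enum


class HostState(Enum):
    ACTIVE = "host active"
    INACTIVE_ONLINE = "host inactive, but online"
    OFFLINE = "host offline"


def classify_syn_reply(reply):
    """Map the reply to a SYN probe onto a host state.

    reply: "SYN/ACK", "RST", or None when nothing came back before the
    probe timed out. The string encoding is an assumption of this sketch.
    """
    if reply is None:
        return HostState.OFFLINE
    if reply == "SYN/ACK":
        return HostState.ACTIVE
    if reply == "RST":
        return HostState.INACTIVE_ONLINE
    raise ValueError(f"unexpected reply: {reply!r}")


if __name__ == "__main__":
    for reply in ("SYN/ACK", "RST", None):
        print(reply, "->", classify_syn_reply(reply).value)
```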
Slide 28
P2P Characteristics
- How many peers are server-like?
- Who are the free-riders?
- Do peers tend to lie?
- How robust is the Gnutella overlay?
Slide 30
Higher Downstream Bandwidths
Slide 31
Most Peers have Cable Modem-like Bandwidths
Slide 32
Yes, Lots of Cable Modems
Slide 33
Closest 20% are 4X closer than furthest 20%
Slide 34
Two horizontal bands: East Coast and Transoceanic Links
Slide 35
Availability
- Periodic probes yield data like: [timeline of sessions, each with a start and an end]
Slide 36
Availability
- Periodic probes yield data like: [timeline of sessions, each with a start and an end; scale: 12 hours]
- Divide into two periods
- Keep segments that:
  - start in the 1st period
  - end in the 1st or 2nd period
- Draw conclusions only on segments no larger than the 2nd period
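The segment-selection rule can be sketched as a small filter. The reasoning is that any session that starts in the first period and is no longer than the second period is guaranteed to end inside the trace, so short sessions are not over-represented. The function name, the `(start, end)` tuple format, and the use of `None` for a session still open at the end of the trace are assumptions of this sketch.

```python
def usable_sessions(sessions, trace_start, split, trace_end):
    """Select the sessions on which conclusions can be drawn.

    sessions: list of (start, end) times; end is None if the session was
    still running when the trace stopped (assumed encoding).
    The trace covers [trace_start, trace_end] and is divided at `split`.
    """
    second_period_length = trace_end - split
    kept = []
    for start, end in sessions:
        if end is None:
            continue  # end was never observed
        starts_in_first = trace_start <= start < split
        ends_in_trace = start <= end <= trace_end
        short_enough = (end - start) <= second_period_length
        if starts_in_first and ends_in_trace and short_enough:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Times in hours: a 24-hour trace split into two 12-hour periods.
    observed = [(1, 3), (5, 20), (13, 14), (10, None), (11, 23)]
    print(usable_sessions(observed, trace_start=0, split=12, trace_end=24))
```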
Slide 37
Median Session is about one hour (same for both systems)
Slide 38
Gnutella/Napster Uptime
Slide 40
Who Has the Files?
Slide 41
Slide 42
Correlation of Free-Riding with B/W
Slide 44
It's all about incentive!
Slide 45
Lack of Knowledge is Universal
Slide 47
Power-Law Networks are here to Stay
- Barabasi and Albert showed that networks which:
  - grow by continuous addition of new nodes
  - exhibit preferential attachment (likelihood of connecting to a node depends on the node's degree)
  have a power-law distribution of vertex degree
- Examples: Internet, WWW, Gnutella
Slide 48
Resilience to Failures
Power-law networks (Cohen et al.):
- very resilient in the face of random node failures: a giant spanning cluster still exists
- fairly resilient in the face of cascading failures
- very vulnerable in the face of orchestrated attacks (towards high-degree nodes)
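The contrast between random failures and orchestrated attacks can be explored with a small simulation. The sketch below is illustrative only: it grows a synthetic graph by preferential attachment (not the measured Gnutella overlay), removes the same number of nodes either at random or in order of decreasing degree, and reports the size of the largest remaining connected cluster. All function names and parameter values are assumptions chosen for the example.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, rng):
    """Grow a graph where each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    pool = []  # each node appears once per incident edge
    for i in range(m + 1):  # seed: a small clique of m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in chosen:
            adj[new].add(target)
            adj[target].add(new)
            pool += [new, target]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected cluster once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        queue = deque([source])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def compare_failures(n=2000, m=2, fraction=0.3, seed=0):
    """Return (largest cluster after random failures,
    largest cluster after removing the highest-degree nodes)."""
    rng = random.Random(seed)
    adj = preferential_attachment_graph(n, m, rng)
    k = int(fraction * n)
    random_removed = set(rng.sample(sorted(adj), k))
    by_degree = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    targeted_removed = set(by_degree[:k])
    return (largest_component(adj, random_removed),
            largest_component(adj, targeted_removed))


if __name__ == "__main__":
    after_random, after_attack = compare_failures()
    print("largest cluster after random failures:", after_random)
    print("largest cluster after targeted attack:", after_attack)
```

Varying `fraction` shows how quickly each removal strategy fragments the synthetic graph.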
Slide 49
Gnutella
Fri Feb 16 05:21:52-05:23:22 PST
1771 hosts
Popular sites: 212.239.171.174, adams-00-305a.Stanford.EDU, 0.0.0.0
Slide 50
30% random failures
[Figure: Gnutella overlay snapshots, Fri Feb 16 05:21:52-05:23:22 PST; host counts shown: 1771, 471, 294]
Discussion
- Heterogeneity:
  - 3 orders of magnitude of bandwidth: 50Kbps-100Mbps
  - 6 orders of magnitude of latency: 10us-10s
  - >4 orders of magnitude in availability: 1%-99.99%
- Peers should not be treated as equals
Slide 53
Cooperating, Well-Behaved Peers
- Incentive: game-theoretic approaches of enforcing local behavior for global benefit
- System enforcement: peers can:
  - measure each other's characteristics (SProbe)
  - enforce the reported ones: a reported 56Kbps peer should not download content at a higher speed
Slide 54
Feedback to Current Proposals
- CAN, Chord, Past:
  - great memory and lookup algorithms: log(N) time and space
  - at the price of maintaining a rigid network structure: hypercubes, butterflies, Plaxton trees
  - unclear how the network structure is maintained given the heterogeneity and dynamics of peers
- Conjecture: these networks will have a hard time stabilizing and will need lots of routine maintenance traffic
Slide 55
Instead: Gnutella
- Easy join procedure: this simplicity gave Gnutella its power-law shape
- Easy-to-implement protocol (broadcast)
- Lots of maintenance traffic already, although the protocol has become smarter with its subsequent versions
- Searching is a nightmare
Slide 56
Document Popularity
- Follows a Zipf distribution (long-tailed)
- Popular documents become more popular with Napster/Gnutella
- Currently, queries need to be resubmitted in the hope that someone will answer
- Wish-list based system
Slide 57
Wide-area Network Measurements
- Sending a few packets can be identified with hostile behavior
- Even a few SYN packets are sufficient to trigger software firewalls
  - a dialogue box pops up: "possible scan from washington.edu, click OK or Cancel"
- Many confused, angry, threatening e-mails sent to many people (security, root, Ed)
- Active Internet measurements are not simple to perform
Slide 58
Excerpt from e-mail
"Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned."
Slide 59
Current Work
- Quantify and show that current proposals are too rigid for Napster/Gnutella-like peer dynamics
- Wish-list, delayed exchange system: a big distributed scheduling problem
- SGet: a downloading tool with automatic server selection, so that no bandwidth is wasted