Peer-to-peer systems for autonomic VoIP and web hotspot handling
Kundan Singh, Weibin Zhao and Henning Schulzrinne
Internet Real Time Laboratory
Computer Science Dept., Columbia University, New York
http://www.cs.columbia.edu/IRT/p2p-sip
http://www.cs.columbia.edu/IRT/dotslash
IBM Delhi (Jan. 2006) 2
P2P for autonomic computing
• Autonomic at the application layer:
  – Robust against partial network faults
  – Resources grow as the user population grows
  – Self-configuring
• Traditional p2p systems
  – file storage
    • motivation is often legal, not technical efficiency
  – usually unstructured, optimized for Zipf-like popularity
• Other p2p applications:
  – Skype demonstrates usefulness for VoIP
    • identifier lookup
    • NAT traversal: media traversal
  – OpenDHT (and similar) as emerging common infrastructure?
  – Non-DHT systems with smaller scope: web hotspot rescue
  – Network management (see our IRTF slides)
Aside: middle services instead of middleware
• Common & successful network services
  – identifier lookup: ARP, DNS
  – network storage: proprietary (Yahoo, .mac, …)
  – storage + computation: CDNs
• Emerging network services
  – peer-to-peer identifier lookup
  – network storage
  – network computation (“utility”)
    • maybe programmable
    • already found as web hosts and grid computing
What is P2P?
• Share the resources of individual peers
  – CPU, disk, bandwidth, information, …

[Figure: client-server (clients C connected to a server S) vs. peer-to-peer (peers P connected to one another)]

Taxonomy of computer systems:
• Centralized: mainframes, workstations
• Distributed
  – Client-server: flat or hierarchical (e.g., DNS, mount, RPC, HTTP)
  – Peer-to-peer: pure (Gnutella, Chord) or hybrid (Napster, Groove, Kazaa)

P2P application categories:
• File sharing: Napster, Gnutella, Kazaa, Freenet, Overnet
• Communication and collaboration: Magi, Groove, Skype
• Distributed computing: SETI@Home, folding@Home
Distributed Hash Table (DHT)
• Types of search
  – Central index (Napster)
  – Distributed index with flooding (Gnutella)
  – Distributed index with hashing (Chord, Bamboo, …)
• Basic operations: find(key), insert(key, value), delete(key), but no search(*)

  Property                  Every peer has complete table   Chord          Every peer has one key/value
  Search time or messages   O(1)                            O(log N)       O(N)
  Join/leave messages       O(N)                            O((log N)^2)   O(1)
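The find/insert/delete interface rests on consistent hashing: a key is stored on its successor, the first node at or after the key's position on the identifier circle. A minimal sketch in Python; the node names and the 16-bit ring size are illustrative, and SHA-1 is used only as a convenient hash:

```python
import hashlib
from bisect import bisect_left

# Toy consistent-hashing lookup: each key is stored on the first node
# whose identifier equals or follows the key on the ring (its successor).
RING_BITS = 16

def ring_id(name: str) -> int:
    """Hash a name onto the identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:2], "big") % (2 ** RING_BITS)

def successor(node_ids: list[int], key_id: int) -> int:
    """First node id >= key_id, wrapping around the circle."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id)
    return ids[i % len(ids)]  # i == len(ids) wraps to the smallest id

nodes = [ring_id(n) for n in ("node-a", "node-b", "node-c", "node-d")]
key = ring_id("sip:[email protected]")
print(successor(nodes, key))  # the node responsible for this key
```

Because only the successor relation matters, a node joining or leaving moves only the keys between it and its neighbor, which is what keeps join/leave cost low in the table above.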
CAN: Content Addressable Network
• Divide the d-dimensional coordinate space into zones.
• Each key maps to one point in the space.
• Each node is responsible for all the keys in its zone.

[Figure: the 2-D unit square (0.0 to 1.0 on each axis) divided into zones owned by nodes A, B, C, D, E, alongside the corresponding neighbor graph]
CAN
• Node X locates (x,y) = (.3, .1); node Z joins by splitting an existing zone
• State = 2d neighbors per node; search = d x N^(1/d) hops

[Figure: routing toward the point (x,y) across the zones of A, B, C, D, E, before and after Z joins; axes run 0.0 to 1.0 in steps of .25]
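The zone-ownership rule can be sketched as a lookup from a point to the node whose zone contains it. The rectangle boundaries below are illustrative (they do not reproduce the slide's exact splits):

```python
# Toy CAN lookup: the 2-D unit square is split into rectangular zones, each
# owned by one node; a key hashes to a point, and the owner of the enclosing
# zone stores the key. Zones are (x_min, x_max, y_min, y_max).
zones = {
    "A": (0.0, 0.5, 0.5, 1.0),
    "B": (0.5, 1.0, 0.5, 1.0),
    "C": (0.0, 0.25, 0.0, 0.5),
    "D": (0.25, 0.5, 0.0, 0.5),
    "E": (0.5, 1.0, 0.0, 0.5),
}

def owner(x: float, y: float) -> str:
    """Return the node whose zone contains point (x, y)."""
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("point outside the coordinate space")

print(owner(0.3, 0.1))  # the slide's example point (.3, .1)
```

Routing then greedily forwards toward the target point through neighboring zones, which is where the d x N^(1/d) hop count comes from.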
Chord
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes

[Figure: identifier circle (ids 0..63) with nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58; keys 10, 24, 30, 38, 54 are stored at their successors]
Chord
• Finger table: log N entries
• The i-th finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization after join/leave

Finger table of node 8:
  8+1  = 9   -> 14
  8+2  = 10  -> 14
  8+4  = 12  -> 14
  8+8  = 16  -> 21
  8+16 = 24  -> 32
  8+32 = 40  -> 42

[Figure: the same identifier circle, nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58]
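The finger-table rule can be checked mechanically. A short Python sketch, assuming the slide's ring: a 6-bit identifier space (0..63) with the node set shown:

```python
# Recompute the slide's finger table for node 8 on a 6-bit Chord ring.
M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 47, 54, 58])

def successor(key: int) -> int:
    """First node whose id is >= key, wrapping around the circle."""
    for n in NODES:
        if n >= key:
            return n
    return NODES[0]

def finger_table(n: int) -> list[tuple[int, int]]:
    """Entry i points to the first node succeeding n by at least 2^(i-1)."""
    starts = [(n + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]
    return [(s, successor(s)) for s in starts]

for start, node in finger_table(8):
    print(start, "->", node)
# 9 -> 14, 10 -> 14, 12 -> 14, 16 -> 21, 24 -> 32, 40 -> 42, as on the slide
```

A lookup repeatedly jumps to the highest finger that does not overshoot the key, halving the remaining distance each time, hence O(log N) hops.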
Tapestry
• IDs use base B = 2^b
• Route to the node numerically closest to the given key
• Routing table has O(B) columns, one per digit of the node ID
• Similar to CIDR, but suffix-based

Example overlay: nodes 427, 763, 135, 365, 123, 324, 564, 364. Routing to key 364 matches one more trailing digit per hop: **4 => *64 => 364.

Routing table columns for key 364:
  N=2: 064, 164, 264, 364, 464, 564, 664
  N=1: ?04, ?14, ?24, ?34, ?44, ?54, ?64
  N=0: ??0, ??1, ??2, ??3, ??4, ??5, ??6
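Suffix routing resolves one more trailing digit of the key per hop. A toy sketch using the node IDs from the slide's example; the choice among equally good next hops (here, first match in list order) is arbitrary:

```python
# Toy suffix-based routing step (Tapestry-style): at each hop, forward to a
# node sharing at least one more trailing digit with the key.
NODES = ["427", "763", "135", "365", "123", "324", "564", "364"]

def shared_suffix(a: str, b: str) -> int:
    """Number of matching trailing digits."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, key: str) -> str:
    """Pick a node matching one more trailing digit of the key."""
    want = shared_suffix(current, key) + 1
    for node in NODES:
        if shared_suffix(node, key) >= want:
            return node
    return current  # no closer node: current is the root for this key

hop = "427"
while hop != "364":
    hop = next_hop(hop, "364")
    print(hop)
# prints 324, 564, 364: the **4 => *64 => 364 progression from the slide
```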
Pastry
• Prefix-based
• Route to a node whose ID shares a prefix with the key at least one digit longer than this node's
• Neighbor set, leaf set and routing table

[Figure: Route(d46a1c) starting at node 65a1fc and hopping through d13da3, d4213f, d462ba/d467c4 and d471f1 toward d46a1c, matching a longer prefix at each hop]
Other schemes
• Distributed trie
• Viceroy
• Kademlia
• SkipGraph
• Symphony
• …
DHT Comparison

  Property     Unstructured           CAN           Chord        Tapestry   Pastry      Viceroy
  Routing      O(N) or no guarantee   d x N^(1/d)   log N        log_B N    log_B N     log N
  State        Constant               2d            log N        log_B N    B log_B N   log N
  Join/leave   Constant               2d            (log N)^2    log_B N    log_B N     log N

Reliability and fault resilience:
• Unstructured: data at multiple locations; retry on failure; finding popular content is efficient
• CAN: multiple peers for each data item; retry on failure; multiple paths to destination
• Chord: replicate data on consecutive peers; retry on failure
• Tapestry/Pastry: replicate data on multiple peers; keep multiple paths to each peer
• Viceroy: routing load is evenly distributed among participating lookup servers
Server-based vs peer-to-peer
• Reliability, failover latency
  – Server: DNS-based; depends on client retry timeout, DB replication latency, registration refresh interval
  – P2P: DHT self-organization and periodic registration refresh; depends on client timeout, registration refresh interval
• Scalability, number of users
  – Server: depends on the number of servers in the two stages
  – P2P: depends on refresh rate, join/leave rate, uptime
• Call setup latency
  – Server: one or two steps; P2P: O(log N) steps
• Security
  – Server: TLS, digest authentication, S/MIME; P2P additionally needs a reputation system and a way to work around spy nodes
• Maintenance, configuration
  – Server: administrator (DNS, database, middle-box); P2P: automatic, one-time bootstrap node addresses
• PSTN interoperability
  – Server: gateways, TRIP, ENUM; P2P: interact with server-based infrastructure, or co-locate a peer node with the gateway
The basic SIP service
• HTTP: retrieve a resource identified by a URI
• SIP: translate an address-of-record SIP URI (sip:[email protected]) to one or more contacts (hosts or other AORs, e.g., sip:[email protected])
  – single user, multiple hosts
    • e.g., home, office, mobile, secretary
    • can be equal or ordered sequentially
• Thus, SIP is (also) a binding protocol
  – similar, in spirit, to mobile IP, except at the application layer and without some of the related issues
• Function performed by the SIP proxy for the AOR's domain
  – delegated logically to a location server
• This function is being replaced by p2p approaches
What is SIP? Why P2P-SIP?

Client-server SIP (columbia.edu):
(1) Bob's host (128.59.19.194) sends REGISTER: [email protected] => 128.59.19.194
(2) Alice's host sends INVITE [email protected]
(3) the server returns Contact: 128.59.19.194
Problem in client-server: maintenance, configuration, controlled infrastructure

Peer-to-peer network:
(1) Alice (128.59.19.194) REGISTERs with the overlay
(2) INVITE alice is resolved by the peers
(3) which return 128.59.19.194
No central server, but more lookup latency
How to combine SIP + P2P?
• SIP-using-P2P
  – Replace the SIP location service by a P2P protocol (INSERT the binding into the P2P network; INVITE sip:[email protected] triggers a P2P lookup)
• P2P-over-SIP
  – Additionally, implement the P2P maintenance itself using SIP messaging (REGISTER, FIND and INVITE alice carried over the P2P-SIP overlay)

               SIP-using-P2P   P2P SIP proxies   P2P-over-SIP
  Maintenance  P2P             P2P               SIP
  Lookup       P2P             SIP               SIP
Design alternatives
• Use DHT in a server farm
• Use DHT for all clients, but some are resource limited
• Use DHT among super-nodes
  1. Hierarchy
  2. Dynamically adapt

[Figure: a prefix-routed overlay and a Chord ring formed by servers, with resource-limited clients attaching to the servers/super-nodes]
Deployment scenarios
• P2P clients: plug and play; may use adaptors; untrusted peers
• P2P proxies: zero-conf server farm; trusted servers and user identities
• P2P database: global, e.g., OpenDHT; clients or proxies can use it; trusted deployed peers
• Interoperate among these!
Hybrid architecture
• Cross register, or
• Locate during call setup
  – DNS, or
  – P2P-SIP hierarchy
What else can be P2P?
• Rendezvous/signaling (SIP)
• Configuration storage
• Media storage (e.g., voice mail)
• Identity assertion (?)
• PSTN gateway (?)
• NAT/media relay (find the best one)
Trust models are different for different components!
What is our P2P-SIP?
• Unlike server-based SIP architecture
• Unlike the proprietary Skype architecture
  – Robust and efficient lookup using DHT
  – Interoperability
    • DHT algorithm uses SIP communication
  – Hybrid architecture
    • Lookup in SIP+P2P
• Unlike file-sharing applications
  – Data storage, caching, delay, reliability
• Disadvantages
  – Lookup delay and security
Implementation: SIPpeer
• Platform: Unix (Linux), C++
• Modes:
  – Chord: using SIP for P2P maintenance
  – OpenDHT: using external P2P data storage
    • based on the Bamboo DHT, running on PlanetLab nodes
• Scenarios:
  – P2P clients, P2P proxies
  – Adaptor for existing phones (Cisco, X-Lite, Windows Messenger, SIPc)
  – Server farm
P2P-SIP: identifier lookup
• P2P serves as the SIP location server:
  – address-of-record -> contacts, e.g., [email protected] -> 128.59.16.1, 128.72.50.13
  – multi-valued: (key_n, value_1), (key_n, value_2)
  – with limited TTL
  – variant: point to a SIP proxy server
    • either operated by a supernode or a traditional server
    • allows registration of non-p2p SIP domains (*@example.com)
    • easier to provide call routing services (e.g., CPL)

[Table: alice -> 128.59.16.1; alice -> 128.72.50.13]
Background: DHT (Chord)
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes
• Finger table: log N entries
  – the i-th finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization for join/leave

Finger table of node 8:
  8+1  = 9   -> 14
  8+2  = 10  -> 14
  8+4  = 12  -> 14
  8+8  = 16  -> 21
  8+16 = 24  -> 32
  8+32 = 40  -> 42

[Figure: the identifier circle from the earlier Chord slides, nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58]
Implementation: SIPpeer

[Architecture figure: a user interface (buddy list, etc.) over SIP, ICE, RTP/RTCP, codecs, audio devices and a DHT (Chord) module. On startup: discover peers via multicast REGISTER; peer found / detect NAT; then REGISTER. User location uses REGISTER, INVITE and MESSAGE. Sign-up and find-buddies map to DHT join/find; sign-out and transfer map to leave, also triggered on reset; IM and calls run over SIP. The two planes are labeled SIP-over-P2P and P2P-using-SIP.]
P2P vs. server-based SIP
• Prediction:
  – P2P for smaller and quick-setup scenarios
  – Server-based for corporate and carrier deployments
• Need a federated system
  – multiple p2p systems, identified by DNS domain name
  – with gateway nodes

2000 requests/second ≈ 7 million registered users
Open issues
• Presence and IM
  – where to store presence information: need access authorization
• Performance
  – how many supernodes are needed? (Skype: ~1000)
• Reliability
  – P2P nodes generally replicate data
  – if the proxy or presence agent is at a leaf, need proxy data replication
• Security
  – Sybil attacks: blackholing supernodes
  – Identifier protection: protect the first registrant against identity theft
  – Anonymity, encryption
  – Protecting voicemails on storage nodes
• Optimization
  – Locality, proximity, media routing
• Deployment
  – SIP-P2P vs P2P-SIP, intranet, ISP servers
• Motivation
  – Why should I run as a super-node?
Comparison of P2P and server-based systems

  Aspect       Server-based                                       P2P
  scaling      server count scales with user count                limited by supernode count
  efficiency   most efficient                                     DHT maintenance = O((log N)^2)
  security     trust the server provider; binary                  trust most supernodes; probabilistic
  reliability  server redundancy; catastrophic failure possible   unreliable supernodes; catastrophic failure unlikely
Using P2P for binding updates
• Proxies do more than just plain identifier translation:
  – translation may depend on who's asking, time of day, …
    • e.g., based on script output
    • hide the full range of contacts from the caller
  – sequential and parallel forking
  – disconnected services: e.g., forward to voicemail if no answer
• Using a DHT as a location service
  – use only plain translation
  – run services on end systems (the Skype approach)
  – run proxy services on supernode(s) and use the proxy as contact: needs replication for reliability
Reliability and scalability: two-stage architecture for CINEMA

[Figure: users such as sip:[email protected] and sip:[email protected] reach first-stage stateless proxies s1, s2, s3 (backup: ex), which dispatch to master/slave pairs a1/a2 for a*@example.com and b1/b2 for b*@example.com]

DNS SRV records:
  example.com    _sip._udp SRV 0 40 s1.example.com
                           SRV 0 40 s2.example.com
                           SRV 0 20 s3.example.com
                           SRV 1 0  ex.backup.com
  a.example.com  _sip._udp SRV 0 0 a1.example.com
                           SRV 1 0 a2.example.com
  b.example.com  _sip._udp SRV 0 0 b1.example.com
                           SRV 1 0 b2.example.com

Request rate = f(#stateless, #groups)
Bottleneck: CPU, memory, bandwidth? Failover latency: ?
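Failover with these SRV records follows the standard priority/weight rules: use the lowest priority level that still has a reachable target, and among equals pick in proportion to weight. A simplified sketch (full RFC 2782 selection is more elaborate); the record data mirrors the example.com zone above:

```python
import random

# (priority, weight, target) tuples from the slide's example.com SRV set.
records = [
    (0, 40, "s1.example.com"),
    (0, 40, "s2.example.com"),
    (0, 20, "s3.example.com"),
    (1, 0,  "ex.backup.com"),   # used only when all priority-0 servers fail
]

def pick(available: set[str]) -> str:
    """Choose a target: lowest reachable priority, weighted within the tier."""
    live = [r for r in records if r[2] in available]
    best = min(p for p, _, _ in live)
    tier = [r for r in live if r[0] == best]
    weights = [w for _, w, _ in tier]
    if sum(weights) == 0:
        weights = [1] * len(tier)   # all-zero weights: pick uniformly
    return random.choices([t for _, _, t in tier], weights=weights)[0]

print(pick({"s1.example.com", "s2.example.com", "s3.example.com"}))
print(pick({"ex.backup.com"}))  # failover when every first-stage proxy is down
```

This is the mechanism behind the "failover latency" question on the slide: the latency is dominated by how quickly a client decides a higher-priority target is unreachable and retries the next record.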
SIP p2p summary
• Advantages
  – Out-of-box experience
  – Robust
    • catastrophic failure unlikely
  – Inherently scalable
    • more capacity with more nodes
• Status
  – IETF involvement
  – Columbia SIPpeer
• Security issues
  – Trust, reputation
  – malicious nodes, Sybil attacks
  – SPAM, DDoS
  – Privacy, anonymity (?)
• Other issues
  – Lookup latency, proximity
  – P2P-SIP vs SIP-using-P2P
  – Why should I run as a super-node?

http://www.cs.columbia.edu/IRT/p2p-sip and http://www.p2psip.org
DotSlash: An Automated Web Hotspot Rescue System
Weibin Zhao and Henning Schulzrinne
The problem
• Web hotspots
  – Also known as flash crowds or the Slashdot effect
  – Short-term dramatic load spikes at web servers
• Existing mechanisms are not sufficient
  – Over-provisioning
    • Inefficient for rare events
    • Difficult because the peak load is hard to predict
  – CDNs
    • Expensive for small web sites that experience the Slashdot effect
The challenges
• Automate hotspot handling
  – Eliminate human intervention to react quickly
  – Improve availability during critical periods (“15 minutes of fame”)
• Allocate resources dynamically
  – Static configuration is insufficient for unexpected dramatic load spikes
• Address different bottlenecks
  – Access network, web server, application server, and database server
Our approach
• DotSlash
  – An automated web hotspot rescue system that builds an adaptive distributed web server system on the fly
• Advantages
  – Fully self-configuring: no configuration
    • Service discovery, adaptive control, dynamic virtual hosting
  – Scalable, easy to use
  – Works for static & LAMP applications
    • handles network, CPU and database server bottlenecks
  – Transparent to clients
    • cf. CoralCache
DotSlash overview
• Rescue model
  – Mutual-aid community using spare capacity
  – Potential usage by web hosting companies
• DotSlash components
  – Workload monitoring
  – Rescue server discovery
  – Load migration (request redirection)
  – Dynamic virtual hosting
  – Adaptive rescue and overload control
Handling load spikes
• Request redirection
  – DNS-RR: reduce arrival rate
  – HTTP redirect: increase service rate
• Handle different bottlenecks

  Technique                       Bottleneck addressed
  Cache static content            Network, web server
  Replicate scripts dynamically   Application server
  Cache query results on demand   Database server
Rescue example
• Cache static content

[Figure: client1 and client2 reach the rescue server either via (2) DNS round robin at the DNS server or via (2) an HTTP redirect from the origin server; the rescue server acts as a reverse proxy, fetching content from the origin server in steps (3) and (4)]
Rescue example (2)
• Replicate scripts dynamically

[Figure: a client request to the origin server (Apache + PHP) is redirected to the rescue server (Apache), which replicates and runs the PHP script and queries the origin's database server directly, steps (1) through (8)]
Rescue example (3)
• Cache query results on demand

[Figure: both the origin server and the rescue server interpose a data driver with a query result cache between the application and the database server; the rescue server answers repeated queries from its cache instead of hitting the origin's database]
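The data driver's query result cache can be sketched as a TTL-stamped table keyed by the SQL text. The TTL value and the run_query stand-in below are illustrative, not DotSlash's actual parameters:

```python
import time

# On-demand query result cache: identical read queries within the TTL are
# served from the cache, sparing the origin's database server.
TTL = 60.0  # seconds; illustrative
_cache: dict[str, tuple[float, object]] = {}

def run_query(sql: str):
    """Placeholder for a real database call."""
    return [("row-for", sql)]

def cached_query(sql: str):
    now = time.time()
    hit = _cache.get(sql)
    if hit and now - hit[0] < TTL:
        return hit[1]            # fresh cached result
    result = run_query(sql)
    _cache[sql] = (now, result)  # store with a timestamp
    return result

cached_query("SELECT * FROM stories")   # miss: goes to the database
cached_query("SELECT * FROM stories")   # hit: served from the cache
```

The TTL trades freshness for hit ratio, which is exactly the trade-off measured in the "Caching TTL and Hit Ratio" results later in the talk.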
Server states

[State diagram]
• Normal state
• SOS state (origin server, getting help from others): entered by allocating a rescue server; left by releasing all rescues
• Rescue state (rescue server, providing help to others): entered by accepting an SOS request; left by shutting down all rescues
Handling load spikes
• Load migration
  – DNS-RR: reduce arrival rate
  – HTTP redirect: increase service rate
  – Both: increase throughput
• Benefits
  – Reduce origin server network load by caching static content at rescue servers
  – Reduce origin web server CPU load by replicating scripts dynamically to rescue servers
Adaptive overload control
• Objective
  – Keep CPU and network load in the desired load region
• Origin server
  – Allocate/release rescue servers
  – Adjust redirect probability
• Rescue server
  – Accept SOS requests
  – Shutdown rescues
  – Adjust allowed redirect rate
Self-configuring
• Rescue server discovery via SLP and DNS SRV
• Dynamic virtual hosting:
  – Serving content of a new site on the fly
  – use “pre-positioned” Apache virtual hosts
• Workload monitoring: network and CPU
  – take headers and responses into account
• Adaptive rescue control
  – Don't know the precise load-handling capacity of rescue servers
    • particularly for active content
  – Establish desired load region (typically, ~70%)
  – Periodically measure and adjust redirect probability
    • conveyed via the rescue protocol
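The measure-and-adjust loop can be sketched as a simple controller that nudges the redirect probability toward the desired load region. The step size and the region bounds around ~70% are illustrative, not DotSlash's actual control law:

```python
# Adaptive redirect control: once per measurement period, raise the
# probability of redirecting requests to rescue servers when the origin is
# above the desired load region, lower it when below.
TARGET_LOW, TARGET_HIGH = 0.6, 0.8   # desired load region (illustrative)
STEP = 0.05                          # adjustment per period (illustrative)

def adjust(redirect_prob: float, utilization: float) -> float:
    """One control step based on measured CPU/network utilization."""
    if utilization > TARGET_HIGH:
        redirect_prob += STEP        # shed more load to rescue servers
    elif utilization < TARGET_LOW:
        redirect_prob -= STEP        # keep more requests locally
    return min(1.0, max(0.0, redirect_prob))

p = 0.0
for load in (0.95, 0.9, 0.85, 0.7, 0.5):
    p = adjust(p, load)
print(round(p, 2))  # redirect probability settles as load returns to the region
```

Keeping the adjustment incremental avoids oscillation when the capacity of rescue servers is unknown, which is the point the slide makes about active content.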
Implementation
• Based on LAMP (Linux, Apache, MySQL, PHP)
• Apache module (mod_dots), DotSlash daemon (dotsd), DotSlash rescue protocol (DSRP)
• Dynamic DNS using BIND with dot-slash.net
• Service discovery using enhanced SLP (mSLP)

[Figure: clients reach Apache, which talks to dotsd via mod_dots over shared memory (SHM) and HTTP; dotsd speaks DSRP to other dotsd instances, SLP to mSLP, and DNS to BIND]
Handling File Inclusions
• The problem
  – A replicated script may include files that are located at the origin server
  – Assume: included files are under DocumentRoot
• Approaches
  – Renaming inclusion statements
    • Need to parse scripts: heavyweight
  – Customized error handler
    • Catch inclusion errors: lightweight
Evaluation
• Workload generation
  – httperf for static content
  – RUBBoS (bulletin board) for dynamic content
• Testbed
  – LAN cluster and WAN (PlanetLab) nodes
  – Linux Red Hat 9.0, Apache 2.0.49, MySQL 4.0.18, PHP 4.3.6
• Metrics
  – Maximum request rate and maximum data rate supported
Results in LANs

[Figures: request rate, redirect rate and rescue rate; data rate]
Handling worst-case workload
• Settling time: 24 seconds
• Timeouts: 921 of 113,565 requests
Results for dynamic content
• No rescue: R = 118 requests/second; CPU: origin = 100%, DB = 45%
• With rescue: R = 245 requests/second; CPU: origin = 55%, DB = 100%
• Improvement: 245/118 > 2
• Configuration: origin (HC) and database (HC) servers, plus 9 rescue (LC) servers
Caching TTL and Hit Ratio (Read-Only)

[Figure: cache hit ratio (%, roughly 60 to 100) vs. caching TTL (seconds, log scale from 10^0 to 10^3)]
CPU Utilization (Read-Only)

[Figure: CPU utilization (%) vs. number of clients (500 to 4000) for the database servers under READ3 (rescue, no cache), READ4 (rescue, co-located cache) and READ5 (rescue, shared cache), plus the READ5 shared cache server]
Request Rate (Read-Only)

[Figure: requests per second (100 to 550) vs. number of clients (500 to 4000) for READ3 (rescue, no cache), READ4 (rescue, co-located cache) and READ5 (rescue, shared cache)]
CPU Utilization (Submission)

[Figure: origin database server CPU utilization (%) vs. number of clients (3000 to 7000) for SUB4 (rescue, no cache), SUB5 (rescue, cache, no invalidation) and SUB6 (rescue, cache with invalidation)]
Request Rate (Submission)

[Figure: requests per second (400 to 900) vs. number of clients (3000 to 7000) for SUB4 (rescue, no cache), SUB5 (rescue, cache, no invalidation) and SUB6 (rescue, cache with invalidation)]
Performance
• Static content (httperf)
  – 10-fold improvement
  – Relieves network and web server bottlenecks
• Dynamic content (RUBBoS)
  – Completely removes the web/application server bottleneck
  – Relieves the database server bottleneck
  – Overall improvement: 10 times for the read-only mix, 5 times for the submission mix
Conclusion
• DotSlash prototype
  – Applicable to both static and dynamic content
  – Promising performance improvement
  – Released as open-source software
• On-going work
  – Address security issues in deployment
  – Extensible to SIP servers? Web services?
• For further information
  – http://www.cs.columbia.edu/IRT/dotslash
  – DotSlash framework: WCW 2004
  – Dynamic script replication: Global Internet 2005
  – On-demand query result cache: TR CUCS-035-05 (under submission)