Peer-to-peer systems for autonomic VoIP and web hotspot handling
Kundan Singh, Weibin Zhao and Henning Schulzrinne
Internet Real Time Laboratory
Computer Science Dept., Columbia University, New York
http://www.cs.columbia.edu/IRT/p2p-sip
http://www.cs.columbia.edu/IRT/dotslash
IBM Delhi (Jan. 2006) 2
P2P for autonomic computing
• Autonomic at the application layer:
  – Robust against partial network faults
  – Resources grow as the user population grows
  – Self-configuring
• Traditional p2p systems
  – file storage
    • motivation is often legal, not technical efficiency
  – usually unstructured, optimized for Zipf-like popularity
• Other p2p applications:
  – Skype demonstrates usefulness for VoIP
    • identifier lookup
    • NAT traversal: media traversal
  – OpenDHT (and similar) as emerging common infrastructure?
  – Non-DHT systems with smaller scope: web hotspot rescue
  – Network management (see our IRTF slides)
Aside: middle services instead of middleware
• Common & successful network services
  – identifier lookup: ARP, DNS
  – network storage: proprietary (Yahoo, .mac, …)
  – storage + computation: CDNs
• Emerging network services
  – peer-to-peer identifier lookup
  – network storage
  – network computation (“utility”)
    • maybe programmable
    • already found as web hosts and grid computing
What is P2P?
• Share the resources of individual peers
  – CPU, disk, bandwidth, information, …

[Figure: client-server (clients C connected to a server S) vs. peer-to-peer (peers P connected to one another)]

Taxonomy of computer systems:
• Centralized: mainframes, workstations
• Distributed
  – Client-server: flat or hierarchical (e.g., DNS, mount, RPC, HTTP)
  – Peer-to-peer: pure (Gnutella, Chord) or hybrid (Napster, Groove, Kazaa)

P2P application categories:
• File sharing: Napster, Gnutella, Kazaa, Freenet, Overnet
• Communication and collaboration: Magi, Groove, Skype
• Distributed computing: SETI@Home, folding@Home
Distributed Hash Table (DHT)
• Types of search
  – Central index (Napster)
  – Distributed index with flooding (Gnutella)
  – Distributed index with hashing (Chord, Bamboo, …)
• Basic operations: find(key), insert(key, value), delete(key), but no search(*)

  Property                  Every peer has complete table   Chord          Every peer has one key/value
  Search time or messages   O(1)                            O(log N)       O(N)
  Join/leave messages       O(N)                            O((log N)^2)   O(1)
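The find/insert/delete interface rests on consistent hashing: a key is stored on its successor, the first node at or after the key's position on the identifier circle. A minimal sketch in Python; the node names and the 16-bit ring size are illustrative, and SHA-1 is used only as a convenient hash:

```python
import hashlib
from bisect import bisect_left

# Toy consistent-hashing lookup: each key is stored on the first node
# whose identifier equals or follows the key on the ring (its successor).
RING_BITS = 16

def ring_id(name: str) -> int:
    """Hash a name onto the identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:2], "big") % (2 ** RING_BITS)

def successor(node_ids: list[int], key_id: int) -> int:
    """First node id >= key_id, wrapping around the circle."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id)
    return ids[i % len(ids)]  # i == len(ids) wraps to the smallest id

nodes = [ring_id(n) for n in ("node-a", "node-b", "node-c", "node-d")]
key = ring_id("sip:[email protected]")
print(successor(nodes, key))  # the node responsible for this key
```

Because only the successor relation matters, a node joining or leaving moves only the keys between it and its neighbor, which is what keeps join/leave cost low in the table above.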
CAN: Content Addressable Network
• Divide the d-dimensional coordinate space into zones.
• Each key maps to one point in the space.
• Each node is responsible for all the keys in its zone.

[Figure: the 2-D unit square (0.0 to 1.0 on each axis) divided into zones owned by nodes A, B, C, D, E, alongside the corresponding neighbor graph]
CAN
• Node X locates (x,y) = (.3, .1); node Z joins by splitting an existing zone
• State = 2d neighbors per node; search = d x N^(1/d) hops

[Figure: routing toward the point (x,y) across the zones of A, B, C, D, E, before and after Z joins; axes run 0.0 to 1.0 in steps of .25]
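The zone-ownership rule can be sketched as a lookup from a point to the node whose zone contains it. The rectangle boundaries below are illustrative (they do not reproduce the slide's exact splits):

```python
# Toy CAN lookup: the 2-D unit square is split into rectangular zones, each
# owned by one node; a key hashes to a point, and the owner of the enclosing
# zone stores the key. Zones are (x_min, x_max, y_min, y_max).
zones = {
    "A": (0.0, 0.5, 0.5, 1.0),
    "B": (0.5, 1.0, 0.5, 1.0),
    "C": (0.0, 0.25, 0.0, 0.5),
    "D": (0.25, 0.5, 0.0, 0.5),
    "E": (0.5, 1.0, 0.0, 0.5),
}

def owner(x: float, y: float) -> str:
    """Return the node whose zone contains point (x, y)."""
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("point outside the coordinate space")

print(owner(0.3, 0.1))  # the slide's example point (.3, .1)
```

Routing then greedily forwards toward the target point through neighboring zones, which is where the d x N^(1/d) hop count comes from.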
Chord
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes

[Figure: identifier circle (ids 0..63) with nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58; keys 10, 24, 30, 38, 54 are stored at their successors]
Chord
• Finger table: log N entries
• The i-th finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization after join/leave

Finger table of node 8:
  8+1  = 9   -> 14
  8+2  = 10  -> 14
  8+4  = 12  -> 14
  8+8  = 16  -> 21
  8+16 = 24  -> 32
  8+32 = 40  -> 42

[Figure: the same identifier circle, nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58]
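The finger-table rule can be checked mechanically. A short Python sketch, assuming the slide's ring: a 6-bit identifier space (0..63) with the node set shown:

```python
# Recompute the slide's finger table for node 8 on a 6-bit Chord ring.
M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 47, 54, 58])

def successor(key: int) -> int:
    """First node whose id is >= key, wrapping around the circle."""
    for n in NODES:
        if n >= key:
            return n
    return NODES[0]

def finger_table(n: int) -> list[tuple[int, int]]:
    """Entry i points to the first node succeeding n by at least 2^(i-1)."""
    starts = [(n + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]
    return [(s, successor(s)) for s in starts]

for start, node in finger_table(8):
    print(start, "->", node)
# 9 -> 14, 10 -> 14, 12 -> 14, 16 -> 21, 24 -> 32, 40 -> 42, as on the slide
```

A lookup repeatedly jumps to the highest finger that does not overshoot the key, halving the remaining distance each time, hence O(log N) hops.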
Tapestry
• IDs use base B = 2^b
• Route to the node numerically closest to the given key
• Routing table has O(B) columns, one per digit of the node ID
• Similar to CIDR, but suffix-based

Example overlay: nodes 427, 763, 135, 365, 123, 324, 564, 364. Routing to key 364 matches one more trailing digit per hop: **4 => *64 => 364.

Routing table columns for key 364:
  N=2: 064, 164, 264, 364, 464, 564, 664
  N=1: ?04, ?14, ?24, ?34, ?44, ?54, ?64
  N=0: ??0, ??1, ??2, ??3, ??4, ??5, ??6
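Suffix routing resolves one more trailing digit of the key per hop. A toy sketch using the node IDs from the slide's example; the choice among equally good next hops (here, first match in list order) is arbitrary:

```python
# Toy suffix-based routing step (Tapestry-style): at each hop, forward to a
# node sharing at least one more trailing digit with the key.
NODES = ["427", "763", "135", "365", "123", "324", "564", "364"]

def shared_suffix(a: str, b: str) -> int:
    """Number of matching trailing digits."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, key: str) -> str:
    """Pick a node matching one more trailing digit of the key."""
    want = shared_suffix(current, key) + 1
    for node in NODES:
        if shared_suffix(node, key) >= want:
            return node
    return current  # no closer node: current is the root for this key

hop = "427"
while hop != "364":
    hop = next_hop(hop, "364")
    print(hop)
# prints 324, 564, 364: the **4 => *64 => 364 progression from the slide
```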
Pastry
• Prefix-based
• Route to a node whose ID shares a prefix with the key at least one digit longer than this node's
• Neighbor set, leaf set and routing table

[Figure: Route(d46a1c) starting at node 65a1fc and hopping through d13da3, d4213f, d462ba/d467c4 and d471f1 toward d46a1c, matching a longer prefix at each hop]
Other schemes
• Distributed trie
• Viceroy
• Kademlia
• SkipGraph
• Symphony
• …
DHT Comparison

  Property     Unstructured           CAN           Chord        Tapestry   Pastry      Viceroy
  Routing      O(N) or no guarantee   d x N^(1/d)   log N        log_B N    log_B N     log N
  State        Constant               2d            log N        log_B N    B log_B N   log N
  Join/leave   Constant               2d            (log N)^2    log_B N    log_B N     log N

Reliability and fault resilience:
• Unstructured: data at multiple locations; retry on failure; finding popular content is efficient
• CAN: multiple peers for each data item; retry on failure; multiple paths to destination
• Chord: replicate data on consecutive peers; retry on failure
• Tapestry/Pastry: replicate data on multiple peers; keep multiple paths to each peer
• Viceroy: routing load is evenly distributed among participating lookup servers
Server-based vs peer-to-peer
• Reliability, failover latency
  – Server: DNS-based; depends on client retry timeout, DB replication latency, registration refresh interval
  – P2P: DHT self-organization and periodic registration refresh; depends on client timeout, registration refresh interval
• Scalability, number of users
  – Server: depends on the number of servers in the two stages
  – P2P: depends on refresh rate, join/leave rate, uptime
• Call setup latency
  – Server: one or two steps; P2P: O(log N) steps
• Security
  – Server: TLS, digest authentication, S/MIME; P2P additionally needs a reputation system and a way to work around spy nodes
• Maintenance, configuration
  – Server: administrator (DNS, database, middle-box); P2P: automatic, one-time bootstrap node addresses
• PSTN interoperability
  – Server: gateways, TRIP, ENUM; P2P: interact with server-based infrastructure, or co-locate a peer node with the gateway
The basic SIP service
• HTTP: retrieve a resource identified by a URI
• SIP: translate an address-of-record SIP URI (sip:[email protected]) to one or more contacts (hosts or other AORs, e.g., sip:[email protected])
  – single user, multiple hosts
    • e.g., home, office, mobile, secretary
    • can be equal or ordered sequentially
• Thus, SIP is (also) a binding protocol
  – similar, in spirit, to mobile IP, except at the application layer and without some of the related issues
• Function performed by the SIP proxy for the AOR's domain
  – delegated logically to a location server
• This function is being replaced by p2p approaches
What is SIP? Why P2P-SIP?

Client-server SIP (columbia.edu):
(1) Bob's host (128.59.19.194) sends REGISTER: [email protected] => 128.59.19.194
(2) Alice's host sends INVITE [email protected]
(3) the server returns Contact: 128.59.19.194
Problem in client-server: maintenance, configuration, controlled infrastructure

Peer-to-peer network:
(1) Alice (128.59.19.194) REGISTERs with the overlay
(2) INVITE alice is resolved by the peers
(3) which return 128.59.19.194
No central server, but more lookup latency
How to combine SIP + P2P?
• SIP-using-P2P
  – Replace the SIP location service by a P2P protocol (INSERT the binding into the P2P network; INVITE sip:[email protected] triggers a P2P lookup)
• P2P-over-SIP
  – Additionally, implement the P2P maintenance itself using SIP messaging (REGISTER, FIND and INVITE alice carried over the P2P-SIP overlay)

               SIP-using-P2P   P2P SIP proxies   P2P-over-SIP
  Maintenance  P2P             P2P               SIP
  Lookup       P2P             SIP               SIP
Design alternatives
• Use DHT in a server farm
• Use DHT for all clients, but some are resource limited
• Use DHT among super-nodes
  1. Hierarchy
  2. Dynamically adapt

[Figure: a prefix-routed overlay and a Chord ring formed by servers, with resource-limited clients attaching to the servers/super-nodes]
Deployment scenarios
• P2P clients: plug and play; may use adaptors; untrusted peers
• P2P proxies: zero-conf server farm; trusted servers and user identities
• P2P database: global, e.g., OpenDHT; clients or proxies can use it; trusted deployed peers
• Interoperate among these!
Hybrid architecture
• Cross register, or
• Locate during call setup
  – DNS, or
  – P2P-SIP hierarchy
What else can be P2P?
• Rendezvous/signaling (SIP)
• Configuration storage
• Media storage (e.g., voice mail)
• Identity assertion (?)
• PSTN gateway (?)
• NAT/media relay (find the best one)
Trust models are different for different components!
What is our P2P-SIP?
• Unlike server-based SIP architecture
• Unlike the proprietary Skype architecture
  – Robust and efficient lookup using DHT
  – Interoperability
    • DHT algorithm uses SIP communication
  – Hybrid architecture
    • Lookup in SIP+P2P
• Unlike file-sharing applications
  – Data storage, caching, delay, reliability
• Disadvantages
  – Lookup delay and security
Implementation: SIPpeer
• Platform: Unix (Linux), C++
• Modes:
  – Chord: using SIP for P2P maintenance
  – OpenDHT: using external P2P data storage
    • based on the Bamboo DHT, running on PlanetLab nodes
• Scenarios:
  – P2P clients, P2P proxies
  – Adaptor for existing phones (Cisco, X-Lite, Windows Messenger, SIPc)
  – Server farm
P2P-SIP: identifier lookup
• P2P serves as the SIP location server:
  – address-of-record -> contacts, e.g., [email protected] -> 128.59.16.1, 128.72.50.13
  – multi-valued: (key_n, value_1), (key_n, value_2)
  – with limited TTL
  – variant: point to a SIP proxy server
    • either operated by a supernode or a traditional server
    • allows registration of non-p2p SIP domains (*@example.com)
    • easier to provide call routing services (e.g., CPL)

[Table: alice -> 128.59.16.1; alice -> 128.72.50.13]
Background: DHT (Chord)
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes
• Finger table: log N entries
  – the i-th finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization for join/leave

Finger table of node 8:
  8+1  = 9   -> 14
  8+2  = 10  -> 14
  8+4  = 12  -> 14
  8+8  = 16  -> 21
  8+16 = 24  -> 32
  8+32 = 40  -> 42

[Figure: the identifier circle from the earlier Chord slides, nodes 1, 8, 14, 21, 32, 38, 42, 47, 54, 58]
Implementation: SIPpeer

[Architecture figure: a user interface (buddy list, etc.) over SIP, ICE, RTP/RTCP, codecs, audio devices and a DHT (Chord) module. On startup: discover peers via multicast REGISTER; peer found / detect NAT; then REGISTER. User location uses REGISTER, INVITE and MESSAGE. Sign-up and find-buddies map to DHT join/find; sign-out and transfer map to leave, also triggered on reset; IM and calls run over SIP. The two planes are labeled SIP-over-P2P and P2P-using-SIP.]
P2P vs. server-based SIP
• Prediction:
  – P2P for smaller and quick-setup scenarios
  – Server-based for corporate and carrier deployments
• Need a federated system
  – multiple p2p systems, identified by DNS domain name
  – with gateway nodes

2000 requests/second ≈ 7 million registered users
Open issues
• Presence and IM
  – where to store presence information: need access authorization
• Performance
  – how many supernodes are needed? (Skype: ~1000)
• Reliability
  – P2P nodes generally replicate data
  – if the proxy or presence agent is at a leaf, need proxy data replication
• Security
  – Sybil attacks: blackholing supernodes
  – Identifier protection: protect the first registrant against identity theft
  – Anonymity, encryption
  – Protecting voicemails on storage nodes
• Optimization
  – Locality, proximity, media routing
• Deployment
  – SIP-P2P vs P2P-SIP, intranet, ISP servers
• Motivation
  – Why should I run as a super-node?
Comparison of P2P and server-based systems

  Aspect       Server-based                                       P2P
  scaling      server count scales with user count                limited by supernode count
  efficiency   most efficient                                     DHT maintenance = O((log N)^2)
  security     trust the server provider; binary                  trust most supernodes; probabilistic
  reliability  server redundancy; catastrophic failure possible   unreliable supernodes; catastrophic failure unlikely
Using P2P for binding updates
• Proxies do more than just plain identifier translation:
  – translation may depend on who's asking, time of day, …
    • e.g., based on script output
    • hide the full range of contacts from the caller
  – sequential and parallel forking
  – disconnected services: e.g., forward to voicemail if no answer
• Using a DHT as a location service
  – use only plain translation
  – run services on end systems (the Skype approach)
  – run proxy services on supernode(s) and use the proxy as contact: needs replication for reliability
Reliability and scalability: two-stage architecture for CINEMA

[Figure: users such as sip:[email protected] and sip:[email protected] reach first-stage stateless proxies s1, s2, s3 (backup: ex), which dispatch to master/slave pairs a1/a2 for a*@example.com and b1/b2 for b*@example.com]

DNS SRV records:
  example.com    _sip._udp SRV 0 40 s1.example.com
                           SRV 0 40 s2.example.com
                           SRV 0 20 s3.example.com
                           SRV 1 0  ex.backup.com
  a.example.com  _sip._udp SRV 0 0 a1.example.com
                           SRV 1 0 a2.example.com
  b.example.com  _sip._udp SRV 0 0 b1.example.com
                           SRV 1 0 b2.example.com

Request rate = f(#stateless, #groups)
Bottleneck: CPU, memory, bandwidth? Failover latency: ?
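Failover with these SRV records follows the standard priority/weight rules: use the lowest priority level that still has a reachable target, and among equals pick in proportion to weight. A simplified sketch (full RFC 2782 selection is more elaborate); the record data mirrors the example.com zone above:

```python
import random

# (priority, weight, target) tuples from the slide's example.com SRV set.
records = [
    (0, 40, "s1.example.com"),
    (0, 40, "s2.example.com"),
    (0, 20, "s3.example.com"),
    (1, 0,  "ex.backup.com"),   # used only when all priority-0 servers fail
]

def pick(available: set[str]) -> str:
    """Choose a target: lowest reachable priority, weighted within the tier."""
    live = [r for r in records if r[2] in available]
    best = min(p for p, _, _ in live)
    tier = [r for r in live if r[0] == best]
    weights = [w for _, w, _ in tier]
    if sum(weights) == 0:
        weights = [1] * len(tier)   # all-zero weights: pick uniformly
    return random.choices([t for _, _, t in tier], weights=weights)[0]

print(pick({"s1.example.com", "s2.example.com", "s3.example.com"}))
print(pick({"ex.backup.com"}))  # failover when every first-stage proxy is down
```

This is the mechanism behind the "failover latency" question on the slide: the latency is dominated by how quickly a client decides a higher-priority target is unreachable and retries the next record.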
SIP p2p summary
• Advantages
  – Out-of-box experience
  – Robust
    • catastrophic failure unlikely
  – Inherently scalable
    • more capacity with more nodes
• Status
  – IETF involvement
  – Columbia SIPpeer
• Security issues
  – Trust, reputation
  – malicious nodes, Sybil attacks
  – SPAM, DDoS
  – Privacy, anonymity (?)
• Other issues
  – Lookup latency, proximity
  – P2P-SIP vs SIP-using-P2P
  – Why should I run as a super-node?

http://www.cs.columbia.edu/IRT/p2p-sip and http://www.p2psip.org
DotSlash: An Automated Web Hotspot Rescue System
Weibin Zhao and Henning Schulzrinne
The problem
• Web hotspots
  – Also known as flash crowds or the Slashdot effect
  – Short-term dramatic load spikes at web servers
• Existing mechanisms are not sufficient
  – Over-provisioning
    • Inefficient for rare events
    • Difficult because the peak load is hard to predict
  – CDNs
    • Expensive for small web sites that experience the Slashdot effect
The challenges
• Automate hotspot handling
  – Eliminate human intervention to react quickly
  – Improve availability during critical periods (“15 minutes of fame”)
• Allocate resources dynamically
  – Static configuration is insufficient for unexpected dramatic load spikes
• Address different bottlenecks
  – Access network, web server, application server, and database server
Our approach
• DotSlash
  – An automated web hotspot rescue system that builds an adaptive distributed web server system on the fly
• Advantages
  – Fully self-configuring: no configuration
    • Service discovery, adaptive control, dynamic virtual hosting
  – Scalable, easy to use
  – Works for static & LAMP applications
    • handles network, CPU and database server bottlenecks
  – Transparent to clients
    • cf. CoralCache
DotSlash overview
• Rescue model
  – Mutual-aid community using spare capacity
  – Potential usage by web hosting companies
• DotSlash components
  – Workload monitoring
  – Rescue server discovery
  – Load migration (request redirection)
  – Dynamic virtual hosting
  – Adaptive rescue and overload control
Handling load spikes
• Request redirection
  – DNS-RR: reduce arrival rate
  – HTTP redirect: increase service rate
• Handle different bottlenecks

  Technique                       Bottleneck addressed
  Cache static content            Network, web server
  Replicate scripts dynamically   Application server
  Cache query results on demand   Database server
Rescue example
• Cache static content

[Figure: client1 and client2 reach the rescue server either via (2) DNS round robin at the DNS server or via (2) an HTTP redirect from the origin server; the rescue server acts as a reverse proxy, fetching content from the origin server in steps (3) and (4)]
Rescue example (2)
• Replicate scripts dynamically

[Figure: a client request to the origin server (Apache + PHP) is redirected to the rescue server (Apache), which replicates and runs the PHP script and queries the origin's database server directly, steps (1) through (8)]
Rescue example (3)
• Cache query results on demand

[Figure: both the origin server and the rescue server interpose a data driver with a query result cache between the application and the database server; the rescue server answers repeated queries from its cache instead of hitting the origin's database]
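The data driver's query result cache can be sketched as a TTL-stamped table keyed by the SQL text. The TTL value and the run_query stand-in below are illustrative, not DotSlash's actual parameters:

```python
import time

# On-demand query result cache: identical read queries within the TTL are
# served from the cache, sparing the origin's database server.
TTL = 60.0  # seconds; illustrative
_cache: dict[str, tuple[float, object]] = {}

def run_query(sql: str):
    """Placeholder for a real database call."""
    return [("row-for", sql)]

def cached_query(sql: str):
    now = time.time()
    hit = _cache.get(sql)
    if hit and now - hit[0] < TTL:
        return hit[1]            # fresh cached result
    result = run_query(sql)
    _cache[sql] = (now, result)  # store with a timestamp
    return result

cached_query("SELECT * FROM stories")   # miss: goes to the database
cached_query("SELECT * FROM stories")   # hit: served from the cache
```

The TTL trades freshness for hit ratio, which is exactly the trade-off measured in the "Caching TTL and Hit Ratio" results later in the talk.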
Server states

[State diagram]
• Normal state
• SOS state (origin server, getting help from others): entered by allocating a rescue server; left by releasing all rescues
• Rescue state (rescue server, providing help to others): entered by accepting an SOS request; left by shutting down all rescues
Handling load spikes
• Load migration
  – DNS-RR: reduce arrival rate
  – HTTP redirect: increase service rate
  – Both: increase throughput
• Benefits
  – Reduce origin server network load by caching static content at rescue servers
  – Reduce origin web server CPU load by replicating scripts dynamically to rescue servers
Adaptive overload control
• Objective
  – Keep CPU and network load in the desired load region
• Origin server
  – Allocate/release rescue servers
  – Adjust redirect probability
• Rescue server
  – Accept SOS requests
  – Shutdown rescues
  – Adjust allowed redirect rate
Self-configuring
• Rescue server discovery via SLP and DNS SRV
• Dynamic virtual hosting:
  – Serving content of a new site on the fly
  – use “pre-positioned” Apache virtual hosts
• Workload monitoring: network and CPU
  – take headers and responses into account
• Adaptive rescue control
  – Don't know the precise load-handling capacity of rescue servers
    • particularly for active content
  – Establish desired load region (typically, ~70%)
  – Periodically measure and adjust redirect probability
    • conveyed via the rescue protocol
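The measure-and-adjust loop can be sketched as a simple controller that nudges the redirect probability toward the desired load region. The step size and the region bounds around ~70% are illustrative, not DotSlash's actual control law:

```python
# Adaptive redirect control: once per measurement period, raise the
# probability of redirecting requests to rescue servers when the origin is
# above the desired load region, lower it when below.
TARGET_LOW, TARGET_HIGH = 0.6, 0.8   # desired load region (illustrative)
STEP = 0.05                          # adjustment per period (illustrative)

def adjust(redirect_prob: float, utilization: float) -> float:
    """One control step based on measured CPU/network utilization."""
    if utilization > TARGET_HIGH:
        redirect_prob += STEP        # shed more load to rescue servers
    elif utilization < TARGET_LOW:
        redirect_prob -= STEP        # keep more requests locally
    return min(1.0, max(0.0, redirect_prob))

p = 0.0
for load in (0.95, 0.9, 0.85, 0.7, 0.5):
    p = adjust(p, load)
print(round(p, 2))  # redirect probability settles as load returns to the region
```

Keeping the adjustment incremental avoids oscillation when the capacity of rescue servers is unknown, which is the point the slide makes about active content.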
Implementation
• Based on LAMP (Linux, Apache, MySQL, PHP)
• Apache module (mod_dots), DotSlash daemon (dotsd), DotSlash rescue protocol (DSRP)
• Dynamic DNS using BIND with dot-slash.net
• Service discovery using enhanced SLP (mSLP)

[Figure: clients reach Apache, which talks to dotsd via mod_dots over shared memory (SHM) and HTTP; dotsd speaks DSRP to other dotsd instances, SLP to mSLP, and DNS to BIND]
Handling File Inclusions
• The problem
  – A replicated script may include files that are located at the origin server
  – Assume: included files are under DocumentRoot
• Approaches
  – Renaming inclusion statements
    • Need to parse scripts: heavyweight
  – Customized error handler
    • Catch inclusion errors: lightweight
Evaluation
• Workload generation
  – httperf for static content
  – RUBBoS (bulletin board) for dynamic content
• Testbed
  – LAN cluster and WAN (PlanetLab) nodes
  – Linux Red Hat 9.0, Apache 2.0.49, MySQL 4.0.18, PHP 4.3.6
• Metrics
  – Maximum request rate and maximum data rate supported
Results in LANs

[Figures: request rate, redirect rate and rescue rate; data rate]
Handling worst-case workload
• Settling time: 24 seconds
• Timeouts: 921 of 113,565 requests
Results for dynamic content
• No rescue: R = 118 requests/second; CPU: origin = 100%, DB = 45%
• With rescue: R = 245 requests/second; CPU: origin = 55%, DB = 100%
• Improvement: 245/118 > 2
• Configuration: origin (HC) and database (HC) servers, plus 9 rescue (LC) servers
Caching TTL and Hit Ratio (Read-Only)

[Figure: cache hit ratio (%, roughly 60 to 100) vs. caching TTL (seconds, log scale from 10^0 to 10^3)]
CPU Utilization (Read-Only)

[Figure: CPU utilization (%) vs. number of clients (500 to 4000) for the database servers under READ3 (rescue, no cache), READ4 (rescue, co-located cache) and READ5 (rescue, shared cache), plus the READ5 shared cache server]
Request Rate (Read-Only)

[Figure: requests per second (100 to 550) vs. number of clients (500 to 4000) for READ3 (rescue, no cache), READ4 (rescue, co-located cache) and READ5 (rescue, shared cache)]
CPU Utilization (Submission)

[Figure: origin database server CPU utilization (%) vs. number of clients (3000 to 7000) for SUB4 (rescue, no cache), SUB5 (rescue, cache, no invalidation) and SUB6 (rescue, cache with invalidation)]
Request Rate (Submission)

[Figure: requests per second (400 to 900) vs. number of clients (3000 to 7000) for SUB4 (rescue, no cache), SUB5 (rescue, cache, no invalidation) and SUB6 (rescue, cache with invalidation)]
Performance
• Static content (httperf)
  – 10-fold improvement
  – Relieves network and web server bottlenecks
• Dynamic content (RUBBoS)
  – Completely removes the web/application server bottleneck
  – Relieves the database server bottleneck
  – Overall improvement: 10 times for the read-only mix, 5 times for the submission mix
Conclusion
• DotSlash prototype
  – Applicable to both static and dynamic content
  – Promising performance improvement
  – Released as open-source software
• On-going work
  – Address security issues in deployment
  – Extensible to SIP servers? Web services?
• For further information
  – http://www.cs.columbia.edu/IRT/dotslash
  – DotSlash framework: WCW 2004
  – Dynamic script replication: Global Internet 2005
  – On-demand query result cache: TR CUCS-035-05 (under submission)