SIP as infrastructure
Henning Schulzrinne
Dept. of Computer Science, Columbia University, New York
SIP 2007 (upperside.fr)
Paris, France
February 2007
Outline
• Scaling SIP to the real world: emergency calling
• Scaling SIP to very large deployments
  – some measurements for designing large servers
  – congestion control and dealing with avalanche restart
  – P2P SIP
  – failure discovery
• The state of SIP standardization, year 11
  – developments in 2006 & upcoming highlights
  – trouble in standards land
Roadmap
• Introduction
• Emergency calling
• Server scaling
• P2P SIP
• End-to-end management
• Standardization and interoperability
Evolution of VoIP
• 1996-2000: “amazing – the phone rings” (long-distance calling, ca. 1930)
• 2000-2003: “does it do call transfer?” (catching up with the digital PBX)
• 2004-2005: “How can I make it stop ringing?” (going beyond the black phone)
• 2006-: “Can it really replace the phone system?” (replacing the global phone system)
IETF VoIP efforts
• SIP (protocol)
• SIPPING (usage, requirements)
• ECRIT (emergency calling)
• AVT (RTP, SRTP, media)
• ENUM (E.164 translation)
• IPTEL (tel URL)
• SIMPLE (presence)
• GEOPRIV (geo + privacy)
• MMUSIC (SDP, RTSP, ICE)
• XCON (conf. control)
• SPEERMINT (peering)
• SPEECHSC (speech services)
• SIGTRAN (signaling transport)
[Diagram: the groups are linked by arrows labeled “uses”, “may use”, “provides”, and “usually used with”; a subset is marked as the IETF RAI area]
VoIP emergency communications
• emergency call
• dispatch
• civic coordination
• emergency alert (“inverse 911”)
Steps of an emergency call, by deployment stage (now / transition / all IP):
• Contact well-known number or identifier: 112, 911 / 112, 911 / 112, 911, urn:service:sos
• Route call to location-appropriate PSAP: SR / VPC / LoST
• Deliver precise location to call taker to dispatch emergency help: phone number --> location (ALI lookup) / in-band key --> location / in-band
IETF ECRIT working group
• Emergency Contact Resolution with Internet Technologies
• Solve four major pieces of the puzzle:
  – location conveyance (with SIP & GEOPRIV)
  – emergency call identification
  – mapping geo and civic caller locations to PSAP URI
  – discovery of local and visited emergency dial string
• Not solving
  – location discovery --> GEOPRIV
  – inter-PSAP communication and coordination
  – citizen notification
• Current status:
  – finishing general and security requirements
  – agreement on mapping protocol (LoST) and identifier (sos URN)
  – working on overall architecture and UA requirements
ECRIT: Options for location delivery
• GPS
• L2: LLDP-MED (standardized version of CDP + location data)
  – periodic per-port broadcast of configuration information
  – currently implementing CDP
• L3: DHCP for
  – geospatial (RFC 3825)
  – civic (RFC 4676)
• L7: proposals for retrievals: HELD, RELO, LCP, SIP, …
  – for own IP address or by third party (e.g., ISP to infrastructure provider)
  – by IP address
  – by MAC address
  – by identifier (conveyed by DHCP or PPP)
  – HELD, RELO: both HTTP-based
ECRIT: Finding the correct PSAP
• Which PSAP should the e-call go to?
  – Usually to the PSAP that serves the geographic area
  – Sometimes to a backup PSAP
  – If no location, then ‘default’ PSAP
  – solved by LoST
• Mapping client --> mapping server: I am at "Otto-Hahn-Ring 6, 81739 München". I need to contact the ambulance. (Emergency Identifier)
• Mapping server --> mapping client: Contact URI [email protected]
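The PSAP selection rules on this slide can be sketched as a lookup with fallbacks. This is only an illustration of the decision order, not the LoST protocol itself: the table contents, the URIs under example.invalid, the function name select_psap, and the choice to send unknown areas to the default PSAP are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical table: service area -> primary and backup PSAP URIs.
PSAPS = {
    "munich": {
        "primary": "sip:psap-munich@example.invalid",
        "backup": "sip:psap-bavaria@example.invalid",
    },
}
DEFAULT_PSAP = "sip:default-psap@example.invalid"


def select_psap(area: Optional[str], primary_available: bool = True) -> str:
    """Pick a PSAP URI for an emergency call.

    No location (or, as an assumption here, an unknown area) falls back to
    the default PSAP; otherwise the serving PSAP is used unless it is
    unavailable, in which case the backup is chosen.
    """
    if area is None or area not in PSAPS:
        return DEFAULT_PSAP
    entry = PSAPS[area]
    return entry["primary"] if primary_available else entry["backup"]


print(select_psap("munich"))
print(select_psap("munich", primary_available=False))
print(select_psap(None))
```

In a real deployment the table would be replaced by a query to a mapping server; the fallback order is the part the slide describes.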
ECRIT: LoST Functionality
• Civic as well as geospatial queries
  – civic address validation
• Recursive and iterative resolution
• Fully distributed and hierarchical deployment
  – can be split by any geographic or civic boundary
  – same civic region can span multiple LoST servers
• Indicates errors in civic location data --> debugging
  – but provides best-effort resolution
• Can be used for non-emergency services:
  – directory and information services
  – pizza delivery services, towing companies, …
Example query:
<findService xmlns="urn:…:lost1">
  <location profile="basic-civic">
    <civicAddress>
      <country>Germany</country>
      <A1>Bavaria</A1>
      <A3>Munich</A3>
      <A6>Neu Perlach</A6>
      <HNO>96</HNO>
    </civicAddress>
  </location>
  <service>urn:service:sos.police</service>
</findService>
LoST: Location-to-URL Mapping
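[Diagram: a tree of LoST servers. Root nodes replicate root information; below them are nodes for NY, US and NJ, US, then Bergen County, NJ, US, then Leonia, NJ, US. One cluster serves VSP1 and another serves VSP2. A search for “123 Broad Ave, Leonia, Bergen County, NJ, US” follows referrals down the tree.]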
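The referral-based search through the civic hierarchy on this slide can be sketched as a walk down a tree, keeping the most specific answer found so far. This is a toy in-memory model under stated assumptions: the Node class, the resolve function, and the URIs under example.invalid are hypothetical, and real LoST servers exchange XML over the network rather than sharing Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One hypothetical mapping server covering a civic region."""
    name: str
    psap_uri: Optional[str] = None  # set where this server has a mapping
    children: Dict[str, "Node"] = field(default_factory=dict)


def resolve(root: Node, civic_path: List[str]) -> Tuple[Optional[str], List[str]]:
    """Follow referrals from the root toward the most specific region.

    Returns the most specific PSAP URI found and the servers visited.
    """
    node, uri, visited = root, root.psap_uri, [root.name]
    for part in civic_path:
        child = node.children.get(part)
        if child is None:
            break  # no finer-grained server: keep the best answer so far
        node = child
        visited.append(node.name)
        if node.psap_uri is not None:
            uri = node.psap_uri
    return uri, visited


# Region names follow the slide; the URIs are invented for the example.
leonia = Node("Leonia", psap_uri="sip:psap@leonia.example.invalid")
bergen = Node("Bergen County", children={"Leonia": leonia})
nj = Node("NJ", psap_uri="sip:psap@nj.example.invalid",
          children={"Bergen County": bergen})
root = Node("US", children={"NJ": nj, "NY": Node("NY")})

print(resolve(root, ["NJ", "Bergen County", "Leonia"]))
```

The loop corresponds to iterative resolution, where the client follows each referral itself; a recursive deployment would have each server forward the query instead.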
LoST Architecture
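[Diagram: trees T1 (.us), T2 (.de), and T3 (.dk); nodes marked G share tree coverage (“T1: .us”, “T2: .de”) by broadcast (gossip). A seeker sends “313 Westview, Leonia, NJ US” to a resolver; the answer for Leonia, NJ is sip:[email protected]. Remaining label: tree guide.]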
SIP server overload
• Proxies will return 503 --> retry elsewhere
• Just adds more load
• Retransmissions exacerbate the problem
[Diagram: an INVITE meets one overloaded proxy after another and is answered with 503. Load triggers: “Springsteen tickets!!”, earthquake, “vote for your favorite…”]
Avalanche restart
• Large number of terminals all start at once
• Typically, after power outage
• Overwhelms registrar
• Possible loss of registrations due to retransmission time-out
[Diagram: terminals #1 to #300,000 reboot after a power outage and send REGISTER]
Overload control
• Current discussion in design team
• Feedback control: rate-based or window-based
• Avoid congestion collapse
• Deal with multiple upstream sources
[Figure: goodput versus offered load, with capacity marked]
[Diagram: servers S1 to S5 and two UAs]
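One way to picture rate-based feedback with multiple upstream sources is a server that divides its capacity among its upstream neighbors and tells each one the rate it may send. The sketch below uses a max-min fair split, which is an assumption chosen for illustration and not something the design team is stated to have adopted; the function name allocate_rates and the numbers are hypothetical.

```python
from typing import Dict


def allocate_rates(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair split of `capacity` (req/sec) among upstream sources.

    Sources asking for no more than an equal share get their full demand;
    the capacity left over is shared equally among the remaining sources.
    """
    alloc: Dict[str, float] = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        small = [src for src, demand in pending.items() if demand <= share]
        if not small:
            # Every remaining source wants more than the equal share.
            for src in pending:
                alloc[src] = share
            break
        for src in small:
            alloc[src] = pending.pop(src)
            remaining -= alloc[src]
    return alloc


# A server with 100 req/sec of capacity and three upstream proxies.
rates = allocate_rates(100, {"S1": 20, "S2": 50, "S3": 80})
print(rates)
```

With these inputs S1 keeps its full 20 req/sec while S2 and S3 are each limited to 40 req/sec, so the total offered to the server stays at its capacity instead of pushing it toward congestion collapse.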
Scaling servers & TCP
• Need TCP
  – TLS support: customer privacy, theft of service, …
    • particularly for WiFi
  – many SIP messages now exceed reasonable UDP size (fragmentation)
    • e.g., INVITE for IMS: 1182 bytes
• Concern: UA support
  – improving: 82% of systems at recent SIPit’19 had TCP support
  – only 45% support TLS
• Concern: TCP (and TLS) much less efficient than UDP
  – running series of tests to identify differences
  – difference mainly in
    • connection setup cost
    • message splitting (may need pre-parsing or incremental parsers)
    • thread count (one per socket?)
• Our model:
  – 300,000 customers/servers
    • 0.1 Erlang, 180 sec/call
  – 600,000 BHCA --> 167 req/sec
  – 300,000 registrations --> 83 req/sec
  – $0.001/subscriber
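The request rates in the model follow from the traffic figures by simple arithmetic, worked through below. The customer count, Erlang value, and call duration come from the slide; the one-hour registration refresh interval is an assumption, chosen because it reproduces the slide's 83 req/sec.

```python
CUSTOMERS = 300_000
ERLANG_PER_CUSTOMER = 0.1   # fraction of the busy hour each customer is on a call
CALL_DURATION_S = 180
SECONDS_PER_HOUR = 3600

# Calls per customer in the busy hour = busy seconds / seconds per call.
calls_per_customer = ERLANG_PER_CUSTOMER * SECONDS_PER_HOUR / CALL_DURATION_S
bhca = CUSTOMERS * calls_per_customer      # busy-hour call attempts
call_rate = bhca / SECONDS_PER_HOUR        # call attempts per second

# Assumption: each customer refreshes its registration once per hour.
REGISTRATION_INTERVAL_S = 3600
register_rate = CUSTOMERS / REGISTRATION_INTERVAL_S

print(round(bhca), round(call_rate), round(register_rate))
```

Each customer places two calls in the busy hour, which gives 600,000 BHCA, about 167 call attempts per second, and about 83 registrations per second.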
Performance evaluation results
• Pentium 4 server, 3 GHz
  – 4 GB memory
  – Linux 2.6.16
[Figure: echo server measurements (Kumiko Ono). Response time (ms, axis 0 to 0.5) and CPU (%, axis 0 to 100) at 2,500 req/sec and 14,800 req/sec for four cases: transaction, persistent w/setup, persistent w/o setup, UDP]
SIP server measurements
• Initial INVITE measurements
  – OpenSER
  – 400 calls/sec for TCP
  – roughly 260 calls/sec for TLS
[Figure: sipd REGISTER test, TCP (Kumiko Ono, Charles Shen, Erich Nahum)]
P2P SIP
• Why?
  – no infrastructure available: emergency coordination
  – don’t want to set up infrastructure: small companies
  – Skype envy :-)
• P2P technology for
  – user location
    • only modest impact on expenses
    • but makes signaling encryption cheap
  – NAT traversal
    • matters for relaying
  – services (conferencing, …)
    • how prevalent?
• New IETF working group just formed
  – likely, multiple DHTs
  – common control and look-up protocol?
[Diagram labels: LAN, P2P provider A, P2P provider B, p2p network, traditional provider, DNS, zeroconf, generic DHT service]
P2P SIP -- components
• Multicast-DNS (zeroconf) SIP enhancements for LAN
  – announce UAs and their capabilities
• Client-P2P protocol
  – GET, PUT mappings
  – mapping: proxy or UA
• P2P protocol
  – get routing table, join, leave, …
  – independent of DHT?
  – replaces DNS for SIP, not proxy
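The GET and PUT mappings of the client-P2P protocol can be illustrated with a toy in-process DHT that stores each user-to-contact mapping on the node responsible for the user's key. Everything here is an assumption made for the sketch: the ToyDHT class, the SHA-1 based successor rule, and the example URIs. The slide leaves open which DHT is used and whether the protocol is independent of it.

```python
import hashlib
from bisect import bisect_right
from typing import Dict, List, Optional


def _key(value: str) -> int:
    """Map a node name or user URI onto the shared identifier space."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    """In-process stand-in for a DHT: a key lives on its successor node."""

    def __init__(self, node_names: List[str]) -> None:
        self._ring = sorted((_key(name), name) for name in node_names)
        self._store: Dict[str, Dict[str, str]] = {n: {} for n in node_names}

    def responsible_node(self, key: str) -> str:
        ids = [node_id for node_id, _ in self._ring]
        # First node clockwise from the key, wrapping around the ring.
        index = bisect_right(ids, _key(key)) % len(self._ring)
        return self._ring[index][1]

    def put(self, user: str, contact: str) -> None:
        """PUT: register where `user` can currently be reached."""
        self._store[self.responsible_node(user)][user] = contact

    def get(self, user: str) -> Optional[str]:
        """GET: look up the contact for `user`, or None if unknown."""
        return self._store[self.responsible_node(user)].get(user)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("sip:alice@example.invalid", "sip:alice@192.0.2.10:5060")
print(dht.get("sip:alice@example.invalid"))
```

The lookup plays the role that DNS and a registrar play in conventional SIP: it returns a contact to send the request to, and the mapped value could equally be a proxy or a UA.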
February 2007 24
Roadmap
• Introduction
• Emergency calling
• Server scaling
• P2P SIP
• End-to-end management
• Standardization and interoperability
February 2007 25
VoIP user experience
• Only 95-99.5% call attempt success
  – “Keynote was able to complete VoIP calls 96.9% of the time, compared with 99.9% for calls made over the public network. Voice quality for VoIP calls on average was rated at 3.5 out of 5, compared with 3.9 for public-network calls and 3.6 for cellular phone calls. And the amount of delay the audio signals experienced was 295 milliseconds for VoIP calls, compared with 139 milliseconds for public-network calls.” (InformationWeek, July 11, 2005)
• Mid-call disruptions common
• Lots of knobs to turn
  – Separate problem: manual configuration
Open issues: Configuration
• Ideally, should only need a user name and some credential
  – password, USB key, host identity (MAC address), …
• More than DHCP: device needs to get
  – SIP-level information (outbound proxy, timers)
  – policy information (“sorry, no video”)
• Multiple sources of configuration information
  – local network (hotel proxy)
  – voice service provider (off-network)
• Configuration information may change
• Needs to allow no-touch deployment of thousands of devices
• SIP configuration framework
  – has been languishing for years
  – currently being rewritten to reduce complexity
Circle of blame
[Diagram: OS, VSP, app vendor, and ISP blame one another in a circle]
• “must be a Windows registry problem” --> re-install Windows
• “probably packet loss in your Internet connection” --> reboot your DSL modem
• “must be your software” --> upgrade
• “probably a gateway fault” --> choose us as provider
Traditional network management model
[Diagram: SNMP-based “management from the center”; one element is marked with an X]
Old assumptions, now wrong
• Single provider (enterprise, carrier)
  – has access to most path elements
  – professionally managed
• Problems are hard failures & elements operate correctly
  – element failures (“link dead”)
  – substantial packet loss
• Mostly L2 and L3 elements
  – switches, routers
  – rarely 802.11 APs
• Problems are specific to a protocol
  – “IP is not working”
• Indirect detection
  – MIB variable vs. actual protocol performance
• End systems don’t need management
  – DMI & SNMP never succeeded
  – each application does its own updates
Management
[Diagram: four management tasks: element inspection, configuration, fault location, network understanding. Callouts: “we’ve only succeeded here” and “what causes the most trouble?”]
Managing the protocol stack
Problems by layer:
• IP: no route, packet loss
• UDP/TCP: TCP neg. failure, NAT time-out, firewall policy
• RTP: protocol problem, playout errors
• media: echo, gain problems, VAD action
• SIP: protocol problem, authorization, asymmetric conn (NAT)
Proposal: “Do You See What I See?”
• Each node has a set of active and passive measurement tools
• Use intercept (NDIS, pcap)
  – to detect problems automatically
    • e.g., no response to HTTP or DNS request
  – gather performance statistics (packet jitter)
  – capture RTCP and similar measurement packets
• Nodes can ask others for their view
  – possibly also dedicated “weather stations”
• Iterative process, leading to:
  – user indication of cause of failure
  – in some cases, work-around (application-layer routing) --> TURN server, use remote DNS servers
• Nodes collect statistical information on failures and their likely causes
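The step where a node compares its own observation with the views of other nodes can be sketched as a small classification function. This is a simplified illustration, not the proposal's actual algorithm: the function name localize_fault, the three-way classification, and the peer names are assumptions.

```python
from typing import Dict


def localize_fault(own_ok: bool, peer_views: Dict[str, bool]) -> str:
    """Guess where a failure lies from this node's and its peers' probes.

    `own_ok` is whether this node's probe (e.g. a DNS query) succeeded;
    `peer_views` maps each peer that answered to its own probe result.
    """
    if own_ok:
        return "no fault observed"
    if not peer_views:
        return "unknown: no peers answered"
    successes = sum(peer_views.values())
    if successes == 0:
        return "likely remote: target fails for every peer"
    if successes == len(peer_views):
        return "likely local: only this node fails"
    return "likely path-specific: some peers fail, some succeed"


# This node cannot reach the resolver, but both peers can.
print(localize_fault(False, {"buddy-1": True, "buddy-2": True}))
```

A fuller version would repeat the comparison with different probes and peers, which is the iterative process described on the slide, and would feed the outcome into the per-node failure statistics.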
February 2007 33
Management architecture
[Diagram: management architecture, with these elements]
• “not working” (notification)
• inspect protocol requests (DNS, HTTP, RTCP, …), e.g. “DNS failure for 15m”
• orchestrate tests, contact others: ping 127.0.0.1; can buddy reach our resolver?
• request diagnostics
• notify admin (email, IM, SIP events, …)
February 2007 34
Roadmap
• Introduction
• Emergency calling
• Server scaling
• P2P SIP
• End-to-end management
• Standardization and interoperability
February 2007 35
SIP, SIPPING & SIMPLE –00 drafts
includes draft-ietf-*-00 and draft-personal-*-00
[Chart: number of -00 drafts per year, 1999 to 2006, for SIP, SIPPING, and SIMPLE; y-axis 0 to 80]
RFC publication
[Chart: RFCs published per year, 2001 to 2006, for SIP, SIPPING, and SIMPLE; y-axis 0 to 14]
IETF WG: SIP in 2006 & 2007
• ~44 SIP-related RFCs published in 2006
  – BFCP, conferencing
  – SDP revision
  – rich presence
• Activities:
  – hitchhiker’s guide
  – infrastructure:
    • GRUUs (random identifiers)
    • URI lists
    • XCAP configuration
    • SIP MIB
  – services:
    • rejecting anonymous requests
    • consent framework
    • location conveyance
    • session policy
  – security:
    • end-to-middle security
    • certificates
    • SAML
    • sips clarification
  – NAT:
    • connection re-use
    • SIP outbound
    • ICE (in MMUSIC)
see http://tools.ietf.org/wg/sip/
IETF WG: SIPPING
• 31 RFCs published in 2006
• Policy
  – media policy
  – SBC functions
• Services
  – service examples
  – call transfer
  – configuration framework
  – spam and spit
  – text-over-IP
  – transcoding
• Testing and operations
  – IPv6 transition
  – race condition examples
  – IPv6 torture tests
  – SIP offer-answer examples
  – overload requirements
  – configuration
  – voice quality reporting
Interoperability
• Generally no interoperability problems for basic SIP functionality
  – basic call, digest registration, call transfer, voice mail
• Weaker in advanced scenarios and backward compatibility
  – handling TCP, TLS
  – NAT support (symmetric RTP, ICE, STUN, ...)
  – multipart bodies
  – SIP torture tests
  – call transfer, call pick-up
  – video and voice codec interoperability (H.264, anything beyond G.711)
• SIPit useful, but no equivalent of WiFi certification
  – most implementations still single-vendor (enterprise, carrier) or vendor-supplied (VSP)
  – SFTF (test framework) still limited
• Need profiles to guide implementers
Trouble in Standards Land
• Proliferation of transition standards: 2.5G, 2.6G, 3.5G, …
  – true even for emergency calling…
• Splintering of standardization efforts across SDOs
  – primary:
    • IEEE, IETF, W3C, OASIS, ISO
  – architectural:
    • PacketCable, ETSI, 3GPP, 3GPP2, OMA, UMA, ATIS, …
  – specialized:
    • NENA
  – operational, marketing:
    • SIP Forum, IPCC, …
[Diagram labels: OASIS, IEEE, IETF, W3C, ISO (MPEG); L2.5-L7 protocols, data exchange, data formats, L1-L2; 3GPP, PacketCable]
IETF issues
• SIP WGs: small number (dozen?) of core authors (80/20)
  – some now becoming managers…
  – or moving to other topics
• IETF: research --> engineering --> maintenance
  – many groups are essentially maintaining standards written a decade (or two) ago
    • DNS, IPv4, IPv6, BGP, DHCP; RTP, SIP, RTSP
    • constrained by design choices made long ago
  – often dealing with transition to hostile & “random” network
  – network ossification
• Stale IETF leadership
  – often from core equipment vendors, not software vendors or carriers
• fair amount of not-invented-here syndrome
• late to recognize wide usage of XML and web standards
• late to deal with NATs
• security tends to be per-protocol (silo)
  – some efforts such as SAML and SASL
• tendency to re-invent the wheel in each group
IETF issue: timeliness
• Most drafts spend lots of time in 90%-complete state
  – lack of energy (moved on to new -00)
  – optimizers vs. satisfiers
    • multiple choices that have non-commensurate trade-offs
• Notorious examples:
  – SIP request history: Feb. 2002 – May 2005 (RFC 4244)
  – Session timers: Feb. 1999 – May 2005 (RFC 4028)
  – Resource priority: Feb. 2001 – Feb. 2006 (RFC 4412)
• New framework/requirements phase adds 1-2 years of delay
• Three bursts of activity/year, with silence in-between
  – occasional interim meetings
• IETF meetings are often not productive
  – most topics get 5-10 minutes --> lack context, focus on minutiae
  – no background --> same people as on mailing list
  – 5 people discuss, 195 people read email
• No formal issue tracking
  – some WGs use tools, haphazardly
• Gets worse over time:
  – dependencies increase, sometimes undiscovered
  – backwards compatibility issues
  – more background needed to contribute
IETF issues: timeliness
• WG chairs run meetings, but are not managing WG progress
  – very little control of deadlines
    • e.g., all SIMPLE deadlines are probably a year behind
  – little push to come to working group last call (WGLC)
  – limited timeliness accountability of authors and editors
  – chairs often provide limited editorial feedback
• IESG review can get stuck in long feedback loop
  – author – AD – WG chairs
  – sometimes lack of accountability (AD-authored documents)
• RFC editor often takes 6+ months to process document
  – dependencies; IANA; editor queue; author delays
  – e.g., session timer: Aug. 2004 – May 2005
Conclusion
• Moving from lab and trials to large-scale deployments
• Planning horizon includes turning off circuit-switched phones
  – in large enterprises
  – in some carriers
• From emphasis on features to global scale:
  – interoperation
  – configuration
  – peer-to-peer systems
  – emergency services
  – overload behavior
  – failure detection across networks and protocol layers
• Integration of advanced features (IM, presence, video, programmable services) still lacking
• Current standardization processes slow and complexity-inducing