The Elusive QoS
XiPeng Xiao, Ph.D. Director of Product Management
Riverstone Networks (now part of Lucent & Alcatel)
[email protected]
Copyright © XiPeng Xiao 2006. All rights reserved.
Agenda
• The status quo of QoS
• Challenges of the existing QoS model
• A new QoS proposal
• Conclusions & topics for further research
The Current QoS Model
• Carriers offer multiple differentiated traffic classes to users and charge QoS fees for the premium classes
• Users pick the appropriate traffic class for their applications
Users are Diffserv-aware
The QoS Reality
• SLA is in place
  • Bandwidth, round-trip delay, packet loss rate and possibly jitter are covered
• SLA is loose
  • But sufficient for most applications
• Quality is unpredictable
  • Perceived as inadequate for paid services
Unclear Economic Model
• How to sell QoS?
  • To sell QoS, you must first admit that the existing service is not always good, which does not work well in a competitive environment
• Who should pay for QoS, senders or receivers?
  • Nobody wants to pay extra
  • Traffic must be prioritized at the source, so it is more natural to let senders pay. But how much should a multicast sender be charged when receiver statistics (e.g. number of receivers) may not be available?
  • It should not be the receivers, because senders may not be able to provide QoS
Uncertain Government Regulation
“A network operator shall not block, degrade, alter, modify, impair, or change any bits, content, application or service transmitted over the network of such operator”
-- Internet Non-Discrimination Act of 2006, Ron Wyden, U.S. Senator
Regulatory uncertainty will hamper QoS deployment under the existing model
Lack of Interoperability
• Lack of interoperability among carriers
  • Different carriers may offer different numbers of traffic classes
  • Different carriers may interpret the same DSCP differently
  • There is no standardized mapping between different carriers’ traffic classes
• Lack of interoperability among vendors
  • Different vendors offer different QoS capabilities
  • Different systems from the same vendor offer different QoS capabilities
  • Different vendors advocate different approaches to the same QoS goal
Lack of interoperability tends to blur differentiation between traffic classes
Other Challenges
• QoS architects are in short supply
  • Few people understand QoS theory, vendor implementation, and network operations
• Having too many traffic classes causes lack of differentiation
  • How do you differentiate 8 traffic classes in a user-perceivable way?
• Too many pieces need to work together
  • How do Diffserv, BGP, IGP, MPLS, many flavors of TE, and Traffic Management (policing, shaping, hierarchical shaping, SPQ, WFQ, hierarchical scheduling, RED/WRED) fit into the QoS big picture?
• Reliability is often overlooked as a QoS factor
  • Without failures, most traffic controls are not even needed
The Reliability Factor: QoS Impact of an ASBR Failure
• Impact on that particular carrier
  • eBGP and iBGP must re-converge
    • This can take many minutes, may cause a surge in CPU/memory utilization, and may trigger other problems
  • Traffic heading to that AS from all edge routers has to be rerouted to another ASBR (#2)
    • This will cause a major traffic shift inside the carrier
  • Major congestion may occur at the peering/transit link at ASBR #2
    • This may create major QoS impact (packet loss or long delay)
    • This may violate the contract with the neighbor carrier
  • The internal traffic shift will cause large jitter and out-of-order delivery. It can also cause internal congestion
  • If internal congestion occurs, TE will cause further traffic rerouting and associated QoS impact
  • When TE and other controls are triggered, more software issues can be exposed, which may cause further failures
• The neighbor carrier will have a sudden traffic increase/shift. The impact can ripple from carrier to carrier, creating QoS impact throughout the Internet
• Users will experience inconsistent service quality during the flux
• When ASBR #1 is restored, there can be another major traffic flux and the associated QoS impact
Failures cause most of the unpredictability in QoS
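The congestion risk at ASBR #2 can be illustrated with simple arithmetic. The sketch below is not from the original slides: the function names and the traffic figures are hypothetical, and it assumes all traffic from the failed ASBR lands on a single surviving peering link.

```python
def utilization_after_failover(failed_load_gbps, backup_load_gbps, backup_capacity_gbps):
    """Utilization of the surviving peering link once it absorbs the
    traffic that used to leave through the failed ASBR (assumption:
    everything shifts to one backup link)."""
    return (failed_load_gbps + backup_load_gbps) / backup_capacity_gbps


def is_congested(utilization, threshold=1.0):
    """A link is treated as congested when offered load exceeds capacity.
    The threshold of 1.0 is an illustrative choice."""
    return utilization > threshold


# Hypothetical example: two 10 Gbps peering links, each carrying 6 Gbps.
after = utilization_after_failover(6.0, 6.0, 10.0)
print(f"ASBR #2 utilization after failover: {after:.0%}, congested: {is_congested(after)}")
```

With these made-up numbers, two links that each looked comfortable at 60% end up oversubscribed once one ASBR fails, which is the situation the slide describes.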
IP Network’s Reliability Gap
[Figure: gross margin (10% to 70%) versus network availability (99.5, 99.9, 99.95, 99.99, 99.999), with IP and Voice marked]
Source: JPMorgan McKinsey Backbone report Sept. 8, 2000; Avici Systems
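To put the availability figures on the chart's axis in perspective, they can be converted into expected downtime per year. This is a small worked calculation, not part of the original slides; it assumes a 365-day year and reads the axis values as percentages.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Expected unavailable minutes per year at a given availability level."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# The availability levels shown on the figure's horizontal axis.
for level in (99.5, 99.9, 99.95, 99.99, 99.999):
    print(f"{level}% -> {downtime_minutes_per_year(level):.1f} minutes/year")
```

Under these assumptions, 99.5% availability corresponds to roughly 44 hours of downtime per year, while 99.999% corresponds to a little over 5 minutes.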
The Software Reliability Factor
[Figure: Causes of Unplanned Downtime in Carrier Networks]
Software is by far the largest reliability factor
Lessons
• A practical QoS solution must address economic and regulatory issues as well as technical issues
• Achieving QoS is much more than applying traffic management
• Software reliability is a key factor for QoS
Overview of a New QoS Proposal
• The economic model
• The economic and regulatory benefits
• The technical approach
• The technical benefits
The New Economic Model
• What: Don’t sell traffic classes, sell applications instead
• How: Set tiered bandwidth offerings, and enable different applications with different amounts of bandwidth
  • For residential users
    • Internet access: 1 Mbps, €20
    • Add video: 20 Mbps, €50
    • Add VoIP and associated equipment (battery-equipped home gateway): 25 Mbps, €60
  • For business
    • Entry service: 2 Mbps, €200
    • Gold service: 10 Mbps, €500
    • Platinum service: 100 Mbps, €1000
Users are Diffserv-unaware
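One way to picture "sell applications, not traffic classes" is a lookup from the applications a user wants to the cheapest bandwidth tier that enables them. The sketch below uses the residential figures from the slide; the `Tier` type, the `cheapest_tier` helper, and the reading of each "Add ..." line as a cumulative bundle are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    apps: frozenset  # applications enabled by this tier (assumed cumulative)
    mbps: int
    eur: int


# Residential tiers taken from the slide; bundle contents are an interpretation.
RESIDENTIAL = [
    Tier("Internet access", frozenset({"internet"}), 1, 20),
    Tier("Internet + video", frozenset({"internet", "video"}), 20, 50),
    Tier("Internet + video + VoIP", frozenset({"internet", "video", "voip"}), 25, 60),
]


def cheapest_tier(wanted_apps, tiers=RESIDENTIAL):
    """Return the lowest-priced tier that enables every wanted application."""
    candidates = [t for t in tiers if set(wanted_apps) <= t.apps]
    if not candidates:
        raise ValueError(f"no tier covers {sorted(wanted_apps)}")
    return min(candidates, key=lambda t: t.eur)


choice = cheapest_tier({"internet", "video"})
print(choice.name, choice.mbps, "Mbps,", choice.eur, "EUR")
```

The user never chooses a Diffserv class here; the class assignment stays inside the carrier's network.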
The Economic and Regulatory Benefits
• Clear selling point
• Less regulation uncertainty
Clear Selling Point
• How to sell QoS?
  • You don’t sell QoS, you sell services
  • Whoever wants the services pays for the bandwidth associated with the services
• Who should pay for QoS, senders or receivers?
  • Both senders and receivers pay for their bandwidth consumption, as they do today
Less Government Regulation Uncertainty
• The charging model is essentially identical to today’s model, which has been accepted by both the industry and the government
Overview of the Technical Approach
• Monitor the network, plan the network
• Classify traffic at the edge
• Put Diffserv in standby mode under normal conditions
• Use macro controls to prevent congestion, use micro controls to manage congestion
• Increase software reliability to reduce QoS unpredictability
• Take advantage of CDN and application intelligence
Monitor and Plan the Network
Preventing disease is more effective than curing disease
• Provision sufficient capacity, prevent network congestion
• Monitor link utilization, plan capacity expansion
• Monitor network delay, trigger early alarm
• Audit system configuration, audit system security
• Plan ahead for the most catastrophic events
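The "monitor link utilization, plan capacity expansion" step can be sketched as a periodic review of utilization samples. The thresholds (50% to start planning, 80% to raise an alarm), the link names, and the `review_links` helper are all hypothetical; the slides do not give concrete values.

```python
def review_links(samples, plan_threshold=0.5, alarm_threshold=0.8):
    """Classify each link by its peak observed utilization.

    samples maps a link name to a list of utilization fractions (0.0 to 1.0).
    Thresholds are illustrative assumptions, not values from the slides.
    """
    report = {}
    for link, values in samples.items():
        peak = max(values)
        if peak >= alarm_threshold:
            report[link] = "alarm"
        elif peak >= plan_threshold:
            report[link] = "plan-expansion"
        else:
            report[link] = "ok"
    return report


# Hypothetical measurements for three backbone links.
print(review_links({
    "nyc-chi": [0.31, 0.42, 0.38],
    "chi-den": [0.48, 0.63, 0.55],
    "den-sfo": [0.70, 0.85, 0.79],
}))
```

Keeping the planning threshold well below the alarm threshold reflects the slide's point that prevention is cheaper than cure.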
Classify Traffic at the Edge
• Packets are classified and marked at the edge
  • Voice
  • Video
  • Business Data
  • Best Effort
• Traffic management mechanisms can be applied if access resources are limited
  • Access resources may not be easily added, e.g. radio spectrum
  • Policing, shaping, and class-based queueing can be used at the access point
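A minimal sketch of edge classification and marking into the four classes named on this slide follows. The slides do not specify DSCP values or match rules, so everything concrete here is an assumption: the code uses the commonly deployed code points EF (46), AF41 (34), AF21 (18), and default (0), and the port range, RTSP port, and business subnet are hypothetical match criteria.

```python
import ipaddress

# Assumed class-to-DSCP mapping (EF, AF41, AF21, default); not from the slides.
DSCP = {"voice": 46, "video": 34, "business": 18, "best-effort": 0}

# Hypothetical subnet whose traffic is treated as business data.
BUSINESS_NET = ipaddress.ip_network("10.1.0.0/16")


def classify(packet):
    """Pick a traffic class from simple header fields (illustrative rules only)."""
    if packet["proto"] == "udp" and 16384 <= packet["dst_port"] <= 32767:
        return "voice"          # assumed RTP port range for VoIP
    if packet["dst_port"] == 554:
        return "video"          # RTSP control port, used here as a stand-in
    if ipaddress.ip_address(packet["src_ip"]) in BUSINESS_NET:
        return "business"
    return "best-effort"


def mark(packet):
    """Return a copy of the packet annotated with its class and DSCP value."""
    cls = classify(packet)
    return {**packet, "class": cls, "dscp": DSCP[cls]}


print(mark({"proto": "udp", "src_ip": "192.0.2.7", "dst_port": 20000}))
print(mark({"proto": "tcp", "src_ip": "10.1.4.9", "dst_port": 443}))
```

Because marking happens once at the edge, core devices only need to look at the DSCP field if and when class-based treatment is activated.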
Optimize the Normal Case
• Under normal network conditions, do not activate Diffserv
  • No class-based queueing or Diffserv-aware TE
  • Simplicity leads to higher network availability, preventing most QoS problems from happening
Optimizing the common case is better than optimizing the rare case
Prevent Congestion, Manage Congestion
When failure causes congestion …
• Use inter-domain routing or TE to reduce/eliminate congestion
  • Inter-domain routing policy adjustment can change traffic distribution more dramatically than intra-domain TE
  • These are network-level controls (macro controls)
• If congestion cannot be eliminated, use class-based queueing and WRED to manage congestion
  • Device configurations should be prepared ahead of time so that they can be activated immediately when congestion appears
  • These are device-level controls (micro controls)
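The escalation order on this slide (macro controls first, micro controls only if congestion persists) can be written down as a small decision function. This is a sketch under stated assumptions: the function name, the utilization inputs, and the threshold of 1.0 are hypothetical, and the returned strings merely label the actions named on the slide.

```python
def choose_controls(utilization, utilization_after_reroute, threshold=1.0):
    """Return the ordered list of actions for a link, given its utilization
    now and its projected utilization after routing/TE adjustments."""
    actions = []
    if utilization <= threshold:
        return actions  # normal case: Diffserv stays in standby
    # Macro (network-level) controls try to move traffic away from the link.
    actions.append("macro: adjust inter-domain routing policy or TE")
    if utilization_after_reroute > threshold:
        # Micro (device-level) controls manage whatever congestion remains.
        actions.append("micro: activate pre-staged class-based queueing and WRED")
    return actions


print(choose_controls(0.6, 0.6))   # no congestion
print(choose_controls(1.3, 0.9))   # rerouting is enough
print(choose_controls(1.3, 1.1))   # rerouting helps, queueing still needed
```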
Increase Software Reliability with a Modular Architecture
[Figure: modular software versus monolithic software. Both contain L2, L3, LAG, STP, routing protocol, MPLS, VLANs, policies, CLI, and SNMP components; the modular diagram also shows a kernel and a watchdog.]
• Modular software: automatic process restart, modular upgrades (e.g. STP 2.0 alone), fault containment
• Monolithic software: complete system reboot, system-wide upgrades (e.g. SYSTEM V2.0), no fault containment
Increase Software Reliability
• A modular software architecture can simplify software design and testing, isolate software faults, and automatically repair software faults. It increases MTBF and reduces MTTR. It is the most important technical factor in determining the QoS of a network
To err is human. To recover gracefully is QoS
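The link between MTBF, MTTR, and availability can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The figures below are hypothetical and only illustrate how much a shorter repair time (for example an automatic process restart instead of a full reboot) can matter.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures
    and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers: same failure rate, different recovery times.
full_reboot = availability(mtbf_hours=1000, mttr_hours=0.25)       # 15-minute reboot
process_restart = availability(mtbf_hours=1000, mttr_hours=0.005)  # 18-second restart

print(f"full reboot:     {full_reboot:.5%}")
print(f"process restart: {process_restart:.5%}")
```

With the failure rate held constant, reducing MTTR alone moves this made-up system from below four nines to above five nines.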
Take Advantage of CDN and Application Intelligence
• Take advantage of CDN to change the rules of the game
  • Move content closer to the users
  • Reduce the role of the network in QoS
• Take advantage of application intelligence
  • A certain amount of buffering significantly reduces the impact of jitter
  • Adaptive codecs significantly reduce the impact of throughput variance
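The effect of receiver-side buffering on jitter can be shown with a fixed playout delay: a packet is usable if it arrives no later than its send time plus the playout delay. The sketch below is an illustration with made-up timestamps, not a description of any particular codec or player.

```python
def late_packets(send_times_ms, arrival_times_ms, playout_delay_ms):
    """Count packets whose network delay exceeds the playout delay,
    i.e. packets that arrive too late to be played."""
    return sum(
        1
        for sent, arrived in zip(send_times_ms, arrival_times_ms)
        if arrived - sent > playout_delay_ms
    )


# Hypothetical stream: one packet every 20 ms, with delays of 30, 55, 35, 80 ms.
sent = [0, 20, 40, 60]
arrived = [30, 75, 75, 140]

print(late_packets(sent, arrived, playout_delay_ms=40))  # small buffer
print(late_packets(sent, arrived, playout_delay_ms=80))  # larger buffer
```

A larger buffer trades extra end-to-end delay for fewer late packets, which is why it suits streaming video better than interactive voice.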
The Technical Benefits
• Simplified network operation
• Simplified network equipment
• Simplified QoS interoperability
Simplified Network Operations
• No need to make different traffic classes perceivably different
  • All the carriers need to do is minimize delay
• Fewer controls, less configuration
  • Fewer human and software errors
Simplified Network Equipment
• Simple TE is sufficient
  • Sophisticated TE (e.g. Diffserv-aware TE) is not needed
Simplified QoS Interoperability
• Because there is no need to create user-perceivable differentiation among traffic classes, the need for QoS interoperability is much reduced. Carriers can just focus on reducing delay, jitter and packet loss for all traffic without worrying about QoS interoperability.
A Preliminary Effectiveness Study
• Practiced in Global Crossing’s network
• The theme is using network planning and TE to prevent congestion
• Round-trip delay < 80 ms US coast-to-coast
The Effectiveness of TE
Note the difference in link utilization denoted by color
Round-Trip Delay Matrix
Meet QoS Requirements of Applications
ITU G.114 Delay Recommendations
One-way Delay    Characterization of Quality
0 – 150 ms       “acceptable for most user applications”
150 – 400 ms     “may impact some applications”
Above 400 ms     “unacceptable for general network planning purposes”
Real-time (RT) applications can be easily supported, as long as there is no failure
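The G.114 bands quoted on this slide can be applied directly to a measured delay. In the sketch below, the function name and the handling of the exact boundary values (150 ms and 400 ms are placed in the lower band) are assumptions, and the one-way delay is estimated as half the round-trip delay, which presumes symmetric paths.

```python
def g114_quality(one_way_delay_ms):
    """Map a one-way delay to the ITU G.114 characterization quoted on the slide."""
    if one_way_delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if one_way_delay_ms <= 150:
        return "acceptable for most user applications"
    if one_way_delay_ms <= 400:
        return "may impact some applications"
    return "unacceptable for general network planning purposes"


# An 80 ms round trip is about 40 ms one way if the path is symmetric.
round_trip_ms = 80
print(g114_quality(round_trip_ms / 2))
```

Under that symmetry assumption, a coast-to-coast round trip below 80 ms leaves a large margin inside the first band.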
Simplicity vs. Control
• Cost of capacity goes down fast with Moore’s law
• Cost of control does not go down as fast because it has a large human factor (e.g. the intelligence of network operators and software developers)
• Over the long run, relying on capacity is more cost-effective than relying on control
Conclusions
• Diffserv should become transparent to users
• Diffserv need not be used under normal network conditions
• Reliability should be emphasized over traffic control
Topics for Further Research
• Quantitative evaluation of overlay networks
  • An overlay network can provide some benefits. For example, an MPLS layer can offer TE and FRR. However, it also introduces additional control overhead and may increase network convergence time. Under what circumstances is an overlay network justified?
• Safe max link utilization
  • Given any network topology, how can the max link utilization be determined analytically so that any single failure will not cause congestion? Any double failure?
• Transient congestion prevention
  • In a large network with 10,000s of users, how much idle capacity is needed to avoid transient congestion?
• Scientific way to set WRED parameters
  • Drop curve’s starting point and slope for each class
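For the WRED question, it helps to see what the "starting point and slope" of a drop curve control. The sketch below implements the usual RED-style curve: no drops below a minimum threshold, a linear ramp up to a maximum drop probability at the maximum threshold, and forced drops beyond it. The per-class profiles are hypothetical values chosen only to show two different curves; how to choose them well is exactly the open question raised on the slide.

```python
def wred_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability for a given average queue depth.

    min_th is the curve's starting point; the slope of the linear
    segment is max_p / (max_th - min_th).
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: (min_th, max_th, max_p), in packets.
PROFILES = {
    "best-effort": (20, 40, 0.10),  # starts dropping early
    "business": (30, 40, 0.05),     # starts later, drops less
}

for name, (min_th, max_th, max_p) in PROFILES.items():
    p = wred_drop_probability(35, min_th, max_th, max_p)
    print(f"{name}: drop probability at average depth 35 = {p:.3f}")
```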
Additional Reading
• The Rise of the Stupid Network
  • David Isenberg, http://isen.com/
• TCP Processing of the IPv4 Precedence Field
  • XiPeng Xiao, ftp://ftp.rfc-editor.org/in-notes/rfc2873.txt
IEEE JSAC Special Issue
• “Traffic Engineering for Multi-layer Networks”
• New deadline: April 15, 2006
• Contact: [email protected]
Q & A
Thank you!