CASE STUDYWHITEPAPER
Figure 1
SANDVINE.COM
Reveal network quality with ScoreCard
Scorecards such as Speedtest.net, Google's Video Quality Report, Netflix's Internet Service Provider (ISP) Speed Index and the newly launched fast.com broadband speed-testing site are popular with consumers and the media, partly because of their controversial nature. However, they offer very little beyond measuring bandwidth, which is an incomplete measure of quality of experience (QoE). Broadband network engineers need data they can use to isolate and address network problems and shortcomings, or to offer next-gen services to enterprise clients. To gain a better perspective on QoE, service providers need to understand how much data is traversing their networks and where the traffic is heaviest.
Application, device type, location, time of day and service plan can all affect quality of experience. For over-the-top (OTT) applications, different services (Skype, YouTube, Netflix, etc.) use different streaming types with differing resolution levels, and may originate from Content Delivery Networks (CDNs) with different levels of robustness or congestion. Add WiFi and the wide variety of "Bring Your Own Device" (BYOD) devices, operating systems and screen sizes, and the measurement scenario becomes even more complex. The figure below is a Netflix ISP Speed Index, which addresses only the average streaming speed for the networks listed (the figures are real, but the ISP names have been changed).
Netflix ISP Speed Index (Speed ≠ Quality)
Service providers need a more granular understanding of these metrics for their own networks. Approximately 10% of QoE concerns are caused by content, service availability, device or OSS issues, which network engineers can do little about. This paper explores how network engineers can use their own QoE scorecard to isolate the location and cause of problems, for better network planning and management and to improve current and next-generation business-class services. It also explores how service providers can fight back against scorecards published by Google and Netflix.
BANDWIDTH METRICS AS A QOE PROXY
Service providers are battling for consumer mindshare on broadband services. On one side are consumers and telecommunications regulatory bodies, who are demanding better quality, greater value and service transparency from broadband operators; on the other side are service providers' competitors, ranging from cable operators, DSL and fiber-to-the-x (FTTx) operators (including municipal fiber networks) all the way to fixed-mobile substitution offerings from mobile operators. To succeed in this environment, service provider network engineers need a better view of the actual service being delivered to their customers.
Most service providers measure network quality by the speed of a user's connection, often using the peak and average bandwidth delivered from their broadband aggregation or mobile anchoring point. However, this is not a good measure of quality: it neither accurately measures the subscriber experience at the application level nor gives the operator a view of whether a subscriber has had a bad experience at any point in the billing period.
Not all applications are bandwidth sensitive, meaning subscribers can have a great experience even at low bandwidth rates. However, these applications may require low latency and/or low or no packet loss. For example, VoIP applications do not need much bandwidth, but low latency and a fairly robust connection ensure smooth conversations. Similarly, web browsing does not usually require high bandwidth, but a poor connection and high latency can cause consumers to abandon their browsing sessions if pages take longer than 5-7 seconds to load. Gaming, particularly first-person shooters and massively multiplayer online role-playing games (MMORPGs), requires low latency in the network to ensure that the gaming experience is "fair" and seamless. Even social network sharing applications such as Instagram, Facebook or YouTube can deliver a bad experience if the network is poor and uploads take longer than normal as a result of packet loss.
Average bandwidth is smoothed over an interval. Throughput at non-peak hours may be tens of Mbps, yet fall below 1 Mbps during peak usage if congestion is not managed and a high-profile online event is taking place. Good examples are the Super Bowl or the World Cup, when many viewers stream the same content simultaneously. In this scenario, the average bandwidth on the network may appear satisfactory, but users on a congested link during peak hours will not be happy. Peak hours are where a bad experience has the greatest opportunity to cause churn, especially among heavy video streamers. Averaging also means that a subscriber may get one (1) second of 20 Mbps followed by 60 seconds of 100 kbps. While this may look acceptable statistically, the user experience will be extremely poor.
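To make the arithmetic concrete, the short Python sketch below computes the peak, the one-minute average, and the share of poor seconds for that hypothetical 20 Mbps / 100 kbps minute. The 1 Mbps "poor" threshold is an illustrative assumption, not a figure from this paper. A peak-based reading reports 20 Mbps and the minute-average is roughly 0.43 Mbps, yet about 98% of the seconds fall below 1 Mbps.

```python
# Hypothetical minute from the example above: 1 s at 20 Mbps, then 60 s at 100 kbps (0.1 Mbps).
per_second_mbps = [20.0] + [0.1] * 60

POOR_THRESHOLD_MBPS = 1.0  # illustrative assumption, not from the paper

def average_mbps(samples):
    """Arithmetic mean of per-second throughput samples."""
    return sum(samples) / len(samples)

def share_below(samples, threshold_mbps):
    """Fraction of one-second samples below the threshold."""
    return sum(1 for s in samples if s < threshold_mbps) / len(samples)

peak = max(per_second_mbps)
avg = average_mbps(per_second_mbps)
poor = share_below(per_second_mbps, POOR_THRESHOLD_MBPS)

print(f"peak {peak:.1f} Mbps, average {avg:.2f} Mbps, {poor:.0%} of seconds below 1 Mbps")
```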
Lastly, average bandwidth reporting is also heavily influenced by what devices are accessing
the network, as mobile or Internet of Things (IoT) devices may consume smaller amounts of
bandwidth. This is one of the biggest weaknesses in Netflix’s ISP Scorecard. An HD stream
from a gaming console or smart TV will consume a lot of bandwidth, while the same video
on an iPhone will use much less traffic – even though both cases may result in a good quality
of experience. Conversely, if an HDTV is getting only the bandwidth needed by an iPhone for video streaming, the user will most likely be unsatisfied with the quality of the video.
Figure: Different applications have different network requirements in order to provide good quality of experience (downloading, gaming, social media, streaming video, uploading, voice applications, web surfing)
DEFINING SUBSCRIBER EXPERIENCE METRICS
The Federal Communications Commission (FCC) and the European Union (EU) have selected several metrics in their quest to ensure that consumers receive a good quality of experience from their broadband operators. An excerpt from the FCC filing [Emphasis added in bold]:
“With respect to network performance, we adopt the following enhancements:
• The existing transparency rule requires disclosure of actual network performance. In adopting that requirement, the Commission mentioned speed and latency as two key measures. Today we include packet loss as a necessary part of the network performance disclosure.
• We expect that disclosures to consumers of actual network performance data should be reasonably related to the performance the consumer would likely experience in the geographic area in which the consumer is purchasing service.
• We also expect that network performance will be measured in terms of average performance over a reasonable period of time and during times of peak usage.
• We clarify that, for mobile broadband providers, the obligation in the existing transparency rule to disclose network performance information for “each broadband service” refers to separate disclosures for services with each technology (e.g., 3G and 4G). Furthermore, with the exception of small providers, mobile broadband providers today can be expected to have access to reliable actual data on performance of their networks representative of the geographic area in which the consumer is purchasing service - through their own or third-party testing - that would be the source of the disclosure.[410] Commission staff also continue to refine the mobile MBA program, which could at the appropriate time be declared a safe harbor for mobile broadband providers.”
The excerpt above, from pages 73-74 of FCC-15-24A1*, the FCC's official document on network neutrality, is one of the most critical parts of the announcement for subscribers. Setting aside the financial aspects of the bright-line rules that protect the "free" Internet, this section highlights a growing problem for consumers with broadband services: speed is not always a good indicator of service.
The following sections explore each of the metrics mentioned above and why each one is important to measure.
* https://apps.fcc.gov/edocs_public/attachmatch/FCC-15-24A1.docx
Figure 3
Graph of throughput over time,
showing granularity of varying
sample rates
BANDWIDTH
Bandwidth is an important measure of a subscriber's broadband experience, and most broadband plans are priced and marketed based on their peak bandwidth capability. Many applications, video streaming in particular, are bandwidth-hungry and rely on high bandwidth to perform well.
These applications are often the most high profile for a subscriber and are frequently the reason
that a consumer purchases a higher bandwidth plan. A perceived failure to deliver the advertised
speeds will create customer dissatisfaction while a systemic failure to deliver high throughput will
put the customer at risk of churn. The most important bandwidth measurement is a subscriber's throughput during peak usage and times of congestion, as it is the measurement most relevant to delivering quality while network resources are constrained.
Bandwidth is also a two-way measurement; some applications, like video streaming, need good download speeds, while others, like social network sharing or cloud backup services, require good upload speeds. Although download speed is often the main metric measured by speed test applications, upload speeds can be important for certain applications.
One of the keys in measuring bandwidth is selecting the right interval. Five-minute and fifteen-minute samples are averages and are not very effective at detecting individual user-level QoE, but may be useful for macro-level analysis (node, link or network level). A single minute with 60 one-second measurements tells a very different story, with second-level peaks and valleys emerging at the individual user level. Sampling at sub-second intervals is a good balance between averaging bursts and capturing the realized throughput of even very short connections (like web browsing).
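The effect of the sampling interval can be reproduced with a minimal Python sketch, assuming the probe exports per-packet (timestamp, size) records; the trace below is synthetic and the function name is hypothetical. The same 0.2-second burst reads as 4.8 Mbps with 250 ms samples, 1.2 Mbps with one-second samples, and only 4 kbps in a five-minute average.

```python
from collections import defaultdict

def throughput_samples(packets, window_s):
    """Bin (timestamp_s, size_bytes) records into fixed windows of window_s seconds.

    Returns {window_index: Mbps} for every window that carried traffic. Throughput is
    the window's bits divided by the full window length, so idle time dilutes it.
    """
    bits = defaultdict(int)
    for ts, size_bytes in packets:
        bits[int(ts // window_s)] += size_bytes * 8
    return {idx: b / window_s / 1e6 for idx, b in bits.items()}

# Hypothetical trace: a 0.2 s web-page burst of 100 x 1500-byte packets starting at t = 10 s,
# inside an otherwise idle five-minute period.
trace = [(10.0 + i * 0.002, 1500) for i in range(100)]

for window_s in (0.25, 1.0, 300.0):
    peak = max(throughput_samples(trace, window_s).values())
    print(f"{window_s:>6} s samples: peak {peak * 1000:.0f} kbps")
```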
Key in measuring bandwidth is selecting the right interval. (Chart: throughput in Mbps over time in seconds / 250 ms, spanning 5 minutes. The 250 ms sample rate shows 2.5 Mbps, the 1 sec sample rate shows 1.5 Mbps, and the 5 minute sample rate shows 6 kbps.)
LATENCY
Latency is an important metric for interactive applications. Few consumer plans offer latency as a service level agreement (SLA), but it is sometimes included as part of a managed service offering for business connections. Anyone who has ever played an online first-person shooter can tell you that latency "literally" kills! On voice calls, high latency shows up as the two speakers talking over each other, creating a very frustrating conversation. Excessive buffering, congestion, or the simple physics of transmission across long distances (e.g. overseas or satellite connections) can all be sources of network latency. Some latency, such as transmission latency, cannot always be fixed in the network.
“Radio access networks have built-in retransmission capabilities. If packets are lost in the air, the base station controller will simply retransmit them to try to secure their delivery. While the other nodes in the IP path of the session are not notified about this process, the TCP transaction is not acknowledged until packet delivery is confirmed. This means that the overall delay of the TCP session increases. This can be detected by systems monitoring the individual sessions, and because the destination can be identified with a specific cell ID, conclusions can be drawn about the quality delivered by the radio access network.”
PACKET LOSS
Packet loss can also cause customer dissatisfaction with a broadband connection. It can result in increased buffering and stalls in video streaming, slow web page load times and jittery voice applications, reducing the goodput (the useful application data delivered, as opposed to raw throughput) on the network. Packet loss wastes bandwidth as packets are retransmitted and, depending on the application type, can create havoc with the subscriber experience. For a web shopping session in a browser, packet loss of just 1% can double page-load times, significantly impacting QoE.
Packets can be dropped intelligently with the correct queuing technique (active queue management, for example), so that traffic is not dropped randomly across all connections but managed so that applications back off, slowing their rate of transmission to reduce congestion. Packet loss percentage is calculated by dividing the number of packets lost by the number of packets seen, multiplied by 100.
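A minimal Python sketch of the loss calculation, together with goodput (throughput minus retransmitted data). The counter values are hypothetical, and how a passive probe detects lost packets (for example from TCP retransmissions) is an assumption rather than something this paper specifies.

```python
def loss_percentage(packets_lost: int, packets_seen: int) -> float:
    """Packet loss as a percentage: packets lost divided by packets seen, times 100."""
    if packets_seen == 0:
        return 0.0
    return 100.0 * packets_lost / packets_seen

def goodput_mbps(bytes_seen: int, bytes_retransmitted: int, interval_s: float) -> float:
    """Goodput: bytes seen minus retransmitted bytes, expressed in Mbps over the interval."""
    return (bytes_seen - bytes_retransmitted) * 8 / interval_s / 1e6

# Hypothetical counters for one subscriber over a 10-second interval.
print(f"loss {loss_percentage(150, 10_000):.1f}%")                     # loss 1.5%
print(f"goodput {goodput_mbps(12_500_000, 187_500, 10.0):.2f} Mbps")  # goodput 9.85 Mbps
```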
The question is how to gain visibility into these metrics for subscribers. One method used by some operators and broadband regulators is to measure quality using active probes distributed around the network. These probes periodically generate traffic to measure network quality, often with more than just bandwidth measurements, but they do so only at specific instants in time. An unintended consequence is that the act of measuring may degrade the quality experienced by other subscribers. Probes also do not capture the quality experienced by individual subscribers, and cannot determine whether anything other than point-to-point issues is the source of the quality degradation. As the FCC excerpt above shows, this method of measurement does not meet the FCC's expectation of measurement over "a reasonable period of time and during times of peak usage". Nor does it necessarily include packet loss as a measurement.
What is needed is to measure the actual performance delivered by the network at sub-second
intervals for each subscriber that is active on the network. The measurements should include
the download and upload performance, latency, and loss for all traffic from each subscriber.
Then the actual experience delivered to the subscriber can be calculated at any time during
their billing period, and each location in the network can be scored based on the performance
of the network.
These measurements can be made using passive monitoring of the network traffic.
Furthermore, to gather statistically relevant information, these measurements need to take
place over longer periods of time (weeks, not days) so that the peak hours can be accurately
mapped out (as described above). The metrics are only relevant for the locations and
subscribers that are measured, so if the operator is looking for a network-wide or individual
subscriber measurement, all locations in the network must be covered.
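Mapping peak hours from several weeks of data can be as simple as averaging traffic volume per hour of day. The Python sketch below assumes hourly volume records are already available; the data is synthetic and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def busiest_hours(hourly_volume, top_n=3):
    """hourly_volume: iterable of (day_index, hour_of_day, volume).

    Averages each hour of the day across all days and returns the top_n hours by mean volume.
    """
    by_hour = defaultdict(list)
    for _day, hour, volume in hourly_volume:
        by_hour[hour].append(volume)
    hourly_mean = {hour: mean(vols) for hour, vols in by_hour.items()}
    return sorted(hourly_mean, key=hourly_mean.get, reverse=True)[:top_n]

# Hypothetical two weeks of hourly volumes (arbitrary units) with an evening peak.
evening_bump = {19: 200, 20: 300, 21: 400, 22: 250}
data = [(day, hour, 100 + evening_bump.get(hour, 0) + day)
        for day in range(14) for hour in range(24)]

print(busiest_hours(data))  # [21, 20, 22]
```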
Figure 4
Score effects of latency, packet loss and poor throughput on the performance of application categories (Measuring the subscriber experience)
DEPLOYMENT SCENARIOS
From an end-to-end perspective, the measurements should take place as close to the subscriber as possible, but they can be taken anywhere in the network. Measuring at the OTT application's location provides that one application's end-to-end view, but does not represent the scoring of the service provider's network in question. It also leaves out relevant metadata, so the results cannot be made actionable. At the same time, measuring at the subscriber's location would require many measurement points, which would be costly and difficult to manage. The ideal location is close to the interconnect point between the access network and the peering partners, at a location where all traffic can be captured.
MOBILE BROADBAND NETWORKS
A deployment in a mobile network will be behind the GGSN or PGW, where the majority of traffic can be captured after it has crossed the longest stretch of the operator's own network. Measurements in a mobile core network need to be enriched with information about their source location to become truly valuable. The cell ID can be taken from several sources, for example the RADIUS update feed. When measurements are enriched with location information, it is possible to group them and understand the impact each cell has on the quality of the sessions it carries.
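Enrichment is essentially a time-aware join between measurement records and a location feed. The Python sketch below assumes location updates arrive as (timestamp, subscriber IP, cell ID) tuples, for example derived from the RADIUS update feed mentioned above; the record layout, IP addresses and cell names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class Measurement:
    subscriber_ip: str
    timestamp: float
    latency_ms: float

def enrich_with_cell(measurements, location_updates):
    """Tag each measurement with the latest cell ID reported for its subscriber at or before
    the measurement time. location_updates holds (timestamp, subscriber_ip, cell_id) tuples.
    Linear scan for clarity; a production system would keep a per-subscriber index."""
    updates = sorted(location_updates)
    enriched = []
    for m in measurements:
        cell = None
        for ts, ip, cell_id in updates:
            if ts > m.timestamp:
                break
            if ip == m.subscriber_ip:
                cell = cell_id
        enriched.append((cell, m))
    return enriched

def mean_latency_per_cell(enriched):
    """Group enriched measurements by cell and average their latency."""
    per_cell = defaultdict(list)
    for cell, m in enriched:
        per_cell[cell].append(m.latency_ms)
    return {cell: mean(vals) for cell, vals in per_cell.items()}

updates = [(0.0, "10.0.0.1", "cell-A"), (0.0, "10.0.0.2", "cell-B"), (50.0, "10.0.0.1", "cell-B")]
samples = [Measurement("10.0.0.1", 10.0, 40.0),
           Measurement("10.0.0.1", 60.0, 120.0),  # subscriber has moved to cell-B
           Measurement("10.0.0.2", 20.0, 110.0)]
print(mean_latency_per_cell(enrich_with_cell(samples, updates)))
```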
CABLE BROADBAND NETWORKS
Cable operators can choose strategic aggregation points in the network to place a capturing device, or alternatively deploy very close to the CMTSes to capture all traffic. A vCPE can be an attractive solution here, running the packet-capturing software virtualized on COTS hardware hosted together with other software.
In cable networks, too, adding the location of the traffic to the measurements is important in order to locate faulty CMTS equipment or geographical areas in the network's topology that are causing poor QoE. CMTS awareness in the measurement probe is critical to adding this information.
Figure 5
ScoreCard deployment in a mobile
network
Figure 6
ScoreCard deployment
in a cable network
Figure 7
ScoreCard deployment
in a fixed fiber or DSL
network
FIXED FIBER OR DSL NETWORKS
For DSL and FTTH networks, data collection should be done towards the interconnect point of the network, to capture as much of the network path on the service provider's infrastructure as possible. To capture more on-net traffic, especially traffic between subscribers, a vCPE structure of smaller virtual experience probes can be co-located with the network aggregation infrastructure.
With the measurements available, any exploration of the subscriber experience now needs to factor in the expectations of the applications that a subscriber might be using. The table below highlights the expectations that different applications have of the network and how each application type reacts to changes in throughput, latency and loss.
As the table shows, simply delivering a good average throughput on the
network will not necessarily result in a good subscriber experience with applications like Web
Surfing, Voice or Gaming. A good average score does not translate to a consistently good
experience either as conditions may vary throughout the day, week, or month. However, if
the operator has collected these key metrics, they can create their own network experience
scorecard that they can use to improve the experience they deliver to their subscribers.
CREATING A NETWORK EXPERIENCE SCORECARD
Once the service provider has collected the metrics for all of their subscribers' traffic, they can begin to construct their own scorecard for how their network is performing. Visualizing the data from multiple perspectives provides views of the entire network, specific locations, service plans or connection types (3G/4G, DSL generation or DOCSIS version, for example), so that service providers can drill down to discover the root cause of any performance degradation.
To best visualize the metrics, a transformation matrix that maps throughput, latency, and loss
into a simple letter grade for each application type allows the operator to quickly determine
how their network is performing for the different application types. As applications change
their expectations over time, the matrix must be updated to reflect the new application
landscape. One example of this will be when video streaming shifts from high definition (HD)
to 4K resolution, raising the bandwidth required to achieve a good score in video. Scoring the application experience requires an understanding of how applications behave on networks, and uses a matrix that grades performance as shown below:
Application Types and Subscriber Experience Metrics
Application Type | Throughput | Latency | Loss
Web | Needs short bursts of download performance | High latency leads to slow page load times | Packet loss can lead to slow page load times
Video | Sustained throughput delivers good quality | Not usually a concern except for initial loading of video | Less sensitive to loss unless it affects throughput
Social Media | Needs short bursts of download/upload performance | High latency can slow interactive sharing experience | Packet loss can slow interactive sharing experience
Gaming | Most games do not require high bandwidth | High latency leads to lag in real-time games | Packet loss leads to lag in real-time games
Upload | Sustained bursts of upload performance | N/A | N/A
Download | Sustained bursts of download performance | N/A | N/A
Voice | Low throughput requirements | High latency leads to poor voice experience | Some loss can be tolerated; high loss leads to perceived latency
Scoring the network
The scoring matrix above is used to transform the throughput, latency, and loss metrics from the network into application-specific scores. The accompanying visualization is an example of what this looks like.
Operators can now present their ScoreCard in a simple-to-understand format, on a scale from A to F. A score of "C" for Web Surfing likely means that there are some issues with packet loss. Social Media and especially Real-Time Gaming are very latency sensitive, and their scores reflect the network's latency accordingly. Streaming video is largely unaffected by latency because of local buffering, and can therefore still achieve the maximum score of "A".
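A transformation matrix of this kind can be sketched as per-application cut-offs for each metric. In the Python sketch below, the thresholds are invented for illustration (they are not Sandvine's matrix), and the rule that the weakest metric sets the overall grade is an assumption. With the same sample of 12 Mbps, 90 ms and 0.3% loss, video scores an A while gaming drops to a D because of latency, and a 1 Mbps sample cannot reach an A for video.

```python
GRADES = "ABCDEF"

# Hypothetical cut-offs (NOT Sandvine's matrix). Five cut-offs per metric separate A|B|C|D|E|F.
# Throughput in Mbps (higher is better); latency in ms and loss in % (lower is better).
# None means the metric does not affect that application's score.
MATRIX = {
    "video":  {"throughput": (8, 5, 3, 1.5, 0.5), "latency": None, "loss": (2, 4, 6, 8, 10)},
    "gaming": {"throughput": None, "latency": (30, 50, 80, 120, 200), "loss": (0.1, 0.5, 1, 2, 5)},
    "web":    {"throughput": (10, 5, 2, 1, 0.5), "latency": (50, 100, 150, 250, 400),
               "loss": (0.1, 0.5, 1, 2, 5)},
}

def metric_grade(value, cutoffs, higher_is_better):
    """Index into GRADES for one metric: 0 (A) if the first cut-off is met, 5 (F) if none are."""
    for i, cut in enumerate(cutoffs):
        if (value >= cut) if higher_is_better else (value <= cut):
            return i
    return len(cutoffs)

def score(app, throughput_mbps, latency_ms, loss_pct):
    """Letter grade for one application type; the weakest relevant metric sets the grade."""
    row = MATRIX[app]
    worst = 0
    for metric, value, higher in (("throughput", throughput_mbps, True),
                                  ("latency", latency_ms, False),
                                  ("loss", loss_pct, False)):
        if row[metric] is not None:
            worst = max(worst, metric_grade(value, row[metric], higher))
    return GRADES[worst]

for app in MATRIX:
    print(app, score(app, throughput_mbps=12.0, latency_ms=90.0, loss_pct=0.3))
print("video at 1 Mbps:", score("video", 1.0, 90.0, 0.3))
```

Taking the worst metric rather than an average keeps a single severe impairment, such as gaming latency, from being hidden by good throughput.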
MEASUREMENT INTERVALS
Sub-second intervals are needed to capture the quality delivered even on short sessions, for instance a post on social media or a download of a newspaper site optimized for speed. Research has shown that 250 ms is a good balance between averaging bursts and capturing realized throughput.
BREAKDOWN TO ENABLE ACTION
With this type of visualization, the service provider can quickly determine whether their network is delivering a good experience for the applications that drive subscriber usage. However, the visualization should also enable a drill-down into the root cause of a network impairment, so that network engineers can improve the network experience score. Investigation of a degraded score might reveal that a specific location is experiencing systemic congestion and over-utilization, in which case splitting the node would resolve the issue. Combining the technical issues with service degradation ensures that the service provider can make the right business decisions about where to invest in the network to improve its score and achieve the best return on investment. Investigation may also reveal that a specific service plan is delivering degraded QoE simply because it is delivering exactly the expected throughput for that plan. For example, a 1 Mbps connection will never score an A for video.
To make the breakdown actionable, commonalities have to be found: scores should be grouped so that deviations between the groups stand out. Scores could be grouped by:
• Access Technology (2G/3G/4G, DSL, Cable, FTTH, WiFi)
• Location (Cell ID, City)
• Topology (Points of Presence, Area or Access Point)
• Device (Handset model or brand, Cable/DSL Modem)
• Subscriber Tier
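Grouping and deviation-finding can be sketched as follows. The conversion of letter grades to points (A = 5 down to F = 0), the one-point tolerance and the record fields are all assumptions made for illustration. In the sample data, the per-technology average looks acceptable, while grouping by cell exposes C2 as the outlier.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping of letter grades to points for averaging.
POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}

def group_scores(records, dimension):
    """Mean score (in points) for each value of the given grouping dimension."""
    groups = defaultdict(list)
    for r in records:
        groups[r[dimension]].append(POINTS[r["grade"]])
    return {key: mean(vals) for key, vals in groups.items()}

def outliers(records, dimension, tolerance=1.0):
    """Groups whose mean score is more than `tolerance` points below the overall mean."""
    overall = mean(POINTS[r["grade"]] for r in records)
    return {k: v for k, v in group_scores(records, dimension).items() if overall - v > tolerance}

# Hypothetical session scores.
records = [
    {"access": "4G", "cell": "C1", "grade": "A"},
    {"access": "4G", "cell": "C1", "grade": "B"},
    {"access": "4G", "cell": "C2", "grade": "E"},
    {"access": "4G", "cell": "C2", "grade": "D"},
    {"access": "3G", "cell": "C3", "grade": "B"},
    {"access": "3G", "cell": "C3", "grade": "A"},
]
print(outliers(records, "access"))  # {}
print(outliers(records, "cell"))    # {'C2': 1.5}
```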
Score | Experience
A | Exceptional experience
B | Almost perfect, but some slight impairments noticed
C | Good experience but noticeable impairments
D | Usable with frustrating impairments
E | Really poor
F | Unusable
LOCATION
A breakdown of performance per location shows the score of all sessions that originated or terminated in a particular cell ID or access point. While each individual session could have been affected by many different factors, including device type and OTT service quality, looking at all sessions that share the same location over time shows whether there are commonalities.
Increased delay indicates that the radio access network is dropping packets and the base station is retransmitting them, adding delay to the overall TCP connection (Figure 8).
If packet loss is measured instead, the packets are being retransmitted by their source. When many TCP sessions with the same cell ID in common experience such packet loss, it is likely that the backhaul link toward the base station is congested and dropping packets (Figure 10).
By measuring at one central location but enriching the measurements with location, the root cause of the problem can be identified.
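The latency-versus-loss reasoning above can be expressed as a simple per-cell heuristic. The thresholds (RTT at least twice the baseline, 1% loss, 20 sessions minimum) and the CellStats fields are hypothetical, and checking loss first reflects the assumption that a congested backhaul also raises latency through queuing.

```python
from dataclasses import dataclass

@dataclass
class CellStats:
    """Hypothetical per-cell aggregate built from location-enriched TCP session measurements."""
    cell_id: str
    sessions: int
    median_rtt_ms: float
    loss_pct: float

def diagnose(cell, baseline_rtt_ms, rtt_factor=2.0, loss_threshold_pct=1.0, min_sessions=20):
    """Heuristic classification; all thresholds are illustrative assumptions."""
    if cell.sessions < min_sessions:
        return "insufficient data"
    # Loss seen by the core probe means the source retransmitted: suspect the backhaul.
    if cell.loss_pct >= loss_threshold_pct:
        return "suspect congested backhaul"
    # Delay without loss: the RAN is likely retransmitting over the air.
    if cell.median_rtt_ms >= rtt_factor * baseline_rtt_ms:
        return "suspect RAN retransmissions"
    return "no common impairment"

cells = [CellStats("A", 50, 180.0, 0.2), CellStats("B", 50, 70.0, 3.0),
         CellStats("C", 50, 65.0, 0.1), CellStats("D", 5, 300.0, 5.0)]
for c in cells:
    print(c.cell_id, diagnose(c, baseline_rtt_ms=60.0))
```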
Figure 8
RAN issues caused by
high latency
Figure 9
ScoreCard identifies problem
RAN from high latency
measurement
TOPOLOGY
A breakdown by topology is equally important. It will reveal whether particular infrastructure nodes are not performing up to standard. It can also indicate a misconfiguration of the nodes. A common example is misconfigured load balancing between nodes, which causes just half of the infrastructure to affect the quality of experience.
Figure 10
High packet loss indicates the problem lies between the RAN and the aggregator
Figure 11
See score per VNF instance
Figure 12
Identify problem areas in network topology
VNF
As operators move to a virtualized infrastructure, measuring the performance delivered by each virtualized network function (VNF) is even more important. Real-time enrichment can also group session scores per VNF, measuring the experience delivered in a next-gen infrastructure.
Figure 13
See scores based on device type
Figure 14
Discover scores based on subscriber tier
DEVICE
Devices can have a huge impact on the QoE delivered. During device promotions, it is crucial for the operator to understand the QoE delivered by the specific brands or models being promoted. In negotiations with third-party suppliers, it is also key to understand the impact their devices have on the overall experience delivered. With real-time data enrichment of the measurement reports, IMEI codes or device IDs retrieved from HTTP headers can be used to group and sort the score data.
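For IMEI-based grouping, the first eight digits of a 15-digit IMEI form the Type Allocation Code (TAC), which identifies the device model, and the last digit is a Luhn check digit. The Python sketch below validates the check digit and groups grades by model; the TAC-to-model table is hypothetical and would in practice come from a TAC database.

```python
from collections import defaultdict

def luhn_valid(imei: str) -> bool:
    """Check a 15-digit IMEI against its Luhn check digit."""
    if len(imei) != 15 or not imei.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(imei)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the check digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scores_by_model(records, tac_to_model):
    """Group grades by device model using the IMEI's 8-digit TAC; skip malformed IMEIs."""
    groups = defaultdict(list)
    for imei, grade in records:
        if not luhn_valid(imei):
            continue
        groups[tac_to_model.get(imei[:8], "unknown model")].append(grade)
    return dict(groups)

# Hypothetical TAC table and scored sessions.
tac_table = {"49015420": "Handset X"}
sessions = [("490154203237518", "B"), ("490154203237518", "C"),
            ("490154203237519", "A"),  # bad check digit, skipped
            ("356938035643809", "A")]
print(scores_by_model(sessions, tac_table))
```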
SUBSCRIBER TIER
Through integration with the CRM system, measurement reports can be enriched with the rate plan the subscriber belonged to at the time the score was recorded. Grouping this data enables an operator to visualize the quality delivered to gold, silver and bronze subscribers, or to separate QoE per reseller channel. Subscribers can be in several groups at the same time, enabling simultaneous monitoring of demographics, channels and rate plans.
The operator can use this tool to proactively determine when subscribers are receiving a bad experience from the network and solve systemic issues before they cause subscriber churn. Once the root cause is identified, the operator can take action to improve the network in multiple ways:
• Avoid non-managed congestion (full links)
  • Either upgrade them or manage them with traffic management strategies such as active queue management
• Don't let the top 10-20% of subscribers bring the score down for the entire link
  • If the bandwidth utilization is fairly divided, then you need more bandwidth
• Fix lossy links at the physical or logical level to reduce packet loss
• Avoid congested backhaul links
• Avoid latency-adding media for backhaul, like microwave or satellite
• Move as many subscribers as possible to the latest technology standards, possibly incentivizing subscribers to upgrade hardware or service tiers
• Move volume away from peak times by making off-peak usage cheaper
• Don't optimize delivery for downstream throughput or bandwidth at the cost of latency and loss
• Promote devices that you know have a minimal impact on QoE
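Choosing between managing heavy users and adding capacity on a congested link can be sketched as a check of how much volume the heaviest subscribers consume. The 20% fraction comes from the list above; the 50% skew threshold, function names and sample volumes are assumptions.

```python
def top_share(volumes, top_fraction=0.2):
    """Share of total volume used by the heaviest `top_fraction` of subscribers on a link."""
    ordered = sorted(volumes, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0

def suggest_action(volumes, top_fraction=0.2, skew_threshold=0.5):
    """Hypothetical rule for a congested link: skewed usage -> manage, even usage -> add capacity."""
    if top_share(volumes, top_fraction) >= skew_threshold:
        return "manage heavy users (e.g. active queue management)"
    return "usage fairly divided: add bandwidth"

skewed = [50, 40, 5, 5, 5, 5, 5, 5, 5, 5]  # hypothetical GB per subscriber on one congested link
even = [10] * 10
print(suggest_action(skewed))
print(suggest_action(even))
```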
Experience discovery scenarios
Now that the operator has a view on what experience the network is delivering to their
subscribers, they can use this intelligence to better market to their target subscribers. A
network that is delivering a superior gaming experience can offer gaming packages with an
SLA for latency. A high throughput network with excess capacity in certain locations can offer
video streaming packages to subscribers located in those areas. The service provider can
even market their service packages with a grade. For example, offering an “A” video service
package for a premium over a “C” video package, and thereby setting the expectation of the
subscribers before they purchase the service.
The service provider can also use this data to report its performance to regulatory bodies, as described in the FCC network neutrality filings. This type of network scoring delivers every metric the FCC asks for, and shows a strong commitment to subscribers and their experience.
ABBREVIATIONS
CMTS Cable Modem Termination System
DOCSIS Data Over Cable Service Interface Specification
DSL Digital Subscriber Line
DSLAM Digital Subscriber Line Access Multiplexer
EU European Union
FCC Federal Communications Commission
FTTx Fiber to the x (Curb, Home, Premise, etc)
HD High Definition
ISP Internet Service Provider
Mbps Million bits per second
MMORPG Massively Multiplayer Online Role Playing Game
QoE Quality of Experience
Differentiation with Network Scoring
Copyright © 2015 Sandvine Networks. All rights reserved. All other trademarks are property of their respective owners. SANDVINE.COM
ABOUT SANDVINE
Sandvine helps organizations run world-class networks with Active Network Intelligence, leveraging machine learning analytics and closed-loop automation to identify and adapt to network behavior in real-time. With Sandvine, organizations have the power of a highly automated platform from a single vendor that delivers a deep understanding of their network data to drive faster, better decisions. For more information, visit sandvine.com or follow Sandvine on Twitter at @Sandvine.
Copyright © 2018 Sandvine. All rights reserved. All other trademarks are property of their respective owners. SANDVINE.COM
USA 47448 Fremont Blvd, Fremont, CA 94538, USA T. +1 510.230.2777
EUROPE Birger Svenssons Väg 28D 432 40 Varberg, Sweden T. +46 (0)340.48 38 00
CANADA 408 Albert Street, Waterloo, Ontario N2L 3V3, Canada T. +1 (0)519.880.2600
ASIA Ardash Palm Retreat, Bellandur, Bangalore, Karnataka 560103, India T. +91 80677.43333
v20171219