ShadowStream: Performance Experimentation as a Capability in Production Internet Live Streaming Networks
Presented by: Chen Alexandre Tian (HUST), Richard Alimi (Google), Richard Yang (Yale), David Zhang (PPLive)
Live Streaming is Widely Used
• Many recent major events live streamed on the Internet
• Many daily events are streamed as well
  • Justin.tv, livestream, …
State of the Art of Live Streaming Systems
Hybrid system (e.g., Adobe Flash 10.1 and later)
• CDN seeding
• P2P with BitTorrent-like protocols
Performance of Live Streaming Systems Becomes Difficult to Understand/Predict
System software is becoming more complex
Internet Environment Complexity
• ADSL modem buffer
• PowerBoost
• Inter-ISP throttling
• …
Misleading results if not considering real network features.
Need Evaluation at the Right Scale
Misleading results if not considering the target scale.
Key Idea of ShadowStream
The production system provides an ideal evaluation platform: real users, real networks, at scale.
Starting Point: Use Experiment Algorithm on Real Users
First Challenge: How to achieve both accuracy and user protection?
[Figure: Experiment engine with CDN Protection. Labels: new pieces are injected here; virtual playpoint (miss recorded here); user playpoint. Animation step: two seconds later.]
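The mechanism in the figure can be sketched roughly as follows. This is a toy illustration, not the ShadowStream implementation: the function name, its inputs, and the assumption that the CDN always delivers a missing piece between the virtual playpoint and the user playpoint are all hypothetical.

```python
def replay_with_cdn_protection(experiment_received, first_piece, last_piece):
    """Walk the playpoints over a piece range for one client.

    experiment_received: ids of pieces the Experiment engine obtained before
    its virtual playpoint reached them (hypothetical input).
    Returns (recorded_misses, user_visible_misses).
    """
    recorded_misses = []
    playback_buffer = set(experiment_received)
    for piece in range(first_piece, last_piece + 1):
        if piece not in experiment_received:
            # The virtual playpoint passes a hole: the miss is recorded here.
            recorded_misses.append(piece)
            # Assumption: the CDN always delivers the piece within the gap
            # before the user playpoint arrives.
            playback_buffer.add(piece)
    user_visible_misses = [
        p for p in range(first_piece, last_piece + 1) if p not in playback_buffer
    ]
    return recorded_misses, user_visible_misses


recorded, visible = replay_with_cdn_protection({90, 92, 93}, 90, 93)
print(recorded, visible)
```

The experiment still reports the miss, while the viewer never sees it, which is the point of separating the two playpoints.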
Issues of CDN Protection
• Scale
  • 100,000 clients @ 1 Mbps rate -> 100 Gbps
  • More demand with concurrent test channels
• Network bottleneck
  • There can be bottlenecks from CDN edge servers to streaming clients
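The scale figure above is a worst-case product of client count and stream rate. A small sketch of that arithmetic, where the `channels` parameter is an added assumption (each concurrent test channel is taken to have the same audience):

```python
def cdn_demand_gbps(clients, rate_mbps, channels=1):
    """Worst-case CDN demand if every client falls back to the CDN.

    `channels` is a hypothetical knob: concurrent test channels, each
    assumed to have `clients` viewers at `rate_mbps`.
    """
    return clients * rate_mbps * channels / 1000.0  # 1000 Mbps per Gbps


print(cdn_demand_gbps(100_000, 1.0))
print(cdn_demand_gbps(100_000, 1.0, channels=3))
```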
New Idea: Scaling Up with Stable Protection
Observation: there already exists a stable version w/ reasonable performance
Issue: Loss of experiment accuracy.
Why Is Accuracy Lost?
Converge to a Balance Point
We should observe $m(\theta_0)$, but instead we actually observe $m(\theta')$.
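One way to picture the balance point is as a fixed point: protection traffic takes peer resources away from the experiment, which raises the miss ratio, which in turn raises protection traffic. The sketch below uses a toy miss-ratio curve and a linear rescue cost; both are assumptions chosen only to make the iteration visible, not the model from the talk.

```python
def miss_ratio(theta):
    """Toy miss-ratio curve (assumption): misses appear when supply ratio < 1."""
    return max(0.0, 1.0 - theta)


def balance_point(theta0, rescue_cost, iterations=100):
    """Iterate theta = theta0 - rescue_cost * m(theta) until it settles.

    theta0: supply ratio the experiment would see with no interference.
    rescue_cost: hypothetical supply consumed per unit of miss ratio.
    """
    theta = theta0
    for _ in range(iterations):
        theta = theta0 - rescue_cost * miss_ratio(theta)
    return theta


theta_prime = balance_point(theta0=0.9, rescue_cost=0.5)
print(miss_ratio(0.9), miss_ratio(theta_prime))
```

With these toy numbers the iteration settles near a supply ratio of 0.8, so the observed miss ratio is roughly double the one the experiment was meant to measure.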
Putting It Together: Cascading Protection for Accuracy and Scalability
Q: Any remaining challenge?
Real user behaviors differ from testing behaviors
Idea: transparently orchestrate experimental scenarios from existing, already playing clients
Virtual arrivals/virtual departures
• Test specification
• Triggering
• Virtual arrival control
• Virtual departure control
Independent Arrivals Achieving Global Arrival Pattern
Each peer generates its arrival time by drawing a random number independently according to the same cumulative distribution function.
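A minimal sketch of this idea using inverse transform sampling: every peer holds the same sampled CDF, draws its own uniform random number, and maps it through the inverse CDF. The sampled CDF points, the linear interpolation, and the function names are illustrative assumptions.

```python
import bisect
import random


def make_inverse_cdf(times, cdf_values):
    """Build F^-1 from sample points of a non-decreasing CDF ending at 1.0."""
    def inverse(u):
        i = min(bisect.bisect_left(cdf_values, u), len(times) - 1)
        if i == 0:
            return times[0]
        t0, t1 = times[i - 1], times[i]
        c0, c1 = cdf_values[i - 1], cdf_values[i]
        if c1 == c0:
            return t1
        # Linear interpolation between neighbouring sample points.
        return t0 + (t1 - t0) * (u - c0) / (c1 - c0)
    return inverse


def draw_arrival_time(inverse_cdf, rng):
    """One peer's independent draw; no coordination with other peers."""
    return inverse_cdf(rng.random())


# Hypothetical target: 25% of arrivals in the first minute, the rest in the second.
inverse = make_inverse_cdf([0.0, 60.0, 120.0], [0.0, 0.25, 1.0])
rng = random.Random(0)
arrivals = [draw_arrival_time(inverse, rng) for _ in range(10_000)]
early = sum(1 for t in arrivals if t <= 60.0) / len(arrivals)
print(round(early, 2))
```

Because each draw is independent, no peer needs to know what the others chose, yet the aggregate follows the shared distribution as the number of peers grows.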
From Idea to System
Challenge: How to minimize developers’ engineering efforts?
Streaming Hypervisor
Hypervisor API needed for each streaming engine:
• getSysTime()
• getLagRange(), getMaxStartupDelay()
• writePiece(), getPieceMap()
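The slides give only the call names, so the sketch below is a guess at how they might fit together. The assumptions: getLagRange() and getMaxStartupDelay() are answered by the engine, getSysTime(), writePiece(), and getPieceMap() are served by the hypervisor, and all signatures, the `register` helper, and the in-memory storage are hypothetical.

```python
from abc import ABC, abstractmethod


class StreamingEngine(ABC):
    """Callbacks a hypervisor could invoke on an engine (assumed grouping)."""

    @abstractmethod
    def getLagRange(self):
        """Return (min_lag, max_lag) in seconds behind the live source."""

    @abstractmethod
    def getMaxStartupDelay(self):
        """Return the longest startup delay, in seconds, the engine tolerates."""


class ToyHypervisor:
    """In-memory stand-in; time shifts and storage layout are hypothetical."""

    def __init__(self, clock):
        self._clock = clock   # callable returning real time in seconds
        self._shift = {}      # engine name -> time shift in seconds
        self._pieces = {}     # engine name -> {piece_id: payload}

    def register(self, name, time_shift=0.0):
        self._shift[name] = time_shift
        self._pieces[name] = {}

    def getSysTime(self, name):
        # Each engine sees real time moved back by its own shift.
        return self._clock() - self._shift[name]

    def writePiece(self, name, piece_id, payload):
        self._pieces[name][piece_id] = payload

    def getPieceMap(self, name):
        return set(self._pieces[name])


hv = ToyHypervisor(clock=lambda: 100.0)
hv.register("experiment", time_shift=2.0)
hv.writePiece("experiment", 91, b"payload")
print(hv.getSysTime("experiment"), hv.getPieceMap("experiment"))
```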
Computing Window Bounds
• Hypervisor calls getLagRange()
Sharing and Information Flow Control
Compositional Software Framework
Example: Adding an admission control component
Evaluation: Experiment Accuracy & Protection
Only CDN as the Protection:
Cascaded Protection:
Evaluation: Experimental Opportunities
SH Sports channel and HN Satellite channel, PPLive, September 6, 2010
Evaluation: Accuracy of Distributed Arrivals
Arrival function from: Sachin Agarwal, Jatinder Pal Singh, Aditya Mavlankar, Pierpaolo Baccichet, and Bernd Girod, "Performance and Quality-of-Service Analysis of a Live P2P Video Multicast Session on the Internet," in Proceedings of IWQoS 2008, Springer, June 2008.
Take Home Idea
• Many Internet-scale systems are unique systems that are difficult to build/test.
• The ShadowStream scheme consists of the following key ideas:
  • Conduct shadow experiments using the real system and real users
  • Protection and accuracy present dual challenges
  • Use Stable for scalable protection
  • Introduce external resources (CDN) to remove interference on competing resources
  • Create shadow behaviors from real users
Thanks for coming!
Questions?
Metric of Live Streaming Performance
Piece missing ratio
Backup Slides
Streaming of the Internet
Virtual Sliding Window
• A streaming engine has two sliding windows: an upload window (P2P) and a download window (CDN and P2P).
• Each engine calls getSysTime() on the Hypervisor; based on the real system time and a time-shift value, the Hypervisor assigns a virtual system time to each engine.
• Each engine calculates x(left) and x(right) of its download window.
• Each engine advances its sliding window at the channel rate of μ pieces per second.
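A rough sketch of how the window bounds could follow from the virtual time. The slides do not give the formula, so the mapping below (piece 0 produced at time 0, bounds derived from a minimum and maximum lag) is an assumption for illustration only.

```python
import math


def virtual_time(real_time, time_shift):
    """Virtual system time: real time moved back by the engine's shift."""
    return real_time - time_shift


def download_window(t_virtual, mu, lag_min, lag_max):
    """Return (x_left, x_right) as piece indices.

    Assumptions: piece 0 is produced at time 0, mu is the channel rate in
    pieces per second, and the window spans lags lag_max down to lag_min.
    """
    x_left = math.floor(mu * (t_virtual - lag_max))
    x_right = math.floor(mu * (t_virtual - lag_min))
    return max(x_left, 0), max(x_right, 0)


t = virtual_time(real_time=100.0, time_shift=2.0)
print(download_window(t, mu=1.0, lag_min=5.0, lag_max=15.0))
print(download_window(t + 1.0, mu=1.0, lag_min=5.0, lag_max=15.0))
```

One second of virtual time moves both bounds forward by μ pieces, which matches the last bullet above.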
The reasoning behind
• CDN sees the original miss-ratio/supply-ratio curve
• P2P Protection sees the curve minus δ
Specification
• Define multiple classes of clients (e.g., cable or DSL, estimated upload capacity class, or network location)
• A class-wide arrival rate function $\lambda_j(t)$
• A client's lifetime is determined by the distribution $L_x$
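A specification of this shape could be held in a small data structure. The sketch below is hypothetical: the field names, the constant arrival rate, and the exponential lifetime are placeholders, not values from the talk.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientClass:
    name: str                                   # e.g. "dsl" or "cable"
    arrival_rate: Callable[[float], float]      # lambda_j(t), arrivals per second
    lifetime: Callable[[random.Random], float]  # one draw from L_x, in seconds


def expected_arrivals(client_class, t_start, t_end, step=1.0):
    """Integrate the arrival rate over [t_start, t_end] with the midpoint rule."""
    steps = int(round((t_end - t_start) / step))
    total = 0.0
    for k in range(steps):
        midpoint = t_start + (k + 0.5) * step
        total += client_class.arrival_rate(midpoint) * step
    return total


# Placeholder class: 2 arrivals per second, mean lifetime of 10 minutes.
dsl = ClientClass(
    name="dsl",
    arrival_rate=lambda t: 2.0,
    lifetime=lambda rng: rng.expovariate(1.0 / 600.0),
)
print(expected_arrivals(dsl, 0.0, 60.0))
```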
Local Replacement for Uncontrolled Early Departures
• Capturing client state
• Substitution
Triggering Condition
Predict(t): autoregressive integrated moving average (ARIMA) method that uses both recent testing channel states and the past history of the same program
Independent Arrivals Algorithm
CDN Capacity and Window Length
• CDN window set to 4 seconds
  • The TCP retransmission timeout is 3 seconds for piece loss
  • 1 extra second for waiting for the retransmitted piece
Window length
Starting up the engine
• When starting a streaming engine x, the Streaming Hypervisor gives x pointers to its download and upload windows.
• At time a(s), the client joins the test channel and the Stable engine starts.
• At time a(e) > a(s), the client joins testing, and the Experiment Engine and CDN Protection Engine start.
• After starting, an engine begins to download pieces from the target playpoint to the end of its download window.
• Pieces before startup should be protected by the CDN, which is counted in the CDN capacity calculation.
ShadowStream Outline
• Motivation and Challenge
• Experiment Protection and Accuracy
• Experiment Orchestration
• Implementation
• Evaluation
Client Substitution
Client substitution delay with client dynamics.
Sec. 8: Limitation Discussion
• (Do we really need this?)
• If Experiment consumes resources while no piece is received at all (give priority to Protection?)
• Download links are the bottleneck
Modeling P2P Protection
Given experiment engine e and target rate R, the miss ratio is $m_{R,e}(\theta)$, or simply $m_e(\theta)$.
Given protection engine p, whose target rate is $m_e(\theta)$, the required rescue bandwidth is $\Theta_k(m_e(\theta), p) \cdot m_e(\theta) = \eta(e, p, \theta)$.
P2P Protection: No Accurate Result
• If P1 is the protection, there would exist balance point(s)
• If P2 is the protection, there would be a negative feedback loop
• In either case, there is no accuracy at all
Live Streaming
• Live streaming on the Internet
  • Live audio/video content distribution on the Internet
  • e.g., NBC Winter Olympics 2010 live, using Microsoft Silverlight®
• P2P live streaming
Example: PPLive
From PPLive's presentation:
• Founded by graduate students from Huazhong University of Science & Technology
• PPLive is
  • An online video broadcasting and advertising network that provides an online viewing experience comparable to TV
  • An efficient P2P technology platform and test bench
Estimated global installed base: 75 million
Monthly active users*: 20 million
Daily active users: 3.5 million
Peak concurrent users: 2.2 million
Monthly average concurrent users: 1.5 million
Weekly average usage time: 11 hours
Not Yet!
Challenges
• How to achieve both experiment accuracy and user protection?
• How to produce the desired experiment pattern?
• How to minimize developers' engineering effort?
Starting Point: Use Experiment Algorithm on Real Users
A simple example
• No user-visible piece misses
• Missing piece 91 is recorded
• Piece download assignment is adaptive
Three issues delete
• Information flow control: Although piece 91 is downloaded by the Protection Engine, it should not be labeled as downloaded in the Experiment Engine.
• Duplicate avoidance: Since both the Experiment Engine and the Protection Engine are running, if their download windows overlap, they may download the same piece.
• Experiment feasibility: This lag from realtime is determined when client i joins the test channel with the Protection Engine to make experiment and protection feasible.
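The first two issues can be illustrated with a shared store that gives each engine a different view of what has been downloaded. This is a toy sketch under stated assumptions: the class, its method names, and the rule that Protection sees every piece while Experiment sees only its own are hypothetical, not the ShadowStream design.

```python
class SharedPieceStore:
    """One playback buffer, with per-engine views of the piece map."""

    def __init__(self):
        self._downloaded_by = {}  # piece_id -> name of the engine that wrote it

    def write_piece(self, engine, piece_id):
        # Keep the first writer; a later write of the same piece is a duplicate.
        self._downloaded_by.setdefault(piece_id, engine)

    def playable(self):
        """Pieces the player can render, regardless of who fetched them."""
        return set(self._downloaded_by)

    def piece_map(self, engine):
        if engine == "protection":
            # Duplicate avoidance: Protection sees everything already fetched.
            return self.playable()
        # Information flow control: other engines see only their own pieces.
        return {p for p, who in self._downloaded_by.items() if who == engine}

    def should_download(self, engine, piece_id):
        return piece_id not in self.piece_map(engine)


store = SharedPieceStore()
store.write_piece("experiment", 90)
store.write_piece("protection", 91)
print(store.piece_map("experiment"), store.piece_map("protection"))
```

Here piece 91 is playable for the viewer but stays absent from the Experiment Engine's map, so the recorded miss is not hidden by the rescue.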