ShadowStream: Performance Experimentation as a Capability in Production Internet Live Streaming Networks

Presented by: Chen Alexandre Tian (HUST), Richard Alimi (Google), Richard Yang (Yale), David Zhang (PPLive)

Live Streaming is Widely Used

• Many recent major events live streamed on the Internet

• Many daily events are streamed as well
  • Justin.tv, livestream, …

State of the Art of Live Streaming Systems

Hybrid system (e.g., Adobe Flash 10.1 and later): CDN seeding; P2P with BitTorrent-like protocols

Performance of Live Streaming Systems Becomes Difficult to Understand/Predict

System software becoming more complex

Internet Environment Complexity

• ADSL modem buffer
• PowerBoost
• Inter-ISP throttling
• …

Misleading results if not considering real network features.

Need Evaluation at the Right Scale

Misleading results if not considering the target scale.

Key Idea of ShadowStream

The production system provides an ideal evaluation platform: real users, real networks, at scale.

Starting Point: Use Experiment Algorithm on Real Users

First Challenge: How to achieve both accuracy and user protection?

[Figure: Experiment and CDN Protection engines. Labels: new pieces injected here; virtual playpoint (misses recorded here); user playpoint. Second panel: two seconds later.]

Issues of CDN Protection

• Scale
  • 100,000 clients @ 1 Mbps rate -> 100 Gbps
  • More demand with concurrent test channels

• Network bottleneck
  • There can be bottlenecks from CDN edge servers to streaming clients
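
The scale figure above is plain arithmetic, and a minimal sketch makes the growth with concurrent test channels explicit. The function name and the three-channel call are illustrative assumptions, not values from the slides.

```python
def cdn_protection_demand_gbps(clients: int, rate_mbps: float, channels: int = 1) -> float:
    """Worst-case CDN bandwidth if every client in every test channel
    falls back to CDN protection at the full streaming rate."""
    return clients * rate_mbps * channels / 1000.0  # Mbps -> Gbps


print(cdn_protection_demand_gbps(100_000, 1.0))     # 100.0
print(cdn_protection_demand_gbps(100_000, 1.0, 3))  # 300.0 (hypothetical 3 channels)
```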

New Idea: Scaling Up with Stable Protection

Observation: a stable version with reasonable performance already exists

Issue: Loss of Experiment Accuracy.

Why Is Accuracy Lost?

Converge to a Balance Point

We should observe $m(\theta_0)$, but we actually observe $m(\theta')$.
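
One way to picture the balance point is as a fixed point: protection consumes peer resources, which lowers the supply ratio the experiment sees, which raises its miss ratio, which in turn raises the protection load. The sketch below is only a toy model. The linear miss-ratio curve, the `cost` factor, and the update rule are all assumptions made for illustration; the slides do not specify them.

```python
def miss_ratio(theta: float) -> float:
    """Toy miss-ratio curve (assumption): misses grow linearly
    once the supply ratio theta drops below 1."""
    return max(0.0, 1.0 - theta)


def balance_point(theta0: float, cost: float, iters: int = 200) -> float:
    """Iterate theta <- theta0 - cost * m(theta) until it settles.

    `cost` is a hypothetical factor: supply consumed by protection
    per unit of missed rate it has to rescue.
    """
    theta = theta0
    for _ in range(iters):
        theta = theta0 - cost * miss_ratio(theta)
    return theta


theta0 = 0.9
theta_prime = balance_point(theta0, cost=0.5)
print(round(miss_ratio(theta0), 3))       # 0.1  (what we should observe)
print(round(theta_prime, 3))              # 0.8  (the balance point)
print(round(miss_ratio(theta_prime), 3))  # 0.2  (what we actually observe)
```

Under these assumed numbers the observed miss ratio is twice the true one, which is the kind of accuracy loss the slide describes.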

Putting It Together: Cascading Protection for Accuracy and Scalability

Q: Any remaining challenge?

Real user behaviors differ from testing behaviors

Idea: transparently orchestrate experimental scenarios from existing, already playing clients

Virtual arrivals/virtual departures

• Test specification
• Triggering
• Virtual arrival control
• Virtual departure control

Independent Arrivals Achieving Global Arrival Pattern

Peers generate arrival times by independently drawing random numbers from the same cumulative distribution function.
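
A standard way to draw from a given cumulative distribution function is inverse transform sampling, sketched below. Whether ShadowStream uses exactly this method is not stated on the slide; the piecewise-linear CDF, its sample points, and all function names are hypothetical. The CDF samples are assumed to be strictly increasing.

```python
import bisect
import random


def make_inverse_cdf(times, cdf_values):
    """Return F^{-1} for a piecewise-linear CDF given as sorted
    (time, F(time)) samples with strictly increasing F values."""
    def inverse(u: float) -> float:
        i = bisect.bisect_left(cdf_values, u)
        if i == 0:
            return times[0]
        if i >= len(times):
            return times[-1]
        t0, t1 = times[i - 1], times[i]
        f0, f1 = cdf_values[i - 1], cdf_values[i]
        return t0 + (t1 - t0) * (u - f0) / (f1 - f0)
    return inverse


# Hypothetical target pattern: 20% of arrivals in the first 10 s, all by 60 s.
inv = make_inverse_cdf([0.0, 10.0, 60.0], [0.0, 0.2, 1.0])


def peer_arrival_time(rng: random.Random) -> float:
    """Runs locally on each peer; no coordination with other peers."""
    return inv(rng.random())


rng = random.Random(42)
arrivals = [peer_arrival_time(rng) for _ in range(10_000)]
early = sum(t <= 10.0 for t in arrivals) / len(arrivals)
print(f"fraction arrived by t=10s: {early:.3f}")  # close to the target 0.2
```

Because every peer uses the same function, the aggregate of the independent draws approximates the global arrival pattern without any central scheduling.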

From Idea to System

Challenge: How to minimize developers’ engineering efforts?

Streaming Hypervisor

Hypervisor API needed for each streaming engine:
• getSysTime()
• getLagRange(), getMaxStartupDelay()
• writePiece(), getPieceMap()
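
To make the API surface concrete, here is a minimal sketch that uses the five call names from the slide. Everything else is an assumption: the signatures, return types, units, the `register` helper, and the split into engine-side and hypervisor-side calls (inferred from the slides saying that the hypervisor calls getLagRange() and that engines call getSysTime()).

```python
import time
from abc import ABC, abstractmethod
from typing import Dict, Optional, Set, Tuple


class StreamingEngine(ABC):
    """Calls the hypervisor makes on an engine (direction inferred, units assumed)."""

    @abstractmethod
    def getLagRange(self) -> Tuple[float, float]:
        """Return (min_lag, max_lag) in seconds behind the live source."""

    @abstractmethod
    def getMaxStartupDelay(self) -> float:
        """Return the longest startup delay, in seconds, the engine may need."""


class Hypervisor:
    """Calls an engine makes on the hypervisor; per-engine state is an assumption."""

    def __init__(self) -> None:
        self._shift: Dict[str, float] = {}
        self._pieces: Dict[str, Set[int]] = {}

    def register(self, engine: str, time_shift: float) -> None:
        """Hypothetical helper: attach an engine with its time shift in seconds."""
        self._shift[engine] = time_shift
        self._pieces[engine] = set()

    def getSysTime(self, engine: str, real_time: Optional[float] = None) -> float:
        """Virtual time, assumed here to be real time minus the engine's shift."""
        now = time.time() if real_time is None else real_time
        return now - self._shift[engine]

    def writePiece(self, engine: str, piece_id: int) -> None:
        self._pieces[engine].add(piece_id)

    def getPieceMap(self, engine: str) -> Set[int]:
        return set(self._pieces[engine])


class ToyEngine(StreamingEngine):
    def getLagRange(self) -> Tuple[float, float]:
        return (5.0, 10.0)

    def getMaxStartupDelay(self) -> float:
        return 2.0


hv = Hypervisor()
hv.register("experiment", time_shift=4.0)
hv.writePiece("experiment", 90)
print(hv.getSysTime("experiment", real_time=100.0))  # 96.0
print(hv.getPieceMap("experiment"))                  # {90}
print(ToyEngine().getLagRange())                     # (5.0, 10.0)
```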

Computing Window Bounds

• Hypervisor calls getLagRange()

Sharing and Information Flow Control

Compositional Software Framework

Example: Adding an admission control component

Evaluation: Experiment Accuracy & Protection

Only CDN as the Protection:

Cascaded Protection:

Evaluation: Experimental Opportunities

SH Sports channel and HN Satellite channel, PPLive, September 6, 2010

Evaluation: Accuracy of Distributed Arrivals

Arrival function from "Performance and Quality-of-Service Analysis of a Live P2P Video Multicast Session on the Internet," Sachin Agarwal, Jatinder Pal Singh, Aditya Mavlankar, Pierpaolo Baccichet, and Bernd Girod, in Proceedings of IWQoS 2008, Springer, June 2008.

Take Home Idea

Many Internet-scale systems are unique systems that are difficult to build/test.

The ShadowStream scheme consists of the following key ideas:
• Conduct shadow experiments using the real system and real users
• Protection and accuracy present dual challenges
  • Use Stable for scalable protection
  • Introduce external resources (CDN) to remove interference on competing resources
• Create shadow behaviors from real users

Thanks for coming!

Questions?

Metric of Live Streaming Performance

Piece missing ratio
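
The slide names the metric without defining it. A minimal sketch, assuming the usual reading (the fraction of pieces due for playback that were not received in time), could look like this; the function name and arguments are hypothetical.

```python
def piece_missing_ratio(expected_pieces, received_pieces) -> float:
    """Fraction of pieces due for playback that were not received in time."""
    expected = set(expected_pieces)
    if not expected:
        return 0.0
    missed = expected - set(received_pieces)
    return len(missed) / len(expected)


# Ten pieces were due (90..99); piece 91 never arrived.
print(piece_missing_ratio(range(90, 100), [90, 92, 93, 94, 95, 96, 97, 98, 99]))  # 0.1
```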

Backup Slides

Streaming of the Internet

Virtual Sliding Window

• A streaming engine has two sliding windows: an upload window (P2P) and a download window (CDN and P2P).
• Each engine calls getSysTime() on the Hypervisor; based on the real system time and a time-shift value, the Hypervisor assigns a virtual system time to each engine.
• Each engine calculates x(left) and x(right) of its download window.
• Each engine advances its sliding window at the channel rate of μ pieces per second.
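
The window computation can be sketched as follows. The slide gives no formulas, so the specifics are assumptions: virtual time is taken as real time minus the time shift, piece i is taken to be produced at `channel_start + i / mu`, and the window is taken to span `window_seconds` of content.

```python
import math


def download_window(real_time: float, time_shift: float, mu: float,
                    window_seconds: float, channel_start: float = 0.0):
    """Return (x_left, x_right) piece indices of a download window.

    Assumptions: virtual time = real time - time_shift; piece i is produced
    at channel_start + i / mu; the window covers window_seconds of content.
    """
    virtual_time = real_time - time_shift
    x_left = math.floor((virtual_time - channel_start) * mu)
    x_right = x_left + math.ceil(window_seconds * mu) - 1
    return x_left, x_right


print(download_window(100.0, 4.0, mu=2.0, window_seconds=4.0))  # (192, 199)
print(download_window(101.0, 4.0, mu=2.0, window_seconds=4.0))  # (194, 201)
```

One second of real time moves both bounds forward by μ = 2 pieces, matching the advance rate stated above.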

The reasoning behind

• CDN sees the original miss-ratio/supply-ratio curve
• P2P Protection sees the curve minus δ

Specification

• Define multiple classes of clients (e.g., cable or DSL, estimated upload capacity class, or network location)
• A class-wide arrival rate function λj(t)
• A client's lifetime is determined by the distribution Lx
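
A test specification of this shape could be represented as below. This is a sketch only: the `ClientClass` type, the step-shaped arrival rate, and the exponential lifetime are hypothetical choices, since the slide does not give concrete functions or distributions.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientClass:
    name: str
    arrival_rate: Callable[[float], float]      # lambda_j(t), arrivals per second
    lifetime: Callable[[random.Random], float]  # draws one lifetime in seconds


def expected_arrivals(client_class: ClientClass, t_end: float, step: float = 1.0) -> float:
    """Midpoint-rule integral of lambda_j(t) over [0, t_end]."""
    n = int(t_end / step)
    return sum(client_class.arrival_rate((k + 0.5) * step) * step for k in range(n))


dsl = ClientClass(
    name="dsl",
    arrival_rate=lambda t: 2.0 if t < 60 else 0.5,      # hypothetical flash crowd
    lifetime=lambda rng: rng.expovariate(1.0 / 300.0),  # hypothetical: mean 300 s
)
print(expected_arrivals(dsl, 120.0))  # 150.0
```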

Local Replacement for Uncontrolled Early Departures

• Capturing client state
• Substitution

Triggering Condition

Predict(t): autoregressive integrated moving average (ARIMA) method that uses both recent testing channel states and the past history of the same program

Independent Arrivals Algorithm

CDN Capacity and Window Length

• CDN window set to 4 seconds
  • The TCP retransmission timeout is 3 seconds for piece loss
  • 1 extra second to wait for the retransmitted piece

Window length

Starting Up the Engine

• When starting a streaming engine x, the Streaming Hypervisor gives x pointers to its download and upload windows.
• At time a(s), the client joins the test channel and the Stable engine starts.
• At time a(e) > a(s), the client joins testing, and the Experiment Engine and CDN Protection Engine start.
• After starting, an engine begins to download pieces from the target playpoint to the end of its download window.
• The pieces before startup should be protected by the CDN, which is counted in the CDN capacity calculation.

ShadowStream Outline

• Motivation and Challenge
• Experiment Protection and Accuracy
• Experiment Orchestration
• Implementation
• Evaluation

Client Substitution

Client substitution delay with client dynamics.

Sec. 8: Limitation Discussion

• (Do we really need this?)
• If Exp consumes resources while no piece is received at all (give priority to Protection?)
• Download links are the bottleneck

Modeling P2P Protection

Given experiment engine $e$ and target rate $R$, the miss ratio is $m_{R,e}(\theta)$, or simply $m_e(\theta)$.

Given protection engine $p$, its target rate is $m_e(\theta)$, and the required rescue bandwidth is $\Theta_k(m_e(\theta), p) \cdot m_e(\theta) = \eta(e, p, \theta)$.

P2P Protection: No Accurate Result

• If P1 is the protection, there exist balance point(s)
• If P2 is the protection, there is a negative feedback loop
• In either case, there is no accuracy at all

Live Streaming

• Live streaming on the Internet: live audio/video content distribution on the Internet
  • e.g., NBC Winter Olympics 2010 live, using Microsoft Silverlight®
• P2P live streaming

Example: PPLive

From PPLive's presentation:
• Founded by graduate students from Huazhong University of Science & Technology
• PPLive is
  • An online video broadcasting and advertising network that provides an online viewing experience comparable to TV
  • An efficient P2P technology platform and test bench

Estimated global installed base: 75 million
Monthly active users*: 20 million
Daily active users: 3.5 million
Peak concurrent users: 2.2 million
Monthly average concurrent users: 1.5 million
Weekly average usage time: 11 hours

Not Yet!

Challenges

• How to achieve both experiment accuracy and user protection?
• How to produce the desired experiment pattern?
• How to minimize developers' engineering effort?

Starting Point: Use Experiment Algorithm on Real Users

A simple example

• No user-visible piece misses
• Missing piece 91 is recorded
• Piece download assignment is adaptive

Three Issues

• Information flow control: Although piece 91 is downloaded by the Protection Engine, it should not be labeled as downloaded in the Experiment Engine.
• Duplicate avoidance: Since both the Experiment Engine and the Protection Engine are running, they may download the same piece if their download windows overlap.
• Experiment feasibility: The lag from real time is determined when client i joins the test channel with the Protection Engine, so that both experiment and protection are feasible.
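
The first two issues can be illustrated with per-engine piece maps over a shared buffer. This is a sketch under assumptions, not the system's actual design: the `PieceStore` class, its method names, and the rule that Protection can see Experiment's pieces (to avoid duplicates) while Experiment cannot see Protection's are all hypothetical.

```python
class PieceStore:
    """Shared piece buffer with asymmetric visibility between two engines."""

    def __init__(self) -> None:
        self._by_engine = {"experiment": set(), "protection": set()}

    def write_piece(self, engine: str, piece_id: int) -> None:
        self._by_engine[engine].add(piece_id)

    def piece_map(self, engine: str) -> set:
        if engine == "experiment":
            # Information flow control: rescues by Protection stay hidden.
            return set(self._by_engine["experiment"])
        # Duplicate avoidance: Protection sees what Experiment already has.
        return self._by_engine["experiment"] | self._by_engine["protection"]

    def playable(self) -> set:
        """Pieces the player can render, regardless of which engine fetched them."""
        return self._by_engine["experiment"] | self._by_engine["protection"]

    def missed_by_experiment(self, due) -> list:
        return sorted(set(due) - self._by_engine["experiment"])


store = PieceStore()
for piece in (90, 92, 93):
    store.write_piece("experiment", piece)
store.write_piece("protection", 91)  # rescued before the user playpoint

print(91 in store.piece_map("experiment"))        # False
print(91 in store.piece_map("protection"))        # True
print(store.missed_by_experiment(range(90, 94)))  # [91]
print(sorted(store.playable()))                   # [90, 91, 92, 93]
```

The user sees no gap because piece 91 is playable, while the experiment's record still counts it as missed.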