On the Optimal Scheduling for Media Streaming in Data-driven Overlay Networks Meng ZHANG with Yongqiang XIONG, Qian ZHANG, Shiqiang YANG Globecom 2006

On the Optimal Scheduling for Media Streaming in Data-driven Overlay Networks

Meng ZHANG

with Yongqiang XIONG, Qian ZHANG, Shiqiang YANG

Globecom 2006

Outline

Background
Related Work
Problem Statement and Formulation
Global Optimal Solution
Distributed Algorithm
Performance Evaluation
Conclusion & Future Work

Background

The Internet has witnessed a rapid growth in deployment of data-driven (swarming based) overlay/peer-to-peer network based IPTV systems during recent years.

These products are based on the data-driven protocol

Concurrent online users of deployed systems:
GridMedia (developed by our lab): over 230,000 users at 310 kbps, achieved with one server
PPLive: 500,000 users at 300-500 kbps
QQLive: 1,460,000 users at 300-500 kbps (not one server)

Background - Data-Driven Protocol Review

Aims to enable large-scale live broadcasting in the Internet environment
Very simple and very similar to BitTorrent
Two steps in the data-driven protocol:
The overlay construction
The block scheduling

Background - Data-Driven Protocol Review

The first step: overlay construction
All the nodes self-organize into a random graph

[Figure: an overlay rooted at the source in which neighbors announce the blocks they hold (for example "I have block 1,2,4"), request the blocks they are missing ("Request block 3"), and send the requested blocks ("Send block 3").]

The second step: block scheduling
The stream is divided into blocks
Each node has a sliding window containing all the blocks it is currently interested in
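The exchange described above can be sketched in a few lines. This is only an illustration: the function names, the neighbor labels "A" and "B", and the window parameters are assumptions, and the block sets are taken from the example messages on the slide.

```python
def missing_blocks(window_start, window_size, held):
    """Blocks inside the sliding window that the node does not hold yet."""
    window = range(window_start, window_start + window_size)
    return [b for b in window if b not in held]


def candidate_suppliers(wanted, neighbor_maps):
    """Map each wanted block to the neighbors that announced holding it."""
    return {
        b: sorted(k for k, blocks in neighbor_maps.items() if b in blocks)
        for b in wanted
    }


# Availability announcements ("I have block ...") from two hypothetical neighbors.
neighbor_maps = {"A": {1, 2, 4}, "B": {1, 2, 3}}
held = {1, 2}  # this node announced "I have block 1,2"

wanted = missing_blocks(window_start=1, window_size=4, held=held)
suppliers = candidate_suppliers(wanted, neighbor_maps)
print(wanted, suppliers)
```

Block scheduling is the decision of which supplier to ask for each wanted block once a block has more than one candidate.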

Related Work

To improve the data-driven protocol, most recent efforts focus on optimizing overlay construction (i.e., the first step):
Vishnumurthy & Francis (INFOCOM 2006): random graph building under heterogeneous overlay
Liang & Nahrstedt (INFOCOM 2006): propose RandPeer, a peer-to-peer QoS-sensitive membership management protocol

Related Work

A problem not well addressed is how to optimize the second step, that is, how to do optimal block scheduling and maximize the throughput of the data-driven protocol under a constructed overlay

Most existing methods are straightforward and ad hoc:
Chainsaw: pure random
DONet: greedy local rarest-first
PALS: round-robin

Problem Statement and Formulation


How to do optimal scheduling to maximize the throughput of the whole overlay?

The real situation is more complicated because different blocks may have different importance and the bottlenecks are not only at the last mile.

Our basic approach:
Assign each block a priority according to its importance
Maximize the sum of the priorities of all requested blocks

[Figure: an example overlay scheduled in two ways. Local Rarest First (LRF) strategy: some requests cause congestion at node 1, and the throughput is 4. Optimal scheduling: the throughput gain is 25%.]

Problem Statement and Formulation - Priority Definition

We use two factors to represent the significance of a block:
rarity factor
emergency factor

We define the priority of block $j \in D_i$ for node $i \in R$ as follows:

$$P_j^i = \beta\, P_R\Big(\sum_{k \in NBR_i} h_{kj}\Big) + (1-\beta)\, P_E\big(C_i + W_T - d_j^i\big),$$

where $0 \le \beta \le 1$, and the functions $P_R(\cdot)$ (rarity factor) and $P_E(\cdot)$ (emergency factor) are both monotonically non-increasing.
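A minimal numeric sketch of the priority definition follows. The slides only require the rarity and emergency functions to be monotonically non-increasing, so the two concrete functions below (`rarity` and `emergency`) are hypothetical choices made for illustration, as are the sample values.

```python
def rarity(holders):
    """Hypothetical P_R: non-increasing in the number of neighbors holding the block."""
    return 1.0 / max(holders, 1)


def emergency(slack):
    """Hypothetical P_E: non-increasing in its argument C_i + W_T - d_j^i."""
    return 1.0 / (1.0 + max(slack, 0.0))


def priority(beta, holders, c_i, w_t, d_ji):
    """P_j^i = beta * P_R(sum_k h_kj) + (1 - beta) * P_E(C_i + W_T - d_j^i)."""
    return beta * rarity(holders) + (1.0 - beta) * emergency(c_i + w_t - d_ji)


# Same timing, different rarity: the block held by fewer neighbors ranks higher.
p_common = priority(beta=0.5, holders=2, c_i=10.0, w_t=10.0, d_ji=12.0)
p_rare = priority(beta=0.5, holders=1, c_i=10.0, w_t=10.0, d_ji=12.0)
print(p_common, p_rare)
```

Setting beta to 1 ranks blocks by rarity alone, and setting it to 0 ranks them by the emergency factor alone.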

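Problem Statement and Formulation - Formulation

Decision variable:

$$x_{kj}^i = \begin{cases} 1, & \text{node } i \text{ should request block } j \text{ from neighbor } k \\ 0, & \text{otherwise} \end{cases}$$

Global block scheduling problem:

$$\max \; \frac{1}{N} \sum_{i \in R} \sum_{j \in D_i} \sum_{k \in NBR_i} P_j^i\, h_{kj}\, x_{kj}^i$$

s.t.

$$\begin{aligned}
\text{a)}\quad & \sum_{j \in D_i} \sum_{k \in NBR_i} x_{kj}^i \le \tau I_i, \\
\text{b)}\quad & \sum_{i \in NBR_k} \sum_{j \in D_i} x_{kj}^i \le \tau O_k, \\
\text{c)}\quad & \sum_{j \in D_i} x_{kj}^i \le \tau E_{ki}, \\
\text{d)}\quad & \sum_{k \in NBR_i} x_{kj}^i \le 1, \\
\text{e)}\quad & x_{kj}^i \in \{0, 1\}.
\end{aligned}$$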
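To make the formulation concrete, the sketch below enumerates every schedule of a tiny hypothetical overlay and keeps the best one that respects the inbound, outbound and end-to-end limits. The instance (one source and two nodes), the function name `best_schedule`, and the convention that bandwidths are counted in blocks per second are all assumptions. Exhaustive search only works at toy sizes.

```python
from itertools import product


def best_schedule(tau, I, O, E, holds, P, absent, N):
    """Exhaustively maximize (1/N) * sum of priorities of requested blocks."""
    slots = [(i, j) for i in absent for j in absent[i]]
    # Each (node, block) slot picks one supplier or None, so "at most one
    # supplier per block" and the 0/1 restriction hold by construction.
    options = [
        [None] + [k for (k, r) in E if r == i and j in holds[k]]
        for i, j in slots
    ]
    best_value, best = -1.0, None
    for choice in product(*options):
        picked = [(i, j, k) for (i, j), k in zip(slots, choice) if k is not None]
        inbound_ok = all(
            sum(1 for i2, _, _ in picked if i2 == i) <= tau * I[i] for i in absent
        )
        outbound_ok = all(
            sum(1 for _, _, k2 in picked if k2 == k) <= tau * O[k] for k in O
        )
        link_ok = all(
            sum(1 for i2, _, k2 in picked if (k2, i2) == ki) <= tau * E[ki]
            for ki in E
        )
        if inbound_ok and outbound_ok and link_ok:
            value = sum(P[i, j] for i, j, _ in picked) / N
            if value > best_value:
                best_value, best = value, picked
    return best_value, best


# Hypothetical instance: node 0 is the source, nodes 1 and 2 are receivers.
holds = {0: {1, 2, 3}, 1: {1}, 2: {2}}          # blocks each node holds
absent = {1: [2, 3], 2: [1, 3]}                 # blocks each receiver misses
I = {1: 2, 2: 2}                                # inbound, blocks per second
O = {0: 1, 1: 1, 2: 1}                          # outbound, blocks per second
E = {(0, 1): 2, (0, 2): 2, (1, 2): 2, (2, 1): 2}  # (supplier, receiver) links
P = {(1, 2): 1, (1, 3): 2, (2, 1): 1, (2, 3): 1}  # priority per (node, block)

value, plan = best_schedule(tau=1, I=I, O=O, E=E, holds=holds, P=P, absent=absent, N=2)
print(value, plan)
```

In this instance the source can upload only one block per period, so the best plan spends it on the highest-priority request and lets the two receivers serve each other.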

Notation and definitions

$N$: the overlay has $N+1$ nodes, where node 0 is the source node
$I_i$: the inbound bandwidth of node i
$O_i$: the outbound bandwidth of node i
$E_{ik}$: the end-to-end available bandwidth between node i and node k
$h_{kj} \in \{0,1\}$: block availability; $h_{kj} = 1$ denotes that node k holds block j, otherwise $h_{kj} = 0$
$NBR_i$: set of neighbors of node i
$\tau$: period of requesting new blocks
$W_T$: the exchanging window size
$C_i$: the current playout time of node i
$d_j^i$: playout time of block j at node i
$D_i$: set of all absent blocks in the current exchanging window of node i

Global Optimal Solution

Convert the global block scheduling formulation into an equivalent min-cost flow problem

[Figure: the min-cost flow network for an example overlay with source node 0 and nodes 1, 2 and 3. Vertices: $v_S$, $v_T$, $v_k^s$, node vertices $v_{ki}^n$, block vertices $v_{ij}^b$, and $v_i^r$. Arcs are labeled (capacity, unit cost): outbound constraints $(\tau O_k, 0)$; end-to-end constraints $(\tau E_{ki}, 0)$; block availability $(1, 0)$; duplicate constraints $(1, -P_j^i/N)$; inbound constraints $(\tau I_i, 0)$; and one arc labeled $(\infty, -M)$.]

Global Optimal Algorithm

Proposition: The optimal objective value of the global block scheduling problem has the same absolute value as the minimum flow amount of its corresponding min-cost network flow problem. The flow amount on arc $(v_{ki}^n, v_{ij}^b)$, which is in $\{0, 1\}$, is exactly the value of $x_{kj}^i$, which is the solution to the optimal block scheduling.

Algorithm complexity: $O(nm(\log\log U)\log(nC))$, where n and m are the numbers of vertices and arcs, and U and C are the largest magnitudes of arc capacity and arc cost
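The conversion can be sketched with a small solver. This is not the algorithm whose complexity is quoted above: it is a plain successive-shortest-paths solver with Bellman-Ford, adequate only for toy graphs. The arc layout (source, per-supplier vertex, per-link node vertex, block vertex, per-receiver vertex, sink) is inferred from the figure labels, the return arc with cost -M is emulated by pushing as much flow as possible, and the instance and all names are hypothetical.

```python
from collections import defaultdict


class MinCostFlow:
    """Successive shortest paths with Bellman-Ford (toy sizes only)."""

    def __init__(self):
        self.adj = defaultdict(list)
        self.arcs = []  # each arc: [head, residual capacity, unit cost]

    def add_arc(self, u, v, cap, cost):
        self.adj[u].append(len(self.arcs))
        self.arcs.append([v, cap, cost])
        self.adj[v].append(len(self.arcs))
        self.arcs.append([u, 0, -cost])  # paired residual arc at index a ^ 1

    def _shortest_path(self, s):
        dist, prev = {s: 0.0}, {}
        for _ in range(len(self.adj)):
            changed = False
            for u in list(dist):
                for a in self.adj[u]:
                    v, cap, cost = self.arcs[a]
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")) - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, a
                        changed = True
            if not changed:
                break
        return dist, prev

    def solve(self, s, t):
        flow, cost = 0, 0.0
        while True:
            dist, prev = self._shortest_path(s)
            if t not in dist:
                return flow, cost
            path, v = [], t
            while v != s:
                path.append(prev[v])
                v = self.arcs[prev[v] ^ 1][0]  # tail of the arc entering v
            push = min(self.arcs[a][1] for a in path)
            for a in path:
                self.arcs[a][1] -= push
                self.arcs[a ^ 1][1] += push
            flow += push
            cost += push * dist[t]


def build_global_network(tau, I, O, E, holds, P, absent, N):
    net = MinCostFlow()
    for k, bw in O.items():
        net.add_arc("S", ("s", k), tau * bw, 0.0)                 # outbound
    for (k, i), bw in E.items():
        net.add_arc(("s", k), ("n", k, i), tau * bw, 0.0)         # end-to-end
        for j in absent[i]:
            if j in holds[k]:
                net.add_arc(("n", k, i), ("b", i, j), 1, 0.0)     # availability
    for i, blocks in absent.items():
        for j in blocks:
            net.add_arc(("b", i, j), ("r", i), 1, -P[i, j] / N)   # duplicate
        net.add_arc(("r", i), "T", tau * I[i], 0.0)               # inbound
    return net


def read_schedule(net):
    """Return {(node i, block j): supplier k} for every unit of flow on an n->b arc."""
    plan = {}
    for a in range(0, len(net.arcs), 2):
        tail, head = net.arcs[a ^ 1][0], net.arcs[a][0]
        if net.arcs[a ^ 1][1] > 0 and isinstance(tail, tuple) and tail[0] == "n":
            plan[(head[1], head[2])] = tail[1]
    return plan


# Hypothetical instance: node 0 is the source; bandwidths in blocks per second.
holds = {0: {1, 2, 3}, 1: {1}, 2: {2}}
absent = {1: [2, 3], 2: [1, 3]}
net = build_global_network(
    tau=1, I={1: 2, 2: 2}, O={0: 1, 1: 1, 2: 1},
    E={(0, 1): 2, (0, 2): 2, (1, 2): 2, (2, 1): 2},
    holds=holds, P={(1, 2): 1, (1, 3): 2, (2, 1): 1, (2, 3): 1},
    absent=absent, N=2,
)
flow, cost = net.solve("S", "T")
schedule = read_schedule(net)
print(flow, cost, schedule)
```

Each unit of flow on an arc from a node vertex to a block vertex is read back as one request, which is how the sketch recovers the decision variables from the flow.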

Distributed Algorithm

We first use a simple way to estimate the bandwidth that is available from each neighbor with historical information.

$q_{ki}^{(m)}$: the total number of blocks that arrived at node i from neighbor k in the m-th period.

$W_{ki}^{(m+1)}$: the estimated bandwidth from node k to node i:

$$W_{ki}^{(m+1)} = \frac{\sum_{l=m-M+1}^{m} q_{ki}^{(l)}}{M\tau} \qquad (3)$$
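The estimate is a moving average over the last M request periods. The sketch below assumes the estimate is expressed as a rate, that is, the blocks received in the last M periods divided by M times the period length tau; the function name and the sample history are hypothetical.

```python
def estimate_bandwidth(history, M, tau):
    """Average arrival rate (blocks per second) over the last M periods.

    history[-1] is q_ki for the most recent period m.
    """
    recent = history[-M:]
    return sum(recent) / (M * tau)


# Blocks received from one neighbor in four consecutive 2-second periods.
q_history = [4, 6, 5, 7]
w_next = estimate_bandwidth(q_history, M=3, tau=2.0)
print(w_next)
```

Multiplying the estimate by tau gives the number of blocks the node may expect from that neighbor in the next period.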

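Distributed Algorithm

With the estimated available bandwidth, a local block scheduling is performed on each node

It can also be transformed into an equivalent min-cost network flow problem for the local optimal request:

$$\max \sum_{j \in D_i} \sum_{k \in NBR_i} P_j^i\, h_{kj}\, x_{kj}^i \qquad (2)$$

s.t.

$$\begin{aligned}
& \sum_{k \in NBR_i} x_{kj}^i \le 1, \qquad \sum_{j \in D_i} \sum_{k \in NBR_i} x_{kj}^i \le \tau I_i, \\
& \sum_{j \in D_i} x_{kj}^i \le \tau W_{ki}, \qquad x_{kj}^i \in \{0, 1\}
\end{aligned}$$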

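[Figure: the local min-cost flow network of node 1 with neighbors 0, 2 and 3. Vertices: $v_S$, $v_T$, node vertices $v_{01}^n$, $v_{21}^n$, $v_{31}^n$, block vertices $v_{14}^b$, $v_{16}^b$, $v_{17}^b$, $v_{18}^b$, and $v_1^r$. Arcs are labeled (capacity, unit cost): $(\tau W_{01}, 0)$, $(\tau W_{21}, 0)$, $(\tau W_{31}, 0)$; $(1, 0)$; $(1, -P_4^1)$, $(1, -P_6^1)$, $(1, -P_7^1)$, $(1, -P_8^1)$; $(\tau I_1, 0)$; and one arc labeled $(\infty, -M)$.]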

Distributed Algorithm

Heuristic distributed algorithm:

Node i estimates the bandwidth $W_{ki}^{(m+1)}$ that its neighbor k can allocate to it in the (m+1)th period from the traffic received from that neighbor in the previous M periods, as shown in equation (3);

Based on $W_{ki}^{(m+1)}$, node i performs the local block scheduling (2) using the min-cost network flow model. The results $x_{kj}^i \in \{0,1\}$ represent whether node i should request block j from neighbor k;

Node i sends the requests to every neighbor.
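The local scheduling step for a single node can be illustrated without a flow solver. The sketch below is a swapped-in technique, not the min-cost flow model of the slides: it considers blocks in decreasing priority and uses augmenting paths to re-route earlier choices when a supplier is full. For this single-node structure with non-negative priorities it is expected to reach the same objective value, but that is stated here as an assumption. All names and the sample data are hypothetical, and capacities are in blocks per period.

```python
def local_schedule(priority, holders, capacity, inbound):
    """Pick a supplier per block for one node.

    priority: {block: P}, holders: {block: [neighbors holding it]},
    capacity: {neighbor: blocks it can send this period}, inbound: block budget.
    """
    assigned = {k: [] for k in capacity}  # neighbor -> blocks requested from it

    def try_assign(j, seen):
        for k in holders.get(j, []):
            if k in seen:
                continue
            seen.add(k)
            if len(assigned[k]) < capacity[k]:
                assigned[k].append(j)
                return True
            # Neighbor k is full: try to move one of its blocks elsewhere.
            for other in list(assigned[k]):
                if try_assign(other, seen):
                    assigned[k].remove(other)
                    assigned[k].append(j)
                    return True
        return False

    served = 0
    for j in sorted(priority, key=priority.get, reverse=True):
        if served >= inbound:
            break
        if try_assign(j, set()):
            served += 1
    return {j: k for k, blocks in assigned.items() for j in blocks}


# Node 1 misses blocks 4, 6, 7, 8; its neighbors are 0, 2 and 3.
plan = local_schedule(
    priority={6: 5.0, 4: 4.0, 7: 3.0, 8: 1.0},
    holders={4: [0], 6: [0, 2], 7: [2], 8: [3]},
    capacity={0: 1, 2: 1, 3: 1},
    inbound=3,
)
print(plan)
```

Block 6 is first placed on neighbor 0 and then moved to neighbor 2 so that block 4, which only neighbor 0 holds, can also be requested.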

Performance Evaluation - Compared Scheduling Methods

Random Strategy: each node assigns each desired block randomly to a neighbor that holds that block. Chainsaw uses this simple strategy.

Local Rarest First (LRF) Strategy: the block that has the fewest owners among the neighbors is requested first. DONet adopts this strategy.

Round Robin (RR) Strategy: each desired block is assigned to one neighbor, in a prescribed order, in a round-robin way. If there are multiple available senders, the block is assigned to the sender that has the maximum surplus available bandwidth.
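The three baselines can be sketched side by side. Several details are assumptions made for illustration: the prescribed order for round robin is taken to be ascending block number, the LRF sketch reuses the maximum-surplus rule to pick a supplier (the slide does not say how LRF chooses one), and the names and sample data are hypothetical.

```python
import random


def random_strategy(wanted, suppliers, rng):
    """Assign each wanted block to a random neighbor that holds it."""
    return {j: rng.choice(suppliers[j]) for j in wanted if suppliers[j]}


def assign_in_order(order, suppliers, budget):
    """Walk blocks in the given order; pick the holder with the most surplus."""
    left = dict(budget)
    plan = {}
    for j in order:
        usable = sorted(k for k in suppliers[j] if left[k] > 0)
        if usable:
            k = max(usable, key=left.get)  # first name wins on ties
            plan[j] = k
            left[k] -= 1
    return plan


def round_robin(wanted, suppliers, budget):
    return assign_in_order(sorted(wanted), suppliers, budget)


def local_rarest_first(wanted, suppliers, budget):
    order = sorted(wanted, key=lambda j: (len(suppliers[j]), j))
    return assign_in_order(order, suppliers, budget)


wanted = [1, 2, 3]
suppliers = {1: ["A", "B"], 2: ["A", "B"], 3: ["A"]}
budget = {"A": 2, "B": 1}  # blocks each neighbor can send this period

rr_plan = round_robin(wanted, suppliers, budget)
lrf_plan = local_rarest_first(wanted, suppliers, budget)
rnd_plan = random_strategy(wanted, suppliers, random.Random(0))
print(rr_plan, lrf_plan, rnd_plan)
```

In this small case round robin exhausts neighbor A before reaching block 3, which only A holds, while rarest-first serves all three blocks. The random strategy ignores the budgets entirely.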

Simulation Configuration

For a fair comparison, all the experiments use the same simple algorithm for overlay construction
Delivery ratio: the number of blocks that arrive at each node before the playback deadline over the total number of blocks encoded
DSL nodes:
Download bandwidth: 40% 512 Kbps, 30% 1 Mbps, 30% 2 Mbps
Upload bandwidth: half of the download bandwidth
500 nodes
Each node has 15 neighbors
Request period: 2 seconds
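The metric and the bandwidth mix can be written down directly. In this sketch the function names are hypothetical, 1 Mbps is taken as 1000 Kbps, and a block that arrives exactly at its deadline is counted as on time; both conventions are assumptions.

```python
import random


def delivery_ratio(arrival, deadline, total_blocks):
    """Fraction of encoded blocks that reached the node by their deadline.

    arrival: {block: arrival time}, deadline: {block: playback deadline}.
    Blocks that never arrived are simply absent from `arrival`.
    """
    on_time = sum(1 for j, t in arrival.items() if t <= deadline[j])
    return on_time / total_blocks


def sample_dsl_node(rng):
    """Draw (download, upload) in Kbps: 40% 512, 30% 1000, 30% 2000."""
    down = rng.choices([512, 1000, 2000], weights=[40, 30, 30])[0]
    return down, down / 2


ratio = delivery_ratio(
    arrival={1: 0.5, 2: 2.5, 3: 2.0},
    deadline={1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
    total_blocks=4,
)
down, up = sample_dsl_node(random.Random(1))
print(ratio, down, up)
```

Here block 2 arrives late and block 4 never arrives, so two of the four encoded blocks count toward the ratio.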

Simulation Results

All are DSL nodes with exchanging window of 10 sec and bottlenecks only at the last mile. Group size is 500

[Figure: average delivery ratio (0.6 to 1) versus streaming rate (250 to 500 Kbps) for the global optimal algorithm, the proposed distributed algorithm, the DONet method, the one-layer PALS method, and the Chainsaw method.]

Simulation Results

All are DSL users with exchanging window of 10 sec and end-to-end available bandwidth 10~150Kbps. Group size is 500

[Figure: average delivery ratio (0.4 to 1) versus streaming rate (250 to 500 Kbps) for the global optimal algorithm, the proposed distributed algorithm, the DONet method, the one-layer PALS method, and the Chainsaw method.]

Conclusion & Future Work

The contributions of this paper are twofold.

First, to the best of our knowledge, we are the first to theoretically address the streaming scheduling problem in data-driven (swarming based) streaming protocols.

Second, we give the optimal scheduling algorithm under different bandwidth constraints, as well as a distributed asynchronous algorithm that can be practically applied in real systems and outperforms existing methods by about 10%~80%.

Future work

How to do optimization over a horizon of several periods, taking into account the inter-dependence between the periods.

How to do optimal scheduling with scalable video coding (such as layered video coding) or multiple description coding.

Thanks

Q&A
