BSI P2P Streaming

BSIP2P Streaming

Davide FreyWIDE TeamINRIA

1

Some slides by: Y. Chen, R. Karki, A-M Kermarrec, M. Monod, M. Zhang,

Large-scale broadcast/multicast

Application-level multicast (ALM)

1. Structured peer to peer networks¡ Flooding¡ Tree-based

2. Content streaming (today)¡ Multiple Trees¡ Mesh¡ Gossip

Setting

Regular TV: everything HD

A source produces multimedia contentn viewers (n large)

broadcasting

…

……

IP TV, Web TV, P2P TV, …

vs192K requests/day

78K users/day244K simultaneous users (incl.

VoD)BBC iStats (April 2010)

Streaming Basics

Stream rate s [kbps]

n viewers want to receive s

Demand = Supply

t0 t1 t2 t3

Content split into chunks disseminatio

n

time-critical largeordered

multimedia content…

n viewers (n large)

Intuitive solution

Participants are pure consumer

... scalability …IP Multicast

• “Centralized” solution

Let’s be smarter

“Decentralized” solution

overlay

Participants collaborate…most of them!

Evaluation MetricsStream lag

• Time difference between creation at the source and delivery to the

clients’ player

• Also: delay penalty (delay wrt IP multicast)

Hop count

Stream quality• Maximum 1% jitter means at least 99% of the groups are complete

= 99%-playbackIncomplete groups does not mean “blank”

• Also: delivery-ratio or continuity index

vs

t

Tree-based ALM

Streaming Approaches

s1

s1

s2/2s2/2

s2

s3

s1 is constrained by design

DisconnectionBuild/maintain

tree

Upload of nodes:multiple of s2/z

Partial disconnectionBuild/maintain

z trees

s3 optimal

Connected is not enough

Peer selection,Packet scheduling

Single tree Multiple trees Mesh/Gossip

Addressing the Limitations of Trees

00 MOIS 2011EMETTEUR - NOM DE LA PRESENTATION - 10

Some peers do not forward

Multiple Trees Mesh/Gossip

SplitStream approach

Content divided in stripesEach stripe is distributed on an independent tree

[SOSP 2003 « SplitStream: High-Bandwidth Multicast in Cooperative Environment »]

s2/2s2/2

s2• Load balancing– Internal nodes in one tree are leaves in

others• Reliability

– Failure of a node leads to unavailabilityof x stripes if parents are independentand using appropriate coding protocols

SplitStream approach

Content divided in stripesEach stripe is distributed on an independent tree

[SOSP 2003 « SplitStream: High-Bandwidth Multicast in Cooperative Environment »]

s2/2s2/2

s2• Load balancing– Internal nodes in one tree are leaves in

others• Reliability

– Failure of a node leads to unavailabilityof x stripes if parents are independentand using appropriate coding protocols

Tree-based ALM: unbalanced

B

C

E

F

D

A

G

IN: n kb/secOUT: 2n kb/sec

The SplitStream Forest

B

C

EF

DA

G D

E G

B A F C

A

B C

F G D E

N kb/sec

N/2 kb/sec N/2 kb/sec

SplitStream

Construct one tree/group per data stripe

Each stripe identifier starts with a different

digit (up to 16 independent stripes)

0x 1x fx

…..

SplitStream

Main goal: build and maintain multiple multicast trees in a fully decentralized and reliable way so that

• Each client receives the desired number of stripes• Independent trees • Control upon bandwidth allocation • Reasonable latency and network load

Leverage Scribe/Pastry• Pastry: P2P routing infrastructure (structured,

efficient, reliable) • Scribe: decentralized and efficient tree-based protocol

SplitStream: forest management

Constraints• Limited out-degree potentially increases the tree depth • Load balancing to ensure within trees and between trees• Failure independence of trees Scribe Solution:• Overloaded nodes push descendants downIneffective because trees are correlated:

Leaf in one tree is interior in another

Splitstream Solution: spare capacity tree • Always accept new children• Underloaded nodes join spare capacity tree • Overloaded nodes give up descendants• Discard children with shortest prefix match• Orphans anycast to the spare capacity tree to discover new

parents

Spare capacity tree

A

D

F G

C

E

Anycast: SCRIBEdelivers to physicallyclose node

{0,3,A}Cap: 2 {1,..,F}

Cap: 4

Adopting• Loop checking• Descendant switching

A SCRIBE group -> Tree

Perform DFS looking for correct stripe(s)

Experiments

19

Simulations (average on 10 runs) • Topologies GT, Mercator, MS Corp.• 40000 nodes

Pastry (b=4, leafset = 16)SplitStream : 16 stripesConfigurations in-degree x out-degree

• Impact of spare capacity 16x16, 16x18, 16x32 and 16xNB• Impact of capacity/needs (Gnutella)

Failure resilience • Path diversity • Catastrophic failures (25% of faulty nodes) in a 10,000 node system

Results • Forest construction• Multicast performance

Forest construction: load on each node

00,10,20,30,40,50,60,70,80,9

1

1 10 100 1000 10000

Node Stress

Cum

ulat

ive

prop

ortio

n of

nod

es

16 x NB16 x 3216 x 1816 x 16

Forest construction: load on each node

Configuration 16x16 16x18 16x32 16xNB

Max 2971 1089 663 472

Mean 57.2 52.6 35.3 16.9

Med 49.9 47.4 30.9 12

Load decreases as the spare capacity increases 16xNB: no pushdown nor orphans•16x16: each node contacts the spare capacity tree for 8 stripes on average • Nodes with id close to the spare capacity tree get the highest load

Forest construction: network load

Configuration 16x16 16x18 16x32 16xNB

Max 5893 4285 2876 1804

Mean 74.1 65.2 43.6 21.2

Med 52.6 48.8 30.8 17

Load decreases as the spare capacity increases Maximum approx. 7 times less than centralized system

Measured as the number of msg on physical links

Multicast: link stress

Configuration Centralized(0.43)

Scribe(0.47)

IP(0.43)

16x16(0.98)

16x18 16x32 16xNB

Max 639984 3990 16 1411 1124 886 1616

Mean 128.9 39.6 16 20 19 19 20

Med 16 16 16 16 16 16 16

One message/stripe, no failure

•16xNB : absence of forwarding bounds causes contention on a small•Set of links•Splitstream uses a larger fraction of links but load them less

Delay penalty during multicast

0

2

4

6

8

10

12

14

16

0 0,5 1 1,5 2 2,5

Delay penalty

Cum

ulat

ive

strip

es

RAD (16 x NB)

RAD (16 x 32)

RAD (16 x 18)

RAD (16 x 16)

Path diversity

Configuration 16x16 16x32 16xNB

Max 6.8 6.6 1

Mean 2.1 1.7 1

Med 2 2 1

•Number of lost stripes (at most) on each node when the most significant ancestor is faulty (worst case scenario)

Catastrophic failure (25% of 10,000 nodes are faulty): number of received stripes

• 14 stripes after 30 s • Total repair after less than 3mn

Catastrophic failure (25% of 10,000 nodesare faulty): number of messages

Addressing the Limitations of Trees

00 MOIS 2011EMETTEUR - NOM DE LA PRESENTATION - 28

Some peers do not forward

Multiple Trees Gossip

Mesh vs Gossip

t

. . . . . .

. . . . . .

= = =

= = =

Gossip, f = 2

View:

View:(≥fanout)

Beyond mesh: Gossip

2

2

4

2

32

Can you see any problem?

Gossip-based dissemination

Beyond mesh: Gossip

Gossip-based dissemination

2

2

4

2

32

Great for small updates (e.g., databases)Duplicates are a problem for large content…

Three-Phase Gossip

Testing Gossip for Live StreamingGrid’5000 PlanetLab

Nodes 200 (40*5) 230-300

BW cap Token bucket

(200KB)

Throttling

Transport layer UDP + losses (1-

5%)

UDP

Stream rate s 680 kbps 551 kbps

FEC 5% 10%

Stream (incl. FEC) 714 kbps 600 kbps

Tg (gossip period) 200 ms 200-500 ms

fanout (f) 8 7-8

source’s fanout 5 7

Retransmission ARQ/Claim ARQ

Membership RPS (Cyclon) and full membership

En

vir

on

me

nt

Gossip

Gossip – Theory1. fanout = ln(n) + c

P[connected graph] goes to exp(-exp(-c))

2. Holds as long as the fanout is ln(n) + c on average

0

0,2

0,4

0,6

0,8

1

ln(n)-10 ln(n)-5 ln(n) ln(n)+5 ln(n)+10

c=1 → 69%

c=2 → 87% c=3 → 95%

c=-1 → 7%

c=0 → 37%

Paul Erdős & Alfréd Rényi

Fanout

P[co

nnec

ted

grap

h]

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80

Perc

enta

ge o

f nod

es v

iew

ing

the

stre

amw

ith le

ss th

an 1

% ji

tter

Fanout

offlineviewing20s lag

Gossip Practice

Increasing fanout

Theory• More robust• Faster dissemination

Practice• Heavily requested nodes exceed their bandwidth

PlanetLab (230)700 kbps caps = 600 kbps

Stretching Gossip

36

Fanout

ProactivenessHow often should a node change its fanout partners?

The larger the better?

0

10

20

30

40

50

60

70

80

90

100

1 10 100 1000

Perc

enta

ge o

f nod

es v

iew

ing

the

stre

amw

ith le

ss th

an 1

% ji

tter

Change partner every X gossip period(s)

offlineviewing20s lag

Optimal proactiveness

37

∞

PlanetLab (230)700 kbps caps = 600 kbps

f = 7

Different dissemination tree for each chunk:• Ultimate way of

splitting the stream

Gossip is load-balancing…

38

Proposals arrive randomly• Nodes pull from first proposal

Highly-dynamic

S

p1

q

p2

p3

S qS

q

Node q will serve f nodes whp Node q will serve f nodes wlp

. . .

… but the world is heterogeneous!

3 classes (691kbps avg):

Load-balancing

Capability

512kbps85%

3Mbps5%

1Mbps10%

0102030405060708090

100

0 5 10 15 20 25 30 35 40 45 50 55 60

Perc

enta

ge o

f nod

es (C

DF)

Stream lag (s)

Percentage of nodes receiving at least 99% of the

stream

Standard gossip – 691kbps

No cap

Standard gossip – flat 691 kbps

vs

How to cope with heterogeneity?

Goal: contribute according to capability

Propose more = serve more• Increase fanout…

… and decrease it too!

Such that• average fanout (favg) ≥ initial fanout = ln(n) + c

Heterogeneous Gossip - HEAP

q and r with bandwidths bq > br

• q should upload bq / br times as much as r

Who should increase/decrease its contribution?… and by how much?

How to ensure reliability?• How to keep favg constant?

Capability

Contribute according to capability

HEAP

Total/average contribution is equal in both homogeneous and

heterogeneous settings

fq = finit ∙ bq /bavg

…ensuring the average fanout is constant and equal to finit =

ln(n) + c

bavg

Capability

HEAP

Get bavg with (gossip) aggregation• Advertize own and freshest received capabilities• Aggregation follows change in the capabilities

Get n with (gossip) size estimation• Estimation follows change in the system

Join/leaveCrashes…

0

10

20

30

40

50

60

70

80

90

100

0 5 10 15 20 25 30 35 40 45 50 55 60

Perc

enta

ge o

f nod

es (C

DF)

Stream lag (s)

Percentage of nodes receiving at least 99% of the stream

Stream lag reduction

Standard gossip – 691kbps

HEAP – 691kbps

No cap

Standard gossip – flat 691kbps

Quality improvement

Stream lag of 10s

0

10

20

30

40

50

60

70

80

90

100

Standard Gossip HEAP

Jitter-free percentage of the stream

512kbps1Mbps3Mbps

Stream lag

For those who can have a jitter-free stream

0

5

10

15

20

25

30

35

40

45


Average stream lag to obtain a jitter-free stream

512kbps

1Mbps

3Mbps

Stre

am la

g (s

)

0

512

1024

1536

2048

2560

3072


Average bandwidth usage by bandwidth class

512kbps 1Mbps

3Mbps

Proportional contribution

99.8

9%

91.5

6% 48.4

4%

94.3

8%

90.5

8%

87.5

6%

20% nodes crashing

0

20

40

60

80

100

0 30 60 90 120 150

Perc

enta

ge o

f nod

es re

ceiv

ing

each

gr

oup

Failure of 20% of the nodes at time t=60s

HEAP - 12s lag

Standard Gossip - 20s lag

Standard Gossip - 30s lag

Stream time (s)

About Bandwidth Limitation• Token Bucket• Leaky Bucket

By Graham.Fountain at English Wikipedia, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=35271394

By Graham.Fountain [CC BY 3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons

Unbounded Leaky Bucket

50

Bounded Leaky Bucket

51

Token Bucket

Summary

Multiple Trees• Effective but hard to split bw perfectly

Mesh• Easier to build but efficiency – delay tradeoff• Packet scheduling can improve performance

Gossip• Improves over mesh by making it dynamic

Pull-Push (we have not seen this in the course, but you can read the following slides)

• Use mesh to identify trees

iGridMedia

Pull-based protocols are effective• Select neighbors from unstructured overlay• Periodically notify neighbors of available packets• Neighboring nodes request packets

Nearly optimal • bandwidth utilization• Throughput

Without intelligent scheduling and bw measurement

TradeoffPull-based streaming

• The near-optimality is achieved at the cost of tradeoff between control overhead and delay.

9Delay

Cont

rol o

verh

ead

Depends on how frequently the notifications are sent.

Pull-based method: Protocol

13

All the nodes self-organize into a random graph.

root

1 2 3

54

Overlay Construction

Contact rendezvous point

Randomly find set of partners • RPS can be used

Build (static) random graph

Push/Pull MethodPull-based method: Protocol

• Each node periodically sends buffer map packets to notify all its neighbors about the packets it has in its buffer.

15

root

1 2 3

54

1 2 4 1 2 3

2 31 2

I have 1,2 I have 2,3

I have 1,2, 4 I have 1,2, 3

• Pull Part

Pull-Push method

Split stream as in SplitstreamPull-push hybrid method: Protocol• Overlay construction is done as before.

36

1. Partition stream evenly into n sub streams.

Pull-Push method

Peers periodically ask for buffer maps

Pull according to buffer maps

Once a node received a packet in group 0 of one packet

party• Send subscription for corresponding substream

Sender will push all packets in the same substream

Pull-Push method

Stop requesting maps when 95% delivery rate with pushed

packets

If delivery rate drops, request againPull-push hybrid method: Protocol

41

95%

Del

iver

y Ra

tio

0%Pushed packets

Stop requesting for buffer mapsStart requesting for buffer maps

Note: figure is only approximate.

•When over 95% packets are pushed, the node will stop requesting for buffer maps.•When delivery ratio drops below 95%, start requesting again.•Pushed but lost packets are “pulled” after a timeout.

Performance

Considerably

smaller delays

Pull-push hybrid method : Evaluation by simulation – Results

44

Playback delays are considerably smaller in push-pull method.

Overhead

Much smaller than

for pull-only

Pull-push hybrid method : Evaluation by simulation – Results

45

The overhead of push-pull hybrid method is much smaller than that of pull-based method.

PlanetLab

Push-pull hybrid method: Evaluation on PlanetLab• Configuration is the same as before.

46

Summary

Multiple Trees• Effective but hard to split bw perfectly

Mesh• Easier to build but efficiency – delay tradeoff• Packet scheduling can improve performance

Gossip• Improves over mesh by making it dynamic

Pull-Push• Use mesh to identify trees

References

M. Castro, P. Druschel, A-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Lake Bolton, New York, October, 2003.

M. Castro, M. B. Jones, A-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang and A. Wolman, "An Evaluation of Scalable Application-level Multicast Built Using Peer-to-peer overlays", Infocom 2003, San Francisco, CA, April, 2003.

Zhang, X.Z.X. et al., 2005. CoolStreaming/DONet: a data-driven overlay network for peer-to-peer live media streaming. Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 3(c), p.2102-2111. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1498486.

Picconi, F. & Massoulie, L., 2008. Is There a Future for Mesh-Based live Video Streaming? 2008 Eighth International Conference on PeertoPeer Computing, p.289-298. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4627291.

Zhang, M.Z.M. et al., 2008. iGridMedia: Providing Delay-Guaranteed Peer-to-Peer Live Streaming Service on Internet. IEEE GLOBECOM 2008 2008 IEEE Global Telecommunications Conference, p.1–5. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4698112.

Meng Zhang; Qian Zhang; Lifeng Sun; Shiqiang Yang; , "Understanding the Power of Pull-Based Streaming Protocol: Can We Do Better?," Selected Areas in Communications, IEEE Journal on , vol.25, no.9, pp.1678-1694, December 2007

Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod. Boosting Gossip for Live Streaming. P2P 2010, Aug 2010, Delft, Netherlands.

Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod; Koldehofe Boris; Mogensen Martin; Vivien Quéma. Heterogeneous Gossip. Middleware 2009, Dec 2009, Urbana-Champaign, IL, United States.

Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod; Vivien Quéma. Stretching Gossip with Live Streaming. DSN 2009, Jun 2009, Estoril, Portugal.

http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1498486

http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4627291

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4698112

http://hal.inria.fr/inria-00517384/PDF/PID1038858.pdf

http://hal.inria.fr/inria-00436125/PDF/heap7452.pdf

http://hal.inria.fr/inria-00436130/PDF/proactive.pdf

Documents

BSI P2P Streaming