BSIP2P Streaming
Davide Frey, WIDE Team, INRIA
Some slides by: Y. Chen, R. Karki, A-M Kermarrec, M. Monod, M. Zhang,
Large-scale broadcast/multicast
Application-level multicast (ALM)
1. Structured peer-to-peer networks
• Flooding
• Tree-based
2. Content streaming (today)
• Multiple Trees
• Mesh
• Gossip
Setting
Regular TV: everything HD
A source produces multimedia content
n viewers (n large)
broadcasting
vs
IP TV, Web TV, P2P TV, …
• 192K requests/day
• 78K users/day
• 244K simultaneous users (incl. VoD)
BBC iStats (April 2010)
Streaming Basics
Stream rate s [kbps]
n viewers want to receive s
Demand = Supply
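The "Demand = Supply" condition can be checked with a few lines of Python. This is a minimal sketch, assuming that demand is n times the stream rate s and that supply is the sum of all upload capacities; the function name and the example capacities are illustrative and not taken from the slides.

```python
def supply_covers_demand(stream_rate_kbps, upload_caps_kbps, source_upload_kbps=0.0):
    """Return True if the total upload capacity can feed every viewer at rate s."""
    demand = len(upload_caps_kbps) * stream_rate_kbps      # n viewers, each wants s
    supply = source_upload_kbps + sum(upload_caps_kbps)    # source plus all viewers
    return supply >= demand


# 1000 viewers that upload nothing: the source alone must provide n * s.
no_upload = supply_covers_demand(600, [0.0] * 1000, source_upload_kbps=10_000)
# 1000 viewers that each upload 700 kbps: supply exceeds demand.
with_upload = supply_covers_demand(600, [700.0] * 1000, source_upload_kbps=600)
print(no_upload, with_upload)
```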
[Figure: timeline t0, t1, t2, t3]
Content split into chunks
Dissemination of time-critical, large, ordered multimedia content…
n viewers (n large)
Intuitive solution
• “Centralized” solution
• Participants are pure consumers
• ... scalability …
• IP Multicast
Let’s be smarter
• “Decentralized” solution
• Overlay
• Participants collaborate… most of them!
Evaluation Metrics
Stream lag
• Time difference between creation at the source and delivery to the clients’ player
• Also: delay penalty (delay wrt IP multicast)
• Hop count
Stream quality
• Maximum 1% jitter means at least 99% of the groups are complete (= 99%-playback)
• An incomplete group does not mean “blank”
• Also: delivery ratio or continuity index
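The stream quality metric can be sketched as follows. The assumption here is that a group counts as complete when all of its packets arrived before the playback deadline, and that jitter is the fraction of incomplete groups; the function names and the group size of 10 packets are hypothetical.

```python
def jitter(packets_received_per_group, group_size):
    """Fraction of groups that are incomplete (missing at least one packet)."""
    total = len(packets_received_per_group)
    if total == 0:
        return 0.0
    incomplete = sum(1 for got in packets_received_per_group if got < group_size)
    return incomplete / total


def views_with_quality(packets_received_per_group, group_size, max_jitter=0.01):
    """True if at most max_jitter of the groups are incomplete (99%-playback by default)."""
    return jitter(packets_received_per_group, group_size) <= max_jitter


# 100 groups of 10 packets each; one group lost a single packet.
groups = [10] * 99 + [9]
print(jitter(groups, 10), views_with_quality(groups, 10))
```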
Streaming Approaches
Single tree (tree-based ALM), stream rate s1
• s1 is constrained by design
• Disconnection
• Build/maintain tree
Multiple trees, stream rate s2 (e.g., two stripes of s2/2 each)
• Upload of nodes: multiple of s2/z
• Partial disconnection
• Build/maintain z trees
Mesh/Gossip, stream rate s3
• s3 optimal
• Connected is not enough
• Peer selection, packet scheduling
Addressing the Limitations of Trees
Some peers do not forward
Multiple Trees Mesh/Gossip
SplitStream approach
Content divided in stripes
Each stripe is distributed on an independent tree
[SOSP 2003, "SplitStream: High-Bandwidth Multicast in Cooperative Environment"]
[Figure: stream of rate s2 split into two stripes of rate s2/2 each]
• Load balancing
– Internal nodes in one tree are leaves in others
• Reliability
– Failure of a node leads to unavailability of x stripes if parents are independent and using appropriate coding protocols
Tree-based ALM: unbalanced
[Figure: a single multicast tree over nodes A to G; labels: IN: n kb/sec, OUT: 2n kb/sec]
The SplitStream Forest
[Figure: nodes A to G organized in several trees; a stream of N kb/sec is split into two stripes of N/2 kb/sec each]
SplitStream
Construct one tree/group per data stripe
Each stripe identifier starts with a different digit (up to 16 independent stripes): 0x, 1x, …, fx
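A small sketch of the stripe naming scheme above. It assumes hexadecimal identifiers (Pastry with b=4) and assumes, as one reading of "internal nodes in one tree are leaves in others", that a node is interior only in the tree whose stripe identifier shares its first digit. The helper names and the example identifiers are hypothetical.

```python
HEX_DIGITS = "0123456789abcdef"


def stripe_ids(suffix, stripes=16):
    """Build stripe identifiers 0x, 1x, ..., fx that differ in their first digit."""
    if not 1 <= stripes <= 16:
        raise ValueError("at most 16 stripes with hexadecimal digits")
    return [HEX_DIGITS[i] + suffix for i in range(stripes)]


def interior_stripe(node_id, ids):
    """Stripe whose identifier shares the first digit with node_id, or None."""
    for sid in ids:
        if sid[0] == node_id[0].lower():
            return sid
    return None


ids = stripe_ids("a3f")
print(ids[0], ids[-1], interior_stripe("c41b", ids))
```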
SplitStream
Main goal: build and maintain multiple multicast trees in a fully decentralized and reliable way so that
• Each client receives the desired number of stripes
• Independent trees
• Control upon bandwidth allocation
• Reasonable latency and network load
Leverage Scribe/Pastry
• Pastry: P2P routing infrastructure (structured, efficient, reliable)
• Scribe: decentralized and efficient tree-based protocol
SplitStream: forest management
Constraints
• Limited out-degree potentially increases the tree depth
• Load balancing must be ensured within trees and between trees
• Failure independence of trees
Scribe solution
• Overloaded nodes push descendants down
• Ineffective because trees are correlated: a leaf in one tree is interior in another
SplitStream solution: spare capacity tree
• Always accept new children
• Underloaded nodes join the spare capacity tree
• Overloaded nodes give up descendants
• Discard children with shortest prefix match
• Orphans anycast to the spare capacity tree to discover new parents
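The "always accept, then discard" rule can be illustrated with a short sketch. It assumes that the prefix match is measured between a child's stripe identifier and the local node identifier, which is only one possible reading of the slide; the function names, the tuple layout, and the identifiers are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def adopt(children, new_child, node_id, out_degree):
    """Accept new_child, then orphan one child if the out-degree bound is exceeded.

    children and new_child are (child_id, stripe_id) pairs.
    Returns (kept_children, orphan_or_None).
    """
    kept = children + [new_child]
    if len(kept) <= out_degree:
        return kept, None
    # Assumption: discard the child whose stripe shares the shortest prefix with us.
    orphan = min(kept, key=lambda c: shared_prefix_len(c[1], node_id))
    kept.remove(orphan)
    return kept, orphan


kept, orphan = adopt([("n1", "1f00"), ("n2", "1a70")], ("n3", "0c00"), "1a00", 2)
print(kept, orphan)
```

The orphan would then anycast to the spare capacity tree to find a new parent, a step this sketch leaves out.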
Spare capacity tree
• A SCRIBE group -> Tree
• Anycast: SCRIBE delivers to a physically close node
• Perform DFS looking for correct stripe(s)
Adopting
• Loop checking
• Descendant switching
[Figure: spare capacity tree with nodes A, C, D, E, F, G; node labels: {0,3,A} Cap: 2 and {1,..,F} Cap: 4]
Experiments
Simulations (average over 10 runs)
• Topologies: GT, Mercator, MS Corp.
• 40000 nodes
Pastry (b=4, leafset = 16)
SplitStream: 16 stripes
Configurations: in-degree x out-degree
• Impact of spare capacity: 16x16, 16x18, 16x32 and 16xNB
• Impact of capacity/needs (Gnutella)
Failure resilience
• Path diversity
• Catastrophic failures (25% of faulty nodes) in a 10,000 node system
Results
• Forest construction
• Multicast performance
Forest construction: load on each node
[Figure: cumulative proportion of nodes (0 to 1) vs. node stress (log scale, 1 to 10000); curves: 16 x NB, 16 x 32, 16 x 18, 16 x 16]

Configuration   16x16   16x18   16x32   16xNB
Max             2971    1089    663     472
Mean            57.2    52.6    35.3    16.9
Med             49.9    47.4    30.9    12

• Load decreases as the spare capacity increases
• 16xNB: no pushdown nor orphans
• 16x16: each node contacts the spare capacity tree for 8 stripes on average
• Nodes with an id close to that of the spare capacity tree get the highest load
Forest construction: network load
Measured as the number of messages on physical links

Configuration   16x16   16x18   16x32   16xNB
Max             5893    4285    2876    1804
Mean            74.1    65.2    43.6    21.2
Med             52.6    48.8    30.8    17

• Load decreases as the spare capacity increases
• Maximum approx. 7 times less than centralized system
Multicast: link stress
One message/stripe, no failure

Configuration   Centralized (0.43)   Scribe (0.47)   IP (0.43)   16x16 (0.98)   16x18   16x32   16xNB
Max             639984               3990            16          1411           1124    886     1616
Mean            128.9                39.6            16          20             19      19      20
Med             16                   16              16          16             16      16      16

• 16xNB: absence of forwarding bounds causes contention on a small set of links
• SplitStream uses a larger fraction of links but loads them less
Delay penalty during multicast
[Figure: cumulative stripes (0 to 16) vs. delay penalty (0 to 2.5); curves: RAD (16 x NB), RAD (16 x 32), RAD (16 x 18), RAD (16 x 16)]
Path diversity

Configuration   16x16   16x32   16xNB
Max             6.8     6.6     1
Mean            2.1     1.7     1
Med             2       2       1

• Number of lost stripes (at most) on each node when the most significant ancestor is faulty (worst case scenario)
Catastrophic failure (25% of 10,000 nodes are faulty): number of received stripes
• 14 stripes after 30 s
• Total repair after less than 3 min
Catastrophic failure (25% of 10,000 nodes are faulty): number of messages
Addressing the Limitations of Trees
Some peers do not forward
Multiple Trees Gossip
Mesh vs Gossip
[Figure: mesh and gossip overlays over time t; gossip shown with f = 2; view size ≥ fanout]
Beyond mesh: Gossip
Gossip-based dissemination
Can you see any problem?
• Great for small updates (e.g., databases)
• Duplicates are a problem for large content…
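To see why duplicates matter, the following sketch simulates a plain push gossip in which every node forwards the full content to f random partners the first time it receives it. This is a simplified illustration and not the protocol evaluated in the slides; the function name, the seed, and the parameters n=200 and f=8 are assumptions.

```python
import random


def push_gossip(n, fanout, seed=0):
    """Simulate one dissemination; return (nodes reached, messages sent, duplicates)."""
    rng = random.Random(seed)
    received = {0}          # node 0 is the source
    frontier = [0]
    messages = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            # Pick `fanout` distinct partners other than the sender itself.
            for pick in rng.sample(range(n - 1), fanout):
                target = pick if pick < node else pick + 1
                messages += 1
                if target not in received:
                    received.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    duplicates = messages - (len(received) - 1)
    return len(received), messages, duplicates


reached, messages, duplicates = push_gossip(200, 8)
print(reached, messages, duplicates)
```

Every reached node sends f copies of the payload, so with a fanout of 8 at least seven out of eight transmissions are redundant, which is affordable for small updates but not for video chunks.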
Three-Phase Gossip
Testing Gossip for Live Streaming

                              Grid’5000              PlanetLab
Environment
  Nodes                       200 (40*5)             230-300
  BW cap                      Token bucket (200KB)   Throttling
  Transport layer             UDP + losses (1-5%)    UDP
  Stream rate s               680 kbps               551 kbps
  FEC                         5%                     10%
  Stream (incl. FEC)          714 kbps               600 kbps
Gossip
  Tg (gossip period)          200 ms                 200-500 ms
  fanout (f)                  8                      7-8
  source’s fanout             5                      7
  Retransmission              ARQ/Claim              ARQ
  Membership                  RPS (Cyclon) and full membership
Gossip – Theory
1. fanout = ln(n) + c
   P[connected graph] goes to exp(-exp(-c))
2. Holds as long as the fanout is ln(n) + c on average
Paul Erdős & Alfréd Rényi
[Figure: P[connected graph] (0 to 1) vs. fanout (ln(n)-10 to ln(n)+10)]
c=-1 → 7%, c=0 → 37%, c=1 → 69%, c=2 → 87%, c=3 → 95%
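The percentages quoted for c can be reproduced directly from the limit exp(-exp(-c)). The sketch below only evaluates that formula; the function names are illustrative.

```python
import math


def p_connected(c):
    """Limit of P[connected graph] when the fanout is ln(n) + c."""
    return math.exp(-math.exp(-c))


def fanout_for(n, c):
    """Fanout ln(n) + c for a system of n nodes."""
    return math.log(n) + c


for c in (-1, 0, 1, 2, 3):
    print(f"c={c}: {round(p_connected(c) * 100)}%")
print(f"n=230, c=2: fanout {fanout_for(230, 2):.2f}")
```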
Gossip Practice
Increasing fanout
Theory
• More robust
• Faster dissemination
Practice
• Heavily requested nodes exceed their bandwidth
[Figure: percentage of nodes viewing the stream with less than 1% jitter (0 to 100) vs. fanout (0 to 80); curves: offline viewing, 20s lag. PlanetLab (230 nodes), 700 kbps cap, s = 600 kbps]
Stretching Gossip
Fanout
• The larger the better?
Proactiveness
• How often should a node change its fanout partners?
Optimal proactiveness
[Figure: percentage of nodes viewing the stream with less than 1% jitter (0 to 100) vs. "Change partner every X gossip period(s)" (log scale, 1 to 1000, then ∞); curves: offline viewing, 20s lag. PlanetLab (230 nodes), 700 kbps cap, s = 600 kbps, f = 7]
Gossip is load-balancing…
Different dissemination tree for each chunk
• Ultimate way of splitting the stream
Proposals arrive randomly
• Nodes pull from first proposal
Highly dynamic
[Figure: source S and nodes p1, p2, p3, q; in one scenario node q will serve f nodes with high probability (whp), in the other with low probability (wlp)]
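The "pull from first proposal" rule can be sketched as a round-based simulation for a single chunk: nodes that just received the chunk propose its identifier to f random partners, each partner requests the chunk from the first proposer it hears from if it does not have it yet, and that proposer serves the payload. This is a simplified illustration under these assumptions (synchronous rounds, no losses, no bandwidth caps); the function name and parameters are hypothetical.

```python
import random


def three_phase_gossip(n, fanout, seed=0):
    """Return (nodes reached, payloads served, proposals sent, max payloads served by one node)."""
    rng = random.Random(seed)
    have = {0}              # node 0 is the source
    fresh = [0]             # nodes that obtained the chunk in the previous round
    served = {}             # sender -> number of payloads it served
    proposals = 0
    while fresh:
        first_proposer = {}
        rng.shuffle(fresh)  # proposals arrive in random order
        for sender in fresh:
            for pick in rng.sample(range(n - 1), fanout):
                target = pick if pick < sender else pick + 1
                proposals += 1                              # phase 1: propose the chunk id
                first_proposer.setdefault(target, sender)   # keep only the first proposal
        newly = []
        for target, sender in first_proposer.items():
            if target in have:
                continue                                    # nothing to request
            served[sender] = served.get(sender, 0) + 1      # phases 2 and 3: request, serve
            have.add(target)
            newly.append(target)
        fresh = newly
    return len(have), sum(served.values()), proposals, max(served.values())


reached, payloads, proposals, max_served = three_phase_gossip(200, 8)
print(reached, payloads, proposals, max_served)
```

Only the small proposals are duplicated; each node receives the payload once, and no node serves more than f payloads per chunk.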
… but the world is heterogeneous!
3 classes (691 kbps avg):

Capability   Share of nodes
512 kbps     85%
1 Mbps       10%
3 Mbps       5%

Load-balancing
[Figure: percentage of nodes (CDF, 0 to 100) receiving at least 99% of the stream vs. stream lag (0 to 60 s); curves: Standard gossip – 691 kbps; No cap; Standard gossip – flat 691 kbps]
How to cope with heterogeneity?
Goal: contribute according to capability
Propose more = serve more
• Increase fanout…
• … and decrease it too!
Such that
• average fanout (favg) ≥ initial fanout = ln(n) + c
Heterogeneous Gossip - HEAP
q and r with bandwidths bq > br
• q should upload bq / br times as much as r
Who should increase/decrease its contribution? … and by how much?
How to ensure reliability?
• How to keep favg constant?
Contribute according to capability
HEAP
fq = finit ∙ bq / bavg
• Total/average contribution is equal in both homogeneous and heterogeneous settings
• …ensuring the average fanout is constant and equal to finit = ln(n) + c
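A worked example of fq = finit ∙ bq / bavg on the three capability classes from the slides. It assumes that 1 Mbps and 3 Mbps mean 1024 and 3072 kbps, which is consistent with the stated 691 kbps average, and it uses finit = 7 purely as an illustrative value.

```python
def heap_fanout(b_q, b_avg, f_init):
    """HEAP fanout of a node with upload capability b_q."""
    return f_init * b_q / b_avg


# capability in kbps -> share of nodes (three classes from the slides)
classes = {512: 0.85, 1024: 0.10, 3072: 0.05}
b_avg = sum(b * share for b, share in classes.items())
f_init = 7.0   # illustrative initial fanout

fanouts = {b: heap_fanout(b, b_avg, f_init) for b in classes}
f_avg = sum(fanouts[b] * share for b, share in classes.items())

print(f"b_avg = {b_avg:.1f} kbps")
for b, f in fanouts.items():
    print(f"{b} kbps -> fanout {f:.2f}")
print(f"average fanout = {f_avg:.2f}")
```

The weighted average fanout stays equal to finit, while a 3 Mbps node proposes six times as often as a 512 kbps node.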
HEAP
Get bavg with (gossip) aggregation
• Advertise own and freshest received capabilities
• Aggregation follows change in the capabilities
Get n with (gossip) size estimation
• Estimation follows change in the system
– Join/leave
– Crashes
– …
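One way to picture the aggregation of bavg is a table of (capability, timestamp) samples that nodes exchange and merge, keeping the freshest value per node. This is a minimal sketch under that assumption and not necessarily how HEAP implements it; all names and values are hypothetical.

```python
def merge_capabilities(local, received):
    """Keep, for every node id, the sample with the most recent timestamp."""
    for node, (capability, timestamp) in received.items():
        if node not in local or timestamp > local[node][1]:
            local[node] = (capability, timestamp)


def freshest(local, k):
    """The k most recent samples, i.e. what a node would advertise to a partner."""
    newest_first = sorted(local.items(), key=lambda kv: kv[1][1], reverse=True)
    return dict(newest_first[:k])


def estimate_b_avg(local):
    """Average capability over the samples known locally."""
    return sum(capability for capability, _ in local.values()) / len(local)


q_view = {"q": (3072, 5)}                    # q advertises its own capability
r_view = {"r": (512, 4), "q": (1024, 1)}     # r holds a stale sample for q
merge_capabilities(r_view, freshest(q_view, 1))
print(r_view, estimate_b_avg(r_view))
```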
Stream lag reduction
[Figure: percentage of nodes (CDF, 0 to 100) receiving at least 99% of the stream vs. stream lag (0 to 60 s); curves: Standard gossip – 691 kbps; HEAP – 691 kbps; No cap; Standard gossip – flat 691 kbps]
Quality improvement
Stream lag of 10s
[Figure: jitter-free percentage of the stream (0 to 100), Standard Gossip vs HEAP, for the 512 kbps, 1 Mbps and 3 Mbps classes]
Stream lag
For those who can have a jitter-free stream
[Figure: average stream lag (0 to 45 s) to obtain a jitter-free stream, Standard Gossip vs HEAP, for the 512 kbps, 1 Mbps and 3 Mbps classes]
Proportional contribution
[Figure: average bandwidth usage by bandwidth class (0 to 3072 kbps), Standard Gossip vs HEAP, for the 512 kbps, 1 Mbps and 3 Mbps classes; bar labels in order of appearance: 99.89%, 91.56%, 48.44%, 94.38%, 90.58%, 87.56%]
20% nodes crashing
Failure of 20% of the nodes at time t=60s
[Figure: percentage of nodes receiving each group (0 to 100) vs. stream time (0 to 150 s); curves: HEAP - 12s lag; Standard Gossip - 20s lag; Standard Gossip - 30s lag]
About Bandwidth Limitation
• Token Bucket
• Leaky Bucket
By Graham.Fountain at English Wikipedia, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=35271394
By Graham.Fountain [CC BY 3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons
Unbounded Leaky Bucket
Bounded Leaky Bucket
Token Bucket
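A token bucket can be sketched in a few lines. This is a generic textbook version and not the limiter used in the experiments; the class name, the byte units, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions.

```python
class TokenBucket:
    """Tokens accumulate at `rate` per second up to `capacity`; sending consumes tokens."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens (e.g. bytes) added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # the bucket starts full
        self.last = 0.0             # time of the last update, in seconds

    def allow(self, size, now):
        """Return True and consume tokens if a packet of `size` may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


bucket = TokenBucket(rate=100, capacity=200)
burst = bucket.allow(200, now=0.0)        # a full burst is accepted
too_soon = bucket.allow(1, now=0.0)       # the bucket is now empty
after_1s = bucket.allow(100, now=1.0)     # 100 tokens were refilled in one second
print(burst, too_soon, after_1s)
```

A token bucket bounds the long-term rate while allowing bursts up to the capacity; a leaky bucket, by contrast, drains at a constant rate.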
Summary
Multiple Trees
• Effective but hard to split bw perfectly
Mesh
• Easier to build but efficiency – delay tradeoff
• Packet scheduling can improve performance
Gossip
• Improves over mesh by making it dynamic
Pull-Push (we have not seen this in the course, but you can read the following slides)
• Use mesh to identify trees
iGridMedia
Pull-based protocols are effective
• Select neighbors from unstructured overlay
• Periodically notify neighbors of available packets
• Neighboring nodes request packets
Nearly optimal
• Bandwidth utilization
• Throughput
Without intelligent scheduling and bw measurement
Tradeoff
Pull-based streaming
• The near-optimality is achieved at the cost of a tradeoff between control overhead and delay.
• Depends on how frequently the notifications are sent.
[Figure: control overhead vs. delay]
Overlay Construction
Pull-based method: Protocol
• All the nodes self-organize into a random graph.
• Contact rendezvous point
• Randomly find set of partners (RPS can be used)
• Build (static) random graph
[Figure: random graph with a root and nodes 1 to 5]
Push/Pull Method
Pull-based method: Protocol
• Pull part
• Each node periodically sends buffer map packets to notify all its neighbors about the packets it has in its buffer.
[Figure: overlay with a root and nodes 1 to 5 exchanging buffer maps: “I have 1,2”, “I have 2,3”, “I have 1,2,4”, “I have 1,2,3”]
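The pull part can be sketched as a simple request planner: given its own buffer and the buffer maps advertised by its neighbors, a node decides which missing packet to request from which neighbor. The slides note that no intelligent scheduling is needed, so the least-loaded-holder rule used here is just one simple choice made for illustration; the function name and the neighbor labels are hypothetical.

```python
def plan_requests(own_packets, buffer_maps):
    """Map each neighbor to the list of packets to request from it.

    own_packets: set of packet ids already in the local buffer.
    buffer_maps: dict neighbor -> set of packet ids that neighbor advertises.
    """
    load = {neighbor: 0 for neighbor in buffer_maps}
    plan = {}
    missing = sorted(set().union(*buffer_maps.values()) - own_packets)
    for packet in missing:
        holders = [nb for nb, have in buffer_maps.items() if packet in have]
        # Ask the holder with the fewest pending requests; break ties by name.
        chosen = min(holders, key=lambda nb: (load[nb], nb))
        load[chosen] += 1
        plan.setdefault(chosen, []).append(packet)
    return plan


# A node holding packet 1 hears "I have 1,2" and "I have 2,3".
plan = plan_requests({1}, {"a": {1, 2}, "b": {2, 3}})
print(plan)
```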
Pull-Push method
Split stream as in SplitStream
Pull-push hybrid method: Protocol
• Overlay construction is done as before.
1. Partition the stream evenly into n substreams.
Pull-Push method
Peers periodically ask for buffer maps
Pull according to buffer maps
Once a node has received a packet in group 0 of one packet party
• Send subscription for the corresponding substream
Sender will push all packets in the same substream
Pull-Push method
Stop requesting maps when 95% delivery rate with pushed packets
If delivery rate drops, request again
Pull-push hybrid method: Protocol
• When over 95% of packets are pushed, the node stops requesting buffer maps.
• When the delivery ratio drops below 95%, the node starts requesting again.
• Pushed but lost packets are “pulled” after a timeout.
[Figure: delivery ratio of pushed packets (0% to 95%); above the threshold the node stops requesting buffer maps, below it the node starts requesting them. Note: figure is only approximate.]
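The 95% rule amounts to a small state machine. The sketch below assumes the ratio is computed as pushed packets over expected packets for some measurement window, and that at exactly 95% the node keeps its current state; both points, as well as the class and method names, are assumptions made for illustration.

```python
class BufferMapSwitch:
    """Decides whether a node should keep requesting buffer maps."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.requesting = True      # a node starts in pull mode

    def update(self, pushed, expected):
        """Update the state from the share of packets delivered by push."""
        ratio = pushed / expected if expected else 0.0
        if ratio > self.threshold:
            self.requesting = False     # push works well: stop requesting maps
        elif ratio < self.threshold:
            self.requesting = True      # push degraded: fall back to pulling
        return self.requesting


switch = BufferMapSwitch()
states = [switch.update(p, 100) for p in (50, 96, 95, 94, 95)]
print(states)
```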
Performance
Pull-push hybrid method: Evaluation by simulation – Results
• Playback delays are considerably smaller in the push-pull method.
Overhead
Pull-push hybrid method: Evaluation by simulation – Results
• The overhead of the push-pull hybrid method is much smaller than that of the pull-based method.
PlanetLab
Push-pull hybrid method: Evaluation on PlanetLab
• Configuration is the same as before.
Summary
Multiple Trees
• Effective but hard to split bw perfectly
Mesh
• Easier to build but efficiency – delay tradeoff
• Packet scheduling can improve performance
Gossip
• Improves over mesh by making it dynamic
Pull-Push
• Use mesh to identify trees
References
M. Castro, P. Druschel, A-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Lake Bolton, New York, October 2003.
M. Castro, M. B. Jones, A-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang and A. Wolman, "An Evaluation of Scalable Application-level Multicast Built Using Peer-to-peer Overlays", Infocom 2003, San Francisco, CA, April 2003.
Zhang, X. et al., 2005. CoolStreaming/DONet: a data-driven overlay network for peer-to-peer live media streaming. Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 3, p.2102-2111. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1498486.
Picconi, F. & Massoulie, L., 2008. Is There a Future for Mesh-Based Live Video Streaming? 2008 Eighth International Conference on Peer-to-Peer Computing, p.289-298. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4627291.
Zhang, M. et al., 2008. iGridMedia: Providing Delay-Guaranteed Peer-to-Peer Live Streaming Service on Internet. IEEE GLOBECOM 2008, IEEE Global Telecommunications Conference, p.1–5. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4698112.
Meng Zhang; Qian Zhang; Lifeng Sun; Shiqiang Yang. "Understanding the Power of Pull-Based Streaming Protocol: Can We Do Better?", IEEE Journal on Selected Areas in Communications, vol. 25, no. 9, pp. 1678-1694, December 2007.
Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod. Boosting Gossip for Live Streaming. P2P 2010, Aug 2010, Delft, Netherlands.
Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod; Boris Koldehofe; Martin Mogensen; Vivien Quéma. Heterogeneous Gossip. Middleware 2009, Dec 2009, Urbana-Champaign, IL, United States.
Davide Frey; Rachid Guerraoui; Anne-Marie Kermarrec; Maxime Monod; Vivien Quéma. Stretching Gossip with Live Streaming. DSN 2009, Jun 2009, Estoril, Portugal.