1
Failure Recovery for Priority Progress Multicast
Jung-Rung Han
Supervisor: Charles Krasic
2
Multicast?
One-to-Many delivery
  Scalable
  Conserve bandwidth
  E.g. Digital TV
IP multicast
  Many issues: security, billing, money
Application level multicast
  Dedicated content distribution network
  Peer-to-Peer / End system multicast
3
QStream
Priority Progress Streaming (PPS)
  Adaptive to network conditions
  Using TCP
  Speg: Scalable mpeg, a “progressive” codec
Priority Progress Multicast (PPM)
  Makes a tree of PPS
4
Priority Progress Multicast
Store-and-Forward
Fragments data
Flow control
Single tree multicast
5
Presentation Outline
Motivation
Background
Description of Approach
Evaluation
Conclusions and Future Work
6
Motivation: Single Tree vs. Graph Multicast
Single tree
  Advantages
    Simpler
    Less overhead
  Disadvantages
    Vulnerable to failure
    Unutilized bandwidth
Graph
  Advantages
    Resilient to failure
    Higher bandwidth utilization
  Disadvantages
    More overhead
    Complex
      Hard to implement
      Hidden issues
8
Background: Some multicast “streaming” systems
QStream: Single Tree Based
Bullet: Single Tree as backbone, Peering connections form Mesh
SplitStream: Multiple Trees
9
SplitStream
11
Distributed Tree Management
Tree Join operation, also dealing with failure
Key issues: Scalability, Delay, Security…
Our approach: use a Distributed Hash Table (DHT)
  Bamboo DHT and ReDiR Hierarchy (Recursive Distributed Rendezvous)
  Reuse one DHT for all mcast sessions
  Removes the hotspot when nodes join at the same time
12
Failure Recovery
Failure: a node in the multicast tree disappears
  Important to the single tree approach
  Less important to multi-source
Our goal: hide the impact of failure
  Ultimately, no pause in video playback
Our approach: deal with failure pre-emptively by selecting a replacement with the highest Eligibility
13
“Eligibility”
A node’s capability to be a good forwarding node.
Bandwidth, delay, uptime, distance from the root, etc.
This is another sub-area of research that deals with predicting and evaluating the quality of a connection that is inherently variable, e.g. Vivaldi, iPlane
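The slides do not give a formula for Eligibility. As a rough sketch only, the listed inputs could be folded into a single comparable score with a weighted sum. The `NodeMetrics` fields, the units, the scaling constants and the weights below are all assumptions made for illustration, not values from QStream.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical per-node measurements (names and units are assumptions)."""
    bandwidth_kbps: float  # available forwarding bandwidth
    delay_ms: float        # measured delay to the node
    uptime_s: float        # how long the node has been in the session
    depth: int             # distance from the root, in hops


def eligibility(m, w_bw=1.0, w_delay=1.0, w_uptime=1.0, w_depth=1.0):
    """Illustrative score: bandwidth and uptime raise it, delay and depth lower it."""
    return (
        w_bw * (m.bandwidth_kbps / 1000.0)   # Mbit/s
        + w_uptime * (m.uptime_s / 3600.0)   # hours
        - w_delay * (m.delay_ms / 100.0)     # units of 100 ms
        - w_depth * m.depth                  # hops from the root
    )


stable = NodeMetrics(bandwidth_kbps=2000, delay_ms=50, uptime_s=7200, depth=3)
flaky = NodeMetrics(bandwidth_kbps=500, delay_ms=200, uptime_s=600, depth=3)
print(eligibility(stable) > eligibility(flaky))
```

Any monotone combination would serve the same purpose; what matters for the recovery scheme is only that scores from different leaves can be compared.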
14
Eligibility propagation
Goal: every node has a replacement for its parent, based on eligibility information
Only leaf nodes are allowed to be replacement candidates
  A leaf node’s eligibility propagates up
  Each internal node keeps track of the most eligible leaf node reported from downstream and selects it as a replacement for itself
  Each internal node reports the chosen node to its direct children
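The propagation steps above can be sketched on an in-memory tree. This is a simplified illustration, not the QStream implementation: the `Node` class and its field names are hypothetical, the two passes stand in for messages exchanged between nodes, and the root is treated like any other internal node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    eligibility: float = 0.0                       # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)
    best_leaf: Optional[Tuple[float, str]] = None  # best (score, name) seen downstream
    own_replacement: Optional[str] = None          # leaf chosen to replace this node
    parent_replacement: Optional[str] = None       # leaf chosen to replace the parent


def propagate_up(node: Node) -> Tuple[float, str]:
    """Leaves report their eligibility; internal nodes keep the highest one."""
    if not node.children:
        node.best_leaf = (node.eligibility, node.name)
        return node.best_leaf
    best = max(propagate_up(child) for child in node.children)
    node.best_leaf = best
    node.own_replacement = best[1]  # the internal node selects its own replacement
    return best


def report_down(node: Node) -> None:
    """Each internal node tells its direct children who would replace it."""
    for child in node.children:
        child.parent_replacement = node.own_replacement
        report_down(child)


a = Node("A", children=[Node("a1", 0.4), Node("a2", 0.9)])
b = Node("B", children=[Node("b1", 0.7)])
root = Node("root", children=[a, b])
propagate_up(root)
report_down(root)
print(a.own_replacement, b.own_replacement, a.children[0].parent_replacement)
```

After both passes every child already knows which leaf takes over if its parent disappears, so no search is needed at failure time.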
15
Now we know what to do when a failure occurs
But we still need something…
16
Failure Detection
TCP’s failure detection is inadequate
Application level heartbeat
Heartbeat interval is major concern
  False positive vs. Delay
TCP vs. UDP heartbeat
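A minimal sketch of an application-level heartbeat monitor illustrates the interval trade-off: a short interval detects a dead parent quickly but risks false positives when heartbeats are merely delayed. The class name, the `missed_allowed` parameter and the injectable clock are assumptions for illustration; the transport (TCP or UDP) is left out on purpose.

```python
import time


class HeartbeatMonitor:
    """Declares a peer failed after `missed_allowed` silent heartbeat intervals."""

    def __init__(self, interval_s, missed_allowed=3, clock=time.monotonic):
        self.interval_s = interval_s
        self.timeout_s = interval_s * missed_allowed
        self.clock = clock       # injectable so the logic can be tested without sleeping
        self.last_seen = {}      # peer name -> time of last heartbeat

    def beat(self, peer):
        """Record a heartbeat received from `peer`."""
        self.last_seen[peer] = self.clock()

    def failed_peers(self):
        """Return the peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return [p for p, t in self.last_seen.items() if now - t > self.timeout_s]


# Simulated clock: 0.5 s interval, 3 missed beats allowed, so 1.5 s timeout.
fake_now = [0.0]
monitor = HeartbeatMonitor(0.5, missed_allowed=3, clock=lambda: fake_now[0])
monitor.beat("parent")
fake_now[0] = 1.0
print(monitor.failed_peers())
fake_now[0] = 1.6
print(monitor.failed_peers())
```

Detection delay is bounded by roughly `interval_s * missed_allowed`, which is why the heartbeat interval shows up as a dimension of the evaluation.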
18
Evaluation
Multi-dimensional test space:
  Roundtrip Time
  Heartbeat interval
  Competing traffic
  Wide vs. Narrow tree
  Long vs. Short tree
  Failure rate
  Adaptation window size
  Different video quality metrics
19
Emulab
www.emulab.net
Network testbed
Hundreds of machines
Allows users high degree of freedom
  Network topology
  Traffic shaping: BW, delay, loss rate
  OS modifications
All done through web interface and SSH
20
Minimum Tree – Emulab Topology
21
Minimum Tree – Multicast Tree
22
Minimum Tree – BW graph
23
Medium Size Tree – Emulab Topology
24
Medium Size Tree – Multicast
25
Medium Size Tree – BW graph
27
Conclusions
A single tree approach can deal with failures (probably)
  Video playback is not interrupted
  Impact of failure is second order concern to TCP dynamics
Many other evaluations can be done
  Different BW and RTT
  Bigger tree
  Varying degree of competing traffic
  Higher failure rate
28
Future Work
Evaluation of Distributed Tree Management approach
Continued evaluation of failure recovery under different conditions
Self adjusting tree to optimize bandwidth usage
Scaling window size
29
Final Comment
Evaluating the system is hard
  Many variables
  Unexpected results
Using Emulab
  Availability affected by time of day and paper submission deadlines
  Nodes do malfunction: run linktest often, but it takes significantly longer with a bigger experiment!
  One run of an experiment takes 25 minutes
  Tip: Use a lot of scripts!
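In the spirit of the scripting tip, a short sketch shows how quickly a multi-dimensional test space adds up at 25 minutes per run. Only the 25-minute figure comes from the slides; the dimensions chosen and every value in `test_space` are hypothetical placeholders.

```python
from itertools import product

MINUTES_PER_RUN = 25  # figure quoted on the slide

# Hypothetical values for a few of the evaluation dimensions.
test_space = {
    "rtt_ms": [20, 100],
    "heartbeat_interval_s": [0.5, 2.0],
    "tree_shape": ["wide", "narrow"],
    "failure_rate": ["low", "high"],
}


def enumerate_runs(space):
    """Return one configuration dict per point in the full parameter grid."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


runs = enumerate_runs(test_space)
total_hours = len(runs) * MINUTES_PER_RUN / 60
print(f"{len(runs)} runs, about {total_hours:.1f} hours of testbed time")
```

Each configuration dict can be handed to whatever script launches an experiment, so the whole grid runs unattended.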
30
ReDiR