Efficient Point to Multipoint Transfers Across Datacenters
Mohammad Noormohammadpour¹, Cauligi S. Raghavendra¹, Sriram Rao², Srikanth Kandula²
¹University of Southern California, ²Microsoft
Source: https://azure.microsoft.com/en-us/overview/datacenters/how-to-choose/ (Jun 14, 2017)
Inter-Datacenter Networks
• Dedicated WAN networks for a single organization
• Connect many datacenters
  – Increased reliability
  – Load balancing
  – Content is usually served by datacenters closest to users
    • Lower RTT to users ⇒ Higher average throughput (TCP)
    • Fewer hops to users ⇒ Saves WAN bandwidth
Source: S. Jain et al., "B4: Experience with a Globally-Deployed Software Defined WAN", ACM SIGCOMM 2013
Source: C. Hong et al., "Achieving High Utilization with Software-Driven WAN", ACM SIGCOMM 2013
Need data delivery from one point to multiple points
Application
• CDN, Web
• Data Recovery
• Search
• Recommendation, Ads
• Databases
• Geo-Distributed Data Analytics
Reason for delivery to multiple datacenters
• Getting closer to users
• Making backup copies
• Synchronization of state
• Global load balancing
• Input for next processing stages
Point to Multipoint (P2MP) Transfers
• An abstraction model
  – Single source
    • Content is located on a source datacenter
  – Receivers are fixed once transfer begins
    • No joins/leaves
[Figure: example P2MP transfer among datacenters A, B, C and D]
P2MP Transfers Today
• Usually performed as separate unicast transfers
  – Wastes bandwidth and can increase completion times
• Multicasting
  – Network-driven (e.g. IP Multicast)
    • Locally and gradually built trees far from optimal
    • No load distribution management
    • Complex session management protocols
  – Client-driven (e.g. Overlay Networks)
    • Limited visibility into network status
    • Limited control over routing
• Using Store-and-Forward
  – Storage and bandwidth costs on intermediate datacenters
  – Can lead to excessive delays
  – More engineering work (running agents, chunking, etc.)
[Figure: four diagrams over datacenters A, B, C and D with numbered edges, illustrating the approaches above]
Our Solution: DCCast
• Send traffic to all destinations over a forwarding tree
  – Saves bandwidth
  – A controller with global view of network status can examine options
  – Selection according to current network load conditions and transfer parameters
• Use rate-allocation and rate-limiting
  – A slotted timeline with fixed rates during timeslots
  – Rate-allocation at controller according to available bandwidth
  – Rate-limiting at end-points
• Main contribution
  – Forwarding tree selection
    • Weight/Cost assignment to edges
[Figure: datacenters A, B, C and D with a TE Server]
DCCast Procedures
• Update()
  – Is executed at the end of every timeslot
    • Dispatches rate-allocations to end-points (i.e., senders) for rate-limiting
• Allocate($R$)
  – Is executed upon arrival of a transfer request $R$
    1. Selects a forwarding tree $T$ for request $R$
    2. Performs rate-allocation over $T$
[Figure: TE Server with Update(), Allocate(R) and a Rate Allocation Database, exchanging Requests and Rates with DC1, DC2 and DC3]
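
The two procedures can be sketched as a small controller class. This is a minimal illustration, not the authors' implementation: the class name `TEServer` comes from the figure label, while `select_tree`, `allocate_rates`, `rate_db` and the dictionary layout are hypothetical names chosen for this sketch. Tree selection and rate-allocation are passed in as functions so the skeleton only shows how the data flows.

```python
class TEServer:
    """Sketch of a controller exposing Allocate(R) and Update()."""

    def __init__(self, select_tree, allocate_rates):
        # Hypothetical plug-ins:
        #   select_tree(request) -> set of edges forming the forwarding tree
        #   allocate_rates(tree, request, first_slot) -> {slot: rate}
        self.select_tree = select_tree
        self.allocate_rates = allocate_rates
        self.rate_db = {}        # "Rate Allocation Database": slot -> {request_id: rate}
        self.trees = {}          # request_id -> forwarding tree
        self.current_slot = 0

    def allocate(self, request_id, request):
        """Allocate(R): runs when a transfer request arrives."""
        tree = self.select_tree(request)                      # step 1
        schedule = self.allocate_rates(tree, request,         # step 2
                                       self.current_slot + 1)
        self.trees[request_id] = tree
        for slot, rate in schedule.items():
            self.rate_db.setdefault(slot, {})[request_id] = rate
        return tree, schedule

    def update(self):
        """Update(): runs at the end of every timeslot.

        Advances the clock and returns the rates that senders should
        enforce (rate-limit to) during the timeslot that is starting.
        """
        self.current_slot += 1
        return self.rate_db.pop(self.current_slot, {})
```

In this sketch `update()` returns the rates instead of sending them, so the dispatch mechanism to end-points is left open.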
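Selection of Forwarding Trees
• Assume a directed inter-datacenter graph $G$
  – $L_e$ is the total outstanding amount of traffic allocated over any edge $e$
  – Upon arrival of request $R$ with size of $V_R$, every edge $e$ gets a weight of $W_e = V_R + L_e$
  – $R$'s forwarding tree is obtained by finding a minimum weight Steiner Tree
  – Fast heuristics available that often provide results close to optimal
[Figure: rate on edge $e$ versus time $t$ across timeslots, with a level marked $C_e$; $L_e = \sum$ (blue areas)]
[Figure: example on datacenters A, B, C and D for request R (Size = 4). Edge labels in the first panel: 5, 7, 3, 1, 4, 2, 9; in the second panel, after adding the request size: 9, 11, 7, 5, 8, 6, 13; in the third panel: 5, 7, 7, 1, 8, 6, 13]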
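
The weight assignment and a tree search can be sketched as follows. The slides do not say which Steiner tree heuristic is used, so the greedy "attach the nearest receiver" heuristic below is a stand-in chosen for brevity, and the function names and example loads are hypothetical.

```python
import heapq

def edge_weights(load, volume):
    """W_e = V_R + L_e for every directed edge e = (u, v)."""
    return {edge: volume + l for edge, l in load.items()}

def greedy_steiner_tree(weights, source, receivers):
    """Grow a tree from `source`, repeatedly attaching the receiver that is
    cheapest to reach from any node already in the tree (a simple heuristic,
    not guaranteed to be optimal)."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    in_tree, tree_edges = {source}, set()
    remaining = set(receivers) - in_tree
    while remaining:
        # Dijkstra started simultaneously from every node already in the tree.
        dist = {n: 0 for n in in_tree}
        prev = {}
        heap = [(0, n) for n in in_tree]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry
            if u in remaining:
                target = u                    # nearest unattached receiver
                break
            for v, w in adj.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        if target is None:
            raise ValueError("a receiver is unreachable from the source")
        # Walk back to the tree, adding the new path.
        node = target
        while node not in in_tree:
            tree_edges.add((prev[node], node))
            in_tree.add(node)
            node = prev[node]
        remaining -= in_tree
    return tree_edges

# Hypothetical loads L_e on a four-datacenter graph, request of size 4 from A.
load = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 9,
        ("B", "C"): 1, ("B", "D"): 3, ("C", "D"): 2}
weights = edge_weights(load, volume=4)
tree = greedy_steiner_tree(weights, "A", ["B", "C", "D"])
```

On this example the heuristic returns the chain A→B, B→C, C→D with total weight 9 + 5 + 6 = 20. After the tree is chosen, the loads of its edges would each grow by the request size.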
Analysis of DCCast forwarding tree selection
• Any forwarding tree has a cost that is the sum of its edge weights
  – Using this cost assignment we stay away from
    • Highly loaded edges
    • Large trees
• Implications of this cost assignment
  – Smaller trees for larger requests ($V_R \gg L_e \Rightarrow W_e \approx V_R$)
  – Trees are selected according to edge loads for smaller requests ($V_R \ll L_e \Rightarrow W_e \approx L_e$)
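
Expanding the tree cost makes both effects explicit. The notation $\mathrm{cost}(T)$ and $|T|$ (the number of edges in tree $T$) is introduced here for illustration only.

```latex
\begin{align*}
\mathrm{cost}(T) \;=\; \sum_{e \in T} W_e
  \;=\; \sum_{e \in T} \left( V_R + L_e \right)
  \;=\; |T|\, V_R \;+\; \sum_{e \in T} L_e
\end{align*}
```

The first term grows with the number of edges, so it penalizes large trees and dominates when $V_R \gg L_e$. The second term is the total load already on the tree, so it penalizes highly loaded edges and dominates when $V_R \ll L_e$.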
Rate-allocation
• Complex problem: Trade-offs
  – Static policies: FCFS, ALAP (as late as possible)
    • More predictability
  – Dynamic policies: SRPT, Fair Sharing
    • Better mean times (by resolving priority inversion)
• We used FCFS policy
  – Simple, no rate recalculations
  – Guaranteed completion times given no failures
  – Senders send at maximum available rate starting next timeslot
• Calculation of available rates across timeslots over trees
  – $A_e(t)$ is the available rate over edge $e$ at time $t$
  – Maximum rate of tree $T$ at time $t$ is $r_T(t) = \min_{e \in T} A_e(t)$
[Figure: requests R1, R2 and R3]
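
A minimal sketch of FCFS rate-allocation over a slotted timeline follows, assuming $A_e(t)$ is the edge capacity minus the rates already reserved in timeslot $t$. The function name, the `allocated` layout and the example capacities are hypothetical.

```python
def allocate_fcfs(tree_edges, volume, capacity, allocated, start_slot, slot_len=1.0):
    """Reserve `volume` over a tree at the maximum available rate per timeslot.

    capacity:  {edge: capacity}
    allocated: {slot: {edge: rate already reserved}}, updated in place
    Returns {slot: rate} for this request. Earlier reservations are never
    changed, which is what makes the policy first-come-first-served.
    """
    if min(capacity[e] for e in tree_edges) <= 0:
        raise ValueError("tree contains an edge with no capacity")
    schedule, remaining, slot = {}, float(volume), start_slot
    while remaining > 1e-12:
        used = allocated.setdefault(slot, {})
        # r_T(t) = min over tree edges of A_e(t)
        rate = min(capacity[e] - used.get(e, 0.0) for e in tree_edges)
        rate = min(rate, remaining / slot_len)   # do not over-allocate the last slot
        if rate > 0:
            schedule[slot] = rate
            for e in tree_edges:
                used[e] = used.get(e, 0.0) + rate
            remaining -= rate * slot_len
        slot += 1
    return schedule

# Hypothetical example: two requests whose trees share edge ("A", "B").
capacity = {("A", "B"): 10, ("B", "C"): 10, ("B", "D"): 10}
allocated = {}
first = allocate_fcfs({("A", "B"), ("B", "C")}, 25, capacity, allocated, start_slot=1)
second = allocate_fcfs({("A", "B"), ("B", "D")}, 12, capacity, allocated, start_slot=1)
```

Here the first request fills the shared edge in timeslots 1 and 2 and half of timeslot 3, so the second request receives nothing until timeslot 3 and finishes in timeslot 4. Both completion times are known at arrival, which matches the "guaranteed completion times" property of FCFS.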
Evaluation
• Evaluated Techniques
  – Selection of Forwarding Trees (Random, MINMAX, DCCast)
  – Rate-allocation policy (FCFS and SRPT)
  – DCCast (P2MP) vs. Point-to-Point (P2P-FCFS and P2P-SRPT)
• Performance Metrics
  – Mean TCT (transfer completion time)
  – Tail TCT
  – Total bandwidth usage
• Traffic Patterns
  – Artificially generated
    • Poisson arrivals
    • Exponential transfer size distribution
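
A synthetic workload of this kind can be generated as sketched below. The arrival rate, mean size, number of copies and the way sources and receivers are drawn are all assumptions made for illustration; the slides only state Poisson arrivals and exponentially distributed sizes.

```python
import random

def generate_requests(num_requests, arrival_rate, mean_size, datacenters, copies, seed=0):
    """Return a list of P2MP requests with Poisson arrivals and exponential sizes."""
    rng = random.Random(seed)
    clock, requests = 0.0, []
    for i in range(num_requests):
        # A Poisson process has exponentially distributed inter-arrival gaps.
        clock += rng.expovariate(arrival_rate)
        size = rng.expovariate(1.0 / mean_size)
        # Assumption: source and receivers are drawn uniformly without repetition.
        source, *receivers = rng.sample(datacenters, copies + 1)
        requests.append({"id": i, "arrival": clock, "size": size,
                         "source": source, "receivers": receivers})
    return requests

requests = generate_requests(100, arrival_rate=1.0, mean_size=20.0,
                             datacenters=["A", "B", "C", "D"], copies=2, seed=1)
```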
Evaluation: Selection of Forwarding Trees
• We considered three approaches
  – Randomly selecting a forwarding tree (Random)
  – Picking the tree with minimal maximum $L_e$ over any edge (MINMAX)
    • Greedy approach
    • Method used in many research works (minimizing maximum utilization)
  – Picking the tree with minimal sum of $W_e$ (DCCast)
• Results
  – Overall bandwidth usage (not shown)
    • Same for all schemes
  – Mean and Tail TCT
    • DCCast < MINMAX < Random
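Benefits of DCCast cost assignment over MINMAX
• DCCast limits load balancing for improved BW savings
  – MINMAX does not account for number of edges
  – MINMAX does not account for request volume
• DCCast cost assignment makes it easier to find trees
  – Edge decomposable costs
[Figure: initial edge loads $L_e$ = 18 and 10, 1, 1, 1. Loads after placing a request:
  Small request with volume of 1: MINMAX gives 18 and 11, 2, 2, 2; DCCast gives 18 and 11, 2, 2, 2
  Large request with volume of 10: MINMAX gives 18 and 20, 11, 11, 11; DCCast gives 28 and 10, 1, 1, 1]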
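
The numbers in the figure can be reproduced with a few lines of code, reading the figure as two candidate trees: one with a single edge of load 18 and one with four edges of loads 10, 1, 1, 1. That reading, and the names used below, are assumptions of this sketch.

```python
def pick_minmax(candidates):
    """MINMAX: choose the tree whose most loaded edge has the smallest load."""
    return min(candidates, key=lambda name: max(candidates[name]))

def pick_dccast(candidates, volume):
    """DCCast: choose the tree with the smallest sum of W_e = V_R + L_e."""
    return min(candidates, key=lambda name: sum(volume + l for l in candidates[name]))

# Edge loads L_e of the two candidate trees (assumed reading of the figure).
candidates = {"one-edge": [18], "four-edge": [10, 1, 1, 1]}

small = (pick_minmax(candidates), pick_dccast(candidates, volume=1))
large = (pick_minmax(candidates), pick_dccast(candidates, volume=10))
```

For the small request both rules pick the four-edge tree (DCCast cost 17 versus 19). For the large request MINMAX still picks the four-edge tree, placing 40 units of traffic in total, while DCCast picks the one-edge tree (cost 28 versus 53) and places only 10.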
Evaluation: Scheduling Policy
• We proposed use of FCFS for DCCast
  – Simple scheduling and resources guaranteed once scheduled
  – But how much will it lose on Mean TCT?
• SRPT is the best policy for mean times
  – Challenging to implement: Tree eviction and rate recalculation as new requests arrive
  – Starvation of very large transfers
• Results
  – FCFS performs slightly better in Tail times
  – FCFS increases mean times by 50%
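
Why FCFS costs mean completion time can be seen on a toy case: one link, three transfers all present at time 0, so SRPT reduces to serving the shortest first. The sizes are hypothetical and the example only illustrates the mechanism; it does not reproduce the measured 50% figure.

```python
def mean_completion_time(sizes, rate=1.0):
    """Mean completion time when transfers use one link, one at a time, in list order."""
    clock, total = 0.0, 0.0
    for size in sizes:
        clock += size / rate      # this transfer finishes at `clock`
        total += clock
    return total / len(sizes)

arrival_order = [10, 1, 1]        # hypothetical sizes; the large transfer arrived first
fcfs_mean = mean_completion_time(arrival_order)           # finishes at 10, 11, 12
srpt_mean = mean_completion_time(sorted(arrival_order))   # finishes at 1, 2, 12
```

Under FCFS the two small transfers wait behind the large one, giving a mean of 11 against 5 for shortest-first, while the last completion time is 12 in both cases.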
Evaluation: Comparison with Point-to-Point (P2P)
• Properties of P2P-SRPT scheme
  – Based on K-Shortest paths (for every transfer)
  – Uses SRPT policy to achieve best Mean TCT
  – Rates are calculated using Linear Programming
• Results
  – Both Tail times and BW Usage improved by up to 50% using DCCast
  – DCCast better in Mean times when making larger number of copies
Summary
• Many inter-datacenter transfers follow the P2MP abstraction model
  – One object is to be delivered to many destinations
  – Source and destinations known upon arrival of transfers
  – No joins/leaves
• Perform every P2MP transfer jointly using a forwarding tree
  – Achieve bandwidth savings and reduce tail times
• Opportunistically and dynamically select forwarding trees
  – Allowing all available paths to be potentially used
  – The opposite would be pre-calculating and using K-Minimal Trees
Thank you!
Future Work & Discussion
• Improving Mean TCT
  – Multiple trees each connected to a subset of receivers (addressing the slow receiver)
  – Parallel trees to same subsets of receivers (increasing throughput)
  – SRPT with only BW preemption (trees selected upon request arrivals)
  – Combining forwarding trees with store-and-forward
  – Applying batching techniques for bursty arrival patterns (e.g. apply SJF policy to batches)
  – Applying the fair-sharing policy (rather than FCFS)
• Evaluation using real traces of inter-datacenter traffic
  – Choose scheduling policy according to traffic patterns
• Handling failures
  – Proactive approaches (leaving spare capacity, backup trees)
  – Reactive approaches (rescheduling affected transfers, local activation)