Deploying data streaming applications in the Fog

Valeria Cardellini

cardellini@ing.uniroma2.it

University of Rome Tor Vergata

2nd Workshop “Through the Fog” – Pisa, Italy

Data stream processing (DSP)

•  A variety of low-latency and location-aware applications in diverse domains:
–  Situation-aware applications (e.g., intelligent urban transport, surveillance, and traffic congestion)
–  Social data mining

•  Require
–  Continuous real-time processing of unbounded data streams generated by multiple, distributed sources
–  To extract valuable information in a timely and reliable manner

V. Cardellini - Through the Fog 2017

In a new distributed environment

•  To increase scalability and availability, reduce latency, network traffic, and power consumption
–  Edge/fog computing (“the cloud close to the ground”): many micro data centers located at the network edge

Exploit distributed and near-edge computation

…that poses old and new challenges

•  Network latencies are significant
•  Computing and networking resources are heterogeneous (e.g., business constraints, capacity limits, …)
•  Computing and network resources are not always available
•  Data cannot be processed everywhere
•  …

Goal of the talk

•  Give a flavor of some challenges, and their possible solutions, that arise when deploying data stream processing applications in a fog/edge environment

DSP application basics

•  A network of operators connected by data streams, with at least one data source and one data sink

•  Represented by a directed graph
–  Graph vertices: operators
–  Graph edges: data streams
–  Usually a directed acyclic graph (DAG)

•  Operator:
–  Processing element that transforms one or more input streams into another stream
–  Can be stateless or stateful
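The graph view of a DSP application can be made concrete with a minimal sketch. The topology and operator names below are hypothetical, chosen only for illustration; the code finds the sources and sinks and checks that the graph is a DAG by computing a topological order with Kahn's algorithm.

```python
from collections import deque

# Hypothetical toy topology: operator -> list of downstream operators.
# Each (operator, downstream) pair is one data stream (a graph edge).
streams = {
    "source": ["parse"],
    "parse": ["filter", "count"],
    "filter": ["sink"],
    "count": ["sink"],
    "sink": [],
}

def topological_order(graph):
    """Order the operators so that every data stream goes forward (Kahn's algorithm)."""
    indegree = {op: 0 for op in graph}
    for downstream in graph.values():
        for op in downstream:
            indegree[op] += 1
    ready = deque(op for op, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in graph[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(graph):
        raise ValueError("the graph has a cycle, so it is not a DAG")
    return order

# Sources have no incoming stream; sinks have no outgoing stream.
sources = [op for op in streams if all(op not in d for d in streams.values())]
sinks = [op for op, downstream in streams.items() if not downstream]
order = topological_order(streams)
```

A topological order exists only for acyclic graphs, which is why the same routine doubles as a DAG check.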

Challenge 1: Operator placement

•  How to assign the DSP operators to computing nodes that are distributed in a Fog environment

[Figure: operator placement example; graph labeled 1 to 6 with edges (1,2), (2,3), (2,4), (3,5), (4,5), (4,6)]

The beginning: Distributed Storm

•  Current DSP systems (e.g., Storm, Flink, Heron) are designed to run in single data centers

•  Our initial goal: to extend Storm for a large-scale distributed and heterogeneous environment

V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, Distributed QoS-aware scheduling in Storm. DEBS '15.

Network latency estimation

•  How to provide an efficient estimation of the network delay between pairs of nodes?

•  Use a network coordinates system
–  To predict latencies without performing direct measurements
•  E.g., Vivaldi network coordinates: decentralized and gossip-based scheme
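The idea behind network coordinates can be sketched in a few lines. This is a simplified Vivaldi-style update that assumes a constant step size `delta` (the published algorithm adapts the step using per-node error estimates); the node names, RTT values, and function names are made up for illustration. Each node nudges its own coordinates so that the Euclidean distance to a peer moves toward the measured round-trip time.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two coordinate vectors (the predicted latency)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vivaldi_update(own, remote, measured_rtt, delta=0.25):
    """Move `own` along the line to `remote` by a fraction `delta` of the prediction error."""
    predicted = distance(own, remote)
    error = measured_rtt - predicted
    if predicted == 0.0:
        # Coincident points: push away along an arbitrary fixed axis.
        direction = [1.0] + [0.0] * (len(own) - 1)
    else:
        direction = [(o - r) / predicted for o, r in zip(own, remote)]
    return [o + delta * error * d for o, d in zip(own, direction)]

# Single update: predicted distance is 5, measured RTT is 9, so the node moves away.
moved = vivaldi_update([3.0, 4.0], [0.0, 0.0], measured_rtt=9.0)

# Gossip-style rounds among three hypothetical nodes with made-up RTTs (ms).
rtt = {("a", "b"): 10.0, ("b", "c"): 20.0, ("a", "c"): 25.0}
coords = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
rng = random.Random(42)
for _ in range(2000):
    i, j = rng.sample(sorted(coords), 2)
    measured = rtt.get((i, j), rtt.get((j, i)))
    coords[i] = vivaldi_update(coords[i], coords[j], measured)
```

After enough rounds, `distance(coords[i], coords[j])` serves as a latency estimate for a pair of nodes without measuring that pair directly.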

Operator placement policies

•  Operator placement: NP-hard problem
•  Several placement policies in the literature (mainly heuristics) address this problem, but:
–  Different assumptions (system model, application topology, QoS attributes and metrics, …)
–  Different objectives
–  Not easily comparable

ODP: Optimal DSP Placement

•  We propose ODP
–  Centralized policy for optimal placement of DSP applications
–  Formulated as an Integer Linear Programming (ILP) problem

•  Our goals:
–  To compute the optimal placement (of course!)
–  To provide a unified general formulation of the placement problem for DSP applications (but not only!)
–  To consider multiple QoS attributes of applications and resources
–  To provide a benchmark for heuristics

V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, Optimal Operator Placement for Distributed Stream Processing Applications, DEBS '16.

ODP: model

DSP application

Operators
•  C_i: required computing resources
•  R_i: execution time per data unit

Data streams
•  λ_{i,j}: data rate from operator i to j

ODP: model

Computing and network resources

(Logical) network links
•  d_{u,v}: network delay from u to v
•  B_{u,v}: bandwidth from u to v
•  A_{u,v}: link availability

Computing resources
•  C_u: amount of resources
•  S_u: processing speed
•  A_u: resource availability

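ODP: model

Optimal DSP Placement Model

Decision variables: where to map operators and data streams

[Figure: operators i and j mapped onto computing nodes u and v (among nodes u, z, v, w), with x_{i,u} = 1, x_{j,v} = 1, and y_{(i,j),(u,v)} = 1 for the data stream (i,j) mapped onto link (u,v)]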

ODP: some QoS metrics

•  Latency (R)
–  Max end-to-end delay between sources and destinations

•  Availability
–  Prob. that all operators/links are up and running

•  Latency and bandwidth
–  Inter-node traffic
–  Network usage
•  In-flight bytes: Σ_{l ∈ links} rate(l) · Lat(l)
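The three metrics can be evaluated for a fixed placement with a short sketch. All values and names below are made up, and two simplifications are assumed here rather than taken from the slides: processing speed is 1 on every node (so R_i is used as is), and availability multiplies each distinct node and inter-node link once, assuming independent failures.

```python
import math

# Hypothetical placement and parameters (illustration values only).
placement = {"src": "u", "op": "v", "sink": "v"}       # operator -> node
exec_time = {"src": 0.0, "op": 2.0, "sink": 1.0}       # R_i, ms per data unit
rate = {("src", "op"): 100.0, ("op", "sink"): 10.0}    # lambda_{i,j}, bytes per ms
delay = {("u", "v"): 5.0}                              # d_{u,v}, ms
node_avail = {"u": 0.99, "v": 0.999}                   # A_u
link_avail = {("u", "v"): 0.98}                        # A_{u,v}

def link_of(stream):
    """Network link (pair of nodes) that carries a data stream under the placement."""
    i, j = stream
    return (placement[i], placement[j])

def link_delay(stream):
    u, v = link_of(stream)
    return 0.0 if u == v else delay[(u, v)]

def latency(op):
    """Longest delay from `op` to any sink: execution time plus the worst downstream branch."""
    downstream = [s for s in rate if s[0] == op]
    worst = max((link_delay(s) + latency(s[1]) for s in downstream), default=0.0)
    return exec_time[op] + worst

def network_usage():
    """In-flight bytes: sum over streams of rate times link latency."""
    return sum(rate[s] * link_delay(s) for s in rate)

def availability():
    """Probability that every used node and inter-node link is up (independence assumed)."""
    used_nodes = set(placement.values())
    used_links = {link_of(s) for s in rate if link_of(s)[0] != link_of(s)[1]}
    return (math.prod(node_avail[u] for u in used_nodes)
            * math.prod(link_avail[l] for l in used_links))
```

Co-locating `op` and `sink` on node `v` makes their stream free in terms of network usage, which is the effect that placement policies minimizing network usage try to exploit.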

ODP: ILP formulation

ODP: Optimal DSP Placement Model

[Figure: the ILP formulation, annotated with its parts: tunable knobs to set the optimal placement goals; latency; availability; network bandwidth and node capacity constraints; assignment and integer constraints]
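To give a feel for what an optimal placement means, here is a minimal sketch that replaces the ILP solver with exhaustive enumeration on a tiny instance. This is not the ODP formulation itself: it keeps only the assignment constraint (each operator on exactly one node), the node capacity constraint, and network usage as the single objective. The instance, the pinning of source and sink, and all names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical instance (illustration values only).
demand = {"src": 1, "op": 2, "sink": 1}                   # C_i
rate = {("src", "op"): 10.0, ("op", "sink"): 100.0}       # lambda_{i,j}
capacity = {"u": 3, "v": 2, "w": 4}                       # C_u
delay = {("u", "v"): 10.0, ("u", "w"): 2.0, ("w", "v"): 3.0}  # d_{u,v}, assumed symmetric
pinned = {"src": "u", "sink": "v"}                        # source and sink have fixed locations

def d(u, v):
    """Network delay between two nodes; zero when both operators share a node."""
    if u == v:
        return 0.0
    return delay.get((u, v), delay.get((v, u)))

def feasible(x):
    """Check node capacities and pinned operators for the assignment x (operator -> node)."""
    load = {u: 0 for u in capacity}
    for i, u in x.items():
        load[u] += demand[i]
    within_capacity = all(load[u] <= capacity[u] for u in capacity)
    respects_pins = all(x[i] == u for i, u in pinned.items())
    return within_capacity and respects_pins

def network_usage(x):
    return sum(lam * d(x[i], x[j]) for (i, j), lam in rate.items())

def best_placement():
    """Enumerate every assignment and keep the feasible one with minimum network usage."""
    ops = list(demand)
    candidates = (dict(zip(ops, choice)) for choice in product(capacity, repeat=len(ops)))
    return min((x for x in candidates if feasible(x)), key=network_usage)

best = best_placement()
```

In this instance the heavy stream goes from `op` to `sink`, so co-locating them on `v` would be ideal, but the capacity of `v` forbids it and the search settles on the intermediate node `w`. Enumeration grows exponentially with the number of operators, which is why an ILP solver or heuristics are needed beyond toy sizes.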

ODP and Apache Storm

•  We can use ODP
–  to determine the optimal placement
–  as a benchmark to evaluate existing heuristics

[Figure: diagram labeled ODP]

ODP: Benchmark for placement heuristics

Distributed placement heuristic that minimizes network usage

P. Pietzuch et al., Network-aware operator placement for stream-processing systems, ICDE '06.

Pietzuch et al.: [Figure]

Challenge 2: placement and replication

•  Exploit application-level parallelism by replicating operators

Operator placement and replication

[Figure: pipeline A, B, and the same pipeline with A replicated three times between a Split and a Merge stage]
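The Split and Merge stages around a replicated operator can be sketched as follows. The example assumes a keyed, stateful operator A that sums values per key, and a hash-based Split so that all tuples with the same key reach the same replica; the operator and function names are hypothetical.

```python
import zlib
from collections import Counter

def split(tuples, n_replicas):
    """Partition the stream by key; a stable hash keeps each key on one replica."""
    partitions = [[] for _ in range(n_replicas)]
    for key, value in tuples:
        index = zlib.crc32(key.encode()) % n_replicas
        partitions[index].append((key, value))
    return partitions

def operator_a(partition):
    """Stateful operator: per-key running sum over the tuples it receives."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals

def merge(partials):
    """Combine the outputs of all replicas into a single stream of per-key totals."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged

stream = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
single = operator_a(stream)
replicated = merge(operator_a(p) for p in split(stream, 3))
```

Because each key is handled by exactly one replica, the replicated pipeline produces the same totals as the single instance while the work is spread over three replicas.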

ODRP: Optimal DSP Replication and Placement

•  We propose ODRP
–  Centralized policy for optimal replication and placement of DSP applications
–  Formulated as an Integer Linear Programming (ILP) problem

•  Our goals:
–  To jointly determine the optimal number of replicas and their placement
–  To consider multiple QoS attributes of applications and resources
–  To provide a unified general formulation
–  To provide a benchmark for heuristics

V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, Optimal operator replication and placement for distributed stream processing systems. ACM Perf. Eval. Rev., 2017.

ODRP performance

DSP application: DEBS 2015 Grand Challenge

[Figure: application topology with data source, parser, filterByCoordinates, metronome, computeRouteID, countByWindow, partialRank, and globalRank; legend: source, operator, sink; external systems: RabbitMQ, Redis]

[Plot: response time (s), log scale from 0.001 to 1000, versus source data rate (tuples/s), from 20 to 120; series S-ODP_R and S-ODRP_R]

Challenge 3: run-time deployment

•  Many factors may change at run time, e.g.,
–  Load variations, QoS attributes of resources, cost of resources (e.g., due to dynamic pricing schemes), network characteristics, node mobility, …

•  How to adapt the placement and replication when changes occur?

Exploit self-adaptive deployment

Self-adaptive deployment

•  MAPE (Monitor, Analyze, Plan, and Execute)

•  Plan phase: how to reconfigure the application deployment
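A MAPE control loop can be sketched with a toy system model. Everything here is an assumption for illustration: response time is modeled as load divided by the number of replicas, the Analyze step checks a response-time threshold, and the Plan step simply adds one replica. A real Plan phase would solve a placement and replication problem instead.

```python
def monitor(system):
    """Monitor: collect current metrics (toy model: response time = load / replicas)."""
    return {
        "response_time": system["load"] / system["replicas"],
        "replicas": system["replicas"],
    }

def analyze(metrics, max_response_time):
    """Analyze: decide whether the deployment needs to be adapted."""
    return metrics["response_time"] > max_response_time

def plan(metrics):
    """Plan: compute the new deployment (here, naively add one replica)."""
    return {"replicas": metrics["replicas"] + 1}

def execute(system, new_deployment):
    """Execute: apply the planned reconfiguration to the running system."""
    system.update(new_deployment)

def mape_loop(system, max_response_time, max_iterations=10):
    """Run MAPE iterations until the requirement holds; return the observed response times."""
    history = []
    for _ in range(max_iterations):
        metrics = monitor(system)
        history.append(metrics["response_time"])
        if not analyze(metrics, max_response_time):
            break
        execute(system, plan(metrics))
    return history

system = {"load": 120.0, "replicas": 1}
history = mape_loop(system, max_response_time=50.0)
```

With a load of 120 and a threshold of 50, the loop scales out twice and stops at three replicas.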

Reconfiguration challenges

•  Reconfiguring the deployment has a non-negligible cost!

•  Can negatively affect application performance in the short term
–  Application freezing times caused by operator migration and scaling, especially for stateful operators

Perform reconfiguration only when needed

Take into account the overhead for migrating and scaling the operators
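One way to read "only when needed" is as a cost-benefit test. The sketch below is a hypothetical decision rule, not the model used in the talk: it estimates the freeze time of a migration from the operator state size, then reconfigures only if the latency saved over a planning horizon outweighs the weighted downtime. All parameter names and numbers are made up.

```python
def migration_downtime(state_bytes, bandwidth_bytes_per_s, restart_s):
    """Estimated freeze time: restart overhead plus the time to transfer the operator state."""
    return restart_s + state_bytes / bandwidth_bytes_per_s

def should_reconfigure(latency_now, latency_after, downtime_s, horizon_s, downtime_weight):
    """Reconfigure only if the benefit over the horizon exceeds the weighted downtime cost."""
    benefit = (latency_now - latency_after) * horizon_s
    cost = downtime_weight * downtime_s
    return benefit > cost

# A stateless operator only pays the restart; a stateful one also moves 50 MB of state.
stateless = migration_downtime(0, 10e6, restart_s=2.0)
stateful = migration_downtime(50e6, 10e6, restart_s=2.0)

move_stateless = should_reconfigure(0.30, 0.25, stateless, horizon_s=60, downtime_weight=1.0)
move_stateful = should_reconfigure(0.30, 0.25, stateful, horizon_s=60, downtime_weight=1.0)
```

The same latency improvement justifies moving the stateless operator but not the stateful one, whose state transfer makes the freeze time longer.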

Elastic stateful migration in Storm

•  We develop mechanisms for elastic stateful migration in Apache Storm

[Figure: Storm architecture with four Supervisors (worker processes and worker slots), four DDS components, the network, Nimbus, ZooKeeper, and the components scheduler, MigrationNotifier, and ElasticityManager]

V. Cardellini, M. Nardelli, D. Luzi, Elastic stateful stream processing in Storm, HPCS '16.

EDRP: Elastic DSP Replication and Placement

•  Unified framework for the QoS-aware initial deployment and run-time elasticity management of DSP applications

•  We model reconfiguration costs
–  Related to migrating or scaling in/out the operators

•  Centralized policy formulated as an Integer Linear Programming (ILP) problem

V. Cardellini, F. Lo Presti, M. Nardelli, G. Russo Russo, Optimal operator deployment and replication for elastic distributed data stream processing, under review, 2017.

EDRP performance

With reconfiguration penalties: Availability: 95.5%, Downtime: < 90 s

Without reconfiguration penalties: Availability: 81.5%, Downtime: 370 s

Future work

•  Study efficient heuristics to deal with large problem instances

•  Deal with uncertainty: take uncertainty of parameters into account and design robust placement algorithms

•  Study how to deploy multiple competing applications in the Fog

•  Integrate placement decisions with SDN
–  With SDN, bring the network into the control loop

•  Study cross-layer strategies that involve multiple Big Data frameworks in the Fog
–  E.g., Heron + Apache Aurora + Mesos

Acknowledgments – Co-authors

Vincenzo Grassi, Francesco Lo Presti, Matteo Nardelli

Thank you! Any questions?

Valeria Cardellini
cardellini@ing.uniroma2.it
http://www.ce.uniroma2.it/~valeria
