Big Control: Infrastructure for Collaborative Device Swarms
Guru Parulkar, on behalf of
Stanford Platform Lab and Collaborators
Collaborators
Mac Schwager, Aeronautics & Astronautics
Mykel Kochenderfer, Aeronautics & Astronautics
Balaji Prabhakar, Electrical Eng / Computer Science
Peter Bailis, Computer Science
Pat Hanrahan, Electrical Eng / Computer Science
Clarification and Disclaimer!
• It is a broad interdisciplinary research program
– Participants (will) include several professors with diverse expertise and 10s of graduate students
– We are at the beginning of the program
– We have a vision and high-level ideas but don’t have all the solutions and answers
• Clearly I don’t have the expertise to present the broad research agenda
– So be nice to me!!!
Emergence of autonomous devices
They will be part of our lives in the near future
Sensing + Actuation + Computing + AI
Autonomous vehicles: car, truck
Quadcopter/Drone/UAV
Robots
Physical spaces with sensors connected to the net
For all images, copyrights apply.
Swarms of Autonomous Devices
• In the near future we will have “swarms of autonomous devices” owned and operated by
– Individuals
– Corporations
– Government agencies
to achieve a variety of tasks
• The swarms of autonomous devices will have to share physical spaces and coordinate among themselves to achieve their tasks
Coordination and control of swarms of autonomous devices represents a huge challenge and a big opportunity!
P2P Approach to Coordination and Control
• P2P control is difficult to scale, to reason about, and to use for global planning/optimization
• Every control app needs to solve a distributed systems problem
• It hasn’t worked in other computer and networking systems
We have decided not to focus on P2P control and coordination
Logically Centralized Control and Coordination
Centralized control and planning: swarms can accomplish bigger and better things
[Diagram: two Control Platforms and one Big Control Platform, each with Apps above and a Network below]
Local and Remote (Logically Centralized) Control
Reflexive and Planned Control
Our key assumptions
• Local autonomous control is responsible for
– Stability and collision/disaster avoidance
– Very fast reaction time: “reflexive behavior”
A lot of ongoing research and development
• Logically centralized control: our focus
– With a global view and for a globally optimal operation of swarm(s) of devices
– Relatively slower reaction time: “planned behavior”
[Diagram: Big Control Platform with Apps above, Network below, and a Global View]
Logically Centralized Control: Key Benefits
• Can use more computing horsepower
– For better coordination and control
• Can leverage or integrate large datasets
– Historic data
– Potentially real-time simulation results
• Creates a global view
– Helps with optimal operation of swarm(s)
– Simplifies application development
– Makes it easier to reason about the system
[Diagram: Big Control Platform with Apps above, Network below, a Global View, and Historic & Simulation Data]
Imagine the Possibilities
• Rush hour commute in a major metro area
– Coordination and control of a million cars, traffic lights, navigation, people, …
– Coordination and control can significantly improve highway utilization, improve commute time, and make the commute time more productive
• A large distribution center
– 10s of millions of packages in big warehouses
– 10K or more drones
– Track, organize, move millions of packages in a big warehouse
• 3D scanning of structures, campuses, cities, …
– A fleet of drones to capture part of a structure/campus/city
– Coordinated trajectories to observe the target from different vantage points
• Automated agriculture
– A fleet of drones to observe thousands of individual plants daily
– A fleet of crop-tending drones to provide specialized services: watering, pruning, spraying
Imagine the Possibilities: Disaster Recovery
A Grand Challenge
• Example: Hurricane Sandy or Tohoku Earthquake
• Millions of people and 10s of square miles affected
– Cut off without water, food, electricity, communication, roads
• A fleet of trucks and boats
– Each with 1000s of drones, power, wireless base stations
• Coordinated search with a swarm of drones
– Scour the affected region, assess the damage, locate survivors, identify regions requiring immediate help to guide the relief
• Mobilizing automated emergency response vehicles (ERVs)
– Coordinated with drones for navigation to areas requiring relief
• Small package delivery
– Medical supplies and food to people and relief workers
BIG CONTROL PLATFORM (BCP)
Big Control as a Platform
• Big Control Platform
– This decade will be about Big Control as the last decade was about Big Data
• Like MapReduce and Spark, we want to build BCP as a platform
– It will solve significant problems
– Simpler framework for developers, to make it easier to write control apps, with some constraints
• Differentiating attributes
– Scale
– Collaboration among autonomous devices
– Low latency
Big Control Platform (BCP) Requirements
[Diagram: Big Control Platform with Apps above, Network below, a Global View, Historic & Simulation Data, Real-time sensing, and Real-time control]
• Real-time sensing
– Location, velocity, trajectory
– Environmental parameters
– Streaming data
High-volume noisy data
• Historic data in the store
Huge datasets
• Global view or state
– Most accurate state of devices in the physical space
– Assessment of the physical space
Computationally very intensive
• Real-time control
– A revised plan (actions) to devices: location, velocity, trajectory, what to sense, …
Low latency, computationally intensive, with predictability
BCP requires addressing many tough challenges
BCP Internals: Distributed State Management
Big Control: Infrastructure for Collaborative Device Swarms
cations. BCP will solve problems such as distributed data management, data fusion, and behavior planning, and it will offer simplified frameworks such as a declarative query language for behavior planning.
[Figure 1 labels: Applications (Package Tracking, Large Scale Mapping, Commute Management, Disaster Relief); Data Repositories; Big Control Platform (BCP) with Active Sensing, Adaptive Optimal Scheduling, Deep Reinforcement Learning, Stable Trajectory Tracking, Data Ingestion and Inference Engine, Planning & Control, Declarative Planning, Global Views of Swarm, Distributed State Management and Notifications; Wired and Wireless Networks; Devices with Real-Time Sensing and Real-Time Actuation]
Figure 1: System overview for managing collaborative device swarms.
Figure 1 shows the overall environment for managing device swarms and the architecture of BCP. BCP comprises three layers. The lowest layer is responsible for distributed state management and communication, including communication with swarm devices and high-performance notifications among the BCP elements. The middle layer provides two modules that will underlie all applications. The first module is responsible for data ingestion and fusion, and includes an inference engine to build consistent global views from the noisy data provided by swarm devices. The second module is a planning engine, whose function for Big Control applications is similar to that of a relational database for Big Data applications. It will allow aggregate behavioral goals to be specified in a declarative language, and it will then compute detailed device behaviors to implement those goals. The third layer uses these facilities to create domain-specific services such as active sensing and deep reinforcement learning.
BCP will execute on large clusters of servers in both edge and core datacenters. In order to handle immense device swarms, it must scale across hundreds or thousands of servers. BCP must also integrate with massive existing backend datastores, such as those used for Big Data applications. Finally, the facilities provided by BCP must be general purpose, so they can be used for many different applications on a single swarm, as well as applications on different swarms and applications that encompass multiple swarms.
4 Research
Enabling large-scale end-to-end swarm applications such as the disaster recovery Grand Challenge will require a diverse collection of research challenges to be solved and integrated, from low-level communication to high-level multi-agent optimization, path planning, and spatial data management. The sections below summarize our five major research thrusts, consisting of the BCP system, three supporting areas (low latency datacenter, network infrastructure, and security), and applications. Because of space limitations, the descriptions here are necessarily brief; our full proposal will provide more details.
4.1 Big Control Platform (BCP)
Most of the research for this Expedition will be focused on designing and implementing the three layers of BCP, which were introduced in Section 3. These layers present very different research challenges, yet they must all work together to enable Big Control applications.
4.1.1 Bottom Layer: Scalable State Management and Notifications
One of the critical functions of a centralized control plane such as BCP is the ability to create, observe and maintain system state and to communicate changes in that state among the elements of the application. Large swarms will have large amounts of state (1TB or more, not including sensing data) and must support rapid changes in that state (millions of updates per second). Furthermore, this state must be accessible to large numbers of servers in order to meet the computational requirements of a large swarm. Big Control applications require a different approach to state management than Big Data applications because of the
• BCP is a demanding datacenter application
– Clusters with low latency communication
• Distributed state management
– State of the swarm on many servers: 1TB+ and millions of ops per second
– Different state may require different consistency, replication, and durability for the best performance
– Accessible to modules and apps on many servers
• Notifications
– Event notification to different components
– High throughput, low latency, fault-tolerant
• High availability
– Allow an individual module or server to fail
– In-service upgrades
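The pairing of shared state with event notifications can be illustrated with a toy, single-process sketch. Everything named here (`StateStore`, `subscribe`, `put`, the `drone/…` key scheme) is hypothetical and not part of BCP; a real system would shard and replicate this state across many servers.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class StateStore:
    """Toy in-memory key-value store that notifies subscribers on updates."""

    def __init__(self) -> None:
        self._state: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Callback) -> None:
        """Register a callback for every key that starts with `prefix`."""
        self._subscribers[prefix].append(callback)

    def put(self, key: str, value: Any) -> None:
        """Store the value, then notify all subscribers whose prefix matches."""
        self._state[key] = value
        for prefix, callbacks in self._subscribers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)

    def get(self, key: str) -> Any:
        """Return the stored value, or None if the key is unknown."""
        return self._state.get(key)


store = StateStore()
events = []
store.subscribe("drone/", lambda key, value: events.append((key, value)))
store.put("drone/17/position", (12.0, 4.5, 30.0))
store.put("truck/3/position", (0.0, 0.0, 0.0))  # no subscriber matches this key
```

Only the drone update reaches the subscriber, which is the kind of selective notification a planning module might rely on to react to the devices it cares about.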
BCP Internals: Data Ingestion and Inference
• Sensing data is high volume and noisy
• Ingestion must rapidly
– Heal/cure the data, index it, combine it with other relevant data such as maps, points of interest, etc., and render it
– Run inference to detect and report anomalies
to construct the global view
• Overall approach involves reconstructing spatio-temporal processes from noisy snapshots
• Leverage distributed state management and low latency communication
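One standard way to reconstruct a process from noisy snapshots is a Kalman filter. The slides do not say which estimator BCP would use, so this is an illustrative choice only: a minimal scalar sketch, assuming a hypothetical device whose position barely drifts, with made-up noise variances.

```python
from typing import Iterable, List, Tuple


def kalman_1d(
    measurements: Iterable[float],
    meas_var: float,
    process_var: float,
    init_mean: float = 0.0,
    init_var: float = 1e6,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for a (nearly) constant quantity.

    Returns the (mean, variance) estimate after each measurement.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to drift slowly, so uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement by their relative certainty.
        gain = var / (var + meas_var)
        mean += gain * (z - mean)
        var *= 1.0 - gain
        estimates.append((mean, var))
    return estimates


# Hypothetical noisy position snapshots (metres) of a device near x = 10.
snapshots = [10.4, 9.7, 10.1, 9.9, 10.3, 9.8]
track = kalman_1d(snapshots, meas_var=0.25, process_var=1e-4)
final_mean, final_var = track[-1]
```

Each snapshot shrinks the variance of the estimate, so the global view ends up more precise than any single reading.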
BCP Internals: Declarative Planning
• Provide higher-level abstractions to app developers
– What vs. how
– E.g., ask for IR readings from a specific location vs. precise commands for individual devices to survey a given area
• Planning Engine translates a higher-level declarative plan into specific commands for various devices
– Similar to query planning in relational databases
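The "what vs. how" split can be sketched as a function that accepts a declarative request and emits per-device commands. The names below (`plan_readings`, `Command`, the `fly_and_read_…` action string) are hypothetical, and the greedy nearest-free-device rule is a stand-in for whatever optimization a real planning engine would perform.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Command:
    device_id: str
    action: str
    target: Point


def plan_readings(
    sensor: str, targets: List[Point], devices: Dict[str, Point]
) -> List[Command]:
    """Turn a declarative request ("readings of `sensor` at `targets`")
    into per-device commands using greedy nearest-free-device assignment."""
    free = dict(devices)  # device_id -> current position
    commands = []
    for target in targets:
        if not free:
            raise ValueError("more targets than available devices")
        nearest = min(free, key=lambda d: math.dist(free[d], target))
        del free[nearest]
        commands.append(Command(nearest, f"fly_and_read_{sensor}", target))
    return commands


# The app states WHAT it wants; the planner decides HOW (which device goes where).
devices = {"d1": (0.0, 0.0), "d2": (10.0, 0.0), "d3": (5.0, 8.0)}
commands = plan_readings("ir", [(9.0, 1.0), (1.0, 1.0)], devices)
```

Greedy assignment is not optimal in general; as with a query optimizer, the point is that the planner can be improved without changing the application's request.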
BCP Internals: App Services
• Active Sensing
– Quantify precision of current knowledge and take actions to improve it
– Requires fusion of data from multiple sensors: off-line algorithms today
– Our goal is to explore on-line algorithms
• Adaptive Optimal Scheduling
– Overall plan for the swarm
– Leverages reinforcement learning
• Deep Reinforcement Learning
– To cope with many uncertainties
• Stable Trajectory Tracking
– Based on feedback from actual sensed data
– Incremental control to stay on a stable trajectory
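The active-sensing loop (quantify precision, then act to improve it) can be sketched with variance as the precision measure. This is an assumption for illustration, not the method from the slides: readings are fused by inverse-variance weighting, and the next measurement goes wherever the variance is largest. The map cells and numbers are hypothetical.

```python
from typing import Dict, Tuple

Estimate = Tuple[float, float]  # (mean, variance)


def fuse(prior: Estimate, reading: Estimate) -> Estimate:
    """Inverse-variance weighted fusion of two independent estimates."""
    (m1, v1), (m2, v2) = prior, reading
    fused_var = 1.0 / (1.0 / v1 + 1.0 / v2)
    fused_mean = fused_var * (m1 / v1 + m2 / v2)
    return fused_mean, fused_var


def most_uncertain(belief: Dict[str, Estimate]) -> str:
    """Active-sensing rule: measure next where the variance is largest."""
    return max(belief, key=lambda cell: belief[cell][1])


# Hypothetical belief about temperature (mean, variance) in three map cells.
belief = {"A": (20.0, 1.0), "B": (22.0, 9.0), "C": (19.0, 4.0)}
target = most_uncertain(belief)  # cell to send a drone to
belief[target] = fuse(belief[target], (25.0, 1.0))  # new sensor reading
next_target = most_uncertain(belief)
```

After the reading in cell B is fused, its variance drops below that of cell C, so the rule moves the next measurement to C.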
BCP Requires Low Latency Data Center
• End-to-end latency ~10 ms
• Apps + BCP computation on servers should be low latency and predictable
• Low-latency RPC: 1-10 µs
– New transport protocol with receiver-driven congestion and flow control
– Kernel bypass
– Polling to avoid interrupt overheads
– Minimal state for scalability
• Core-aware thread scheduling
– Kernel does scheduling of cores
– Application does its own thread scheduling
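The ~10 ms end-to-end target implies a budget that has to be split across stages. The sketch below only shows the arithmetic: the stage names and every number except the 10 ms target are hypothetical placeholders, not measurements.

```python
END_TO_END_BUDGET_MS = 10.0  # target from the slide

# Hypothetical per-stage costs in milliseconds (assumptions, not measurements).
stages_ms = {
    "uplink_network": 1.0,      # device -> datacenter
    "ingestion": 1.5,
    "rpc_hops": 100 * 0.005,    # 100 internal RPCs at 5 microseconds each
    "planning": 4.0,
    "downlink_network": 1.0,    # datacenter -> device
}

total_ms = sum(stages_ms.values())
slack_ms = END_TO_END_BUDGET_MS - total_ms
within_budget = slack_ms >= 0.0
```

With RPCs in the 1-10 µs range, a hundred internal hops cost well under a millisecond; at a hypothetical 100 µs per RPC the same hundred hops would consume the entire 10 ms budget.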
BCP Requires Low Latency Data Center
Low latency storage
• Current memory abstractions are optimized for limited memory, high temporal locality, and slow persistent storage
• Flash and non-volatile memory provide fast, high-capacity storage
• New memory abstractions (single level), APIs, and storage representations for fast access
Software Defined Networks for Low Latency and Predictability
• Big Control requires networks to be tuned for “control”
– Low latency (1 ms) and predictability
– Highly reliable connectivity
• Software defined networks
– Programmable data and control planes to achieve low latency with QoS or predictability
– Slicing for different apps to allow isolation and customization
– Each slice can be programmed to achieve its performance targets
Security
• Secure connectivity
– Protect communication among swarm participants from tampering
– Digitally signing each packet is too expensive
– Design new swarm-integrity and simple key management techniques
• Protection against drone sensor attacks
– Sensor attacks: disable sensors on drones (sound waves to disable the gyroscope, GPS spoofing to disable the positioning system)
– Swarm collectives can find ways to be resilient to sensor attacks (Byzantine failures on a small set of drones)
– Design relevant synchronization and Byzantine fault tolerant protocols
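One simple way a collective can tolerate a few faulty or spoofed sensors is to aggregate reports with a median instead of a mean. This is an illustrative technique, not the protocol proposed in the slides; `robust_altitude` and the report values are hypothetical.

```python
import statistics
from typing import List


def robust_altitude(reports: List[float], max_faulty: int) -> float:
    """Median of the reports; stays within the range of honest values
    as long as at most `max_faulty` of at least 2f + 1 reports are arbitrary."""
    if len(reports) < 2 * max_faulty + 1:
        raise ValueError("need at least 2f + 1 reports to tolerate f faults")
    return statistics.median(reports)


# Hypothetical altitude reports (metres); two drones have spoofed sensors.
reports = [50.2, 49.8, 50.1, 50.0, 49.9, 500.0, -300.0]
estimate = robust_altitude(reports, max_faulty=2)
naive = statistics.mean(reports)
```

The median stays at the honest consensus while the mean is dragged far off by the two spoofed values. A full Byzantine fault tolerant protocol would also have to handle drones that send different values to different peers, which this sketch does not address.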
Why Stanford Platform Lab?
• Our DNA is to take on such challenging programs with potential for huge impact
• An interdisciplinary research agenda requires
– Breadth and depth of expertise in multiple fields: easy to find at Stanford
– Combination of foundational research and scalable high-performance systems: something we have a history of doing
• We have a strong collaboration with industry
– Cloud providers, vendors, and other stakeholders
– Together we can create general-purpose platforms and solutions for the industry to use and build on
• Expected results
– Architectures, abstractions, algorithms, and other fundamental contributions
– Open source platforms and communities to enable more innovation in this area
– Successful technology transfer to impact the practice of the field
– Integration of research and education
Summary
• Big Control is an exciting opportunity to define a new computing paradigm that will enable applications with far-reaching impacts on society
• Big Control and its applications pose many tough challenges that are worthy of our focus for the next few years
• Exciting times ahead…
• Time to jump in with both feet!!
Breakout Sessions
• First Breakout Session
– Applications of BCP
• Second Breakout Session
– Candidate startup projects
– BCP technologies and components: distributed state management; data ingestion and inference; declarative programming languages for planning and control
– Low latency datacenter
– SDN for low latency and QoS
– End-to-end control: what it would look like (latency budgets)
• Do a show of hands