Edge-Cloud Converged System for Machine Inference and Learning
Satyam Vaghani, VP & GM, IoT and AI
October 2018
Background
The Intelligent Edge and some use cases
A key change: Edge trumps Cloud
Cloud traffic: 8.6 ZB (2017), 15.3 ZB (2020)
IoT data: 256 ZB (2017), 600 ZB (2020)
Sources: Cisco Global Cloud Index, Memoori
Key consequence: Intelligent Edge
BEFORE: SENSORS, IoT GATEWAY, CLOUD (Data Ingestion, Real-time Processing, Long-term Processing)
AFTER: SENSORS, EDGE COMPUTING (Data Ingestion, Real-time Processing), CLOUD (Long-term Processing)
Use case: ‘Amazon Go’ for restaurants
EDGE (x 100s), CLOUD
Apps & models
Anomalies
Machine Inference
Use case: Product quality check
EDGE (x 100s), CLOUD
x 10s
Apps & models
Insights
Machine Inference, Analytics, Actuation
Our journey and learnings
Building a PoC is easy; operationalizing it at industrial scale is H-A-R-D
The Intelligent Edge is Not Ready
TWO PROBLEMS PREVENT WIDESPREAD ADOPTION
• Distributed infrastructure burden
• AI operationalization at industrial scale
A Smart Airport Example
TOPOLOGY
Airport 1 (of 10)
2000 x
100 x
250 x
A Smart Airport use case
OBJECT OF INTEREST
Look for a red car at airport(s)
SFO AIRPORT (EDGE)
...
10:41AM: redcar=0
10:42AM: redcar=1
10:43AM: redcar=0
10:44AM: redcar=0
...
CLOUD
...
10:41AM: SFO, redcar=0
10:42AM: SFO, redcar=1
10:43AM: SFO, redcar=0
10:44AM: SFO, redcar=0
...
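The edge and cloud listings above differ only in the site tag that is added when the time series leaves the airport. A minimal Python sketch of that step follows. The names `EdgeReading`, `to_cloud_records`, and `sightings` are hypothetical and are not part of Xi IoT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeReading:
    """One row of the edge time series, e.g. '10:42AM: redcar=1'."""
    time: str
    redcar: int


def to_cloud_records(site: str, readings: list[EdgeReading]) -> list[str]:
    """Tag each edge reading with its site, mirroring the cloud-side listing."""
    return [f"{r.time}: {site}, redcar={r.redcar}" for r in readings]


def sightings(site: str, readings: list[EdgeReading]) -> list[tuple[str, str]]:
    """Return (site, time) pairs where the model reported a red car."""
    return [(site, r.time) for r in readings if r.redcar == 1]


# Sample data copied from the SFO listing.
sfo = [
    EdgeReading("10:41AM", 0),
    EdgeReading("10:42AM", 1),
    EdgeReading("10:43AM", 0),
    EdgeReading("10:44AM", 0),
]
cloud_rows = to_cloud_records("SFO", sfo)
```

With ten airports, the cloud side would merge the `sightings` output of each site to answer the "red car at airport(s)" query.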
IoT infrastructure burden
APPS SPAN EDGE AND CLOUD AT PLANET SCALE
Train model to recognize redcar
Deploy model to selected edges
Runtime for car recognition model
Persistence for surveillance feed
Sampling surveillance feed to match model input requirements
Persistence of image recognition time series output
Data mover to move time series output to cloud
Persistence for time series data in cloud
App runtime in cloud
Security
business logic
infrastructure madness
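One item in this list, sampling the surveillance feed to match model input requirements, can be illustrated with a short Python sketch. The frame rates used are assumptions for illustration only, and `sample_frames` is a hypothetical helper, not a Xi IoT API.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    frames: Iterable[Frame], feed_fps: int, model_fps: int
) -> Iterator[Frame]:
    """Yield every N-th frame so the feed rate approximates the model's rate."""
    if model_fps <= 0 or feed_fps < model_fps:
        raise ValueError("need 0 < model_fps <= feed_fps")
    # Integer stride: exact when feed_fps is a multiple of model_fps,
    # otherwise the output rate is slightly above model_fps.
    stride = feed_fps // model_fps
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield frame


# Assumed rates: a 30 fps camera feeding a model that wants 10 fps.
sampled = list(sample_frames(range(12), feed_fps=30, model_fps=10))
```

A real pipeline would also resize and re-encode frames to the model's input shape; that part is omitted here.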
Introducing Xi IoT
PLATFORM
MIDDLEWARE
DEVOPS
BUSINESS LOGIC
Xi IoT
Xi IoT: High level architecture
Persistence
Runtime
Data Ingestion
Sensor Actuation
Streaming Data Service
Unstructured Data Service
Pub Sub Service
Data Bus
Machine Inference
Streaming Analytics
Code (Containers/FaaS)
Control Plane
SENSORS
Machine Training
Long term Analytics
Custom Code
Streaming Data Service
Unstructured Data Service
Structured Data Service
EDGE, CLOUD
INGRESS, REAL TIME PROCESSING, LONG TERM PROCESSING
Operations Console
Developer Console
SaaS
ML-specific learnings
#1: Handle inference hardware diversity
SaaS
Model Onboarding: ML FRAMEWORK 1, ML FRAMEWORK 2, … (Model 1, Model 2, Model 3)
Model Deployment: COMPILER 1, COMPILER 2, COMPILER 3, …
EDGE (GPU): Model 1, Model 2
EDGE (ASIC): Model 1, Model 3
EDGE (FPGA): Model 1
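The deployment stage above pairs each model with a compiler suited to the target edge's accelerator. A minimal Python sketch of that dispatch follows. The compiler functions, artifact names, and edge names are hypothetical placeholders, since the slides do not name the actual compilers.

```python
from typing import Callable


# Hypothetical stand-ins for the per-target compilers (COMPILER 1..3).
def compile_for_gpu(model: str) -> str:
    return f"{model}.gpu.bin"


def compile_for_asic(model: str) -> str:
    return f"{model}.asic.bin"


def compile_for_fpga(model: str) -> str:
    return f"{model}.fpga.bin"


COMPILERS: dict[str, Callable[[str], str]] = {
    "GPU": compile_for_gpu,
    "ASIC": compile_for_asic,
    "FPGA": compile_for_fpga,
}


def deploy_plan(model: str, edge_hardware: dict[str, str]) -> dict[str, str]:
    """Map each edge name to the artifact compiled for its accelerator."""
    plan: dict[str, str] = {}
    for edge, hardware in edge_hardware.items():
        if hardware not in COMPILERS:
            raise ValueError(f"no compiler registered for {hardware!r}")
        plan[edge] = COMPILERS[hardware](model)
    return plan
```

The point of the registry is that the model author onboards a model once, and the platform picks the compiler per edge.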
#2: Fit model to Edge constraints
SaaS
Model Onboarding: ML FRAMEWORK 1, ML FRAMEWORK 2, …
Model Deployment: COMPILER 1, COMPILER 2, COMPILER 3, …
Fitting Strategy: TensorRT, Quantization, NN Pruning, …
Model Library
Model Development
Edge Infrastructure
EDGE
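Two of the fitting strategies named above, quantization and pruning, can be illustrated on a plain list of weights. This is a toy Python sketch of the general techniques (symmetric int8 quantization and magnitude pruning). It does not use TensorRT and is not a description of how Xi IoT implements them.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w is approximated by q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights: list[float], keep_fraction: float) -> list[float]:
    """Zero the smallest-magnitude weights, keeping about keep_fraction of them."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    keep = round(len(weights) * keep_fraction)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


weights = [0.5, -1.27, 0.01, 0.9]
quantized, scale = quantize_int8(weights)
pruned = prune_by_magnitude(weights, keep_fraction=0.5)
```

Quantization shrinks each weight to one byte at the cost of a rounding error of at most half a scale step; pruning trades accuracy for sparsity.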
#3: Make inference hardware multi-tenant
VMs with vGPUs, each running Ctr/Fn: static allocation, unclear benefit, no sharing
Ctr/Fn sharing GPUs through an INFERENCE MUX: dynamic allocation, dynamic rebalancing
Centralized inference resource management
H/W capability input
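The dynamic allocation idea behind the inference mux can be sketched as a small router that sends each request to the least-loaded GPU. The class below is a hypothetical Python illustration; the slides do not describe the mux's actual scheduling policy.

```python
class InferenceMux:
    """Route requests from many containers/functions onto shared GPUs."""

    def __init__(self, gpus: list[str]) -> None:
        if not gpus:
            raise ValueError("at least one GPU is required")
        # Number of requests currently running on each GPU.
        self.in_flight: dict[str, int] = {gpu: 0 for gpu in gpus}

    def acquire(self) -> str:
        """Pick the GPU with the fewest in-flight requests."""
        gpu = min(self.in_flight, key=self.in_flight.__getitem__)
        self.in_flight[gpu] += 1
        return gpu

    def release(self, gpu: str) -> None:
        """Mark one request on the given GPU as finished."""
        if self.in_flight[gpu] == 0:
            raise RuntimeError(f"{gpu} has no in-flight requests")
        self.in_flight[gpu] -= 1
```

Unlike a static vGPU assignment, no tenant owns a GPU here: capacity freed by one container or function is immediately available to the others.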
#4: Stretch inferencing across nodes
One node (EDGE: GPU, Ctr/Fn): hard to scale performance with use, no HA
Stretched across nodes (EDGE: GPU, Ctr/Fn, INFERENCE MUX; EDGE: GPU, Ctr/Fn, INFERENCE MUX): easier to scale performance with use, additional HA benefit
Centralized inference resource management
H/W capability input
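The scaling and HA benefits of stretching inference across nodes can be illustrated with a router that spreads requests over several edge nodes and skips any node marked as down. This Python sketch uses hypothetical names and simple round-robin; the actual policy is not given in the slides.

```python
class StretchedInference:
    """Round-robin inference requests across healthy edge nodes."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._next = 0  # index of the next node to try

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self.nodes:
            self.healthy.add(node)

    def route(self) -> str:
        """Return the next healthy node, or fail if none is left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy edge node available")
```

With a single node, `mark_down` leaves nothing to route to; with two or more, requests continue on the survivors.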
#5: Advanced resource management
Workload redistribution due to contention or user-defined objectives
EDGE: Model 1, Model 2, and Model 3 placed across GPU and CPU
SaaS orchestrated transition
EDGE: Model 1, Model 2, and Model 3 re-placed across GPU and CPU
SaaS: User Policies, Edge constraints, Global Resource Manager
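A toy Python sketch of such a redistribution follows. It assumes the user policy is a priority ordering of models and the edge constraint is a number of GPU slots; both assumptions, and the function names, are hypothetical and not taken from the slides.

```python
def place_models(priority: list[str], gpu_slots: int) -> dict[str, str]:
    """Give the GPU to the highest-priority models; the rest run on the CPU."""
    if gpu_slots < 0:
        raise ValueError("gpu_slots must be non-negative")
    return {
        model: ("GPU" if rank < gpu_slots else "CPU")
        for rank, model in enumerate(priority)
    }


def transition(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str, str]]:
    """List the (model, from, to) moves needed to reach the new placement."""
    return [
        (model, old[model], new[model])
        for model in new
        if model in old and old[model] != new[model]
    ]


before = place_models(["model1", "model2", "model3"], gpu_slots=2)
# The user raises model3's priority; the manager recomputes the placement.
after = place_models(["model3", "model1", "model2"], gpu_slots=2)
moves = transition(before, after)
```

The list of moves is what a central manager would push to the edge as an orchestrated transition.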
#6: Update models at planet-scale
GOOD: APP EMBEDDED MODELS
• Hard to construct/maintain
• Hard to update
• Hard to share
BETTER: DISAGGREGATED MODELS & CONTAINERS
• Hard to construct/maintain
• Easy to update
• Easy to share
BEST: FUNCTIONS & MODELS-as-ARGUMENTS
• Easy to construct/maintain
• Easy to update, roll back
• Easy to share
SaaS: Fitting Strategy, Model Library
EDGE (good): Container with embedded Model
EDGE (better): Container, Model
EDGE (best): Function, Model
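The "functions and models-as-arguments" pattern can be sketched in a few lines of Python: the function carries only business logic, and the model is looked up from a versioned library and passed in at call time. `ModelLibrary` and `count_red_cars` are hypothetical names, and the string-matching "models" are stand-ins for real inference.

```python
from typing import Callable

Model = Callable[[str], int]  # takes a frame description, returns 0 or 1


class ModelLibrary:
    """Keep every published version of a model so updates can be rolled back."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Model]] = {}

    def publish(self, name: str, model: Model) -> None:
        self._versions.setdefault(name, []).append(model)

    def rollback(self, name: str) -> None:
        versions = self._versions[name]
        if len(versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        versions.pop()

    def latest(self, name: str) -> Model:
        return self._versions[name][-1]


def count_red_cars(frames: list[str], model: Model) -> int:
    """Business logic only; the model arrives as an argument."""
    return sum(model(frame) for frame in frames)


library = ModelLibrary()
library.publish("redcar", lambda frame: int("red" in frame))
frames = ["red car", "blue car", "dark red van"]
```

Because `count_red_cars` never embeds a model, publishing or rolling back a version changes its behavior without rebuilding or redeploying the function.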
#7: Converge learning & inferencing flows
BEFORE
Two different silos to operate: user management, security, data management, infrastructure
LEARNING PLATFORM (PUB/PVT CLOUD): LEARNING APPS, DATA LAKE
EDGE PLATFORM (EDGE): INFERENCE APPS, OT DATA SOURCES
AFTER
Unified control and data plane for learning and inferencing: uniform user management, security, data management, end-to-end infrastructure.
OT data can be easily fed back to learning.
Xi IoT CONTROL PLANE
INFERENCE APPS, LEARNING APPS, OTHER APPS
Xi IoT DATA BUS: OT DATA & DATA LAKE
EDGE PLATFORM, LEARNING PLATFORM
Takeaways
1. The success of machine inferencing in enterprise IoT depends greatly on the success of the Intelligent Edge.
2. Many obstacles stand in the way of operationalizing machine inference; it is important to get past them with systems software instead of human intervention.
3. Nutanix solved these problems while creating Xi IoT, with positive validation in the retail, manufacturing, and oil & gas verticals.
Thank you