NetML: An NFV Platform with Efficient Support for Machine Learning Applications
Figure: Motivating scenario. Vehicles stream high-resolution video, which requires huge bandwidth. An edge server close to the vehicles pre-processes the streaming video, learns from it, and infers from the data with RTT < 10 ms, returning messages such as "Obstacle ahead", "Slow down / detour", and "Traffic ahead". A distant cloud server's RTT is > 10 ms: too slow to react to changing highway conditions.
Edge Network for ML processing
Ø IoT devices produce large volumes of data that are processed with ML algorithms
Ø Challenges with IoT hardware
§ IoT hardware has limited resources: processing, storage, power, etc.
§ ML processing has to be offloaded to a distant server
§ Offloading adds latency and consumes network bandwidth
Ø Solution: process ML applications in the edge network
§ Edge servers have more compute and storage than IoT devices
§ Edge servers are closer to IoT devices, reducing network latency and enabling real-time processing
§ This also reduces backbone network bandwidth consumption
§ Edge servers can aggregate data from many devices to generate a holistic view
Network Function Virtualization Platform
Ø NFV is a framework for virtualizing network functions (NFs): load balancers, firewalls, IDS, etc.
Ø We use the OpenNetVM (ONVM) NFV platform on a COTS edge server to host NFs and ML applications in containers.
Ø ONVM uses the DPDK library and shared memory optimized for fast packet processing.
Ø Packets that arrive at the edge server are placed in a shared memory region before being accessed by NFs and ML applications.
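The shared-memory model above can be sketched in plain C. This is a hypothetical, simplified stand-in: in ONVM the real packet structure is DPDK's `rte_mbuf` and the region is a DPDK memzone mapped by the manager and every NF container; the names here are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packet placed in DPDK shared memory.
 * In ONVM the real packet structure is DPDK's rte_mbuf; the names
 * here are illustrative only. */
struct shm_packet {
    uint16_t payload_len;
    uint8_t  payload[1500];
};

/* An NF handler in the ONVM model: it receives a pointer into the
 * shared memory region and inspects the payload in place -- no copy
 * into NF-private memory. Here it just returns the first payload
 * byte as a toy "classification". */
int nf_handler(const struct shm_packet *pkt) {
    return pkt->payload_len > 0 ? pkt->payload[0] : -1;
}
```

A producer (the manager, fed by NIC DMA) writes each packet once into shared memory; every NF and ML application then works on the same bytes through pointers, which is what later makes a zero-copy handoff to the GPU possible.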
Challenges of using GPU in Edge Servers
Ø ML applications exploit GPUs to speed up computation.
Ø But moving data to and from the GPU adds latency.
Ø Challenges with GPUs
§ GPUs are PCIe-resident devices; all data is transferred to the GPU via DMA over the PCIe bus
§ NVIDIA GPUs require the data to be in page-locked ("pinned") host memory to initiate DMA
§ Initiating a transfer from the host one packet at a time has high overhead
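The per-packet overhead above is why the baseline stages payloads into one contiguous host buffer first. A minimal sketch in plain C, assuming hypothetical names: in the real pipeline the staging buffer would be page-locked (e.g. allocated with `cudaHostAlloc`) so that a single `cudaMemcpy` DMAs the whole image instead of one transfer per packet; the CUDA calls appear only in comments here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather n small packet payloads into one contiguous staging buffer.
 * In the real pipeline `staging` would be pinned host memory, and a
 * single cudaMemcpy(dev_buf, staging, off, cudaMemcpyHostToDevice)
 * would then replace n separate per-packet transfers. */
size_t gather_payloads(uint8_t *staging, size_t cap,
                       const uint8_t **payloads,
                       const size_t *lens, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (off + lens[i] > cap)          /* staging buffer full */
            break;
        memcpy(staging + off, payloads[i], lens[i]);
        off += lens[i];
    }
    return off;  /* bytes staged: one DMA of this size replaces n DMAs */
}
```

The cost NetML removes is exactly this host-side `memcpy` pass: the CPU touches every payload byte once before the GPU ever sees it.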
Figure: Baseline ("Copying Method") data path. Image data arrives in packets in pinned DPDK shared memory and is first copied into a host-side image buffer; that buffer is then transferred with a single cudaMemcpy() via the GPU's DMA engine into the deep neural network's image buffer, and the output is transferred back from the GPU using cudaMemcpy().
Our Solution (NetML)
Ø We use a data-transfer kernel that exploits NVIDIA CUDA's unified virtual addressing (UVA) to access the payload of each packet directly and transfer the data to the GPU.
Figure: NetML data path. Image data arrives in packets in pinned DPDK shared memory; only pointers to the packet payloads are handed to the GPU. The data-transfer kernel initiates DMA from the GPU side using UVA, gathering the payloads directly into the deep neural network's image buffer; the output is transferred back from the GPU using cudaMemcpy().
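The GPU-side gather can be sketched as follows. This is a CPU simulation in plain C, not the paper's actual code: on the GPU, each CUDA thread would dereference a UVA pointer to a packet payload sitting in pinned DPDK shared memory and write its bytes into the on-GPU image buffer, so the transfer is initiated from the GPU side with no host-side staging copy. The loop over `t` stands in for the CUDA thread grid, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU sketch of a NetML-style data-transfer kernel. The host passes
 * only an array of (UVA) payload pointers plus per-packet destination
 * offsets; the "kernel" gathers the scattered payloads into the
 * contiguous image buffer the DNN reads from. */
void netml_gather(uint8_t *image_buf,
                  const uint8_t *const *payload_ptrs, /* UVA pointers */
                  const size_t *offsets, /* dest offset per packet */
                  const size_t *lens, size_t n_pkts) {
    for (size_t t = 0; t < n_pkts; t++) {   /* one "thread" per packet */
        const uint8_t *src = payload_ptrs[t];
        for (size_t i = 0; i < lens[t]; i++)
            image_buf[offsets[t] + i] = src[i];
    }
}
```

Because each packet carries its own destination offset, packets can be gathered in whatever order they arrive; the CPU never copies payload bytes, only pointers.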
Results
Figure: Taken from https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/
Figure: Execution timelines. Copying Method: after the first packet arrives, copying into the host-side buffer and the cudaMemcpy of the image data take 0.9 ms before the machine-learning section begins. NetML: after the first packet arrives, the data-moving kernel takes 0.2 ms before the machine-learning section begins.
Aditya Dhakal, University of California, Riverside
K. K. Ramakrishnan, University of California, Riverside
Figure: Inference time per image in PyTorch — inference time (milliseconds) vs. packet size (200–800 bytes), comparing the Copying Method and NetML.
Figure: Time spent on each phase of execution (receiving data, transferring to GPU, processing time), in microseconds, for the Copying Method vs. NetML.
Figure: CPU cycles consumed for moving each image to the GPU, for 512-byte and 768-byte packets, comparing the Copying Method and NetML.