NetML: An NFV Platform with Efficient Support for Machine Learning Applications
Figure: Motivating scenario. Vehicles stream high-resolution video, which requires huge bandwidth. An edge server close to the vehicles pre-processes the streaming video, learns from it, and infers from the data with RTT < 10 ms, returning messages such as "Obstacle ahead", "Slow down / detour", and "Traffic ahead". A distant cloud server's RTT is > 10 ms: too slow to react to changing highway conditions.
Edge Network for ML processing
Ø IoT devices produce large volumes of data that are processed with ML algorithms
Ø Challenges with IoT hardware
§ IoT hardware has limited resources: processing, storage, power, etc.
§ ML processing has to be offloaded to a distant server
§ Offloading adds latency and consumes network bandwidth
Ø Solution: process ML applications in the edge network
§ Edge servers have more compute and storage than IoT devices
§ Edge servers are closer to IoT devices, reducing network latency and enabling real-time processing
§ This also reduces backbone network bandwidth consumption
§ Edge servers can aggregate data from many devices to generate a holistic view
Network Function Virtualization Platform
Ø NFV is a framework for virtualizing network functions (NFs): load balancers, firewalls, IDS, etc.
Ø We use the OpenNetVM (ONVM) NFV platform on a COTS edge server to host NFs and ML applications in containers.
Ø ONVM uses the DPDK library and shared memory optimized for fast packet processing.
Ø Packets that arrive at the edge server are placed in a shared memory region before being accessed by NFs and ML applications.
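The shared-memory model above can be sketched in plain C. This is a hypothetical, simplified stand-in: in ONVM the real packet structure is DPDK's `rte_mbuf` and the region is a DPDK memzone mapped by the manager and every NF container; the names here are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packet placed in DPDK shared memory.
 * In ONVM the real packet structure is DPDK's rte_mbuf; the names
 * here are illustrative only. */
struct shm_packet {
    uint16_t payload_len;
    uint8_t  payload[1500];
};

/* An NF handler in the ONVM model: it receives a pointer into the
 * shared memory region and inspects the payload in place -- no copy
 * into NF-private memory. Here it just returns the first payload
 * byte as a toy "classification". */
int nf_handler(const struct shm_packet *pkt) {
    return pkt->payload_len > 0 ? pkt->payload[0] : -1;
}
```

A producer (the manager, fed by NIC DMA) writes each packet once into shared memory; every NF and ML application then works on the same bytes through pointers, which is what later makes a zero-copy handoff to the GPU possible.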
Challenges of using GPU in Edge Servers
Ø ML applications exploit GPUs to speed up computation.
Ø But moving data to and from the GPU adds latency.
Ø Challenges with GPUs
§ GPUs are PCIe-resident devices; all data is transferred to the GPU via DMA over the PCIe bus
§ NVIDIA GPUs require the data to be in page-locked ("pinned") host memory to initiate DMA
§ Initiating a transfer from the host one packet at a time has high overhead
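The per-packet overhead above is why the baseline stages payloads into one contiguous host buffer first. A minimal sketch in plain C, assuming hypothetical names: in the real pipeline the staging buffer would be page-locked (e.g. allocated with `cudaHostAlloc`) so that a single `cudaMemcpy` DMAs the whole image instead of one transfer per packet; the CUDA calls appear only in comments here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather n small packet payloads into one contiguous staging buffer.
 * In the real pipeline `staging` would be pinned host memory, and a
 * single cudaMemcpy(dev_buf, staging, off, cudaMemcpyHostToDevice)
 * would then replace n separate per-packet transfers. */
size_t gather_payloads(uint8_t *staging, size_t cap,
                       const uint8_t **payloads,
                       const size_t *lens, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (off + lens[i] > cap)          /* staging buffer full */
            break;
        memcpy(staging + off, payloads[i], lens[i]);
        off += lens[i];
    }
    return off;  /* bytes staged: one DMA of this size replaces n DMAs */
}
```

The cost NetML removes is exactly this host-side `memcpy` pass: the CPU touches every payload byte once before the GPU ever sees it.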
Figure: Baseline ("Copying Method") data path. Image data arrives in packets in pinned DPDK shared memory and is first copied into a host-side image buffer; that buffer is then transferred with a single cudaMemcpy() via the GPU's DMA engine into the deep neural network's image buffer, and the output is transferred back from the GPU using cudaMemcpy().
Our Solution (NetML)
Ø We use a data-transfer kernel that exploits NVIDIA CUDA's unified virtual addressing (UVA) to access the payload of each packet directly and transfer the data to the GPU.
Figure: NetML data path. Image data arrives in packets in pinned DPDK shared memory; only pointers to the packet payloads are handed to the GPU. The data-transfer kernel initiates DMA from the GPU side using UVA, gathering the payloads directly into the deep neural network's image buffer; the output is transferred back from the GPU using cudaMemcpy().
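The GPU-side gather can be sketched as follows. This is a CPU simulation in plain C, not the paper's actual code: on the GPU, each CUDA thread would dereference a UVA pointer to a packet payload sitting in pinned DPDK shared memory and write its bytes into the on-GPU image buffer, so the transfer is initiated from the GPU side with no host-side staging copy. The loop over `t` stands in for the CUDA thread grid, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU sketch of a NetML-style data-transfer kernel. The host passes
 * only an array of (UVA) payload pointers plus per-packet destination
 * offsets; the "kernel" gathers the scattered payloads into the
 * contiguous image buffer the DNN reads from. */
void netml_gather(uint8_t *image_buf,
                  const uint8_t *const *payload_ptrs, /* UVA pointers */
                  const size_t *offsets, /* dest offset per packet */
                  const size_t *lens, size_t n_pkts) {
    for (size_t t = 0; t < n_pkts; t++) {   /* one "thread" per packet */
        const uint8_t *src = payload_ptrs[t];
        for (size_t i = 0; i < lens[t]; i++)
            image_buf[offsets[t] + i] = src[i];
    }
}
```

Because each packet carries its own destination offset, packets can be gathered in whatever order they arrive; the CPU never copies payload bytes, only pointers.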
Results
Figure: Taken from https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/
Figure: Execution timelines. Copying Method: after the first packet arrives, copying into the host-side buffer and the cudaMemcpy of the image data take 0.9 ms before the machine-learning section begins. NetML: after the first packet arrives, the data-moving kernel takes 0.2 ms before the machine-learning section begins.
Aditya Dhakal, University of California, Riverside
K. K. Ramakrishnan, University of California, Riverside
Figure: Inference time per image in PyTorch — inference time (milliseconds) vs. packet size (200–800 bytes), comparing the Copying Method and NetML.
Figure: Time spent on each phase of execution (receiving data, transferring to GPU, processing time), in microseconds, for the Copying Method vs. NetML.
Figure: CPU cycles consumed for moving each image to the GPU, for 512-byte and 768-byte packets, comparing the Copying Method and NetML.