
NetML: An NFV Platform with Efficient Support for Machine Learning Applications

K. K. Ramakrishnan, University of California, Riverside
Aditya Dhakal, University of California, Riverside

Figure: Motivating scenario. Vehicles stream high-resolution video, which requires huge bandwidth. An edge server near the road pre-processes the videos, learns and infers from the streams, and warns the vehicles ("Obstacle ahead", "Slow down / detour", "Traffic ahead") with an RTT under 10 ms. A cloud server's RTT exceeds 10 ms: too slow to react to changing highway conditions.

Edge Network for ML processing

Ø IoT devices produce large volumes of data that is processed with ML algorithms.

Ø Challenges with IoT hardware
§ IoT hardware has limited resources: processing, storage, power, etc.
§ ML processing must be offloaded to a distant server.
§ Offloading adds latency and consumes network bandwidth.

Ø Solution: process ML applications in the edge network
§ Edge servers have more compute and storage than IoT devices.
§ Edge servers sit closer to IoT devices, reducing network latency and enabling real-time processing.
§ Offloading to the edge reduces backbone network bandwidth consumption.
§ Edge servers can aggregate data from various devices to generate a holistic view.

Network Function Virtualization Platform

Ø NFV is a framework for virtualizing network functions (NFs): load balancers, firewalls, IDS, etc.

Ø We use the OpenNetVM (ONVM) NFV platform on a COTS edge server to host NFs and ML applications in containers.

Ø ONVM uses the DPDK library and shared memory optimized for fast packet processing.

Ø Packets that arrive at the edge server land in a shared memory region before being accessed by NFs and ML applications, as sketched below.
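A minimal sketch of that receive path using standard DPDK calls; the port/queue setup and the ONVM manager wiring are omitted, and process_burst is an illustrative name, not part of the ONVM API:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Illustrative receive loop: the NIC DMAs packets into mbufs backed
     * by the shared memory pool, and NFs get pointers to them, not copies. */
    static void process_burst(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0 /* queue */, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++) {
            /* Pointer into shared memory; NFs and the ML application can
             * dereference it in place. */
            uint8_t *data = rte_pktmbuf_mtod(bufs[i], uint8_t *);
            uint16_t len  = rte_pktmbuf_data_len(bufs[i]);
            (void)data; (void)len;      /* hand off to the next NF / ML app */
            rte_pktmbuf_free(bufs[i]);
        }
    }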


Challenges of Using GPUs in Edge Servers

Ø ML applications exploit GPUs to speed up computation.

Ø But getting data to and from the GPU adds latency.

Ø Challenges with GPUs
§ GPUs are PCIe-resident devices; all data is transferred to the GPU via DMA over the PCIe bus.
§ NVIDIA GPUs require data to be in page-locked ("pinned") host memory to initiate DMA.
§ Initiating a transfer from the host one packet at a time incurs high overhead, as the sketch below shows.
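Both constraints can be made concrete with real CUDA runtime calls; a sketch, where dpdk_shm and the per-packet arrays are illustrative names:

    #include <stdint.h>
    #include <cuda_runtime.h>

    /* Page-lock the DPDK shared-memory region so the GPU DMA engine can
     * read it; cudaHostRegisterMapped also makes the region addressable
     * from device code under UVA. */
    void pin_region(void *dpdk_shm, size_t bytes)
    {
        cudaHostRegister(dpdk_shm, bytes, cudaHostRegisterMapped);
    }

    /* The overhead the last bullet describes: one small DMA per packet.
     * Each call pays a fixed driver/launch cost that dwarfs the copy of
     * a few hundred payload bytes. */
    void copy_per_packet(uint8_t *d_buf, uint8_t **payloads, const int *lens,
                         const int *offs, int n_pkts, cudaStream_t s)
    {
        for (int i = 0; i < n_pkts; i++)
            cudaMemcpyAsync(d_buf + offs[i], payloads[i], lens[i],
                            cudaMemcpyHostToDevice, s);
        cudaStreamSynchronize(s);
    }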

Figure: The copying method. Image data arrives in packets in the pinned DPDK shared memory. The CPU first copies the payloads into a contiguous host-side image buffer; the buffer is then transferred to the deep neural network with a single cudaMemcpy(), and the output is transferred back from the GPU with cudaMemcpy(), each via the GPU's DMA engine.
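A sketch of this baseline, assuming image_bytes and the per-packet offsets are tracked elsewhere; the extra host-side memcpy per packet is exactly the cost NetML removes:

    #include <stdint.h>
    #include <string.h>
    #include <cuda_runtime.h>

    /* Baseline "copying method": the CPU gathers payloads into one pinned
     * staging buffer, then a single DMA moves the assembled image. */
    void copy_method(uint8_t *d_image, uint8_t **payloads, const int *lens,
                     const int *offs, int n_pkts, size_t image_bytes)
    {
        static uint8_t *staging = NULL;
        if (staging == NULL)
            cudaHostAlloc((void **)&staging, image_bytes, cudaHostAllocDefault);

        for (int i = 0; i < n_pkts; i++)            /* per-packet CPU copies */
            memcpy(staging + offs[i], payloads[i], lens[i]);

        cudaMemcpy(d_image, staging, image_bytes,   /* one DMA to the GPU */
                   cudaMemcpyHostToDevice);
    }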

Our Solution (NetML)

Ø NetML uses a data-transfer kernel built on NVIDIA CUDA's unified virtual addressing (UVA) to access each packet's payload in place and transfer it to the GPU, as sketched after the figure below.

Figure: NetML's transfer path. Image data stays in packets in the pinned DPDK shared memory; NetML hands the GPU only pointers to the packet payloads. The data-transfer kernel initiates DMA from the GPU side using UVA, assembling the image buffer next to the deep neural network; the output is transferred back with cudaMemcpy().
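A minimal sketch of a GPU-side gather of this kind, assuming the payloads, and the small arrays describing them, sit in memory registered with cudaHostRegisterMapped so that UVA makes the host pointers dereferenceable from device code; names are illustrative, not NetML's actual code:

    #include <stdint.h>
    #include <cuda_runtime.h>

    /* Each block gathers one packet's payload straight from pinned host
     * memory into the device-side image buffer; the loads from src are
     * DMA reads over PCIe initiated by the GPU, with no CPU staging copy
     * and no per-packet cudaMemcpy. */
    __global__ void gather_payloads(const uint8_t *const *payloads, /* host ptrs (UVA) */
                                    const int *lens, const int *offs,
                                    uint8_t *d_image, int n_pkts)
    {
        int p = blockIdx.x;
        if (p >= n_pkts) return;
        const uint8_t *src = payloads[p];
        uint8_t *dst = d_image + offs[p];
        for (int i = threadIdx.x; i < lens[p]; i += blockDim.x)
            dst[i] = src[i];
    }

    /* Launch one block per packet so the PCIe reads for different packets
     * overlap instead of being issued serially from the host. */
    void netml_transfer(const uint8_t *const *payloads, const int *lens,
                        const int *offs, uint8_t *d_image, int n_pkts,
                        cudaStream_t s)
    {
        if (n_pkts > 0)
            gather_payloads<<<n_pkts, 256, 0, s>>>(payloads, lens, offs,
                                                   d_image, n_pkts);
    }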

Results

Figure: Pinned-memory data transfer, taken from https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/

Figure: Transfer timeline. With the copying method, about 0.9 ms elapses between the first packet's arrival and the start of the machine learning section (copying into the host-side buffer, then cudaMemcpy of the image data). With NetML's data-moving kernel, the gap is about 0.2 ms.
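Gaps like these are typically measured with CUDA events; a sketch, where transfer stands in for either method's transfer stage:

    #include <cuda_runtime.h>

    /* Time a transfer stage with CUDA events; returns elapsed milliseconds
     * (e.g., ~0.9 ms for the copying method vs. ~0.2 ms for NetML). */
    float time_transfer(void (*transfer)(cudaStream_t), cudaStream_t s)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, s);
        transfer(s);                 /* packets -> GPU image buffer */
        cudaEventRecord(stop, s);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }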


Figure: Inference time per image in PyTorch (milliseconds) versus packet size (bytes, 200 to 800), for the copying method and NetML.

Figure: Time spent on each phase of execution (microseconds): receiving data, transferring to the GPU, and processing time, for the copying method and NetML.

Figure: CPU cycles consumed for moving each image to the GPU, at 512-byte and 768-byte packet sizes, for the copying method and NetML.