
µTVM: Deep Learning on Bare-Metal Devices

Logan Weber, Pratyush Patel, and Tianqi Chen
{weberlo, patelp1, tqchen}@cs.uw.edu
uwplse.org • sampl.cs.washington.edu • tvm.ai

This work is supported by the Semiconductor Research Corporation (SRC) and DARPA. @ADA_Center • adacenter.org

The Move Towards the Edge

With the astounding success of machine learning, many researchers and practitioners have turned their attention to the edge. Bare-metal devices, such as FPGAs and Arduinos, are commonly found on the edge because they are cheap and energy-efficient.

The Problem

Programming these devices is extremely difficult due to:

• Resource constraints
• Lack of compute power
• Lack of on-device memory management
• Restricted language and runtime support
• Tedious debugging

µTVM’s Approach

• Plug directly into TVM as a backend
• Use the compiler to emit code for the device
• Gain access to TVM’s models and optimizations

[TVM stack diagram: High-Level Differentiable IR → Tensor Expression IR + AutoTVM → hardware backends (LLVM, CUDA, VTA, µTVM) → FPGA, ASIC, and bare-metal devices]

Execution Model

The host drives control flow; computation (e.g., a Conv2D kernel) is done on the device. Only metadata is sent between host and device.

[Diagram: Host ↔ Device, with the host issuing calls and a Conv2D kernel running on the device]
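To make this split concrete, here is a toy Python sketch of a host-driven execution model in this style. The `Device` class, kernel names, and buffer offsets are all illustrative assumptions, not TVM's actual µTVM interface:

```python
# Illustrative sketch of a host-driven execution model (names are
# hypothetical, not the real TVM API). The host holds the control flow
# and sends only metadata -- which kernel to run and where its buffers
# live in device memory -- while the arithmetic happens "on device".

class Device:
    """Stand-in for a bare-metal device: flat memory plus compiled kernels."""
    def __init__(self, mem_size):
        self.mem = [0.0] * mem_size   # device SRAM, addressed by offset
        self.kernels = {}             # name -> compiled function

    def load_kernel(self, name, fn):
        self.kernels[name] = fn       # host "flashes" a kernel onto the device

    def execute(self, name, args):
        # args is pure metadata: buffer offsets and sizes, no tensor data
        self.kernels[name](self.mem, *args)

def add_kernel(mem, a_off, b_off, out_off, n):
    """Element-wise add, operating entirely in device memory."""
    for i in range(n):
        mem[out_off + i] = mem[a_off + i] + mem[b_off + i]

# --- Host side: drive control flow, send only metadata -------------------
dev = Device(mem_size=64)
dev.load_kernel("add", add_kernel)

# Host writes inputs into device memory once (e.g., over a serial link)...
dev.mem[0:4] = [1.0, 2.0, 3.0, 4.0]      # buffer A at offset 0
dev.mem[4:8] = [10.0, 20.0, 30.0, 40.0]  # buffer B at offset 4

# ...then each call sends only (kernel name, offsets, size) -- metadata.
dev.execute("add", (0, 4, 8, 4))

print(dev.mem[8:12])   # -> [11.0, 22.0, 33.0, 44.0]
```

The point of the sketch is that the host never touches the tensor data during execution; it only names a kernel and its buffer locations.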

AutoTVM Compatibility

[AutoTVM feedback loop: TVM + Program → Compile program → Run + measure → Return timing feedback → Improve implementation]

The current execution model is slow, but it allows us to use TVM’s automatic tensor program optimizer.
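The compile–measure–improve loop above can be sketched as a toy simulation. This is not the real AutoTVM API; the cost model, candidate configurations, and `tune` helper are all hypothetical:

```python
# Toy simulation of the AutoTVM feedback loop (not the real autotvm API):
# propose an implementation, compile + run + measure it on the device,
# and use the timing feedback to keep the best configuration.

def simulated_runtime(tile_size):
    # Stand-in for "run + measure" on the device: pretend a tile size of 8
    # best fits the device's memory, so it runs fastest. (Illustrative only.)
    return abs(tile_size - 8) * 0.5 + 1.0    # milliseconds

def tune(candidates):
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:          # "improve implementation": try a config
        t = simulated_runtime(cfg)  # "compile program" + "run + measure"
        if t < best_time:           # "return timing feedback"
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

print(tune([1, 2, 4, 8, 16, 32]))   # -> (8, 1.0)
```

In the real system, the measurement step runs the compiled operator on the bare-metal device itself, which is what makes the loop slow but faithful to actual hardware performance.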

The End Goal

Deploy optimized models to a standalone runtime on the device:

[Diagram: Model (Conv2D, MaxPool, MatMul) → High-Level Optimizer → AutoTVM → Optimized Operators → Optimized Model → deployed to a Standalone Runtime on the Device]