
µTVM: Deep Learning on Bare-Metal Devices

Logan Weber, Pratyush Patel, and Tianqi Chen
{weberlo, patelp1, tqchen}@cs.uw.edu
uwplse.org • sampl.cs.washington.edu • tvm.ai

This work is supported by the Semiconductor Research Corporation (SRC) and DARPA. @ADA_Center • adacenter.org

The Move Towards the Edge

With the astounding success of machine learning, many researchers and practitioners have turned their attention to the edge. Bare-metal devices, such as FPGAs and Arduinos, are commonly found on the edge because they are cheap and energy-efficient.

The Problem

Programming these devices is extremely difficult due to:

• Resource constraints
• Lack of compute power
• Lack of on-device memory management
• Restricted language and runtime support
• Tedious debugging

µTVM’s Approach

• Plug directly into TVM as a backend
• Use the compiler to emit code for the device
• Gain access to TVM’s models and optimizations

[TVM stack diagram: High-Level Differentiable IR → Tensor Expression IR + AutoTVM → hardware backends (LLVM, CUDA, VTA, µTVM) → FPGA, ASIC, and bare-metal devices]

Execution Model

The host drives control flow; computation (e.g., a Conv2D kernel) is done on the device. Only metadata is sent between host and device.

[Diagram: Host ↔ Device, with the host issuing calls and a Conv2D kernel running on the device]
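To make this split concrete, here is a toy Python sketch of a host-driven execution model in this style. The `Device` class, kernel names, and buffer offsets are all illustrative assumptions, not TVM's actual µTVM interface:

```python
# Illustrative sketch of a host-driven execution model (names are
# hypothetical, not the real TVM API). The host holds the control flow
# and sends only metadata -- which kernel to run and where its buffers
# live in device memory -- while the arithmetic happens "on device".

class Device:
    """Stand-in for a bare-metal device: flat memory plus compiled kernels."""
    def __init__(self, mem_size):
        self.mem = [0.0] * mem_size   # device SRAM, addressed by offset
        self.kernels = {}             # name -> compiled function

    def load_kernel(self, name, fn):
        self.kernels[name] = fn       # host "flashes" a kernel onto the device

    def execute(self, name, args):
        # args is pure metadata: buffer offsets and sizes, no tensor data
        self.kernels[name](self.mem, *args)

def add_kernel(mem, a_off, b_off, out_off, n):
    """Element-wise add, operating entirely in device memory."""
    for i in range(n):
        mem[out_off + i] = mem[a_off + i] + mem[b_off + i]

# --- Host side: drive control flow, send only metadata -------------------
dev = Device(mem_size=64)
dev.load_kernel("add", add_kernel)

# Host writes inputs into device memory once (e.g., over a serial link)...
dev.mem[0:4] = [1.0, 2.0, 3.0, 4.0]      # buffer A at offset 0
dev.mem[4:8] = [10.0, 20.0, 30.0, 40.0]  # buffer B at offset 4

# ...then each call sends only (kernel name, offsets, size) -- metadata.
dev.execute("add", (0, 4, 8, 4))

print(dev.mem[8:12])   # -> [11.0, 22.0, 33.0, 44.0]
```

The point of the sketch is that the host never touches the tensor data during execution; it only names a kernel and its buffer locations.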

AutoTVM Compatibility

[AutoTVM feedback loop: TVM + Program → Compile program → Run + measure → Return timing feedback → Improve implementation]

The current execution model is slow, but it allows us to use TVM’s automatic tensor program optimizer.
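The compile–measure–improve loop above can be sketched as a toy simulation. This is not the real AutoTVM API; the cost model, candidate configurations, and `tune` helper are all hypothetical:

```python
# Toy simulation of the AutoTVM feedback loop (not the real autotvm API):
# propose an implementation, compile + run + measure it on the device,
# and use the timing feedback to keep the best configuration.

def simulated_runtime(tile_size):
    # Stand-in for "run + measure" on the device: pretend a tile size of 8
    # best fits the device's memory, so it runs fastest. (Illustrative only.)
    return abs(tile_size - 8) * 0.5 + 1.0    # milliseconds

def tune(candidates):
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:          # "improve implementation": try a config
        t = simulated_runtime(cfg)  # "compile program" + "run + measure"
        if t < best_time:           # "return timing feedback"
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

print(tune([1, 2, 4, 8, 16, 32]))   # -> (8, 1.0)
```

In the real system, the measurement step runs the compiled operator on the bare-metal device itself, which is what makes the loop slow but faithful to actual hardware performance.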

The End Goal

Deploy optimized models to a standalone runtime on the device:

[Diagram: Model (Conv2D, MaxPool, MatMul) → High-Level Optimizer → AutoTVM → Optimized Operators → Optimized Model → deployed to a Standalone Runtime on the Device]