AIIA DNN benchmark

DNN processor benchmark for inference at the edge: release of first-round evaluation results

2019.3

AIIA DNN benchmark V0.5 evaluation results

Contents

I. About AIIA DNN Benchmark

II. Introduction to Version 0.5

III. Metrics and scenarios

IV. Acknowledgements

V. Results

VI. Interpretation

I AIIA DNN benchmark

[Figure: Evaluation framework. Inputs: processor hardware system, machine learning framework, application scenarios, network model, test data, … Outputs: metrics such as time, power, accuracy, throughput, cost, … metrics N.]

About us: We provide a selection reference for application companies and third-party evaluation results for chip companies.

Aims: During chip development, technical competition based on clear metrics helps companies progress quickly. The goal of the AIIA DNN benchmark is to objectively reflect the current state of AI accelerator capabilities, and all metrics are designed to provide objective dimensions for comparison.

Evaluation method: step-by-step version iterations that continuously broaden and refine the benchmark, covering both training and inference, at the edge and in the cloud.

Participants:

I Application scenarios for terminal-based deep learning processors

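Voice interaction

Smart cameras

Smartphones

VR & AR (virtual reality & augmented reality)

ADAS (advanced driver-assistance systems)

UAVs (unmanned aerial vehicles)

Deep learning places new demands on mobile computing chips and has given rise to AI accelerator chips.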

Processors: CPU, GPU, DSP, NPU, APU

Deep learning mobile frameworks: Tengine / TFLite / Caffe2go / MACE / NCNN

Hardware acceleration: AndroidNN, or proprietary architectures (HiAI / SNPE / NeuroPilot)

[Figure: AIIA DNN benchmark V0.5 tools, with device, model, and start controls.]

II Version 0.5: Evaluation methods of the DNN processor benchmark for inference at the edge

Supported system platform (Version 0.5): Android

Four typical application scenarios: image classification, object detection, super-resolution, semantic segmentation

+ Nine network models

Two evaluation metrics: FPS (time) and algorithm performance

III Metrics and scenarios

No. | Application scenario | Test data (1k image frames from …) | Network | Metrics
1.1 | Classification | ImageNet | Mobilenetv2 | fps, top1, top5
1.2 | Classification | ImageNet | Resnet101 | fps, top1, top5
1.3 | Classification | ImageNet | VGG16 | fps, top1, top5
1.4 | Classification | ImageNet | Inception v3 | fps, top1, top5
2.1 | Object detection | VOC2012 | ssd_vgg16 (caffe) | fps, mAP, mIoU
2.3 | Object detection | VOC2012 | ssd_mobilenet_v1 (caffe) | fps, mAP, mIoU
2.4 | Object detection | VOC2012 | ssd_mobilenet_v2 (caffe) | fps, mAP, mIoU
3 | Super-Resolution | VOC2012 | vdsr | fps, PSNR
4.1 | Semantic segmentation | VOC2012 | Deeplabv3+ | fps
4.2 | Semantic segmentation | VOC2012 | fcn | fps
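Every scenario above reports speed as fps, but the slides do not spell out the timing protocol. The sketch below assumes fps is the number of processed frames divided by the wall-clock time spent in inference, with a few untimed warm-up runs; `measure_fps` and the `infer` callback are hypothetical names, not part of the AIIA tools or of any benchmarked framework.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object], warmup: int = 5) -> float:
    """Return frames per second for running `infer` once per frame.

    `infer` is a hypothetical stand-in for one forward pass of the network
    under test (assumption: preprocessing is done before timing starts).
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Warm-up runs are not timed (lazy initialisation, caches, clock ramp-up).
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval for trivially fast callbacks.
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

Excluding warm-up runs is a design choice of this sketch; whether the official tools do the same is not stated in the slides.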


IV The AIIA DNN Benchmark tools were mainly supported by:

HK Nanjing Huake Guangfa

IV AIIA DNN Benchmark Acknowledgements

Chu Qing, Wang Yu, Zheng Nanning

V. Version 0.5 Results

Keywords

Inference at the edge; integer and floating-point results distinguished for the first time (int8, fp16, fp32)

Speed plus algorithm-performance metrics (fps with algorithm performance)

Open source

Technical details

Test 1: Image classification

This task represents the conventional ImageNet challenge, where the goal is to classify images into 1000 categories. Four networks were chosen (three Caffe models and one TensorFlow model). Their accuracy on the same ImageNet dataset, measured on CPU, is shown below.

The selected networks represent the most popular and commonly used architectures that can currently be deployed on smartphones.
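Top-1 and top-5 accuracy count a prediction as correct when the true class is the highest-scoring class, or among the five highest-scoring classes, respectively. A minimal pure-Python sketch, assuming each sample comes with a list of per-class scores; the function name `topk_accuracy` is hypothetical.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top:
            hits += 1
    return hits / len(labels)
```

For example, with scores [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]] and labels [1, 1, 0], top-1 accuracy is 1/3 and top-2 accuracy is 2/3; the ImageNet figures use k = 1 and k = 5 over 1000 classes.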

Framework | Network | Top1 (CPU) | Top5 (CPU) | Test image size | Preprocess mean | Preprocess var
Caffe | mobilenetv2 | 70.50% | 90.40% | 224x224 | BGR=(103.94,116.78,123.68) | 58.823589
Caffe | resnet101 | 71.90% | 90.60% | 224x224 | BGR=(123.15,115.90,103.06) | 1
Caffe | vgg16 | 65.90% | 87.70% | 224x224 | BGR=(122.58,116.55,103.89) | 1
Tensorflow | inceptionv3 | 74.90% | 92.70% | 299x299 | BGR=(127.5,127.5,127.5) | 1
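The preprocess columns give a per-channel mean in BGR order and a "var" value. A minimal sketch, assuming each channel is normalised as (pixel - mean) / var (so for mobilenetv2 the divisor 58.823589 corresponds to a scale of about 0.017) and that the image has already been resized to the listed test size; the `PREPROCESS` table and `normalize` function are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # channel order (B, G, R), as in the table

# Values restated from the table; treating "var" as a divisor is an assumption.
PREPROCESS: Dict[str, Tuple[Pixel, float]] = {
    "mobilenetv2": ((103.94, 116.78, 123.68), 58.823589),
    "resnet101": ((123.15, 115.90, 103.06), 1.0),
    "vgg16": ((122.58, 116.55, 103.89), 1.0),
    "inceptionv3": ((127.5, 127.5, 127.5), 1.0),
}


def normalize(pixels: List[Pixel], network: str) -> List[Pixel]:
    """Subtract the per-channel mean and divide by var for every BGR pixel."""
    mean, var = PREPROCESS[network]
    return [tuple((c - m) / var for c, m in zip(px, mean)) for px in pixels]
```

A pixel equal to the mean maps to (0.0, 0.0, 0.0); with var = 1 the output simply stays in the mean-centred 0 to 255 range.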

Test 2: Object detection

Objects in images are detected using a single deep neural network, SSD (Single Shot MultiBox Detector). mAP and mIoU are calculated on the VOC2012 dataset.

Test 3: Image super-resolution

The goal of the super-resolution task is to reconstruct the original image from its downscaled version. In this test we consider a downscaling factor of 3, and image restoration is performed by the VDSR network.

Test 4: Image semantic segmentation

In contrast to image classification, the goal of this task is pixel-level image understanding; fcn and deeplabv3+ are used as the test networks.
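Tests 2 and 3 are scored with overlap and reconstruction-quality metrics. The sketch below shows two underlying formulas: box intersection over union (IoU, the overlap measure behind mAP matching and mIoU) and PSNR = 10 * log10(MAX^2 / MSE) for 8-bit images. It assumes boxes are (x_min, y_min, x_max, y_max) and images are flat sequences of 8-bit values; the full mAP/mIoU averaging, and whether PSNR is computed on luminance only, are not specified in the slides and are not shown. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    # Identical images have zero error, hence infinite PSNR.
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving IoU = 1/7; a restored image with MSE = 1 scores about 48.13 dB.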

DUT information: Kirin 980

processor: Kirin 980
description: Mobile terminal chip for the Mate 20 series
process: 7nm
CPU: 2 x Cortex-A76@2.6GHz + 2 x Cortex-A76@1.92GHz + 4 x Cortex-A55@1.8GHz
GPU: Mali-G76@720MHz
interface: USB Type-C
system: Android 9
supported mobile frameworks: HiAI / AndroidNN
year: 2018

[Figure: Kirin 980 results for Test 1: Image classification, Test 2: Object detection, and Test 3: Image super-resolution]

Test 4: Image semantic segmentation

fps | fcn | deeplabv3+
kirin 980 (fp16) | 1.39 | Not supported

DUT information: ROC-RK3399-PC

processor: RK3399
description: AI development board
process: 28nm
CPU: Cortex-A72MP2 + Cortex-A53MP4
GPU: Mali-T860MP4
interface: USB2.0, USB Type-C
system: Android, Ubuntu, Debian9
supported mobile framework: Tengine
year: 2017

[Figure: RK3399 results, Test: Image classification (FPS and algorithm performance)]

With only a small loss in accuracy, int8 inference is significantly faster than fp32 on the CPU with the Tengine framework.

VI. Interpretation

At present, hardware solutions for AI computing at the edge differ widely, so each chip platform performs differently depending on the model and the metric.

There are still no standard and reliable tools for quantizing networks, even those trained for image classification, the simplest case. For deploying deep learning on mobile devices, development is likely to follow two different paths: quantized networks and floating-point networks. This is the first time we have tried to evaluate quantized and floating-point models separately in the benchmark.
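For orientation only, here is a generic symmetric per-tensor int8 scheme (value ≈ code * scale, with scale = max|x| / 127). It is a textbook technique chosen for illustration, not the quantization method used by Tengine or any other benchmarked framework, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor linear quantization to int8.

    Returns the int8 codes and the scale such that value ≈ code * scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor gets scale 1.0 so that division is always defined.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]
```

Each reconstructed value differs from the original by at most half a quantization step (scale / 2); real toolchains typically add calibration data, per-channel scales, or asymmetric zero points to reduce the accuracy loss further.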

fps | TFLITE | TENGINE | TENGINE (int8)
inception_v3 | 0.76 | 1.29 | 2.2
mobilenet_v2 | 8.47 | 13.45 | 17.414
resnet101_v2 | 1.03 | 1.1 | 1.935
vgg16 | NULL | 0.67 | 1.115

top1 | TFLITE | TENGINE | TENGINE (int8)
inception_v3 | 75.50% | 77.80% | 77.50%
mobilenet_v2 | 69.80% | 72.70% | 73.30%
resnet101_v2 | 73.80% | 75.50% | 75.10%
vgg16 | NULL | 68.90% | 68.20%

top5 | TFLITE | TENGINE | TENGINE (int8)
inception_v3 | 91.00% | 93.50% | 93.80%
mobilenet_v2 | 86.60% | 91.30% | 91.30%
resnet101_v2 | 89.80% | 93.10% | 93.00%
vgg16 | NULL | 89.40% | 89.60%
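To make the int8 versus fp32 comparison concrete, the sketch below derives speed-ups and top-1 changes from the tables above, assuming the plain TENGINE column is the fp32 baseline (as the comparison of int8 with fp32 on Tengine suggests). The dictionaries only restate the published numbers; `summarize` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (Tengine fp32, Tengine int8) pairs restated from the tables above.
FPS: Dict[str, Tuple[float, float]] = {
    "inception_v3": (1.29, 2.2),
    "mobilenet_v2": (13.45, 17.414),
    "resnet101_v2": (1.1, 1.935),
    "vgg16": (0.67, 1.115),
}
TOP1: Dict[str, Tuple[float, float]] = {
    "inception_v3": (77.80, 77.50),
    "mobilenet_v2": (72.70, 73.30),
    "resnet101_v2": (75.50, 75.10),
    "vgg16": (68.90, 68.20),
}


def summarize() -> Dict[str, Tuple[float, float]]:
    """Return (int8 speed-up factor, top-1 change in points) per network."""
    out = {}
    for net, (fp32_fps, int8_fps) in FPS.items():
        fp32_top1, int8_top1 = TOP1[net]
        out[net] = (round(int8_fps / fp32_fps, 2), round(int8_top1 - fp32_top1, 2))
    return out


for net, (speedup, delta) in summarize().items():
    print(f"{net}: int8 speed-up {speedup}x, top-1 change {delta:+.2f} pts")
```

On these numbers the int8 speed-up ranges from about 1.29x (mobilenet_v2) to 1.76x (resnet101_v2), while top-1 accuracy changes by between -0.7 and +0.6 points.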

Work plan

Application scenario iteration: richer evaluation targets (voice interaction / ADAS / smart cameras)

Metrics expansion: power

Benchmark demo update: add support for Linux

Release the Version 1.0 guidelines: Guidelines of artificial intelligence chip benchmark, Part 1: Metrics and evaluation methods for terminal-based deep neural network processor benchmark

Release of iterated benchmark results: 2019 Artificial Intelligence Developer Conference (AIIA 2019) …

Release the Benchmark v0.5 evaluation method: DNN processor benchmark for inference in the cloud

Open source & join us

Github : https://github.com/AIIABenchmark/AIIABenchmark

About us: http://www.aiiaorg.cn/

The open-source AIIA DNN benchmark project currently supports Android, but it can easily be extended to all POSIX-compatible systems. Users can add a new framework or a new model by following the project documentation. The project will be continuously updated.

Embracing open source is an important part of AIIA. We have long advocated open source, actively participate in several major open-source projects, and welcome more developers to join.

Open-source AIIA DNN benchmark project

Thanks

Contact：[email protected]

Documents
