Edge AI with TI Jacinto Processors (Efficient Edge Intelligence: TI Jacinto Processors)
August 2021
Andre Tseng & Rio Chan



Page 2: Edge AI with TI Jacinto Processors CN

Webinar agenda

• Edge AI system challenge

• Introducing TDA4x processors for practical embedded edge AI systems

• TI Edge AI software, tools and services for accelerated edge AI development

• TI Edge AI Cloud Service Demo

• Getting started


Page 3: Edge AI with TI Jacinto Processors CN

Embedded edge AI technology | Unlimited possibilities

AI is influencing a broad range of applications and enabling new use cases in existing ones:

• Factory automation
• Retail automation
• Smart buildings & cities
• Industrial transport
• Healthcare
• Agriculture
• Construction
• Aerospace & defense
• Delivery
• Logistics

Smart cameras, autonomous machines & robots

Page 4: Edge AI with TI Jacinto Processors CN

Edge AI system challenge

Pipeline stages shown in the diagram:

• Camera: RAW / RTSP
• Capture: CSI-2 RX / USB
• Process / decode: ISP / decode
• Image processing: dewarp, scale, crop, …
• Deep learning: classify, detect, …
• Computer vision processing: depth & motion estimation
• Output: visualization / on-screen display, encode / compress, stream / storage

Challenges:

• Complex vision pipeline: multi-camera image processing, classical computer vision and AI
• Diverse workloads that need a lot of compute horsepower for real-time processing, but at lower system power
• Robust AI system, with functional safety and security, at lower system cost & complexity

Page 5: Edge AI with TI Jacinto Processors CN

Building a practical edge AI system

Optimized on 4 vectors:

• Performance: deliver speed, latency and accuracy requirements consistently, robustly and reliably, even in harsh environments
• Power, size & weight: meet power, thermal & physical constraints
• Cost: optimized system cost
• Fast development cycle: easy-to-use software development kit

Page 6: Edge AI with TI Jacinto Processors CN

TDA4x processor architecture for practical embedded edge AI systems

Page 7: Edge AI with TI Jacinto Processors CN

TDA4x processors enable practical embedded edge AI

Typical architecture (blocks shown in the diagram):

• Applications processor (general compute)
• GPU (analytics)
• ISP
• Image sensor
• Safety MCU (CAN-FD, LIN, SPI)
• Radar / LIDAR MCU
• Ethernet switch
• PCIe

TDA4x processors (blocks shown in the diagram):

• Multi-core A72
• DSP
• Deep learning accelerator
• Imaging & vision accelerators: ISP | LDC | MSC | NF | SDE | DPF
• GPU, display
• Video codec accelerator
• Security accelerators
• Safety MCU, multi-core MCU
• Safety levels: SIL-3, SIL-2
• CSI-2 ports, USB 3, Ethernet & PCIe switches
• Large internal memory, high-speed bus
• LPDDR4

AI with high performance, at low power and optimized system cost

Programming with industry-standard APIs

Page 8: Edge AI with TI Jacinto Processors CN

C7x + MMA | Industry’s most efficient deep learning accelerator

▪ C7x DSP + Matrix Multiply Accelerator (MMA)
– Programmable accelerator for tensor, vector and scalar processing

▪ Smart memory architecture results in up to 90% utilization of the accelerator and DDR BW savings
– High-bandwidth interconnect, large internal memory, 4D programmable DMA, data forward engine

▪ Self-sustained for deep learning workloads
– No dependency on the host Arm or GPU; has its own DMA engine and memory subsystem

Block diagram labels: C7x core (scalar, vector), L1I, L1D, L2, L3, MSMC, MMA, MMU, streaming engine, DCU, DMA, prefetch, Hist/LUT, firewall, coherence, safety

Callouts:

• 8 TOPS (INT8), 80 GFLOPS @ 1 GHz, per core
• High FPS/TOPS
• Designed for lower power; enables fan-less design
• 512-bit wide, 64 GB/s
• Designed for functional safety: ECC on data memory, using TI proprietary technology
• Lowest number of DDR interfaces & bandwidth
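The headline figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes the common convention that one multiply-accumulate (MAC) counts as two operations; the slide does not say which convention is used, so the MAC count is an inference rather than a published number.

```python
# Back-of-the-envelope check of the slide's headline numbers.
# Assumption (not stated on the slide): 1 MAC = 2 operations.

CLOCK_HZ = 1_000_000_000      # 1 GHz, from the slide
PEAK_OPS_PER_S = 8e12         # 8 TOPS (INT8), from the slide
BUS_WIDTH_BITS = 512          # 512-bit wide path, from the slide

ops_per_cycle = PEAK_OPS_PER_S / CLOCK_HZ
macs_per_cycle = ops_per_cycle / 2          # assumes 2 ops per MAC
bytes_per_cycle = BUS_WIDTH_BITS // 8
bandwidth_gb_s = bytes_per_cycle * CLOCK_HZ / 1e9

print(f"ops per cycle : {ops_per_cycle:.0f}")
print(f"MACs per cycle: {macs_per_cycle:.0f}")
print(f"bus bandwidth : {bandwidth_gb_s:.0f} GB/s")
```

Under that assumption the accelerator would need roughly 4,000 INT8 MACs per cycle, and a 512-bit path clocked at 1 GHz works out to the 64 GB/s quoted on the slide.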

Page 9: Edge AI with TI Jacinto Processors CN

Reimagine what’s possible with TDA4x processors

DL inference performance on TDA4VM (8 TOPS), 8-bit fixed-point, batch size 1, single 32-bit LPDDR4

MLPerf 0.7 benchmarks, frames per second (FPS):

• MobileNet V1 (224x224): 741
• ResNet-50 V1.5 (224x224): 162
• SSD-MobileNets-v1 (300x300): 385

• SSD-ResNet34 (1200x1200): 12.5 FPS

Feature extraction networks at 1 MP (1024x1024) resolution, FPS*:

• ResNet-50 v1 (1 MP): 22
• MobileNet v1 (1 MP): 102
• MobileNet v2 (1 MP): 96
• InceptionNet v1 (1 MP): 58

* 5-10% performance boost expected with future optimizations
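To put the throughput figures in more familiar terms, the sketch below converts the three MLPerf 0.7 numbers (741, 162 and 385 FPS, read from the chart in the order they appear) into average time per frame and FPS per TOPS. The helper names are illustrative, and the per-frame time is derived from throughput at batch size 1, so it is not a measured end-to-end latency.

```python
# Derive per-frame time and FPS-per-TOPS from the slide's throughput figures.
# 1000 / FPS is the average time per frame at batch size 1; it is a
# throughput-derived figure, not a measured end-to-end latency.

PEAK_TOPS = 8.0  # TDA4VM peak INT8 compute, from the slide

mlperf_fps = {
    "MobileNet V1 (224x224)": 741.0,
    "ResNet-50 V1.5 (224x224)": 162.0,
    "SSD-MobileNets-v1 (300x300)": 385.0,
}

def ms_per_frame(fps: float) -> float:
    """Average milliseconds spent per frame at a given throughput."""
    return 1000.0 / fps

def fps_per_tops(fps: float, tops: float = PEAK_TOPS) -> float:
    """Throughput normalized by peak compute (FPS/TOPS)."""
    return fps / tops

for name, fps in mlperf_fps.items():
    print(f"{name:30s} {ms_per_frame(fps):6.2f} ms/frame  "
          f"{fps_per_tops(fps):6.1f} FPS/TOPS")
```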

Page 10: Edge AI with TI Jacinto Processors CN

Accelerated imaging and computer vision

• De-warping engine: supports 180° and 360° FOV
• Image pyramid: 10 different scales per input
• Imaging subsystem: WDR with 3 exposures, 50% higher bit depth, 3A statistics support, true 2D bilateral filtering

• Accelerate 10x 2 MP @ 30 fps cameras in real time
• Replace FPGA & custom ISP chips, free up CPU MHz
• Reduce system power, latency and BOM cost
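The camera claim above translates into an aggregate pixel rate. The sketch below is a minimal illustration with a hypothetical helper; the comparison against 720 MP/s uses the figure quoted for the imaging accelerators in the TDA4VM summary later in this deck, and treating that figure as the budget for this exact path is an assumption.

```python
# Pixel-rate budget for a multi-camera front end.
# 10 cameras, 2 MP and 30 fps come from the slide; the helper is illustrative.

def pixel_rate_mp_s(cameras: int, megapixels: float, fps: float) -> float:
    """Aggregate pixel rate in megapixels per second."""
    return cameras * megapixels * fps

required = pixel_rate_mp_s(cameras=10, megapixels=2.0, fps=30.0)
print(f"10 x 2 MP @ 30 fps = {required:.0f} MP/s")

# 720 MP/s is quoted for the imaging accelerators in the TDA4VM summary;
# using it as the budget for this path is an assumption.
BUDGET_MP_S = 720.0
print(f"headroom vs. 720 MP/s: {BUDGET_MP_S - required:.0f} MP/s")
```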

Page 11: Edge AI with TI Jacinto Processors CN

Accelerated computer vision

Stereo Depth Estimation

• Depth estimation from two different views
• Confidence score for each disparity output
• Scalable, 2 MP, 192 disparities, 80 MP/s

Dense Optical Flow

• 2D motion vector field estimation given two images
• Confidence score for each flow vector output
• Scalable, 2 MP, 150 MP/s

• Accelerate depth and motion perception on multiple cameras in < 0.5 W
• Free CPU MHz
• Reduce system power, latency and BOM cost
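A stereo engine outputs disparity in pixels; converting that to metric depth is usually done with the pinhole relation Z = f * B / d. The sketch below illustrates that relation only. The focal length and baseline are hypothetical camera parameters chosen for the example, and only the 192-disparity search range comes from the slide.

```python
# Depth from stereo disparity with the standard pinhole relation Z = f * B / d.
# FOCAL_PX and BASELINE_M are hypothetical camera parameters for illustration;
# only the 192-disparity search range comes from the slide.

FOCAL_PX = 1000.0       # focal length in pixels (assumed)
BASELINE_M = 0.12       # distance between the two cameras in metres (assumed)
MAX_DISPARITY = 192     # search range from the slide

def depth_m(disparity_px: float) -> float:
    """Convert a disparity in pixels to depth in metres."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity means the point is at infinity
    return FOCAL_PX * BASELINE_M / disparity_px

# Approximate closest depth the search range can resolve.
nearest = depth_m(MAX_DISPARITY)
print(f"nearest measurable depth: {nearest:.3f} m")
for d in (1, 8, 64, MAX_DISPARITY):
    print(f"disparity {d:3d} px -> {depth_m(d):7.3f} m")
```

Because depth is inversely proportional to disparity, a larger search range mainly extends how close an object can be while still being matched.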

Page 12: Edge AI with TI Jacinto Processors CN

TDA4x processors functional safety

Block diagram labels:

• Process and fabric: 16 nm FF, high-speed interconnect, power isolation
• Safety levels: ASIL D, ASIL B
• Application cores: 2x Arm Cortex-A7x, 48K/32K L1 each, 1 MB shared L2
• Deep learning: C7x DSP (32K/48K L1, 512 KB L2) with MMA
• DSPs: 2x C66x DSP, 32K/32K L1, 288 KB L2 each
• Real-time cores: three lockstep pairs of Arm Cortex-R5F (32K/32K L1, 64 KB RAM each), with 0.5 MB, 1 MB and 0.5 MB SRAM
• MCU island: DMSC (Device Management & Security Controller); hardware diagnostics (CRC, RTI, ESM, DCC, BIST); 2x OSPI (XIP), 2x ADC, 3x I2C*, 3x SPI*, 2x UART*, 2x I3C*, UDMA, GPIO, 2x RGMII/RMII
• Memory: MSMC with 8 MB L3 RAM/cache w/ ECC; 32b LPDDR4-4266
• GPU: Furian GE8430
• Vision Processing ACC: ISP, NF, REMAP, MSC
• Depth & Motion PAC: dense optical flow, stereo
• Video acceleration: encode, decode
• Security acceleration: crypto (AES, 3DES, SHA, PKA, RNG)
• Display subsystem: 1x eDP + 1x DSI
• Capture subsystem: 2x CSI2 4L RX 2.5 Gbps, 1x CSI2 4L TX 2.5 Gbps
• Ethernet switch: up to 8 ports
• Connectivity: 4x PCIe, 2x USB, RGMII/RMII, 1x MediaLB
• Serial: 1x I3C, 1x SDIO, 4x McSPI, 2x UART, 5x McASP, 3x I2C
• Storage: GPMC, 1x SD 3.0, 1x UFS 2.x, 1x eMMC 5.x
• System services: GPIO, IPC, IOMMU, UDMA, SMMU, debug, timers, WDT, hardware diagnostics

Architecture

• ASIL-D/SIL-3 systematic capability
• Built-in hardware diagnostics
• ASIL-D/SIL-3 safety MCU island
• ASIL-B/SIL-2 main domain
• FFI, ECC, clock comparators
• Voltage & temperature monitors

Software

• TUV-certified safety software process
• Safety diagnostics reference & examples
• Self-test libraries
• SW FMEDAs, code coverage, traceability reports
• Compliance support packages
• Compiler qualification kit

Collateral

• Device safety manual
• Configurable FMEDA
• Safety analysis report
• Safety assessment certificate
• Trainings
• Whitepapers & Application
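The "32b LPDDR4-4266" label in the block diagram implies a theoretical peak memory bandwidth, which is useful context for the deck's emphasis on DDR bandwidth savings. The sketch below applies the usual transfers-per-second times bus-width formula; it ignores refresh and protocol overhead, so sustained bandwidth will be lower, and the per-frame helper is purely illustrative.

```python
# Theoretical peak bandwidth of the external memory named in the diagram.
# "32b LPDDR4-4266" is from the slide; the formula ignores refresh and
# protocol overhead, so sustained bandwidth will be lower.

TRANSFERS_PER_S = 4266e6   # LPDDR4-4266: 4266 mega-transfers per second
BUS_WIDTH_BITS = 32        # single 32-bit interface

peak_bytes_per_s = TRANSFERS_PER_S * BUS_WIDTH_BITS / 8
peak_gb_s = peak_bytes_per_s / 1e9
print(f"peak DDR bandwidth: {peak_gb_s:.2f} GB/s")

def ddr_budget_mb_per_frame(fps: float) -> float:
    """Upper bound on DDR traffic (in MB) available per frame at a frame rate."""
    return peak_bytes_per_s / fps / 1e6

print(f"budget at 30 fps: {ddr_budget_mb_per_frame(30):.0f} MB per frame")
```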

Page 13: Edge AI with TI Jacinto Processors CN

TI Edge AI software, tools and services for accelerated edge AI development

Page 14: Edge AI with TI Jacinto Processors CN

Industry-standard APIs and frameworks

Fast development cycle with industry-standard APIs

Software stack shown in the diagram:

• Applications: multi-camera AI processing, multi-video AI processing, sensor fusion, secure cloud connection
• Python & C++ application layer
• Frameworks: TensorFlow Lite, ONNX RT, TVM, OpenCV, GStreamer, Docker, OpenGL ES (graphics)
• TI tools and middleware for HW accelerators: deep learning, imaging, vision, video
• TI Edge AI processor: Arm® Cortex®-A, DSP, hardware accelerators

Page 15: Edge AI with TI Jacinto Processors CN

AI in your system | Three steps

1. Train anywhere, develop anywhere

• TI Model Zoo (60+ models): Model Selection tool, new weights
• Own model
• Optional QAT (quantization-aware training) tools from TI: https://github.com/TexasInstruments/jacinto-ai-devkit
• DL tools & software to reduce model development time

2. Compile & optimize for TI SoC, using industry-standard compilers/runtimes

• TFLite / ONNX-RT / TVM compiler
• Common representation, post-training quantization, calibration, optimization, compilation
• Out-of-box optimized inference support for 60+ models

3. Deploy on TI SoC, using industry-standard APIs

• TFLite RT / ONNX-RT / Neo-AI-DLR with the TIDL runtime on Linux (Cortex-A)
• TIDL library on RTOS (DLA)
• Accelerated inference using open-source, industry-standard runtime engines
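The compile step lists post-training quantization and calibration. As a rough illustration of what those terms mean, the sketch below implements a textbook symmetric per-tensor INT8 scheme in plain Python. It is not TI's TIDL algorithm, which this deck does not describe; the function names and calibration data are made up for the example.

```python
# Generic post-training quantization sketch: calibrate a scale from sample
# data, then map floats to signed 8-bit integers and back.
# This is a textbook symmetric per-tensor scheme, not TI's TIDL implementation.

from typing import List, Sequence

def calibrate_scale(samples: Sequence[Sequence[float]]) -> float:
    """Pick a scale so the largest calibration magnitude maps to 127."""
    max_abs = max(abs(v) for batch in samples for v in batch)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round to the nearest step and clamp to the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]

calibration_batches = [[-0.5, 0.25, 1.0], [2.0, -1.5, 0.75]]  # made-up data
scale = calibrate_scale(calibration_batches)

activations = [0.1, -1.0, 2.0, 3.0]   # 3.0 lies outside the calibrated range
q = quantize(activations, scale)
restored = dequantize(q, scale)
print("scale:", scale)
print("int8 :", q)
print("error:", [round(a - r, 4) for a, r in zip(activations, restored)])
```

Values inside the calibrated range come back with a small rounding error, while the out-of-range value saturates at 127. This is why representative calibration data matters for accuracy after quantization.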

Page 16: Edge AI with TI Jacinto Processors CN

TI Edge AI Cloud for faster edge AI evaluation

Free online service that enables deep learning evaluation in minutes

• Evaluate TI SoC DL capabilities on a remote EVM farm using a web browser and Jupyter Notebook
• No need to buy an EVM
• Collect latency, FPS, accuracy, DDR BW and power benchmarks in minutes

In-minutes evaluation:

• < 1 min to explore & compare performance: Model Selection tool
• < 5 min to experience SW & evaluate HW: TI Model Zoo examples
• < 30 min to evaluate custom models: custom model examples
• 1 hr+ to benchmark performance: TI Model Zoo examples

Available now at https://dev.ti.com/edgeai
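Latency and FPS, two of the benchmarks named on this slide, can be collected with a small timing loop. The sketch below is a generic harness, not the code used by the TI Edge AI Cloud notebooks; `run_inference` is a hypothetical placeholder for a real model call and simply burns a little CPU so the example runs anywhere.

```python
# Minimal latency/FPS measurement harness.
# `run_inference` is a hypothetical stand-in for a real model call (for
# example a runtime session's run method); here it just burns a little CPU.

import statistics
import time
from typing import Callable, Dict

def run_inference() -> int:
    """Placeholder workload so the sketch runs anywhere."""
    return sum(i * i for i in range(10_000))

def benchmark(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time repeated calls and report mean/median latency and derived FPS."""
    for _ in range(warmup):          # warm-up calls are excluded from timing
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "fps": 1000.0 / mean_ms,
    }

result = benchmark(run_inference)
print({k: round(v, 3) for k, v in result.items()})
```

Warm-up iterations are discarded because the first few calls often include one-time costs such as cache fills or lazy initialization. Accuracy, DDR bandwidth and power need other instrumentation and are outside this sketch.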

Page 17: Edge AI with TI Jacinto Processors CN

Fast evaluation and easy to program software

• Evaluation paths shown: < 1 min, < 5 min, < 30 min, 1 hr+
• TI Model Zoo
• Use a custom model and open the same program that ran on the PC

Page 18: Edge AI with TI Jacinto Processors CN

Extensive and pre-trained | Models available ready to use

▪ TI’s Model Zoo: 60+ models to choose from

▪ Select the type of function: classification, detection or segmentation

▪ Select the runtime: TFLite, TVM or ONNX-RT

Page 19: Edge AI with TI Jacinto Processors CN

Model porting | PC to embedded device

http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/07_03_00_07/exports/docs/tidl_j7_02_00_00_07/ti_dl/docs/user_guide_html/index.html

Model Inference on TI SoC

Page 20: Edge AI with TI Jacinto Processors CN

Deep learning evaluation with TI Model Zoo in < 5 min

• Example Jupyter Notebook & Python scripts for TFLite, ONNX-RT and Neo-AI-DLR

• Select models from TI Model Zoo, run pre-compiled artifacts and evaluate SW and HW

Page 21: Edge AI with TI Jacinto Processors CN

Bring your own model and deploy in < 30 min

• Jupyter Notebook & Python scripts for TFLite, ONNX-RT and Neo-AI-DLR

• Bring your own model, compile and deploy & get automatic acceleration

Page 22: Edge AI with TI Jacinto Processors CN

Getting started

Page 23: Edge AI with TI Jacinto Processors CN

TDA4VM processor

• Arm® CPU core(s): 2 Arm® Cortex®-A72 64-bit @ 1.8 GHz
• Co-processor(s): MCU island of 2 Arm Cortex-R5F (lockstep optional), SoC main domain of 4 Arm Cortex-R5F (lockstep optional)
• Neural network accelerator: C7x DSP w/ MMA, 8 TOPS
• Computer vision accelerator(s): ISP, image rectification, multi-scaling, noise filtering @ 720 MP/s; depth (80 MP/s) and motion (150 MP/s)
• Decode: 4K60 H.264/H.265
• Encode: 3x 1080p30 H.264
• GPU: 100 GFLOPS
• Display: 2 DPI, 1 DSI, 1 eDP
• Ethernet MAC & PCIe: 8-port 2.5 Gb switch, 4x 2L PCIe Gen 3
• Security: cryptographic acceleration, debug security, device identity, isolation firewalls, secure boot & storage & programming, software IP protection, trusted execution environment
• Rating: Automotive & Industrial
• Operating temperature range: -40 to 125 °C

Refer to TDA4VM for full specification: https://www.ti.com/lit/gpn/tda4vm

Page 24: Edge AI with TI Jacinto Processors CN

Processor SDK | Out-of-box AI demos

Available in SDK: demo applications for deep learning & image processing. Extensive demos available now!

▪ Image pre-processing demos (hardware accelerated)
– 8x 2 MP @ 30 fps real-time image processing, with room to process 2 more cameras
– Demonstrate RAW to RGB processing
– Image distortion correction
– Flexible programming sub-system

▪ Deep learning demos (hardware accelerated)
– Out-of-box examples for image classification, object detection and semantic segmentation
– Model zoo: 60+ pre-trained TF, PyTorch, TFLite, ONNX and MXNet models validated on TI processors
– Demonstrate simultaneous execution of multiple models: semantic segmentation (resolution 768x384), object detection (resolution 512x512), object detection (resolution 512x512)

Page 25: Edge AI with TI Jacinto Processors CN

Getting started | Add intelligence with embedded Edge AI from Texas Instruments

TI Edge AI Cloud evaluation

• Zero-cost & in-minutes evaluation of TDA4VM hardware: https://dev.ti.com/edgeai

Full development

• Product folder: https://www.ti.com/product/TDA4VM
• TDA4 EVM: http://www.ti.com/tool/TDA4VMXEVM

Software development kits

• TI Processor SDK: seamlessly reuse and migrate Linux, Linux-RT and TI-RTOS software across TI processors
• http://www.ti.com/tool/PROCESSOR-SDK-DRA8X-TDA4X

Support

• https://e2e.ti.com
• More information: [email protected]

Please also let us know any specific topics you want us to cover in the future.

Page 26: Edge AI with TI Jacinto Processors CN

©2021 Texas Instruments Incorporated. All rights reserved.

Page 27: Edge AI with TI Jacinto Processors CN

IMPORTANT NOTICE AND DISCLAIMER

TI PROVIDES TECHNICAL AND RELIABILITY DATA (INCLUDING DATASHEETS), DESIGN RESOURCES (INCLUDING REFERENCE DESIGNS), APPLICATION OR OTHER DESIGN ADVICE, WEB TOOLS, SAFETY INFORMATION, AND OTHER RESOURCES “AS IS” AND WITH ALL FAULTS, AND DISCLAIMS ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT OF THIRD PARTY INTELLECTUAL PROPERTY RIGHTS.

These resources are intended for skilled developers designing with TI products. You are solely responsible for (1) selecting the appropriate TI products for your application, (2) designing, validating and testing your application, and (3) ensuring your application meets applicable standards, and any other safety, security, or other requirements. These resources are subject to change without notice. TI grants you permission to use these resources only for development of an application that uses the TI products described in the resource. Other reproduction and display of these resources is prohibited. No license is granted to any other TI intellectual property right or to any third party intellectual property right. TI disclaims responsibility for, and you will fully indemnify TI and its representatives against, any claims, damages, costs, losses, and liabilities arising out of your use of these resources.

TI’s products are provided subject to TI’s Terms of Sale (https://www.ti.com/legal/termsofsale.html) or other applicable terms available either on ti.com or provided in conjunction with such TI products. TI’s provision of these resources does not expand or otherwise alter TI’s applicable warranties or warranty disclaimers for TI products.

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265

Copyright © 2021, Texas Instruments Incorporated