Arthur Ruder
Implementation of Artificial Intelligence in FPGAs
Embedded Computing Conference 2019
3 September 2019
Enclustra GmbH
FPGA Design Center: Customer-Specific Design Services
• Wired Networks and Switching (HPC Interconnects)
• Wireless Communications (Software Defined Radio)
• Embedded Interfaces (PCIe, USB, AXI, Ethernet, etc.)
• Test and Measurement (Sensors, Data Acquisition, DSP)
• Smart Cameras (Computer Vision, Image Processing)
• Drive/Motion Control (BLDC, Multi-Axis, Medical Testing)
FPGA Solution Center: Standard Products (FPGA/SoC Modules and IP Solutions)
• Mars FPGA/SoC Module Family
• Mercury(+) FPGA/SoC Module Family
• FPGA Manager
Interest in AI
Source: https://www.forbes.com/sites/louiscolumbus/2018/01/12/10-charts-that-will-change-your-perspective-on-artificial-intelligences-growth/#589705f94758
AI/Machine Learning Applications
• Computer vision
• Image detection
• Image classification
• …
• Language processing
• Speech recognition
• Translation
• …
• Recommendation systems
• …
Overview: What is machine learning?
Artificial Intelligence ⊃ Machine Learning (toward more advanced concepts)
• Supervised Learning
  • Deep learning neural networks: Convolutional, Recurrent, LSTM, …
  • Regression: Linear, Logistic
• Unsupervised Learning
• Reinforcement Learning
Neural Networks
[Diagram: input layer (e.g. pixels) → hidden layer 1 → hidden layer 2 → output layer (e.g. probability); each neuron forms its activation a from its weighted inputs w1·x1 + w2·x2 + w3·x3]
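Each neuron in the diagram computes an activation a = f(w1·x1 + w2·x2 + w3·x3 + b) and passes it to the next layer. A minimal NumPy sketch of this forward pass (the layer sizes, sigmoid activation, and random weights are illustrative, not taken from the talk):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1); one common activation choice.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Propagate an input vector through a list of (weights, bias) layers."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)  # per neuron: a = f(w1*x1 + w2*x2 + ... + b)
    return a

rng = np.random.default_rng(0)
# 3 inputs (e.g. pixels) -> hidden layer 1 (4) -> hidden layer 2 (4) -> 1 output
sizes = [3, 4, 4, 1]
layers = [(rng.normal(size=(m, n)), np.zeros(m)) for n, m in zip(sizes, sizes[1:])]

x = np.array([0.2, 0.7, 0.1])  # input layer: e.g. pixel values
print(forward(x, layers))      # output layer: e.g. a probability in (0, 1)
```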
Training neural networks
• Example, for one picture: image classification (cat or dog)
• Labelled data is fed through the untrained neural network (ResNet50)
• Forward-propagation produces a result (e.g. "cat") that is compared against the label (e.g. "dog"): 7.7 billion operations, ~35 MB parameter storage
• Back-propagation updates the weights from the error: 23 billion operations, ~380 MB parameter storage*
* for forward propagation only, backward propagation similar
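Shrunk to a single sigmoid neuron, the same forward/back-propagation loop looks like this (the toy task, learning rate, and epoch count are illustrative; ResNet50 runs the identical loop at billions-of-operations scale):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Labelled data: two inputs -> binary label (a toy stand-in for cat/dog).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])  # logical AND as the task

w = np.zeros(2)  # untrained network: weights start at zero
b = 0.0
lr = 1.0         # learning rate (illustrative)

for epoch in range(2000):
    # forward-propagation: predictions from the current weights
    p = sigmoid(X @ w + b)
    # back-propagation: gradient of binary cross-entropy w.r.t. w and b
    grad = p - y
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

print(np.round(sigmoid(X @ w + b)))  # prints [0. 0. 0. 1.]
```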
Inference
• Example: use a trained neural network for image classification
• Forward-propagation only: inputs (e.g. photographs) → outputs (classification probabilities)
  • e.g. 99.07 % dog, 0.93 % cat
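The percentages on the slide come from a softmax over the network's output layer. A sketch (the logit values are hypothetical, chosen so the split matches the slide's 99.07 % / 0.93 %):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps exp() numerically stable.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Hypothetical raw scores from the output layer for the two classes.
logits = np.array([4.67, 0.0])  # [dog, cat]
probs = softmax(logits)
for name, p in zip(["dog", "cat"], probs):
    print(f"{100 * p:.2f} % {name}")
```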
Requirements: inference
• Edge requirements
  • Low (deterministic) latency (e.g. real-time object detection)
  • Power efficiency (limited battery capacity)
  • Sensor fusion (e.g. industrial surveillance)
  • Robustness (e.g. temperature)
• Cloud requirements
  • Low latency (e.g. search engines)
  • Power efficiency (heat dissipation/cooling cost)
Qualitative hardware comparison
GPU, FPGA and ASIC are compared against these requirements: low latency, high throughput, power efficiency, sensor fusion, robustness, programmability, flexibility, ease-of-use, (development) cost.
[The per-platform ratings were shown graphically on the slides.]
FPGA ML workflow
Challenge: efficient mapping of the floating-point model to an FPGA implementation without losing accuracy.
Trained network (floating-point model, FP32)
→ Compression: Pruning (pruned network) and Quantization
→ Compilation
→ FPGA implementation (fixed point)
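The quantization step can be sketched as a symmetric linear mapping of FP32 weights to 8-bit fixed point (the bit width and per-tensor scheme are illustrative; a real toolchain chooses these per layer):

```python
import numpy as np

def quantize_int8(w):
    """Map FP32 weights to int8 plus one scale factor (symmetric scheme)."""
    scale = np.max(np.abs(w)) / 127.0  # largest magnitude -> full int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.float32([0.42, -1.3, 0.07, 0.9])
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                          # int8 values: 4x smaller than FP32
print(np.max(np.abs(w - w_hat)))  # per-weight error, bounded by scale / 2
```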
Impact of compression
Source: https://www.hotchips.org/hc30/0tutorials/T2_Part_2_Song_Hanv3.pdf
Compression allows using significantly fewer resources when deploying a neural network, without degrading network accuracy.
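The pruning half of compression can be sketched as magnitude pruning (the threshold policy and sparsity target are illustrative; real flows prune iteratively and retrain to recover accuracy):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())  # fraction of zeroed weights: 0.75
```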
Toolchains for AI on FPGAs

Provider | Edge: Computer vision                       | Edge: Language processing | Cloud
Xilinx   | DNNDK (Deep Neural Network Development Kit) | -                         | ML (Machine Learning) Suite
Intel    | -                                           | -                         | OpenVINO
Omnitek  | DPU (Deep Learning Processing Unit) + software framework (edge and cloud)
Lattice  | sensAI                                      | -                         | -
High-level DNNDK tool flow
1. Vivado: integrate the DPU IP into the design and export the hardware description (.hdf)
2. Petalinux: build the Linux system from the .hdf: BOOT.BIN, image.ub, device tree, dpu.ko driver, n2cube runtime, dputils, sysroot
3. DNNDK decent: quantize the trained model (network structure + weights)
4. DNNDK dnnc: compile the quantized model into dpu_model.elf
5. XSDK: compile ai_application.cc and link it with dpu_model.elf against the sysroot, producing ai_application.elf
6. Copy the boot files, Linux image and application to the SD card
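The flow above can be flattened into a command-level sketch (a pseudocode outline only: the arguments and paths are placeholders, not the tools' real flags; see the DNNDK user guide for the actual invocations):

```
# 1. Vivado: integrate the DPU IP, export <design>.hdf
# 2. Petalinux: build BOOT.BIN, image.ub, device tree, dpu.ko, n2cube, sysroot
# 3. quantize the trained model to fixed point
decent quantize <network structure> <weights> <calibration images>
# 4. compile the quantized model for the DPU
dnnc <quantized model>            # -> dpu_model.elf
# 5. XSDK: cross-compile ai_application.cc, link dpu_model.elf and sysroot
#    -> ai_application.elf
# 6. copy boot files, Linux image, and both ELFs to the SD card
```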
Summary
• Neural network inference is viable on FPGAs
  • Low power (~mW – W)
  • Sensor integration
  • Flexibility
  • Low deterministic latency
• Edge examples: Xnor.ai (solar-powered person detection), CERN (sensor data filtering and classification)
• Cloud example: Microsoft (Azure cloud AI)

Thanks for your attention
Visit the Enclustra booth for an AI demonstrator!