Agenda
˃ Acceleration
Storage acceleration benefits compute
Move compute to data
Combine with data analysis applications
˃ Computational Storage Platform
PCIe peer-to-peer data transfer
Developer tools
Bigstream platform
˃ Solution Proof Points
Bigstream acceleration
Results
˃ Summary
© Copyright 2019 Xilinx
Video & Image Processing 10x*
Data Analytics 90x*
Genomics 100x*
Machine Learning 40x*
Compute Acceleration Inline Storage Acceleration
Storage acceleration supercharges compute
Accelerated Open Frameworks
Accelerated Libraries
Development Environment
Development Boards
Development Stack
System Developers
Software Application Developers
Machine learning
Database analytics
Extending to software application developers
From compute to data…
[Figure: the same system drawn three times, each with CPU0 and CPU1, an FPGA containing Accel and IO blocks with attached DDR, and Storage]
˃ “Offload” storage centric workloads
˃ “Split personality” – offload or storage
˃ Accelerate storage services “Inline”
Encryption, compression, hashing
˃ Tighter integration
˃ Compute near Storage
Inline, offload and more
Search, big data
DB-SSD
Workload Database Services -Adv
Features
KV
ETL offloads
Query offloads
ML Inference
CS HDD / SSD Platform #3
KV
Query Engines
Object Storage
Multiple Software Personalities, One Hardware Platform
ETL
Query Engines
ML Inference
Big Data Analytics
Compression
Offline Curation
ML Inference
Big Data Mining
Anomaly Detection, Fraud Prevention, Smart Surveillance
HDD / SSD HDD / SSD HDD / SSD
2X Throughput System
TCO
Needle in Haystack
Compute analytics diagram
Computational Storage - Overview
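[Figure: Host.exe and an FPGA kernel connected over the PCIe bus, drawn twice. First diagram: kernel buffers Buffer1 and Buffer2 with host-side copies Buffer1’ and Buffer2’. Second diagram: kernel buffers Buffer1 and Buffer2 with only Buffer2’ on the host side.]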
Key benefit: avoiding extra copies into host DDR
No Change
Advantages of Computational Storage
Computational Storage Solution avoids copying to x86 DDR
Without acceleration: SSD to x86 DDR; func1 and func2 both run on the CPU.
With FPGA compute acceleration (offload compute): SSD to x86 DDR, then host to FPGA DDR; func1 runs on the FPGA, func2 on the CPU.
With FPGA computational storage acceleration (offload compute & I/O): SSD to FPGA DDR, bypassing x86 DDR; func1 runs on the FPGA, func2 on the CPU.
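The difference between the three data paths can be made concrete with a small bookkeeping model. The sketch below is illustrative only: the path names, the hop lists, and the 1 GiB input size are assumptions chosen for this example, not figures from the presentation. It counts how often the input is copied before func1 can start and how many bytes land in x86 DDR along the way.

```python
# Illustrative model only: counts copies and host-DDR traffic for each
# data path on the slide. Path names and the 1 GiB input size are
# assumptions made for this sketch, not figures from the presentation.

GIB = 1024 ** 3

# Each path is the ordered list of memories the input data lands in
# before func1 can process it.
PATHS = {
    "no_acceleration": ["x86_ddr"],               # SSD -> x86 DDR, CPU runs func1
    "compute_offload": ["x86_ddr", "fpga_ddr"],   # SSD -> x86 DDR -> FPGA DDR
    "computational_storage": ["fpga_ddr"],        # SSD -> FPGA DDR directly
}


def copies(path):
    """Number of times the input is copied before func1 can start."""
    return len(PATHS[path])


def host_ddr_bytes(path, input_bytes):
    """Bytes written into x86 DDR while staging the input for func1."""
    return sum(input_bytes for hop in PATHS[path] if hop == "x86_ddr")


for name in PATHS:
    staged = host_ddr_bytes(name, GIB) // GIB
    print(f"{name:22s} copies={copies(name)} host DDR traffic={staged} GiB")
```

In this model the compute-offload path is the only one that copies the input twice, and the computational storage path is the only one that stages nothing in host DDR, which is the benefit the slide claims.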
Bigstream Hyper-Acceleration Layer
Addresses the whole big data process
Zero code change
Cross platform
Intelligent, automatic computation slicing
Cross acceleration hardware
2X to 30X acceleration
Dataflow Adaptation Layer
Bigstream Hypervisor
HYPER-ACCELERATION
Bigstream Dataflow
BIG DATA PLATFORMS
Many-cores GPU FPGA
Apache Spark + Bigstream Hyper-Acceleration
[Figure: Spark cluster with Bigstream components. Labels: Client Application, Big Data Platform APIs, Application Commands, Master Node, Cluster Management, Resource Manager, Application Master, Catalyst, Physical Plan, Bigstream Compiler, HW Accelerator Templates, Hyper-Acc Tasks, Tasks (Normal/Hyper-accelerated), Resource management messages, Executor Node, Node Manager, Executors, Spark Task, Bigstream Runtime, Many-cores / GPU / FPGA. A decision node "Accelerate?" has Y and N branches; the Y branch leads to the Bigstream Runtime.]
• Zero Code Change
• Cross Platform
• Intelligent, Automatic Computation Slicing
• Cross-Hardware Acceleration
• 2-30X Acceleration
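One way to picture the "Accelerate?" decision and the idea of computation slicing is a pass over the physical plan that tags each operator as hyper-accelerated or normal. This is a hypothetical sketch: the plan representation, the function name `slice_plan`, and the choice of supported operators (taken from the SQL engines listed in this deck) are assumptions for illustration and do not describe Bigstream's actual compiler or API.

```python
# Hypothetical sketch of the "Accelerate?" decision. The operator names
# are borrowed from the SQL engines listed in this deck; the plan format
# and function names are invented here and are not Bigstream's API.

ACCELERATED_OPS = {"filter", "project", "sort", "hash_aggregate"}


def slice_plan(physical_plan, accelerated_ops=ACCELERATED_OPS):
    """Tag each operator in a physical plan as hyper-accelerated or normal."""
    tasks = []
    for op in physical_plan:
        kind = "hyper-accelerated" if op in accelerated_ops else "normal"
        tasks.append((op, kind))
    return tasks


plan = ["scan", "filter", "project", "window", "hash_aggregate"]
for op, kind in slice_plan(plan):
    print(f"{op:15s} -> {kind}")
```

Operators without a matching accelerator stay on the normal Spark path, which is consistent with the "zero code change" claim: the application submits the same plan either way.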
Hardware Accelerator Engines
Deserialization: JSON, CSV, Parquet (under development), FIX (under development)
Decompression: Snappy, GZIP (3rd Party)
Encryption/Decryption: AES
SQL: Project, Filter, Sort, Hash Aggregate
Search/Regex: PCRE (3rd Party)
CPU Cores: RISC-V (3rd Party)
Machine Learning: Linear/Logistic Regression, K-means
Deep Learning: CNN (3rd Party), RNN (3rd Party)
Networking: IP/UDP, IP/TCP (3rd Party)
Offload Base Accelerator Ecosystem
[Figure: stack diagram with User Space, OS, and Hardware layers. Labels: Hyper-Acceleration, Database of Templates (Template 1 to 3), Database of IPs (Engine 1 to 3), OSS/3rd Party (Existing Stack), File System, FPGA Driver, NVMe / PCIe Driver, Host CPU, Host DRAM, Host Interface Controller, SSD, FPGA. Numbered markers 1 to 4 correspond to the steps below.]
1. Identify and load the FPGA bitstream based on an acceleration template match.
2. Configure the FPGA in software to customize the hardware template for the application.
3. Issue the “accelerated” compute task (requires I/O requests to the SSD).
4. Copy input data from host to FPGA memory and back again to the application's user-space memory to complete the task.
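Step 1, choosing a bitstream by template match, could look roughly like the lookup below. Everything in it is hypothetical: the template names mirror the "Template 1 to 3" labels on the slide, but the engine sets assigned to each template, the bitstream file names, and the "smallest covering template" rule are assumptions made for this sketch.

```python
# Hypothetical template matcher for step 1. Template names mirror the
# "Template 1..3" labels on the slide; their engine contents, the
# bitstream file names, and the selection rule are made up for this sketch.

TEMPLATES = {
    "template_1": {"engines": {"csv", "filter", "project"}, "bitstream": "template_1.bit"},
    "template_2": {"engines": {"json", "filter", "hash_aggregate"}, "bitstream": "template_2.bit"},
    "template_3": {"engines": {"snappy", "sort"}, "bitstream": "template_3.bit"},
}


def match_template(required_engines, templates=TEMPLATES):
    """Return the name of the smallest template covering the request, or None."""
    candidates = [
        (len(spec["engines"]), name)
        for name, spec in templates.items()
        if required_engines <= spec["engines"]  # subset test: template covers the request
    ]
    return min(candidates)[1] if candidates else None


name = match_template({"csv", "filter"})
print(name, TEMPLATES[name]["bitstream"])
```

A `None` result would mean no template covers the task, in which case the task would simply run on the existing, unaccelerated stack.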
Rewind for Computational Storage
No hardware change; adopt the Xilinx computational storage framework
Computational Storage
[Figure: stack diagram with User Space, OS, and Hardware layers. Labels: Hyper-Acceleration, Database of Templates (Template 1 to 3), Database of IPs (Engine 1 to 3), OSS/3rd Party (Existing Stack), File System, FPGA Driver, NVMe / PCIe Driver, Host CPU, Host DRAM, Host Interface Controller, SSD, FPGA, FPGA DRAM. Numbered markers 1 to 4 correspond to the steps below.]
1. Identify and load the FPGA bitstream based on an acceleration template match.
2. Configure the FPGA in software to customize the hardware template for the application.
3. Issue “accelerated” I/O + compute requests to the SSD to pump data into the FPGA.
4. The FPGA copies the result to the application's user space.
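Computational Storage Accelerator
[Figure: inside the FPGA, a Data Streamer reads data blocks/pages from Flash Storage through an in channel and feeds a Format Deserializer. The deserialized data passes through chained accelerators (Accel 1, Accel 2, Accel 3) as intermediate data and aggregated data, and the Result leaves through an out channel.]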
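The streamer, deserializer, and accelerator chain in this diagram behaves like a streaming pipeline, which can be sketched in software with Python generators. This is a minimal sketch under stated assumptions: the slide does not say what Accel 1 to 3 do, so the filter, project, and aggregate roles, the column names, and the sample data are all invented for illustration.

```python
import csv

# Sketch of the streamer -> deserializer -> Accel 1..3 chain as Python
# generators. The stage roles (filter, project, aggregate) and the sample
# data are assumptions; the slide does not say what each accelerator does.


def data_streamer(raw, block_size):
    """Yield the raw text in fixed-size blocks, like pages read from flash."""
    for start in range(0, len(raw), block_size):
        yield raw[start:start + block_size]


def csv_deserializer(blocks):
    """Reassemble blocks into complete lines and parse them into row dicts."""
    def lines():
        pending = ""
        for block in blocks:
            pending += block
            *complete, pending = pending.split("\n")  # keep the partial last line
            yield from complete
        if pending:
            yield pending
    yield from csv.DictReader(lines())


def accel_filter(rows, min_amount):      # stands in for Accel 1
    return (r for r in rows if float(r["amount"]) >= min_amount)


def accel_project(rows, columns):        # stands in for Accel 2
    return ({c: r[c] for c in columns} for r in rows)


def accel_aggregate(rows, key, value):   # stands in for Accel 3
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0.0) + float(r[value])
    return totals


RAW = "region,amount,note\neast,10,a\nwest,3,b\neast,7,c\nwest,20,d\n"


def run_pipeline(raw=RAW, block_size=8):
    rows = csv_deserializer(data_streamer(raw, block_size))
    rows = accel_filter(rows, min_amount=5)
    rows = accel_project(rows, ["region", "amount"])
    return accel_aggregate(rows, "region", "amount")


print(run_pipeline())
```

Each stage pulls rows from the previous one on demand, so only the final aggregate, not the raw blocks or the intermediate rows, has to be handed back to the caller.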
Overall Acceleration Comparison
Baseline, no acceleration: ~9000 secs
Hardware acceleration only: ~3100 secs (~3x)
Software + hardware acceleration: ~1800 secs (~5x)
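The quoted factors follow from dividing the baseline run time by each accelerated run time. The short check below uses the approximate times from the slide; the pairing of run times with configurations is inferred from the stated ~3x and ~5x factors rather than spelled out in the extracted text.

```python
# Approximate run times from the slide, in seconds. The pairing of each
# time with a configuration is inferred from the stated ~3x and ~5x factors.
baseline_s = 9000
hardware_only_s = 3100
software_plus_hardware_s = 1800


def speedup(baseline, accelerated):
    """Speedup factor = baseline time / accelerated time."""
    return baseline / accelerated


print(f"hardware only:       {speedup(baseline_s, hardware_only_s):.1f}x")
print(f"software + hardware: {speedup(baseline_s, software_plus_hardware_s):.1f}x")
```

This gives roughly 2.9x for hardware alone and 5.0x for the combined case, in line with the rounded figures on the slide.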
Measurable Hardware + Software Cooperative Task Acceleration
[Figure: task time in ms (0 to 25,000) versus number of tasks completed (1 to 93), with three series: SW Task End Time, HW Task End Time, SW+HW Task End Time]
Summary
˃ Exponential data growth is driving the computational storage opportunity: offloading compute functions closer to memory and storage
˃ FPGA-enabled adaptable storage will enable differentiation and unlock efficiency for storage workloads
˃ Building on the success of the Xilinx compute acceleration platform, the Xilinx Computational Storage Platform provides ease of application portability and tremendous returns for workloads that have storage affinity