XILINX TECHNOLOGY DAY

Accelerating Big Data and Computational Storage Applications

Tang Jie (唐杰), Data Center System Architect, Xilinx, March 19, 2019



Agenda

˃ Acceleration: storage acceleration benefits compute; moving compute to data; combining with data analysis applications

˃ Computational Storage Platform: PCIe peer-to-peer data transfer; developer tools; the Bigstream platform

˃ Solution Proof Points: Bigstream acceleration results

˃ Summary

© Copyright 2019 Xilinx

Storage Acceleration Supercharges Compute

˃ Video & image processing: 10x*

˃ Data analytics: 90x*

˃ Genomics: 100x*

˃ Machine learning: 40x*

Compute acceleration + inline storage acceleration


From Compute to Data…

[Figure: three server block diagrams, each with CPU0, CPU1, DDR, FPGA, storage, accelerator and I/O blocks]

˃ “Offload” storage-centric workloads

˃ “Split personality”: offload or storage

˃ Accelerate storage services “inline”: encryption, compression, hashing

˃ Tighter integration

˃ Compute near storage: inline, offload and more (search, big data)
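“Inline” acceleration means the storage service is applied on the data path itself, so each block is transformed once as it moves between the host and the media rather than in a separate pass over stored data. The Python sketch below models that idea in software only, assuming a write path that fingerprints each block with SHA-256 and compresses it with `zlib`; the 4 KiB block size and the names `split_blocks` and `inline_write_path` are hypothetical, and encryption is left out to keep the example to the standard library.

```python
import hashlib
import zlib
from typing import Iterable, Iterator, Tuple

BLOCK_SIZE = 4096  # hypothetical block size chosen for the sketch


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def inline_write_path(blocks: Iterable[bytes]) -> Iterator[Tuple[bytes, str]]:
    """Fingerprint and compress each block as it streams past, in one pass.

    Nothing is buffered or revisited, mirroring how an inline engine
    processes data while it is in flight.
    """
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()  # fingerprint of the plaintext (e.g. for dedup)
        compressed = zlib.compress(block)           # compression service
        yield compressed, digest


if __name__ == "__main__":
    payload = b"sensor,reading\n" * 2000
    stored = list(inline_write_path(split_blocks(payload)))
    # Read-back check: every decompressed block must match its fingerprint.
    for compressed, digest in stored:
        assert hashlib.sha256(zlib.decompress(compressed)).hexdigest() == digest
    restored = b"".join(zlib.decompress(c) for c, _ in stored)
    assert restored == payload
    print(len(stored), "blocks written,", sum(len(c) for c, _ in stored), "bytes stored")
```

Because each stage touches a block exactly once, the same structure maps naturally onto a hardware pipeline placed between the host interface and the media.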


Computation and Analytics Map

Multiple software personalities, one hardware platform (CS HDD / SSD Platform #3)

˃ DB-SSD (workload: database services; advanced features): KV, ETL offloads, query offloads, ML inference

˃ KV, query engines, object storage

˃ ETL, query engines, ML inference, big data analytics, compression

˃ Offline curation, ML inference

˃ Big data mining: anomaly detection, fraud prevention, smart surveillance (“needle in a haystack”)

[Figure: three HDD / SSD devices]

˃ 2X throughput; system TCO


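Computational Storage: Overview

[Figure: two diagrams of Host.exe and a kernel connected over the PCIe bus. In the first, the host holds buffers Buffer1’ and Buffer2’ corresponding to the kernel's Buffer1 and Buffer2; in the second, the host holds only Buffer2’.]

Key benefit: avoiding extra copies into host DDR

No change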
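The difference between the two diagrams is whether input data makes an extra stop in a host DDR buffer on its way to the kernel. The sketch below models the two paths in plain Python: a `bytearray` stands in for an FPGA buffer that has been mapped into the host address space (an assumption; in a real peer-to-peer setup the NVMe controller would DMA straight into device memory), and a counter records how many times the payload is staged in an intermediate host buffer. The helper names `staged_read` and `direct_read` are hypothetical.

```python
import io
from typing import BinaryIO, Dict


def staged_read(src: BinaryIO, device_buf: bytearray, stats: Dict[str, int]) -> int:
    """Conventional path: read into a host buffer, then copy into the device buffer."""
    host_buf = src.read(len(device_buf))    # SSD -> host DDR
    stats["host_copies"] += 1
    device_buf[: len(host_buf)] = host_buf  # host DDR -> device memory
    return len(host_buf)


def direct_read(src: BinaryIO, device_buf: bytearray, stats: Dict[str, int]) -> int:
    """Peer-to-peer style path: read straight into the (mapped) device buffer.

    No intermediate host buffer is created, so stats are left unchanged.
    """
    n = src.readinto(memoryview(device_buf))
    return n or 0


if __name__ == "__main__":
    data = bytes(range(256)) * 16
    for reader in (staged_read, direct_read):
        stats = {"host_copies": 0}
        buf = bytearray(len(data))
        n = reader(io.BytesIO(data), buf, stats)
        assert n == len(data) and bytes(buf) == data
        print(reader.__name__, stats)
```

This is only a model of the copy count: on a real system an ordinary buffered file read still passes through the kernel page cache, which is exactly the host-memory hop that peer-to-peer transfers are meant to remove.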


Advantages of Computational Storage

The computational storage solution avoids copying data into x86 DDR.

˃ Without acceleration: SSD → x86 DDR; func1 and func2 both run on the CPU

˃ With FPGA compute acceleration (offload compute): SSD → x86 DDR, then host → FPGA DDR; func1 runs on the FPGA, func2 on the CPU

˃ With FPGA computational storage acceleration (offload compute and I/O): SSD → FPGA DDR directly; func1 runs on the FPGA, func2 on the CPU
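The three configurations differ mainly in how much bulk data passes through x86 DDR. The sketch below tallies this per configuration, assuming the SSD supplies `n` bytes and func1 emits `m` bytes for func2; the sizes and the names `pipelines` and `host_ddr_bytes` are hypothetical, and reads that func1 and func2 themselves make from memory are not counted.

```python
from typing import Dict, List, Tuple

# (source, destination, size in bytes) for each bulk transfer in a pipeline.
Transfer = Tuple[str, str, int]


def pipelines(n: int, m: int) -> Dict[str, List[Transfer]]:
    """Bulk transfers for the three configurations on the slide.

    n: bytes read from the SSD; m: bytes produced by func1 (assumed size).
    """
    return {
        "no acceleration": [
            ("SSD", "x86 DDR", n),       # func1 and func2 both run on the CPU
        ],
        "offload compute": [
            ("SSD", "x86 DDR", n),
            ("x86 DDR", "FPGA DDR", n),  # host to FPGA DDR
            ("FPGA DDR", "x86 DDR", m),  # func1 result returned for func2
        ],
        "computational storage": [
            ("SSD", "FPGA DDR", n),      # peer-to-peer, bypasses host DDR
            ("FPGA DDR", "x86 DDR", m),
        ],
    }


def host_ddr_bytes(transfers: List[Transfer]) -> int:
    """Bytes written to or read from x86 DDR by bulk transfers."""
    return sum(size for src, dst, size in transfers if "x86 DDR" in (src, dst))


if __name__ == "__main__":
    GB = 10**9
    for name, transfers in pipelines(n=100 * GB, m=1 * GB).items():
        print(f"{name:>22}: {host_ddr_bytes(transfers) / GB:.0f} GB through host DDR")
```

In this model, offloading compute alone increases host DDR traffic (the input is staged in host memory before being copied to the FPGA), while the computational storage path moves only func1's output through host DDR.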


Bigstream Hyper-acceleration Layer: Addressing the Whole Big Data Process

˃ Zero code change

˃ Cross-platform

˃ Intelligent, automatic computation slicing

˃ Cross-hardware acceleration

˃ 2X to 30X acceleration

[Figure: stack of big data platforms, Dataflow Adaptation Layer, Bigstream Dataflow, Bigstream Hypervisor (hyper-acceleration) and many-core CPU, GPU and FPGA hardware]


Apache Spark + Bigstream Hyper-Acceleration

[Figure: a Spark cluster. On the master node, the client application issues application commands through the big data platform APIs; Catalyst produces the physical plan, which the Bigstream compiler checks (“Accelerate?”). N: normal Spark tasks run on the executors. Y: hyper-accelerated tasks, built from HW accelerator templates, run in the Bigstream runtime on many-core CPU, GPU or FPGA hardware. Each executor node has a node manager; the resource manager (cluster management) exchanges resource management messages with the application master.]

• Zero code change

• Cross-platform

• Intelligent, automatic computation slicing

• Cross-hardware acceleration

• 2-30X acceleration
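The “Accelerate?” decision routes each part of the physical plan either to normal Spark tasks or to hyper-accelerated tasks built from hardware accelerator templates. Below is a minimal sketch of that routing, assuming a stage is accelerated only when every operator in it has a matching template; the operator names, the `TEMPLATES` table and the `Stage`/`plan_tasks` names are hypothetical, and the slide does not describe the Bigstream compiler's actual matching rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical template table: physical operator -> accelerator template.
TEMPLATES: Dict[str, str] = {
    "CsvScan": "csv_deserializer",
    "JsonScan": "json_deserializer",
    "Filter": "sql_filter",
    "Project": "sql_project",
    "HashAggregate": "sql_hash_agg",
}


@dataclass
class Stage:
    name: str
    operators: List[str]


def plan_tasks(stages: List[Stage]) -> List[Tuple[str, str, List[str]]]:
    """Label each stage as hyper-accelerated (Y) or normal Spark (N)."""
    tasks = []
    for stage in stages:
        if all(op in TEMPLATES for op in stage.operators):  # Accelerate? -> Y
            tasks.append((stage.name, "hyper-accelerated",
                          [TEMPLATES[op] for op in stage.operators]))
        else:                                               # Accelerate? -> N
            tasks.append((stage.name, "normal", []))
    return tasks


if __name__ == "__main__":
    plan = [
        Stage("scan+filter", ["CsvScan", "Filter", "Project"]),
        Stage("join", ["SortMergeJoin"]),
    ]
    for task in plan_tasks(plan):
        print(task)
```

Requiring every operator to match is a deliberately conservative rule for the sketch; a real system might instead slice a stage and accelerate only the part that has templates.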


Hardware Accelerator Engines

˃ Deserialization: JSON, CSV, Parquet (under development), FIX (under development)

˃ Decompression: Snappy, GZIP (3rd party)

˃ Encryption/decryption: AES

˃ SQL: Project, Filter, Sort, Hash Aggregate

˃ Search/regex: PCRE (3rd party)

˃ CPU cores: RISC-V (3rd party)

˃ Machine learning: linear/logistic regression, K-means

˃ Deep learning: CNN (3rd party), RNN (3rd party)

˃ Networking: IP/UDP, IP/TCP (3rd party)


HYPER-ACCELERATION: Offload-Based Accelerator Ecosystem

[Figure: the hyper-acceleration stack, with layers labeled User Space, OS and Hardware. Components: a database of templates (Template 1, 2, 3), a database of IPs (Engine 1, 2, 3), a user space file system, the FPGA driver, the NVMe / PCIe driver, the host CPU, host DRAM, a host interface controller, the SSD and the FPGA. The OSS/3rd-party existing stack is marked. Numbered arrows mark steps 1 to 4.]

1. Identify and load the FPGA bitstream based on an acceleration template match.

2. Configure the FPGA in software to customize the hardware template for the application.

3. Issue the “accelerated” compute task (this requires I/O requests to the SSD).

4. Copy the input data from the host to FPGA memory, then copy the results back to the application's user space memory to complete the task.
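Step 1 amounts to a lookup in the database of templates: find a bitstream whose engines cover what the task needs. A minimal sketch, assuming each template is described by the set of engines it contains and that keeping the already-loaded bitstream is preferred over reprogramming the FPGA; the `TEMPLATE_DB` contents, engine names, file names and `select_bitstream` are hypothetical (`.xclbin` is the container format used for Xilinx FPGA binaries).

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical "database of templates": bitstream file -> engines it contains.
TEMPLATE_DB: Dict[str, FrozenSet[str]] = {
    "template_1.xclbin": frozenset({"csv", "filter", "project"}),
    "template_2.xclbin": frozenset({"json", "filter", "hash_agg"}),
    "template_3.xclbin": frozenset({"csv", "filter", "project", "hash_agg", "sort"}),
}


def select_bitstream(required: FrozenSet[str], loaded: Optional[str]) -> Optional[str]:
    """Step 1: pick a bitstream whose engines cover the task's requirements.

    Reuses the already-loaded bitstream when it covers the task (no
    reprogramming); otherwise chooses the covering template with the fewest
    engines. Returns None when nothing matches, i.e. run the task in software.
    """
    if loaded is not None and required <= TEMPLATE_DB.get(loaded, frozenset()):
        return loaded
    candidates = [name for name, engines in TEMPLATE_DB.items() if required <= engines]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(TEMPLATE_DB[name]), name))


if __name__ == "__main__":
    print(select_bitstream(frozenset({"csv", "filter"}), loaded=None))
    print(select_bitstream(frozenset({"csv", "filter"}), loaded="template_3.xclbin"))
    print(select_bitstream(frozenset({"regex"}), loaded=None))
```

Preferring the loaded bitstream reflects the cost of reconfiguring an FPGA; step 2 (software configuration of the chosen template) would follow only when a bitstream is returned.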


Rewind for Computational Storage: no hardware change; adopt the Xilinx computational storage framework


HYPER-ACCELERATION: Computational Storage

[Figure: the user space / OS / hardware stack (template and IP databases, user space file system, FPGA and NVMe / PCIe drivers, host CPU, host DRAM, host interface controller, SSD, FPGA), now with FPGA DRAM. Numbered arrows mark steps 1 to 4.]

1. Identify and load the FPGA bitstream based on an acceleration template match.

2. Configure the FPGA in software to customize the hardware template for the application.

3. Issue “accelerated” I/O + compute requests to the SSD to pump data into the FPGA.

4. The FPGA copies the result to the application's user space.
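In step 3 the SSD “pumps” data into FPGA DRAM while the accelerator processes it. The slide does not say how reads and compute are overlapped; a common approach, assumed here, is double buffering, where the SSD fills one FPGA buffer while the kernel works on the other. The sketch computes the finish time for a stream of equal-sized chunks given a per-chunk read time and compute time; the function name `pump_schedule` and the timing model are hypothetical.

```python
from typing import List


def pump_schedule(chunks: int, read_s: float, compute_s: float, buffers: int = 2) -> float:
    """Finish time when SSD reads and FPGA compute overlap across `buffers` slots.

    Chunk i can be read once the previous read has finished and a buffer is
    free, i.e. once compute on chunk i - buffers has finished. Compute on
    chunk i starts when its data has arrived and the kernel is idle.
    """
    if buffers < 1:
        raise ValueError("need at least one buffer")
    read_end: List[float] = []
    compute_end: List[float] = []
    for i in range(chunks):
        buffer_free = compute_end[i - buffers] if i >= buffers else 0.0
        start_read = max(read_end[-1] if read_end else 0.0, buffer_free)
        read_end.append(start_read + read_s)
        start_compute = max(read_end[i], compute_end[-1] if compute_end else 0.0)
        compute_end.append(start_compute + compute_s)
    return compute_end[-1] if compute_end else 0.0


if __name__ == "__main__":
    for b in (1, 2):
        print(f"buffers={b}: {pump_schedule(100, read_s=2.0, compute_s=1.0, buffers=b):.0f} s")
```

With `buffers=1` every chunk is read and then computed, giving `k * (r + c)` for `k` chunks; with two buffers and constant times the result is `r + (k - 1) * max(r, c) + c`, so throughput is set by the slower of the SSD and the kernel.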


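Computational Storage Accelerator

[Figure: inside the FPGA, a data streamer reads data blocks/pages from flash storage through an input channel into a format deserializer; the data then flows through Accel 1, Accel 2 and Accel 3, which pass intermediate data between them, and the aggregated data (Agg Data) leaves through the output channel as the result.]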
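The accelerator chain behaves like a streaming pipeline: each stage consumes its predecessor's output as it arrives, without materializing the whole intermediate result. The sketch below models that with Python generators. The mapping of Accel 1, 2 and 3 to filter, project and hash aggregate (all three appear in the SQL engine list), the CSV deserializer, the three-column `id,region,amount` layout and all function names are assumptions for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def data_streamer(pages: Iterable[bytes]) -> Iterator[str]:
    """Input channel: turn raw pages into text lines (a page may split a line)."""
    pending = b""
    for page in pages:
        pending += page
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line.decode()
    if pending:
        yield pending.decode()


def deserializer(lines: Iterable[str]) -> Iterator[List[str]]:
    """Format deserializer: CSV text -> rows of fields."""
    return csv.reader(lines)


def accel_1_filter(rows: Iterable[List[str]], min_amount: float) -> Iterator[List[str]]:
    """Accel 1 (assumed: filter): keep rows whose amount passes the predicate."""
    for row in rows:
        if float(row[2]) >= min_amount:
            yield row


def accel_2_project(rows: Iterable[List[str]]) -> Iterator[Tuple[str, float]]:
    """Accel 2 (assumed: project): keep only (region, amount)."""
    for _id, region, amount in rows:
        yield region, float(amount)


def accel_3_aggregate(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Accel 3 (assumed: hash aggregate): sum amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)


if __name__ == "__main__":
    raw = b"1,east,10\n2,west,5\n3,east,7\n4,west,20\n"
    pages = [raw[i:i + 8] for i in range(0, len(raw), 8)]  # small pages that split lines
    rows = deserializer(data_streamer(pages))
    print(accel_3_aggregate(accel_2_project(accel_1_filter(rows, min_amount=6.0))))
```

Only the final aggregate leaves the pipeline, which is why a chain like this can return a small result to the host even when it scans a large amount of flash data.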


Overall Acceleration Comparison

˃ Baseline (no acceleration): ~9000 s

˃ Hardware acceleration only: ~3100 s (~3x)

˃ Software + hardware acceleration: ~1800 s (~5x)
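The speedups follow directly from the measured run times: speedup is the baseline time divided by the accelerated time. The short calculation below reproduces them from the slide's numbers; the variable and function names are illustrative.

```python
BASELINE_S = 9000  # ~9000 s, no acceleration (from the slide)
RUNS = {
    "hardware acceleration only": 3100,        # ~3100 s
    "software + hardware acceleration": 1800,  # ~1800 s
}


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def time_saved_fraction(baseline_s: float, accelerated_s: float) -> float:
    """Fraction of the baseline run time eliminated by acceleration."""
    return 1 - accelerated_s / baseline_s


if __name__ == "__main__":
    for name, t in RUNS.items():
        print(f"{name}: {speedup(BASELINE_S, t):.1f}x faster, "
              f"{time_saved_fraction(BASELINE_S, t):.0%} less time")
```

This gives about 2.9x for hardware alone and 5.0x for software plus hardware (roughly 66% and 80% less run time), consistent with the rounded ~3x and ~5x.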


Measurable Hardware + Software Task Acceleration

[Chart: task end time (ms, 0 to 25,000) versus number of tasks completed (1 to 93), with three series: SW task end time, HW task end time and SW+HW task end time]


Summary

˃ Exponential data growth is driving the computational storage opportunity: offloading compute functions closer to memory and storage

˃ FPGA-enabled adaptable storage will enable differentiation and unlock efficiency for storage workloads

˃ Building on the success of the Xilinx compute acceleration platform, the Xilinx Computational Storage Platform makes applications easy to port and delivers large returns for workloads that have storage affinity (~5x end to end in the Bigstream proof point)

Adaptable. Intelligent.
