FPGA Accelerator Virtualization in an OpenPOWER cloud
Fei Chen, Yonghua Lin
IBM China Research Lab

Source: on-demand.gputechconf.com/gtc/2015/presentation/S...




Trend of Acceleration Technology


• Used FPGA to accelerate Bing search on 1632 servers
• A 6*8 2D-torus design for high throughput network topology

• Storage >2000 PB, processing 10~100 PB/day, log 100 TB~1 PB/day
• Using FPGA for storage controller
• Used GPU for Deep Learning

Acceleration in Cloud is Taking Off

Appliance → Acceleration in Cloud
• TB-scale problem → PB-scale problem
• Acceleration architecture for a single node → Architecture for thousands of nodes
• Dedicated acceleration resource → Shared acceleration resources
• Proprietary accelerator framework → Open framework to enable accelerator sharing & integration
• Closed innovation model → Open innovation model through ecosystem

Innovations Required
• Scalable acceleration fabric
• Open framework for accelerator integration and sharing
• Accelerator resource abstraction, re-configuration and scheduling in cloud
• Modeling & advisory tool for dynamic acceleration system composition

Acceleration programming becomes a hot topic
• OpenCL, Sumatra (Oracle), LiMe (IBM), …


Resources on FPGA are huge


• Resources on FPGA
  – Programmable resources
    • Logic cells (LCs)
    • DSP slices: fixed/floating-point
    • On-chip memory blocks
    • Clock resources
  – Miscellaneous peripherals (Xilinx Virtex as an example)
    • DDR3 controllers
    • PCIe Gen3 interfaces
    • 10G Ethernet controllers
    • ...
  – Hard processor cores
    • PowerPC: Xilinx Virtex-5 FXT
    • ARM: Xilinx Zynq-7000
    • Atom: Intel + Altera E600C

The Xilinx Virtex UltraScale 440 FPGA, the largest-scale FPGA in the world, delivered in 2014, consists of more than 4 million logic cells (LCs). Using this chip, we can build up to 250 AES crypto accelerators, or 520 ARM7 processor cores.

FPGA Capacity Trends


FPGA on Cloud – Double Win


• Cloud benefits from FPGA
  – Performance
  – Power consumption

• FPGA benefits from cloud
  – Lower the cost
    • Tenants need not purchase and maintain FPGAs.
    • Tenants pay for accelerators only when using them.
  – More applications
    • High FPGA utilization
  – Ecosystem
    • Grow with the cloud ecosystem


Motivation of Accelerator/FPGA as Service in Cloud


• Can an FPGA (pool) be managed in the data center?
  – ID, location, re-configuration, performance, etc.
  – Goal: enable manageability

• How to reduce system cost by sharing FPGA resources among applications, VMs, and containers?
  – Dynamic, flexible, priority controllable
  – Goal: reduce system cost

• How to orchestrate FPGA/accelerator resources with VM, network and storage resources easily, according to the needs of the application?
  – Goal: reduce deployment complexity

• Could we generate new value for IaaS?
  – Goal: bring high value to cloud infrastructure

[Figure labels: App, VM, Container; VM, Container host, Storage, Network]


FPGA Ecosystem in Cloud

[Figure: HEAT orchestrator on top of Compute, Network, Storage and FPGA accelerator services; POWER8/PowerKVM hosts with FPGA cards; OpenStack extension for accelerator service; service logic for accelerator service in FPGA]

Cloud Tenants
• Pay for the usage of an accelerator, rather than for license and hardware
• Get the accelerator service in a self-service way
• Use the single HEAT orchestrator to deploy a workload with an accelerator, together with compute, network, and storage

Cloud Service Provider
• The cloud service provider buys the "cloudified" accelerators on the marketplace
• Creates the service category for FPGA accelerators and sells them on the cloud as a service

Accelerator developers
• Companies or individual developers can upload and sell their accelerators through the marketplace (e.g. on OpenPOWER)

Accelerator Market Place, Accelerator Cloudify Tool (in plan)
• The accelerator marketplace will cloudify the accelerator by integrating the service layer with the accelerator and compiling the result
• All integration, compilation, test, verification and certification will be done automatically


Accelerator as Service on SuperVessel


• Accelerator MarketPlace for developers to upload and compile accelerators for the SuperVessel POWER cloud

• Allows users to request clusters of different sizes

Fig.1 Accelerator MarketPlace for SuperVessel Cloud

Fig.2 Cloud users can request an accelerator when creating a VM


FPGA Implementation


[Figure: FPGA chip organized into three sublayers]

• User Sublayer: shared FPGA resource (A, B, C, D)
• Service Sublayer: Job Queue, Switch, …
• Platform Sublayer: DRAM, PCIe, ICAP, …

Blocks shown in the figure: Service Logic, Switch, Job Scheduler, Job Queue, Security Controller, Context Controller, DMA Engine, Reconfig Controller, Registers, DRAM, Eth, High Bandwidth I/O, PCIe / CAPI, …

Computer analogy labels in the figure: Hardware, OS, Apps

The FPGA subsystem is designed as a computer system.
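
The Job Queue and Job Scheduler named on this slide can be pictured with a small software model. The following Python sketch is only an illustration under stated assumptions: the class `JobQueue`, the function `dispatch`, the convention that a lower priority value is served first, and the slot names are all hypothetical and are not taken from the actual FPGA service logic.

```python
import heapq
import itertools


class JobQueue:
    """Hypothetical software model of a shared accelerator job queue."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, tenant, payload_bytes, priority=1):
        # Assumption: a lower priority value means the job is served earlier.
        heapq.heappush(self._heap, (priority, next(self._seq), tenant, payload_bytes))

    def next_job(self):
        """Pop the most urgent job, or return None when the queue is empty."""
        if not self._heap:
            return None
        _priority, _seq, tenant, payload_bytes = heapq.heappop(self._heap)
        return tenant, payload_bytes

    def __len__(self):
        return len(self._heap)


def dispatch(queue, free_slots):
    """Assign queued jobs to free accelerator slots, most urgent job first."""
    assignments = []
    for slot in free_slots:
        job = queue.next_job()
        if job is None:
            break
        assignments.append((slot, job[0]))
    return assignments


queue = JobQueue()
queue.submit("vm1", 4096)
queue.submit("vm2", 4096)
queue.submit("vm3", 4096, priority=0)  # raised priority jumps the queue
result = dispatch(queue, ["A", "B"])
print(result)
```

With two free slots and three queued jobs, the raised-priority job is dispatched first and one job stays queued until a slot frees up.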


System Implementation


• Control Node: Nova, Glance, Horizon, Neutron, Swift

• Compute Node: Nova Compute

• Compiler: FPGA incremental compiler environment

[Figure: workflow across Dashboard, Compiler, Glance, Scheduler and Compute]

1. Accelerator source code package
2. image_file
3. VM request (with accelerator)
4. acc_file
5. Launch VM
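
The five numbered steps can be walked through with a toy, in-memory model. This Python sketch does not call any OpenStack API. The names `ImageStore`, `ComputeNode`, `compile_accelerator`, `schedule` and `deploy` are hypothetical, and the reading that the compiled image is stored in Glance and later fetched by the compute node as the acc_file is an assumption based on the labels in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Stand-in for the image service (Glance in the slide); hypothetical."""
    images: dict = field(default_factory=dict)

    def put(self, name, blob):
        self.images[name] = blob

    def get(self, name):
        return self.images[name]


@dataclass
class ComputeNode:
    name: str
    free_fpga_slots: int
    vms: list = field(default_factory=list)

    def launch_vm(self, vm_name, acc_file):
        # Step 5: launch the VM with the accelerator file attached.
        self.free_fpga_slots -= 1
        self.vms.append((vm_name, acc_file))


def compile_accelerator(source_package):
    # Placeholder for the FPGA incremental compile; the output is fake.
    return b"BITSTREAM:" + source_package


def schedule(nodes):
    # Assumed policy: pick the first compute node with a free FPGA slot.
    for node in nodes:
        if node.free_fpga_slots > 0:
            return node
    raise RuntimeError("no compute node with a free FPGA slot")


def deploy(store, nodes, acc_name, source_package, vm_name):
    store.put(acc_name, compile_accelerator(source_package))  # steps 1 and 2
    node = schedule(nodes)                                    # step 3
    acc_file = store.get(acc_name)                            # step 4
    node.launch_vm(vm_name, acc_file)                         # step 5
    return node.name


store = ImageStore()
nodes = [ComputeNode("compute1", 0), ComputeNode("compute2", 1)]
chosen = deploy(store, nodes, "aes", b"aes-src", "vm-demo")
print(chosen)
```

The scheduling policy here is deliberately trivial; a real scheduler would also weigh CPU, memory and network placement alongside FPGA availability.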


Evaluation


(1) Accelerator Sharing Evaluation

[Figure: total bandwidth (MB/s, axis 900 to 1300) versus number of processes (1 to 8), for the configurations listed below]

• Host: All processes run in host environment

• One VM: All processes run in one VM

• VMs: Each process runs in one VM

• AESs: Each VM uses one independent AES accelerator

(2) Management – Bandwidth Control

[Figure: bandwidth (MB/s, axis 0 to 1600) versus time (1 to 61); series: Process 0, Process 1, Total]

(3) Management – Priority Control

[Figure: three panels versus time (second, 1 to 61) for Process 0 and Process 1: CV (0% to 80%), average latency (ms, 0 to 4), and bandwidth (MB/s, log scale 1 to 10000). Markers: "Process 1 begin", "Priority control". Annotated values: 1194 MB/s, 25 MB/s; 0.21 ms, 0.22 ms, 2.3 ms]

• Process 0: 256 KB payload, 100 times per second

• Process 1: 4 MB payload, best-effort use

• Same priority during seconds 1 to 38

• Raise Process 0 priority at second 38
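
As a quick check of the Process 0 workload, a 256 KB payload sent 100 times per second is an offered load of 25 MB/s when 1 MB is taken as 1024 KB, which appears consistent with the 25 MB/s annotation. The Python sketch below works that out and also shows how a coefficient of variation could be computed, assuming that is what "CV" denotes on the slide. The function names and the sample values are hypothetical and are not measurements from the talk.

```python
import statistics


def offered_load_mb_s(payload_kb, requests_per_second):
    """Offered load in MB/s, taking 1 MB = 1024 KB."""
    return payload_kb * requests_per_second / 1024


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


process0_load = offered_load_mb_s(256, 100)

# Made-up per-second bandwidth samples, used only to exercise the function.
steady = [25.0, 25.0, 25.0, 25.0]
jittery = [10.0, 30.0]

print(process0_load)
print(coefficient_of_variation(steady))
print(coefficient_of_variation(jittery))
```

A CV near zero means the process receives a steady share of the accelerator from second to second; a larger CV means its share fluctuates.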


FPGA accelerator as Services online on SuperVessel Cloud


SuperVessel Cloud Infrastructure

• SuperVessel Cloud Service
  1. VM and container service
  2. Storage service
  3. Network service
  4. Accelerator as service
  5. Image service

• SuperVessel Big Data and HPC Service
  1. Big Data: MapReduce (Symphony), SPARK
  2. Performance tuning service

• OpenPOWER Enablement Service
  1. X-to-P migration
  2. AutoPort Tool
  3. OpenPOWER new system test service

• Super Class Service
  1. On-line video courses
  2. Teacher course management
  3. User contribution management

• Super Project Team Service
  1. Project management service
  2. DevOps automation

• Super Marketplace

Underlying infrastructure: Storage, IBM POWER servers, OpenPOWER server, FPGA/GPU, Docker

Status labels in the figure: (Online) (Online) (Preparing) (Preparing)

You can try it in two weeks at www.ptopenlab.com


Thanks!
