
Tutorial Outline

Time Topic

9:00 am – 9:30 am Introduction

9:30 am – 10:10 am Standalone Accelerator Simulation: Aladdin

10:10 am – 10:30 am Standalone Accelerator Generation: High-Level Synthesis

10:30 am – 11:00 am HLS-Based Accelerator-Rich Architecture Simulation: PARADE

11:00 am – 11:30 am Break

11:30 am – 12:00 pm Pre-RTL SoC Simulation: gem5-Aladdin

12:00 pm – 12:30 pm FPGA Prototyping: ARACompiler

12:30 pm – 2:00 pm Lunch

2:00 pm – 3:00 pm Panel on Accelerator Research

3:00 pm – 3:30 pm Accelerator Benchmarks and Workload Characterization

3:30 pm – 4:00 pm Break

4:00 pm – 5:00 pm Hands-on Exercise


Integration for Heterogeneous SoC Modeling

Yakun Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks

Harvard University


Accelerator-CPU Integration: Today’s Conventional SoCs

• Easy to integrate lots of IP, simple accelerator design

• Hard to program and share data

[Figure: conventional SoC block diagram. Labels: two cores, each with an L2 cache; L3 cache; DMA; on-chip system bus; accelerators #1 through #n, each with a scratchpad.]
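The data-sharing difficulty noted above can be illustrated with a minimal sketch in C. It assumes a hypothetical accelerator that can only see its private scratchpad, and it models the DMA engine with memcpy. The names dma_copy, acc_scale, offload_scale, and SPAD_WORDS are illustrative assumptions and do not come from the tutorial.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPAD_WORDS 64  /* hypothetical scratchpad capacity, in floats */

/* Hypothetical private scratchpad: the accelerator sees only this buffer. */
static float scratchpad[SPAD_WORDS];

/* Stand-in for a DMA transfer; a real engine would be programmed via registers. */
static void dma_copy(void *dst, const void *src, size_t bytes) {
    assert(bytes <= sizeof(scratchpad));  /* a transfer must fit in the scratchpad */
    memcpy(dst, src, bytes);
}

/* Stand-in accelerator kernel: scales the first n scratchpad words in place. */
static void acc_scale(size_t n, float factor) {
    for (size_t i = 0; i < n; i++)
        scratchpad[i] *= factor;
}

/* Host-side offload: tile the array and stage each tile in and out by DMA. */
static void offload_scale(float *data, size_t n, float factor) {
    for (size_t off = 0; off < n; off += SPAD_WORDS) {
        size_t chunk = (n - off < SPAD_WORDS) ? (n - off) : SPAD_WORDS;
        dma_copy(scratchpad, data + off, chunk * sizeof(float)); /* host to scratchpad */
        acc_scale(chunk, factor);                                /* compute on local data */
        dma_copy(data + off, scratchpad, chunk * sizeof(float)); /* scratchpad to host */
    }
}
```

In this sketch the host must tile the data and stage every buffer explicitly in both directions, which is the programming burden the slide refers to.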


Accelerator Integration Trend

• Users design application-specific hardware accelerators.

• System vendors provide a Host Service Layer with virtual memory and cache coherence support

– Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)

– IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)

[Figure: block diagram. Labels: main CPU/SoC with two cores, each with an L2 cache, an L3 cache, and an Acc Agent; FPGA or user-defined ASIC with a Host Service Layer and the accelerator.]


IBM CAPI: Two part solution

• Example of state-of-the-art:

– IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)

• Virtual Addressing & Data Caching

• Easier, Natural Programming Model


IBM CAPI: Two part solution

• Coherent Accelerator Processor Proxy (CAPP)

– Snoops PowerBus on behalf of accelerator

• Power Service Layer (PSL)

– Performs address translations, page table walker support

– Provides cache and interface logic
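The address-translation role described for the PSL can be sketched generically in C: a small TLB answers repeated lookups, and a page table walk fills it on a miss. This is a toy model under stated assumptions (4 KiB pages, a direct-mapped TLB, a flat page table), not IBM's design; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12   /* assumed 4 KiB pages */
#define TLB_ENTRIES 4    /* tiny direct-mapped TLB, for illustration only */
#define NUM_PAGES   16   /* size of the toy page table */

typedef struct {
    bool valid;
    uint64_t vpn;  /* virtual page number */
    uint64_t pfn;  /* physical frame number */
} tlb_entry;

static tlb_entry tlb[TLB_ENTRIES];
static uint64_t page_table[NUM_PAGES];  /* toy flat table: vpn -> pfn */
static unsigned walks;                   /* number of page table walks so far */

/* Slow path: consult the page table (a real walker traverses several levels). */
static uint64_t walk_page_table(uint64_t vpn) {
    assert(vpn < NUM_PAGES);
    walks++;
    return page_table[vpn];
}

/* Translate a virtual address, filling the TLB entry on a miss. */
static uint64_t translate(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & ((1ULL << PAGE_SHIFT) - 1);
    tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (!e->valid || e->vpn != vpn) {  /* miss: walk, then refill this slot */
        e->pfn = walk_page_table(vpn);
        e->vpn = vpn;
        e->valid = true;
    }
    return (e->pfn << PAGE_SHIFT) | offset;
}
```

A second access to the same page hits in the TLB and skips the walk, which is why such a structure sits next to the accelerator's cache.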

[Figure: CAPI block diagram. Labels: cores with L2 caches, L3 cache, memory, and the CAPP on the on-chip coherent PowerBus; PCIe link to the PSL (cache, TLB, …) and the accelerator.]


But… accelerators are not one size fits all

• Problem: the PSL consumes ~20-30% of FPGA resources… for one accelerator

• Applications have drastically different requirements.

• Memory design customization is often more important than datapath customization


gem5-Aladdin Integration

[Figure: gem5-Aladdin block diagram. Labels: CPU with cache; accelerator datapath with scratchpad, cache, and TLB; DMA engine; LLC; DRAM.]


Code example: Sift

void imsmooth(F2D* array, float sigma, F2D* product);

void sift() {
  …
  imsmooth(I, temp, gss[0]);
  mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
  mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
  invokeAcceleratorAndBlock(imsmooth);
  …
}


Code example: Sift

void imsmooth(F2D* array, float sigma, F2D* product);

void sift() {
  …
  // imsmooth(I, temp, gss[0]);
  mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
  mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
  invokeAccelerator(imsmooth);
  …
}

Start Aladdin Simulation
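The two-step pattern on these slides (register each kernel array by name, then invoke the accelerator) can be mimicked with a small self-contained C sketch. The functions below are hypothetical stand-ins written for illustration; they are not the gem5-Aladdin implementation, and the toy kernel (doubling each element) is an assumption made only so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MAPPINGS 8  /* hypothetical limit on registered arrays */

/* Hypothetical record of one array registered with the accelerator. */
typedef struct {
    const char *name;
    void *addr;
    size_t size;
} mapping;

static mapping mappings[MAX_MAPPINGS];
static size_t num_mappings;

/* Stand-in for the mapping step: remember where a named kernel array lives. */
static void map_array_stub(const char *name, void *addr, size_t size) {
    assert(num_mappings < MAX_MAPPINGS);
    mappings[num_mappings++] = (mapping){ name, addr, size };
}

/* Look up a registered array by the name used inside the kernel. */
static mapping *find_mapping(const char *name) {
    for (size_t i = 0; i < num_mappings; i++)
        if (strcmp(mappings[i].name, name) == 0)
            return &mappings[i];
    return NULL;
}

/* Stand-in for a blocking invoke: run a toy kernel on the mapped arrays
 * (write 2 * array[i] into product[i]) and return only when it is done. */
static int invoke_and_block_stub(void) {
    mapping *in = find_mapping("array");
    mapping *out = find_mapping("product");
    if (!in || !out || in->size != out->size)
        return -1;  /* missing or mismatched mapping */
    float *src = in->addr;
    float *dst = out->addr;
    for (size_t i = 0; i < in->size / sizeof(float); i++)
        dst[i] = 2.0f * src[i];
    return 0;
}
```

A caller would register "array" and "product" with map_array_stub and then call invoke_and_block_stub. The sketch expects true arrays at the call site, so that sizeof yields the full byte count; applied to a pointer, sizeof yields only the size of the pointer.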


Simulating Accelerator with Memory System using Aladdin


[Figure: accelerator (Acc) connected to a cache and memory.]


[Figure: two stacks side by side. Labels: Acc, cache, memory; CPU, cache, memory.]


Modeling Accelerators in an SoC-like Environment

[Figure: block diagram. Labels: Acc, two cores, cache, memory.]


Modeling Accelerators in an SoC-like Environment

[Figure: block diagram. Labels: Acc, core, cache, memory.]


Accelerator Research Infrastructure

[Figure: the tutorial's tools (Aladdin, gem5-Aladdin, PARADE, High-Level Synthesis, FPGA Prototyping) arranged along two axes: Standalone vs. System Integration, and Modeling vs. RTL.]


Tutorial References

• Y.S. Shao and D. Brooks, “ISA-Independent Workload Characterization and its Implications for Specialized Architectures,” ISPASS’13.

• B. Reagen, Y.S. Shao, G.-Y. Wei, D. Brooks, “Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware,” ISLPED’13.

• Y.S. Shao, B. Reagen, G.-Y. Wei, D. Brooks, “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures,” ISCA’14.

• B. Reagen, B. Adolf, Y.S. Shao, G.-Y. Wei, D. Brooks, “MachSuite: Benchmarks for Accelerator Design and Customized Architectures,” IISWC’14.
