Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian...

Designing a CCA Framework for Accelerators

Khushbu AgarwalDaniel ChavarríaManoj Krishnan

Ian Gorton

Motivation

Emerging hybrid applications & architectures:Many codes & kernels being ported to hybrid CPU/GPU clusters“Production”-level applications not yet (?)

Other accelerated systems being deployed/developed:

FPGA-accelerated systemsTraditional

Convey systems

Cray XMT system: x86 front-end nodes + multithreaded compute nodes

Motivation (cont.)

Motivating Applications

Molecular dynamics GPU-based kernels being developed at PNNL

Evolving into more sophisticated/complete hybrid MD application

CCSD(T) GPU-based kernels also being developedCray XMT-based power grid contingency analysis application:

Dynamic power flow-based topology analysis on multithreaded processors to determine non-critical elements

Numerical solution of critical contingency cases on x86-based cluster

Motivating Applications (cont.)

“N-x”

non-critical elements

Dynamic Topology Analysis

Western Power Grid “N-5” C5

17000 ~= 1020

Objective

Provide a device independent interface to access accelerators and execute computations on them. e.g.:

Graphics Processor Units (GPUs)Field-Programmable Gate Arrays (FPGAs)Cray XMT

Proposed SIDL Framework

Initialize()Initializes the connection between accelerator and local host

opaque AllocMem()Allocate Memory on the accelerator and returns a pointer to it

opaque ReadMem( opaque m)Synchronous - Reads data from the accelerator and copies it to local buffer

WriteMem(opaque m, opaque data)Synchronous - Copy data from local buffer to accelerator memory

Exec()Synchronous - Executes the computation kernel on the accelerator

int AsyncExec()Asynchronous – Executes the computation kernel on the accelerator and returns a handle

Proposed SIDL Framework

AsyncWriteMem(opaque m, opaque data)Asynchronous – Issue request to copy data from local buffer to accelerator memory

opaque AsyncReadMem (opaque m)Asynchronous – Issue a request to read data from accelerator and copy it to local buffer

Wait() Wait for all the asynchronous data transfer operations to complete

WaitExec(int handle)Wait for asynchronous execution (identified by the handle) to complete

Implementation Challenges

Needs to differentiate between on board vs “farther away” accelerators.Init – will need to set up a connection between host and acceleratorHow do we start the execution of a kernel (move data) on an accelerator?

Heavily system and platform dependent

Good to hide behind a component interface!

Designing a CCA Framework for Accelerators Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian...

Documents

Research methodology khushbu

Gorton and West Gorton Draft Planning Proposals

George Gorton Machine Co Catalog

Top 3 trailers presentation claire gorton

Claire Gorton question 5

Portafolio por erasmo chavarría zamora

Jonathan Golden Khushbu Chechani

Profr. Febronio E. Chavarría AJEDREZ

Biostatistics khushbu

COLEGIO DE BACHILLERES N# 16 MANUEL CHAVARRÍA CHAVARRÍA OBLIGATORIEDAD Grupo: 207 Materia: Tic Integrantes: Martínez Chavarría Mireya A. Martínez León

Portafolio 1 Laura Chavarría Brenes

ANA JUDITH GONZÁLEZ CHAVARRÍA - USAC

Gorton Phil

Eric Gorton & Antony Romeo

Nathan Gorton

Gorton Fin Crisis

Khushbu cpd

West Gorton Design Code

Lisímaco Chavarría

Advanced diagnostic aids khushbu