
Accelerators for HPC: Programming Models

Accelerators for HPC: StreamIt on GPU

High Performance Applications on Heterogeneous Windows Clusters

http://www.hpc.serc.iisc.ernet.in/MSP

Microsoft External Research Initiative

StreamIt
• A high-level programming model where nodes and arcs represent computation and communication, respectively
• Exposes Task, Data, and Pipeline Parallelism
• Example: A Two-Band Equalizer (shown below, followed by a CUDA mapping sketch)

[Figure: stream graph of the two-band equalizer. A Duplicate Splitter feeds two parallel Bandpass Filters, whose outputs merge in a Combiner to produce the Signal Output.]
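To make the mapping concrete, below is a minimal CUDA-style sketch of how this stream graph might be executed by hand on a GPU. The kernel names, the simple FIR filters, and the launch configuration are illustrative assumptions, not the output of the StreamIt compiler described here.

    // Hypothetical hand-coded CUDA mapping of the two-band equalizer graph.
    // The duplicate splitter is implicit: both bandpass kernels read the
    // same input buffer. The combiner sums the two band outputs.
    __global__ void bandpass(const float *in, float *out, int n,
                             const float *taps, int ntaps) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float acc = 0.0f;
        for (int t = 0; t < ntaps; ++t)    // simple FIR filter; 'in' must hold
            acc += taps[t] * in[i + t];    // n + ntaps - 1 samples
        out[i] = acc;
    }

    __global__ void combine(const float *lo, const float *hi, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = lo[i] + hi[i];
    }

    // Host-side orchestration: launch the two band filters (task parallelism),
    // then the combiner (pipeline order). All pointers are device buffers.
    void run_equalizer(const float *d_in, const float *d_lowTaps,
                       const float *d_highTaps, float *d_low, float *d_high,
                       float *d_out, int n, int ntaps) {
        int threads = 256, blocks = (n + threads - 1) / threads;
        bandpass<<<blocks, threads>>>(d_in, d_low,  n, d_lowTaps,  ntaps);
        bandpass<<<blocks, threads>>>(d_in, d_high, n, d_highTaps, ntaps);
        combine <<<blocks, threads>>>(d_low, d_high, d_out, n);
    }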

StreamIt on GPUs
• Provides a natural way of implementing DSP applications on GPUs
• Challenges:
  – Determining the optimal execution configuration for each filter
  – Harnessing the data parallelism within each SM
  – Orchestrating the execution across SMs to exploit task and pipeline parallelism
  – Register constraints

Our Approach
• Good execution configurations are determined by profiling
  – Identifies a near-optimal number of concurrent thread instances per filter
  – Takes register constraints into consideration
• Work scheduling and processor (SM) assignment are formulated as a unified Integer Linear Programming (ILP) problem
  – Takes communication bandwidth restrictions into account
• An efficient buffer layout scheme ensures all accesses to GPU memory are coalesced (see the sketch after this list)
• Ongoing work: stateful filters are assigned to CPUs, enabling synergistic execution across CPUs and GPUs
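The coalescing point is easiest to see in code. The fragment below is a hedged illustration of the general idea rather than the paper's exact buffer layout: each thread runs one instance of a filter that consumes POP items per firing, and interleaving the items across threads turns strided accesses into contiguous, coalesced ones.

    // Illustration of why buffer layout matters; POP is the number of items
    // a filter instance consumes per firing (value chosen arbitrarily here).
    #define POP 4

    // Uncoalesced layout: thread t's items are contiguous in memory, so on
    // each loop iteration a warp touches addresses POP floats apart,
    // generating many separate memory transactions.
    __global__ void filter_uncoalesced(const float *in, float *out, int nthreads) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= nthreads) return;
        float acc = 0.0f;
        for (int k = 0; k < POP; ++k)
            acc += in[t * POP + k];
        out[t] = acc;
    }

    // Coalesced layout: items are interleaved across threads, so on each
    // iteration consecutive threads read consecutive addresses and the
    // warp's loads combine into a few wide transactions.
    __global__ void filter_coalesced(const float *in, float *out, int nthreads) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= nthreads) return;
        float acc = 0.0f;
        for (int k = 0; k < POP; ++k)
            acc += in[k * nthreads + t];
        out[t] = acc;
    }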

GPUs
• Massively data-parallel computation engines, capable of delivering TFLOP-level throughput
• Organized as a set of Streaming Multiprocessors (SMs), each of which can be viewed as a very wide SIMD unit
• Typically programmed using CUDA or OpenCL, which are extensions of C/C++ (see the example after this list)
• Steep learning curve and portability issues
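For readers unfamiliar with the model, a minimal CUDA program looks like the following; the vector-add kernel and the 256-thread block size are arbitrary illustrative choices, not tied to the work described here.

    #include <cuda_runtime.h>

    // Each thread computes one element; blocks of threads are scheduled
    // onto the Streaming Multiprocessors (SMs).
    __global__ void vec_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;                         // threads per block
        int blocks = (n + threads - 1) / threads;  // blocks to cover all elements
        vec_add<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }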

The Process

The Results
• Speedups of up to 33.82X over single-threaded CPU execution
• The software-pipelined scheme outperforms a naive serial SAS schedule in most cases
• Coalesced accesses result in huge performance gains, as evident from the poor performance of SWPNC

Further details in: "Software Pipelined Execution of Stream Programs on GPUs" (CGO 2009)

The Problem
• Variety of general-purpose hardware accelerators
  – GPUs: nVidia, ATI, Intel
  – Accelerators: ClearSpeed, Cell BE, etc.
  – All said and done: a plethora of instruction sets
• HPC design using accelerators
  – Exploit instruction-level parallelism
  – Exploit data-level parallelism on SIMD units
  – Exploit thread-level parallelism on multiple units/multi-cores
• Challenges
  – Portability across different generations and platforms
  – Ability to exploit different types of parallelism

What a Solution Needs to Provide
• Rich abstractions for functionality
  – Not a lowest common denominator
• Independence from any single architecture
• Portability without compromises on efficiency
  – Don't forget the high-performance goals of the ISA
• Scalability, both ways: from micro engines to massively parallel devices
  – Single-core embedded processor to multi-core workstation
• Take advantage of accelerators (GPU, Cell, etc.)
• Transparent access to distributed memory
• Management and load balance across heterogeneous devices

• Operator
  – Add, Mult, …
• Vector
  – 1-D bulk data type of base types
  – E.g. <1, 2, 3, 4, 5>
• Distributor
  – Distributes an operator over a vector
  – Example: par add <1,2,3,4,5> <10,15,20,25,30> returns <11, 17, 23, 29, 35> (see the sketch after this list)
• Vector composition
  – Concat, slice, gather, scatter, …
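To show what the distributor abstraction buys in terms of retargetability, here is a hedged sketch of two possible lowerings of the par add example above, one for a scalar C target and one for a CUDA target. Neither is PLASMA's actual generated code, and the function names are made up for illustration.

    // Hypothetical lowerings of:  par add <1,2,3,4,5> <10,15,20,25,30>
    // Both produce <11, 17, 23, 29, 35>.

    // Scalar C target: a plain loop over the vector elements.
    void par_add_scalar(const int *a, const int *b, int *out, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = a[i] + b[i];
    }

    // CUDA target: one thread per vector element.
    __global__ void par_add_cuda(const int *a, const int *b, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a[i] + b[i];
    }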

Enter PLASMA…

The PLASMA Framework
• “CPLASM”, a prototype high-level assembly language
• A prototype PLASMA IR compiler
• Currently supported targets: C (Scalar), SSE3, CUDA (NVIDIA GPUs)
• Future targets: Cell, ATI, ARM Neon, …
• Compiler optimizations for the “Vector” IR

Initial Performance Results

Supercomputer Education & Research Centre, Indian Institute of Science
R. Govindarajan, Sreepathi Pai, Matthew J. Thazhuthaveetil, Abhishek Udupa
