State of programming models and code transformations on heterogeneous platforms
Boyana Norris, [email protected]
Computer Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
Senior Fellow, Computation Institute, University of Chicago


DESCRIPTION

A high-level talk on programming models for parallel heterogeneous architectures, given at the second workshop organized by the NSF-funded Conceptualization of Software Institute for Abstractions and Methodologies for HPC Simulation Codes on Future Architectures, http://flash.uchicago.edu/site/NSF-SI2/


Page 1: Heterogeneous programming

State of programming models and code transformations on heterogeneous platforms

Boyana Norris, [email protected]
– Computer Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
– Senior Fellow, Computation Institute, University of Chicago

Page 2: Heterogeneous programming

Before there were computers…

Jacquard Loom, invented in 1801

Programming was:
– Parallel
– Pattern-based
– Multithreaded

Page 3: Heterogeneous programming

(Possibly) the first heterogeneous computer(s)

Page 4: Heterogeneous programming

Outline, goals

Parallel programming for heterogeneous architectures
– Challenges
– Example approaches

Help set the stage for subsequent panel discussions on issues related to programming heterogeneous architectures
– Need your input, please do interrupt

Page 5: Heterogeneous programming

Heterogeneity

Hardware heterogeneity (different devices with different capabilities), e.g.:
– Multicore x86 CPUs with GPUs
– Multicore x86 CPUs with Intel Phi accelerators
– big.LITTLE (coupling slower, low-power ARM cores with faster, power-hungry ARM cores)
– A cluster with different types of nodes
– x86 CPUs with FPGAs (e.g., Convey)
– …

Software heterogeneity (e.g., OS, languages)
– Not part of this talk

Page 6: Heterogeneous programming

Similarities among heterogeneous platforms

Typically each processor has several, and sometimes many, execution units:
– NVIDIA Fermi GPUs have 16 streaming multiprocessors (SMs)
– AMD GPUs have 20 or more SIMD units
– Intel Phi has >50 x86 cores

Each execution unit typically has SIMD or vector execution:
– NVIDIA GPUs execute threads in SIMD-like groups of 32 (which NVIDIA calls warps)
– AMD GPUs execute in wavefronts that are 64 threads wide
– Intel Phi has 512-bit wide SIMD instructions (16 floats or 8 doubles; see the loop sketch below)
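To make these vector widths concrete, here is a minimal C sketch (illustrative only, not from the talk) of a loop written so a compiler can map it onto such SIMD units; the omp simd hint is standard OpenMP 4.0 and is simply ignored by compilers built without OpenMP support.

/* Illustrative: a loop shaped for the SIMD units described above.
 * With 512-bit vectors (Intel Phi), each vector instruction
 * processes 16 of these floats at once. */
#include <stddef.h>

void scale(size_t n, float a, float * restrict x)
{
    #pragma omp simd   /* OpenMP 4.0 vectorization hint */
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}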

Page 7: Heterogeneous programming

Many scales

Page 8: Heterogeneous programming

Parallel programming models

– Bulk synchronous parallelism (BSP)
– Stream processing
– Algorithmic skeletons (e.g., master-worker; see the sketch below)
– Workflow/dataflow
– Remote method invocation
– Distributed objects
– Components
– Functional
– …
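To ground the skeleton entry, here is a hypothetical master-worker skeleton over MPI (my sketch, not from the talk); the task count and the squaring "work" are made-up placeholders, and the code assumes at least as many tasks as workers.

/* Rank 0 deals out integer task ids; workers compute and send
 * results back until they receive a stop tag. */
#include <mpi.h>

enum { TAG_WORK = 1, TAG_STOP = 2 };

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int ntasks = 100;     /* illustrative task count */

    if (rank == 0) {            /* master: deal work, collect results */
        int sent = 0, done = 0;
        for (int w = 1; w < size && sent < ntasks; ++w) {
            MPI_Send(&sent, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            ++sent;
        }
        while (done < sent) {
            int result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            ++done;
            if (sent < ntasks) {    /* keep the finishing worker busy */
                MPI_Send(&sent, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++sent;
            } else {                /* no work left: retire the worker */
                MPI_Send(NULL, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
            }
        }
    } else {                    /* worker: loop until told to stop */
        for (;;) {
            int task;
            MPI_Status st;
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            int result = task * task;   /* stand-in for real work */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}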

Page 9: Heterogeneous programming

Parallel programming models (cont.)

Parallel process interaction
– Distributed data, exchanged through explicit messages (e.g., MPI)
– Shared/global memory (e.g., PGAS)

Work parallelism
– SPMD
– Dataflow
– Task-based
– Streaming
– …

Heterogeneous resources
– Host-directed execution with selected kernels offloaded to a co-processor, e.g., MPI + CUDA/OpenCL (sketched below)
– “Symmetric”, e.g., MPI on x86/Phi systems
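A minimal sketch of the host-directed pattern (mine, not from the talk), using OpenACC as the offload "X" for brevity; CUDA or OpenCL host code has the same overall shape: each MPI rank drives its own accelerator, and host-side MPI combines the per-device results.

/* Each rank offloads its local partial sum of squares, then MPI
 * reduces across ranks on the host. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n = 1 << 20;                    /* local problem size (made up) */
    double *x = malloc(n * sizeof *x);
    for (int i = 0; i < n; ++i)
        x[i] = rank + 1.0;

    double local = 0.0, global = 0.0;
    /* Offload this rank's reduction to its accelerator. */
    #pragma acc parallel loop reduction(+:local) copyin(x[0:n])
    for (int i = 0; i < n; ++i)
        local += x[i] * x[i];

    /* Host-side MPI combines the per-device partial results. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    free(x);
    MPI_Finalize();
    return 0;
}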

Page 10: Heterogeneous programming

Example: Host-directed MPI+X model

Image by Yili Zheng, LBL

Page 11: Heterogeneous programming

Challenges

Managing data
– Data distribution, movement, replication
– Load balancing

Different processing capabilities (FPUs, clock rates, vector units)

Different instruction sets

Page 12: Heterogeneous programming

Software developer’s point of view

Important considerations, tradeoffs
– Initial investment
  • Learning curve, reimplementation
– Ongoing costs
  • Maintainability, portability
– Performance
  • Real time, within power constraints, …
– Life expectancy
  • Architectures, software dependencies
– Suitability for particular goals
  • Embedded system vs. petaflop machine
– Agility
  • Ability to exploit new architectures
– …

Page 13: Heterogeneous programming

Programming model implementations

Established:
– Parallelism expressed through message-passing, thread-based shared memory, PGAS languages
– High-level languages or libraries with APIs that can map to different models, e.g., MPI
– General-purpose languages with compiler support for exploiting hybrid architectures
– Small language extensions or annotations embedded in GPLs with compiler or source transformation tool support, e.g., CUDA Fortran
– Streaming, e.g., CUDA

More recent: …

Extinct, e.g., HPF

Page 14: Heterogeneous programming

Tradeoffs

[Figure: development productivity (low to high) plotted against scalability (low to high), positioning sequential GPLs and high-level DSLs, libraries and frameworks, high-level parallel languages, and low-level languages or APIs with fully explicit parallelism control.]

Page 15: Heterogeneous programming

Source transformations

Typically multiple levels of abstraction and programming models are used simultaneously

Goal is to express algorithms at the highest level appropriate for the functionality being implemented

A single language or library is unlikely to be best for any given application on all possible hardware

One approach (a before/after sketch follows):
– Define algorithms using high-level abstractions
– Provide tools to translate these into lower-level, possibly architecture-specific implementations

Most programming on heterogeneous platforms involves source transformation
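As an illustration of such a translation (my example, not from the talk), consider a loop written at a high level and one variant a source-to-source tool might generate; the function names and the unroll factor are made up.

/* High-level form, as written by the developer: */
void axpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* One possible generated variant: 4-way unrolled for a target where
 * this exposes more instruction-level parallelism. */
void axpy_unrolled(int n, double a, const double *x, double *y)
{
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; ++i)   /* remainder iterations */
        y[i] += a * x[i];
}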

Page 16: Heterogeneous programming

Example: Annotation-based approaches

Pros: low effort, minimal changes
Cons: limited expressivity, performance
Examples:
– MPI + OpenACC directives in a GPL (see the sketch below)
– Some embedded DSLs (e.g., as supported by Orio)
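A minimal sketch of the directive style (mine, not from the talk): one OpenACC pragma offloads a plain C loop, and the same source still compiles and runs serially when the directive is ignored.

/* Ask the compiler to parallelize the loop on the device and to
 * manage data movement for x and y. */
void daxpy(int n, double a, const double *x, double *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}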

Page 17: Heterogeneous programming

Current limitations

Minimally intrusive approaches typically don’t result in the best performance possible, e.g., OpenACC annotations without code restructuring

A number of solutions are single-platform and vendor-provided (e.g., Intel, NVIDIA); portability and performance on other platforms are not guaranteed

Page 18: Heterogeneous programming

General-purpose programming languages

GPLs for parallel, possibly heterogeneous architectures
– UPC, CAF, Chapel, X10 (a small UPC sketch follows)

Pros:
– Robustness (e.g., type safety, memory consistency)
– Tools (e.g., debugging, performance analysis)

Cons:
– Manual reimplementation required in most cases
– Hard to balance user control with resource management automation
– Interoperability
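For flavor, a minimal UPC sketch (mine, not from the talk; assumes compilation with a fixed thread count) of the PGAS style these languages share: the array lives in one global address space partitioned across threads, and upc_forall assigns iterations by affinity.

#include <upc.h>

#define N 1024

shared double a[N];   /* default layout: elements cycled over threads */

int main(void)
{
    /* upc_forall's fourth clause assigns each iteration to the
     * thread with affinity to a[i], so all updates are local. */
    upc_forall (int i = 0; i < N; ++i; &a[i])
        a[i] = MYTHREAD;          /* MYTHREAD = executing thread's id */

    upc_barrier;                  /* global synchronization */
    return 0;
}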

Page 19: Heterogeneous programming

Recall host-directed MPI+X model

Image by Yili Zheng, LBL

Page 20: Heterogeneous programming

PGAS model

Image by Yili Zheng, LBL

Page 21: Heterogeneous programming

High-level frameworks and libraries

Domain-specific problem-solving environments and mathematical libraries can encapsulate the specifics of mapping to heterogeneous architectures (e.g., PETSc, Trilinos, Cactus)

Advantages
– Efficient implementations of common functionality
– Different levels of APIs to hide or expose different levels of the implementation and runtime (unlike pure language approaches)
– Relatively rapid support of new hardware

Disadvantages
– Learning curves, deep software dependencies
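As one concrete illustration (my sketch, not from the talk; error checking omitted for brevity), a minimal PETSc program where the backend is chosen at run time: in GPU-enabled PETSc builds, an option such as -vec_type cuda (option name per recent PETSc releases; treat it as an assumption) moves the same unmodified code onto the accelerator.

#include <petscvec.h>

int main(int argc, char **argv)
{
    Vec       x;
    PetscReal norm;

    PetscInitialize(&argc, &argv, NULL, NULL);

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, 1000000);
    VecSetFromOptions(x);   /* backend selected by command-line option */
    VecSet(x, 1.0);
    VecNorm(x, NORM_2, &norm);
    PetscPrintf(PETSC_COMM_WORLD, "norm = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}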

Page 22: Heterogeneous programming

Ongoing efforts attempting to balance scalability with productivity

DOE X-Stack program pursues fundamental advances in programming models, languages, compilers, runtime systems, and tools to support the transition of applications to exascale platforms
– DEGAS (Dynamic, Exascale Global Address Space): a PGAS approach
– SLEEC (Semantics-rich Libraries for Effective Exascale Computation): annotations and cost models to compile into optimized low-level implementations
– X-Tune: model-based code generation and optimization of algorithms written in GPLs
– D-TEC: compilers for both new general-purpose languages and embedding DSLs into other languages

Page 23: Heterogeneous programming

Summary

Many traditional programming models can be used on heterogeneous architectures, with vendor support for compilers, libraries and runtimes

No clear winner among multi-platform programming models/languages/frameworks

Many new efforts focus on deepening the software stack to enable a better balance of programmability, performance, and portability