52
Karlsruhe Institute of Technology KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu Overview of Programming Environments, Ready-to-Use Libraries and APIs Jan-Philipp Weiss Torun, Poland PPAM 2011 GPGPU Tutorial September 11, 2011 Engineering Mathematics and Computing Lab (EMCL) / SRG New Frontiers in High Performance Computing

Overview of Programming Environments, Ready-to-Use ...gpgpu.org/wp/wp-content/uploads/2011/09/02-libs.pdf · Overview of Programming Environments, ... Weiss Torun, Poland • PPAM

  • Upload
    haquynh

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Karlsruhe Institute of Technology

KIT – University of the State of Baden-Wuerttemberg and

National Research Center of the Helmholtz Association www.kit.edu

Overview of Programming Environments,Ready-to-Use Libraries and APIsJan-Philipp WeissTorun, Poland • PPAM 2011 • GPGPU Tutorial • September 11, 2011

Engineering Mathematics and Computing Lab (EMCL) / SRG New Frontiers in High Performance Computing

Karlsruhe Institute of TechnologyOutline

1 Introduction and Motivation

2 Languages and Programming Environments

3 Middleware and API wrappers

4 Application-level Integration

5 Application-specific Libraries / Math Libraries

2/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

GPGPU Computing

GPUs as a viable alternative in scientific computingOutstanding GFlop/s performanceCompetetive on-chip bandwidthModerate prizes and widely availableCommunity of its own has grownLeading Top500 systems rely on GPUs (power/performance)

But there is something to do about thenumerical schemes and algorithms

Fine-grained parallelism – thousands of threadsData locality (device memory, small and fast shared memory,specific caches, ...)Heterogeneous configurations (GPU + CPU + ...)

code and program development

3/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

MotivationQuestions before starting your GPU implementation

My code/application is already there:

How can I make use of GPGPUs?Dive into CUDA or OpenCL and reimplement (at least someparts)So let’s have this tutorial today

But: I don’t want to learn a new language(no time, no knowledge, no fun):

Can I utilize the power of GPUs anyway?Yes, there are some approaches out there

This talk shows you some possibilities to utilize the power ofGPGPUs without CUDA or OpenCL(I hope I didn’t miss anything important – no guarantee forcompleteness)4/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Languages and Programming Environments

5/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Overview

CUDAExplicit language integration – high-levelThe main stream approach for GPUs (currently)

OpenCLSame flavour as CUDA, some more management to be doneVendor-independent industry standardPortable approach (in some sense)

DirectComputeGPU computing from Microsoft

PGI Accelerator CompilersImplicit parallel language support

HMPPA vendor-driven portable approach

GPUSs, StarPU, QUARK, OpenMPC

6/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUDA I

CUDA – Compute Unified Device ArchitectureParallel programming model and GPU architectureTwo models of usage

High-level, explicit language integrationLow-level device API

Already a mainstream approachMature software stack100k+ developers worldwide, 250M+ CUDA enabled GPUsRich portfolio of resources (code samples, forums, ...)

Unfortunately: Vendor-specific approachCross-platform: Linux, Windows, MacOS

7/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUDA II

CUDA’s featuresMassively multi-threaded (HW + Prog)Stream processing model: SIMT in NVIDIA’s speech

Single Instruction Multiple Threads

C-code / Fortran code + few keywords (simple extension)User launches batch of threadsThreads identified by block index + grid indexExplicit kernels for execution on GPUs

See http://nvidia.com/cudazone

8/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

OpenCL

The "Standard"C-based language for GPU kernels + device kernelsplus low-level device API

Same flavor as CUDAJIT compilation of kernel programsPortable – but inevitable optimization required for every platform

Managed by Khronos group (non-profit organization)All major vendors participateThis is the cross-vendor industry-standardBut: more effort from vendors and community necessary

See http://www.khronos.org/opencl/

9/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

DirectCompute

GPGPU under WindowsMicrosoft API / Windows standard for all GPU vendorsGeneral-purpose GPU computing under WindowsReleased with DirectX11 / Windows 7Supports all CUDA-enabled devices (DX10+DX11) and ATIGPUsLow-level API for device management and launching of kernelsDefines HLSL-based language for compute shaders

See http://archive.msdn.microsoft.com/DirectComputeLecture

10/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Compiler-based approach

PGI Accelerator CompilersCompiles original C99/Fortran code

Same makefiles, scripts, IDEs, ...

Only for CUDA GPUsAdd OpenMP-style compiler directives to crucial sections

Compiler splits data and computationsGenerates host x64 asm file + auto-generated GPU codeLinks to a unified program codeExecutes on both platforms (host and device)

At least worth a tryMinimal effort, but additional license costsSee http://www.pgroup.com/resources/accel.htm

11/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

HMPP

OpenHMPPDirective-based programming model

Add directives/pragmas to the codeOne further level of abstraction

Handles accelerators in a heterogeneous environmentSyntax for offloading computations to acceleratorsIntroduces concept of codelets

Aim: Reduce complexity of GPU computingTesla and Firestream, Linux + Windows

Driven by: CAPS, INRIA, PathscaleSee http://www.caps-entreprise.com/hmpp.html,http://www.openhmpp.org

12/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

GPUSuperscalar

GPUSsDerivative of StarSs programming modelExploits task level parallelismStandard sequential look and feelIncremental parallelizationSeparation between algorithms and resources

Developed by Barcelona Supercomputing Centre (BSC)See http://www.bsc.es

13/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

StarPU

StarPURuntime environment for heterogeneous platformsTask scheduling, resource and memory managementVirtual shared memory subsystemTask abstractions by means of codeletsMachine abstractionSchedules codelets on the entire systemsSMP + NVIDIA / CUDA + ATI / OpenCL + CellLinux / MAC / Windows

See http://runtime.bordeaux.inria.fr/StarPU/

14/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

QUARK

QUARKQUeuing And Runtime for KernelsBuilds basis of MAGMA project

Seehttp://icl.cs.utk.edu/projectsfiles/plasma/pubs/56-quark_users_guide.pdf

15/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

OpenMPC

OpenMPCExtended OpenMP programming for GPUs

Seehttps://engineering.purdue.edu/paramnt/OpenMPC/

16/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

In memoriam

RapidMindStream processing languageBought by Intel in 2009Portable code: x86, GPUs, CellEmbedded with C++, same toolchain, compilers, ...Now has involved into Intel ArBB (merged with Intel Ct)

No more third-party devices supported ?!

Other victims: PeakStream (Google)

17/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Middleware and API wrappers

18/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Overview

Middleware and API wrappersPyCUDA.NETJavaAuto-Parallelization

F2cc-AccPar4all

Simulators and DisassemblersBarraGPUocelotdecuda

19/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

.NET and Java

.NETGPU.NET, CUDA.NET, CAL.NET, OpenCL.NETGASS GPU-based supercomputing solutions.NET bindings for CUDA/CAL/OpenCL applicationsIdea: Write kernels in C#, F#, VB.NETDon’t rewrite existing code, give it a touch-upExposes minimal API from .NET approachesJIT compilation = dynamic language support

jCUDAJava bindings for CUDA from same company

See http://www.gass-ltd.co.il/en/products

20/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

PyCUDA, PyOpenCL

PyCUDA, PyOpenCLAll of CUDA / OpenCL in a modern scripting language: PythonOpen-source with MIT-licenseEasy code generation / automated tuningCUDA C-code = stringsBatteries included: GPU arrays, RNG

Developed and maintained by A. Klöckner, Brown University / NYUSee http://mathema.tician.de/software/pycuda

21/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Other Wrappers for CUDA

Third-party providedPerlLuaRubyIDL

22/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Auto-Parallelization

F2C-AccFortran to CUDA source-to-source translatorAids with manual code transformations

Seehttp://www.esrl.noaa.gov/gsd/ab/ac/F2C-ACC.html

Par4AllAutomatic parallelizing and optimizing compilerSequential codes in C and Fortran

See http://www.par4all.org

23/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Simulators

GPUocelotModular dynamic compilation for heterogeneous systemsSimulates CUDA programs on AMD GPUs, x86 CPUsAnalysis modules for PTX virtual instruction sets

See http://code.google.com/p/gpuocelot/

BarraModular functional simulator for GPUsSimulates CUDA programs at the assembly levelCan be used for low-level debugging, profiling and optimization

See http://gpgpu.univ-perp.fr/index.php/Barra

24/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Disassembler

decudaDisassembler for CUDA programs

25/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Application-level Integration

26/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Overview

Application-level integration, Plugins, ExtensionsMatlabMathematicaLabViewPetSCTrilinosOpenFoam, Ansys, ...

27/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Matlab

MatlabMatlab Parallel Computing Toolbox (PCT)

Parallel Computing on WorkstationsAcceleration for NVIDIA GPUsParallel for-loops, special array types, parallelized numericalalgorithms, data manipulationIntegration of CUDA kernels into Matlab applicationsUse of multiple GPUs

Matlab Distributed Computing ServerAllows a PCT application to be distributed to a clusterAcceleration for NVIDIA GPUs available

28/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Jacket / AccelerEyes

Jacket / AccelerEyesCompile Matlab code for CUDA-enabled GPUsHigh-level interfaceMatrix manipulationsInterfacing with CUDA / OpenCL kernelsMulti-GPU support

See http://www.accelereyes.com/products/jacket

29/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Mathematica

MathematicaHigh-level GPU interfaceInteraction with CUDA and OpenCLRange of GPU-enhanced functionsLinear algebra, image processing, financial simulation, fouriertransformsSymbolic creation of CUDA and OpenCL programsCUDALink and OpenCLLinkAccess to interface-builder, data basesIntegration of CUDA kernels

30/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

LabVIEW

LabVIEWVisual programming language from National InstrumentsAutomation of usage of processing and measurementequipmentCUDA interface for NVIDIA GPUs

31/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

PETSc

PETSc-DevPETSc-dev has some support for running portions of thecomputation on NVIDIA GPUsVector class VECCUSPMatrix class MATCUSP, performs matrix-vector products on theGPU, but no matrix assembly on GPUAll Krylow methods, except KSPIBCGS, run vector operationson the GPUSimple preconditioners on the GPU: PCBJACOBI, PCASM andPCJACOBI

32/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Trilinos

TrilinosGPU abstraction by means of Tpetra moduleProvides generic algorithm techniques for parallel linearalgebra librariesOnly intra-node communicationAllows for mixed MPI/threading and MPI/GPU

33/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

OpenFoam

OpenFoam FEM packageSeveral GPU extensions for OpenFoam

Culises by FluiDyna with iterative solvers and simplepreconditionersofgpu with PCG, PBiCG, free (GPL license),see http://openfoamspeedit.sourceforge.net

34/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Application-specific LibrariesMath Libraries

35/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Overview

Application-specific Libraries / Math LibrariesCUSparse - SpMV, CUBLAS, CUFFT,APPML, ACML-GPUCusp, CUDPP, ThrustMAGMAOpenCurentViennaCL, LAMA, lmpLAtoolbox / HiFlow3

FEAST, Op2 + UFLPARDISO...

36/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUsparse

NVIDIA library for sparse matrix vector operationsC interface, CUDA ToolkitLevel-1

axpy, dot, gthr, sctr, rotLevel-2 and Level-3

csrmv, csrmm, csrtrsv, csrtrsm

Format conversions (COO, CSR, ELL, DIAG, HYB)Sparse matrix-vector operations

37/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUBLASNVIDIA library for dense linear algebra

CUDA ToolkitLevel-1 (vector,vector), level-2 (matrix,vector) and level-3(matrix,matrix)Supports float, double, complex, double complexContains 152 routinesBuilding block of CUDA port of LAPACK

CULA from EM PhotonicsMAGMA from ICL/UTK

Matlab accelerationPCT from MathworksJacket from AccelerEyes

Ansys, LS-DYNA, ...

38/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUFFT

NVIDIA library for Fast Fourier TransformsParallel FFT: Cooley-Tukey, BluesteinInterface similar to FFTWStreamed asychronous execution1D, 2D and 3D transforms of complex and real dataDouble precision transforms, in-place and out-of-placetransformsBatch execution for doing multiple tranforms

39/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

APPML

APPMLAccelerated Parallel Processing Math LibrariesFFT and BLAS functionality(Random number generation)Successor of ACML-GPU

40/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUSP

NVIDIA library for sparse linear algebraFlexible, high-level interface for manipulating sparse matricesand solving sparse linear systemsIterative solvers, preconditioners, graph algorithms

See http://code.google.com/p/cusp-library/

41/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUDPP

Library for fundamental parallel primitivesWritten in C for CUDAOpen source BSD license / 7 committersBest-in-class performanceCUDPP 2.0 / August 2011http://code.google.com/p/cudpp

CUDPP featuresParallel prefix sum (scan), segmented scan, parallel reductionsGraphs, treesRadix sort, parallelSparse matrix-vector multiplicationRandom number generation, radix sort

42/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Thrust

A Template Library for CUDA applicationsInterface similar to C++ STL (containers and iterators)Containers on host and deviceAvoiding memory management errorsTemplates, iterators, functors, ...Iterators define ranges, algorithms act on rangesAlgorithms: sorting, reduction, scan, ...

Apache 2.0 license / 2 main contributors. Seehttp://code.google.com/p/thrust

43/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

CUDA math.h, CURAND and NPP

CUDA math.h from NVIDIABasic math functionalityC99 compatible math library plus extras

CURANDRandom number generationPseudorandom and quasirandom generation

NPPC library for performance primitivesImage and signal processing

44/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

MAGMALAPACK for HPC on heterogeneous architectures

Dense linear algebra routines on GPUs (and others)LU, QR, and Cholesky factorizations in both real and complexarithmetic (single and double);Hessenberg, bidiagonal, and tridiagonal reductions in both realand complex arithmetic (single and double);Linear solvers based on LU, QR, and Cholesky in both real andcomplex arithmetic (single and double);Eigen and singular value problem solvers in both real andcomplex arithmetic (single and double);Generalized Hermitian-definite eigenproblem solvers;Mixed-precision iterative refinement solvers based on LU, QR,and Cholesky in both real and complex arithmetic;MAGMA BLAS in real arithmetic (single and double), includinggemm, gemv, symv, and trsm.

See http://icl.cs.utk.edu/magma/45/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

OpenCurrent

OpenCurrentSolving PDEs on structured gridsTeaching tool and research vehicleFramework for solvers and grids (storage classes)Own classes for different problems, e.g. Navier Stokes,projection solversSolvers: e.g. red-black Gauss-Seidel, multigridMulti-GPU support

See http://code.google.com/p/opencurrent/

46/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

ViennaCL

ViennaCLScientific computing library written in C++ and based onOpenCLSimple, high-level access to the vast computing resourcesFocused on linear algebra operations (BLAS levels 1, 2 and 3),iterative methods for LSEs and preconditioners

See http://viennacl.sourceforge.net/

47/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

LAMA

Library for Accelerated Math ApplicationsSeamless and efficient usage of heterogeneous hardwarearchitecturesComprehensive interface to BLAS operations on dense andvarious sparse matrix storage formatsSimplifying the development of mathematical applicationsExtensible towards further storage formats as well as upcominghardware architectures.

See http://www.libama.org/

48/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

HiFlow3 / lmpLAtoolbox

lmpLAtoolbox

Parallel finite element package HiFlow3

Local multi-platform Linear Algebra toolbox (lmpLAtoolbox) is aC++ toolbox for vectors and sparse matricesTwo-level library: MPI + XCapable of fine-grained and hybrid parallelismInterfaces for x86 multi-core CPUs (OpenMP, IMKL, ATLAS),NVIDIA GPUs (CUDA, CUBLAS) and AMD/ATI GPUs(OpenCL)Unified interfaces with single source code for all plattformsIterative methods and sophisticated preconditioners (parallelILU(p) and FSAI)

49/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

FEAST

FEAST - Finite Element Analysis and Solution ToolsNext-generation Finite Element software relying onhardware-oriented numericsDesigned for large-scale runs on supercomputersHybrid non-overlapping domain decomposition parallelmultilevel/multigrid method called ScaRCGlobally unstructured and locally structured gridsExploits GPUs as co-processorsApplications for solid mechanics and fluid dynamics

See http://www.feast.tu-dortmund.de/

50/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

Op2

Op2 and UFLOpen-source framework for unstructured grid applicationsAiming at clusters of GPUs or multi-core CPUsLook of a conventional libraryUses source-source translation to generate the appropriateback-end codeComputational abstraction by Op2Mathematical abstraction by UFL

See http://people.maths.ox.ac.uk/gilesm/op2/

51/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT

Karlsruhe Institute of Technology

Introduction Languages Middleware Applic.-level Int. Math Libraries

PARDISO

PARDISOFast direct solver for sparse problemsUsing BLAS functionality of GPUs

See http://www.pardiso-project.org/

52/52 PPAM 2011 J.-P. Weiss - Overview of Programming Environments and Ready-to-Use Libraries EMCL / KIT