Upload
others
View
14
Download
0
Embed Size (px)
Citation preview
Intel® oneAPI Math Kernel Library (oneMKL) Introduction
March 2021
Gennady Fedorov, Technical Consulting Engineer, Intel Architecture, Graphics & Software (IAGS)
Intel OneMKL Introduction2
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation. Learn more at intel.com or from the OEM or retailer.
Your costs and results may vary.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. https://software.intel.com/en-us/articles/optimization-notice
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
NOTE this slide is added as part of the template after Title Slide options. Recommend keeping it there.
Intel OneMKL Introduction3
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction4
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction5
Intel® oneAPI Base & HPC Toolkit
Intel® oneAPI Tools for HPC: Deliver
Fast Applications that Scale
A toolkit that adds to the Intel® oneAPI Base Toolkit for building high-performance, scalable parallel code on C++, Fortran, OpenMP & MPI from enterprise to cloud, and HPC to AI applications.
▪ OEMs/ISVs
▪ C++, Fortran, OpenMP, MPI Developers
▪ Accelerate performance on Intel® Xeon® & Core™ Processors and Accelerators
▪ Deliver fast, scalable, reliable parallel code with less effort; built on industry standards
Back to Domain-specific Toolkits for Specialized Workloads
Intel OneMKL Introduction6
What’s Inside Intel® MKL
Linear Algebra
BLAS
LAPACK
ScaLAPACK
Sparse BLAS
Graph
PARDISO/Direct Sparse Solver
Cluster Sparse Solver
FFTs
Multi-
dimensional
FFTW interfaces
Cluster FFT
Vector RNGs
Engines
Distributions
Summary Statistics
Kurtosis
Variation coefficient
Min/max
Order Statistics
Variance-covariance
Vector Math
Trigonometric
Hyperbolic
Exponential
Log
Power
Root
And More
Splines
Interpolation
Trust Region
Fast Poisson Solver
Intel OneMKL Introduction7
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction8
What’s New for Intel® oneAPI MKL ( v.2021 )
▪ oneMKL introduced the new programming models to enable programing Intel GPUs: Data Parallel C++ (DPC++) APIs and C/Fortran OpenMP* Offload interfaces to program Intel GPUs.
• Support for Intel® Processor Graphics Gen9/Gen12 (GPU) : https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-system-requirements.html
• Support for offload to Intel® Processor Graphics Gen9/Gen12 using OpenMP for selected routines
▪ Intel MKL continues to provide support for the same C and Fortran APIs for CPUs
▪ oneAPI MKL Specification: https://spec.oneapi.com/versions/latest/elements/oneMKL/source/domains/domains.html
Intel OneMKL Introduction9
What’s New for Intel® oneAPI MKL, Domain areas
1: Subset of the full functionality available. Refer to the DPC++ developer reference for full list of DPC++ functionality supported.
2: Subset of the full functionality available. For the list of functionality, refer to the developer reference (C and Fortran)
3: Supported on Linux* only.
4: DFTI interfaces are supported; FFTW interfaces are not supported.
5. Subset of the full functionality available. Refer to the DPC++ developer reference for full list of DPC++ functionality supported or to the developer reference for C and Fortran. Functions which are not implemented for GPU can still be used and will be executed transparently on the host CPU.
Domain CPU APIs Intel GPU APIs
DPC++ C Fortran DPC++ C OpenMP*
Offload
Fortran OpenMP*
Offload
BLAS and BLAS-
like ExtensionsYes Yes Yes Yes Yes Yes
LAPACK and
LAPACK-like
ExtensionsYes1 Yes Yes Yes1 Yes2 Yes2
ScaLAPACK No Yes Yes No No No
Vector Math Yes Yes Yes Yes5 Yes5 Yes3,5
Vector Statistics
(Random Number
Generators)Yes1 Yes Yes Yes1 Yes2 Yes2,3
Vector Statistics
(Summary
Statistics)Yes1 Yes Yes Yes1 No No
Data Fitting No Yes Yes No No No
FFT/DFT Yes Yes Yes Yes Yes4 Yes4
Sparse BLAS Yes1 Yes Yes Yes1 Yes2 No
Sparse Solvers No Yes Yes No No No
Intel OneMKL Introduction10
What’s Inside Intel® oneAPI Math Kernel Library(oneMKL)
Linear Algebra
BLAS
LAPACK
ScaLAPACK
Sparse BLAS
Graph
PARDISO/Direct Sparse Solver
Cluster Sparse Solver
FFTs
Multi-
dimensional
FFTW interfaces
Cluster FFT
Vector RNGs
Engines
Distributions
Summary Statistics
Kurtosis
Variation coefficient
Min/max
Order Statistics
Variance-covariance
Vector Math
Trigonometric
Hyperbolic
Exponential
Log
Power
Root
And More
Splines
Interpolation
Trust Region
Fast Poisson Solver
Intel® Processor Graphics Gen9/Gen12 support
Limited -Intel® Processor Graphics Gen9/Gen12 support (see release notes) CPU C/Fortran support
Intel OneMKL Introduction11
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction12
Intel® Compilers
Intel Compiler TargetOpenMPSupport
OpenMP OffloadSupport
Included in oneAPIToolkit
Intel® C++ Compiler, IL0 icc
CPU Yes No HPC
Intel® oneAPI DPC++/C++ Compiler dpcpp
CPU, GPU, FPGA*
Yes Yes Base
Intel® oneAPI DPC++/C++ Compiler icx
CPUGPU*
Yes Yes Base
Intel® Fortran Compiler, IL0 ifort
CPU Yes No HPC
Intel® Fortran Compiler ifx
CPU, GPU* Yes Yes HPC
Cross Compiler Binary Compatible and Linkable!
tinyurl.com/oneAPI-Base-download tinyurl.com/oneAPI-HPC-download
tinyurl.com/openmp-and-dpcpp
Intel OneMKL Introduction13
▪ Drivers
• icx (C/C++), ifx (Fortran)
▪ OPTIONS
-fiopenmp
• Selects Intel Optimized OMP
• -fopenmp for Clang* O.S. OMP
• -qopenmp NO!! rejected, only in ICC/IFORT
-fopenmp-targets=spir64
• Needed for OMP Offload
• Generates SPIRV code fat binary for offload kernels
OpenMP with Intel® Compilers
tinyurl.com/intel-openmp-offloadGet Started with OpenMP* Offload Feature to GPU:
Intel OneMKL Introduction14
Offload: OpenMP* 4.0 for Devices - Constructs
▪ Target construct transfer control and data from the host to the device
▪ Syntax (C/C++) #pragma omp target [clause[[,] clause],…] structured-block
▪ Syntax (Fortran) !$omp target [clause[[,] clause],…] structured-block !$omp end target
▪ Clauses device(scalar-integer-expression) map([{alloc | to | from | tofrom}:] list) if(scalar-expr)
Intel OneMKL Introduction15
Offload: Key OpenMP Directives (C)
#pragma omp target data
Map host-side variables to device variables inside this block.
#pragma omp target enter data#pragma omp target exit data
Map/unmap host-side variables to device variables: the two halves of #pragma omp target data.
#pragma omp target variant dispatch
Offload oneMKL calls inside this block to the GPU.
#pragma omp target
Offload execution of block to the GPU.
Intel OneMKL Introduction16
int main() {long m = 10, n = 6, k = 8, lda = 12, ldb = 8, ldc = 10;long sizea = lda * k, sizeb = ldb * n, sizec = ldc * n;double alpha = 1.0, beta = 0.0;
// Allocate matricesdouble *A = (double *) mkl_malloc(sizeof(double) * sizea, 64);double *B = (double *) mkl_malloc(sizeof(double) * sizeb, 64);double *C = (double *) mkl_malloc(sizeof(double) * sizec, 64);
// Initialize matrices […]…
// Compute C = A * B on CPUcblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
alpha, A, lda, B, ldb, beta, C, ldc);
…}
𝐶 ← 𝛼𝐴𝐵 + 𝛽𝐶
GEMM with oneMKL C API
Intel OneMKL Introduction17
int main() {long m = 10, n = 6, k = 8, lda = 12, ldb = 8, ldc = 10;long sizea = lda * k, sizeb = ldb * n, sizec = ldc * n;double alpha = 1.0, beta = 0.0;
// Allocate matricesdouble *A = (double *) mkl_malloc(sizeof(double) * sizea, 64);double *B = (double *) mkl_malloc(sizeof(double) * sizeb, 64);double *C = (double *) mkl_malloc(sizeof(double) * sizec, 64);
// Initialize matrices […]…
#pragma omp target data map(to:A[0:sizea],B[0:sizeb]) map(tofrom:C[0:sizec]){
#pragma omp target variant dispatch use_device_ptr(A, B, C) nowait{
// Compute C = A * B on GPUcblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
alpha, A, lda, B, ldb, beta, C, ldc);}
}…
}
Use target data map to send matrices to the device
Use target variant dispatch to request GPU
execution for cblas_dgemm
List mapped device pointers in the use_device_ptr
clause
Optional nowait clause for asynchronous executionUse #pragma omp taskwait for synchronization
𝐶 ← 𝛼𝐴𝐵 + 𝛽𝐶
GEMM with oneMKL C OpenMP Offload
Intel OneMKL Introduction18
… // module files
program main
integer :: m = 10, n = 6, k = 8, lda = 12, ldb = 8, ldc = 10integer :: sizea = lda * k, sizeb = ldb * n, sizec = ldc * ndouble :: alpha = 1.0, beta = 0.0double, allocatable :: A(:), B(:), C(:)
// Allocate matrices…allocate(A(sizea))…
// Initialize matrices……
! Compute C = A * Bcall dgemm(‘N’, ‘N’, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)
…end program
GEMM with oneMKL Fortran OpenMP Offload
Intel OneMKL Introduction19
… // module filesinclude “mkl_omp_offload.f90”
program mainuse mkl_blas_omp_offload
integer :: m = 10, n = 6, k = 8, lda = 12, ldb = 8, ldc = 10integer :: sizea = lda * k, sizeb = ldb * n, sizec = ldc * ndouble :: alpha = 1.0, beta = 0.0double, allocatable :: A(:), B(:), C(:)
// Allocate matrices…allocate(A(sizea))…
// Initialize matrices……!$omp target data map(to:A(1:sizea), B(1:sizeb)) map(tofrom:C(1:sizec))!$omp target variant dispatch use_device_ptr(A, B, C) nowait
! Compute C = A * B on GPUcall dgemm(‘N’, ‘N’, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc)
!$omp end target variant dispatch!$omp end target data…end program
GEMM with oneMKL Fortran OpenMP Offload
Use target data map to send matrices to the device
Use target variant dispatch to request GPU
execution for dgemm
List mapped device pointers in the use_device_ptr
clause
Optional nowait clause for asynchronous executionUse !$omp taskwait for synchronization
Module for Fortran OpenMP offload
Intel OneMKL Introduction20
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction21
Demo, BLAS
▪ Review sgemm_offload.cpp, makefile
▪ Build and Run:
• ./a.out 4
• ./a.out 400
• ./a.out 4000
▪ export MKL_VERBOSE=1
• ./a.out 4
Intel OneMKL Introduction22
Demo, BLAS, cont.
▪ Enabling the OpenMP profiling
▪ #LIBOMPTARGET_PROFILE=T - Enables OpenMP profiling, also prints the chosen device
▪ #LIBOMPTARGET_PLUGIN=OpenCL - Target OpenCL backend instead of Level0
▪ #LIBOMPTARGET_DEBUG=1 - Forces OpenMP to dump debug info
• export LIBOMPTARGET_DEBUG=1
• ./a.out 100 > 1.log 2>&1
• Less 1.log | grep 400
▪ https://openmp.llvm.org/docs/design/Runtimes.html
Intel OneMKL Introduction23
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction24
How to Link with oneMKL – Link Line Adviser Tool
24
▪ https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
Intel OneMKL Introduction25
25
Building & Linking with oneMKL – OpenMP Offload
▪ Headers
#include “omp.h”
#include “mkl_omp_offload.h”
▪ Build flags (new flags highlighted)
icx -c -DMKL_ILP64 -m64 -fiopenmp -fopenmp-targets=spir64 –mllvm -vpo-paropt-use-raw-dev-ptr
-I${MKLROOT}/include source.c -o source.o
▪ Link flags (example: dynamic link; sequential threading)
icx source.o -fiopenmp -fopenmp-targets=spir64 -mllvm -vpo-paropt-use-raw-dev-ptr
-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lOpenCL -lpthread -ldl
-lm -o source.out
Intel OneMKL Introduction26
26
How to Link with oneMKL in Linux*
▪ DPC++ interfaces with static linking on Linux
dpcpp -fsycl-device-code-split=per_kernel -DMKL_ILP64 <typical user includes and linking flags and other libs> ${MKLROOT}/lib/intel64/libmkl_sycl.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_<sequential|tbb_thread>.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lsycl -lOpenCL -lpthread -ldl –lm
▪ DPC++ interfaces with dynamic linking on Linux
dpcpp -DMKL_ILP64 <typical user includes and linking flags and other libs> -L${MKLROOT}/lib/intel64 -lmkl_sycl -lmkl_intel_ilp64 -lmkl_<sequential|tbb_thread> -lmkl_core -lsycl -lOpenCL -lpthread -ldl -lm
Intel OneMKL Introduction27
Agenda
▪ oneMKL Content
▪ oneMKL, What’s New
▪ oneMKL OpenMP Offload Interfaces
▪ DEMO – BLAS, OpenMP Offloading
▪ Linking with oneMKL
▪ oneMKL Known Issues/Limitations
▪ Q&A
Intel OneMKL Introduction28
Intel® oneAPI MKL, Known Issues/Limitations
▪ Dynamic linking on Windows* is supported only for the BLAS and LAPACK domains.*
▪ DFT on Intel GPU and Intel® oneAPI Level Zero backend may result in incorrect results for large batch sizes. To run DFT on Intel GPU, set the environment variable SYCL_BE=PI_OPENCL.
▪ Static linking on Windows* can take significant time (up to 10 minutes).
▪ OpenMP* offload is only supported for static libraries on Windows*.
▪ MKL 2021 backward compatibility has been broken due to the oneTBBchangings **
* https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-release-notes.html
** (https://software.intel.com/content/www/us/en/develop/articles/tbb-revamp.html)
Intel OneMKL Introduction29
Intel® oneAPI Math Kernel Library Resources
Intel® Math Kernel Library (Intel® MKL) site
Intel® oneAPI Math Kernel Library
Intel® DevCloud for oneAPI Projects
software.intel.com/intel-mkl
software.intel.com/oneAPI/mkl
https://software.intel.com/en-us/devcloud/oneapi
Online Service Center (Paid support) https://supporttickets.intel.com/servicecenter?lang=en-US
Intel® MKL Forumhttps://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bd-p/oneapi-math-kernel-library
Intel® MKL Benchmarkshttps://software.intel.com/content/www/us/en/develop/tools/math-kernel-library/benchmarks.html
Intel® MKL Link Line Advisor https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link-line-advisor.html