21
Intel® Performance Libraries: the latest Updates, upcoming features HPC Code Modernization Workshop for Intel® Xeon® processors & Xeon Phi™ coprocessors Julia Sukharina

Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

  • Upload
    vothuy

  • View
    236

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Intel® Performance Libraries: the latest Updates, upcoming featuresHPC Code Modernization Workshop for Intel® Xeon® processors & Xeon Phi™ coprocessors

Julia Sukharina

Page 2: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Intel® Performance LibrariesIntel® Parallel Studio XE

Deliver top C++ and Fortran application performance with less effort• Faster code: Boost applications performance that

scales on today’s and next-gen processors• Create code faster: Utilize a toolset that simplifies

creating fast, reliable parallel code

Intel® Math Kernel Library (Intel® MKL)Fastest and most used math library for Intel and compatible processors*

Intel® Integrated Performance Primitives (Intel® IPP)Extensive software library for media and data processing

Intel® Data Analytics Acceleration Library (Intel® DAAL)*Library of optimized building blocks for Data Analytics

Intel® System Studio (ISS)

Deep system-level insight into power, performance, and reliability• Accelerate time to market of Intel® architecture-

based systems and embedded applications• Cross-development tools for Intel® architecture and

multiple target operating systems

Intel® Math Kernel Library (Intel® MKL)

Intel® Integrated Performance Primitives (Intel® IPP)

Intel® Integrated Native Developer Experience (Intel® INDE)

Cross platform meets native performance• Cross-OS, Cross-Architecture, Cross-IDE: C++/Java*

tools and libraries for Windows* and OS X* on Intel® architecture and Android* on ARM* and Intel® architecture

• More Performance, Less Time: Develop native applications faster through code reuse & streamlined access of platform capabilities

Intel® Integrated Performance Primitives (Intel® IPP)

* Intel® DAAL is also available as standalone

NEW!

2

Page 3: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Intel® Math Kernel LibraryIntel® MKL

3

“I’m a C++ and Fortran developer and have high praise for

the Intel® Math Kernel Library. One nice feature I’d like to

stress is the bitwise reproducibility of MKL which helps me

get the assurance I need that I’m getting the same floating

point results from run to run."Franz Bernice

CEO and Senior Developer

MSTC Modern Software Technology

“Intel MKL is indispensable for any high-

performance computer user on x86 platforms.”

Prof. Jack Dongarra

Innovative Computing Lab

University of Tennessee, Knoxville

“The best new feature of Intel® MKL is the Cluster Sparse

Solver. The solution of the sparse linear systems of 106 or

even 107 equations has never been faster.”

Jaroslav Šindler, Researcher

Czech Technical University

Faculty of Mechanical Engineering

Page 4: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Powered by theIntel® Math Kernel Library (Intel® MKL)

4

‒ Speeds math processing in scientific, engineering and financial applications

‒ Functionality for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, summary statistics and more

‒ Provides scientific programmers and domain scientists

‒ Interfaces to de-facto standard APIs from C++, Fortran, C#, Python and more

‒ Support for Linux*, Windows* and OS X* operating systems

‒ Extract great performance with minimal effort

‒ Unleash the performance of Intel® Core, Intel® Xeon and Intel® Xeon Phi™ product families

‒ Optimized for single core vectorization and cache utilization

‒ Coupled with automatic OpenMP*-based parallelism for multi-core, manycore and coprocessors

‒ TBB compasibility

‒ Scales to PetaFlop (1015 floating-point operations/second) clusters and beyond

‒ Included in Intel® Parallel Studio XE and Intel® System Studio Suites

Used on the World’s Fastest Supercomputers**

**http://www.top500.org

Energy FinancialAnalytics

Engineering Design Digital Content Creation

Science & Research

Signal Processing

Page 5: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Mathematical problems arise in many scientific disciplines

These applications areas typically involve mathematics

‒ Differential equations‒ Linear algebra ‒ Fourier transforms‒ Statistics

5

Intel® MKL is a Computational Math Library

Energy FinancialAnalytics

Science &Research

Engineering Design

SignalProcessing

Digital Content Creation

Intel® MKL helps solve your computational challenges

−𝝏𝒖𝟐

𝝏𝒙𝟐−

𝝏𝒖𝟐

𝝏𝒚𝟐+ 𝒒 𝒖 = 𝒇 𝒙, 𝒚

Page 6: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Linear Algebra

• BLAS

• LAPACK

• ScaLAPACK

• Sparse BLAS

• Intel® MKL PARDISO

• Cluster Sparse Solver

Fast Fourier Transforms

• Multidimensional

• FFTW interfaces

• Cluster FFT

Vector Math

• Trigonometric

• Hyperbolic

• Exponential

• Log

• Power

• Root

Vector RNGs

• Congruential

• Wichmann-Hill

• Mersenne Twister

• Sobol

• Neiderreiter

• Non-deterministic

Summary Statistics

• Kurtosis

• Variation coefficient

• Order statistics

• Min/max

• Variance-covariance

And More

• Splines

• Interpolation

• Trust Region

• Fast Poisson Solver

6

Optimized Mathematical Building Blocks

Page 7: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

7

Automatic Performance Scaling from the Core, to Multicore, to Many Core and Beyond

Extracting performance from the computing resources

‒ Core: vectorization, prefetching, cache utilization

‒ Multi-Many core (processor/socket) level parallelization

‒ Multi-socket (node) level parallelization

‒ Clusters scaling

Page 8: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Optimizations – current and future hardware support ‒ Always highly tuned for the latest Intel® Xeon® processor‒ Tuned for Intel® Xeon PhiTM coprocessor x100 (KNC)‒ Early optimizations for Intel® AVX-512

Features‒ Conditional Numerical Reproducibility‒ Extended Eigensolvers based on and compatible with FEAST1

‒ Parallel Direct Sparse Solver for Clusters‒ Small Matrix Multiply enhancements

8

Notable Enhancements in Intel® MKL 11.0-11.2

Notes:1

http://www.ecs.umass.edu/~polizzi/feast/

Page 9: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Optimized for the latest Intel® Xeon® processors and for Intel® Xeon PhiTM x200 coprocessor(KNL)

Batch GEMM functions

⁻ Improve the performance of multiple, simultaneous matrix multiply operations

⁻ Provides grouping (the same sizes and leading dimensions) and batching across groups

GEMMT functions calculate C = A * S * AT, where S is symmetric and/or diagonal

Sparse BLAS inspector-executor API

⁻ Matrix structure analysis brings performance benefit for relevant applications (i.e. iterative solvers)

⁻ Parallel triangular solver

⁻ Both 0-based and 1-based indexing, row-major and column-major ordering

⁻ Extended BSR support

Counter-based pseudorandom number generators

⁻ ARS-5 based on the Intel AES-NI instruction set

⁻ Philox4x32-10

Intel® MKL PARDISO scalability

⁻ Improved Intel® MKL PARDISO and Cluster Sparse Solver scalability on Intel Xeon Phi coprocessors

Cluster components extension

⁻ MPI wrappers provide compatibility with most MPI implementations including custom ones

⁻ Cluster components support on OS X*9

New Enhancements in Intel® MKL 11.3

Page 10: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Intel® Integrated Performance PrimitivesIntel® IPP

10

"I have extensively used Intel IPP functions in my code to

accelerate development of multi format encoding and

decoding, and realized a significant performance boost for

our video transcoder which resulted in improved user

experiences for media playback."

Jagadish Kamath

Co-founder and Software Architect

River Silica Technologies

Page 11: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

11

Intel® IPP is a Performance Primitives Library

Intel® IPP includes algorithms for various scientific disciplines

Ima

ge

Pro

ce

ssin

g/C

olo

r C

on

ve

rsio

n •Healthcare

•Special effects for photo/video processing

•Object compression/ decompression

•Image scaling, image combination

•Noise reduction

•Optical correction

Co

mp

ute

r V

isio

n •Digital Surveillance

•Industrial/Machine Control

•Image Recognition

•Bio-metric identification

•Remote operation of equipment and gesture interpretation

•Automated sorting of materials or objects

Da

ta C

om

pre

ssio

n •Internet portal data center

•Data storage centers

•Databases

•Enterprise data management

Sig

na

l P

roce

ssin

g •Telecom

•Energy

•Recording, enhancement and playback of audio and non-audio signals

•Echo cancellation: filtering, equalization and emphasis

•Simulation of environment or acoustics

•Games involving sophisticated audio content or effects

Cry

pto

gra

ph

y •Internet portal data center

•Information Security

•Telecom

•Enterprise data management

•Transaction security

•Smart card interfaces

•ID verification

•Copy protection

•Electronic signature

Page 12: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

12

Optimized Performance Building Blocks Signal Processing

•Essential Functions:

•Logical and shift; Arithmetic functions; Conversion; Viterbi Decoder; Windowing; Statistical functions; Sampling

•Filtering Functions:

•Finite impulse response (FIR) filter; Adaptive FIR using least mean squares (LMS) filter; Infinite impulse response (IIR) filter; Median filter

•Transform Functions:

•Fourier transform; Hartley transform; Walsh-Hadamard; Discrete cosine transform; Hilbert transform; Wavelet transform

•String Functions

•Fixed-Accuracy Arithmetic Functions:

•Arithmetic functions; Power and root functions; Exponential and logarithmic functions; Trigonometric functions; Hyperbolic functions; Special functions; Rounding functions

•Data Compression Functions:

•VLC and Huffman coding; Dictionary-based compression; BWT-based compression

•Long Term Evolution Wireless Support Functions

Image Processing

•Image Color Conversion:

•Color model conversions; Color-gray scale conversions; Format Conversion; Color twist; Color keying; Gamma correction; Intensity transformation

•Threshold and Compare Operations

•Morphological Operations

•Filtering Functions:

•Filters with borders; Median filters; General linear filters; Separable filters; Wiener filters; Convolution; Deconvolution; Fixed filters

•Image Linear Transforms

•Image Statistics Functions

•Image Geometry Transforms

•Miscellaneous Image Transforms

•Wavelet Transforms

•Computer Vision:

•Feature detection; Distance transform; Image gradients; Flood fill; Motion analysis and objects ; tracking; Pyramids functions; Universal pyramids functions; Image inpainting; Image segmentation; Pattern recognition; Camera calibration and 3D reconstruction; Image enhancement

Cryptography

•Symmetric Cryptography Primitive Functions:

•Block ciphers: AES, TDES, and SM4

•ARCFour stream cipher

•One-Way Hash Primitives:

•Hash functions for streaming and non-streaming messages

•Hash-based mask generation functions

•Message Authentication Functions:

• Keyed Hash Functions

•Public Key Cryptography Functions:

•RSA Algorithm Functions

•Elliptic Curve Cryptography Functions

Page 13: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Optimized for the latest Intel® Xeon® processors and for Intel® Xeon PhiTM x200 coprocessor(KNL)

New APIs to support external threading⁻ Intel® IPP library primitives are “thread-safe”

⁻ Intel® IPP internal threaded library are available as optional installation.

⁻ The external threading is recommended, which is more effective than the internal threading

New APIs to support external memory allocation ⁻ All memory allocations are done at the application level

⁻ Reduce the memory allocation for different IPP function calls by using a shared memory buffer

Improved CPU dispatcher⁻ Auto-initialization. No need for the CPU initialization call in static libraries

⁻ Code dispatching based on CPU features

Optimized cryptography functions to support SM2/SM3/SM4 algorithm

Custom dynamic library building tool

13

New Enhancements in Intel® IPP 9.0

Page 14: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Intel® Data Analytics Acceleration LibraryIntel® DAAL

14

Page 15: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

New library for data analytics

⁻ Customers: analytics solution providers, system integrators, and application developers (FSI, Telco, Retail, Grid, etc.)

⁻ Key benefits: improved time-to-value, forward-scaling performance and parallelism on IA, advanced analytics building blocks

Key features

⁻ Building blocks highly optimized for IA to support all data analysis stages.

⁻ Support batch, streaming, and distributed processing with easy connectors to popular platforms (Hadoop, Spark)

⁻ Flexible interfaces for handling different data sources (CSV, MySQL, HDFS, RDD (Spark)).

⁻ Rich set of operations to handle sparse and noisy data

⁻ C++ and Java APIs

15

Intel® DAAL is Data Analytics Library

Page 16: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

16

Optimized Analytics Building Blocks

Pre-processing Transformation Analysis Modeling Decision Making

Decompression,Filtering, Normalization

Aggregation,Dimension Reduction

Summary StatisticsClustering, etc.

TrainingParameter Estimation

Simulation

ForecastingDecision Trees, etc.

Sci

en

tifi

c/E

ng

ine

eri

ng

We

b/S

oci

al

Bu

sin

ess

Compute (Server, Desktop, … ) Client EdgeData Source Edge

Validation

Hypothesis testingModel errors

Same data analysis process and analytics building blocks despite variety of data formats, usages and domains

Analytics Targets:

⁻ Perform analysis close to data source (sensor/client/server) to optimize response latency, decrease network bandwidth utilization, and maximize security.

⁻ Offload data to server/cluster for complex and large-scale analytics only.

Page 17: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

17

Big Data Analytics Usage ModelsS

cien

tific/En

gin

ee

ring

We

b/S

ocia

l

Bu

sine

ss

17

1. Pre-Processing

2. Analysis

3. Modeling4. Model

Validation

5. Visualization

1. Pre-processing

2. Model Calibration

3. Decision Making

4. Model Validation

5. Reporting

Interactive Modeling Visualization & Reporting

Production Use

C++, Java

Software

DAAL

Software

DAAL

Visual IDE

Collaboration Services

Data Store

Infrastructure (Comms, Security)

Management (Monitoring, Rule

Management)

Client Edge

Data Source Edge

C++, Java

Data Center

Workstation

SQL, noSQL, Files,Sensor Data

C++, Java

Intel Confidential

C++, Java

Page 18: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

C++ and Java programming languages API

Data mining and analysis algorithms for

⁻ Computing correlation distance and Cosine distance

⁻ PCA (Correlation, SVD)

⁻ Matrix decomposition (SVD, QR, Cholesky)

⁻ Computing statistical moments

⁻ Computing variance-covariance matrices

⁻ Univariate and multivariate outlier detection

⁻ Association rule mining

Algorithms for supervised and unsupervised machine learning

⁻ Linear regressions

⁻ Naïve Bayes classifier

⁻ AdaBoost, LogitBoost, and BrownBoostclassifiers

⁻ SVM

⁻ K-Means clustering

⁻ Expectation Maximization (EM) for Gaussian Mixture Models (GMM)

Support for serialization/deserialization

Support for local and distributed data sources

⁻ Batch

⁻ Distributed

⁻ Streaming

Support for local and distributed data sources

⁻ In-file and in-memory CSV

⁻ MySQL

⁻ HDFS

⁻ RDD

Data compression and decompression

⁻ ZLIB

⁻ LZO

⁻ RLE

⁻ BZIP2

18

Intel® DAAL 2016 Beta FeaturesOptimized for the latest Intel® Xeon® processors and for Intel® Xeon PhiTM x200 coprocessor(KNL)

Page 19: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Intel® MKL and MKL Forum pages

• http://software.intel.com/en-us/articles/intel-mkl/

• http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/

• http://software.intel.com/en-us/forums/intel-math-kernel-library/

Intel® IPP and IPP Forum pages:

• https://software.intel.com/en-us/intel-ipp

• https://software.intel.com/en-us/forums/intel-integrated-performance-primitives/

Intel® DAAL Forum page:

• https://software.intel.com/en-us/forums/intel-data-analytics-acceleration-library

References

19

Page 20: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development

Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Optimization Notice

Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

20

Page 21: Intel® Performance Libraries - sscc.ruR)_performance... · ‒Early optimizations for Intel® AVX-512 ... ⁻ ARS-5 based on the Intel AES-NI instruction set ... accelerate development