Introduction to Parallel Computing
Intel Math Kernel Library
Huan-Ting Yen, Department of Mathematics, National Taiwan University
2011/07/22


Parallel Computing

What is parallel computing?

Traditionally, software has been written for serial computation.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.

Resource

The compute resource can be:
  • A single computer with multiple processors;
  • An arbitrary number of computers connected by a network;
  • A combination of both.

[Figure: four cores (Core 1 to Core 4), each running one thread (thread 1 to thread 4).]

[Figure: four cores (core 1 to core 4), each running several threads.]

Why use parallel computing?

The primary reasons for using parallel computing:
  • Save time (wall clock time)
  • Solve larger problems
  • Provide concurrency (do many things at the same time)

Other reasons might include:
  • Taking advantage of non-local resources
  • Cost savings
  • Overcoming memory constraints

Amdahl’s Law

The speedup of a parallel program is limited by the amount of serial work.

Flynn’s Taxonomy

Classification for parallel computers and programs

                 Single Instruction             Multiple Instruction
Single Data      SISD (single-core CPU)         MISD (very rare)
Multiple Data    SIMD (GPU/vector processor)    MIMD (multi-core CPU)

[Figures: SISD and SIMD; MISD and MIMD.]


Intel Math Kernel Library

Overview

The Intel® Math Kernel Library (Intel® MKL) provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices including sparse matrices. The library also includes fast Fourier transform (FFT) functions, as well as vector mathematical and vector statistical functions with Fortran and C interfaces.

The versions of Intel MKL intended for Windows* and Linux* operating systems also include ScaLAPACK software and Cluster FFT software for solving respective computational problems on distributed-memory parallel computers.

Intel MKL: Intel Math Kernel Library

Functionality:
  • BLAS and Sparse BLAS Routines
  • LAPACK Routines: Linear Equations
  • LAPACK Routines: Eigenvalue Problems
  • ScaLAPACK
  • Sparse Solver Routines
  • Fast Fourier Transforms
  • Cluster Fast Fourier Transforms

System Requirements (Hardware)

Hardware:
  • Intel® Core™ processor family
  • Intel® Xeon® processor family
  • Intel® Pentium® 4 processor family
  • Intel® Pentium® III processor
  • Intel® Pentium® processor (300 MHz or faster)
  • Intel® Celeron® processor
  • AMD Athlon* and Opteron* processors

How do you find information about the CPUs?

$ cat /proc/cpuinfo

System Requirements (Software)

The following operating systems are supported:
  • Red Hat* Enterprise Linux* 3, 4, 5
  • Red Hat* Fedora* 9
  • Debian* GNU/Linux 4.0
  • Ubuntu* 8.04

How do you find information about the operating system?

$ cat /etc/*release

The following C/C++ and Fortran compilers are supported:
  • Intel® Fortran Compiler 10.1 for Linux*
  • Intel® C++ Compiler 10.1 for Linux*
  • GNU Compiler Collection (gcc, g77, gfortran 4.2.0)

Installing Intel MKL on a Linux* System

Tools & Downloads: http://software.intel.com/en-us/ (search for "intel software")

user@host:~/software$ wget "URL"
user@host:~/software$ ll
$ tar -zxvf l_mkl_p_10.2.x.yyy.tar.gz
$ cd l_mkl_p_10.2.x.yyy
$ ./install.sh


Some Examples

Example

Brief examples of:
  • BLAS Level 1 Routines (vector-vector operations)
  • BLAS Level 2 Routines (matrix-vector operations)
  • BLAS Level 3 Routines (matrix-matrix operations)
  • Compute the LU factorization of a matrix (LAPACK)
  • Solve linear system (LAPACK)
  • Solve eigen system (LAPACK)
  • Fast Fourier Transforms

Ex1. The complex dot product

#include <stdio.h>
#include "mkl_blas.h"

#define N 5

typedef struct {
    double re;
    double im;
} mkl_complex;

int main()
{
    int n, incx = 1, incy = 1, i;
    mkl_complex x[N], y[N], res;
    void zdotc();

    n = N;
    for( i = 0; i < n; i++ ){
        x[i].re = (double)i;
        x[i].im = (double)i * 2.0;
        y[i].re = (double)(n - i);
        y[i].im = (double)i * 2.0;
    }
    /* res = sum over i of conj(x[i]) * y[i] */
    zdotc( &res, &n, x, &incx, y, &incy );
    printf( "The complex dot product is: ( %6.2f, %6.2f )\n", res.re, res.im );
    return 0;
}

?dotc

Computes a dot product of a conjugated vector with another vector.

Description: The routine is declared in
  Fortran77: mkl_blas.fi
  Fortran95: blas.f90
  C: mkl_blas.h

Input Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
  n: The number of elements in the vectors x and y.
  incx: Specifies the increment for the elements of x.
  incy: Specifies the increment for the elements of y.

Output Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
  res: The final result.

Makefile (Sequential)

Test = blas_c
CC = icc
MKL_HOME = /home/opt/intel/mkl/10.2.2.025
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH = $(MKL_HOME)/lib/em64t
EXE = blas_c.exe

blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) \
	-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Makefile (Parallel)

Test = blas_c
CC = icc
MKL_HOME = /home/opt/intel/mkl/10.2.2.025
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH = $(MKL_HOME)/lib/em64t
EXE = blas_c.exe

blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) \
	-Wl,--start-group -lmkl_intel_lp64 -lmkl_core \
	-lmkl_intel_thread -Wl,--end-group -liomp5 -lpthread


BLAS Routines

Routine Naming Conventions

BLAS routine names have the following structure: <character> <name> <mode> ()

The <character> field indicates the data type:
  s   real, single precision
  c   complex, single precision
  d   real, double precision
  z   complex, double precision

The <mode> field, when present, gives additional detail about the operation. In BLAS Level 1 it can be:
  c   conjugated vector
  u   unconjugated vector
  g   Givens rotation

In BLAS Level 2 and 3, the <name> field indicates the matrix type:
  ge  general matrix
  gb  general band matrix
  sy  symmetric matrix
  sb  symmetric band matrix
  he  Hermitian matrix
  hb  Hermitian band matrix
  tr  triangular matrix
  tb  triangular band matrix

BLAS Level 1 Routines

Routine   Data Types            Description
?asum     s, d, sc, dz          Sum of vector magnitudes
?axpy     s, d, c, z            Scalar-vector product
?copy     s, d, c, z            Copy vector
?dot      s, d                  Dot product
?dotc     c, z                  Dot product, conjugated
?nrm2     s, d, sc, dz          Vector 2-norm (Euclidean norm)
?rotg     s, d, c, z            Givens rotation of points
?rot      s, d, cs, zd          Plane rotation of points
?scal     s, d, c, z, cs, zd    Vector-scalar product
?swap     s, d, c, z            Vector-vector swap
i?amax    s, d, c, z            Index of the maximum absolute value element of a vector


Ex2. Matrix-vector product

#include <stdlib.h>
#include "mkl_blas.h"

int main()
{
    int m, n, incx, incy, lda, idxi, idxj;
    double alpha, beta, *x, *y, *A;
    char trans;

    m = 3; n = 3;
    incx = 1; incy = 1;
    lda = m;
    alpha = 1.0; beta = 1.0;
    trans = 'n';

    x = (double*)malloc(sizeof(double)*n);     /* x has n elements */
    y = (double*)malloc(sizeof(double)*m);     /* y has m elements */
    A = (double*)malloc(sizeof(double)*m*n);

    for( idxi = 0; idxi < n; idxi++ )
        *(x+idxi) = 1.0;
    for( idxi = 0; idxi < m; idxi++ )
        *(y+idxi) = 1.0;

    /* BLAS expects column-major storage: A(i,j) is at A[i + j*lda]. */
    for( idxi = 0; idxi < m; idxi++ )
        for( idxj = 0; idxj < n; idxj++ )
            *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;

    /* y := alpha*A*x + beta*y */
    dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy);

    free(x); free(y); free(A);
    return 0;
}

?gemv

Computes a matrix-vector product using a general matrix.

Description: The routine is declared in
  Fortran77: mkl_blas.fi
  Fortran95: blas.f90
  C: mkl_blas.h

Input Parameters ( dgemv(&trans,&m,&n,&alpha,A,&lda,x,&incx,&beta,y,&incy) )
  trans: if trans = 'N' or 'n', then y := alpha*A*x + beta*y;
         if trans = 'T' or 't', then y := alpha*A'*x + beta*y;
         if trans = 'C' or 'c', then y := alpha*conjg(A')*x + beta*y.
  m: The number of rows of the matrix A.
  n: The number of columns of the matrix A.
  lda: The first (leading) dimension of A; lda must be at least max(1,m).
  incx: Specifies the increment for the elements of x.
  incy: Specifies the increment for the elements of y.

Output Parameters
  y: Updated vector y.

BLAS Level 2 Routines

Routine   Data Types    Description
?gemv     s, d, c, z    Matrix-vector product using a general matrix
?gbmv     s, d, c, z    Matrix-vector product using a general band matrix
?symv     s, d          Matrix-vector product using a symmetric matrix
?sbmv     s, d          Matrix-vector product using a symmetric band matrix
?hemv     c, z          Matrix-vector product using a Hermitian matrix
?hbmv     c, z          Matrix-vector product using a Hermitian band matrix
?trmv     s, d, c, z    Matrix-vector product using a triangular matrix
?tbmv     s, d, c, z    Matrix-vector product using a triangular band matrix


Ex3. Matrix-matrix product

#include <stdlib.h>
#include "mkl_blas.h"

int main()
{
    int m, n, k, lda, ldb, ldc, idxi, idxj;
    double alpha, beta, *A, *B, *C;
    char transa, transb;

    m = 3; n = 3; k = 3;
    lda = m; ldb = k; ldc = m;
    alpha = 1.0; beta = 1.0;
    transa = 'n'; transb = 'n';

    A = (double*)malloc(sizeof(double)*m*k);   /* A is m-by-k */
    B = (double*)malloc(sizeof(double)*k*n);   /* B is k-by-n */
    C = (double*)malloc(sizeof(double)*m*n);   /* C is m-by-n */

    /* One loop fills all three matrices because m = n = k here.
       Column-major storage: element (i,j) is at index i + j*ld. */
    for( idxi = 0; idxi < m; idxi++ )
        for( idxj = 0; idxj < n; idxj++ )
        {
            *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;
            *(B+idxi+idxj*ldb) = (double)(idxi+1) + idxj;
            *(C+idxi+idxj*ldc) = (double)(idxi+1) + idxj;
        }

    /* C := alpha*A*B + beta*C */
    dgemm(&transa, &transb, &m, &n, &k,
          &alpha, A, &lda, B, &ldb, &beta, C, &ldc);

    free(A); free(B); free(C);
    return 0;
}

?gemm

Input Parameters
  k: The number of columns of the matrix op(A) and the number of rows of the matrix op(B).
  lda: When transa = 'N' or 'n', lda must be at least max(1,m); otherwise at least max(1,k).
  ldb: When transb = 'N' or 'n', ldb must be at least max(1,k); otherwise at least max(1,n).
  ldc: The first dimension of the matrix C; ldc must be at least max(1,m).

Output Parameters
  C: Overwritten by the m-by-n matrix alpha*op(A)*op(B) + beta*C.

BLAS Level 3 Routines

Routine   Data Types    Description
?gemm     s, d, c, z    Matrix-matrix product of general matrices
?hemm     c, z          Matrix-matrix product of Hermitian matrices
?symm     s, d, c, z    Matrix-matrix product of symmetric matrices
?trmm     s, d, c, z    Matrix-matrix product of triangular matrices


Ex4. LU Factorization

#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int m, n, lda, info, *ipiv;
    double *A;

    m = 3; n = 3; lda = m;
    ipiv = (int*)malloc(sizeof(int)*m);
    A = (double*)malloc(sizeof(double)*m*n);

    /* Column-major storage: each line below is one column of A. */
    *(A+0)=1;  *(A+1)=2; *(A+2)=6;
    *(A+3)=-2; *(A+4)=3; *(A+5)=5;
    *(A+6)=4;  *(A+7)=8; *(A+8)=1;

    /* On exit A holds L and U, ipiv the pivot indices. */
    dgetrf(&m, &n, A, &lda, ipiv, &info);

    free(ipiv); free(A);
    return 0;
}

?getrf

Description: The routine is declared in
  Fortran77: mkl_lapack.fi
  Fortran95: lapack.f90
  C: mkl_lapack.h

Input Parameters
  m: The number of rows of the matrix A.
  n: The number of columns of the matrix A.
  lda: The first dimension of the matrix A.
  A: Array, REAL for sgetrf, DOUBLE PRECISION for dgetrf, COMPLEX for cgetrf, DOUBLE COMPLEX for zgetrf.

Output Parameters
  A: Overwritten by L and U. The unit diagonal elements of L are not stored.
  ipiv: An integer array, dimension at least max(1,min(m,n)). The pivot indices; row i is interchanged with row ipiv(i).
  info: Integer.
        If info = 0, the execution is successful.
        If info = -i, the i-th parameter had an illegal value.
        If info = i, u(i,i) is 0. The factorization has been completed, but U is exactly singular.

LAPACK Computational Routines

                                                     general   symmetric    symmetric           triangular
                                                     matrix    indefinite   positive-definite   matrix
Factorize matrix                                     ?getrf    ?sytrf       ?potrf
Solve linear system with a factored matrix           ?getrs    ?sytrs       ?potrs              ?trtrs
Condition number                                     ?gecon    ?sycon       ?pocon              ?trcon
Compute the inverse matrix using the factorization   ?getri    ?sytri       ?potri              ?trtri

LAPACK Routines: Linear Equations

To solve a particular problem, you can call two or more computational routines or call a corresponding driver routine that combines several tasks in one call. For example, to solve a system of linear equations with a general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Alternatively, use the driver routine ?gesv, which performs all these tasks in one call.


Ex5. Solve the Linear Equation

#include <stdio.h>
#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int n, nrhs, lda, ldb, info, idxi, idxj, *ipiv;
    double *A, *b;

    n = 3; nrhs = 1; lda = n; ldb = n;
    ipiv = (int*)malloc(sizeof(int)*n);
    A = (double*)malloc(sizeof(double)*n*n);
    b = (double*)malloc(sizeof(double)*n);

    for( idxi = 0; idxi < n; idxi++ )
        for( idxj = 0; idxj < n; idxj++ )
            *(A+idxi*n+idxj) = (double)(idxi+1) + idxj;

    *(b+0) = 6;
    *(b+1) = 9;
    *(b+2) = 12;

    /* Solve A*x = b; on exit b holds the solution. */
    dgesv(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);

    free(ipiv); free(A); free(b);
    return 0;
}

?gesv

Input Parameters
  nrhs: The number of right-hand sides, that is, the number of columns of the matrix B.

Output Parameters
  A: Overwritten by the factors L and U from the factorization of A.
  b: Overwritten by the solution matrix X.


Ex6. Solve the Eigen Equation

#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int n, lda, lwork, ldvl, ldvr, info;
    double *wr, *wi, *A, *work, *vl, *vr;
    char jobvl, jobvr;

    n = 3; lda = n;
    ldvl = 1; ldvr = n;
    lwork = 4*n;   // not 3*n
    jobvl = 'N'; jobvr = 'V';

    A    = (double*)malloc(sizeof(double)*n*n);
    wr   = (double*)malloc(sizeof(double)*n);
    wi   = (double*)malloc(sizeof(double)*n);
    vl   = (double*)malloc(sizeof(double)*ldvl*n);
    vr   = (double*)malloc(sizeof(double)*ldvr*n);
    work = (double*)malloc(sizeof(double)*lwork);

    /* Column-major storage: each line below is one column of A. */
    *(A+0) = 2;  *(A+1) = -1; *(A+2) = 0;
    *(A+3) = -1; *(A+4) = 2;  *(A+5) = -1;
    *(A+6) = 0;  *(A+7) = -1; *(A+8) = 2;

    /* wr and wi are already pointers, so they are passed without '&'. */
    dgeev(&jobvl, &jobvr, &n, A, &lda, wr, wi,
          vl, &ldvl, vr, &ldvr, work, &lwork, &info);

    free(A); free(wr); free(wi); free(vl); free(vr); free(work);
    return 0;
}

?geev

Input Parameters
  jobvl: If jobvl = 'N', the left eigenvectors of A are not computed. If jobvl = 'V', the left eigenvectors of A are computed.
  jobvr: If jobvr = 'N', the right eigenvectors of A are not computed. If jobvr = 'V', the right eigenvectors of A are computed.
  work: A workspace array of dimension max(1, lwork).
  lwork: The dimension of the array work. For real flavors, lwork ≥ max(1,3n); if eigenvectors are requested (jobvl = 'V' or jobvr = 'V'), lwork ≥ max(1,4n).
  ldvl, ldvr: The leading dimensions of the output arrays vl and vr, respectively.

Output Parameters
  wr, wi: Contain the real and imaginary parts, respectively, of the computed eigenvalues.
  vl, vr: If jobvl = 'V', the left eigenvectors u(j) are stored one after another in the columns of vl, in the same order as their eigenvalues. If jobvl = 'N', vl is not referenced. If the j-th eigenvalue is real, then u(j) = vl(:,j), the j-th column of vl. The right eigenvectors are stored in vr in the same way, controlled by jobvr.
  info: If info = 0, the execution is successful.
        If info = -i, the i-th parameter had an illegal value.
        If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been computed.

LAPACK Computational Routines

  • Orthogonal Factorizations (QR, QZ)
  • Singular Value Decomposition
  • Symmetric Eigenvalue Problems
  • Generalized Symmetric-Definite Eigenvalue Problems
  • Nonsymmetric Eigenvalue Problems
  • Generalized Nonsymmetric Eigenvalue Problems
  • Generalized Singular Value Decomposition

Page 75: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

LAPACK Driver Routines


Linear Least Squares (LLS) Problems
Generalized LLS Problems
Symmetric Eigenproblems
Nonsymmetric Eigenproblems
Singular Value Decomposition
Generalized Symmetric Definite Eigenproblems
Generalized Nonsymmetric Eigenproblems

Page 76: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

Example


Brief examples of:
BLAS Level 1 Routines (vector-vector operations)
BLAS Level 2 Routines (matrix-vector operations)
BLAS Level 3 Routines (matrix-matrix operations)
Computing the LU factorization of a matrix (LAPACK)
Solving a linear system (LAPACK)
Solving an eigensystem (LAPACK)
Fast Fourier Transforms

Page 77: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

Five Stage Usage Model for Computing FFT


Allocate a fresh descriptor for the problem with a call to the DftiCreateDescriptor function. (precision, rank, sizes, scaling factor, …)

Optionally adjust the descriptor configuration with a call to the DftiSetValue function.

Commit the descriptor with a call to the DftiCommitDescriptor function.

Compute the transform with a call to the DftiComputeForward/DftiComputeBackward function.

Deallocate the descriptor with a call to the DftiFreeDescriptor function.

Page 78: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

Ex7-1. Three-Dimensional Complex FFT


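#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mkl_dfti.h"

/* Transform sizes; each m*n*k complex buffer needs about 16 GB at these values */
#define m 1000
#define n 1000
#define k 1000

typedef struct {
    double re;
    double im;
} mkl_complex;

int main()
{
    int idxi, idxj, idxk;
    double backward_scale;
    MKL_LONG status, length[3];
    mkl_complex *vec_src, *vec_tmp, *vec_dst;
    DFTI_DESCRIPTOR_HANDLE handle = 0;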

Page 79: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

Ex7-2. Three-Dimensional Complex FFT


    /* Allocate the source, intermediate, and destination buffers */
    vec_src = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
    vec_tmp = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
    vec_dst = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
    if(!vec_src || !vec_tmp || !vec_dst) { printf("malloc failed\n"); return 1; }

    length[0] = m; length[1] = n; length[2] = k;

    memset(vec_src, 0, sizeof(mkl_complex)*m*n*k);
    memset(vec_tmp, 0, sizeof(mkl_complex)*m*n*k);
    memset(vec_dst, 0, sizeof(mkl_complex)*m*n*k);

    /* Fill the source with 1 + 0i (row-major: the last index varies fastest) */
    for(idxi=0; idxi<m; idxi++)
        for(idxj=0; idxj<n; idxj++)
            for(idxk=0; idxk<k; idxk++)
            {
                (vec_src+(idxi*n+idxj)*k+idxk)->re=1.0;
                (vec_src+(idxi*n+idxj)*k+idxk)->im=0.0;
            }
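The flat index used when filling a three-dimensional buffer can be isolated in a helper. This is a minimal sketch that assumes the default row-major layout of the C interface, where length[0] is the slowest-varying dimension and length[2] the fastest. The name offset3 is hypothetical and is not part of MKL.

```c
#include <stddef.h>

/*
 * Flat offset of element (i, j, l) in an n1-by-n2-by-n3 array stored
 * row-major (last index fastest). n1 is not needed for the offset.
 * Hypothetical helper, not an MKL routine.
 */
size_t offset3(size_t i, size_t j, size_t l, size_t n2, size_t n3)
{
    return (i * n2 + j) * n3 + l;
}
```

With sizes m, n, k as in the example, element (i, j, l) would be reached as vec_src + offset3(i, j, l, n, k). Because the example fills every element with the same value, the traversal order does not affect its result, but it matters for any non-uniform input.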

Page 80: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

Ex7-3. Three-Dimensional Complex FFT


    status = DftiCreateDescriptor( &handle, DFTI_DOUBLE, DFTI_COMPLEX, 3, length );
    if(status && !DftiErrorClass(status, DFTI_NO_ERROR))
    {
        printf("Error : %s\n", DftiErrorMessage(status));
        printf("TEST FAILED : DftiCreateDescriptor(&handle, ...)\n");
    }

    /* Out-of-place transform: input and output use separate buffers */
    status = DftiSetValue( handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE );
    status = DftiCommitDescriptor( handle );
    status = DftiComputeForward( handle, vec_src, vec_tmp );

    /* Scale the backward transform by 1/(m*n*k) so that it inverts the forward one */
    backward_scale = 1.0/((double)m*n*k);
    status = DftiSetValue( handle, DFTI_BACKWARD_SCALE, backward_scale );
    status = DftiCommitDescriptor( handle );
    status = DftiComputeBackward( handle, vec_tmp, vec_dst);

    status = DftiFreeDescriptor( &handle );

    free(vec_src); free(vec_tmp); free(vec_dst);
    return 0;
}
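The backward scale of 1/(m*n*k) is needed because an unscaled forward transform followed by an unscaled backward transform returns the input multiplied by the number of points. The sketch below illustrates this on a one-dimensional transform of length 4, written in plain C without MKL. The length is fixed at 4 so that the twiddle factors are exactly 1, -i, -1, and i. The names cplx, LEN, and dft4 are hypothetical, and this is a naive DFT, not the algorithm MKL uses.

```c
#define LEN 4

/* Complex number as a pair of doubles (hypothetical helper type). */
typedef struct { double re; double im; } cplx;

/* exp(-2*pi*i*q/4) for q = 0..3, i.e. 1, -i, -1, i */
static const double TW_RE[LEN] = { 1.0,  0.0, -1.0, 0.0 };
static const double TW_IM[LEN] = { 0.0, -1.0,  0.0, 1.0 };

/*
 * Naive length-4 DFT. sign < 0 selects the forward transform,
 * sign > 0 the backward one (conjugated twiddles). Every output
 * element is multiplied by scale.
 */
void dft4(const cplx *in, cplx *out, int sign, double scale)
{
    int p, q;
    for (p = 0; p < LEN; p++) {
        double sre = 0.0, sim = 0.0;
        for (q = 0; q < LEN; q++) {
            int t = (p * q) % LEN;
            double wre = TW_RE[t];
            double wim = (sign < 0) ? TW_IM[t] : -TW_IM[t];
            /* Complex multiply-accumulate: in[q] * w */
            sre += in[q].re * wre - in[q].im * wim;
            sim += in[q].re * wim + in[q].im * wre;
        }
        out[p].re = scale * sre;
        out[p].im = scale * sim;
    }
}
```

Calling dft4(x, f, -1, 1.0) and then dft4(f, y, +1, 1.0 / LEN) reproduces x in y, while a backward scale of 1.0 would give 4·x. In three dimensions the same factor becomes the product of the three lengths.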

Page 81: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22

FFT Functions


Function Name: Operation

DftiCreateDescriptor: Allocates memory for the descriptor data structure and preliminarily initializes it.

DftiCommitDescriptor: Performs all initialization for the actual FFT computation.

DftiCopyDescriptor: Copies an existing descriptor.

DftiFreeDescriptor: Frees memory allocated for a descriptor.

DftiComputeForward: Computes the forward FFT.

DftiComputeBackward: Computes the backward FFT.

DftiSetValue: Sets one particular configuration parameter with the specified configuration value.

DftiGetValue: Gets the value of one particular configuration parameter.

Page 82: Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22


Reference

LLNL tutorials web site (https://computing.llnl.gov/tutorials/parallel_comp/)

Intel® Math Kernel Library Reference Manual (mklman.pdf)

Intel® Math Kernel Library for the Linux OS User's Guide (userguide.pdf)
