Introduction to Parallel Computing
Intel Math Kernel Library
Huan-Ting Yen, Department of Mathematics, National Taiwan University
2011/07/22
Parallel Computing
What is parallel computing?
Traditionally, software has been written for serial computation:
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
Resource
The compute resource can be:
- A single computer with multiple processors;
- An arbitrary number of computers connected by a network;
- A combination of both.
[Figure: four cores (Core 1 to Core 4), each running one thread (thread 1 to thread 4)]
[Figure: four cores (core 1 to core 4), each running several threads]
Why use parallel computing?
The primary reasons for using parallel computing:
- Save time (wall-clock time)
- Solve larger problems
- Provide concurrency (do many things at the same time)
Other reasons might include:
- Taking advantage of non-local resources
- Cost savings
- Overcoming memory constraints
Amdahl’s Law
The speedup of a parallel program is limited by the amount of serial work it contains.
Flynn’s Taxonomy
Classification for parallel computers and programs

                Single Instruction             Multiple Instruction
Single Data     SISD (single-core CPU)         MISD (very rare)
Multiple Data   SIMD (GPU/vector processor)    MIMD (multi-core CPU)
[Figures: diagrams illustrating the SISD, SIMD, MISD, and MIMD classes]
Intel Math Kernel Library
Overview
The Intel® Math Kernel Library (Intel® MKL) provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices including sparse matrices. The library also includes fast Fourier transform (FFT) functions, as well as vector mathematical and vector statistical functions with Fortran and C interfaces.
The versions of Intel MKL intended for Windows* and Linux* operating systems also include ScaLAPACK software and Cluster FFT software for solving respective computational problems on distributed-memory parallel computers.
Intel MKL: Intel Math Kernel Library
Functionality:
- BLAS and Sparse BLAS Routines
- LAPACK Routines: Linear Equations
- LAPACK Routines: Eigenvalue Problems
- ScaLAPACK
- Sparse Solver Routines
- Fast Fourier Transforms
- Cluster Fast Fourier Transforms
System Requirements (Hardware)
Hardware:
- Intel® Core™ processor family
- Intel® Xeon® processor family
- Intel® Pentium® 4 processor family
- Intel® Pentium® III processor
- Intel® Pentium® processor (300 MHz or faster)
- Intel® Celeron® processor
- AMD Athlon* and Opteron* processors
How can you find out which CPUs your machine has?
$ cat /proc/cpuinfo
System Requirements (Software)
The following operating systems are supported:
- Red Hat* Enterprise Linux* 3, 4, 5
- Red Hat* Fedora* 9
- Debian* GNU/Linux 4.0
- Ubuntu* 8.04
How can you find out which operating system you are running?
$ cat /etc/*release
The following C/C++ and Fortran compilers are supported:
- Intel® Fortran Compiler 10.1 for Linux*
- Intel® C++ Compiler 10.1 for Linux*
- GNU Compiler Collection (gcc, g77, gfortran 4.2.0)
Installing Intel MKL on a Linux* System
Tools & Downloads http://software.intel.com/en-us/ (google “intel software”)
user@host:~/software$ wget "URL"
user@host:~/software$ ll
$ tar -zxvf l_mkl_p_10.2.x.yyy.tar.gz
$ cd l_mkl_p_10.2.x.yyy
$ ./install.sh
Some Examples
Example
Brief examples of:
- BLAS Level 1 Routines (vector-vector operations)
- BLAS Level 2 Routines (matrix-vector operations)
- BLAS Level 3 Routines (matrix-matrix operations)
- Compute the LU factorization of a matrix (LAPACK)
- Solve a linear system (LAPACK)
- Solve an eigensystem (LAPACK)
- Fast Fourier Transforms
Ex1. The complex dot product (res = sum over i of conj(x[i]) * y[i])
#include <stdio.h>
#include "mkl_blas.h"
#define N 5

int main()
{
    int n, incx = 1, incy = 1, i;
    /* MKL_Complex16 (fields real and imag) is the double-complex type
       declared by the MKL headers; it matches the zdotc prototype. */
    MKL_Complex16 x[N], y[N], res;

    n = N;
    for( i = 0; i < n; i++ ){
        x[i].real = (double)i;
        x[i].imag = (double)i * 2.0;
        y[i].real = (double)(n - i);
        y[i].imag = (double)i * 2.0;
    }
    zdotc( &res, &n, x, &incx, y, &incy );
    printf( "The complex dot product is: ( %6.2f, %6.2f )\n", res.real, res.imag );
    return 0;
}
?dotc
Computes a dot product of a conjugated vector with another vector.
Description: The routine is declared in
- Fortran 77: mkl_blas.fi
- Fortran 95: blas.f90
- C: mkl_blas.h
Input Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
- n: The length of the two vectors.
- incx: Specifies the increment for the elements of x.
- incy: Specifies the increment for the elements of y.
Output Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
- res: The final result.
Makefile (Sequential)
Test : blas_c
CC = icc
MKL_HOME = /home/opt/intel/mkl/10.2.2.025
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH = $(MKL_HOME)/lib/em64t
EXE = blas_c.exe
blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) \
	-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
Makefile (Parallel)
Test = blas_c
CC = icc
MKL_HOME = /home/opt/intel/mkl/10.2.2.025
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH = $(MKL_HOME)/lib/em64t
EXE = blas_c.exe
blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) \
	-Wl,--start-group -lmkl_intel_lp64 -lmkl_core \
	-lmkl_intel_thread -Wl,--end-group -liomp5 -lpthread
BLAS Routines
Routine Naming Conventions
BLAS routine names have the following structure: <character> <name> <mode> ()
The <character> field indicates the data type:
- s: real, single precision
- c: complex, single precision
- d: real, double precision
- z: complex, double precision
The <mode> field, where present, gives additional details of the operation:
- c: conjugated vector
- u: unconjugated vector
- g: Givens rotation
In BLAS Level 2 and 3, the <name> field indicates the matrix type:
- ge: general matrix
- gb: general band matrix
- sy: symmetric matrix
- sb: symmetric band matrix
- he: Hermitian matrix
- hb: Hermitian band matrix
- tr: triangular matrix
- tb: triangular band matrix
BLAS Level 1 Routines
Routine   Data Types            Description
?asum     s, d, sc, dz          Sum of vector magnitudes
?axpy     s, d, c, z            Scalar-vector product plus vector (y := a*x + y)
?copy     s, d, c, z            Copy vector
?dot      s, d                  Dot product
?dotc     c, z                  Dot product, conjugated
?nrm2     s, d, sc, dz          Vector 2-norm (Euclidean norm)
?rotg     s, d, c, z            Generate Givens rotation of points
?rot      s, d, cs, zd          Plane rotation of points
?scal     s, d, c, z, cs, zd    Vector-scalar product
?swap     s, d, c, z            Vector-vector swap
i?amax    s, d, c, z            Index of the maximum absolute value element of a vector
Ex2. Matrix-vector product
#include <stdlib.h>
#include "mkl_blas.h"

int main()
{
    int m, n, incx, incy, lda, idxi, idxj;
    double alpha, beta, *x, *y, *A;
    char trans;

    m = 3; n = 3; incx = 1; incy = 1; lda = m;
    alpha = 1.0; beta = 1.0; trans = 'n';

    x = (double*)malloc(sizeof(double)*n);      /* x has n elements */
    y = (double*)malloc(sizeof(double)*m);      /* y has m elements */
    A = (double*)malloc(sizeof(double)*m*n);    /* m-by-n, stored column by column */

    for( idxi = 0; idxi < n; idxi++ )
        *(x+idxi) = 1.0;
    for( idxi = 0; idxi < m; idxi++ )
        *(y+idxi) = 1.0;
    for( idxi = 0; idxi < m; idxi++ )
        for( idxj = 0; idxj < n; idxj++)
            *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;   /* A(i,j), column-major */

    dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy);

    free(x); free(y); free(A);
    return 0;
}
?gemv
Computes a matrix-vector product using a general matrix.
Description: The routine is declared in
- Fortran 77: mkl_blas.fi
- Fortran 95: blas.f90
- C: mkl_blas.h
Input Parameters ( dgemv(&trans,&m,&n,&alpha,A,&lda,x,&incx,&beta,y,&incy) )
- trans: if trans = 'N' or 'n', then y := alpha*A*x + beta*y;
  if trans = 'T' or 't', then y := alpha*A'*x + beta*y;
  if trans = 'C' or 'c', then y := alpha*conjg(A')*x + beta*y.
- m: The number of rows of the matrix A.
- n: The number of columns of the matrix A.
- lda: The first (leading) dimension of A; lda must be at least max(1,m).
- incx: Specifies the increment for the elements of x.
- incy: Specifies the increment for the elements of y.
Output Parameters
- y: Updated vector y.
BLAS Level 2 Routines
Routine   Data Types    Description
?gemv     s, d, c, z    Matrix-vector product using a general matrix
?gbmv     s, d, c, z    Matrix-vector product using a general band matrix
?symv     s, d          Matrix-vector product using a symmetric matrix
?sbmv     s, d          Matrix-vector product using a symmetric band matrix
?hemv     c, z          Matrix-vector product using a Hermitian matrix
?hbmv     c, z          Matrix-vector product using a Hermitian band matrix
?trmv     s, d, c, z    Matrix-vector product using a triangular matrix
?tbmv     s, d, c, z    Matrix-vector product using a triangular band matrix
Ex3. Matrix-matrix product
#include <stdlib.h>
#include "mkl_blas.h"

int main()
{
    int m, n, k, lda, ldb, ldc, idxi, idxj;
    double alpha, beta, *A, *B, *C;
    char transa, transb;

    m = 3; n = 3; k = 3; lda = m; ldb = k; ldc = m;
    alpha = 1.0; beta = 1.0; transa = 'n'; transb = 'n';

    A = (double*)malloc(sizeof(double)*m*k);    /* A is m-by-k */
    B = (double*)malloc(sizeof(double)*k*n);    /* B is k-by-n */
    C = (double*)malloc(sizeof(double)*m*n);    /* C is m-by-n */

    /* All three matrices are 3-by-3 here, so one loop can fill them all. */
    for( idxi = 0; idxi < m; idxi++ )
        for( idxj = 0; idxj < n; idxj++)
        {
            *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;
            *(B+idxi+idxj*ldb) = (double)(idxi+1) + idxj;
            *(C+idxi+idxj*ldc) = (double)(idxi+1) + idxj;
        }

    dgemm(&transa, &transb, &m, &n, &k,
          &alpha, A, &lda, B, &ldb, &beta, C, &ldc);

    free(A); free(B); free(C);
    return 0;
}
?gemm
Computes C := alpha*op(A)*op(B) + beta*C, where op(X) is X, X', or conjg(X'), depending on transa and transb.
Input Parameters
- k: The number of columns of the matrix op(A) and the number of rows of the matrix op(B).
- lda: When transa = 'N' or 'n', lda must be at least max(1,m); otherwise at least max(1,k).
- ldb: When transb = 'N' or 'n', ldb must be at least max(1,k); otherwise at least max(1,n).
- ldc: The first (leading) dimension of C; ldc must be at least max(1,m).
Output Parameters
- C: Overwritten by the m-by-n matrix alpha*op(A)*op(B) + beta*C.
BLAS Level 3 Routines
Routine   Data Types    Description
?gemm     s, d, c, z    Matrix-matrix product of general matrices
?hemm     c, z          Matrix-matrix product of Hermitian matrices
?symm     s, d, c, z    Matrix-matrix product of symmetric matrices
?trmm     s, d, c, z    Matrix-matrix product of triangular matrices
Ex4. LU Factorization
#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int m, n, lda, info, *ipiv;
    double *A;

    m = 3; n = 3; lda = m;
    ipiv = (int*)malloc(sizeof(int)*m);
    A = (double*)malloc(sizeof(double)*m*n);

    /* Column-major storage: A = [1 -2 4; 2 3 8; 6 5 1] */
    *(A+0)=1;  *(A+1)=2; *(A+2)=6;
    *(A+3)=-2; *(A+4)=3; *(A+5)=5;
    *(A+6)=4;  *(A+7)=8; *(A+8)=1;

    dgetrf(&m, &n, A, &lda, ipiv, &info);

    free(ipiv); free(A);
    return 0;
}
?getrf
Description: The routine is declared in
- Fortran 77: mkl_lapack.fi
- Fortran 95: lapack.f90
- C: mkl_lapack.h
Input Parameters
- m: The number of rows of the matrix A.
- n: The number of columns of the matrix A.
- lda: The first (leading) dimension of A.
- A: Array; REAL for sgetrf, DOUBLE PRECISION for dgetrf, COMPLEX for cgetrf, DOUBLE COMPLEX for zgetrf.
Output Parameters
- A: Overwritten by L and U. The unit diagonal elements of L are not stored.
- ipiv: An integer array, dimension at least max(1,min(m,n)). The pivot indices; row i is interchanged with row ipiv(i).
- info: Integer. If info = 0, the execution is successful. If info = -i, the i-th parameter had an illegal value. If info = i, u(i,i) is 0: the factorization has been completed, but U is exactly singular.
LAPACK Computational Routines
                                                     general   symmetric    symmetric           triangular
                                                     matrix    indefinite   positive-definite   matrix
Factorize matrix                                     ?getrf    ?sytrf       ?potrf
Solve linear system with a factored matrix           ?getrs    ?sytrs       ?potrs              ?trtrs
Condition number                                     ?gecon    ?sycon       ?pocon              ?trcon
Compute the inverse matrix using the factorization   ?getri    ?sytri       ?potri              ?trtri
LAPACK Routines: Linear Equations
To solve a particular problem, you can call two or more computational routines, or call a corresponding driver routine that combines several tasks in one call. For example, to solve a system of linear equations with a general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Alternatively, use the driver routine ?gesv, which performs all these tasks in one call.
Ex5. Solve the Linear Equation
#include <stdio.h>
#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int n, nrhs, lda, ldb, info, idxi, idxj, *ipiv;
    double *A, *b;

    n = 3; nrhs = 1; lda = n; ldb = n;
    ipiv = (int*)malloc(sizeof(int)*n);
    A = (double*)malloc(sizeof(double)*n*n);
    b = (double*)malloc(sizeof(double)*n);

    /* Note: this A = [1 2 3; 2 3 4; 3 4 5] is singular
       (row 1 + row 3 = 2 * row 2), so check info after the call. */
    for( idxi = 0; idxi < n; idxi++ )
        for( idxj = 0; idxj < n; idxj++)
            *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;

    *(b+0) = 6;
    *(b+1) = 9;
    *(b+2) = 12;

    dgesv(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);
    printf("info = %d\n", info);

    free(ipiv); free(A); free(b);
    return 0;
}
?gesv
Input Parameters
- nrhs: The number of right-hand sides, that is, the number of columns of the matrix B.
Output Parameters
- A: Overwritten by the factors L and U from the factorization A = P*L*U.
- b: Overwritten by the solution matrix X.
Ex6. Solve the Eigenvalue Problem
#include <stdlib.h>
#include "mkl_lapack.h"

int main()
{
    int n, lda, lwork, ldvl, ldvr, info;
    double *wr, *wi, *A, *work, *vl, *vr;
    char jobvl, jobvr;

    n = 3; lda = n; ldvl = 1; ldvr = n;
    lwork = 4*n;   /* not 3*n: eigenvectors are requested */
    jobvl = 'N'; jobvr = 'V';

    A = (double*)malloc(sizeof(double)*n*n);
    wr = (double*)malloc(sizeof(double)*n);
    wi = (double*)malloc(sizeof(double)*n);
    vl = (double*)malloc(sizeof(double)*ldvl*n);
    vr = (double*)malloc(sizeof(double)*ldvr*n);
    work = (double*)malloc(sizeof(double)*lwork);

    *(A+0) = 2;  *(A+1) = -1; *(A+2) = 0;
    *(A+3) = -1; *(A+4) = 2;  *(A+5) = -1;
    *(A+6) = 0;  *(A+7) = -1; *(A+8) = 2;

    /* wr and wi are already pointers, so they are passed directly */
    dgeev(&jobvl, &jobvr, &n, A, &lda, wr, wi,
          vl, &ldvl, vr, &ldvr, work, &lwork, &info);

    free(A); free(wr); free(wi); free(vl); free(vr); free(work);
    return 0;
}
?geev
Input Parameters
- jobvl: If jobvl = 'N', the left eigenvectors of A are not computed. If jobvl = 'V', the left eigenvectors of A are computed.
- jobvr: If jobvr = 'N', the right eigenvectors of A are not computed. If jobvr = 'V', the right eigenvectors of A are computed.
- work: A workspace array of dimension max(1, lwork).
- lwork: The dimension of the array work. lwork must be at least max(1,3n); if eigenvectors are requested (jobvl = 'V' or jobvr = 'V'), it must be at least max(1,4n) for real flavors.
- ldvl, ldvr: The leading dimensions of the output arrays vl and vr, respectively.
Output Parameters
- wr, wi: Contain the real and imaginary parts, respectively, of the computed eigenvalues.
- vl, vr: If jobvl = 'V', the left eigenvectors u(j) are stored one after another in the columns of vl, in the same order as their eigenvalues. If jobvl = 'N', vl is not referenced. If the j-th eigenvalue is real, then u(j) = vl(:,j), the j-th column of vl. The right eigenvectors are returned in vr in the same way, controlled by jobvr.
- info: If info = 0, the execution is successful. If info = -i, the i-th parameter had an illegal value. If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been computed.
LAPACK Computational Routines
- Orthogonal Factorizations (QR, QZ)
- Singular Value Decomposition
- Symmetric Eigenvalue Problems
- Generalized Symmetric-Definite Eigenvalue Problems
- Nonsymmetric Eigenvalue Problems
- Generalized Nonsymmetric Eigenvalue Problems
- Generalized Singular Value Decomposition
LAPACK Driver Routines
- Linear Least Squares (LLS) Problems
- Generalized LLS Problems
- Symmetric Eigenproblems
- Nonsymmetric Eigenproblems
- Singular Value Decomposition
- Generalized Symmetric Definite Eigenproblems
- Generalized Nonsymmetric Eigenproblems
Five Stage Usage Model for Computing FFT
Allocate a fresh descriptor for the problem with a call to the DftiCreateDescriptor function. (precision, rank, sizes, scaling factor, …)
Optionally adjust the descriptor configuration with a call to the DftiSetValue function.
Commit the descriptor with a call to the DftiCommitDescriptor function.
Compute the transform with a call to the DftiComputeForward/DftiComputeBackward function.
Deallocate the descriptor with a call to the DftiFreeDescriptor function.
Ex7. Three-Dimensional Complex FFT
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mkl_dfti.h"

/* Note: 1000^3 complex doubles need 16 GB per array;
   reduce these sizes to try the example on a small machine. */
#define m 1000
#define n 1000
#define k 1000

typedef struct{ double re; double im; } mkl_complex;

int main()
{
    int idxi, idxj, idxk;
    double backward_scale;
    MKL_LONG status, length[3];
    mkl_complex *vec_src, *vec_tmp, *vec_dst;
    DFTI_DESCRIPTOR_HANDLE handle = 0;

    vec_src = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
    vec_tmp = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
    vec_dst = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);

    length[0] = m; length[1] = n; length[2] = k;

    memset(vec_src, 0, sizeof(mkl_complex)*m*n*k);
    memset(vec_tmp, 0, sizeof(mkl_complex)*m*n*k);
    memset(vec_dst, 0, sizeof(mkl_complex)*m*n*k);

    for(idxk=0; idxk<k; idxk++)
        for(idxj=0; idxj<n; idxj++)
            for(idxi=0; idxi<m; idxi++)
            {
                (vec_src+idxk*m*n+idxj*m+idxi)->re=1.0;
                (vec_src+idxk*m*n+idxj*m+idxi)->im=0.0;
            }

    status = DftiCreateDescriptor( &handle, DFTI_DOUBLE,
                                   DFTI_COMPLEX, 3, length );
    if(status && !DftiErrorClass(status, DFTI_NO_ERROR))
    {
        printf("Error : %s\n", DftiErrorMessage(status));
        printf("TEST FAILED : DftiCreateDescriptor(&handle, ...)\n");
    }
    status = DftiSetValue( handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE );
    status = DftiCommitDescriptor( handle );
    status = DftiComputeForward( handle, vec_src, vec_tmp );
    backward_scale = 1.0/((double)m*n*k);
    status = DftiSetValue( handle, DFTI_BACKWARD_SCALE, backward_scale );
    status = DftiCommitDescriptor( handle );
    status = DftiComputeBackward( handle, vec_tmp, vec_dst);
    status = DftiFreeDescriptor( &handle );

    free(vec_src); free(vec_tmp); free(vec_dst);
    return 0;
}
FFT Functions
Function Name           Operation
DftiCreateDescriptor    Allocates memory for the descriptor data structure and preliminarily initializes it.
DftiCommitDescriptor    Performs all initialization for the actual FFT computation.
DftiCopyDescriptor      Copies an existing descriptor.
DftiFreeDescriptor      Frees memory allocated for a descriptor.
DftiComputeForward      Computes the forward FFT.
DftiComputeBackward     Computes the backward FFT.
DftiSetValue            Sets one particular configuration parameter with the specified configuration value.
DftiGetValue            Gets the value of one particular configuration parameter.
Reference
- LLNL tutorials web site (https://computing.llnl.gov/tutorials/parallel_comp/)
- Intel® Math Kernel Library Reference Manual (mklman.pdf)
- Intel® Math Kernel Library for the Linux OS User's Guide (userguide.pdf)