
The MPACK : Multiple precision version of BLAS and LAPACK


DESCRIPTION

We are interested in the accuracy of linear algebra operations: the accuracy of the solution of linear equations, of eigenvalues and eigenvectors of matrices, etc. This is why we have been developing MPACK. MPACK consists of MBLAS and MLAPACK, multiple precision versions of BLAS and LAPACK, respectively. Features of MPACK are: (i) based on LAPACK 3.x, (ii) provides a reference implementation and an API, (iii) written in C++, rewritten from FORTRAN 77, (iv) supports GMP, MPFR, DD/QD, and binary128 as multiple precision arithmetic libraries, and (v) portable. The current version of MPACK is 0.7.0, which supports 76 MBLAS routines and 100 MLAPACK routines. The matrix-matrix multiplication routine has been accelerated using an NVIDIA C2050 GPU. All source code is available at: http://mplapack.sourceforge.net/


Page 1: The MPACK : Multiple precision version of BLAS and LAPACK


The MPACK : Multiple precision version of BLAS and LAPACK

NAKATA, Maho

RIKEN, Advanced Center for Computer and Communication

SIAM Conference on Applied Linear Algebra, Valencia, Spain, 2012/6/18-22, MS51, 11:25-11:50, June 21, Room 2.0

NAKATA, Maho The MPACK : Multiple precision version of BLAS and LAPACK

Page 2: The MPACK : Multiple precision version of BLAS and LAPACK

The MPACK : Multiple precision version of BLAS and LAPACK. http://mplapack.sourceforge.net/ NAKATA, Maho @ RIKEN

MPACK: multiple precision version of BLAS and LAPACK.

Providing building blocks, a reference implementation, and an Application Program Interface (API)

Version 0.7.0 (2012/6/16); status: MBLAS completed, and 100 MLAPACK routines.

Extensive testing: preparing test cases for all calculations.

Multi-platform: Linux/BSD/Mac/Win

Supported multiple precision types: GMP, MPFR, quadruple precision (binary128), DD, and QD, plus plain double

Written in C++: easier programming, faster programming.

Distributed under the 2-clause BSD license; redistribution and modification are permitted.


Page 3: The MPACK : Multiple precision version of BLAS and LAPACK

Overview

Introduction: Why do we need more accuracy?

Floating point numbers and multiple precision libraries.

Introduction of BLAS, LAPACK, and MPACK.

Summary.


Page 4: The MPACK : Multiple precision version of BLAS and LAPACK

Introduction: Why do we need more accuracy?


Page 5: The MPACK : Multiple precision version of BLAS and LAPACK

More accuracy is needed towards peta and exa scale computing

Exa scale computing: 10^23 FLOP for just one week of calculation.

Scientific computing may suffer from accuracy loss.


Page 8: The MPACK : Multiple precision version of BLAS and LAPACK

More accuracy is needed towards Peta and Exa scale computing

Iterative methods in double precision sometimes do not even converge [Hasegawa 2007].


Page 10: The MPACK : Multiple precision version of BLAS and LAPACK

More accuracy is needed towards peta and exa scale computing

Semidefinite programming (SDP): the condition number diverges at the optimum. Therefore, it can be very hard to obtain an accurate solution [Nakata et al. 2008], [Nakata 2009], [Waki-Nakata-Muramatsu].

[Figure: the 1-norm and the estimated 1-norm condition number of the Schur complement matrix (y-axis: 1e-10 to 1e+20, log scale) vs. number of iterations (0 to 90).]


Page 14: The MPACK : Multiple precision version of BLAS and LAPACK

Floating point numbers and multiple precision libraries.


Page 15: The MPACK : Multiple precision version of BLAS and LAPACK

The double precision: the most widely used floating point number format

“754-2008 IEEE Standard for Floating-Point Arithmetic”

The binary64 (aka double precision) format has 16 decimal significant digits

Widely used and very fast. Core i7 920: ~40 GFLOPS; RADEON HD7970: ~1000 GFLOPS; K computer: over 10 PFLOPS. Rounding error may occur for every arithmetic operation.


Page 16: The MPACK : Multiple precision version of BLAS and LAPACK

Dealing with round-off error by multiple precision calculation

Multiple precision: a brute force method against round-off error

Floating point numbers: approximations of the real numbers on a computer.

a + (b + c) ≠ (a + b) + c

Round-off error can occur in each arithmetic operation.

The double precision has only 16 decimal significant digits

1 + 0.0000000000000001 = 1

One solution: higher/multiple precision calculation.


Page 21: The MPACK : Multiple precision version of BLAS and LAPACK

What is a multiple precision arithmetic?

There are several ways to treat multiple precision on computers

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers: http://gmplib.org/

Significant digits can be arbitrarily large.

One of the fastest libraries, but arithmetic operations are still very slow compared to hardware doubles.


Page 23: The MPACK : Multiple precision version of BLAS and LAPACK

Other multiple/arbitrary precision arithmetic libraries

Other multiple/arbitrary precision arithmetic libraries:

The QD library: double-double (quad-double) precision: 32 (64) significant decimal digits, and FAST.

binary128: quadruple precision, defined in IEEE 754-2008.

IEEE 754-style multiple precision libraries: MPFR (real) and MPC (complex).


Page 25: The MPACK : Multiple precision version of BLAS and LAPACK

Introduction of BLAS, LAPACK, and MPACK.


Page 26: The MPACK : Multiple precision version of BLAS and LAPACK

What is BLAS and LAPACK?

BLAS: reference implementation of various types of vector-vector, matrix-vector, and matrix-matrix operations. Faster implementations are available: OpenBLAS (GotoBLAS2), Intel MKL, ATLAS, etc.

LAPACK: solves linear equations, eigenvalue problems, least squares fitting, singular value decomposition.

De facto standard libraries; often used without even noticing.

LAPACK web hits: 110,343,542 (Mon Dec 10 16:20:25 EST 2012). BLAS and LAPACK are very, very important libraries.


Page 28: The MPACK : Multiple precision version of BLAS and LAPACK

MPACK 0.7.0: Multiple precision version of BLAS and LAPACK. http://mplapack.sourceforge.net/ NAKATA, Maho @ RIKEN

MPACK: multiple precision version of BLAS and LAPACK.

Providing building blocks, a reference implementation, and an Application Program Interface (API)

Version 0.7.0 (2012/6/16); status: MBLAS completed, and 100 MLAPACK routines.

Extensive testing: preparing test cases for all calculations.

Multi-platform: Linux/BSD/Mac/Win

Supported multiple precision types: GMP, MPFR, quadruple precision (binary128), DD, and QD, plus plain double

Written in C++: easier programming, faster programming.

Distributed under the 2-clause BSD license; redistribution and modification are permitted.


Page 29: The MPACK : Multiple precision version of BLAS and LAPACK

MPACK 0.7.0: capability and non-capability

Version 0.7.0 (2012/6/16); status: MBLAS completed, and 100 MLAPACK routines.

Rgemm (matrix-matrix multiplication): OpenMP acceleration.

Rgemm with GPU acceleration (upcoming 0.8.0).

What MLAPACK can do: diagonalization of symmetric (Hermitian) matrices, LU decomposition, Cholesky decomposition, estimation of condition numbers, matrix inversion.

What MLAPACK cannot do yet: diagonalization of non-symmetric matrices, singular value decomposition, least squares fitting, QR factorization, etc.


Page 30: The MPACK : Multiple precision version of BLAS and LAPACK

Providing Application Program Interface: naming rule

Change in prefix: float, double → “R”eal; complex, double complex → “C”omplex.

daxpy, zaxpy→ Raxpy, Caxpy

dgemm, zgemm→ Rgemm, Cgemm

dsterf, dsyev→ Rsterf, Rsyev

dzabs1, dzasum→ RCabs1, RCasum


Page 31: The MPACK : Multiple precision version of BLAS and LAPACK

Supported MBLAS 0.7.0 routines (completed)

LEVEL1 MBLAS: Crotg Cscal Rrotg Rrot Rrotm CRrot Cswap Rswap CRscal Rscal Ccopy Rcopy Caxpy Raxpy Rdot Cdotc Cdotu RCnrm2 Rnrm2 Rasum iCasum iRamax RCabs1 Mlsame Mxerbla

LEVEL2 MBLAS: Cgemv Rgemv Cgbmv Rgbmv Chemv Chbmv Chpmv Rsymv Rsbmv Rspmv Ctrmv Rtrmv Ctbmv Ctpmv Rtpmv Ctrsv Rtrsv Ctbsv Rtbsv Ctpsv Rger Cgeru Cgerc Cher Chpr Cher2 Chpr2 Rsyr Rspr Rsyr2 Rspr2

LEVEL3 MBLAS: Cgemm Rgemm Csymm Rsymm Chemm Csyrk Rsyrk Cherk Csyr2k Rsyr2k Cher2k Ctrmm Rtrmm Ctrsm Rtrsm


Page 32: The MPACK : Multiple precision version of BLAS and LAPACK

Supported MLAPACK 0.7.0 routines: 100 routines

Mutils Rlamch Rlae2 Rlaev2 Claev2 Rlassq Classq Rlanst Clanht Rlansy Clansy Clanhe Rlapy2 Rlarfg Rlapy3 Rladiv Cladiv Clarfg Rlartg Clartg Rlaset Claset Rlasr Clasr Rpotf2 Clacgv Cpotf2 Rlascl Clascl Rlasrt Rsytd2 Chetd2 Rsteqr Csteqr Rsterf Rlarf Clarf Rorg2l Cung2l Rorg2r Cung2r Rlarft Clarft Rlarfb Clarfb Rorgqr Cungqr Rorgql Cungql Rlatrd Clatrd Rsytrd Chetrd Rorgtr Cungtr Rsyev Cheev Rpotrf Cpotrf Clacrm Rtrti2 Ctrti2 Rtrtri Ctrtri Rgetf2 Cgetf2 Rlaswp Claswp Rgetrf Cgetrf Rgetri Cgetri Rgetrs Cgetrs Rgesv Cgesv Rtrtrs Ctrtrs Rlasyf Clasyf Clahef Clacrt Claesy Crot

Cspmv Cspr Csymv Csyr iCmax1 RCsum1 Rpotrs Rposv Rgeequ Rlatrs Rlange Rgecon Rlauu2 Rlauum Rpotri Rpocon


Page 33: The MPACK : Multiple precision version of BLAS and LAPACK

Providing APIs: difference in calling

The difference is call by value vs. call by reference. MBLAS/MLAPACK:

Rgemm("n", "n", n, n, n, alpha, A, n, B, n, beta, C, n);

Rgetrf(n, n, A, n, ipiv, &info);

Rgetri(n, A, n, ipiv, work, lwork, &info);

Rsyev("V", "U", n, A, n, w, work, &lwork, &info);

BLAS/LAPACK:

dgemm_f77("N", "N", &n, &n, &n, &One, A, &n, A, &n, &Zero, C, &n);

dgetri_f77(&n, A, &n, ipiv, work, &lwork, &info);


Page 34: The MPACK : Multiple precision version of BLAS and LAPACK

Programming model

Required types: INTEGER, REAL, COMPLEX, LOGICAL.

Switching MP libs by “typedef”: REAL → mpf_class, qd_real, dd_real, etc.

Requires elementary functions (log, sin, etc.); for these, double accuracy is usually enough.

Currently supported MP libs: GMP, MPFR, QD, DD, binary128, and double.

Intermediate functions absorb the differences between MP libs.

You can program using MP types almost the same as “double” in C++ (cf. SDPA-DD and SDPA-GMP).


Page 35: The MPACK : Multiple precision version of BLAS and LAPACK

Extraction from MBLAS codes

Caxpy: Complex version of axpy

void Caxpy(INTEGER n, COMPLEX ca, COMPLEX *cx, INTEGER incx, COMPLEX *cy, INTEGER incy)
{
    REAL Zero = 0.0;
    if (n <= 0)
        return;
    if (RCabs1(ca) == Zero)
        return;
    INTEGER ix = 0;
    INTEGER iy = 0;
    if (incx < 0)
        ix = (-n + 1) * incx;
    if (incy < 0)
        iy = (-n + 1) * incy;
    for (INTEGER i = 0; i < n; i++) {
        cy[iy] = cy[iy] + ca * cx[ix];
        ix = ix + incx;
        iy = iy + incy;
    }
}


Page 36: The MPACK : Multiple precision version of BLAS and LAPACK

Extraction from MLAPACK source code

Rsyev; diagonalization of real symmetric matrices

    Rlascl(uplo, 0, 0, One, sigma, n, n, A, lda, info);
}
//Call DSYTRD to reduce symmetric matrix to tridiagonal form.
inde = 1;
indtau = inde + n;
indwrk = indtau + n;
llwork = *lwork - indwrk + 1;
Rsytrd(uplo, n, &A[0], lda, &w[0], &work[inde - 1], &work[indtau - 1],
       &work[indwrk - 1], llwork, &iinfo);
//For eigenvalues only, call DSTERF. For eigenvectors, first call
//DORGTR to generate the orthogonal matrix, then call DSTEQR.
if (!wantz) {
    Rsterf(n, &w[0], &work[inde - 1], info);
} else {
    Rorgtr(uplo, n, A, lda, &work[indtau - 1], &work[indwrk - 1], llwork,
           &iinfo);
    Rsteqr(jobz, n, w, &work[inde - 1], A, lda, &work[indtau - 1], info);
}
//If matrix was scaled, then rescale eigenvalues appropriately.
if (iscale == 1) {
    if (*info == 0) {


Page 37: The MPACK : Multiple precision version of BLAS and LAPACK

Facts of MPACK (MBLAS/MLAPACK)

A Google search for “Multiple precision BLAS” returns only my pages or related pages.

Download count: 2520 (2012/6/21)


Page 38: The MPACK : Multiple precision version of BLAS and LAPACK

Quality assurance of MBLAS

BLAS uses only algebraic manipulations

Input possible values and check against BLAS; this can detect algorithmic bugs. That's almost OK.

for (int k = MIN_K; k < MAX_K; k++) {
  for (int n = MIN_N; n < MAX_N; n++) {
    for (int m = MIN_M; m < MAX_M; m++) {
      ...
      for (int lda = minlda; lda < MAX_LDA; lda++) {
        for (int ldb = minldb; ldb < MAX_LDB; ldb++) {
          for (int ldc = max(1, m); ldc < MAX_LDC; ldc++) {
            Rgemm(transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
            dgemm_f77(transa, transb, &m, &n, &k, &alphad, Ad, &lda,
                      Bd, &ldb, &betad, Cd, &ldc);
            ...
            diff = vec_diff(C, Cd, MAT_A(ldc, n), 1);
            if (fabs(diff) > EPSILON) {
              printf("#error %lf!!\n", diff);
              errorflag = TRUE;
            }


Page 39: The MPACK : Multiple precision version of BLAS and LAPACK

Quality assurance of MLAPACK

Very difficult: LAPACK introduces “convergence”

Input possible values and compare the results of MLAPACK and LAPACK. LAPACK introduces “convergence”.

The two are essentially different, but many routines still use only algebraic operations.

Bugs can also be detected when MPACK is used in actual research (Waki et al.).


Page 40: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Raxpy

on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3

y ← αx + y

Raxpy performance in FLOPS; multithreaded (OpenMP) results in parentheses.

MP library (sign. digits)   FLOPS (OpenMP)
DD (32)                     130 (570) M
QD (64)                     13.7 (67) M
GMP (77)                    11.3 (45) M
GMP (154)                   7.6 (32) M
MPFR (154)                  3.7 (17) M
GotoBLAS (16)               1.5 G


Page 41: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemv

on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3

y ← αAx + βy

Rgemv performance in FLOPS.

MP library (sign. digits)   FLOPS
DD (32)                     140 M
QD (64)                     13 M
GMP (77)                    11.1 M
MPFR (77)                   4.7 M
GMP (154)                   7.1 M
MPFR (154)                  3.7 M
GotoBLAS (16)               3.8 G


Page 42: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm

on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3

Rgemm performance in Flops.

C ← αAB + βC

MP library (sign. digits)   FLOPS (OpenMP)
DD (32)                     136 (605) M
QD (64)                     13.9 (63) M
GMP (77)                    11.5 (44) M
MPFR (77)                   4.6 (20) M
GMP (154)                   7.2 (28) M
MPFR (154)                  3.7 (16) M
GotoBLAS (16)               42.5 G


Page 43: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: double-double precision on Westmere-EP

Intel Composer, Intel Westmere-EP, 40 cores, 2.4 GHz: approx. 5 GFLOPS


Page 44: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: GMP (154 decimal digits) on Westmere-EP

Intel Composer, Intel Westmere-EP, 40 cores, 2.4 GHz: approx. 0.2 GFLOPS


Page 45: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: double-double (quasi quadruple precision) on Magny-Cours, 48 cores

GCC 4.6, Magny-Cours 2.4 GHz, 48 cores: approx. 3 GFLOPS


Page 46: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: binary128 (true quadruple precision) on Magny-Cours, 48 cores

GCC 4.6, Magny-Cours 2.4 GHz, 48 cores: approx. 0.3 GFLOPS


Page 47: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: GMP (154 decimal digits) on Magny-Cours, 48 cores

GCC 4.6, Magny-Cours 2.4 GHz, 48 cores: approx. 0.15 GFLOPS


Page 48: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rgemm: double-double precision on NVIDIA C2050

CUDA 3.2, NVIDIA C2050: 16 GFLOPS, fast and stable!

[Figure: Rgemm double-double performance on NVIDIA C2050: GFLOPS (0 to 16) vs. matrix dimension (0 to 6000), with NN/NT/TN/TT kernel and total curves.]


Page 49: The MPACK : Multiple precision version of BLAS and LAPACK

Performance of Rsyev

on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3

Rsyev performance (symmetric 300x300 matrix, obtaining eigenvalues and eigenvectors), in seconds

AX = X diag[λ1, λ2, · · · , λN]

MP library (sign. digits)   seconds
DD (32)                     2.4
QD (64)                     25.6
GMP (77)                    36.9
MPFR (77)                   78.9
GMP (154)                   64.0
MPFR (154)                  111
GotoBLAS (16)               0.1


Page 50: The MPACK : Multiple precision version of BLAS and LAPACK

MPACK 0.7.0: Multiple precision version of BLAS and LAPACK

http://mplapack.sourceforge.net/ NAKATA, Maho @ RIKEN

MPACK: multiple precision version of BLAS and LAPACK.

Providing building blocks, a reference implementation, and an Application Program Interface (API)

Version 0.7.0 (2012/6/16); status: MBLAS completed, and 100 MLAPACK routines.

2500+ downloads so far.

A faster double-double implementation of Rgemm on GPU is available.
