We are interested in the accuracy of linear algebra operations: the accuracy of solutions of linear equations, of eigenvalues and eigenvectors of matrices, and so on. This is why we have been developing MPACK. MPACK consists of MBLAS and MLAPACK, multiple precision versions of BLAS and LAPACK, respectively. Features of MPACK: (i) based on LAPACK 3.x, (ii) provides a reference implementation and an API, (iii) written in C++, rewritten from Fortran 77, (iv) supports GMP, MPFR, DD/QD, and binary128 as multiple precision arithmetic libraries, and (v) portable. The current version of MPACK is 0.7.0, supporting 76 MBLAS routines and 100 MLAPACK routines. The matrix-matrix multiplication routine has been accelerated using an NVIDIA C2050 GPU. All source code is available at: http://mplapack.sourceforge.net/
The MPACK: Multiple precision version of BLAS and LAPACK
NAKATA, Maho
RIKEN, Advanced Center for Computer and Communication
SIAM Conference on Applied Linear Algebra, Valencia, Spain, 2012/6/18-22, MS51, 11:25-11:50, June 21st, Room 2.0
NAKATA, Maho The MPACK : Multiple precision version of BLAS and LAPACK
The MPACK: Multiple precision version of BLAS and LAPACK
http://mplapack.sourceforge.net/ NAKATA, Maho @ RIKEN
MPACK: multiple precision version of BLAS and LAPACK.
Providing building blocks, a reference implementation, and an Application Program Interface (API)
Version 0.7.0 (2012/6/16); Status: MBLAS completed, and 100 MLAPACK routines.
Extensive testing: preparing test cases for all calculations.
Multi-platform: Linux/BSD/Mac/Win
Five supported multiple precision types: GMP, MPFR, quadruple precision (binary128), DD, and QD, plus double
Written in C++: easier and faster programming.
Distributed under the 2-clause BSD license; redistribution and modification are permitted.
Overview
Introduction: Why do we need more accuracy?
Floating point numbers and multiple precision libraries.
Introduction of BLAS, LAPACK, and MPACK.
Summary.
Introduction: Why do we need more accuracy?
More accuracy is needed towards peta and exa scale computing
Exa scale computing: 10^23 FLOP for just one week of calculation.
Scientific computing may suffer from accuracy loss.
More accuracy is needed towards Peta and Exa scale computing
Iterative methods in double precision sometimes do not even converge [Hasegawa 2007].
More accuracy is needed towards peta and exa scale computing
Semidefinite programming (SDP): the condition number diverges at the optimum. Therefore, it can be very hard to obtain an accurate solution [Nakata et al. 2008], [Nakata 2009], [Waki-Nakata-Muramatsu].
[Figure: the 1-norm and the estimated 1-norm condition number of the Schur complement matrix over 0-90 iterations, growing from about 1e-10 up to 1e+20]
Floating point numbers and multiple precision libraries.
The double precision: the most widely used floating point number format
“754-2008 IEEE Standard for Floating-Point Arithmetic”
The binary64 (aka double precision) format has 16 significant decimal digits
Widely used and very fast. Core i7 920: ∼40 GFLOPS; RADEON HD7970: ∼1000 GFLOPS; K computer: over ∼10 PFLOPS
Rounding error may occur in every arithmetic operation.
Dealing with round-off error by multiple precision calculation
Multiple precision: a brute force method against round-off error
Floating point numbers: approximations of the real numbers on a computer.
a + (b + c) ≠ (a + b) + c
Round-off error can occur in each arithmetic operation.
Double precision has only 16 significant decimal digits:
1 + 0.0000000000000001 = 1
One solution: higher/multiple precision calculation.
What is multiple precision arithmetic?
There are several ways to treat multiple precision on computers
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers: http://gmplib.org/
Significant digits can be arbitrarily large.
One of the fastest libraries, but arithmetic operations are still very slow.
Other multiple/arbitrary precision arithmetic libraries:
The QD library: double-double (quad-double) precision: 32 (64) significant decimal digits, and FAST
binary128: quadruple precision, defined in IEEE 754-2008.
IEEE 754 style multiple precision libraries: MPFR (real) and MPC (complex).
Introduction of BLAS, LAPACK, and MPACK.
What are BLAS and LAPACK?
BLAS: reference implementation of various types of vector-vector, matrix-vector, and matrix-matrix operations. Faster implementations are available: OpenBLAS (GotoBLAS2), Intel MKL, ATLAS, etc.
LAPACK: solves linear equations, eigenvalue problems, least squares fitting, singular value decomposition.
De facto standard library; often used without even noticing.
LAPACK web hits: 110,343,542 (Mon Dec 10 16:20:25 EST 2012)
BLAS and LAPACK are very important libraries
MPACK 0.7.0: capabilities and limitations
Version 0.7.0 (2012/6/16); Status: MBLAS completed, and 100 MLAPACK routines.
Rgemm (matrix-matrix multiplication): OpenMP acceleration.
Rgemm GPU acceleration (upcoming 0.8.0)
What MLAPACK can do: diagonalization of symmetric (Hermitian) matrices, LU decomposition, Cholesky decomposition, estimation of condition number, matrix inversion.
What MLAPACK cannot do yet: diagonalization of non-symmetric matrices, singular value decomposition, least squares fitting, QR factorization, etc.
Providing Application Program Interface: naming rule
Change in prefix: float, double → “R”eal; complex, double complex → “C”omplex.
daxpy, zaxpy→ Raxpy, Caxpy
dgemm, zgemm→ Rgemm, Cgemm
dsterf, dsyev→ Rsterf, Rsyev
dzabs1, dzasum→ RCabs1, RCasum
Supported MBLAS 0.7.0 routines (completed)
LEVEL 1 MBLAS: Crotg Cscal Rrotg Rrot Rrotm CRrot Cswap Rswap CRscal Rscal Ccopy Rcopy Caxpy Raxpy Rdot Cdotc Cdotu RCnrm2 Rnrm2 Rasum iCasum iRamax RCabs1 Mlsame Mxerbla
LEVEL 2 MBLAS: Cgemv Rgemv Cgbmv Rgbmv Chemv Chbmv Chpmv Rsymv Rsbmv Rspmv Ctrmv Rtrmv Ctbmv Ctpmv Rtpmv Ctrsv Rtrsv Ctbsv Rtbsv Ctpsv Rger Cgeru Cgerc Cher Chpr Cher2 Chpr2 Rsyr Rspr Rsyr2 Rspr2
LEVEL 3 MBLAS: Cgemm Rgemm Csymm Rsymm Chemm Csyrk Rsyrk Cherk Csyr2k Rsyr2k Cher2k Ctrmm Rtrmm Ctrsm Rtrsm
Supported MLAPACK 0.7.0 routines: 100 routines
Mutils Rlamch Rlae2 Rlaev2 Claev2 Rlassq Classq Rlanst Clanht Rlansy Clansy Clanhe Rlapy2 Rlarfg Rlapy3 Rladiv Cladiv Clarfg Rlartg Clartg Rlaset Claset Rlasr Clasr Rpotf2 Clacgv Cpotf2 Rlascl Clascl Rlasrt Rsytd2 Chetd2 Rsteqr Csteqr Rsterf Rlarf Clarf Rorg2l Cung2l Rorg2r Cung2r Rlarft Clarft Rlarfb Clarfb Rorgqr Cungqr Rorgql Cungql Rlatrd Clatrd Rsytrd Chetrd Rorgtr Cungtr Rsyev Cheev Rpotrf Cpotrf Clacrm Rtrti2 Ctrti2 Rtrtri Ctrtri Rgetf2 Cgetf2 Rlaswp Claswp Rgetrf Cgetrf Rgetri Cgetri Rgetrs Cgetrs Rgesv Cgesv Rtrtrs Ctrtrs Rlasyf Clasyf Clahef Clacrt Claesy Crot
Cspmv Cspr Csymv Csyr iCmax1 RCsum1 Rpotrs Rposv Rgeequ Rlatrs Rlange Rgecon Rlauu2 Rlauum Rpotri Rpocon
Providing APIs: difference in calling
The difference is call by value versus call by reference. MBLAS/MLAPACK:
Rgemm("n", "n", n, n, n, alpha, A, n, B, n, beta, C, n);
Rgetrf(n, n, A, n, ipiv, &info);
Rgetri(n, A, n, ipiv, work, lwork, &info);
Rsyev("V", "U", n, A, n, w, work, &lwork, &info);
BLAS/LAPACK:
dgemm_f77("N", "N", &n, &n, &n, &One, A, &n, A, &n, &Zero, C, &n);
dgetri_f77(&n, A, &n, ipiv, work, &lwork, &info);
Programming model
Required types: INTEGER, REAL, COMPLEX, LOGICAL.
Switching MP libs by “typedef”: REAL → mpf_class, qd_real, dd_real, etc.
Requiring elementary functions (log, sin, etc.); double precision accuracy is usually sufficient for them.
Currently supported MP libs: GMP, MPFR, QD, DD, binary128, and double
Intermediate functions absorb the differences between MP libs.
You can program using MP types almost the same way as with “double” in C++ (cf. SDPA-DD and SDPA-GMP)
Extraction from MBLAS codes
Caxpy: Complex version of axpy
void Caxpy(INTEGER n, COMPLEX ca, COMPLEX * cx, INTEGER incx, COMPLEX * cy, INTEGER incy)
{
    REAL Zero = 0.0;
    if (n <= 0)
        return;
    if (RCabs1(ca) == Zero)
        return;
    INTEGER ix = 0;
    INTEGER iy = 0;
    if (incx < 0)
        ix = (-n + 1) * incx;
    if (incy < 0)
        iy = (-n + 1) * incy;
    for (INTEGER i = 0; i < n; i++) {
        cy[iy] = cy[iy] + ca * cx[ix];
        ix = ix + incx;
        iy = iy + incy;
    }
}
Extraction from MLAPACK source code
Rsyev: diagonalization of real symmetric matrices
Rlascl(uplo, 0, 0, One, sigma, n, n, A, lda, info);
}
//Call DSYTRD to reduce symmetric matrix to tridiagonal form.
inde = 1;
indtau = inde + n;
indwrk = indtau + n;
llwork = *lwork - indwrk + 1;
Rsytrd(uplo, n, &A[0], lda, &w[0], &work[inde - 1], &work[indtau - 1],
&work[indwrk - 1], llwork, &iinfo);
//For eigenvalues only, call DSTERF. For eigenvectors, first call
//DORGTR to generate the orthogonal matrix, then call DSTEQR.
if (!wantz) {
Rsterf(n, &w[0], &work[inde - 1], info);
} else {
Rorgtr(uplo, n, A, lda, &work[indtau - 1], &work[indwrk - 1], llwork,
&iinfo);
Rsteqr(jobz, n, w, &work[inde - 1], A, lda, &work[indtau - 1], info);
}
//If matrix was scaled, then rescale eigenvalues appropriately.
if (iscale == 1) {
if (*info == 0) {
Facts of MPACK (MBLAS/MLAPACK)
A Google search for “multiple precision BLAS” returns only my pages or related pages.
Download count: 2520 (2012/6/21)
Quality assurance of MBLAS
BLAS uses only algebraic manipulations
Input possible values and check against BLAS. This can detect algorithmic bugs; that is almost enough.
for (int k = MIN_K; k < MAX_K; k++) {
  for (int n = MIN_N; n < MAX_N; n++) {
    for (int m = MIN_M; m < MAX_M; m++) {
      ...
      for (int lda = minlda; lda < MAX_LDA; lda++) {
        for (int ldb = minldb; ldb < MAX_LDB; ldb++) {
          for (int ldc = max(1, m); ldc < MAX_LDC; ldc++) {
            Rgemm(transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
            dgemm_f77(transa, transb, &m, &n, &k, &alphad, Ad, &lda, Bd, &ldb, &betad, Cd, &ldc);
            ...
            diff = vec_diff(C, Cd, MAT_A(ldc, n), 1);
            if (fabs(diff) > EPSILON) {
              printf("#error %lf!!\n", diff);
              errorflag = TRUE;
            }
Quality assurance of MLAPACK
Very difficult: introduction of “convergence”
Input possible values and compare the results from MLAPACK and from LAPACK. LAPACK introduces “convergence”.
Essentially different, but many routines still use only algebraic operations.
Bugs can also be detected when the library is used in actual research (Waki et al.)
Performance of Raxpy
on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3
y ← αx + y
Raxpy performance in FLOPS (OpenMP multithreaded figures in parentheses):
MP library (sig. digits)   FLOPS (OpenMP)
DD (32)                    130M (570M)
QD (64)                    13.7M (67M)
GMP (77)                   11.3M (45M)
GMP (154)                  7.6M (32M)
MPFR (154)                 3.7M (17M)
GotoBLAS (16)              1.5G
Performance of Rgemv
on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3
y ← αAx + βy
Rgemv performance in FLOPS:
MP library (sig. digits)   FLOPS
DD (32)                    140M
QD (64)                    13M
GMP (77)                   11.1M
MPFR (77)                  4.7M
GMP (154)                  7.1M
MPFR (154)                 3.7M
GotoBLAS (16)              3.8G
Performance of Rgemm
on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3
Rgemm performance in FLOPS (OpenMP multithreaded figures in parentheses):
C ← αAB + βC
MP library (sig. digits)   FLOPS (OpenMP)
DD (32)                    136M (605M)
QD (64)                    13.9M (63M)
GMP (77)                   11.5M (44M)
MPFR (77)                  4.6M (20M)
GMP (154)                  7.2M (28M)
MPFR (154)                 3.7M (16M)
GotoBLAS (16)              42.5G
Performance of Rgemm: double-double precision on Westmere EP
Intel Composer, Intel Westmere EP, 40 cores, 2.4GHz: approx. 5 GFLOPS
Performance of Rgemm: GMP (154 decimal digits) on Westmere EP
Intel Composer, Intel Westmere EP, 40 cores, 2.4GHz: approx. 0.2 GFLOPS
Performance of Rgemm: double-double (quasi quad precision) on Magny-Cours, 48 cores
GCC 4.6, Magny-Cours 2.4GHz, 48 cores: approx. 3 GFLOPS
Performance of Rgemm: binary128 (true quad precision) on Magny-Cours, 48 cores
GCC 4.6, Magny-Cours 2.4GHz, 48 cores: approx. 0.3 GFLOPS
Performance of Rgemm: GMP (154 decimal digits) on Magny-Cours, 48 cores
GCC 4.6, Magny-Cours 2.4GHz, 48 cores: approx. 0.15 GFLOPS
Performance of Rgemm: double-double precision on NVIDIA C2050
CUDA 3.2, NVIDIA C2050: 16 GFLOPS! Fast and stable!
[Figure: Rgemm performance in GFLOPS (0-16) vs. matrix dimension (0-6000) on the C2050, showing Kernel and Total curves for the NN, NT, TN, and TT variants]
Performance of Rsyev
on Intel Core i7 920 (2.6GHz) / Ubuntu 10.04 / gcc 4.4.3
Rsyev performance (symmetric 300×300 matrix, obtaining eigenvalues and eigenvectors), in seconds
AX = X diag(λ1, λ2, ..., λN)
MP library (sig. digits)   seconds
DD (32)                    2.4
QD (64)                    25.6
GMP (77)                   36.9
MPFR (77)                  78.9
GMP (154)                  64.0
MPFR (154)                 111
GotoBLAS (16)              0.1
MPACK 0.7.0: Multiple precision version of BLAS and LAPACK
http://mplapack.sourceforge.net/ NAKATA, Maho @ RIKEN
MPACK: multiple precision version of BLAS and LAPACK.
Providing building blocks, a reference implementation, and an Application Program Interface (API)
Version 0.7.0 (2012/6/16); Status: MBLAS completed, and 100 MLAPACK routines.
2500+ downloads to date.
A faster Rgemm implementation in double-double precision on GPU is available.