CBLAS with ACML, C++ support, etc.
One big problem with the BLAS today is that the libraries are all written for Fortran 77. Many commercial BLAS/LAPACK implementations (NAG, IMSL, MKL, etc.) try to correct this, but their C interfaces are not portable across all systems. The only way to be sure you can use them is to write your code entirely in Fortran, which is quite annoying as more and more users write their code in C and C++.
Now, you can call Fortran from C. It's not that hard; Fortran passes everything by reference:
DGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
Becomes
void dgemv_(char *trans, int *m, int *n, double *alpha, double *a, int *lda, double *x, int *incx, double *beta, double *y, int *incy, int trans_len);
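It is not the cleanest thing in the world, and everything must be passed by reference. The one exception is the trailing trans_len, the hidden string length that many Fortran compilers append for character arguments, which goes by value.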
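To make the by-reference convention concrete, here is a minimal sketch. It assumes nothing about your BLAS: toy_dgemv_ is a hypothetical stand-in written in C with the same argument shape as the dgemv_ symbol above (handling only TRANS = 'N'), so the example runs without linking a real library. The matvec wrapper is also made up for illustration; it shows the usual trick of taking scalars by value on the C side and handing their addresses to the Fortran-style routine.

```c
#include <assert.h>

/* Hypothetical stand-in with the argument shape of a Fortran DGEMV symbol.
   Only TRANS = 'N' is handled: y = alpha*A*x + beta*y, with A stored
   column major (m rows, n columns, leading dimension lda >= m). */
void toy_dgemv_(char *trans, int *m, int *n, double *alpha, double *a,
                int *lda, double *x, int *incx, double *beta, double *y,
                int *incy, int trans_len)
{
    (void)trans_len; /* hidden string length, unused in this sketch */
    assert(*trans == 'N' && *lda >= *m && *incx > 0 && *incy > 0);
    for (int i = 0; i < *m; ++i) {
        double sum = 0.0;
        for (int j = 0; j < *n; ++j)
            sum += a[i + j * (*lda)] * x[j * (*incx)];
        y[i * (*incy)] = (*alpha) * sum + (*beta) * y[i * (*incy)];
    }
}

/* C-side convenience wrapper: scalars arrive by value and are passed on
   by address, which is all "by reference" means from C. */
void matvec(int m, int n, double alpha, double *a, int lda,
            double *x, double beta, double *y)
{
    char trans = 'N';
    int one = 1; /* unit stride for both x and y */
    toy_dgemv_(&trans, &m, &n, &alpha, a, &lda, x, &one, &beta, y, &one, 1);
}
```

With a real BLAS you would declare dgemv_ as above and call it in place of toy_dgemv_; the exact symbol name and the hidden length argument depend on your Fortran compiler.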
Well, there was an update to the BLAS known as the CBLAS. In the CBLAS paper, DGEMV is defined as cblas_dgemv(). The CBLAS also has an option to tell the BLAS whether your matrix is row major or column major (if you're writing C or Fortran you had better know what this is! If not, email me: brockp@mlds-networks.com). This is very helpful because C stores matrices in row-major order while Fortran is column major, so calling Fortran from C can make memory layout quite a pain.
Now, yes, some (most) BLAS functions that work with matrices have a transpose option that can correct for this, true. I have not yet tested whether it hurts performance due to cache locality. My guess is that it will depend on your BLAS library.
So now we have a nice definition of how the BLAS should be called from C. While MKL and ATLAS have implemented some if not all of it, ACML and others have not, or have done their own form of BLAS from C. This is a major pain: if you want code to work on multiple platforms with different CPU vendors, and thus different BLAS libraries, anything other than Fortran could be a pain.
I recently figured out a way to make the CBLAS work with ACML, which does not implement the ‘correct’ CBLAS.
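For anyone unsure about the two layouts, here is a small sketch of how element (i, j) is located in each. The function names are made up for illustration and are not part of any BLAS; ld is the leading dimension (the number of columns for row major, the number of rows for column major, when the matrix is stored without padding).

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in row-major storage (what C uses for a[m][n]).
   ld is the leading dimension, at least the number of columns. */
size_t idx_row_major(size_t i, size_t j, size_t ld) { return i * ld + j; }

/* Offset of element (i, j) in column-major storage (what Fortran uses).
   ld is the leading dimension, at least the number of rows. */
size_t idx_col_major(size_t i, size_t j, size_t ld) { return j * ld + i; }

/* Copy an m x n row-major matrix into column-major storage. */
void row_to_col_major(size_t m, size_t n, const double *src, double *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            dst[idx_col_major(i, j, m)] = src[idx_row_major(i, j, n)];
}
```

So the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) sits in memory as 1 2 3 4 5 6 in C, but as 1 4 2 5 3 6 in Fortran.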
Steps:
- Download CBLAS
- Modify Makefile.in to use ACML (Mine attached)
- Build libcblas.a following the CBLAS instructions
When you are all done, the CBLAS library will take care of forwarding all calls, even row-major ones, to the ACML Fortran BLAS. This method should work for any BLAS that does not have CBLAS hooks. The one that comes to mind is GotoBLAS, one of the best BLAS libraries out there.
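How can a row-major call end up in a column-major Fortran routine without copying anything? One common way, sketched below, relies on the fact that a row-major m x k array is, byte for byte, the column-major k x m transpose. So row-major C = A*B can be computed by asking the column-major routine for C^T = B^T * A^T, which just means swapping the operands and the dimensions. This is an illustration only: colmajor_gemm is a naive stand-in for the Fortran DGEMM (no alpha, beta, or transpose flags), and both function names are hypothetical.

```c
#include <assert.h>

/* Naive stand-in for a Fortran-style GEMM: C = A*B, everything column major.
   A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m). */
void colmajor_gemm(int m, int n, int k, const double *a, int lda,
                   const double *b, int ldb, double *c, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = sum;
        }
}

/* Row-major C = A*B, with A m x k (lda >= k), B k x n (ldb >= n) and
   C m x n (ldc >= n). No data is moved: the operands and the m/n
   dimensions are swapped so the column-major routine computes
   C^T = B^T * A^T, which is the same memory as row-major C. */
void rowmajor_gemm(int m, int n, int k, const double *a, int lda,
                   const double *b, int ldb, double *c, int ldc)
{
    colmajor_gemm(n, m, k, b, ldb, a, lda, c, ldc);
}
```

The swap itself costs nothing at run time, so any performance difference would come from how the underlying BLAS handles the resulting shapes.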
In my case, the call
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, a, lda, b, ldb, beta, c, ldc);
compiled with
pgcc -fastsse -mp driver.c -I /opt/cblas/include/ -L /opt/cblas/lib -L /opt/acml/pgi64_mp/lib -lcblas -lacml_mp -lpgftnrtl
works, and links against the threaded BLAS so that I can use OpenMP for parallelism. My tests show this to be at least as fast as the native Fortran 77, so I am happy.
Though it would still be better if all BLAS library makers would settle on the CBLAS paper above from Netlib.
CBLAS from C++
The cblas.h header does not work when called from C++; the reason is C++ name mangling. It's an easy fix, and I hope the BLAS working group adds it.
Open up cblas.h and add:
#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */
Right after the last include, before anything else is defined.
Then at the very end of the header add:
#ifdef __cplusplus
}
#endif /* __cplusplus */
Right before the last #endif

