931 questions
0
votes
0
answers
9
views
BLAS/LAPACK compatibility
I've been trying to figure out whether newer versions of BLAS/LAPACK are backward compatible with the older releases, but I can't find anything on the netlib website or in the docs.
Are they compatible ...
1
vote
1
answer
63
views
"Invalid read of size 8" warning from Valgrind when calling zhemv blas function in C++
I'm computing a Hermitian (self-adjoint) matrix times complex vector product by means of ZHEMV in BLAS, calling the function from a C++ interface. The problem I see is getting an "...
1
vote
0
answers
92
views
Ifx cannot find modern generic MKL routines like GEMM_F95
I am compiling Fortran code with the ifx compiler (version 2025.0.4) on Windows. I have the Intel MKL library downloaded as well and I am trying to compile a program using it, like this:
ifx test.f90 ...
1
vote
1
answer
185
views
MKL and openBLAS interactions - a question about linking
I'm using a binary (R) that dynamically links to a generic version of BLAS;
in many cases this is OpenBLAS.
Now, inside R, I'm dynamically loading another shared library (...
1
vote
2
answers
68
views
Undefined reference to cblas_* with cmake on windows
I'm working on a project that uses SAF (Spatial Audio Framework), which has OpenBLAS and LAPACK as dependencies. (The project includes a lot of libraries, so I only show the code that relates to my problem:...
1
vote
0
answers
37
views
Confused about cblas_dgemm arguments
Say I want to calculate x^T * Y, where x is an n-by-1 matrix and Y is an n-by-n matrix:
cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const ...
5
votes
2
answers
188
views
crossprod(m1, m2) is running slower than t(m1) %*% m2 on my machine
Why does t(mat1) %*% mat2 run quicker than crossprod(mat1, mat2)? Isn't the whole point of the latter that it calls a more efficient low-level routine?
r$> mat1 <- array(rnorm(100 * 600), dim = ...
5
votes
1
answer
94
views
How to control (BLAS?) parallelization when using mgcv::gam
I am running some fairly large gam models and don't want to parallelize the computations, or at least want to be able to control the degree of parallelization. (Besides not wanting to fry my machine ...
2
votes
1
answer
66
views
Parallelize operations on arrays and merge results into one array using OpenMP
I am trying to speed up a function that, given a complex-valued array arr with n entries, calculates the sum of m operations on that array using BLAS routines, and finally replaces the values of arr with the result.
...
0
votes
0
answers
83
views
Unexpected behaviour of matmul when compiled with blas in Fortran
I am trying to benchmark the BLAS routines dgemv and dgemm in Fortran. For that I have written this simple code:
matmul.f90:
program test ...
1
vote
0
answers
88
views
How to use BLAS in C, using gcc on Linux?
On Linux, in the file a.c, I do #include <cblas.h> and later I do cblas_sgemm(...). Compiling with
gcc -O2 -march=native -fopenmp a.c
or with
gcc -O2 -march=native -lblas -fopenmp a.c
results in ...
0
votes
0
answers
26
views
Is BLAS interface of cvxopt different from standard ones?
According to the official documentation, the description of the BLAS routine tbmv seems no different from the standard one, for example as found in Intel's MKL manual.
However, running the ...
0
votes
0
answers
14
views
Reason behind transposition restrictions in the BLAS interface
I wonder if there is some reason behind the restrictions in the BLAS interface regarding transposition. Unlike gemm, not all routines allow all combinations of transpositions of the input matrices. ...
0
votes
1
answer
138
views
Problems evaluating CUDNN for SGEMM
I used cudnn to test sgemm for C[stride x stride] = A[stride x stride] x B[stride x stride] below,
Configuration
GPU: T1000/SM_75
cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
0
votes
0
answers
88
views
How can I use multithreaded BLAS from a single-threaded Eigen C++ application?
I'm trying to speed up Eigen's dense matrix * matrix operation by using multithreaded BLAS library calls.
I've achieved a 100% speed increase using the AMD AOCL-BLAS library from within Eigen. But I seem ...