All Questions

0 votes
1 answer
234 views

Avoid BLAS when involving temporary memory allocation?

I have a program that computes the matrix product x'Ay repeatedly. Is it better practice to compute this by making calls to MKL's BLAS, i.e. cblas_dgemv and cblas_ddot, which requires allocating ...
Agrim Pathak
  • 3,207
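
To make the trade-off in the question above concrete, here is a minimal pure-Python sketch of the two ways to evaluate x'Ay. It is not the asker's code and does not call MKL; the function names `xtay_with_temp` and `xtay_fused` are hypothetical, and the first function only mimics the gemv-then-dot call pattern (a temporary vector for A*y, then a dot product), while the second accumulates the result in a single pass without a temporary.

```python
def xtay_with_temp(x, A, y):
    """Mimic the gemv + dot pattern: build tmp = A*y, then return x . tmp."""
    # gemv-like step: needs a temporary vector with one entry per row of A
    tmp = [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in A]
    # dot-like step
    return sum(x_i * t_i for x_i, t_i in zip(x, tmp))


def xtay_fused(x, A, y):
    """Accumulate x'Ay row by row without allocating a temporary vector."""
    total = 0.0
    for x_i, row in zip(x, A):
        row_dot = 0.0
        for a_ij, y_j in zip(row, y):
            row_dot += a_ij * y_j
        total += x_i * row_dot
    return total


if __name__ == "__main__":
    x = [1.0, 2.0]
    A = [[1.0, 2.0], [3.0, 4.0]]
    y = [5.0, 6.0]
    print(xtay_with_temp(x, A, y), xtay_fused(x, A, y))
```

Which variant is faster in a real program depends on matrix size and on whether the temporary can be allocated once and reused across calls, so this sketch only illustrates the structure of the two options.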
24 votes
2 answers
21k views

Link ATLAS/MKL to an installed NumPy

TL;DR: how to link ATLAS/MKL to an existing NumPy without rebuilding. I have used NumPy to do calculations with a large matrix and I found that it is very slow because NumPy only uses 1 core to do the calculation....
tndoan
  • 653
0 votes
1 answer
505 views

Efficient implementation of indirect daxpy operation

_axpy is a BLAS level 1 operation which implements the following: for i = 1:n, $a[i] = a[i] - \alpha b[i]$. There are efficient implementations of such regular daxpy available through various BLAS ...
arbitUser1401
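
As a reference point for the formula quoted in the excerpt above, here is a minimal pure-Python sketch of the regular (direct) update a[i] = a[i] - alpha * b[i]. The function name `axpy_sub` is hypothetical, and the sketch covers only the regular case as written in the excerpt; the indirect variant the question asks about is not shown because the excerpt is truncated.

```python
def axpy_sub(a, alpha, b):
    """Update a in place: a[i] = a[i] - alpha * b[i] for every index i."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    for i in range(len(a)):
        a[i] -= alpha * b[i]
    return a


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [1.0, 1.0, 2.0]
    print(axpy_sub(a, 0.5, b))
```

With a BLAS library, the same update would typically be expressed by passing the negated scalar to the standard axpy routine, which computes y = alpha * x + y.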
2 votes
0 answers
300 views

Strange performance issue with AMD's ACML BLAS/LAPACK library

I asked this question over at the AMD developers forum a few days ago, but haven't gotten an answer. Maybe someone here has some insight. http://devgurus.amd.com/thread/167492 I am running ACML ...
mrip
  • 15.2k