All Questions
4 questions
0 votes, 1 answer, 234 views
Avoid blas when involving temporary memory allocation?
I have a program that computes the matrix product x'Ay repeatedly. Is it better practice to compute this by making calls to MKL's BLAS, i.e. cblas_dgemv and cblas_ddot, which requires allocating ...
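For readers skimming this question, here is a minimal sketch of the alternative the title hints at: accumulating x'Ay in a single pass so that no temporary vector is needed. The function name `bilinear_form` and the row-major layout are assumptions made for illustration. It is not MKL code and makes no claim about which approach is faster.

```c
#include <stddef.h>

/* Hypothetical helper: returns x' * A * y for a row-major m-by-n matrix A.
 * The product A*y is never stored; each row's dot product with y is
 * folded into the running total right away. */
double bilinear_form(size_t m, size_t n,
                     const double *x, const double *A, const double *y)
{
    double total = 0.0;
    for (size_t i = 0; i < m; ++i) {
        double row_dot = 0.0;              /* (A*y)[i], kept in a scalar */
        for (size_t j = 0; j < n; ++j)
            row_dot += A[i * n + j] * y[j];
        total += x[i] * row_dot;
    }
    return total;
}
```

The BLAS route would instead write A*y into a scratch vector with cblas_dgemv and then reduce it with cblas_ddot. Whether the hand-written loop beats that depends on matrix size and the BLAS build, so it is worth measuring both.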
24 votes, 2 answers, 21k views
Link ATLAS/MKL to an installed Numpy
TL;DR: How do I link ATLAS/MKL to an existing Numpy installation without rebuilding?
I have been using Numpy for computations on large matrices, and I found that it is very slow because Numpy only uses 1 core for the calculation....
0 votes, 1 answer, 505 views
Efficient implementation of indirect daxpy operation
_axpy is a BLAS level 1 operation which implements the following:
for i = 1:n
    a[i] = a[i] - $\alpha$ b[i]
There are efficient implementations of such a regular daxpy available through various BLAS ...
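As a point of reference, the regular (contiguous) update quoted in the excerpt can be sketched as a plain C loop. The name `axpy_subtract` is made up for this illustration, and the sketch covers only the regular case shown above, not the indirect variant the question asks about.

```c
#include <stddef.h>

/* Hypothetical helper: performs a[i] = a[i] - alpha * b[i] for i in [0, n).
 * Note the zero-based indexing, unlike the 1-based pseudocode above. */
void axpy_subtract(size_t n, double alpha, const double *b, double *a)
{
    for (size_t i = 0; i < n; ++i)
        a[i] -= alpha * b[i];
}
```

With a CBLAS implementation available, the same update should correspond to `cblas_daxpy(n, -alpha, b, 1, a, 1)`, since daxpy computes y = alpha*x + y and the sign is folded into the scalar.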
2 votes, 0 answers, 300 views
Strange performance issue with AMD's ACML BLAS/LAPACK library
I asked this question over at the AMD developers forum a few days ago, but haven't gotten an answer. Maybe someone here has some insight.
http://devgurus.amd.com/thread/167492
I am running ACML ...