
All Questions

5 votes
2 answers
188 views

crossprod(m1, m2) is running slower than t(m1) %*% m2 on my machine

Why does t(mat1) %*% mat2 run faster than crossprod(mat1, mat2)? Isn't the whole point of the latter that it calls a more efficient low-level routine? r$> mat1 <- array(rnorm(100 * 600), dim = ...
Turdle's user avatar
  • 53
14 votes
1 answer
576 views

Faster evaluation of matrix multiplication from right to left

I noticed that evaluating matrix operations in quadratic form from right to left is significantly faster than left to right in R, depending on how the parentheses are placed. Obviously they both ...
Taotao Tan's user avatar
2 votes
1 answer
213 views

Armadillo: Inefficient chaining of .t()

Consider the following two ways of doing the same thing. arma::Mat<double> B(5000,5000,arma::fill::randu); arma::Mat<double> C(5000,500, arma::fill::randu); Okay, two dense matrices in ...
BenG's user avatar
  • 143
1 vote
0 answers
648 views

How does cblas_dcopy fare against memcpy_s and std::copy?

There is a lot of discussion on the comparison between std::copy and memcpy_s in terms of efficiency in copying one array to another. I'd like to know where Intel MKL's cblas_dcopy stands in all of ...
MajinSaha's user avatar
  • 198
1 vote
1 answer
348 views

Optimize eigen recomposition (Matrix - Diagonal Matrix - Matrix) product C++ with BLAS and OpenMP

I wrote C++ code to solve a linear system A.x = b, where A is a symmetric matrix, by first diagonalizing the matrix A = V.D.V^T with LAPACK(E) (because I need the eigenvalues later) and then solving x ...
Toool's user avatar
  • 361
1 vote
1 answer
1k views

Why is numpy's kron so fast?

I was trying to implement a Kronecker product function. Below are three ideas that I have: def kron(arr1, arr2): """columnwise outer product, avoiding relocate elements. """ r1, c1 = arr1....
Chen's user avatar
  • 330
1 vote
1 answer
79 views

C - array function evaluation

Aloha! I am working in C, applying basic functions to all elements of an array with a for loop, and I was wondering if it's possible to speed up this calculation (e.g. with cblas functions). I am ...
fxm's user avatar
  • 135
0 votes
1 answer
202 views

A * B computation when B is a symmetric matrix in armadillo

Is there any way to multiply a symmetric matrix by a dense one in Armadillo (and use the fact that we have a symmetric matrix)? I know about the DSYMM routine in BLAS, but the matrices I'm dealing with are ...
MAh2014's user avatar
  • 147
2 votes
0 answers
271 views

Eigenlib and performance of small matrix operations

I chose eigenlib for my project, since I deal with a lot of small-scale vector and matrix operations. Naturally, I implemented the simple vector-matrix-vector product in eigenlib as this function:...
thogra's user avatar
  • 323
2 votes
0 answers
106 views

cython_blas level 1 routine orders of magnitude faster than Cython for loop

I've come across a performance difference between a call to cblas (namely daxpy: perform y += alpha * x where y and x are vectors of the same length, and alpha is a scalar) and the same operation ...
P. Camilleri's user avatar
  • 13.2k
1 vote
1 answer
4k views

Optimizing numpy array multiplication: * faster than numpy.dot?

Questions: 1) How is it that numpy.dot() is slower than * in the example code below when BLAS is being used? 2) Is there a way that numpy.dot() can be implemented instead of * in this case for ...
0 votes
1 answer
234 views

Avoid blas when involving temporary memory allocation?

I have a program that computes the matrix product x'Ay repeatedly. Is it better practice to compute this by making calls to MKL's blas, i.e. cblas_dgemv and cblas_ddot, which requires allocating ...
Agrim Pathak's user avatar
  • 3,207
5 votes
1 answer
3k views

Is numpy.einsum efficient compared to fortran or C?

I have written a numpy program which is very time consuming. After profiling it, I found that most of the time is spent in numpy.einsum. Although numpy is a wrapper of LAPACK or BLAS, I don't know ...
atbug's user avatar
  • 838
1 vote
1 answer
1k views

performance in linear algebra with python

Benchmarks of different languages and related questions are everywhere on the Internet. However, I still cannot figure out whether I should switch to C in my program. Basically, the most ...
atbug's user avatar
  • 838
9 votes
1 answer
2k views

How to measure overall performance of parallel programs (with papi)

I asked myself what would be the best way to measure the performance (in flops) of a parallel program. I read about papi_flops. This seems to work fine for a serial program. But I don't know how I can ...
Sebastian's user avatar
  • 153
