All Questions
31 questions
0 votes · 1 answer · 138 views
Problems evaluating CUDNN for SGEMM
I used cuDNN to test SGEMM for C[stride x stride] = A[stride x stride] x B[stride x stride], as shown below.
Configuration
GPU: T1000/SM_75
cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
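For comparison, SGEMM on NVIDIA GPUs is usually benchmarked through cuBLAS rather than cuDNN. A minimal timing sketch; the matrix size n, alpha, and beta are arbitrary choices, buffers are left uninitialized since only timing matters, and error checking is omitted:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const int n = 1024;                      // arbitrary square size
    const float alpha = 1.0f, beta = 0.0f;
    float *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)n * n * sizeof(float));
    cudaMalloc(&dB, (size_t)n * n * sizeof(float));
    cudaMalloc(&dC, (size_t)n * n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Warm-up call so one-time library initialization is not timed.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // A GEMM performs 2*n^3 floating-point operations.
    printf("%.2f GFLOP/s\n", 2.0 * n * n * n / (ms * 1e6));

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}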
-1 votes · 1 answer · 191 views
Why does the magma_dgemm function not use tensor cores on the V100 GPU?
I ran the MAGMA testing_dgemm code on both a V100 and an H100 GPU. With Nsight Systems, I found that the code doesn't use tensor cores on the V100, but it does on the H100.
V100 result:
H100 result:
The ...
-1 votes · 2 answers · 790 views
CUDA CSR Matrix-Matrix product transpose by itself
I have a very large, very sparse least-squares design matrix (A), which I would like to multiply by its transpose, as follows: N = A^T * A, where A and N are stored in CSR format. Obviously, A has more ...
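For reference, the Gram matrix N = A^T * A can be accumulated one row of A at a time, since each row k contributes the outer product of its nonzeros to N. A minimal CPU sketch with a dense accumulator (the CSR arrays here are hypothetical; a truly sparse N would come from an SpGEMM routine such as cusparseSpGEMM):

#include <stdlib.h>

// N = A^T * A from CSR A (m rows, n cols): row_ptr[m+1], col_idx and
// val hold the nonzeros. Dense n x n accumulator for clarity only.
double *ata_dense(int m, int n, const int *row_ptr,
                  const int *col_idx, const double *val) {
    double *N = calloc((size_t)n * n, sizeof(double));
    for (int k = 0; k < m; ++k)                        // each row of A
        for (int p = row_ptr[k]; p < row_ptr[k + 1]; ++p)
            for (int q = row_ptr[k]; q < row_ptr[k + 1]; ++q)
                // Row k contributes A[k][i] * A[k][j] to N[i][j].
                N[(size_t)col_idx[p] * n + col_idx[q]] += val[p] * val[q];
    return N;
}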
-1 votes · 1 answer · 3k views
CUBLAS Sgemm confusing results
For two matrices X and Q of sizes 4x3 and 2x3, which in memory look like
x = [0 1 2 3 4 5 6 7 8 9 10 11]
q = [3 4 5 6 7 8]
I tried to use cublas multiplication cublasSgemm, but I couldn't manage to get ...
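The usual culprit here is storage order: cuBLAS assumes column-major matrices, so row-major arrays are silently read as their transposes. A sketch, assuming the intended product is the 4x2 matrix C = X * Q^T:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // Row-major data from the question: X is 4x3, Q is 2x3.
    float x[12] = {0,1,2,3,4,5,6,7,8,9,10,11};
    float q[6]  = {3,4,5,6,7,8};
    float c[8];                       // row-major 4x2 result C = X * Q^T

    float *d_x, *d_q, *d_c;
    cudaMalloc(&d_x, sizeof x); cudaMalloc(&d_q, sizeof q); cudaMalloc(&d_c, sizeof c);
    cudaMemcpy(d_x, x, sizeof x, cudaMemcpyHostToDevice);
    cudaMemcpy(d_q, q, sizeof q, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // cuBLAS is column-major: the buffer of row-major X (4x3) reads as a
    // column-major X^T (3x4), and q reads as Q^T (3x2). Computing the
    // column-major 2x4 product C^T = Q * X^T leaves exactly the
    // row-major 4x2 matrix C = X * Q^T in memory.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                2, 4, 3,              // m, n, k of C^T = Q * X^T
                &alpha, d_q, 3, d_x, 3, &beta, d_c, 2);

    cudaMemcpy(c, d_c, sizeof c, cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; ++i)
        printf("%g %g\n", c[i*2], c[i*2+1]);   // 14 23 / 38 65 / ...

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_q); cudaFree(d_c);
    return 0;
}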
-2 votes · 1 answer · 386 views
Impact of matrix sparsity on cblas sgemm in Ubuntu 14.04
I have recently discovered that the performance of a cblas_sgemm call for matrix multiplication dramatically improves if the matrices have a "large" number of zeros in them. It improves to the point ...
2 votes · 1 answer · 853 views
How large should matrices be if I use BLAS/cuBLAS for it to perform better than plain C/CUDA?
I am currently implementing Stochastic Gradient Descent on a GPU using CUDA, Thrust and cuBLAS.
In my initial implementation I used plain CUDA to perform matrix-vector operations, and now I'm trying ...
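One way to find the crossover point is simply to time both paths over increasing sizes. A sketch comparing a naive one-thread-per-row kernel against cublasSgemv; the sizes and launch configuration are arbitrary, buffers are left uninitialized since only timing matters, and the first iteration will include one-time initialization overhead:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

// Naive matrix-vector product: one thread per output row, A column-major.
__global__ void naive_gemv(int n, const float *A, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float s = 0.0f;
        for (int j = 0; j < n; ++j) s += A[j * n + i] * x[j];
        y[i] = s;
    }
}

int main(void) {
    cublasHandle_t h; cublasCreate(&h);
    cudaEvent_t t0, t1; cudaEventCreate(&t0); cudaEventCreate(&t1);
    const float alpha = 1.0f, beta = 0.0f;

    for (int n = 256; n <= 8192; n *= 2) {
        float *A, *x, *y;
        cudaMalloc(&A, (size_t)n * n * sizeof(float));
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));

        cudaEventRecord(t0);
        naive_gemv<<<(n + 255) / 256, 256>>>(n, A, x, y);
        cudaEventRecord(t1); cudaEventSynchronize(t1);
        float ms_naive; cudaEventElapsedTime(&ms_naive, t0, t1);

        cudaEventRecord(t0);
        cublasSgemv(h, CUBLAS_OP_N, n, n, &alpha, A, n, x, 1, &beta, y, 1);
        cudaEventRecord(t1); cudaEventSynchronize(t1);
        float ms_cublas; cudaEventElapsedTime(&ms_cublas, t0, t1);

        printf("n=%5d  naive %.3f ms  cublas %.3f ms\n", n, ms_naive, ms_cublas);
        cudaFree(A); cudaFree(x); cudaFree(y);
    }
    cublasDestroy(h);
    return 0;
}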
1 vote · 1 answer · 693 views
Does the leading dimension in cuBLAS allow for accessing any submatrix?
I'm trying to understand the idea of the leading dimension in cuBLAS. It's mentioned that lda must always be greater than or equal to the number of rows in a matrix.
If I have a 100x100 matrix A and I ...
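It does: with column-major storage, element (i, j) sits at A[i + j*lda], so any contiguous block is just an offset pointer with lda kept at the parent matrix's row count. A sketch with hypothetical 20x20 blocks of 100x100 matrices:

#include <cublas_v2.h>

// Multiply 20x20 blocks of column-major 100x100 matrices A, B into C.
// (ra, ca) etc. pick the top-left corner of each block; ld stays 100
// because consecutive columns of a block are still 100 floats apart.
void sub_gemm(cublasHandle_t handle,
              const float *d_A, int ra, int ca,
              const float *d_B, int rb, int cb,
              float *d_C, int rc, int cc) {
    const int ld = 100;               // leading dimension of the parents
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, 20, 20, 20,
                &alpha, d_A + ra + (size_t)ca * ld, ld,
                        d_B + rb + (size_t)cb * ld, ld,
                &beta,  d_C + rc + (size_t)cc * ld, ld);
}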
4 votes · 1 answer · 1k views
R and nvblas.dynlib (on a mac)
I have R installed on my Mac via CRAN. I also have OpenBLAS installed via Homebrew. I can switch between BLAS implementations as follows:
Reference blas (netlib I think):
ln -sf /Library/Frameworks/...
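Beyond the symlink, NVBLAS also reads a configuration file (located via the NVBLAS_CONFIG_FILE environment variable) that must name a CPU BLAS to fall back on. A minimal nvblas.conf sketch; the OpenBLAS path is illustrative:

# Minimal nvblas.conf sketch; both paths are illustrative.
NVBLAS_CPU_BLAS_LIB /usr/local/opt/openblas/lib/libopenblas.dylib
NVBLAS_GPU_LIST ALL
NVBLAS_LOGFILE nvblas.log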
0 votes · 0 answers · 68 views
cuBLAS - Issue with cublasSdot and cublasSgemv not taking pointers to GPU memory [duplicate]
I'm playing around with cuBLAS, trying to get a dot product and a matrix-vector product to work. While doing so, I've come across a problem. First off, the code:
float result_1;
cublasSdot_v2(handle, ...
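For context: cublasSdot requires x and y to be device pointers, while the scalar result may live on the host under the default CUBLAS_POINTER_MODE_HOST. A minimal working sketch:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    float x[3] = {1, 2, 3}, y[3] = {4, 5, 6};
    float *d_x, *d_y;
    cudaMalloc(&d_x, sizeof x);
    cudaMalloc(&d_y, sizeof y);
    cudaMemcpy(d_x, x, sizeof x, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, sizeof y, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // x and y must be device pointers; with the default
    // CUBLAS_POINTER_MODE_HOST the scalar result is a host pointer.
    float result;
    cublasSdot(handle, 3, d_x, 1, d_y, 1, &result);
    printf("dot = %f\n", result);   // 1*4 + 2*5 + 3*6 = 32

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}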
1 vote · 1 answer · 290 views
magmablas_dgemm not working for larger grid size
I am new to using CUDA and the MAGMA libraries. I'm trying out some functions on a test problem, a 2D heat equation. The code I wrote seemed to work perfectly for grid sizes of 32, 64, and 128. But it ...
2 votes · 0 answers · 451 views
cuSPARSE csrmm with dense matrix in row-major format
I want to use the cuSPARSE csrmm function to multiply two matrices. The A matrix is sparse and the B matrix is dense. The dense matrix is in row-major format. Is there some nice way (trick) to ...
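The legacy csrmm family has since been deprecated and removed from cuSPARSE; its generic replacement, cusparseSpMM, accepts a row-major dense operand directly via CUSPARSE_ORDER_ROW. A descriptor-level sketch, assuming the device arrays and the work buffer (sized beforehand with cusparseSpMM_bufferSize) are allocated elsewhere:

#include <cusparse.h>

// C (m x n, row-major) = A (m x k, CSR) * B (k x n, row-major).
// All pointers are device memory; dBuffer was sized with
// cusparseSpMM_bufferSize. Error checking omitted for brevity.
void csr_times_rowmajor(cusparseHandle_t handle,
                        int m, int k, int n, int nnz,
                        int *dA_rowptr, int *dA_colind, float *dA_val,
                        float *dB, float *dC, void *dBuffer) {
    cusparseSpMatDescr_t A;
    cusparseDnMatDescr_t B, C;
    cusparseCreateCsr(&A, m, k, nnz, dA_rowptr, dA_colind, dA_val,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    // ld of a row-major matrix is its column count.
    cusparseCreateDnMat(&B, k, n, n, dB, CUDA_R_32F, CUSPARSE_ORDER_ROW);
    cusparseCreateDnMat(&C, m, n, n, dC, CUDA_R_32F, CUSPARSE_ORDER_ROW);

    float alpha = 1.0f, beta = 0.0f;
    cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, A, B, &beta, C, CUDA_R_32F,
                 CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);

    cusparseDestroySpMat(A);
    cusparseDestroyDnMat(B);
    cusparseDestroyDnMat(C);
}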
-2 votes · 1 answer · 2k views
CUDA Library for Computing Kronecker Product [closed]
I have an application that requires me to calculate some large Kronecker products of 2D matrices and multiply the result by large 2D matrices. I would like to implement this on a GPU in CUDA and ...
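cuBLAS has no dedicated Kronecker routine, but the product parallelizes trivially since every output element is a single multiply. A minimal kernel sketch for dense row-major inputs; the dimensions and launch configuration are hypothetical:

// Kronecker product C = A kron B for dense row-major matrices:
// A is m x n, B is p x q, C is (m*p) x (n*q), with
// C[i*p + k][j*q + l] = A[i][j] * B[k][l].
__global__ void kron(const float *A, int m, int n,
                     const float *B, int p, int q,
                     float *C) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int cols = n * q;
    int total = m * p * cols;
    if (idx >= total) return;
    int row = idx / cols, col = idx % cols;
    // Split each C index into its A part (row/p, col/q)
    // and its B part (row%p, col%q).
    C[idx] = A[(row / p) * n + (col / q)] * B[(row % p) * q + (col % q)];
}

// Launch: kron<<<(m*p*n*q + 255) / 256, 256>>>(dA, m, n, dB, p, q, dC);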
1 vote · 1 answer · 2k views
Standard Fortran interface for cuBLAS
I am using a commercial simulation software on Linux that does intensive matrix manipulation. The software uses Intel MKL by default, but it allows me to replace it with a custom BLAS/LAPACK library. ...
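One approach is a thin shared library that exports the Fortran BLAS symbols and forwards to cuBLAS; Fortran passes all arguments by reference and stores matrices column-major, which is exactly what cuBLAS expects. A sketch of such a dgemm_ wrapper; a production shim would cache the handle and keep data resident on the GPU rather than reallocating and copying per call:

#include <cublas_v2.h>
#include <cuda_runtime.h>

// Fortran-callable dgemm_ forwarding to cuBLAS. Only 'N'/'T' handled;
// error checking omitted for brevity.
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *A, const int *lda,
            const double *B, const int *ldb,
            const double *beta, double *C, const int *ldc) {
    cublasOperation_t opA = (*transa == 'N' || *transa == 'n')
                            ? CUBLAS_OP_N : CUBLAS_OP_T;
    cublasOperation_t opB = (*transb == 'N' || *transb == 'n')
                            ? CUBLAS_OP_N : CUBLAS_OP_T;
    int ka = (opA == CUBLAS_OP_N) ? *k : *m;   // columns of A as stored
    int kb = (opB == CUBLAS_OP_N) ? *n : *k;   // columns of B as stored

    double *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)(*lda) * ka * sizeof(double));
    cudaMalloc(&dB, (size_t)(*ldb) * kb * sizeof(double));
    cudaMalloc(&dC, (size_t)(*ldc) * (*n) * sizeof(double));
    cudaMemcpy(dA, A, (size_t)(*lda) * ka * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, (size_t)(*ldb) * kb * sizeof(double), cudaMemcpyHostToDevice);
    // Copy C up too, in case beta is nonzero.
    cudaMemcpy(dC, C, (size_t)(*ldc) * (*n) * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    cublasDgemm(h, opA, opB, *m, *n, *k, alpha, dA, *lda, dB, *ldb, beta, dC, *ldc);
    cublasDestroy(h);

    cudaMemcpy(C, dC, (size_t)(*ldc) * (*n) * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}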
-1 votes · 1 answer · 669 views
Can you use cublasDdot() to use blas operations in non-GPU memory?
So I have code that performs matrix multiplication, but the problem is it returns just zeroes when I use the library -lcublas and the compiler nvcc; however, the code runs great with just a few ...
1 vote · 2 answers · 1k views
How threads/blocks are mapped on GPU while calling cublasSgemm/clAmdBlasSgemm routines?
I am interested in knowing how the cublasSgemm/clAmdBlasSgemm routines are mapped onto the GPU while calculating matrix multiplication (C = A * B).
Assume the dimensions of the input matrices: A_rows = 6144;
...
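The exact mapping is implementation-specific and varies by architecture and problem shape, but library GEMMs broadly follow the classic tiled pattern: each thread block owns one tile of C. A simplified sketch; the 16x16 tile is illustrative, and real kernels use larger, register-blocked tiles:

#define TILE 16

// Simplified tiled GEMM showing the usual block-to-output mapping:
// each thread block computes one TILE x TILE tile of C, each thread one
// element, marching over A and B in TILE-wide strips through shared
// memory. Assumes row-major n x n matrices with n a multiple of TILE.
__global__ void tiled_sgemm(int n, const float *A, const float *B, float *C) {
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n; t += TILE) {
        As[threadIdx.y][threadIdx.x] = A[row * n + (t + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(t + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}

// Launch: dim3 block(TILE, TILE), grid(n / TILE, n / TILE);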