All Questions
31 questions
0 votes · 1 answer · 138 views
Problems evaluating CUDNN for SGEMM
I used cuDNN to test SGEMM for C[stride x stride] = A[stride x stride] x B[stride x stride], as shown below.
Configuration
GPU: T1000/SM_75
cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
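For comparison, SGEMM on NVIDIA GPUs is usually benchmarked through cuBLAS rather than cuDNN. A minimal timing sketch; the matrix size n, alpha, and beta are arbitrary choices, buffers are left uninitialized since only timing matters, and error checking is omitted:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const int n = 1024;                      // arbitrary square size
    const float alpha = 1.0f, beta = 0.0f;
    float *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)n * n * sizeof(float));
    cudaMalloc(&dB, (size_t)n * n * sizeof(float));
    cudaMalloc(&dC, (size_t)n * n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Warm-up call so one-time library initialization is not timed.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // A GEMM performs 2*n^3 floating-point operations.
    printf("%.2f GFLOP/s\n", 2.0 * n * n * n / (ms * 1e6));

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}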
-1 votes · 1 answer · 191 views
Why does the magma_dgemm function not use tensor cores on the V100 GPU?
I ran the MAGMA testing_dgemm code on both a V100 and an H100 GPU. With Nsight Systems, I found that the code doesn't use tensor cores on the V100, but it does on the H100.
V100 result:
H100 result:
The ...
-1 votes · 2 answers · 790 views
CUDA CSR Matrix-Matrix product transpose by itself
I have a very large, very sparse least-squares design matrix (A), which I would like to multiply by its transpose, as follows: N = A^T * A, where A and N are stored in CSR format. Obviously, A has more ...
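For reference, the Gram matrix N = A^T * A can be accumulated one row of A at a time, since each row k contributes the outer product of its nonzeros to N. A minimal CPU sketch with a dense accumulator (the CSR arrays here are hypothetical; a truly sparse N would come from an SpGEMM routine such as cusparseSpGEMM):

#include <stdlib.h>

// N = A^T * A from CSR A (m rows, n cols): row_ptr[m+1], col_idx and
// val hold the nonzeros. Dense n x n accumulator for clarity only.
double *ata_dense(int m, int n, const int *row_ptr,
                  const int *col_idx, const double *val) {
    double *N = calloc((size_t)n * n, sizeof(double));
    for (int k = 0; k < m; ++k)                        // each row of A
        for (int p = row_ptr[k]; p < row_ptr[k + 1]; ++p)
            for (int q = row_ptr[k]; q < row_ptr[k + 1]; ++q)
                // Row k contributes A[k][i] * A[k][j] to N[i][j].
                N[(size_t)col_idx[p] * n + col_idx[q]] += val[p] * val[q];
    return N;
}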
-1 votes · 1 answer · 3k views
CUBLAS Sgemm confusing results
For two matrices X and Q of sizes 4x3 and 2x3, which in memory look like
x = [0 1 2 3 4 5 6 7 8 9 10 11]
q = [3 4 5 6 7 8]
I tried to use cublas multiplication cublasSgemm, but I couldn't manage to get ...
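The usual culprit here is storage order: cuBLAS assumes column-major matrices, so row-major arrays are silently read as their transposes. A sketch, assuming the intended product is the 4x2 matrix C = X * Q^T:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // Row-major data from the question: X is 4x3, Q is 2x3.
    float x[12] = {0,1,2,3,4,5,6,7,8,9,10,11};
    float q[6]  = {3,4,5,6,7,8};
    float c[8];                       // row-major 4x2 result C = X * Q^T

    float *d_x, *d_q, *d_c;
    cudaMalloc(&d_x, sizeof x); cudaMalloc(&d_q, sizeof q); cudaMalloc(&d_c, sizeof c);
    cudaMemcpy(d_x, x, sizeof x, cudaMemcpyHostToDevice);
    cudaMemcpy(d_q, q, sizeof q, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // cuBLAS is column-major: the buffer of row-major X (4x3) reads as a
    // column-major X^T (3x4), and q reads as Q^T (3x2). Computing the
    // column-major 2x4 product C^T = Q * X^T leaves exactly the
    // row-major 4x2 matrix C = X * Q^T in memory.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                2, 4, 3,              // m, n, k of C^T = Q * X^T
                &alpha, d_q, 3, d_x, 3, &beta, d_c, 2);

    cudaMemcpy(c, d_c, sizeof c, cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; ++i)
        printf("%g %g\n", c[i*2], c[i*2+1]);   // 14 23 / 38 65 / ...

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_q); cudaFree(d_c);
    return 0;
}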
-2 votes · 1 answer · 386 views
Impact of matrix sparsity on cblas sgemm in Ubuntu 14.04
I have recently discovered that the performance of a cblas_sgemm call for matrix multiplication dramatically improves if the matrices have a "large" number of zeros in them. It improves to the point ...
2 votes · 1 answer · 853 views
How large should matrices be if I use BLAS/cuBLAS for it to perform better than plain C/CUDA?
I am currently implementing Stochastic Gradient Descent on a GPU using CUDA, Thrust and cuBLAS.
In my initial implementation I used plain CUDA to perform matrix-vector operations, and now I'm trying ...
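One way to find the crossover point is simply to time both paths over increasing sizes. A sketch comparing a naive one-thread-per-row kernel against cublasSgemv; the sizes and launch configuration are arbitrary, buffers are left uninitialized since only timing matters, and the first iteration will include one-time initialization overhead:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

// Naive matrix-vector product: one thread per output row, A column-major.
__global__ void naive_gemv(int n, const float *A, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float s = 0.0f;
        for (int j = 0; j < n; ++j) s += A[j * n + i] * x[j];
        y[i] = s;
    }
}

int main(void) {
    cublasHandle_t h; cublasCreate(&h);
    cudaEvent_t t0, t1; cudaEventCreate(&t0); cudaEventCreate(&t1);
    const float alpha = 1.0f, beta = 0.0f;

    for (int n = 256; n <= 8192; n *= 2) {
        float *A, *x, *y;
        cudaMalloc(&A, (size_t)n * n * sizeof(float));
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));

        cudaEventRecord(t0);
        naive_gemv<<<(n + 255) / 256, 256>>>(n, A, x, y);
        cudaEventRecord(t1); cudaEventSynchronize(t1);
        float ms_naive; cudaEventElapsedTime(&ms_naive, t0, t1);

        cudaEventRecord(t0);
        cublasSgemv(h, CUBLAS_OP_N, n, n, &alpha, A, n, x, 1, &beta, y, 1);
        cudaEventRecord(t1); cudaEventSynchronize(t1);
        float ms_cublas; cudaEventElapsedTime(&ms_cublas, t0, t1);

        printf("n=%5d  naive %.3f ms  cublas %.3f ms\n", n, ms_naive, ms_cublas);
        cudaFree(A); cudaFree(x); cudaFree(y);
    }
    cublasDestroy(h);
    return 0;
}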
1 vote · 1 answer · 693 views
Does the leading dimension in cuBLAS allow for accessing any submatrix?
I'm trying to understand the idea of the leading dimension in cuBLAS. It's mentioned that lda must always be greater than or equal to the number of rows in a matrix.
If I have a 100x100 matrix A and I ...
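It does: with column-major storage, element (i, j) sits at A[i + j*lda], so any contiguous block is just an offset pointer with lda kept at the parent matrix's row count. A sketch with hypothetical 20x20 blocks of 100x100 matrices:

#include <cublas_v2.h>

// Multiply 20x20 blocks of column-major 100x100 matrices A, B into C.
// (ra, ca) etc. pick the top-left corner of each block; ld stays 100
// because consecutive columns of a block are still 100 floats apart.
void sub_gemm(cublasHandle_t handle,
              const float *d_A, int ra, int ca,
              const float *d_B, int rb, int cb,
              float *d_C, int rc, int cc) {
    const int ld = 100;               // leading dimension of the parents
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, 20, 20, 20,
                &alpha, d_A + ra + (size_t)ca * ld, ld,
                        d_B + rb + (size_t)cb * ld, ld,
                &beta,  d_C + rc + (size_t)cc * ld, ld);
}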
4 votes · 1 answer · 1k views
R and nvblas.dynlib (on a mac)
I have R installed on my Mac via CRAN. I also have OpenBLAS installed via Homebrew. I can switch between BLAS implementations as follows:
Reference blas (netlib I think):
ln -sf /Library/Frameworks/...
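Beyond the symlink, NVBLAS also reads a configuration file (located via the NVBLAS_CONFIG_FILE environment variable) that must name a CPU BLAS to fall back on. A minimal nvblas.conf sketch; the OpenBLAS path is illustrative:

# Minimal nvblas.conf sketch; both paths are illustrative.
NVBLAS_CPU_BLAS_LIB /usr/local/opt/openblas/lib/libopenblas.dylib
NVBLAS_GPU_LIST ALL
NVBLAS_LOGFILE nvblas.log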
0 votes · 0 answers · 68 views
cuBLAS - Issue with cublasSdot and cublasSgemv not taking pointers to GPU memory [duplicate]
I'm playing around with cuBLAS, trying to get a dot product and a matrix-vector product to work. While doing so, I've come across a problem. First off, the code:
float result_1;
cublasSdot_v2(handle, ...
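For context: cublasSdot requires x and y to be device pointers, while the scalar result may live on the host under the default CUBLAS_POINTER_MODE_HOST. A minimal working sketch:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    float x[3] = {1, 2, 3}, y[3] = {4, 5, 6};
    float *d_x, *d_y;
    cudaMalloc(&d_x, sizeof x);
    cudaMalloc(&d_y, sizeof y);
    cudaMemcpy(d_x, x, sizeof x, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, sizeof y, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // x and y must be device pointers; with the default
    // CUBLAS_POINTER_MODE_HOST the scalar result is a host pointer.
    float result;
    cublasSdot(handle, 3, d_x, 1, d_y, 1, &result);
    printf("dot = %f\n", result);   // 1*4 + 2*5 + 3*6 = 32

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}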
1 vote · 1 answer · 290 views
magmablas_dgemm not working for larger grid size
I am new to using CUDA and the MAGMA libraries. I'm trying out some functions on a test problem, a 2D heat equation. The code I wrote seemed to work perfectly for grid sizes of 32, 64, and 128. But it ...
2 votes · 0 answers · 451 views
cuSPARSE csrmm with dense matrix in row-major format
I want to use the cuSPARSE csrmm function to multiply two matrices. The A matrix is sparse and the B matrix is dense. The dense matrix is in row-major format. Is there some nice way (trick) to ...
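The legacy csrmm family has since been deprecated and removed from cuSPARSE; its generic replacement, cusparseSpMM, accepts a row-major dense operand directly via CUSPARSE_ORDER_ROW. A descriptor-level sketch, assuming the device arrays and the work buffer (sized beforehand with cusparseSpMM_bufferSize) are allocated elsewhere:

#include <cusparse.h>

// C (m x n, row-major) = A (m x k, CSR) * B (k x n, row-major).
// All pointers are device memory; dBuffer was sized with
// cusparseSpMM_bufferSize. Error checking omitted for brevity.
void csr_times_rowmajor(cusparseHandle_t handle,
                        int m, int k, int n, int nnz,
                        int *dA_rowptr, int *dA_colind, float *dA_val,
                        float *dB, float *dC, void *dBuffer) {
    cusparseSpMatDescr_t A;
    cusparseDnMatDescr_t B, C;
    cusparseCreateCsr(&A, m, k, nnz, dA_rowptr, dA_colind, dA_val,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    // ld of a row-major matrix is its column count.
    cusparseCreateDnMat(&B, k, n, n, dB, CUDA_R_32F, CUSPARSE_ORDER_ROW);
    cusparseCreateDnMat(&C, m, n, n, dC, CUDA_R_32F, CUSPARSE_ORDER_ROW);

    float alpha = 1.0f, beta = 0.0f;
    cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, A, B, &beta, C, CUDA_R_32F,
                 CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);

    cusparseDestroySpMat(A);
    cusparseDestroyDnMat(B);
    cusparseDestroyDnMat(C);
}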
-2 votes · 1 answer · 2k views
CUDA Library for Computing Kronecker Product [closed]
I have an application that requires me to calculate some large Kronecker products of 2D matrices and multiply the result by large 2D matrices. I would like to implement this on a GPU in CUDA and ...
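cuBLAS has no dedicated Kronecker routine, but the product parallelizes trivially since every output element is a single multiply. A minimal kernel sketch for dense row-major inputs; the dimensions and launch configuration are hypothetical:

// Kronecker product C = A kron B for dense row-major matrices:
// A is m x n, B is p x q, C is (m*p) x (n*q), with
// C[i*p + k][j*q + l] = A[i][j] * B[k][l].
__global__ void kron(const float *A, int m, int n,
                     const float *B, int p, int q,
                     float *C) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int cols = n * q;
    int total = m * p * cols;
    if (idx >= total) return;
    int row = idx / cols, col = idx % cols;
    // Split each C index into its A part (row/p, col/q)
    // and its B part (row%p, col%q).
    C[idx] = A[(row / p) * n + (col / q)] * B[(row % p) * q + (col % q)];
}

// Launch: kron<<<(m*p*n*q + 255) / 256, 256>>>(dA, m, n, dB, p, q, dC);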
1 vote · 1 answer · 2k views
Standard Fortran interface for cuBLAS
I am using a commercial simulation software on Linux that does intensive matrix manipulation. The software uses Intel MKL by default, but it allows me to replace it with a custom BLAS/LAPACK library. ...
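One approach is a thin shared library that exports the Fortran BLAS symbols and forwards to cuBLAS; Fortran passes all arguments by reference and stores matrices column-major, which is exactly what cuBLAS expects. A sketch of such a dgemm_ wrapper; a production shim would cache the handle and keep data resident on the GPU rather than reallocating and copying per call:

#include <cublas_v2.h>
#include <cuda_runtime.h>

// Fortran-callable dgemm_ forwarding to cuBLAS. Only 'N'/'T' handled;
// error checking omitted for brevity.
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *A, const int *lda,
            const double *B, const int *ldb,
            const double *beta, double *C, const int *ldc) {
    cublasOperation_t opA = (*transa == 'N' || *transa == 'n')
                            ? CUBLAS_OP_N : CUBLAS_OP_T;
    cublasOperation_t opB = (*transb == 'N' || *transb == 'n')
                            ? CUBLAS_OP_N : CUBLAS_OP_T;
    int ka = (opA == CUBLAS_OP_N) ? *k : *m;   // columns of A as stored
    int kb = (opB == CUBLAS_OP_N) ? *n : *k;   // columns of B as stored

    double *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)(*lda) * ka * sizeof(double));
    cudaMalloc(&dB, (size_t)(*ldb) * kb * sizeof(double));
    cudaMalloc(&dC, (size_t)(*ldc) * (*n) * sizeof(double));
    cudaMemcpy(dA, A, (size_t)(*lda) * ka * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, (size_t)(*ldb) * kb * sizeof(double), cudaMemcpyHostToDevice);
    // Copy C up too, in case beta is nonzero.
    cudaMemcpy(dC, C, (size_t)(*ldc) * (*n) * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    cublasDgemm(h, opA, opB, *m, *n, *k, alpha, dA, *lda, dB, *ldb, beta, dC, *ldc);
    cublasDestroy(h);

    cudaMemcpy(C, dC, (size_t)(*ldc) * (*n) * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}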
-1 votes · 1 answer · 669 views
Can you use cublasDdot() to use blas operations in non-GPU memory?
So I have code that performs matrix multiplication, but the problem is it returns just zeroes when I use the library -lcublas and the compiler nvcc; however, the code runs great with just a few ...
1 vote · 2 answers · 1k views
How threads/blocks are mapped on GPU while calling cublasSgemm/clAmdBlasSgemm routines?
I am interested in knowing how the cublasSgemm/clAmdBlasSgemm routines are mapped onto the GPU while calculating matrix multiplication (C = A * B).
Assume the dimensions of the input matrices: A_rows = 6144;
...
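The exact mapping is implementation-specific and varies by architecture and problem shape, but library GEMMs broadly follow the classic tiled pattern: each thread block owns one tile of C. A simplified sketch; the 16x16 tile is illustrative, and real kernels use larger, register-blocked tiles:

#define TILE 16

// Simplified tiled GEMM showing the usual block-to-output mapping:
// each thread block computes one TILE x TILE tile of C, each thread one
// element, marching over A and B in TILE-wide strips through shared
// memory. Assumes row-major n x n matrices with n a multiple of TILE.
__global__ void tiled_sgemm(int n, const float *A, const float *B, float *C) {
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n; t += TILE) {
        As[threadIdx.y][threadIdx.x] = A[row * n + (t + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(t + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}

// Launch: dim3 block(TILE, TILE), grid(n / TILE, n / TILE);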