CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
Problem: the approximate method can still be slow when the model has many trees.
catboost version: master
Operating System: Ubuntu 18.04
CPU: i9
GPU: RTX 2080
It would be good to be able to specify how many trees to use when computing Shapley values. The model.predict and prediction_type paths already allow this, and lgbm/xgb allow it as well.
Describe the bug
When compiling v0.13.0, the build fails with:
/home/zannzetsu/open3d/src/open3d/cpp/open3d/visualization/rendering/filament/FilamentEntitiesMods.cpp:212:22: required from here
/usr/include/fmt/core.h:1579:7: error: static assertion failed: Cannot format an argument. To make type T formattable provide a formatter<T> specialization: https://fmt.dev/latest/api.html#udt
For feature engineering tasks, I'd like to be able to determine whether a datetime is the beginning or end of a year, like I can in pandas.
import pandas as pd
s = pd.Series(["2021-02-27", "2020-03-31"], dtype="datetime64[ms]")
s.dt.is_year_end
0    False
1    False
dtype: bool

import pandas as pd
s = pd.Series(["2021-01-01", "2020-04-01"], dtype="datetime64[ms]")
s.dt.is_year_start
0     True
1    False
dtype: bool
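As a stopgap where the accessor is not yet implemented, the same flag can be derived from the month and day components. A minimal sketch in plain pandas (the manual check assumes the default calendar year end, Dec 31):

```python
import pandas as pd

s = pd.Series(["2021-12-31", "2020-03-31"], dtype="datetime64[ns]")

# Direct accessor, as pandas provides it:
direct = s.dt.is_year_end

# Equivalent manual check, usable where is_year_end is missing:
manual = (s.dt.month == 12) & (s.dt.day == 31)

print(direct.tolist())  # [True, False]
```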
The current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
We're seeing a lot of warnings on Linux with host compiler GCC 9.3.0 and -Wconversion.
For a basic example, compile:

#include <thrust/device_vector.h>

int main() {
  thrust::device_vector<int> a;
  return 0;
}

with:

nvcc main.cu -Xcompiler=-Wconversion

This results in around 1000 lines of warnings. Tested with the Thrust versions in CUDA 11.0, 11.3, and the latest.
Is it possible to produce a tmfile directly from training? Converting with tengine-convert-tool gives an error.
tengine-lite library version: 1.4-dev
Get input tensor failed
Or is there an example that can train the tmfile below?
would serve the same purpose and be more concise. We can just factor out the code from futhark bench for this.
Created by Nvidia
Released June 23, 2007
- Website: developer.nvidia.com/cuda-zone
- Wikipedia
Reporting a bug
visible in the change log (https://github.com/numba/numba/blob/master/CHANGE_LOG).
i.e. it's possible to run as 'python bug.py'.
(a static analyzer bug report)
In file numba/np/ufunc/omppool.cpp, `numba/np