gpu-computing

Usually, after training a model, I save it in C++ format with:

cat_model.save_model('a', format="cpp")
cat_model.save_model('b', format="cpp")

But my C++ program needs to use multiple models.

In my main.cpp:

#include "a.hpp"
#include "b.hpp"

int main() {
  // do something
  double a_pv = ApplyCatboostModel({1.2, 2.3});  // I want to use a.hpp's model here
  double b_pv
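
One possible workaround, sketched here under assumptions, is to give each exported model its own namespace so the two `ApplyCatboostModel` definitions no longer collide. The sketch does not use the real generated headers: the two stub functions below are hypothetical stand-ins for what `a.hpp` and `b.hpp` might define, and the signature taking `const std::vector<float>&` is an assumption, as are the names `model_a`, `model_b`, `PredictA`, and `PredictB`.

```cpp
#include <cassert>
#include <vector>

// Stand-in for: namespace model_a { #include "a.hpp" }
namespace model_a {
// Hypothetical stub; a real exported header would evaluate the trained model.
inline double ApplyCatboostModel(const std::vector<float>& features) {
  return features.empty() ? 0.0 : 1.0;
}
}  // namespace model_a

// Stand-in for: namespace model_b { #include "b.hpp" }
namespace model_b {
// Hypothetical stub returning a different constant so the two are distinguishable.
inline double ApplyCatboostModel(const std::vector<float>& features) {
  return features.empty() ? 0.0 : 2.0;
}
}  // namespace model_b

// Thin wrappers so each call site names the model it wants explicitly.
inline double PredictA(const std::vector<float>& features) {
  return model_a::ApplyCatboostModel(features);
}

inline double PredictB(const std::vector<float>& features) {
  return model_b::ApplyCatboostModel(features);
}
```

If the real headers are wrapped this way, any standard headers they pull in should be included before the namespace block, so that their include guards keep standard library declarations out of `model_a` and `model_b`.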

The Heston model admits accurate density approximations for European option prices, which are of interest.

The module implementing this method should live under tf_quant_finance/volatility/heston_approximation.py. It should support approximations for both European puts and calls. Tests should be in heston_approximation_test.py in the same folder.

The standard accelerate test suite, used by all the backends, can be quite slow. Several of the tests are significantly slower than the others, for example segmented folds and scans, which I believe is because the reference implementations are very inefficient. Writing some more efficient reference implementations (e.g. using Data.Vector.Unboxed) should help speed things up.

Open issue to openly discuss potential ideas or improvements, whether on documentation, interfaces, examples, bug fixes, etc.

Hi,

One could and should experiment with interprocedural optimization (IPO), also known as link-time optimization (LTO), especially on the host side, for smaller binaries and potentially faster code. It is supported by GCC, Clang, and ICC, among others, which are our typical go-to compilers in HPC.

It's very easy to implement as well

Bug summary
There is evidence that sub_group::get_group_id() does not return the same value as threadIdx.x / warpSize (assuming a 1D kernel), which is what would be expected on CUDA. We should check the implementation of this function. Our implementation performs bit manipulation magic; presumably the optimization went too far.
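
As a rough host-side illustration of the expected value, the sketch below computes the reference result `local id / sub-group size` and compares it against a shift-based variant. The shift-based function is purely hypothetical: the report only says the implementation uses bit manipulation, not which one, and all function names here are made up for the example. It shows how such an optimization can silently diverge when the sub-group size is not a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Reference value: the CUDA-style expression threadIdx.x / warpSize in 1D.
inline std::size_t ExpectedSubGroupId(std::size_t local_linear_id,
                                      std::size_t sub_group_size) {
  return local_linear_id / sub_group_size;
}

// Hypothetical shift-based variant. It rounds the sub-group size up to the
// next power of two, so it only matches the division for power-of-two sizes.
inline std::size_t ShiftSubGroupId(std::size_t local_linear_id,
                                   std::size_t sub_group_size) {
  std::size_t shift = 0;
  while ((std::size_t{1} << shift) < sub_group_size) {
    ++shift;
  }
  return local_linear_id >> shift;
}

// Returns true if both computations agree for every local id in a work-group.
inline bool SubGroupIdsAgree(std::size_t work_group_size,
                             std::size_t sub_group_size) {
  for (std::size_t id = 0; id < work_group_size; ++id) {
    if (ExpectedSubGroupId(id, sub_group_size) !=
        ShiftSubGroupId(id, sub_group_size)) {
      return false;
    }
  }
  return true;
}
```

A device-side check would compare the value reported inside the kernel against `ExpectedSubGroupId` in the same way.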

To Reproduce
Compare sub_group{}.get_group_id() or `sub

M: Mute (muting is not a node-wrangler feature, but I include it here because it's also node editor quality of life)
Ctrl+Shift+LMB: View texture, material or volume node (create emission viewer if necessary)
Ctrl+T: Create image node+attached mapping node
Ctrl+Shift+T: Open file picker, user selects a bunch of textures, create disney material with textures attached to t

The problem is that the OpenCL types in https://github.com/triSYCL/triSYCL/blob/master/include/triSYCL/opencl_types.hpp are defined on the host according to the x86-64 Linux ABI which depends on the CPU & OS instead of using the description from https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#built-in-scalar-data-types

Note that the system-wide cl_size_t has been removed
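
One way to decouple these types from the host ABI, shown here only as a sketch, is to define them with the fixed-width aliases from `<cstdint>`, following the widths the OpenCL C specification gives for the built-in scalar types (8, 16, 32, and 64 bits for char, short, int, and long). The namespace name `portable_cl` is hypothetical and is not taken from triSYCL.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

namespace portable_cl {  // hypothetical namespace, for illustration only

// Widths follow the OpenCL C built-in scalar types, not the host ABI.
using cl_char = std::int8_t;
using cl_uchar = std::uint8_t;
using cl_short = std::int16_t;
using cl_ushort = std::uint16_t;
using cl_int = std::int32_t;
using cl_uint = std::uint32_t;
using cl_long = std::int64_t;
using cl_ulong = std::uint64_t;

}  // namespace portable_cl

// Compile-time checks that hold on any host providing the exact-width types.
static_assert(sizeof(portable_cl::cl_int) * CHAR_BIT == 32,
              "cl_int must be 32 bits");
static_assert(sizeof(portable_cl::cl_long) * CHAR_BIT == 64,
              "cl_long must be 64 bits");
```

With plain `long`, the second assertion would fail on hosts where `long` is 32 bits, which is the kind of ABI dependence described above.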

Here are 477 public repositories matching this topic...

catboost / catboost

taskflow / taskflow

google / tf-quant-finance

tensorflow / lingvo

microsoft / pai

calebwin / emu

jbush001 / NyuziProcessor

inducer / pycuda

mitmath / 18337

uncomplicate / neanderthal

BindsNET / bindsnet

mratsim / Arraymancer

AccelerateHS / accelerate

Langhalsdino / Kubernetes-GPU-Guide

LuxCoreRender / LuxCore

stotko / stdgpu

KomputeProject / kompute

ComputationalRadiationPhysics / picongpu

illuhad / hipSYCL

zszazi / Deep-learning-in-cloud

eyalroz / cuda-api-wrappers

LuxCoreRender / BlendLuxCore

triSYCL / triSYCL

huiscliu / Tutorials

uncomplicate / bayadera

mikbry / awesome-webgpu

Glavnokoman / vuh

favreau / Sol-R

uncomplicate / clojurecl

googlefonts / compute-shader-101
