0 votes
0 answers
41 views

tf.math.segment_mean operation not working with XLA_GPU_JIT in Google Colab

I am using Google Colab Pro with a T4 GPU for testing. My model is a GNN that finds relational data between chess pieces on a board. When I ran my code on a CPU it ran perfectly fine, but ...
MessiSkillz
0 votes
0 answers
50 views

XLA can't find algorithm for grouped convolutions with Conv3D

I have a Conv3D layer in my model. It produces correct results; however, each time I run it, TensorFlow emits the following warning: W tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker....
Daniyar
  • 11
0 votes
0 answers
38 views

CUDA graph launch core dumps when profiling with nsys

A CUDA graph launch hits a segmentation fault when nsys is used for profiling. The stack trace is as follows: cudaProfilerStart for nsys Capture range started in the application Collecting data... *** SIGSEGV (@...
Mengli Cheng
1 vote
0 answers
554 views

xla_latency_hiding_scheduler_rerun causing Colab TPU runtime crash

For the life of me, I cannot run my training script on Google Colab Pro TPUv2 runtime. I've been upgrading, downgrading, uninstalling, reinstalling to try and get all my dependencies compatible, but ...
therealtheruss
1 vote
1 answer
523 views

TPU V4-64 Runtime Error: TPU initialization failed: Failed to establish SliceBuilder grpc channel

During the TPU Research Program, I tried to use TPU V4-64, as I have 32 free on-demand TPU V4 chips. However, unlike with TPU V4-8, the test code provided in the tutorial didn't work whenever I used TPU V4-...
gnsrnjs
  • 11
1 vote
0 answers
152 views

jax sum creates a huge intermediate array slowing down GPU performance

I'm trying to create a jax function that selects a set of values from a 2D array and adds them together in a vectorized manner. More specifically, given an (R x C) array data and a (1 x C) array of ...
0 votes
0 answers
16 views

Use XLA automatic grouping feature in a specific layer of a model

I have a tf.keras model with multiple layers. In one of the layers, there are a lot of small linear algebra operations which are causing sparse GPU utilization (I observe this in the nsight systems ...
Aviraj Bevli
1 vote
1 answer
284 views

Why does JAX treat the same list as a different data structure depending on whether a new array is appended inside the function?

I am very new to JAX. Please excuse me if this is something obvious or I am making a stupid mistake. I am trying to implement a function which does the following. All these functions will be called ...
Endeavour
0 votes
0 answers
581 views

deepFace: No module named 'tensorflow.keras'

I tried downgrading from tensorflow 2.16.1 to 2.15.0, because when I ran it with tensorflow 2.16.1 I got a "tf-keras not found" error. I now have tensorflow 2.15.0 (Ubuntu). How can I solve this ...
qasim
  • 13
2 votes
1 answer
2k views

Why does tensorflow.function (without jit_compile) speed up forward passes of a Keras model?

XLA can be enabled using model = tf.function(model, jit_compile=True). Some model types are faster that way, some are slower. So far, so good. But why can model = tf.function(model, jit_compile=None) ...
Tobias Hermann
1 vote
1 answer
357 views

How can I test if a jitted JAX function creates a new tensor or a view?

I have some basic code like this: @jit def concat_permute(indices, in1, in2): tensor = jnp.concatenate([jnp.atleast_1d(in1), jnp.atleast_1d(in2)]) return tensor[indices] Here are my test tensors: ...
nazimorhan
0 votes
1 answer
78 views

Why Is Scalar Multiply Before Einsum Faster?

In the TensorFlow Keras implementation of Multi-Head Attention, instead of evaluating the numerator QKᵀ first, they evaluate Q/√dₖ first and add the comment: Note: Applying scalar multiply at the ...
rkuang25
  • 141
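
For the scalar-multiply question above, the two orderings are mathematically equivalent: (Q/√dₖ)Kᵀ = (QKᵀ)/√dₖ. The following is a minimal pure-Python sketch, not the Keras implementation. The helper names (`matmul_t`, `scale`) and the sizes `T` and `d` are assumptions made up for illustration. It only shows that the orderings differ in how many elements get scaled: T·d when Q is scaled first versus T·T when the scores are scaled afterwards. Whether that alone accounts for the speedup under XLA is not something the excerpt settles.

```python
import math

def matmul_t(a, b):
    """Return a @ b^T for nested lists (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def scale(m, s):
    """Multiply every element of m by s; also return the multiply count."""
    scaled = [[v * s for v in row] for row in m]
    return scaled, sum(len(row) for row in m)

T, d = 8, 2  # assumed sequence length and key dimension, for illustration only
Q = [[(i + 1) * 0.1 + j for j in range(d)] for i in range(T)]
K = [[(i + 2) * 0.05 - j for j in range(d)] for i in range(T)]
s = 1.0 / math.sqrt(d)

# Ordering A: scale Q (T x d elements), then take the dot products.
q_scaled, mults_before = scale(Q, s)
scores_a = matmul_t(q_scaled, K)

# Ordering B: take the dot products, then scale the scores (T x T elements).
scores_b, mults_after = scale(matmul_t(Q, K), s)

# Largest elementwise gap between the two results (floating-point noise only).
max_diff = max(abs(x - y)
               for ra, rb in zip(scores_a, scores_b)
               for x, y in zip(ra, rb))
```

With these assumed sizes, `mults_before` counts T·d scalar multiplies and `mults_after` counts T·T, while `max_diff` stays at rounding-error level.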
0 votes
1 answer
465 views

HLO protobuf to PyTorch / TensorFlow graph

Assume we have an HLO protobuf from a model, obtained through PyTorch-XLA or TensorFlow. Is there a way to create a computational graph from it? Is it possible to create a PyTorch-XLA or TensorFlow model from it? ...
Roy
  • 167
0 votes
1 answer
657 views

Are tensor sharding and tensor tiling the same implementation?

I understand the concepts of tensor sharding and tensor tiling individually, but are there any differences between them, especially with regard to XLA/HLO or GSPMD in parallel training (data parallel or model ...
YuGyoung Yun
3 votes
0 answers
301 views

Is it possible to use XLA in TensorFlow with variable input shape?

I am trying to use XLA to speed up the training of my model in TF 2.10. However, my input data shape varies, i.e. batch.shape = TensorShape([X, 4]) with X varying ...
George El Haber
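
For the variable-shape question above, one commonly suggested workaround is to pad each batch up to one of a few fixed bucket sizes, so that a compiler which specializes on static shapes sees only a handful of distinct shapes. The sketch below is plain Python and is not taken from the question. The names `bucket_size` and `pad_batch`, the power-of-two bucketing policy, and the zero padding are all assumptions, and padded rows would still need to be masked out of the loss.

```python
def bucket_size(n, min_bucket=8):
    """Smallest power-of-two multiple of min_bucket that is >= n (assumed policy)."""
    size = min_bucket
    while size < n:
        size *= 2
    return size

def pad_batch(rows, width=4, pad_value=0.0):
    """Pad a list of `width`-long rows up to its bucket size.

    Returns the padded rows and a 0/1 mask marking which rows are real.
    """
    target = bucket_size(len(rows))
    n_pad = target - len(rows)
    padded = [list(r) for r in rows] + [[pad_value] * width for _ in range(n_pad)]
    mask = [1] * len(rows) + [0] * n_pad
    return padded, mask

# Batches with X = 5, 8, 9 and 13 rows collapse into two padded shapes.
shapes = set()
for x in (5, 8, 9, 13):
    batch = [[float(i)] * 4 for i in range(x)]
    padded, mask = pad_batch(batch)
    shapes.add((len(padded), len(padded[0])))
```

The trade-off is wasted compute on padding rows in exchange for fewer distinct shapes; the bucket sizes would need tuning to the actual distribution of X.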
