88 questions
0 votes · 0 answers · 41 views
tf.math.segment_mean operation not working with XLA_GPU_JIT in Google Colab
I am using Google Colab Pro with a T4 GPU for testing; my model is a GNN used to find relational data between chess pieces on a board. When I ran my code on a CPU it ran perfectly fine, but ...
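A minimal repro sketch of the setup described above (layer sizes, variable names, and the segment layout are assumptions, not taken from the question): tf.math.segment_mean pools rows that share a segment id, and wrapping it in tf.function(jit_compile=True) is the combination the title says fails on the GPU runtime.
import tensorflow as tf

@tf.function(jit_compile=True)          # XLA_GPU_JIT path when run on a GPU runtime
def pool_piece_features(data, segment_ids):
    # data: [num_pieces, feature_dim]; segment_ids: sorted int32 id per piece
    return tf.math.segment_mean(data, segment_ids)

data = tf.random.normal([6, 4])
segment_ids = tf.constant([0, 0, 1, 1, 2, 2])
# Runs fine eagerly / on CPU; the question reports the jit-compiled GPU call failing.
print(pool_piece_features(data, segment_ids))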
0 votes · 0 answers · 50 views
XLA can't find algorithm for grouped convolutions with Conv3D
I have a Conv3D layer in my model. It produces correct results; however, each time I run it, TensorFlow throws the following warning:
W tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker....
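For context, a grouped 3-D convolution of the kind that triggers this algorithm-picker warning can be sketched as below; all sizes here are assumptions, not taken from the question.
import tensorflow as tf

# A Conv3D with groups > 1 is what XLA's gpu_conv_algorithm_picker has to handle.
layer = tf.keras.layers.Conv3D(filters=32, kernel_size=3, groups=4, padding="same")
x = tf.random.normal([1, 8, 8, 8, 16])   # [batch, depth, height, width, channels]
y = layer(x)                             # channels (16) must be divisible by groups (4)
print(y.shape)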
0 votes · 0 answers · 38 views
CUDA graph launch core dumps when profiling with nsys
The CUDA graph launch segfaults when nsys is used for profiling; the stack trace is as follows:
cudaProfilerStart for nsys
Capture range started in the application
Collecting data...
*** SIGSEGV (@...
1 vote · 0 answers · 554 views
xla_latency_hiding_scheduler_rerun causing Colab TPU runtime crash
For the life of me, I cannot run my training script on Google Colab Pro TPUv2 runtime. I've been upgrading, downgrading, uninstalling, reinstalling to try and get all my dependencies compatible, but ...
1 vote · 1 answer · 523 views
TPU V4-64 Runtime Error: TPU initialization failed: Failed to establish SliceBuilder grpc channel
During the TPU Research Program, I tried to use TPU V4-64 as I have 32 free on-demand TPU V4 chips.
However, unlike TPU V4-8, the test codes provided in the tutorial didn't work whenever I used TPU V4-...
1 vote · 0 answers · 152 views
jax sum creates a huge intermediate array slowing down GPU performance
I'm trying to create a JAX function that selects a set of values from a 2D array and adds them together in a vectorized manner. More specifically, given an (R x C) array data and a (1 x C) array of ...
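One vectorized way to express this kind of select-and-sum is jnp.take_along_axis; treating the (1 x C) array as one row index per column is a guess, since the excerpt is truncated.
import jax
import jax.numpy as jnp

@jax.jit
def select_and_sum(data, row_idx):
    # data: (R, C); row_idx: (1, C) row index for each column
    picked = jnp.take_along_axis(data, row_idx, axis=0)  # shape (1, C), no (R, C) temporary
    return picked.sum()

data = jnp.arange(12.0).reshape(3, 4)
row_idx = jnp.array([[0, 2, 1, 0]])
print(select_and_sum(data, row_idx))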
0 votes · 0 answers · 16 views
Use the XLA automatic grouping feature in a specific layer of a model
I have a tf.keras model with multiple layers. In one of the layers, there are a lot of small linear algebra operations that cause sparse GPU utilization (I observe this in the Nsight Systems ...
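One way to target a single layer, sketched below, is to explicitly compile just that layer's call with tf.function(jit_compile=True) rather than relying on the auto-clustering the title mentions; the layer contents and sizes here are stand-ins, not the asker's model.
import tensorflow as tf

class CompiledBlock(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units)

    @tf.function(jit_compile=True)   # XLA-compile only this layer's forward pass
    def call(self, x):
        return tf.nn.relu(self.dense(x))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    CompiledBlock(64),               # the many small ops get fused inside this block
    tf.keras.layers.Dense(10),
])
print(model(tf.random.normal([2, 32])).shape)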
1 vote · 1 answer · 284 views
Why does JAX consider the same list a different data structure after appending a new array inside a function?
I am very new to JAX. Please excuse me if this is something obvious or I am making some stupid mistake. I am trying to implement a function which does the following. All these functions will be called ...
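The behaviour being described is most likely pytree-structure-based retracing; the sketch below (an assumption about what the question's code does) shows that appending an array to a list changes its pytree structure, so jit sees a different input structure.
import jax
import jax.numpy as jnp

lst = [jnp.ones(3)]
print(jax.tree_util.tree_structure(lst))      # structure of a 1-element list

lst.append(jnp.zeros(3))
print(jax.tree_util.tree_structure(lst))      # a different structure after appending

@jax.jit
def total(xs):
    return sum(x.sum() for x in xs)

total([jnp.ones(3)])                 # traces for the 1-element list structure
total([jnp.ones(3), jnp.zeros(3)])   # retraces: the grown list is a new pytree structure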
0 votes · 0 answers · 581 views
deepFace: No module named 'tensorflow.keras'
I tried downgrading from tensorflow 2.16.1 to 2.15.0, because when I ran it with tensorflow 2.16.1 I got a tf-keras not found error.
Now I have tensorflow 2.15.0 (Ubuntu). How can I solve this ...
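A commonly suggested alternative to downgrading (whether it applies to this exact setup is an assumption) is to stay on TensorFlow 2.16+, where Keras 3 is the default, and install the separately packaged legacy Keras that provides the tf_keras / tensorflow.keras code path deepface expects: pip install tf-keras, then verify the imports that were failing.
# After: pip install tf-keras   (legacy Keras 2, packaged separately since TF 2.16)
import tf_keras                  # this is the module the tf-keras package installs
from deepface import DeepFace    # should now import without the tf-keras error
print(tf_keras.__version__)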
2 votes · 1 answer · 2k views
Why does tensorflow.function (without jit_compile) speed up forward passes of a Keras model?
XLA can be enabled using model = tf.function(model, jit_compile=True). Some model types are faster that way, some are slower. So far, so good.
But why can model = tf.function(model, jit_compile=None) ...
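A simple timing sketch for the comparison the question asks about (the toy model and sizes are assumptions): eager call vs. tf.function graph mode vs. tf.function with jit_compile=True.
import time
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(256, activation="relu"),
                             tf.keras.layers.Dense(10)])
x = tf.random.normal([32, 128])

graph_model = tf.function(model)                    # graph mode, no XLA (jit_compile=None)
xla_model = tf.function(model, jit_compile=True)    # graph mode + XLA

def bench(fn, name, n=100):
    fn(x)                                           # warm-up: trace / compile
    start = time.perf_counter()
    for _ in range(n):
        fn(x)
    print(name, (time.perf_counter() - start) / n)

bench(model, "eager")
bench(graph_model, "tf.function")
bench(xla_model, "tf.function + XLA")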
1 vote · 1 answer · 357 views
How can I test if a jitted Jax function creates new tensor or a view?
I have some basic code like this:
@jit
def concat_permute(indices, in1, in2):
    tensor = jnp.concatenate([jnp.atleast_1d(in1), jnp.atleast_1d(in2)])
    return tensor[indices]
Here are my test tensors:
...
0 votes · 1 answer · 78 views
Why Is Scalar Multiply Before Einsum Faster?
In the TensorFlow Keras implementation of Multi-Head Attention, instead of evaluating the numerator first, as in softmax(QKᵀ/√dₖ)V,
they evaluate Q/√dₖ first and add the comment
Note: Applying scalar multiply at the ...
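The two orderings are mathematically equivalent; the sketch below (shapes are assumptions) only illustrates that scaling Q touches the smaller [batch, heads, T, d_k] tensor, while scaling afterwards touches the larger [batch, heads, T, T] score matrix.
import tensorflow as tf

d_k = 64
q = tf.random.normal([8, 4, 128, d_k])   # [batch, heads, T, d_k]
k = tf.random.normal([8, 4, 128, d_k])
scale = 1.0 / tf.sqrt(float(d_k))

scores_a = tf.einsum("bhid,bhjd->bhij", q * scale, k)   # scale Q first, as Keras MHA does
scores_b = tf.einsum("bhid,bhjd->bhij", q, k) * scale   # scale the larger score tensor

print(tf.reduce_max(tf.abs(scores_a - scores_b)))       # ≈ 0: same result up to rounding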
0 votes · 1 answer · 465 views
HLO protobuf to pytorch / tensorflow graph
Assume we have an HLO protobuf from a model, obtained through PyTorch-XLA or TensorFlow.
Is there a way to create a computational graph from it?
Is it possible to create a PyTorch-XLA or TensorFlow model from it?
...
0 votes · 1 answer · 657 views
Are tensor sharding and tensor tiling the same implementation?
I know each of the concepts, Tensor Sharding and Tensor Tiling.
But are there any differences between them?
Especially regarding the XLA/HLO or GSPMD concepts in parallel training (data parallel or model ...
3 votes · 0 answers · 301 views
Is it possible to use XLA in Tensorflow with variable input shape?
I'm trying to use XLA to further enhance performance and speed up the training of my model in TF 2.10. However, my input data shape varies, i.e. batch.shape = TensorShape([X, 4]) with X varying ...
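XLA compiles one executable per static input shape, so a varying leading dimension forces recompilation; a common workaround, sketched below under the assumption that padding is acceptable for this model, is to pad X up to a small set of bucket sizes and mask the padded rows.
import tensorflow as tf

BUCKETS = (32, 64, 128, 256)

@tf.function(jit_compile=True)
def forward(batch, mask):
    # Stand-in for the real model; `mask` zeroes out the padded rows.
    return tf.reduce_sum(tf.nn.relu(batch) * mask[:, None])

def run(batch):                       # batch: eager tensor of shape [X, 4]
    x = int(batch.shape[0])
    bucket = min(b for b in BUCKETS if b >= x)
    padded = tf.pad(batch, [[0, bucket - x], [0, 0]])
    mask = tf.pad(tf.ones([x]), [[0, bucket - x]])
    return forward(padded, mask)      # at most len(BUCKETS) distinct compilations

print(run(tf.random.normal([50, 4])))   # compiles for bucket size 64
print(run(tf.random.normal([60, 4])))   # reuses the bucket-64 executable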