0 votes
0 answers
14 views

Order in which operations become visible to the remote

I am trying to get familiar with UCX. If, to an endpoint ep, I issue
ucp_put_nbx(ep, ...); // "first put"
ucp_ep_flush_nbx(ep, ...);
ucp_put_nbx(ep, ...); // "second put"
...
JohnB
  • 13.8k
0 votes
0 answers
70 views

Building UCX with CUDA-aware support

I was trying to install CUDA-aware OpenMPI. While following the recipe laid out here: $ git clone https://github.com/openucx/ucx.git ucx $ cd ucx $ ./autogen.sh $ mkdir build $ cd build $ .....
R Walser
  • 504
0 votes
1 answer
108 views

How does RDMA map remote memory into local virtual memory?

I am new to RDMA and have just started looking into OpenSHMEM and UCP. I saw that both of them allow mapping a remote memory region into the local virtual memory space and accessing it using regular load and ...
zhongyuan chen
0 votes
1 answer
281 views

How to run sync-workspace-info in Databricks UCX Migration

I'm trying to work through the Databricks Unity Catalog migration using the UCX tool and am getting this weird error when trying to sync the workspace info. When I run databricks labs ucx sync-...
Japhy
  • 45
0 votes
0 answers
95 views

Singularity container with OpenMPI and InfiniBand (UCX)

I'm working as an intern currently, and I was asked to build a Singularity container for OpenMPI to make distributed programming possible on multiple machines of our HPC cluster using containers. ...
Vincent Donney
0 votes
0 answers
33 views

Unable to restrict UCP access to tmfifo_net0 using UCX_NET_DEVICES when using UCX in BlueField-2 DOCA

When I run the DOCA allreduce application example (https://docs.nvidia.com/doca/sdk/nvidia+doca+allreduce+application+guide/index.html), I find that even though I have set UCX_NET_DEVICES=mlx5_2:1 about ...
Mingxuan Liu
0 votes
1 answer
683 views

mm_xpmem.c UCX error failed to attach xpmem

I am running an analysis on a cluster, and internally I am spawning some processes. Most of the time it works, but sometimes I get the following error: mm_xpmem.c:135 UCX ERROR failed to attach xpmem ...
Pavan
  • 155
0 votes
1 answer
371 views

How to install ucp module in python? [dask]

I am trying to run on a Dask cluster using the UCX protocol. My admins tell me that the protocol is installed as expected. However, I receive the following error when I am trying to switch ...
NiRvanA
  • 135
2 votes
0 answers
162 views

Using the UCX protocol with Dask Distributed

I would like to take advantage of the InfiniBand network to connect the Dask client, the workers, and the scheduler (especially between the clients and workers, not necessarily with GPUs, as I scatter some ...
Mitchou
  • 37
3 votes
1 answer
2k views

UCX warn unexpected tag-receive

What could cause the following warning, and how can I debug it? It happens when closing my MPI application: [1612979755.727913] [compute-0-9:21112:0] tag_match.c:61 UCX WARN unexpected tag-receive ...
ATK
  • 1,526
1 vote
1 answer
6k views

How to enable CUDA-aware OpenMPI?

I'm using OpenMPI and I need to enable CUDA-aware MPI. Together with MPI, I'm using OpenACC with the hpc_sdk software. Following https://www.open-mpi.org/faq/?category=buildcuda I downloaded and ...
Steve
  • 89
1 vote
3 answers
328 views

Please use compiler that supports __attribute__((constructor))

I just compiled my own version of gcc/9.2.0 using gcc/4.8.2. After successfully compiling and installing gcc/9.2.0, I tried compiling ucx-1.5.1. When I run the ucx configure script I get the ...
L.H
  • 33