Fix issues related to GPUNetIO #1112

foraxe · 2025-12-11T14:50:09Z

What?

This PR fixes the GPUNetIO backend so that nixlbench can run intranode.

Why?

DOCA could not start transferring data in our server environments and produced the following log:

prepare_write radd 7f8637fff010 rkey 130100 ladd 7f6180000000 lkey d2100100 size 4096     
prepare_write radd 7f8637fff010 rkey 130100 ladd 7f6180000000 lkey d2100100 size 4096
>>>>>>> CUDA rdma write kernel pos 0 posted 512 buffers from base_wqe_idx 0               
got completion with err: syndrome=0x5, vendor_err_synd=0xf9, hw_err_synd=0, hw_synd_type=0, wqe_counter=65281 wqe_qpn=ea030008                                                      
kernel_progress: block 0 error CQE! poll_status -5 wqe 511 index 0

I tried many versions of the code from the community, but they all ended with the log above.
So I am uploading a working version that ran on our local servers.
Ref:
#952 (comment)
#788
NVIDIA-DOCA/gpunetio#2 (comment)

How to run:

Set NIXL_DOCASIM_IPV4_OVERRIDE to the inet address of mlx5_bond0. You can find it by running ip a.

Run on node 0:

ETCDCTL_API=3 etcdctl del "" --from-key=true
export NIXL_GPUNETIO_DEBUG_DUMP=1
export NIXL_GPUNETIO_SWAP_KEYS=1
export NIXL_LOG_LEVEL=DEBUG
export NIXL_DOCASIM_IPV4_OVERRIDE=200.x.7.238
cd /workspace/nixl/benchmark/nixlbench/build
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/gdrcopy/lib:/opt/mellanox/doca:/opt/mellanox/doca/lib/x86_64-linux-gnu NIXL_PLUGIN_DIR=/usr/local/nixl/lib/x86_64-linux-gnu/plugins CUDA_MODULE_LOADING=EAGER ./nixlbench --etcd-endpoints http://10.x.x.21:2379 --backend=GPUNETIO --initiator_seg_type=VRAM --target_seg_type=DRAM --runtime_type=ETCD --gpunetio_device_list=0 --device_list=mlx5_bond_0 --start_batch_size=1 --max_batch_size=1 --total_buffer_size=131072 --max_block_size 16384 --op_type=READ

For an intranode run on node 0, use the commands above.
Result:

To test an internode run, change NIXL_DOCASIM_IPV4_OVERRIDE on node 1:

export NIXL_GPUNETIO_DEBUG_DUMP=1
export NIXL_GPUNETIO_SWAP_KEYS=1
export NIXL_LOG_LEVEL=DEBUG
export NIXL_DOCASIM_IPV4_OVERRIDE=200.x.7.238 # change this address when running on node 1
cd /workspace/nixl/benchmark/nixlbench/build
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/gdrcopy/lib:/opt/mellanox/doca:/opt/mellanox/doca/lib/x86_64-linux-gnu NIXL_PLUGIN_DIR=/usr/local/nixl/lib/x86_64-linux-gnu/plugins CUDA_MODULE_LOADING=EAGER ./nixlbench --etcd-endpoints http://10.x.x.21:2379 --backend=GPUNETIO --initiator_seg_type=VRAM --target_seg_type=DRAM --runtime_type=ETCD --gpunetio_device_list=0 --device_list=mlx5_bond_0 --start_batch_size=1 --max_batch_size=1 --total_buffer_size=131072 --max_block_size 16384 --op_type=READ
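
NIXL_GPUNETIO_DEBUG_DUMP and NIXL_GPUNETIO_SWAP_KEYS are both set to 1 in the commands above, so they behave like on/off switches. As a rough sketch only, a switch of this kind could be read as follows. The helper name env_flag_enabled and the rule that only the exact value "1" enables the flag are assumptions for illustration, not taken from the plugin code.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the PR): returns true only when the
// environment variable `name` is set to exactly "1".
// Assumption: unset, empty, or any other value leaves the switch off.
inline bool env_flag_enabled(const char *name) {
    const char *value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

With the exports above, env_flag_enabled("NIXL_GPUNETIO_SWAP_KEYS") would return true, and it would return false in a shell where the variable was never exported.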

cc @e-ago @ovidiusm

copy-pr-bot · 2025-12-11T14:50:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-12-11T14:50:18Z

👋 Hi foraxe! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

brminich · 2026-01-08T15:41:25Z

src/plugins/gpunetio/verbs/verbs.cpp

+    if (env_dbg) {
+        std::ostringstream oss;
+        oss << "[dbg] MR path: " << (dmabuf_fd >= 0 ? "dmabuf" : "peermem/ibv_reg_mr")
+            << ", addr 0x" << std::hex << std::uppercase << (uintptr_t)addr << std::dec
+            << " len 0x" << std::hex << (uint64_t)tot_size << std::dec
+            << " lkey 0x" << std::hex << (uint32_t)lkey << std::dec
+            << " rkey 0x" << std::hex << (uint32_t)rkey << std::dec;
+        NIXL_INFO << oss.str();
+    }


Please use NIXL_DEBUG instead of this.
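
One possible shape for that change, as a sketch rather than the final code: build the memory-region line in a small helper and let the caller pick the log level. The helper name format_mr_debug and its parameters are hypothetical. The field order follows the hunk above, and std::showbase replaces the hand-written 0x prefixes.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the PR): formats the memory-region debug
// line as a string so the caller can stream it to any log macro.
std::string format_mr_debug(bool use_dmabuf, const void *addr, std::uint64_t len,
                            std::uint32_t lkey, std::uint32_t rkey) {
    std::ostringstream oss;
    oss << "MR path: " << (use_dmabuf ? "dmabuf" : "peermem/ibv_reg_mr")
        // std::hex and std::showbase are sticky, so every integer below
        // is printed as lowercase hex with a 0x prefix.
        << std::hex << std::showbase
        << ", addr " << reinterpret_cast<std::uintptr_t>(addr)
        << " len " << len
        << " lkey " << lkey
        << " rkey " << rkey;
    return oss.str();
}
```

Assuming NIXL_DEBUG is stream-style like the NIXL_INFO call in the hunk, the call site would reduce to something like NIXL_DEBUG << format_mr_debug(dmabuf_fd >= 0, addr, tot_size, lkey, rkey);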

brminich · 2026-01-08T15:41:34Z

src/plugins/gpunetio/verbs/verbs.cpp

+//    lkey = htobe32(ibmr->lkey);
+//    rkey = htobe32(ibmr->rkey);
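
For reference, a minimal sketch of what the two commented-out htobe32 calls do to a key on a little-endian host. The names bswap32, host_is_little_endian and to_be32 are hypothetical stand-ins for illustration and are not functions from the plugin. The sample values mentioned afterwards are the rkey and lkey printed in the log in the PR description.

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value. On a little-endian host this is
// the transformation htobe32 applies.
constexpr std::uint32_t bswap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// True when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Portable stand-in for htobe32: swap only on little-endian hosts.
inline std::uint32_t to_be32(std::uint32_t v) {
    return host_is_little_endian() ? bswap32(v) : v;
}
```

Applied to the logged values, bswap32 maps rkey 0x130100 to 0x11300 and lkey 0xd2100100 to 0x110d2. Which byte order the device expects at this point is the question this hunk raises, so the sketch only illustrates the conversion and does not say which form is correct.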


brminich · 2026-01-08T15:42:01Z