@foraxe
Contributor

@foraxe foraxe commented Dec 11, 2025

What?

This PR fixes the GPUNetIO backend so that nixlbench can run intranode.

Why?

DOCA could not start data transfers in our server environments; the run failed with the following log:

prepare_write radd 7f8637fff010 rkey 130100 ladd 7f6180000000 lkey d2100100 size 4096     
prepare_write radd 7f8637fff010 rkey 130100 ladd 7f6180000000 lkey d2100100 size 4096
>>>>>>> CUDA rdma write kernel pos 0 posted 512 buffers from base_wqe_idx 0               
got completion with err: syndrome=0x5, vendor_err_synd=0xf9, hw_err_synd=0, hw_synd_type=0, wqe_counter=65281 wqe_qpn=ea030008                                                      
kernel_progress: block 0 error CQE! poll_status -5 wqe 511 index 0     
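To read the error CQE above, the syndrome byte can be looked up in the mlx5 CQE syndrome table (the MLX5_CQE_SYNDROME_* values in rdma-core's mlx5dv.h). The helper below is only an illustrative sketch; mlx5_syndrome_name is a hypothetical name, not a NIXL or DOCA API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map the "syndrome" field of an mlx5 error CQE to a readable name.
// Values follow the MLX5_CQE_SYNDROME_* enum in rdma-core's mlx5dv.h.
std::string mlx5_syndrome_name(uint8_t syndrome) {
    switch (syndrome) {
    case 0x01: return "LOCAL_LENGTH_ERR";
    case 0x02: return "LOCAL_QP_OP_ERR";
    case 0x04: return "LOCAL_PROT_ERR";
    case 0x05: return "WR_FLUSH_ERR";
    case 0x06: return "MW_BIND_ERR";
    case 0x10: return "BAD_RESP_ERR";
    case 0x11: return "LOCAL_ACCESS_ERR";
    case 0x12: return "REMOTE_INVAL_REQ_ERR";
    case 0x13: return "REMOTE_ACCESS_ERR";
    case 0x14: return "REMOTE_OP_ERR";
    case 0x15: return "TRANSPORT_RETRY_EXC_ERR";
    case 0x16: return "RNR_RETRY_EXC_ERR";
    case 0x22: return "REMOTE_ABORTED_ERR";
    default:   return "UNKNOWN";  // not in the table above
    }
}
```

For this log, mlx5_syndrome_name(0x5) returns WR_FLUSH_ERR: in verbs terms, the work request was flushed because the QP was already in the error state. The vendor_err_synd value (0xf9) is not covered by this table.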

I tried many versions of the code from the community, but all of them ended with the log above.
So I am uploading a working version that ran on our local servers.
Ref:
#952 (comment)
#788
NVIDIA-DOCA/gpunetio#2 (comment)

How to run:

Set NIXL_DOCASIM_IPV4_OVERRIDE to the inet IP address of mlx5_bond0; you can find it with the ip a command.
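To obtain the same address from code instead of reading the output of ip a, the getifaddrs(3) list can be walked for the first AF_INET entry on the interface. This is a sketch assuming a Linux/glibc host; ipv4_of_interface is a hypothetical helper. Pass the netdev name as ip a prints it, which is not necessarily the same string as the RDMA device name given to --device_list.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <optional>
#include <string>
#include <sys/socket.h>

// Return the first IPv4 ("inet") address assigned to the named network
// interface, i.e. the value ip a shows on that interface's "inet" line.
std::optional<std::string> ipv4_of_interface(const std::string &ifname) {
    struct ifaddrs *list = nullptr;
    if (getifaddrs(&list) != 0)
        return std::nullopt;

    std::optional<std::string> result;
    for (struct ifaddrs *it = list; it != nullptr; it = it->ifa_next) {
        // Skip entries without an address and non-IPv4 families.
        if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET)
            continue;
        if (ifname != it->ifa_name)
            continue;
        char buf[INET_ADDRSTRLEN];
        const auto *sin = reinterpret_cast<const struct sockaddr_in *>(it->ifa_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr) {
            result = std::string(buf);
            break;
        }
    }
    freeifaddrs(list);  // the list is heap-allocated by getifaddrs
    return result;
}
```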

  • Run on node 0:
    ETCDCTL_API=3 etcdctl del "" --from-key=true  # delete all keys stored in etcd
    export NIXL_GPUNETIO_DEBUG_DUMP=1
    export NIXL_GPUNETIO_SWAP_KEYS=1
    export NIXL_LOG_LEVEL=DEBUG
    export NIXL_DOCASIM_IPV4_OVERRIDE=200.x.7.238
    cd /workspace/nixl/benchmark/nixlbench/build
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/gdrcopy/lib:/opt/mellanox/doca:/opt/mellanox/doca/lib/x86_64-linux-gnu NIXL_PLUGIN_DIR=/usr/local/nixl/lib/x86_64-linux-gnu/plugins CUDA_MODULE_LOADING=EAGER ./nixlbench --etcd-endpoints http://10.x.x.21:2379 --backend=GPUNETIO --initiator_seg_type=VRAM --target_seg_type=DRAM --runtime_type=ETCD --gpunetio_device_list=0 --device_list=mlx5_bond_0 --start_batch_size=1 --max_batch_size=1 --total_buffer_size=131072 --max_block_size 16384 --op_type=READ
  • For the intranode run, use the commands above on node 0.
  • Result: [screenshot]
  • To test an internode run on node 1, change NIXL_DOCASIM_IPV4_OVERRIDE:
    export NIXL_GPUNETIO_DEBUG_DUMP=1
    export NIXL_GPUNETIO_SWAP_KEYS=1
    export NIXL_LOG_LEVEL=DEBUG
    export NIXL_DOCASIM_IPV4_OVERRIDE=200.x.7.238  # change this to node 1's mlx5_bond0 IP address
    cd /workspace/nixl/benchmark/nixlbench/build
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/gdrcopy/lib:/opt/mellanox/doca:/opt/mellanox/doca/lib/x86_64-linux-gnu NIXL_PLUGIN_DIR=/usr/local/nixl/lib/x86_64-linux-gnu/plugins CUDA_MODULE_LOADING=EAGER ./nixlbench --etcd-endpoints http://10.x.x.21:2379 --backend=GPUNETIO --initiator_seg_type=VRAM --target_seg_type=DRAM --runtime_type=ETCD --gpunetio_device_list=0 --device_list=mlx5_bond_0 --start_batch_size=1 --max_batch_size=1 --total_buffer_size=131072 --max_block_size 16384 --op_type=READ

cc @e-ago @ovidiusm

@foraxe foraxe requested a review from a team as a code owner December 11, 2025 14:50
@copy-pr-bot

copy-pr-bot bot commented Dec 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi foraxe! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

Comment on lines +473 to +481
if (env_dbg) {
    std::ostringstream oss;
    oss << "[dbg] MR path: " << (dmabuf_fd >= 0 ? "dmabuf" : "peermem/ibv_reg_mr")
        << ", addr 0x" << std::hex << std::uppercase << (uintptr_t)addr << std::dec
        << " len 0x" << std::hex << (uint64_t)tot_size << std::dec
        << " lkey 0x" << std::hex << (uint32_t)lkey << std::dec
        << " rkey 0x" << std::hex << (uint32_t)rkey << std::dec;
    NIXL_INFO << oss.str();
}
Contributor

pls use NIXL_DEBUG instead of this
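The reviewer's point is that a level-gated macro makes both the env-var check and the hand-built ostringstream unnecessary: when debug logging is off, nothing streamed after the macro is evaluated. A minimal sketch of that mechanism follows; SKETCH_DEBUG, LogLine and current_level are hypothetical stand-ins, not NIXL's NIXL_DEBUG implementation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a level-gated logging macro such as NIXL_DEBUG.
// NIXL's real macro lives in its own headers and may work differently.
enum class LogLevel { kError = 0, kInfo = 1, kDebug = 2 };

// Process-wide log level; defaults to kInfo, so debug lines are suppressed.
inline LogLevel &current_level() {
    static LogLevel level = LogLevel::kInfo;
    return level;
}

// Buffers one log line and writes it to the sink when the temporary dies
// at the end of the full expression.
class LogLine {
public:
    explicit LogLine(std::ostream &sink) : sink_(sink) {}
    ~LogLine() { sink_ << buf_.str() << '\n'; }

    template <typename T> LogLine &operator<<(const T &value) {
        buf_ << value;
        return *this;
    }

private:
    std::ostream &sink_;
    std::ostringstream buf_;
};

// The if/else shape means the streamed operands are not even evaluated
// unless the level is kDebug, and it stays safe inside an unbraced if.
#define SKETCH_DEBUG(sink) \
    if (current_level() < LogLevel::kDebug) {} else LogLine(sink)
```

With a macro of this kind, the whole if (env_dbg) block above becomes a single NIXL_DEBUG statement streaming the same fields, with verbosity presumably controlled by NIXL_LOG_LEVEL (already set to DEBUG in the run instructions) instead of a separate variable.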

Comment on lines +466 to +467
// lkey = htobe32(ibmr->lkey);
// rkey = htobe32(ibmr->rkey);
Contributor

pls remove
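The commented-out htobe32() calls and the NIXL_GPUNETIO_SWAP_KEYS switch both concern the byte order of lkey/rkey. On a little-endian x86 host, htobe32() reverses the four bytes of the key; the sketch below spells that out with a portable swap. byte_swap32 and key_for_wqe are hypothetical helpers, and it is an assumption that NIXL_GPUNETIO_SWAP_KEYS selects exactly this conversion.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit memory key. On a little-endian host
// this matches htobe32(); on a big-endian host htobe32() is a no-op.
constexpr uint32_t byte_swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Hypothetical: the key as it would be written into a work request,
// optionally byte-swapped depending on a configuration flag.
constexpr uint32_t key_for_wqe(uint32_t key, bool swap_keys) {
    return swap_keys ? byte_swap32(key) : key;
}
```

Because the swap is its own inverse, a key that is converted on one side but not the other (or converted twice where once is expected) reaches the NIC as a different value.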

((volatile struct docaNotif *)notif_send_cpu)->msg_buf = msg_buf;
((volatile struct docaNotif *)notif_send_cpu)->msg_lkey = notif->send_mr->get_lkey();
((volatile struct docaNotif *)notif_send_cpu)->msg_size = newMsg.size();
// ((volatile struct docaNotif *)notif_send_cpu)->msg_lkey =notif->send_mr->get_lkey();
Contributor

pls remove

xferReqRingCpu[treq->end_pos - 1].lbuf_notif = notif_addr;
xferReqRingCpu[treq->end_pos - 1].lkey_notif = notif->send_mr->get_lkey();
uint32_t notif_lkey_host = notif->send_mr->get_lkey();
// xferReqRingCpu[treq->end_pos - 1].lkey_notif = notif_lkey_host;
Contributor

remove

Comment on lines +1260 to +1261
//uint32_t lkey = xferReqRingCpu[pos].lkey[idx];
//uint32_t rkey = xferReqRingCpu[pos].rkey[idx];
Contributor

remove

Comment on lines +1077 to +1088
NIXL_INFO << "GPUNETIO registerMem publish dev " << priv->devId << " addr "
          << format_hex((uintptr_t)priv->mr->get_addr()) << " len "
          << format_hex((uint64_t)priv->mr->get_tot_size()) << " lkey "
          << format_hex(lkey) << " rkey " << format_hex(rkey);
if (debug_dump) {
    std::ostringstream oss;
    oss << " [dbg] publish raw: dev=" << priv->devId << " addr_dec="
        << (uintptr_t)priv->mr->get_addr() << " len_dec="
        << (uint64_t)priv->mr->get_tot_size() << " lkey_dec=" << lkey << " rkey_dec="
        << rkey;
    NIXL_INFO << oss.str();
}
Contributor

imo it should be NIXL_DEBUG
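format_hex() is used in the publish message, but its definition is not part of this excerpt. A hypothetical version that produces the same 0x-prefixed uppercase hex as the manual std::hex/std::uppercase formatting in the MR debug dump could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical format_hex(): "0x" followed by uppercase hex digits.
// The stream flags are set on a local ostringstream, so the caller's
// log stream is never switched out of decimal mode.
inline std::string format_hex(uint64_t value) {
    std::ostringstream oss;
    oss << "0x" << std::hex << std::uppercase << value;
    return oss.str();
}
```

Keeping the hex formatting inside the helper means the publish line can go through NIXL_DEBUG as one statement, without the separate debug_dump branch.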

ss << (int)ipv4_addr[0] << "." << (int)ipv4_addr[1] << "." << (int)ipv4_addr[2] << "."
   << (int)ipv4_addr[3];
str = ss.str();
std::cout << "getConnInfo DOCA: " << str << std::endl;
Contributor

we should not print anything by default except errors
pls use NIXL_DEBUG
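Following the reviewer's suggestion, the address can be built as a string and handed to NIXL_DEBUG rather than printed with std::cout. The sketch below parses a dotted-quad value, such as the one given in NIXL_DOCASIM_IPV4_OVERRIDE, into four network-order bytes with inet_pton(3) and formats it back the same way as the snippet above. parse_ipv4 and format_ipv4 are hypothetical names; the PR's own parsing code is not shown here.

```cpp
#include <arpa/inet.h>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <optional>
#include <sstream>
#include <string>

using Ipv4Bytes = std::array<uint8_t, 4>;

// Parse "a.b.c.d" into four bytes in network order; empty on null or bad input.
std::optional<Ipv4Bytes> parse_ipv4(const char *text) {
    if (text == nullptr)
        return std::nullopt;
    struct in_addr addr {};
    if (inet_pton(AF_INET, text, &addr) != 1)
        return std::nullopt;  // not a valid dotted-quad address
    Ipv4Bytes bytes{};
    std::memcpy(bytes.data(), &addr.s_addr, bytes.size());
    return bytes;
}

// Format the bytes back to dotted-quad form; the int casts make each byte
// print as a number rather than as a character.
std::string format_ipv4(const Ipv4Bytes &b) {
    std::ostringstream ss;
    ss << int(b[0]) << '.' << int(b[1]) << '.' << int(b[2]) << '.' << int(b[3]);
    return ss.str();
}
```

parse_ipv4(std::getenv("NIXL_DOCASIM_IPV4_OVERRIDE")) returns an empty optional when the variable is unset or malformed, so a placeholder such as 200.x.7.238 is rejected instead of being used silently.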

Comment on lines +559 to +561
// server_addr.sin_addr.s_addr = INADDR_ANY;
/* listen on any interface */
std::memcpy(&server_addr.sin_addr.s_addr, ipv4_addr, sizeof(ipv4_addr));
Contributor

Suggested change
- // server_addr.sin_addr.s_addr = INADDR_ANY;
- /* listen on any interface */
- std::memcpy(&server_addr.sin_addr.s_addr, ipv4_addr, sizeof(ipv4_addr));
+ std::memcpy(&server_addr.sin_addr.s_addr, ipv4_addr, sizeof(ipv4_addr));
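With the memcpy, the listener is bound to one specific address (the bytes in ipv4_addr) rather than to INADDR_ANY, which is also why the "listen on any interface" comment no longer matches the code. Below is a self-contained sketch of that pattern with POSIX sockets; listen_on_ipv4 and the backlog of 16 are illustrative choices, not taken from the PR.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP listening socket bound to one specific IPv4 address, given as
// four bytes in network order, instead of INADDR_ANY. Returns the fd or -1.
int listen_on_ipv4(const uint8_t ipv4_addr[4], uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in server_addr;
    std::memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(port);
    // ipv4_addr is a pointer here, so copy an explicit 4 bytes.
    std::memcpy(&server_addr.sin_addr.s_addr, ipv4_addr, 4);

    if (bind(fd, reinterpret_cast<struct sockaddr *>(&server_addr), sizeof(server_addr)) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

When the address arrives as a pointer or array parameter, sizeof(ipv4_addr) yields the pointer size rather than 4, which is why the sketch does not use sizeof for the copy.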

notifMap[remote_agent] = notif;
((volatile struct docaNotif *)notif_fill_cpu)->msg_buf = (uintptr_t)notif->recv_addr;
((volatile struct docaNotif *)notif_fill_cpu)->msg_lkey = notif->recv_mr->get_lkey();
// ((volatile struct docaNotif *)notif_fill_cpu)->msg_lkey = notif->recv_mr->get_lkey();
Contributor

Suggested change
- // ((volatile struct docaNotif *)notif_fill_cpu)->msg_lkey = notif->recv_mr->get_lkey();

else
swap_keys_config = true;
}
const char *env_dbg = std::getenv("NIXL_GPUNETIO_DEBUG_DUMP");
Contributor

why not use NIXL_DEBUG instead?
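Debug output can move to NIXL_DEBUG as suggested, but a behavioural switch such as NIXL_GPUNETIO_SWAP_KEYS still needs an environment lookup. A hypothetical env_flag helper keeps that parsing in one place. The accepted spellings below are an assumption, since the excerpt only shows the branch that sets swap_keys_config = true.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <initializer_list>

// Hypothetical helper: read a boolean feature flag from the environment.
// Unset or empty -> default_value; "0", "false", "no", "off" (case-sensitive)
// -> false; any other value -> true.
bool env_flag(const char *name, bool default_value) {
    const char *v = std::getenv(name);
    if (v == nullptr || *v == '\0')
        return default_value;
    for (const char *off : {"0", "false", "no", "off"})
        if (std::strcmp(v, off) == 0)
            return false;
    return true;
}
```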

@brminich brminich requested a review from e-ago January 8, 2026 15:46
@brminich
Contributor

brminich commented Jan 8, 2026

@e-ago, can you pls review?
