
Conversation

@benlwalker
Contributor

What?

Change nixlbench to call makeXferReq on every I/O. The prepXferDlist calls are outside of the main I/O loop.

Why?

  1. We observe that all real software using NIXL calls either makeXferReq or createXferReq in the I/O path, immediately before posting the request.
  2. I cannot imagine a system that wouldn't call makeXferReq on demand. Once a request is created, it's tied to all of the ranges it will be transferring. There's no fathomable system that could know, at start-up time, all of the exact batches it will transfer over the life of the application.

Reason 1 alone is insufficient to justify the change, because all real software I can find also busy-polls in a loop after calling postXferReq, and that doesn't mean we should make NIXL synchronous. Instead, it's the combination of reasons 1 and 2 that justifies the change.

How?

For the GUSLI path there was already logic to call createXferReq in the I/O path. I first made that the behavior for every path, rather than just for GUSLI. Then I split createXferReq into its two-step equivalent: prepXferDlist calls up front and makeXferReq in the I/O path, because that's more efficient and still realistic.

Remove recreate_per_iteration parameter from execTransferIterations and
always recreate transfer requests for each iteration. This reflects
real-world usage, where at least makeXferReq is always called in the
hot path. No one does the exact same I/O over and over.

Signed-off-by: Ben Walker <[email protected]>
…akeXferReq

Replace createXferReq with the two-step approach of prepXferDlist and
makeXferReq. This optimization moves the prepXferDlist calls outside
the iteration loop, avoiding redundant descriptor list preparation on
each iteration.

While most real uses of NIXL seem to call createXferReq in the hot path,
the most optimized ones at least do prepXferDlist ahead of time. This
makes nixlbench match that behavior.

Signed-off-by: Ben Walker <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Dec 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi benlwalker! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@aranadive
Contributor

Let's keep the previous createXferReq path as well. It denotes the absolute limit for the system config, similar to many performance tests, and also exercises the NIXL API in general. makeXferReq can denote an alternate path for applications that perform flexible allocations.

    std::cerr << "prepXferDlist (local) failed: " << nixlEnumStrings::statusStr(prep_rc)
              << std::endl;
    return -1;
    }
Contributor


I would propose using a scope guard to automatically release handles when they go out of scope. This way we don't need release calls in many places, and it's also exception safe. Just allocate it once right after prepXferDlist, and do the same for remote_dlist_hndl:

    auto local_guard = make_scope_guard ([&] {
        agent->releasedDlistH(local_dlist_hndl);
    });

@benlwalker
Contributor Author

Let's keep the previous createXferReq path as well. It denotes the absolute limit for the system config, similar to many performance tests, and also exercises the NIXL API in general. makeXferReq can denote an alternate path for applications that perform flexible allocations.

I strongly disagree here. It's not possible to create a system that doesn't call either createXferReq or makeXferReq on each I/O request, so it isn't worth measuring without it. If we need to pull together the team to have a meeting on this, let's do that after the holidays.

