@Spycsh Spycsh commented Nov 24, 2025

What?

Correct NIXL build script

Why?

The current NIXL build script does not recognize UCX RDMA devices; this PR fixes that. With the fix, no extra parameters are needed either. auditwheel is also removed because it is not necessary.

How?

Benchmark results (sustained throughput for each configuration):

1) ucx+rdma+cpu
UCX_TLS="rc,ud,ib" python nixl_api_test.py --nixl_backend UCX --block-size 128 --device-type cpu
Sustained Throughput:    24.991 GB/s

2) ucx+tcp+cpu
UCX_TLS="tcp" python nixl_api_test.py --nixl_backend UCX --block-size 128 --device-type cpu
Sustained Throughput:    0.699 GB/s

3) ofi+cpu
python nixl_api_test.py --nixl_backend OFI --block-size 128 --device-type cpu
Sustained Throughput:    23.161 GB/s

4) ofi+hpu
python nixl_api_test.py --nixl_backend OFI --block-size 128 --device-type hpu
Sustained Throughput:    19.446 GB/s
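
For anyone who wants to reproduce the four runs in one go, here is a minimal sketch of a driver. It is not part of this PR: the helper names (CONFIGS, build_command, parse_throughput, run_all) are hypothetical, and the only things taken from the runs above are the command-line arguments, the UCX_TLS values, and the format of the "Sustained Throughput" line.

```python
import os
import re
import subprocess

# Hypothetical table of the four configurations listed above:
# name -> (extra environment variables, --nixl_backend, --device-type)
CONFIGS = {
    "ucx+rdma+cpu": ({"UCX_TLS": "rc,ud,ib"}, "UCX", "cpu"),
    "ucx+tcp+cpu": ({"UCX_TLS": "tcp"}, "UCX", "cpu"),
    "ofi+cpu": ({}, "OFI", "cpu"),
    "ofi+hpu": ({}, "OFI", "hpu"),
}

# Matches the result line printed by the benchmark, e.g.
# "Sustained Throughput:    24.991 GB/s"
THROUGHPUT_RE = re.compile(r"Sustained Throughput:\s+([0-9.]+)\s+GB/s")


def build_command(backend, device):
    """Build the argument list used in the runs above."""
    return [
        "python", "nixl_api_test.py",
        "--nixl_backend", backend,
        "--block-size", "128",
        "--device-type", device,
    ]


def parse_throughput(output):
    """Return the sustained throughput in GB/s, or None if no result line is found."""
    match = THROUGHPUT_RE.search(output)
    return float(match.group(1)) if match else None


def run_all():
    """Run every configuration and collect the parsed throughput per name."""
    results = {}
    for name, (extra_env, backend, device) in CONFIGS.items():
        env = {**os.environ, **extra_env}  # add UCX_TLS only where the run sets it
        proc = subprocess.run(
            build_command(backend, device),
            env=env, capture_output=True, text=True, check=False,
        )
        results[name] = parse_throughput(proc.stdout)
    return results
```

Calling run_all() from the directory that contains nixl_api_test.py would return a dict of name to GB/s; a value of None means the result line was not found in that run's output.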

Update: results with the optional host-to-device (h2d) copy after transfer (--do-h2d-cp):

1) ucx+rdma+cpu
UCX_TLS="rc,ud,ib" python nixl_api_test.py --nixl_backend UCX --block-size 128 --device-type cpu --do-h2d-cp
Sustained Throughput:    1.120 GB/s

2) ucx+tcp+cpu
UCX_TLS="tcp" python nixl_api_test.py --nixl_backend UCX --block-size 128 --device-type cpu --do-h2d-cp
Sustained Throughput:    0.374 GB/s

3) ofi+cpu
python nixl_api_test.py --nixl_backend OFI --block-size 128 --device-type cpu --do-h2d-cp
Sustained Throughput:    1.297 GB/s

4) ofi+hpu
python nixl_api_test.py --nixl_backend OFI --block-size 128 --device-type hpu --do-h2d-cp
Sustained Throughput:    19.567 GB/s

@Spycsh Spycsh changed the title from "Correct the NIXL build script" to "Correct the NIXL build script and add optional h2d copy after transfer" Dec 8, 2025