Commit fa559ee

nixlbench: Update README.md (#1026)
* nixlbench: Update README.md

  Add info on etcd barrier timeout and some cleanup. Fixes #1022.

  Signed-off-by: Adit Ranadive <[email protected]>
1 parent a089fdc commit fa559ee

1 file changed: +13 -24 lines changed

benchmark/nixlbench/README.md

Lines changed: 13 additions & 24 deletions
@@ -528,6 +528,7 @@ NIXL Benchmark uses an ETCD key-value store for coordination between benchmark w
528528

529529
1. Ensure ETCD server is running (e.g., `docker run -p 2379:2379 quay.io/coreos/etcd`)
530530
2. Launch multiple nixlbench instances pointing to the same ETCD server
531+
3. Launch all instances within the default barrier timeout of 60s.
531532

532533
**For single-instance storage benchmarks:**
533534
```bash
@@ -538,21 +539,15 @@ NIXL Benchmark uses an ETCD key-value store for coordination between benchmark w
538539
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend GDS --filepath /mnt/storage/testfile
539540
```
540541

541-
Note: etcd can be installed directly on host as well:
542-
```bash
543-
apt install etcd-server
544-
```
545-
546-
Example:
542+
**For multi-instance storage benchmarks where ETCD is required:**
547543
```bash
548544
# On host 1
549545
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --initiator_seg_type VRAM --target_seg_type VRAM
550546

551547
# On host 2
552548
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --initiator_seg_type VRAM --target_seg_type VRAM
553549
```
554-
555-
The workers automatically coordinate ranks through ETCD as they connect.
550+
The workers automatically coordinate ranks through ETCD as they connect. Note: start the second nixlbench instance within 60s of the first; otherwise the first instance stops with a barrier error.
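
As a rough illustration of the 60s window described above, the sketch below checks whether a second launch would still fall inside it. The helper `within_barrier_window` is hypothetical and not part of nixlbench; it assumes both timestamps come from the same clock (or from synchronized hosts) and treats exactly 60s as too late, which is a guess about the boundary rather than documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of nixlbench.
# Succeeds if a launch at time $2 is still inside the barrier window that
# opened at time $1. Times are seconds since the epoch; the optional third
# argument overrides the 60s default mentioned in the README.
within_barrier_window() {
    local first_start=$1 now=$2 timeout=${3:-60}
    (( now - first_start < timeout ))
}

# Record when the first instance was started, then check again right
# before starting the second one.
first_start=$(date +%s)
# ... start ./nixlbench on host 1 here ...
if within_barrier_window "$first_start" "$(date +%s)"; then
    echo "ok to launch second instance"
else
    echo "barrier window elapsed; restart the first instance" >&2
fi
```

If the check fails, the first instance has likely already exited with the barrier error, so both instances would need to be started again.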
556551

557552
### Backend-Specific Examples
558553

@@ -562,9 +557,11 @@ The workers automatically coordinate ranks through ETCD as they connect.
562557
```bash
563558
# Basic UCX benchmark
564559
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX
560+
sleep 2 && ./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX
565561

566562
# UCX with specific devices
567-
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --device_list mlx5_0,mlx5_1
563+
# On host 1
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --device_list mlx5_0,mlx5_1
564+
# On host 2
sleep 2 && ./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --device_list mlx5_0,mlx5_1
568565
```
569566

570567
**GPUNETIO Backend:**
@@ -706,20 +703,6 @@ Transfer times are higher than local storage, so consider reducing iterations:
706703
- Test read operations: `--op_type READ`
707704
- Validate data consistency: `--check_consistency`
708705

709-
### Multi-Node Coordination
710-
711-
Launch multiple nixlbench instances pointing to the same ETCD server:
712-
713-
```bash
714-
# On host 1
715-
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --initiator_seg_type VRAM --target_seg_type VRAM
716-
717-
# On host 2
718-
./nixlbench --etcd_endpoints http://etcd-server:2379 --backend UCX --initiator_seg_type VRAM --target_seg_type VRAM
719-
```
720-
721-
The workers automatically coordinate ranks through ETCD as they connect.
722-
723706
## Troubleshooting
724707

725708
### Common Build Issues
@@ -814,6 +797,12 @@ export UCX_LOG_LEVEL=DEBUG # Verbose UCX logging
814797
export UCX_PROTO_INFO=y # See transport used by UCX
815798
```
816799

800+
#### ETCD Cleanup
801+
```bash
802+
# If a nixlbench instance failed, clean up the etcd instance before starting nixlbench again
803+
ETCDCTL_API=3 etcdctl del "xferbench" --prefix=true
804+
```
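
The cleanup command above can be wrapped in a small function so the endpoint and prefix are explicit. This is only a sketch: the function name `cleanup_nixlbench_keys` and the `DRY_RUN` variable are hypothetical, and the default endpoint is simply the one used in the README examples. It relies on the standard `etcdctl` `--endpoints` and `del --prefix` options.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the cleanup command; not part of nixlbench.
# $1: etcd endpoint (defaults to the one used in the README examples)
# $2: key prefix to delete (defaults to "xferbench")
# Set DRY_RUN to any non-empty value to print the command instead of
# running it.
cleanup_nixlbench_keys() {
    local endpoint=${1:-http://etcd-server:2379} prefix=${2:-xferbench}
    local cmd=(etcdctl --endpoints "$endpoint" del "$prefix" --prefix)
    if [[ -n ${DRY_RUN:-} ]]; then
        echo "ETCDCTL_API=3 ${cmd[*]}"
    else
        ETCDCTL_API=3 "${cmd[@]}"
    fi
}

# Show what would be executed without touching the etcd server.
DRY_RUN=1 cleanup_nixlbench_keys
```

Running the dry run first makes it easy to confirm the prefix before deleting anything from a shared etcd server.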
805+
817806
### Performance Tuning
818807

819808
#### CPU Affinity
@@ -842,4 +831,4 @@ sudo sysctl -p
842831

843832
---
844833

845-
*This guide covers NIXLBench build and usage procedures as of 2025. For the latest updates, please refer to the official repository.*
834+
*This guide covers NIXLBench build and usage procedures as of 2025. For the latest updates, please refer to the official repository.*
