Commit af31295

fix: fix unfinished_ranks order (#93)
## What does this PR do?

## Related issues

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
1 parent 8ed38a9 commit af31295

1 file changed

Lines changed: 2 additions & 2 deletions

awex/transfer/nccl_comm.py

@@ -388,7 +388,7 @@ def nccl_build_send_ops(parameters, transfer_plan, weights_update_group, copy_ra
     train_slice_context = {}
     while len(unfinished_ranks) > 0:
         finished_ranks = set()
-        for recv_rank in unfinished_ranks:
+        for recv_rank in sorted(unfinished_ranks):
             operations = transfer_plan.operations[recv_rank]
             progress = send_progress[recv_rank]
             num_operations = len(operations)
@@ -425,7 +425,7 @@ def nccl_build_recv_ops(
     unfinished_ranks = set(transfer_plan.operations.keys())
     while len(unfinished_ranks) > 0:
         finished_ranks = set()
-        for send_rank in unfinished_ranks:
+        for send_rank in sorted(unfinished_ranks):
             operations = transfer_plan.operations[send_rank]
             progress = recv_progress[send_rank]
             num_operations = len(operations)
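
The commit message does not state the motivation, so the following is only a plausible reading: Python does not guarantee the iteration order of a `set`, so two processes that loop over the same set of peer ranks may visit them in different orders, and wrapping the set in `sorted()` pins the order. The sketch below is a simplified, hypothetical stand-in for the round-robin loop in the diff. `interleave_ops`, `operations_by_rank`, and the string operations are illustrative names, not Awex APIs, and no NCCL calls are made.

```python
def interleave_ops(operations_by_rank):
    """Round-robin over peer ranks, taking one pending op per rank per pass.

    Hypothetical simplification of the loop touched by the diff; it only
    records the order in which (rank, op) pairs would be issued.
    """
    progress = {rank: 0 for rank in operations_by_rank}
    unfinished_ranks = set(operations_by_rank)
    schedule = []
    while len(unfinished_ranks) > 0:
        finished_ranks = set()
        # sorted() makes the visiting order independent of how the set was built.
        for rank in sorted(unfinished_ranks):
            operations = operations_by_rank[rank]
            if progress[rank] < len(operations):
                schedule.append((rank, operations[progress[rank]]))
                progress[rank] += 1
            if progress[rank] >= len(operations):
                finished_ranks.add(rank)
        unfinished_ranks -= finished_ranks
    return schedule


# Same plan, built with different insertion orders (illustrative data).
plan_a = {3: ["w0", "w1"], 0: ["w0"], 8: ["w0", "w1", "w2"]}
plan_b = {8: ["w0", "w1", "w2"], 3: ["w0", "w1"], 0: ["w0"]}

schedule_a = interleave_ops(plan_a)
schedule_b = interleave_ops(plan_b)
print(schedule_a == schedule_b)
```

With the sorted loop, both plans produce the same schedule, which is the property a sender and receiver would need if their operation lists must line up pairwise. Whether that is the exact failure #93 addressed is an assumption here.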
