@amd-nithyavs
Contributor

Adds some bug fixes to the acoll component.

  • gather now works for non-zero root ranks when the three-stage algorithm is used.
  • Busy waits in the shared-memory bcast and barrier are changed to avoid random hangs.
  • Command-line overrides are now honored in bcast algorithm selection.
  • The multinode path of reduce and allreduce is fixed to use acoll's hierarchical algorithms.
  • Compile-time warnings are removed.

Signed-off-by: Nithya V S <[email protected]>
Member

@edgargabriel edgargabriel left a comment

a few minor formatting things, but otherwise it looks good in my opinion.

}

if (MPI_SUCCESS != err) {
if (NULL != inplacebuf_free) { free(inplacebuf_free); }
Member

Please move the free(inplacebuf_free) onto a separate line (there are two more instances of the same issue below).
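
For illustration, the requested layout might look like the sketch below. It is not taken from the PR: the helper `cleanup_on_error` and the `SKETCH_SUCCESS` stand-in for `MPI_SUCCESS` are assumptions made so the example compiles without `mpi.h`, and resetting the pointer to `NULL` is an extra step the original code does not show.

```c
#include <stdlib.h>

/* Stand-in for MPI_SUCCESS so the sketch compiles without mpi.h. */
#define SKETCH_SUCCESS 0

/* Hypothetical helper, not from the PR: frees the temporary buffer on the
 * error path and returns the error code unchanged. */
static int cleanup_on_error(int err, char **inplacebuf_free)
{
    if (SKETCH_SUCCESS != err) {
        if (NULL != *inplacebuf_free) {
            /* The free() call sits on its own line, as requested. */
            free(*inplacebuf_free);
            /* Assumption of this sketch only: clear the pointer afterwards. */
            *inplacebuf_free = NULL;
        }
    }
    return err;
}
```

Keeping the call on its own line makes the cleanup easier to spot in a diff and to step through in a debugger.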

Contributor Author

Done

}
}

if (num_nodes == 1) {
Member

The PR has a number of places where the constant is not on the left-hand side of the comparison. It's most relevant for ==, but it also applies to != and other operators (in many places you have the correct order, e.g. a few lines above in the if (MPI_SUCCESS != err)). Please double-check the PR for that.
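
As a minimal sketch of the convention, assuming a hypothetical helper `is_single_node` that does not appear in the PR:

```c
/* Hypothetical helper, not from the PR: reports whether the job runs on
 * a single node. */
static int is_single_node(int num_nodes)
{
    /* Constant on the left: a mistyped `1 = num_nodes` is a compile error,
     * whereas a mistyped `num_nodes = 1` would compile and always be true. */
    return 1 == num_nodes;
}
```

With the constant first, an accidental assignment in place of a comparison is rejected by the compiler instead of silently changing the branch taken.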

Contributor Author

Done here and in other places in multiple files.

Changes in formatting to adhere to ompi coding style.

Signed-off-by: Nithya V S <[email protected]>
Fixes a condition for freeing temporary buffer in acoll reduce.

Signed-off-by: Nithya V S <[email protected]>
Contributor Author

@amd-nithyavs amd-nithyavs left a comment

Made the changes along with one more fix related to a buffer free.

Member

@edgargabriel edgargabriel left a comment

LGTM

@mshanthagit mshanthagit merged commit e47e9dd into open-mpi:main Jan 5, 2026
15 of 17 checks passed
amd-nithyavs added a commit to amd-nithyavs/ompi_acoll that referenced this pull request Jan 6, 2026
…ixes

Fixes to acoll component

(cherry picked from commit e47e9dd)