Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
- Ubuntu distribution
- AWS distribution (EFA)
Please describe the system on which you are running
- Operating system/version: Amazon Linux 2023, Ubuntu 24
- Computer hardware: c7gn.16xlarge (and other Graviton2 or Graviton3 instance types)
- Network type: ENI (TCP)
Summary
We're seeing intermittent deadlocks in multi-threaded MPI applications on AWS Graviton that never occur on x86_64. All threads end up stuck in ompi_sync_wait_mt(). After analyzing the code, I believe this is due to missing memory barriers in the ARM64 atomic operations used by the wait_sync mechanism.
Environment
- Open MPI: v5.0.8 (also tested v4.1.6, same issue)
- Platform: AWS Graviton2 or Graviton3 (ARM64/aarch64)
- Compiler: GCC -O2
- Application: Multi-threaded MPI with concurrent operations from multiple threads
What we observe
When the deadlock occurs, backtraces show all ranks stuck in ompi_sync_wait_mt(). Some threads are spinning in opal_progress(), others are blocked on condition variables. The affected MPI operations include MPI_Ssend, MPI_Mprobe, and MPI_Waitall.
Unfortunately, there is no known minimal reproducer that triggers the deadlock deterministically. The issue appears intermittently in production-scale workloads with high thread contention, and it never occurs on x86_64 with the same code.
Analysis
Looking at the code, ompi_sync_wait_mt() implements a counting semaphore pattern:
The completion thread decrements sync->count (wait_sync.h:144):
```c
if (0 != (OPAL_THREAD_ADD_FETCH32(&sync->count, -updates))) {
    return;
}
WAIT_SYNC_SIGNAL(sync);
```
The waiting thread polls it (wait_sync.c:118-120):
```c
OPAL_THREAD_ADD_FETCH32(&num_thread_in_progress, 1);
while (sync->count > 0) {
    opal_progress();
}
```
The problem is in opal/include/opal/sys/arm64/atomic.h. The OPAL_ASM_MAKE_ATOMIC macro (lines 286-295) uses ldxr/stxr, which are atomic but lack acquire/release semantics. Additionally, the while (sync->count > 0) loop performs a plain C read with no acquire semantics.
On ARM64's weak memory model, this means:
- The completion thread's decrement, and the writes that precede it, can become visible to the waiting thread out of order
- The waiting thread can keep reading a stale value of sync->count
Interestingly, the same file's opal_atomic_swap_* implementations (lines 247-258) correctly use ldaxr/stlxr with acquire/release semantics, so there's an inconsistency within the file.
History
This appears to have been introduced in commit 7893248 (Nov 2017) when fetch-and-op atomics were added. It affects all v4.x and v5.x releases. There was a partial fix attempt in commit 5e13f02 (Jan 2021) that added atomic_llsc.h with proper barriers, but the OPAL_ASM_MAKE_ATOMIC macro wasn't updated.
Proposed fix
I think both of these changes are needed:
- Update OPAL_ASM_MAKE_ATOMIC to use ldaxr/stlxr (matching opal_atomic_swap_*)
- Change the plain read of sync->count in the polling loop to an atomic load with acquire semantics
References
- Commit 7893248 (introduced the issue)
- Commit 5e13f02 (partial fix attempt)