Commit 42e8954
fix(ep): mlx5 collapsed CQ + dedicated dispatch send buffer for internode-v1 (#366)
- Fix silent corruption in mlx5 RoCE internode-v1: combine overwrote `staging` while it was still the source of dispatch's in-flight send; dispatch now sends from a dedicated `dispatchStaging` buffer.
- Add mlx5 collapsed CQ support (cc=1/oi=1): track completions via `CQE[0].wqe_counter`; the final per-PE quiet waits on the live `postIdx`, while the recycle gate uses a snapshot.
- Allocate GPU control structures (CQ, QP, doorbell, atomic ibuf) as uncached memory, and use device-scope fences in the CQ drain.
- Drop dead `outstandingWqe` writes on mlx5/psd.

Parent commit: d87651c
5 files changed
Lines changed: 97 additions & 137 deletions
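A minimal host-side C++ model of the dispatch/combine hazard in the first bullet. It assumes, as RDMA sends generally work, that the NIC reads the source buffer when it executes the WQE rather than when the WQE is posted. `FakeNic` and `runDispatchThenCombine` are hypothetical illustrations; only the buffer names `staging` and `dispatchStaging` come from the commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an in-flight RDMA send: posting a WQE records only a
// reference to the source buffer; the NIC reads the bytes later, when it
// executes the WQE. Names are illustrative, not mori's actual API.
struct FakeNic {
  std::vector<const std::vector<int>*> pending;  // sources of posted sends
  std::vector<std::vector<int>> delivered;       // payloads the peer receives

  void post(const std::vector<int>& src) { pending.push_back(&src); }

  // Read every pending source at execution time, not at post time.
  void execute() {
    for (const auto* src : pending) delivered.push_back(*src);
    pending.clear();
  }
};

// Dispatch posts a send from its source buffer, then combine writes `staging`
// before that send has executed. Returns what the peer receives for dispatch.
std::vector<int> runDispatchThenCombine(bool dedicatedDispatchBuffer) {
  std::vector<int> staging(4, 0);          // buffer combine writes into
  std::vector<int> dispatchStaging(4, 0);  // dedicated dispatch send source
  std::vector<int>& dispatchSrc =
      dedicatedDispatchBuffer ? dispatchStaging : staging;

  FakeNic nic;
  dispatchSrc = {1, 2, 3, 4};  // dispatch packs its tokens
  nic.post(dispatchSrc);       // the send is now in flight
  staging = {9, 9, 9, 9};      // combine reuses `staging` too early
  nic.execute();

  assert(nic.delivered.size() == 1);
  return nic.delivered.front();
}
```

With a shared buffer the peer receives combine's bytes (`{9, 9, 9, 9}`); with the dedicated buffer it receives dispatch's tokens (`{1, 2, 3, 4}`). Waiting for the dispatch send to complete before combine touches `staging` would also avoid the race, but would serialize the two phases; a separate buffer trades memory for that wait.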
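A sketch of the collapsed-CQ bookkeeping from the second bullet, as a host-side model. It assumes that with cc=1 the NIC overwrites the single entry `CQE[0]` on each completion, that `wqe_counter` holds the 16-bit index of the most recently completed WQE, and that send completions arrive in order, so one counter covers every earlier WQE. `CollapsedCq`, `SendQueue`, `post`, `nicComplete`, `completedThrough`, `quietDone`, and `recycleGateOpen` are hypothetical names; the real code runs on the GPU and polls device memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of an mlx5 collapsed CQ (cc=1): the NIC
// overwrites the single entry CQE[0] on every completion, so the consumer only
// sees the 16-bit wqe_counter of the most recently completed WQE. Completions
// are assumed in order: if WQE k is done, every earlier WQE is done too.
// Names are illustrative, not mori's API.
struct CollapsedCq {
  bool anyCompletion = false;  // stand-in for a real CQE validity check
  uint16_t wqeCounter = 0;     // CQE[0].wqe_counter (WQE index mod 2^16)
};

struct SendQueue {
  uint32_t postIdx = 0;  // absolute number of WQEs posted so far
};

uint32_t post(SendQueue& sq) { return sq.postIdx++; }

// NIC side of the model: completing WQE `idx` overwrites CQE[0].
void nicComplete(CollapsedCq& cq, const SendQueue& sq, uint32_t idx) {
  assert(idx < sq.postIdx && "cannot complete a WQE that was never posted");
  cq.anyCompletion = true;
  cq.wqeCounter = static_cast<uint16_t>(idx);
}

// True if absolute WQE `target` has completed. The signed 16-bit difference
// is unambiguous only while fewer than 2^15 WQEs are outstanding.
bool completedThrough(const CollapsedCq& cq, uint32_t target) {
  if (!cq.anyCompletion) return false;
  auto diff = static_cast<int16_t>(
      static_cast<uint16_t>(cq.wqeCounter - static_cast<uint16_t>(target)));
  return diff >= 0;
}

// Final per-PE quiet: read the live postIdx so late posts are covered too.
bool quietDone(const CollapsedCq& cq, const SendQueue& sq) {
  return sq.postIdx == 0 || completedThrough(cq, sq.postIdx - 1);
}

// Recycle gate: only WQEs posted before the snapshot must have drained.
bool recycleGateOpen(const CollapsedCq& cq, uint32_t snapshotPostIdx) {
  return snapshotPostIdx == 0 || completedThrough(cq, snapshotPostIdx - 1);
}
```

The two callers differ only in which index they pass: the final quiet rereads the live `postIdx`, so it also covers WQEs posted in the meantime, while the recycle gate compares against an earlier snapshot and so need not wait for WQEs posted after that point.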
File tree:
- include/mori
  - ops/dispatch_combine
  - shmem
- src
  - application/transport/rdma/providers/mlx5
  - ops/dispatch_combine
Diff excerpt (file name and code text not preserved): two lines added after original line 184, appearing as new lines 185 and 186; original lines 185 to 187 become new lines 187 to 189.