v0.8.0
- Enables intra-node zero-copy to improve data transfer efficiency for small messages.
- Adds a naive AllReduce implementation for uniRunner mode, based on a CPU-centric, device-assisted algorithm.
- Adds one-sided communication primitives via the new APIs flagcxHeteroPut and flagcxHeteroPutSignal.