fix(memory-pool): replace flaky sum-based assertion with monotonic counter check #2267
phil-opp wants to merge 1 commit into
…ssertion

The receiver validated data propagation by summing the first 8 elements of a random tensor (each in the range [0,1000)). Two adjacent draws had a ~1/3000 chance of producing equal sums; across 99 comparisons this gave a ~3% spurious failure rate per run, even when the transport worked correctly.

Stamp element[0] with the iteration counter in the sender so the receiver can assert `tensor[0] == i`. The check is deterministic, with zero false positives.

Also wire cpu2cpu.yml into the test harness (it is the only GPU-free scenario):
- tests/example-smoke.rs: two #[ignore] tests (torch is not in PR CI; run them explicitly with `cargo test -- --ignored smoke_memory_pool`)
- scripts/smoke-all.sh: conditional on `python3 -c "import torch"`

Partially addresses #2264 (smoke tests for cpu2cpu; the CUDA variants still require NVIDIA hardware).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QheETdNcNiZ6JZa8oz5sRK
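The ~1/3000 and ~3% figures can be checked exactly. The sketch below is not part of the PR; the helper names are hypothetical. It assumes every element is an independent uniform integer in [0, 1000). It also treats the 99 comparisons as independent, which is only an approximation because adjacent comparisons share a tensor. It builds the exact distribution of an 8-element sum and computes the probability that two independent sums are equal.

```python
def sum_distribution(k, m):
    """counts[s] = number of ways k independent draws from range(m) sum to s."""
    counts = [1]  # zero draws: the sum 0 occurs in exactly one way
    for _ in range(k):
        n = len(counts)
        # Prefix sums make each convolution with the uniform kernel O(n + m).
        prefix = [0]
        for c in counts:
            prefix.append(prefix[-1] + c)
        counts = [
            prefix[min(s, n - 1) + 1] - prefix[max(0, s - m + 1)]
            for s in range(n + m - 1)
        ]
    return counts


def collision_probability(k, m):
    """P(two independent k-element sums are equal) = sum_s P(S = s)^2."""
    counts = sum_distribution(k, m)
    total = m ** k  # number of equally likely k-element draws
    return sum(c * c for c in counts) / total ** 2


def run_failure_probability(p, comparisons):
    """Chance that at least one of `comparisons` independent checks collides."""
    return 1 - (1 - p) ** comparisons


p = collision_probability(8, 1000)
fail = run_failure_probability(p, 99)
print(f"per-comparison collision: about 1 in {1 / p:.0f}")
print(f"spurious failure per run: {fail:.1%}")
```

Both printed values should come out close to the ~1/3000 and ~3% quoted above. The collision rate is far higher than 1/7993 because the sums cluster around their mean instead of spreading evenly over [0, 7992].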
Automated review by Claude (fully automated; no human has vetted this, so treat it as a suggestion). No issues found. Reviewed the diff directly:
Test-only / example-only change with no production-code impact.
Generated by Claude Code
Automated review by Claude (fully automated; no human has vetted this, so please verify before acting). No issues found. The counter fix is correct, the two smoke tests are …

One note: this overlaps with #2268 and #2279, which address the same #2264 findings. #2268 additionally wires the four …

Generated by Claude Code
Summary
Addresses the two concrete findings in #2264.
Finding 2 fix: eliminate the ~3% spurious failure rate in `receiver.py`

The old assertion summed the first 8 `int64` elements (each in `[0, 1000)`) of two consecutive random tensors and asserted that the sums differed. The sum range is `[0, 7992]`, but the sums cluster around the mean of 3996 rather than spreading evenly over that range. As a result, two adjacent draws had a ~1/3000 chance of colliding. With ~99 comparisons per run, this caused roughly 3% of runs to fail spuriously, even when the transport worked perfectly.

Fix: stamp element `[0]` with the iteration counter `i` in the sender, and assert `tensor[0] == i` in the receiver. Every correctly delivered message carries its own index, so the check has zero collision probability and no false positives.

`sender.py`:

`receiver.py`:
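The counter-stamping idea can be sketched in plain Python without torch or the dataflow runtime. This is a minimal sketch, not the PR's code. `send` and `receive` are hypothetical stand-ins for the real sender and receiver nodes, a Python list stands in for the tensor, and the counter is assumed to start at 0.

```python
import random


def send(i, rng, length=8):
    """Hypothetical sender: random values in [0, 1000), element 0 stamped with i."""
    payload = [rng.randrange(1000) for _ in range(length)]
    payload[0] = i  # the iteration counter makes every message distinguishable
    return payload


def receive(stream):
    """Hypothetical receiver: the i-th message must carry i in element 0.

    Returns the number of verified messages. Raises AssertionError on a
    stale, dropped, or reordered message.
    """
    verified = 0
    for i, payload in enumerate(stream):
        assert payload[0] == i, f"message {i}: expected stamp {i}, got {payload[0]}"
        verified += 1
    return verified


rng = random.Random(0)
healthy = [send(i, rng) for i in range(100)]
print(receive(healthy))  # 100

# Simulated transport bug: message 5 re-delivers the buffer from message 4.
stale = healthy.copy()
stale[5] = healthy[4]
try:
    receive(stale)
except AssertionError as err:
    print("caught:", err)  # caught: message 5: expected stamp 5, got 4
```

Unlike the sum comparison, the outcome here does not depend on the random values, so a correct transport can never trip the check.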
Finding 1 (partial): wire `cpu2cpu.yml` into the test harness

`cpu2cpu.yml` is the only GPU-free scenario in the memory-pool suite. Added:
- `tests/example-smoke.rs`: two `#[ignore]` smoke tests (torch is not installed in standard PR CI; run them explicitly with `cargo test --test example-smoke -- --ignored smoke_memory_pool`)
- `scripts/smoke-all.sh`: a conditional block that runs `cpu2cpu.yml` when `python3 -c "import torch, tqdm"` succeeds and skips it otherwise

The CUDA-dependent scenarios (`cpu2cuda`, `cuda2cpu`) remain blocked by the need for NVIDIA hardware.

Closes #2264
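The dependency gate in `scripts/smoke-all.sh` can be mirrored in Python. This sketch is illustrative only and is not part of the PR. `modules_available` is a hypothetical helper, and the branches just print instead of launching the scenario. One difference from the shell check: `importlib.util.find_spec` only locates a module without importing it, so a package that is installed but fails at import time would pass this check and still fail `python3 -c "import torch, tqdm"`.

```python
import importlib.util


def modules_available(*names):
    """True if every top-level module in `names` can be located (not imported)."""
    return all(importlib.util.find_spec(name) is not None for name in names)


if modules_available("torch", "tqdm"):
    print("torch and tqdm found: cpu2cpu.yml smoke scenario would run")
else:
    print("skipping cpu2cpu.yml: torch/tqdm not installed")
```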
Generated by Claude Code