Skip to content

Fix plugin quality: schema-driven parsing, SIGTERM handling, container base alignment#4

Open
mcurrier2 wants to merge 6 commits intomainfrom
fix-plugin-quality-improvements
Open

Fix plugin quality: schema-driven parsing, SIGTERM handling, container base alignment#4
mcurrier2 wants to merge 6 commits intomainfrom
fix-plugin-quality-improvements

Conversation

@mcurrier2
Copy link
Copy Markdown
Contributor

Summary

Addresses 9 quality and correctness issues in the rusty-comms Arcaflow plugin:

  • Schema-driven JSON deserialization: Replaced ~220 lines of manual _parse_* functions with the SDK's build_object_schema / unserialize, adding a recursive _normalize_for_schema() preprocessor that strips unknown keys and coerces float-to-int for IntType fields.
  • Explicit blocking default: Changed blocking schema default from None to True and simplified the CLI condition from is not False to a straightforward truthy check, making intent explicit.
  • SIGTERM handler: Added _sigterm_handler() that forwards SIGTERM to the active child process group, preventing orphaned ipc-benchmark processes during Arcaflow engine shutdown.
  • Configurable timeout: Made the benchmark execution timeout configurable per-test via TestRunConfig.timeout with an explicit schema default of 3600s, replacing the hardcoded value. Error messages now report the actual configured timeout.
  • Merge aggregation fix: _merge_outputs now aggregates duplicate mechanism summaries (sum messages, max throughput, min latencies) instead of silently overwriting. Winners are derived in a second pass over fully merged data.
  • Type-safe status field: Changed BenchmarkResult.status from typing.Any to str, added failure_reason: Optional[str] field. Preprocessing transforms {"Failure": "reason"} dicts from the Rust binary into the split typed fields.
  • Base images updated: Bumped arcalot base images from 0.4.0 to 0.5.0.
  • Unified container distro: Switched Rust build stage from Debian bookworm (glibc 2.36) to CentOS Stream 9 (glibc 2.34) with rustup, matching the runtime base and eliminating glibc forward-compatibility risk.
  • SDK version bump: Updated arcaflow-plugin-sdk from 0.14.0 to 0.14.4 and Python requirement from 3.9+ to 3.12+ to fix ForwardRef._evaluate() TypeError on Python 3.12.4+.
  • CACHE_BUST build arg: Added ARG CACHE_BUST before git clone so the layer cache can be invalidated on demand without rebuilding the Rust toolchain.

Test plan

  • All 32 unit tests pass in containerized environment (Python 3.12, SDK 0.14.4)
  • Full podman build succeeds end-to-end (Rust build on CentOS Stream 9 + Python tests in buildbase)
  • Run plugin against live ipc-benchmark binary to verify JSON parsing with real output
  • Verify SIGTERM forwarding: send SIGTERM to plugin during a benchmark run and confirm no orphaned child processes

Made with Cursor

… bases

- Replace ~220 lines of manual _parse_* JSON-to-dataclass functions with
  schema-driven deserialization using SDK's build_object_schema/unserialize.
  Add _normalize_for_schema() recursive preprocessor that strips unknown
  dict keys and coerces float→int for IntType fields before unserialize.

- Fix blocking default logic: change `if params.blocking is not False` to
  `if params.blocking` and set schema default to True explicitly, making
  intent clear and matching documented behavior.

- Add SIGTERM handler (_sigterm_handler) that forwards SIGTERM to the
  active child process group, preventing orphaned ipc-benchmark processes
  when the Arcaflow engine initiates graceful shutdown.

- Make benchmark timeout configurable via TestRunConfig.timeout (per-test),
  replacing the hardcoded 3600s value. Error messages now report the
  actual configured timeout.

- Fix _merge_outputs to aggregate duplicate mechanism summaries instead
  of silently overwriting: total_messages are summed, best throughput
  and lowest latencies are kept. Fastest/lowest-latency mechanism winners
  are derived in a second pass over fully merged data.

- Improve type safety of BenchmarkResult.status: change from typing.Any
  to str, add optional failure_reason field. Preprocessing in
  _parse_json_output transforms {"Failure": "reason"} dicts into
  status="Failure" + failure_reason="reason".

- Update arcalot base images from 0.4.0 to 0.5.0 for both buildbase
  and osbase stages.

- Switch Rust build stage from Debian bookworm (glibc 2.36) to CentOS
  Stream 9 (glibc 2.34) with rustup, matching the runtime base image
  distro family and eliminating glibc forward-compatibility risk.

- Update tests: expect ConstraintException for missing keys, add tests
  for float-to-int coercion, unknown key stripping, and extra top-level
  keys.

AI-assisted-by: claude-4.6-opus (Anthropic)
Made-with: Cursor
…mpat

- Remove explicit curl from dnf install in Rust build stage; the
  CentOS Stream 9 base already ships curl-minimal which conflicts
  with the full curl package but is sufficient for rustup download.

- Bump arcaflow-plugin-sdk from ^0.14.0 to ^0.14.4 (resolves to
  0.14.4) to fix ForwardRef._evaluate() TypeError on Python 3.12.4+.
  SDK 0.14.0 called ForwardRef._evaluate() without the recursive_guard
  keyword argument added in CPython 3.12.4.

- Update Python version requirement from ^3.9 to ^3.12 to match
  SDK 0.14.4's minimum and the buildbase image's Python 3.12.11.

- Regenerate poetry.lock with updated dependencies.

AI-assisted-by: claude-4.6-opus (Anthropic)
Made-with: Cursor
- Add ARG CACHE_BUST before the git clone step so the layer cache
  can be invalidated on demand (e.g. podman build --build-arg
  CACHE_BUST=$(date +%s)) without rebuilding the Rust toolchain.

AI-assisted-by: claude-4.6-opus (Anthropic)
Made-with: Cursor
- Change timeout field default from None to 3600 in TestRunConfig
  schema, making the default explicit (per review feedback).
- Remove the None-check fallback in _run_single_test; since the
  schema always provides a value, test_config.timeout is used
  directly.

AI-assisted-by: claude-4.6-opus (Anthropic)
Made-with: Cursor
@mcurrier2 mcurrier2 requested review from dustinblack and ewchong April 7, 2026 15:09
arcalot-bot and others added 2 commits April 7, 2026 15:32
- Replace manual ARG CACHE_BUST with ADD of the GitHub API
  refs/heads/main endpoint. ADD always fetches the URL at build
  time; when the commit SHA changes the layer cache is invalidated
  automatically, forcing a fresh git clone and cargo build.
- Works on both local and CI builds without requiring --build-arg.

AI-assisted-by: claude-4.6-opus (Anthropic)
Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants