Add regression test suite and CI workflow by astefano · Pull Request #25 · Beneficial-AI-Foundation/verilib-cli

astefano · 2026-02-24T17:51:36Z

Introduce 16 regression tests covering the full verilib-cli pipeline (create -> atomize -> specify -> verify) with a Rust mock probe-verus binary for end-to-end testing without external dependencies.

Test categories:

Full pipeline with mock probe-verus
Cross-stage data flow (atomize -> specify -> verify)
Quantitative regression (enrichment coverage, spec-text, verification)
"Structure for Verification Projects" requirement validation (code-name precedence, .md immutability, cert timestamps, dependency correctness, required fields)
Idempotency (atomize and verify produce stable output)

All assertions check exit codes and artifact contents, never stdout/stderr message formats, for resilience to implementation changes.

Also adds a GitHub Actions CI workflow that runs fmt, clippy, and tests on pull requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Introduce 16 regression tests covering the full verilib-cli pipeline (create -> atomize -> specify -> verify) with a Rust mock probe-verus binary for end-to-end testing without external dependencies. Test categories: - Full pipeline with mock probe-verus - Cross-stage data flow (atomize -> specify -> verify) - Quantitative regression (enrichment coverage, spec-text, verification) - "Structure verif proj" doc requirement validation (code-name precedence, .md immutability, cert timestamps, dependency correctness, required fields) - Idempotency (atomize and verify produce stable output) All assertions check exit codes and artifact contents, never stdout/stderr message formats, for resilience to implementation changes. Also adds a GitHub Actions CI workflow that runs fmt, clippy, and tests on pull requests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When --no-probe is set, parse .md frontmatter directly using the existing structure module instead of calling probe-verus stubify. This makes atomize --no-probe fully self-contained and fixes CI failures where probe-verus is not available. Co-authored-by: Cursor <cursoragent@cursor.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This pull request introduces a comprehensive regression test suite for the verilib-cli pipeline along with a CI workflow. The PR adds 16 regression tests organized into 5 categories: full pipeline tests with a mock probe-verus binary, cross-stage data flow tests, quantitative regression tests, requirement validation tests (based on the "Structure for Verification Projects" document), and idempotency tests.

Changes:

Added 16 regression tests in tests/regression_test.rs covering the full create → atomize → specify → verify pipeline
Introduced a Rust-based mock probe-verus binary for end-to-end testing without external dependencies
Added test fixtures and configuration files to support the test suite
Added GitHub Actions CI workflow that runs fmt, clippy, and tests on pull requests
Applied extensive formatting improvements across the codebase (clippy fixes, consistent indentation, line breaks)
Enhanced atomize command with load_stubs_from_md_files function to support --no-probe mode by directly parsing .md frontmatter

Reviewed changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/regression_test.rs	New comprehensive test suite with 16 tests across 5 categories covering full pipeline, data flow, quantitative metrics, requirements validation, and idempotency
tests/helpers/mock_probe_verus.rs	Mock probe-verus binary that copies fixture files based on subcommand for testing without external dependencies
tests/fixtures/*.{json,csv}	Test fixtures for atoms, specs, proofs, stubs, and tracked functions
tests/structure_commands_test.rs	Formatting improvements to existing tests
src/commands/atomize.rs	Added `load_stubs_from_md_files` function to support --no-probe mode
src/structure/*.rs	Formatting improvements (indentation, line breaks)
src/storage/*.rs	Formatting improvements and module reordering
src/download/*.rs	Formatting improvements
src/commands/*.rs	Extensive formatting improvements across all command files
src/main.rs, src/cli.rs, src/constants.rs	Minor formatting improvements
.github/workflows/ci.yml	New CI workflow for fmt, clippy, and test checks on PRs
Cargo.toml	Added mock-probe-verus binary configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Merge regression_test.rs and structure_commands_test.rs into a single file with 27 behavioral tests (down from 40), organized into 7 modules: atomize, atomize_atoms_only, specify, verify, error_handling, create, and pipeline. Removed 13 tests: stdout/stderr message assertions (not behavioral), vacuous tests with if-guards that skip assertions, and exact duplicates between the two files. Every remaining test has a doc comment explaining the behavioral property it verifies. Also fix atomize --no-probe to parse .md frontmatter directly instead of calling probe-verus stubify, so the flag works without probe-verus installed. Co-authored-by: Cursor <cursoragent@cursor.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-25T09:36:00Z

+    /// Enrichment is idempotent: running atomize twice with the same inputs
+    /// must produce byte-identical stubs.json.
+    #[test]
+    fn two_runs_produce_identical_output() {
+        let tmp = setup_project();
+
+        assert_success(&cli(&["atomize", "--no-probe"], tmp.path()), "first run");
+        let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));
+
+        assert_success(&cli(&["atomize", "--no-probe"], tmp.path()), "second run");
+        let second: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));
+
+        assert_eq!(first, second, "atomize must be idempotent");
+    }


The doc comment claims the output is “byte-identical”, but the test compares parsed serde_json::Values (assert_eq!(first, second)), which only checks semantic JSON equality (formatting/key order differences won’t be detected). Either adjust the comment to match the assertion, or compare the raw file bytes/strings if true byte-level stability is required.

Copilot · 2026-02-25T09:36:00Z

+    /// Verification is idempotent: running verify twice with the same inputs
+    /// must produce byte-identical stubs.json.
+    #[test]
+    fn two_runs_produce_identical_output() {
+        let tmp = setup_project();
+        cli(&["atomize", "--no-probe"], tmp.path());
+
+        assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "first run");
+        let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));
+
+        assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "second run");
+        let second: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));
+
+        assert_eq!(first, second, "verify must be idempotent");
+    }


This doc comment says verify should produce “byte-identical stubs.json”, but the test compares deserialized JSON values, not the file’s raw bytes. Please either reword the comment to “JSON-identical” (semantic equality) or change the assertion to compare the file contents if byte-level stability is intended.

Copilot · 2026-02-25T09:36:01Z

+        let tmp = setup_project_with_config("config_auto_validate.json");
+        cli(&["atomize", "--no-probe"], tmp.path());
+        assert_success(&cli(&["specify", "--no-probe"], tmp.path()), "specify");


These tests run atomize --no-probe as a prerequisite but do not assert it succeeds. If atomize fails, the specify assertions may still pass using the pre-copied fixtures, which can mask regressions in cross-stage data flow. Please assert success on the atomize step (or remove it if it’s intentionally not part of the test’s setup).

Copilot · 2026-02-25T09:36:01Z

+    fn sets_verified_field_from_proofs() {
+        let tmp = setup_project();
+        cli(&["atomize", "--no-probe"], tmp.path());
+        assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "verify");


This verify test runs atomize --no-probe but doesn’t check its exit status. If atomize fails, the verify assertion may fail for the wrong reason or (depending on fixture state) still pass without actually exercising the intended pipeline. Please assert that the atomize step succeeds.

Copilot · 2026-02-25T09:36:01Z

+        let before = collect_md_checksums(&dir);
+        assert!(!before.is_empty());
+
+        cli(&["verify", "--no-probe"], tmp.path());
+


verify --no-probe is invoked here without asserting success. If verify fails, the checksum comparison may be skipped only because the test panics later, and the failure reason becomes unclear. Please assert the verify command’s exit status so the test failure points to the real regression.

Copilot · 2026-02-25T09:36:01Z

+        let rel_path = path
+            .strip_prefix(structure_root)
+            .unwrap_or(path)
+            .to_string_lossy()
+            .to_string();
+        match parse_frontmatter(path) {
+            Ok(fm) => {
+                stubs.insert(rel_path, serde_json::to_value(fm)?);


load_stubs_from_md_files uses to_string_lossy() on a Path to derive stub keys. On Windows this will produce src\\module.rs\\... keys, while other parts of the pipeline/fixtures typically use forward slashes (e.g., src/module.rs/...). This can make stubs.json non-portable across platforms and can diverge from probe-verus’s key format. Consider normalizing the relative path to a consistent separator (e.g., /) before inserting into the stubs map.

Suggested change

let rel_path = path

.strip_prefix(structure_root)

.unwrap_or(path)

.to_string_lossy()

.to_string();

match parse_frontmatter(path) {

Ok(fm) => {

stubs.insert(rel_path, serde_json::to_value(fm)?);

let rel_path = path.strip_prefix(structure_root).unwrap_or(path);

let rel_path = rel_path

.to_string_lossy()

.replace('\\', "/");

match parse_frontmatter(path) {

Ok(fm) => {

stubs.insert(rel_path.clone(), serde_json::to_value(fm)?);

Copilot · 2026-02-25T09:36:02Z

+    let src = std::path::PathBuf::from(&fixtures).join(fixture_file);
+    if let Some(dest) = output_path {
+        if let Some(parent) = std::path::Path::new(dest.as_str()).parent() {
+            let _ = fs::create_dir_all(parent);
+        }


The mock only writes an output file when -o/--output is present; if it’s missing, the process exits successfully without producing anything. That can mask regressions (the caller thinks probe-verus succeeded, but artifacts are missing). Consider exiting non-zero when no output path is provided so failures are immediate and easier to diagnose.

Copilot · 2026-02-25T09:36:02Z

+        let tmp = setup_project();
+        cli(&["atomize", "--no-probe"], tmp.path());
+        assert_success(
+            &cli(&["atomize", "--no-probe", "--check-only"], tmp.path()),
+            "atomize --check-only",


In this test, the initial atomize --no-probe invocation is not asserted to succeed. If that setup step fails, the subsequent --check-only assertion may become misleading (or fail for the wrong reason). Please assert the first command’s exit status (e.g., via assert_success) to ensure the test only fails on the behavior under test.

Copilot · 2026-02-25T09:36:02Z

+        let tmp = setup_project();
+        cli(&["atomize", "--no-probe"], tmp.path());
+
+        assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "first run");
+        let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));


This idempotency test runs atomize --no-probe as setup but doesn’t assert that it succeeded before comparing outputs. Please assert success on the atomize step to avoid false positives/unclear failures if atomize regresses.

Bare cli() calls used as test setup were silently discarding exit codes, which could mask failures and produce confusing downstream assertions. Wrap all setup calls with assert_success. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

astefano · 2026-02-27T07:32:11Z

Closing as superseded by #37

run clippy, fmt

ede45b5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

astefano force-pushed the la/add_regression_tests branch from 5826f84 to b26da38 Compare February 24, 2026 18:18

astefano force-pushed the la/add_regression_tests branch from b26da38 to f795185 Compare February 24, 2026 18:20

astefano and others added 3 commits February 24, 2026 19:40

clippy

fe38c83

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

upd comments

f6b9005

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

astefano marked this pull request as ready for review February 24, 2026 19:23

astefano requested review from Copilot, shaoweilin and silverboyir February 24, 2026 19:23

Copilot started reviewing on behalf of astefano February 24, 2026 19:23 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

Comment thread tests/regression_test.rs Outdated

Comment thread tests/regression_test.rs Outdated

Comment thread tests/regression_test.rs Outdated

Comment thread tests/helpers/mock_probe_verus.rs Outdated

silverboyir force-pushed the master branch 5 times, most recently from 1f8ff36 to 7f94348 Compare February 24, 2026 21:52

astefano force-pushed the la/add_regression_tests branch from 1617a8d to 71d11bd Compare February 25, 2026 09:29

Copilot AI review requested due to automatic review settings February 25, 2026 09:29

Copilot started reviewing on behalf of astefano February 25, 2026 09:30 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

astefano closed this Feb 27, 2026

Conversation

astefano commented Feb 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

astefano commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants