Skip to content

Add regression test suite and CI workflow#25

Closed
astefano wants to merge 7 commits intomasterfrom
la/add_regression_tests
Closed

Add regression test suite and CI workflow#25
astefano wants to merge 7 commits intomasterfrom
la/add_regression_tests

Conversation

@astefano
Copy link
Copy Markdown
Collaborator

Introduce 16 regression tests covering the full verilib-cli pipeline (create -> atomize -> specify -> verify) with a Rust mock probe-verus binary for end-to-end testing without external dependencies.

Test categories:

  • Full pipeline with mock probe-verus
  • Cross-stage data flow (atomize -> specify -> verify)
  • Quantitative regression (enrichment coverage, spec-text, verification)
  • "Structure for Verification Projects" requirement validation (code-name precedence, .md immutability, cert timestamps, dependency correctness, required fields)
  • Idempotency (atomize and verify produce stable output)

All assertions check exit codes and artifact contents, never stdout/stderr message formats, for resilience to implementation changes.

Also adds a GitHub Actions CI workflow that runs fmt, clippy, and tests on pull requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@astefano astefano force-pushed the la/add_regression_tests branch from 5826f84 to b26da38 Compare February 24, 2026 18:18
Introduce 16 regression tests covering the full verilib-cli pipeline
(create -> atomize -> specify -> verify) with a Rust mock probe-verus
binary for end-to-end testing without external dependencies.

Test categories:
- Full pipeline with mock probe-verus
- Cross-stage data flow (atomize -> specify -> verify)
- Quantitative regression (enrichment coverage, spec-text, verification)
- "Structure verif proj" doc requirement validation (code-name precedence, .md immutability,
  cert timestamps, dependency correctness, required fields)
- Idempotency (atomize and verify produce stable output)

All assertions check exit codes and artifact contents, never
stdout/stderr message formats, for resilience to implementation changes.

Also adds a GitHub Actions CI workflow that runs fmt, clippy, and tests
on pull requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@astefano astefano force-pushed the la/add_regression_tests branch from b26da38 to f795185 Compare February 24, 2026 18:20
astefano and others added 3 commits February 24, 2026 19:40
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When --no-probe is set, parse .md frontmatter directly using the
existing structure module instead of calling probe-verus stubify.
This makes atomize --no-probe fully self-contained and fixes CI
failures where probe-verus is not available.

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@astefano astefano marked this pull request as ready for review February 24, 2026 19:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces a comprehensive regression test suite for the verilib-cli pipeline along with a CI workflow. The PR adds 16 regression tests organized into 5 categories: full pipeline tests with a mock probe-verus binary, cross-stage data flow tests, quantitative regression tests, requirement validation tests (based on the "Structure for Verification Projects" document), and idempotency tests.

Changes:

  • Added 16 regression tests in tests/regression_test.rs covering the full create → atomize → specify → verify pipeline
  • Introduced a Rust-based mock probe-verus binary for end-to-end testing without external dependencies
  • Added test fixtures and configuration files to support the test suite
  • Added GitHub Actions CI workflow that runs fmt, clippy, and tests on pull requests
  • Applied extensive formatting improvements across the codebase (clippy fixes, consistent indentation, line breaks)
  • Enhanced atomize command with load_stubs_from_md_files function to support --no-probe mode by directly parsing .md frontmatter

Reviewed changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/regression_test.rs New comprehensive test suite with 16 tests across 5 categories covering full pipeline, data flow, quantitative metrics, requirements validation, and idempotency
tests/helpers/mock_probe_verus.rs Mock probe-verus binary that copies fixture files based on subcommand for testing without external dependencies
tests/fixtures/*.{json,csv} Test fixtures for atoms, specs, proofs, stubs, and tracked functions
tests/structure_commands_test.rs Formatting improvements to existing tests
src/commands/atomize.rs Added load_stubs_from_md_files function to support --no-probe mode
src/structure/*.rs Formatting improvements (indentation, line breaks)
src/storage/*.rs Formatting improvements and module reordering
src/download/*.rs Formatting improvements
src/commands/*.rs Extensive formatting improvements across all command files
src/main.rs, src/cli.rs, src/constants.rs Minor formatting improvements
.github/workflows/ci.yml New CI workflow for fmt, clippy, and test checks on PRs
Cargo.toml Added mock-probe-verus binary configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/regression_test.rs Outdated
Comment thread tests/regression_test.rs Outdated
Comment thread tests/regression_test.rs Outdated
Comment thread tests/helpers/mock_probe_verus.rs Outdated
@silverboyir silverboyir force-pushed the master branch 5 times, most recently from 1f8ff36 to 7f94348 Compare February 24, 2026 21:52
Merge regression_test.rs and structure_commands_test.rs into a single
file with 27 behavioral tests (down from 40), organized into 7 modules:
atomize, atomize_atoms_only, specify, verify, error_handling, create,
and pipeline.

Removed 13 tests: stdout/stderr message assertions (not behavioral),
vacuous tests with if-guards that skip assertions, and exact duplicates
between the two files. Every remaining test has a doc comment explaining
the behavioral property it verifies.

Also fix atomize --no-probe to parse .md frontmatter directly instead
of calling probe-verus stubify, so the flag works without probe-verus
installed.

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@astefano astefano force-pushed the la/add_regression_tests branch from 1617a8d to 71d11bd Compare February 25, 2026 09:29
Copilot AI review requested due to automatic review settings February 25, 2026 09:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/regression_test.rs
Comment on lines +292 to +305
/// Enrichment is idempotent: running atomize twice with the same inputs
/// must produce byte-identical stubs.json.
#[test]
fn two_runs_produce_identical_output() {
let tmp = setup_project();

assert_success(&cli(&["atomize", "--no-probe"], tmp.path()), "first run");
let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));

assert_success(&cli(&["atomize", "--no-probe"], tmp.path()), "second run");
let second: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));

assert_eq!(first, second, "atomize must be idempotent");
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment claims the output is “byte-identical”, but the test compares parsed serde_json::Values (assert_eq!(first, second)), which only checks semantic JSON equality (formatting/key order differences won’t be detected). Either adjust the comment to match the assertion, or compare the raw file bytes/strings if true byte-level stability is required.

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +591 to +605
/// Verification is idempotent: running verify twice with the same inputs
/// must produce byte-identical stubs.json.
#[test]
fn two_runs_produce_identical_output() {
let tmp = setup_project();
cli(&["atomize", "--no-probe"], tmp.path());

assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "first run");
let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));

assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "second run");
let second: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));

assert_eq!(first, second, "verify must be idempotent");
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc comment says verify should produce “byte-identical stubs.json”, but the test compares deserialized JSON values, not the file’s raw bytes. Please either reword the comment to “JSON-identical” (semantic equality) or change the assertion to compare the file contents if byte-level stability is intended.

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +418 to +420
let tmp = setup_project_with_config("config_auto_validate.json");
cli(&["atomize", "--no-probe"], tmp.path());
assert_success(&cli(&["specify", "--no-probe"], tmp.path()), "specify");
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests run atomize --no-probe as a prerequisite but do not assert it succeeds. If atomize fails, the specify assertions may still pass using the pre-copied fixtures, which can mask regressions in cross-stage data flow. Please assert success on the atomize step (or remove it if it’s intentionally not part of the test’s setup).

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +528 to +531
fn sets_verified_field_from_proofs() {
let tmp = setup_project();
cli(&["atomize", "--no-probe"], tmp.path());
assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "verify");
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This verify test runs atomize --no-probe but doesn’t check its exit status. If atomize fails, the verify assertion may fail for the wrong reason or (depending on fixture state) still pass without actually exercising the intended pipeline. Please assert that the atomize step succeeds.

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +549 to +553
let before = collect_md_checksums(&dir);
assert!(!before.is_empty());

cli(&["verify", "--no-probe"], tmp.path());

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify --no-probe is invoked here without asserting success. If verify fails, the checksum comparison may be skipped only because the test panics later, and the failure reason becomes unclear. Please assert the verify command’s exit status so the test failure points to the real regression.

Copilot uses AI. Check for mistakes.
Comment thread src/commands/atomize.rs
Comment on lines +287 to +294
let rel_path = path
.strip_prefix(structure_root)
.unwrap_or(path)
.to_string_lossy()
.to_string();
match parse_frontmatter(path) {
Ok(fm) => {
stubs.insert(rel_path, serde_json::to_value(fm)?);
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_stubs_from_md_files uses to_string_lossy() on a Path to derive stub keys. On Windows this will produce src\\module.rs\\... keys, while other parts of the pipeline/fixtures typically use forward slashes (e.g., src/module.rs/...). This can make stubs.json non-portable across platforms and can diverge from probe-verus’s key format. Consider normalizing the relative path to a consistent separator (e.g., /) before inserting into the stubs map.

Suggested change
let rel_path = path
.strip_prefix(structure_root)
.unwrap_or(path)
.to_string_lossy()
.to_string();
match parse_frontmatter(path) {
Ok(fm) => {
stubs.insert(rel_path, serde_json::to_value(fm)?);
let rel_path = path.strip_prefix(structure_root).unwrap_or(path);
let rel_path = rel_path
.to_string_lossy()
.replace('\\', "/");
match parse_frontmatter(path) {
Ok(fm) => {
stubs.insert(rel_path.clone(), serde_json::to_value(fm)?);

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +33
let src = std::path::PathBuf::from(&fixtures).join(fixture_file);
if let Some(dest) = output_path {
if let Some(parent) = std::path::Path::new(dest.as_str()).parent() {
let _ = fs::create_dir_all(parent);
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock only writes an output file when -o/--output is present; if it’s missing, the process exits successfully without producing anything. That can mask regressions (the caller thinks probe-verus succeeded, but artifacts are missing). Consider exiting non-zero when no output path is provided so failures are immediate and easier to diagnose.

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +236 to +240
let tmp = setup_project();
cli(&["atomize", "--no-probe"], tmp.path());
assert_success(
&cli(&["atomize", "--no-probe", "--check-only"], tmp.path()),
"atomize --check-only",
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, the initial atomize --no-probe invocation is not asserted to succeed. If that setup step fails, the subsequent --check-only assertion may become misleading (or fail for the wrong reason). Please assert the first command’s exit status (e.g., via assert_success) to ensure the test only fails on the behavior under test.

Copilot uses AI. Check for mistakes.
Comment thread tests/regression_test.rs
Comment on lines +595 to +599
let tmp = setup_project();
cli(&["atomize", "--no-probe"], tmp.path());

assert_success(&cli(&["verify", "--no-probe"], tmp.path()), "first run");
let first: serde_json::Value = read_json(&tmp.path().join(".verilib/stubs.json"));
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idempotency test runs atomize --no-probe as setup but doesn’t assert that it succeeded before comparing outputs. Please assert success on the atomize step to avoid false positives/unclear failures if atomize regresses.

Copilot uses AI. Check for mistakes.
Bare cli() calls used as test setup were silently discarding exit
codes, which could mask failures and produce confusing downstream
assertions. Wrap all setup calls with assert_success.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@astefano
Copy link
Copy Markdown
Collaborator Author

Closing as superseded by #37

@astefano astefano closed this Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants