Bug fix of specs/implementations in benchmark by barabbs · Pull Request #19 · sunblaze-ucb/verina

barabbs · 2026-03-05T14:02:00Z

Bug fixes addressing errors listed in issue #16

In my tests, correct proofs were found for the following entries after fixing, so I am confident that the issues have been resolved:

verina_basic_40
verina_basic_104
verina_advanced_10
verina_advanced_12
verina_advanced_15

The entry verina_advanced_44 had no successful proof, but no system I tested was able to find a counterexample or proof of the negation after the fix, so the original issues should have been addressed at least.

I can provide more details for each of the fixes, if needed.

Copilot

Pull request overview

This PR fixes specification/implementation mismatches in 6 benchmark entries of the Verina dataset, as reported in issue #16. The fixes address incorrect algorithm logic, missing preconditions, and flawed postconditions in Lean 4 files.

Changes:

verina_basic_40: Fixed the secondSmallestAux condition to correctly handle the case where the current element equals the minimum (preventing it from incorrectly updating the second-minimum index) and the case where secondIdx points to a value ≤ the current minimum.
verina_basic_104: Added a unique_keys helper predicate and tightened the precondition from True to require unique keys in both input maps.
verina_advanced_10/12/15: Added a "completeness" precondition for findExponents (all prime factors of n must be in primes), rewrote the firstDuplicate postcondition to correctly capture the "first" occurrence semantics, and rewrote the increasingTriplet implementation using Option for the second tracker.
verina_advanced_44: Changed precondition to k > 1, fixed initialization sentinel value to 0, and updated the postcondition to only consider positive subarray sums.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`datasets/verina/verina_basic_40/task.lean`	Fixes the `else if` condition in `secondSmallestAux` to correctly handle equal-to-minimum elements and a misinitialized second index.
`datasets/verina/verina_basic_104/task.lean`	Introduces `unique_keys` and uses it as the precondition for `update_map`, replacing the trivially-`True` precondition.
`datasets/verina/verina_advanced_10/task.lean`	Adds a completeness condition to `findExponents_precond` ensuring all prime factors of `n` appear in the supplied `primes` list.
`datasets/verina/verina_advanced_12/task.lean`	Replaces the flawed index-based postcondition for `firstDuplicate` with a correct existential that captures the "first" duplicate semantics.
`datasets/verina/verina_advanced_15/task.lean`	Rewrites `increasingTriplet` using `Option Int` for the second-element tracker, fixing correctness in edge cases.
`datasets/verina/verina_advanced_44/task.lean`	Tightens precondition to `k > 1`, removes the fragile sentinel sentinel pattern, and filters the postcondition's subarray sums to only include positive values.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T14:11:04Z

datasets/verina/verina_advanced_44/task.lean

 def maxSubarraySumDivisibleByK_precond (arr : Array Int) (k : Int) : Prop :=
  -- !benchmark @start precond
-  k > 0
+  k > 1


The precondition was changed from k > 0 to k > 1, but test.json still contains a test case with k = 1 (the 4th entry: arr = #[1, 2, 3, 4], k = 1, expected = 10). This test case now violates the updated precondition. Additionally, reject_inputs.json is currently empty but should include entries with k = 1 (and k ≤ 0) to be consistent with the new precondition. The test.json entry for k = 1 should either be removed or moved to reject_inputs.json.

Suggested change

k > 1

k > 0

prmbiy · 2026-03-06T07:55:21Z

Hey @barabbs, unrelated but for CodeGen task in verina_advanced_10, since the Nat.Prime is used in precond and not required in the generated code, it is never imported by the LLM. When the precond is added without the import during eval it fails, essentially making it unsolvable. Just wondering if you observed this too? Or am I missing something.

See #18 for more details

barabbs · 2026-03-06T14:34:12Z

Hi @prmbiy!

Unfortunately I have been dealing with ProofGen, so I didn't encounter such error. I'm also using a slightly different setup and haven't really looked at the evaluation scripts, but yeah it looks like you're right, the correct imports are not set for type=task (but don't quote me on that!) :)

LEAFERx · 2026-03-10T03:53:50Z

Hi @barabbs, thanks a lot for this bug report and fix! I believe some of them are already addressed in #17. I'll verify and merge the rest. Really appreciate your effort!

barabbs added 2 commits March 5, 2026 14:49

Bug fixes (basic_40, basic_104, advanced_10, advanced_12, advanced_15)

dc40978

Bug fixes (advanced_44)

b02dd75

Copilot AI review requested due to automatic review settings March 5, 2026 14:02

Copilot started reviewing on behalf of barabbs March 5, 2026 14:02 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fix of specs/implementations in benchmark#19

Bug fix of specs/implementations in benchmark#19
barabbs wants to merge 2 commits intosunblaze-ucb:mainfrom
barabbs:main

barabbs commented Mar 5, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

prmbiy commented Mar 6, 2026 •

edited

Loading

Uh oh!

barabbs commented Mar 6, 2026

Uh oh!

LEAFERx commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

barabbs commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

prmbiy commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

barabbs commented Mar 6, 2026

Uh oh!

LEAFERx commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

barabbs commented Mar 5, 2026 •

edited

Loading

prmbiy commented Mar 6, 2026 •

edited

Loading