Skip to content

Bug fix of specs/implementations in benchmark#19

Open
barabbs wants to merge 2 commits intosunblaze-ucb:mainfrom
barabbs:main
Open

Bug fix of specs/implementations in benchmark#19
barabbs wants to merge 2 commits intosunblaze-ucb:mainfrom
barabbs:main

Conversation

@barabbs
Copy link

@barabbs barabbs commented Mar 5, 2026

Bug fixes addressing errors listed in issue #16

In my tests, correct proofs were found for the following entries after fixing, so I am confident that the issues have been resolved:

  • verina_basic_40
  • verina_basic_104
  • verina_advanced_10
  • verina_advanced_12
  • verina_advanced_15

The entry verina_advanced_44 had no successful proof, but no system I tested was able to find a counterexample or proof of the negation after the fix, so the original issues should have been addressed at least.

I can provide more details for each of the fixes, if needed.

Copilot AI review requested due to automatic review settings March 5, 2026 14:02
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes specification/implementation mismatches in 6 benchmark entries of the Verina dataset, as reported in issue #16. The fixes address incorrect algorithm logic, missing preconditions, and flawed postconditions in Lean 4 files.

Changes:

  • verina_basic_40: Fixed the secondSmallestAux condition to correctly handle the case where the current element equals the minimum (preventing it from incorrectly updating the second-minimum index) and the case where secondIdx points to a value ≤ the current minimum.
  • verina_basic_104: Added a unique_keys helper predicate and tightened the precondition from True to require unique keys in both input maps.
  • verina_advanced_10/12/15: Added a "completeness" precondition for findExponents (all prime factors of n must be in primes), rewrote the firstDuplicate postcondition to correctly capture the "first" occurrence semantics, and rewrote the increasingTriplet implementation using Option for the second tracker.
  • verina_advanced_44: Changed precondition to k > 1, fixed initialization sentinel value to 0, and updated the postcondition to only consider positive subarray sums.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
datasets/verina/verina_basic_40/task.lean Fixes the else if condition in secondSmallestAux to correctly handle equal-to-minimum elements and a misinitialized second index.
datasets/verina/verina_basic_104/task.lean Introduces unique_keys and uses it as the precondition for update_map, replacing the trivially-True precondition.
datasets/verina/verina_advanced_10/task.lean Adds a completeness condition to findExponents_precond ensuring all prime factors of n appear in the supplied primes list.
datasets/verina/verina_advanced_12/task.lean Replaces the flawed index-based postcondition for firstDuplicate with a correct existential that captures the "first" duplicate semantics.
datasets/verina/verina_advanced_15/task.lean Rewrites increasingTriplet using Option Int for the second-element tracker, fixing correctness in edge cases.
datasets/verina/verina_advanced_44/task.lean Tightens precondition to k > 1, removes the fragile sentinel sentinel pattern, and filters the postcondition's subarray sums to only include positive values.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

def maxSubarraySumDivisibleByK_precond (arr : Array Int) (k : Int) : Prop :=
-- !benchmark @start precond
k > 0
k > 1
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The precondition was changed from k > 0 to k > 1, but test.json still contains a test case with k = 1 (the 4th entry: arr = #[1, 2, 3, 4], k = 1, expected = 10). This test case now violates the updated precondition. Additionally, reject_inputs.json is currently empty but should include entries with k = 1 (and k ≤ 0) to be consistent with the new precondition. The test.json entry for k = 1 should either be removed or moved to reject_inputs.json.

Suggested change
k > 1
k > 0

Copilot uses AI. Check for mistakes.
@prmbiy
Copy link

prmbiy commented Mar 6, 2026

Hey @barabbs, unrelated but for CodeGen task in verina_advanced_10, since the Nat.Prime is used in precond and not required in the generated code, it is never imported by the LLM. When the precond is added without the import during eval it fails, essentially making it unsolvable. Just wondering if you observed this too? Or am I missing something.

See #18 for more details

@barabbs
Copy link
Author

barabbs commented Mar 6, 2026

Hi @prmbiy!

Unfortunately I have been dealing with ProofGen, so I didn't encounter such error. I'm also using a slightly different setup and haven't really looked at the evaluation scripts, but yeah it looks like you're right, the correct imports are not set for type=task (but don't quote me on that!) :)

@LEAFERx
Copy link
Member

LEAFERx commented Mar 10, 2026

Hi @barabbs, thanks a lot for this bug report and fix! I believe some of them are already addressed in #17. I'll verify and merge the rest. Really appreciate your effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants