Bug fix of specs/implementations in benchmark #19
barabbs wants to merge 2 commits into sunblaze-ucb:main
Pull request overview
This PR fixes specification/implementation mismatches in 6 benchmark entries of the Verina dataset, as reported in issue #16. The fixes address incorrect algorithm logic, missing preconditions, and flawed postconditions in Lean 4 files.
Changes:
- verina_basic_40: Fixed the `secondSmallestAux` condition to correctly handle the case where the current element equals the minimum (preventing it from incorrectly updating the second-minimum index) and the case where `secondIdx` points to a value ≤ the current minimum.
- verina_basic_104: Added a `unique_keys` helper predicate and tightened the precondition from `True` to require unique keys in both input maps.
- verina_advanced_10/12/15: Added a "completeness" precondition for `findExponents` (all prime factors of `n` must be in `primes`), rewrote the `firstDuplicate` postcondition to correctly capture the "first" occurrence semantics, and rewrote the `increasingTriplet` implementation using `Option` for the second tracker.
- verina_advanced_44: Changed precondition to `k > 1`, fixed initialization sentinel value to `0`, and updated the postcondition to only consider positive subarray sums.
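To make the unique-keys requirement concrete, here is a minimal Lean 4 sketch. It is not the `unique_keys` predicate from the PR: the association-list representation and the names `uniqueKeysSketch` and `uniqueKeysCheck` are assumptions chosen for illustration, and the benchmark's actual map type may differ.

```lean
-- Hypothetical stand-in: a map represented as a plain list of (key, value) pairs.
-- Propositional form: no two distinct positions hold the same key.
def uniqueKeysSketch (m : List (Int × Int)) : Prop :=
  ∀ i j : Nat, i < m.length → j < m.length → i ≠ j → (m[i]!).1 ≠ (m[j]!).1

-- Boolean counterpart, convenient for checking concrete inputs with #eval.
def uniqueKeysCheck : List (Int × Int) → Bool
  | [] => true
  | (k, _) :: rest =>
      -- Fail as soon as the head key reappears later in the list.
      if rest.any (fun e => e.1 == k) then false else uniqueKeysCheck rest

#eval uniqueKeysCheck [(1, 10), (2, 20)]   -- true
#eval uniqueKeysCheck [(1, 10), (1, 30)]   -- false
```

With a trivially `True` precondition, an input like the second one would be admissible, which is the kind of case a unique-keys precondition rules out.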
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| datasets/verina/verina_basic_40/task.lean | Fixes the `else if` condition in `secondSmallestAux` to correctly handle equal-to-minimum elements and a misinitialized second index. |
| datasets/verina/verina_basic_104/task.lean | Introduces `unique_keys` and uses it as the precondition for `update_map`, replacing the trivially-`True` precondition. |
| datasets/verina/verina_advanced_10/task.lean | Adds a completeness condition to `findExponents_precond` ensuring all prime factors of `n` appear in the supplied `primes` list. |
| datasets/verina/verina_advanced_12/task.lean | Replaces the flawed index-based postcondition for `firstDuplicate` with a correct existential that captures the "first" duplicate semantics. |
| datasets/verina/verina_advanced_15/task.lean | Rewrites `increasingTriplet` using `Option Int` for the second-element tracker, fixing correctness in edge cases. |
| datasets/verina/verina_advanced_44/task.lean | Tightens precondition to `k > 1`, removes the fragile sentinel pattern, and filters the postcondition's subarray sums to only include positive values. |
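As an illustration of the `Option Int` tracker idea mentioned for verina_advanced_15, the following Lean 4 sketch implements the standard two-tracker scan for a strictly increasing triplet. It is an assumption-laden reconstruction, not the code from the PR: the name `increasingTripletSketch`, the `List Int` input type, and the helper `go` are all hypothetical.

```lean
-- Hypothetical sketch: does the list contain indices i < j < k
-- with nums[i] < nums[j] < nums[k]?
def increasingTripletSketch : List Int → Bool
  | [] => false
  | x :: xs => go xs x none
where
  -- `first` is the smallest value seen so far; `second` is the smallest value
  -- known to have a strictly smaller value before it, or `none` if there is
  -- no such value yet (so no magic sentinel number is needed).
  go : List Int → Int → Option Int → Bool
    | [], _, _ => false
    | y :: ys, first, none =>
        if y ≤ first then go ys y none else go ys first (some y)
    | y :: ys, first, some second =>
        if y ≤ first then go ys y (some second)
        else if y ≤ second then go ys first (some y)
        else true

#eval increasingTripletSketch [2, 1, 5, 0, 4, 6]   -- true
#eval increasingTripletSketch [5, 4, 3, 2, 1]      -- false
```

Using `none` for "no second element yet" avoids comparing against an arbitrary large constant, which is one plausible source of the edge-case bugs the table refers to.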
    def maxSubarraySumDivisibleByK_precond (arr : Array Int) (k : Int) : Prop :=
      -- !benchmark @start precond
    -   k > 0
    +   k > 1
The precondition was changed from k > 0 to k > 1, but test.json still contains a test case with k = 1 (the 4th entry: arr = #[1, 2, 3, 4], k = 1, expected = 10). This test case now violates the updated precondition. Additionally, reject_inputs.json is currently empty but should include entries with k = 1 (and k ≤ 0) to be consistent with the new precondition. The test.json entry for k = 1 should either be removed or moved to reject_inputs.json.
Suggested change:

    -   k > 1
    +   k > 0
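The inconsistency described in the comment can be checked mechanically. The Lean 4 sketch below uses a hypothetical Boolean mirror of the tightened precondition (`precondCheck` is not part of the benchmark) to show which values of `k` would belong in `test.json` and which in `reject_inputs.json`.

```lean
-- Hypothetical Boolean mirror of the tightened precondition `k > 1`.
-- The array argument is omitted because the precondition only constrains `k`.
def precondCheck (k : Int) : Bool := decide (k > 1)

#eval precondCheck 1      -- false: the k = 1 test case no longer satisfies the precondition
#eval precondCheck 0      -- false: belongs with the rejected inputs
#eval precondCheck (-2)   -- false: belongs with the rejected inputs
#eval precondCheck 3      -- true: still a valid test input
```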
Hey @barabbs, unrelated but for CodeGen task in See #18 for more details

Hi @prmbiy! Unfortunately I have been working with ProofGen, so I didn't encounter that error. I'm also using a slightly different setup and haven't really looked at the evaluation scripts, but yeah, it looks like you're right, the correct imports are not set for
Bug fixes addressing errors listed in issue #16
In my tests, correct proofs were found for the following entries after fixing, so I am confident that the issues have been resolved:
For verina_advanced_44, no proof was found, but none of the systems I tested found a counterexample or a proof of the negation after the fix, so the original issues should at least have been addressed.
I can provide more details for each of the fixes, if needed.