Cleanup usage of bound variables #491

kfriedberger · 2025-07-13T10:05:03Z

The usage of bound variables and quantifier usage in the API was never consistent:

some solvers require bound variables in quantified formulas, some solvers just use regular variables.
some solvers visit bound variables in formulas, some solvers just handle them as regular variables.
some solvers can not use an empty list of variables in quantification, Princess does allow this.
This PR unifies the beaviour and marks the corresponding methods as deprecated:
all solvers provide only free variables toward the API/user
all sub-formulas are consistent when visiting quantified formulas, e.g. the body of a quantified formula is provided in a bound-free way, e.g, all bound variables are replaced by free counterparts (for Princess, we use a new temporary unique name, other solvers keep the original name of the variable).

The overall goal is an analysis of the blocking error in https://gitlab.com/sosy-lab/software/cpachecker/-/merge_requests/234 which might be caused by solver-specific and non-consistent quantifier handling (Z3 vs. other solver).

We already replace all bound variables when visiting the body of a quantified formula, so there should not be any bound variable left when visiting any formula, and we can remove the code for this case. Each formula or part of a formula that is visited is well-scoped, such that all found variables are free variables. The user needs to track scopes for quantified formulas on his own, e.g., handle a stack of bound variables when visiting quantified formulas.

…ication. Most solvers already enforced a non-empty list of variables for quantification. This step unifies the precondition check and improves the error message.

The user should never come across a situation without variables, and internally, we avoid this, too.

Generate fresh variable names based on the quantified formula to avoid different results when visiting the same formula twice.

PhilippWendler

all sub-formulas are consistent when visiting quantified formulas, e.g. the body of a quantified formula is provided in a bound-free way, e.g, all bound variables are replaced by free counterparts

Isn't this costly (or could be potentially costly at least)? It seems like it introduces a O(n) operation (with n being the DAG size of the formula) for an operation (visiting top-level node of formula) that one intuitively assumes to be O(1). I guess there are many cases of visitors that are used to only find out some cheap facts about the formula, but now these suddenly become expensive when passed a quantified formula. Even JavaSMT probably has plenty of these cases, such as for example conjunctionFinder and disjunctionFinder. IIRC using a visitor is in some (many?) cases the only way to get some information about a particular formula in JavaSMT, which would of course be suboptimal if it is not cheap even in cases where one does not need the expensive information.

The overall goal is an analysis of the blocking error in gitlab.com/sosy-lab/software/cpachecker/-/merge_requests/234 which might be caused by solver-specific and non-consistent quantifier handling (Z3 vs. other solver).

What blocking error is there? If there is such an error, it would be good if it is explained and discussed in that MR, but I don't see any related unresolved thread.

PhilippWendler · 2025-07-14T08:48:26Z

src/org/sosy_lab/java_smt/solvers/princess/PrincessFormulaCreator.java

+  /** This map keeps track of the bound/free variables created when visiting quantified terms. */
+  // TODO should we use a WeakHashMap?
+  //  We do not cleanup in other places, too, and the number of quantified formulas is small.
+  private final Map<IFormula, String> boundVariableNames = new LinkedHashMap<>();


Do you really want to introduce a leak here? I know that for native solvers memory management is difficult and for some solvers all created formulas are stored forever. But for a pure Java solver I would (as a user of JavaSMT) certainly expect that GC works as usual.

I am also surprised how you can know that the number of quantified formulas is low. Doesn't this depend on what JavaSMT is used for?

I would also wonder whether such a global cache works correctly across all visitors? For example, what happens if there are multiple ongoing visitations, especially of the same formula?
And doesn't this change what formulas I get even if I just change what read operations the program performs? For example, if I create two (quantified) formulas f and g, then visit both of them and print the body, I would guess it is desired that the result doesn't depend on whether f or g are visited first. Otherwise it could be inconvenient when one is for example debugging a problem and introducing additional debugging statements (with printfs and maybe some visitors to find out facts about a formula), but then the problematic formulas change.

Thank you for your feedback on my explicitly noted "TODO" in the code. :-)

On memory leaks, I agree that Java users expect standard GC behavior. The current cache assumes manageable formula counts. Regarding quantified formulas, you’re right that their number depends on usage. Our “low” assumption is based on typical cases, but we’re open to adaptive caching strategies. Any specific workloads where this fails, e.g., in CPAchecker?

For visitor determinism, we aim to ensure consistent results regardless of visitation order. We could use a deterministic naming scheme (e.g., based on formula structure or formula-hash) to avoid issues with randomly ordered visitations or debugging. For now, I would like to stay with the minimal working solution.

baierd

Very good thought to clean this up! I like it!
I don't see any problems in the code besides the small redundancy. Is this a breaking change? People relying on the old implementation can not use it fully (even if ignoring the deprecation) as far as i can see.

src/org/sosy_lab/java_smt/solvers/cvc4/CVC4QuantifiedFormulaManager.java

src/org/sosy_lab/java_smt/solvers/cvc5/CVC5QuantifiedFormulaManager.java

src/org/sosy_lab/java_smt/solvers/princess/PrincessQuantifiedFormulaManager.java

src/org/sosy_lab/java_smt/solvers/z3/Z3QuantifiedFormulaManager.java

baierd · 2025-07-14T10:13:29Z

all sub-formulas are consistent when visiting quantified formulas, e.g. the body of a quantified formula is provided in a bound-free way, e.g, all bound variables are replaced by free counterparts

Isn't this costly (or could be potentially costly at least)? It seems like it introduces a O(n) operation (with n being the DAG size of the formula) for an operation (visiting top-level node of formula) that one intuitively assumes to be O(1). I guess there are many cases of visitors that are used to only find out some cheap facts about the formula, but now these suddenly become expensive when passed a quantified formula. Even JavaSMT probably has plenty of these cases, such as for example conjunctionFinder and disjunctionFinder. IIRC using a visitor is in some (many?) cases the only way to get some information about a particular formula in JavaSMT, which would of course be suboptimal if it is not cheap even in cases where one does not need the expensive information.

Short answer: yes. But since the solvers are not consistent when it comes to bound/free variables, we handle it this way, as it is easier for users when creating quantified formulas.

Also, since we are usually dealing with native objects (besides the Java/Scala solvers) we generally need to request information about terms from the solver, which we only want to do when necessary. We could save more information about formulas directly in the formula-wrappers when creating them, but this would increase overhead.

daniel-raffler

Seems like a good idea overall. We should probably check to make sure that nobody was using the old interface, even if was broken for most solvers

daniel-raffler · 2025-07-14T10:43:19Z

src/org/sosy_lab/java_smt/solvers/bitwuzla/BitwuzlaFormulaCreator.java

@@ -393,6 +393,7 @@ public FormulaType<?> getFormulaType(Term formula) {
    return bitwuzlaSortToType(pType);
  }

+  @SuppressWarnings("deprecation")


Can't we just update visit for all the solvers to no longer use visitBoundVariable? As far as I can see Yices2 might be the only solver where we don't substitute bound variables with free variables. All other cases for bound variables would have been dead code anyway.

Cleanup can be done later.
I wanted to get a working and solver-independent consistent visitor implementation for quantifiers.

daniel-raffler · 2025-07-14T11:04:53Z

src/org/sosy_lab/java_smt/solvers/princess/PrincessFormulaCreator.java

+
+    // Princess uses de-Bruijn indices, so we have index 0 here for the most outer quantified scope
+    IVariable boundVariable = input.sort().boundVariable(0);
+    String boundVariableName = getFreshVariableNameForBody(body);


We could also store the old variable names when a quantified formula is created. That way we don't have to create new variable names for the visitor. This might be more consistent with what the other solvers are doing, but it would probably add some overhead.

Good point. I thought so as well.
However, then I found some problematic use cases:

we also support parsing SMT-LIB directly, where we do not directly create quantified formulas.

the solver can return quantified formulas in interpolants or unsat cores, or other formula-producing methods.

We can not track bound variables there and need to find a more general solution.

src/org/sosy_lab/java_smt/solvers/boolector/BoolectorFormulaCreator.java

PhilippWendler · 2025-07-14T15:31:38Z

all sub-formulas are consistent when visiting quantified formulas, e.g. the body of a quantified formula is provided in a bound-free way, e.g, all bound variables are replaced by free counterparts

Isn't this costly (or could be potentially costly at least)? It seems like it introduces a O(n) operation (with n being the DAG size of the formula) for an operation (visiting top-level node of formula) that one intuitively assumes to be O(1). I guess there are many cases of visitors that are used to only find out some cheap facts about the formula, but now these suddenly become expensive when passed a quantified formula. Even JavaSMT probably has plenty of these cases, such as for example conjunctionFinder and disjunctionFinder. IIRC using a visitor is in some (many?) cases the only way to get some information about a particular formula in JavaSMT, which would of course be suboptimal if it is not cheap even in cases where one does not need the expensive information.

Short answer: yes. But since the solvers are not consistent when it comes to bound/free variables, we handle it this way, as it is easier for users when creating quantified formulas.

It is clear that one cannot have quantifier handling fast and easy at the same time in this case. But is it really necessary that everyone pays the performance price whenever they use a visitor on a formula that happens to be quantified, even if they do not need it anyway? I see as the great advantage of JavaSMT that it allows one to use a lot of features of SMT solvers in a solver-independent manner without a disadvantage. This way I don't have to worry about the question whether I should use JavaSMT or access the solver directly. But if I have to pay a price when using JavaSMT (e.g., in reduced performance compared to direct access) then the question of using JavaSMT becomes a trade off and is no longer clear cut. So I would recommend to invest more effort here in a solution.

Also, since we are usually dealing with native objects (besides the Java/Scala solvers) we generally need to request information about terms from the solver, which we only want to do when necessary. We could save more information about formulas directly in the formula-wrappers when creating them, but this would increase overhead.

Not sure how this relates to the current discussion? Is there a trade off here between storing more information and the performance?

kfriedberger · 2025-07-18T19:40:21Z

You’re right that the current approach introduces an O(n) cost for visiting quantified formulas in Princess, which can be unexpected for operations assumed to be O(1), such as accessing data directly. Our implementation currently prioritizes consistency of use for creating and visiting quantified formulas over performance in some cases.

However, there are even more operations in Princess that are not in constant time, even if expected. E.g., Princess applies a shift to all bound variables from nested quantifications when a new free variable is made bound by quantification. Building a plain EXISTS(x, body) requires O(n) where n is formula size.

If the user shows with some more details (like measurements or benchmarks) that the visiting of quantified formulas is an issue), then we can report this to the Princess developers and try to find a better solution. For now, I prefer a consistent behavious (not performance) of the JavaSMT API.

Boolector does not support visiting formulas anyway.

kfriedberger · 2025-07-18T20:02:21Z

Very good thought to clean this up! I like it! I don't see any problems in the code besides the small redundancy. Is this a breaking change? People relying on the old implementation can not use it fully (even if ignoring the deprecation) as far as i can see.

Actually yes, this change breaks exactly those cases where bound variables were visited before. That happend to be Z3 and Yices2 before commit e700fb7, e.g., we already broke the behaviour at that old commit.

Checking once is good, checking twice is does no harm.

kfriedberger added 8 commits July 12, 2025 12:52

Improve error message when parsing miss-aligned brackets.

5a87fdc

QuantifiedFormulaManager: disallow an empty variable list for quantif…

195258d

…ication. Most solvers already enforced a non-empty list of variables for quantification. This step unifies the precondition check and improves the error message.

avoid compiler warning about unused parameters in deprecated method.

4f4bad4

Princess: improve visiting quantified formulas.

b2469a8

The user should never come across a situation without variables, and internally, we avoid this, too.

Princess: improve visiting quantified formulas.

5eafd01

Generate fresh variable names based on the quantified formula to avoid different results when visiting the same formula twice.

Princess: fix usage of ambiguous methods for Scala types.

f894510

Princess: fix usage of static methods from Scala.

a6bdbf9

kfriedberger requested review from daniel-raffler and baierd July 13, 2025 10:12

baierd assigned kfriedberger Jul 14, 2025

PhilippWendler reviewed Jul 14, 2025

View reviewed changes

baierd previously requested changes Jul 14, 2025

View reviewed changes

daniel-raffler previously approved these changes Jul 14, 2025

View reviewed changes

Boolector: remove deprecated and unused method for formula visitor.

2b21b02

Boolector does not support visiting formulas anyway.

kfriedberger dismissed daniel-raffler’s stale review via 2b21b02 July 18, 2025 19:58

kfriedberger requested a review from baierd July 18, 2025 20:03

This was referenced Jul 21, 2025

CVC5: Add parser support #474

Open

Draft: Prepare Release 6.0.0 #495

Draft

kfriedberger merged commit 2bf59c8 into master Jul 26, 2025
2 of 3 checks passed

kfriedberger deleted the feat/cleanup-bound-vars branch July 26, 2025 17:38

Cleanup usage of bound variables #491

Cleanup usage of bound variables #491

Uh oh!

Conversation

kfriedberger commented Jul 13, 2025

Uh oh!

PhilippWendler left a comment

Choose a reason for hiding this comment

Uh oh!

PhilippWendler Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

kfriedberger Jul 18, 2025

Choose a reason for hiding this comment

Uh oh!

baierd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

baierd commented Jul 14, 2025

Uh oh!

daniel-raffler left a comment

Choose a reason for hiding this comment

Uh oh!

daniel-raffler Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

kfriedberger Jul 18, 2025

Choose a reason for hiding this comment

Uh oh!

daniel-raffler Jul 14, 2025

Choose a reason for hiding this comment

Uh oh!

kfriedberger Jul 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

PhilippWendler commented Jul 14, 2025

Uh oh!

kfriedberger commented Jul 18, 2025

Uh oh!

kfriedberger commented Jul 18, 2025

Uh oh!

Uh oh!

Uh oh!