-
Notifications
You must be signed in to change notification settings - Fork 52
Cleanup usage of bound variables #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We already replace all bound variables when visiting the body of a quantified formula, so there should not be any bound variable left when visiting any formula, and we can remove the code for this case. Each formula or part of a formula that is visited is well-scoped, such that all found variables are free variables. The user needs to track scopes for quantified formulas on his own, e.g., handle a stack of bound variables when visiting quantified formulas.
…ication. Most solvers already enforced a non-empty list of variables for quantification. This step unifies the precondition check and improves the error message.
The user should never come across a situation without variables, and internally, we avoid this, too.
Generate fresh variable names based on the quantified formula to avoid different results when visiting the same formula twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- all sub-formulas are consistent when visiting quantified formulas, e.g. the body of a quantified formula is provided in a bound-free way, e.g, all bound variables are replaced by free counterparts
Isn't this costly (or could be potentially costly at least)? It seems like it introduces a O(n) operation (with n being the DAG size of the formula) for an operation (visiting top-level node of formula) that one intuitively assumes to be O(1). I guess there are many cases of visitors that are used to only find out some cheap facts about the formula, but now these suddenly become expensive when passed a quantified formula. Even JavaSMT probably has plenty of these cases, such as for example conjunctionFinder
and disjunctionFinder
. IIRC using a visitor is in some (many?) cases the only way to get some information about a particular formula in JavaSMT, which would of course be suboptimal if it is not cheap even in cases where one does not need the expensive information.
The overall goal is an analysis of the blocking error in gitlab.com/sosy-lab/software/cpachecker/-/merge_requests/234 which might be caused by solver-specific and non-consistent quantifier handling (Z3 vs. other solver).
What blocking error is there? If there is such an error, it would be good if it is explained and discussed in that MR, but I don't see any related unresolved thread.
/** This map keeps track of the bound/free variables created when visiting quantified terms. */ | ||
// TODO should we use a WeakHashMap? | ||
// We do not cleanup in other places, too, and the number of quantified formulas is small. | ||
private final Map<IFormula, String> boundVariableNames = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to introduce a leak here? I know that for native solvers memory management is difficult and for some solvers all created formulas are stored forever. But for a pure Java solver I would (as a user of JavaSMT) certainly expect that GC works as usual.
I am also surprised how you can know that the number of quantified formulas is low. Doesn't this depend on what JavaSMT is used for?
I would also wonder whether such a global cache works correctly across all visitors? For example, what happens if there are multiple ongoing visitations, especially of the same formula?
And doesn't this change what formulas I get even if I just change what read operations the program performs? For example, if I create two (quantified) formulas f
and g
, then visit both of them and print the body
, I would guess it is desired that the result doesn't depend on whether f
or g
are visited first. Otherwise it could be inconvenient when one is for example debugging a problem and introducing additional debugging statements (with printfs and maybe some visitors to find out facts about a formula), but then the problematic formulas change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback on my explicitly noted "TODO" in the code. :-)
On memory leaks, I agree that Java users expect standard GC behavior. The current cache assumes manageable formula counts. Regarding quantified formulas, you’re right that their number depends on usage. Our “low” assumption is based on typical cases, but we’re open to adaptive caching strategies. Any specific workloads where this fails, e.g., in CPAchecker?
For visitor determinism, we aim to ensure consistent results regardless of visitation order. We could use a deterministic naming scheme (e.g., based on formula structure or formula-hash) to avoid issues with randomly ordered visitations or debugging. For now, I would like to stay with the minimal working solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good thought to clean this up! I like it!
I don't see any problems in the code besides the small redundancy. Is this a breaking change? People relying on the old implementation can not use it fully (even if ignoring the deprecation) as far as i can see.
src/org/sosy_lab/java_smt/solvers/cvc4/CVC4QuantifiedFormulaManager.java
Show resolved
Hide resolved
src/org/sosy_lab/java_smt/solvers/cvc5/CVC5QuantifiedFormulaManager.java
Show resolved
Hide resolved
src/org/sosy_lab/java_smt/solvers/princess/PrincessQuantifiedFormulaManager.java
Show resolved
Hide resolved
Short answer: yes. But since the solvers are not consistent when it comes to bound/free variables, we handle it this way, as it is easier for users when creating quantified formulas. Also, since we are usually dealing with native objects (besides the Java/Scala solvers) we generally need to request information about terms from the solver, which we only want to do when necessary. We could save more information about formulas directly in the formula-wrappers when creating them, but this would increase overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good idea overall. We should probably check to make sure that nobody was using the old interface, even if was broken for most solvers
@@ -393,6 +393,7 @@ public FormulaType<?> getFormulaType(Term formula) { | |||
return bitwuzlaSortToType(pType); | |||
} | |||
|
|||
@SuppressWarnings("deprecation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just update visit
for all the solvers to no longer use visitBoundVariable
? As far as I can see Yices2
might be the only solver where we don't substitute bound variables with free variables. All other cases for bound variables would have been dead code anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup can be done later.
I wanted to get a working and solver-independent consistent visitor implementation for quantifiers.
|
||
// Princess uses de-Bruijn indices, so we have index 0 here for the most outer quantified scope | ||
IVariable boundVariable = input.sort().boundVariable(0); | ||
String boundVariableName = getFreshVariableNameForBody(body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also store the old variable names when a quantified formula is created. That way we don't have to create new variable names for the visitor. This might be more consistent with what the other solvers are doing, but it would probably add some overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I thought so as well.
However, then I found some problematic use cases:
- we also support parsing SMT-LIB directly, where we do not directly create quantified formulas.
- the solver can return quantified formulas in interpolants or unsat cores, or other formula-producing methods.
We can not track bound variables there and need to find a more general solution.
src/org/sosy_lab/java_smt/solvers/boolector/BoolectorFormulaCreator.java
Outdated
Show resolved
Hide resolved
It is clear that one cannot have quantifier handling fast and easy at the same time in this case. But is it really necessary that everyone pays the performance price whenever they use a visitor on a formula that happens to be quantified, even if they do not need it anyway? I see as the great advantage of JavaSMT that it allows one to use a lot of features of SMT solvers in a solver-independent manner without a disadvantage. This way I don't have to worry about the question whether I should use JavaSMT or access the solver directly. But if I have to pay a price when using JavaSMT (e.g., in reduced performance compared to direct access) then the question of using JavaSMT becomes a trade off and is no longer clear cut. So I would recommend to invest more effort here in a solution.
Not sure how this relates to the current discussion? Is there a trade off here between storing more information and the performance? |
You’re right that the current approach introduces an O(n) cost for visiting quantified formulas in Princess, which can be unexpected for operations assumed to be O(1), such as accessing data directly. Our implementation currently prioritizes consistency of use for creating and visiting quantified formulas over performance in some cases. However, there are even more operations in Princess that are not in constant time, even if expected. E.g., Princess applies a shift to all bound variables from nested quantifications when a new free variable is made bound by quantification. Building a plain If the user shows with some more details (like measurements or benchmarks) that the visiting of quantified formulas is an issue), then we can report this to the Princess developers and try to find a better solution. For now, I prefer a consistent behavious (not performance) of the JavaSMT API. |
Boolector does not support visiting formulas anyway.
Actually yes, this change breaks exactly those cases where bound variables were visited before. That happend to be Z3 and Yices2 before commit e700fb7, e.g., we already broke the behaviour at that old commit. |
Checking once is good, checking twice is does no harm.
The usage of bound variables and quantifier usage in the API was never consistent:
This PR unifies the beaviour and marks the corresponding methods as deprecated:
The overall goal is an analysis of the blocking error in https://gitlab.com/sosy-lab/software/cpachecker/-/merge_requests/234 which might be caused by solver-specific and non-consistent quantifier handling (Z3 vs. other solver).