-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Go: Fix bad join order using late inlined predicate #17437
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://codeql.github.com/docs/ql-language-reference/annotations/#pragma-inline-late, the bindingset covering all parameters means that all parameters should be bound (constrained by evaluating something else) before the helper is scheduled. Therefore it will usually go last (though if there was another predicate referring to all vars, say filter(g, outp, p, fd, ret, outcome)
, it would be ok for the filter and the helper to go in either order).
If it would be a better idea for the helper to go before guardChecks
, then it should be given a looser binding set -- so specifically excluding their common variables g
and outcome
, so that the join orderer could go for "I shall constrain g
and outcome
by running guardingFunctionHelper
, and only then run guardChecks
", which would presently be an illegal order. Of course it would still have the choice of the reverse order, since guardChecks
has no bindingset constraint, so that might be a hint that a different optimisation strategy, such as giving guardChecks
itself some binding-order hints.
// (and we have (this has outcome ==> arg is checked)) | ||
// but p.checkOn(ret, outcome, this) gives us (ret has outcome ==> p holds of this), | ||
// so we need to swap outcome and (specifically boolean) p: | ||
DataFlow::booleanProperty(pragma[only_bind_into](outcome)).checkOn(ret, p.asBoolean(), g) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the binding order hint required here now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh
We want this predicate to be ordered after `guardChecks(g, arg.asExpr(), outcome)`, so that the relatively small number of possible `g` restricts the number of tuples from getting too large.
I've addressed the review comments. The join order for the standard case is now as below. Because the structure of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if DCA shows positive results.
DCA shows nothing at all. Is that acceptible? Or should it be removing the bad join order seen in the nightly DCA run? |
How did you notice it in the first place? I recall you said it was flagged in DCA? Can you tell if that signal has gone away? |
The DCA results didn't show the bad join order before and no bad join order after, as I'd hoped. Maybe I'm just struggling to run DCA in exactly the right way. I will try to pick this up again at some point. |
Currently it's evaluating DataFlow::booleanProperty(outcome).checkOn(ret, p.asBoolean(), g) first, which is massive, then joining ret = getUniqueOutputNode(fd, outp) , which makes it a bit smaller, then guardChecks(g, arg.asExpr(), outcome), which makes it empty. So basically, we are looking for code that looks like returns someFunction(arg) and only then checking if someFunction is a function that we care about. I think it would probably fix the problem to force it to evaluate guardChecks(g, arg.asExpr(), outcome) first, because it will generally be small. I tried putting either only_bind pragma on either of the places that outcome was used, but it made no difference. I then tried pulling out the two lines that aren't guardChecks(g, arg.asExpr(), outcome) and putting them in an inline late predicate. I wasn't sure what to use as the bindingset so I used all the variables. That fixed the join problem.
Before:
After: