Skip to content

Experiment: Make all data flow incremental #20028

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Jul 11, 2025

I've polished up this branch for testing and to communicate a proof-of-concept for what it would look like to have all data flow incremental, either diff-informed or "overlay-informed". The last commit uses forceLocal to add in the base-to-base results, but it has some problems:

  • I wasn't able to test performance because it turns out we don't get any cache hits when forceLocal is in use. Alex is on it.
  • It only works for Java because the predicate that calls forceLocal can't currently be marked as local?. The shared code in this PR won't compile for other languages than Java because it requires Node to be local.

To achieve the full vision of incrementality, the type tracking library also needs to be overlay-aware. That library seems to be the main performance bottleneck that we know how to address in QL.

As I'm going out on holiday now, I want to leave this PR for @alexet and @aschackmull to work from or take inspiration from.

kaspersv and others added 4 commits July 11, 2025 10:52
To ensure good performance, always run data flow overlay-informed unless
the configuration has opted in to being diff-informed. This change
affects only databases with an overlay and therefore has no immediate
production consequences.
This won't work for other languages until we can annotate a `forceLocal`
call with `local?`.
// ensures that we re-evaluate flow that passes from a file in the
// diff (but in the base), through a file in the overlay with
// possibly important changes, back to the base.
isDiffFileNode(source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks a bunch of caching in a subtle way. In the base this disjunction is equivalent to any(). However we don't have any ability to optimise this. So it ends up depending on isDiffFileNode(source) which depends on the extensible predicate which changes. We therefore can't cache it.

I don't see an easy way to fix this, without exposing isEvaluatingInOverlay as a hop (i.e. a bit like the isOverlayModeElse RA hop) and also have the evaluator implemented like Hennings idea. Fundermentally we need to depend on isDiffFileNode in the overlay and we have no way to have different dependecies between overlay and base until after expansion.

On the other hand deleting it mostly gets the performance we expect (but maybe with the accuracy problems as noted by the comment)..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants