-
Notifications
You must be signed in to change notification settings - Fork 118
Support dynamic filters via remote scanner #4077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
249cce6 to
923f158
Compare
Test Results 7 files + 2 7 suites +2 3m 22s ⏱️ + 2m 6s Results for commit 95383dc. ± Comparison against base commit bbf4d42. This pull request removes 34 and adds 47 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
923f158 to
e9989d8
Compare
igalshilman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks 🔥
crates/storage-query-datafusion/src/remote_query_scanner_client.rs
Outdated
Show resolved
Hide resolved
crates/storage-query-datafusion/src/remote_query_scanner_client.rs
Outdated
Show resolved
Hide resolved
| if current_predicate_generation != predicate_generation { | ||
| predicate_generation = current_predicate_generation; | ||
| Some(RemoteQueryScannerPredicate { | ||
| serialized_physical_expression: encode_expr(predicate)?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- do you know what is the typical size of this, surely small-ish, just making sure.
- do you know how often we are expected to enter that branch? is it configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small yeah - status = "completed" encodes to 33 bytes
i think in the topk case we will probably enter it every batch, as each batch that arrives at the topk exec will update the min or the max and lead to a tighter bound
e9989d8 to
30165dc
Compare
30165dc to
95383dc
Compare
Dynamic filters massively reduce the number of rows returned in topk operations like 'get the 50 most recent invocations'. They are not particularly helpful locally however, as we have to add a row to a record batch to be able to apply a filter, and at this point we have done 99% of the work of the scan. But remotely, they allow us to send less rows over the wire, which is valuable.
In handle_child_pushdown_result we track any dynamic filters and combine them with our initial filter. Then we are able to snapshot the dynamic filters and send them over the wire when they change, so on the remote side a stricter filter can be applied