forked from prestodb/presto
-
Notifications
You must be signed in to change notification settings - Fork 4
refactor: Improve metadata expression extraction and validation; Update split metadata config format; Add split metadata columns during metadata resolution. #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
wraymo
wants to merge
27
commits into
y-scope:release-0.293-clp-connector
Choose a base branch
from
wraymo:metadata_refactor
base: release-0.293-clp-connector
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 23 commits
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
68dde54
refactor progress
wraymo aa85805
progress
wraymo 7164767
finish refactoring metadata expression generation
wraymo 8e077fc
progress
wraymo b191322
add converter
wraymo 9cd2d4b
refactor tests
wraymo 999a6fa
format fix
wraymo 422ff69
doc update
wraymo e517b17
doc update
wraymo 01aefab
refactor
wraymo cdebd8d
fix compilation errors
wraymo 6430711
fix issues in TestClpSplitMetadataConfigCommon
wraymo c5b6776
fix TestClpUdfRewriter
wraymo d816d79
fix TestClpMySqlSplitMetadataConfig and TestClpSplit
wraymo e8e8334
clean up the docs
wraymo 9b54813
fix bugs
wraymo f08107d
add comments
wraymo 11cf752
minor work
wraymo 86f629c
doc improvement
wraymo 37e3235
Merge branch 'release-0.293-clp-connector' into metadata_refactor
wraymo 7a283a7
rename
wraymo a6b71f6
add more test cases and fix bugs
wraymo d988cbc
apply coderabbit suggestion
wraymo fbc6498
Added support for incorporating metadata columns during metadata reso…
wraymo f2db72c
apply coderabbit suggestions
wraymo 5e58a60
fix format issue
wraymo 8137399
validate required columns in split provider
wraymo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t throw before metadata filters are processed.
visitTableScanis invoked while rewriting every table scan, including ones that still have a metadata filter sitting above them. Because the new guard executes before the filter is analysed, any table that lists required columns now fails even when the query provides the correct metadata filter. We need to defer the “required filters must be specified” check until after we’ve inspected the filter’s RowExpression (so genuine missing filters still fail, but valid ones pass).Apply this diff to remove the premature throw:
Then, inside
processFilter, enforce the requirement only when no metadata expression was produced:Don’t forget to add
import java.util.Set;at the top. This keeps the enforcement intact for missing filters while allowing valid filtered queries to proceed.🤖 Prompt for AI Agents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be no filters. No filter should be treated as a failure if required columns are specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why we need to do it in
visitTableScanThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wraymo, you're absolutely right! The check in
visitTableScanis necessary and correct for the no-filter case.The visitor pattern here works as follows:
visitTableScanis called, and the required-columns check appropriately throws an exception.visitFilteris called, which directly invokesprocessFilteron the sourceTableScanNodewithout delegating tovisitTableScan. This meansvisitTableScanis never invoked in this path.So the check in
visitTableScanonly fires when there's no filter, which is exactly when you want to reject queries that don't provide the required metadata filters. It doesn't interfere with queries that have filters because those go through thevisitFilter→processFilterpath instead.CodeRabbit's concern about a "premature throw before metadata filters are processed" doesn't apply here—when a metadata filter exists in the plan,
visitTableScanisn't called at all.One minor observation (not blocking):
processFiltercould also add a check to ensure that when required columns exist and the filter produces nometadataExpression, an exception is thrown. This would catch the edge case where a filter exists but doesn't cover the required metadata columns. However, that might be a separate consideration beyond the scope of this specific discussion.🧠 Learnings used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wraymo, you're absolutely right! I see the distinction now.
The visitor flow means:
visitTableScanis called → your check catches it ✓visitFilteris called →processFilterhandles it directly →visitTableScanis NOT called for that TableScanNodeSo the check in
visitTableScanspecifically guards against queries that have zero filters when required columns are configured. That's the correct placement for that scenario.One follow-up question: for queries that do have a filter but the filter doesn't reference the required columns (doesn't produce a
metadataExpression), where is that validated? I see thatprocessFilterextracts themetadataExpressionbut doesn't appear to check whether required columns were satisfied. Is that validation happening later inClpMySqlSplitMetadataExpressionConverter.transform()during split generation, or should we add an explicit check inprocessFilterto fail fast at plan time?🧠 Learnings used