-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49557][SQL] Add SQL pipe syntax for the WHERE operator #48091
Conversation
exclude flaky ThriftServerQueryTestSuite for new golden file
switch to expression switch to expression switch to expression moving error checking to checkanalysis
cc @cloud-fan @gengliangwang here is the WHERE operator, the next one. The implementation is relatively simple, I tried to think of as many test cases as possible. |
}.getOrElse(Option(ctx.whereClause).map { c => | ||
// Add a table subquery boundary between the new filter and the input plan if one does not | ||
// already exist. This helps the analyzer behave as if we had added the WHERE clause after a | ||
// table subquery containing the input plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! This skips the tricky aggregate function pushdown stuff from Filter/Sort which complicates the analyzer quite a bit. We also don't need this with pipe syntax, as it's quite easy for users to filter on the aggregated query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that being said, seems like we don't need to add subquery alias if the child plan is UnresolvedRelation
. We don't need to isolate the table scan node here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked offline and found that updating the UnresolvedRelation
pattern match to this fixes the problem:
case u: UnresolvedRelation =>
u
In this way we don't add another redundant SubqueryAlias
when ResolveRelations
will already add one. Looking at the commit that performs this update, we see the analyzer plans improve accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, it also fixes a regression. We can add a test for table t |> where spark_catalog.default.t.x = 1
, which didn't work before this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, this is done.
Thanks, merging to master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome feature!
-- Aggregations are allowed within expression subqueries in the pipe operator WHERE clause as long | ||
-- no aggregate functions exist in the top-level expression predicate. | ||
table t | ||
|> where (select any_value(a) from other where x = a limit 1) = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it also supports correlated subqueries!
What changes were proposed in this pull request?
This PR adds SQL pipe syntax support for the WHERE operator.
For example:
Why are the changes needed?
The SQL pipe operator syntax will let users compose queries in a more flexible fashion.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds a few unit test cases, but mostly relies on golden file test coverage. I did this to make sure the answers are correct as this feature is implemented and also so we can look at the analyzer output plans to ensure they look right as well.
Was this patch authored or co-authored using generative AI tooling?
No