[SPARK-49557][SQL] Add SQL pipe syntax for the WHERE operator#48091
[SPARK-49557][SQL] Add SQL pipe syntax for the WHERE operator#48091dtenedor wants to merge 16 commits into
Conversation
exclude flaky ThriftServerQueryTestSuite for new golden file
switch to expression switch to expression switch to expression moving error checking to checkanalysis
|
cc @cloud-fan @gengliangwang here is the WHERE operator, the next one. The implementation is relatively simple, I tried to think of as many test cases as possible. |
| }.getOrElse(Option(ctx.whereClause).map { c => | ||
| // Add a table subquery boundary between the new filter and the input plan if one does not | ||
| // already exist. This helps the analyzer behave as if we had added the WHERE clause after a | ||
| // table subquery containing the input plan. |
There was a problem hiding this comment.
This is great! This skips the tricky aggregate function pushdown stuff from Filter/Sort which complicates the analyzer quite a bit. We also don't need this with pipe syntax, as it's quite easy for users to filter on the aggregated query.
There was a problem hiding this comment.
that being said, seems like we don't need to add subquery alias if the child plan is UnresolvedRelation. We don't need to isolate the table scan node here.
There was a problem hiding this comment.
We talked offline and found that updating the UnresolvedRelation pattern match to this fixes the problem:
case u: UnresolvedRelation =>
u
In this way we don't add another redundant SubqueryAlias when ResolveRelations will already add one. Looking at the commit that performs this update, we see the analyzer plans improve accordingly.
There was a problem hiding this comment.
ah, it also fixes a regression. We can add a test for table t |> where spark_catalog.default.t.x = 1, which didn't work before this fix.
There was a problem hiding this comment.
Sounds good, this is done.
|
Thanks, merging to master |
| -- Aggregations are allowed within expression subqueries in the pipe operator WHERE clause as long | ||
| -- no aggregate functions exist in the top-level expression predicate. | ||
| table t | ||
| |> where (select any_value(a) from other where x = a limit 1) = 1; |
There was a problem hiding this comment.
it also supports correlated subqueries!
What changes were proposed in this pull request?
This PR adds SQL pipe syntax support for the WHERE operator.
For example:
Why are the changes needed?
The SQL pipe operator syntax will let users compose queries in a more flexible fashion.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds a few unit test cases, but mostly relies on golden file test coverage. I did this to make sure the answers are correct as this feature is implemented and also so we can look at the analyzer output plans to ensure they look right as well.
Was this patch authored or co-authored using generative AI tooling?
No