[SPARK-49556][SQL] Add SQL pipe syntax for the SELECT operator #48047
I had a hard time understanding this recursive parser rule. How does it match consecutive pipe operators? And what is the operator precedence when classic SQL query syntax is mixed with the new pipe syntax?
I'm not familiar enough with ANTLR. So this recursive parser rule matches the SQL string from the end? E.g., it finds the first `operatorPipeRightSide` from the end, and then tries to match a chain of pipe operators.
Sure, no problem, I can try to explain it.

ANTLR tokenizes each SQL query it receives, converting the input string into a sequence of tokens (using `SqlBaseLexer.g4`). Then the parser's job (in this file) is to convert that sequence of tokens into an initial unresolved logical plan representing the parse tree.

To do so, the parser checks each rule in the listed sequence, one by one, comparing the provided tokens at the current index in the sequence with the required tokens from the rule. If the rule matches, wherein all keywords and other components in the rule map to corresponding input tokens, then the parser generates the rule's unresolved logical plan tree using the logic in `AstBuilder.scala`.

In this case, we define the new token `OPERATOR_PIPE: '|>';` in `SqlBaseLexer.g4`. Then we add a new option to the existing `queryTerm` rule to allow any syntax matching an existing `queryTerm` to appear on the left side of this `|>` token and the syntax of `operatorPipeRightSide` on the right side (which in this PR is limited to only a `selectClause`).

ANTLR grammar allows left-recursive rules wherein any alternative may begin with a reference to the same rule, so the `queryTerm` on the left side may match any valid existing syntax for a `queryTerm` such as `TABLE t`, a table subquery, etc. Since we are extending `queryTerm` to also match against `queryTerm OPERATOR_PIPE operatorPipeRightSide`, this alternative implements the recursion wherein we may chain multiple pipe operators together. For example, in `TABLE t |> SELECT x |> LIMIT 2`, `TABLE t` matches a `queryTerm`, then `TABLE t |> SELECT x` matches another, and finally the entire query matches a third (using the new recursive `#operatorPipeStatement` alternative two times).

Otherwise, if the rule does not match, then the parser moves on to try the next rule in the sequence, and so on, similar to a Scala pattern-match. This defines the precedence of the rules amongst each other: the ones appearing first in the list in `SqlBaseParser.g4` apply first.
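To make the chaining concrete, here is a rough toy sketch in Python. It is not ANTLR output and not Spark's actual grammar or `AstBuilder.scala` logic; the names `tokenize`, `parse_query_term`, `Table`, and `Pipe` are hypothetical, and the only base case it handles is `TABLE <name>`. It just illustrates how a rule of the shape `queryTerm OPERATOR_PIPE operatorPipeRightSide` nests to the left, so each `|>` wraps everything matched before it.

```python
import re
from dataclasses import dataclass

# Toy lexer: '|>' is matched as one token, mirroring the OPERATOR_PIPE idea.
# Only identifiers, integers, and '|>' are recognized in this sketch.
TOKEN_RE = re.compile(r"\s*(\|>|[A-Za-z_][A-Za-z_0-9]*|\d+)")


def tokenize(sql):
    tokens, pos = [], 0
    sql = sql.rstrip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


@dataclass
class Table:
    name: str


@dataclass
class Pipe:
    left: object  # whatever query term was matched so far
    right: list   # tokens of the operator on the right side of '|>'


def parse_query_term(tokens):
    # Base case (hypothetical, simplified): TABLE <name>.
    if len(tokens) < 2 or tokens[0].upper() != "TABLE":
        raise ValueError("expected TABLE <name>")
    term = Table(tokens[1])
    i = 2
    # Left recursion written as a loop: every '|>' takes the term built
    # so far as its left side, which yields a left-nested tree.
    while i < len(tokens):
        if tokens[i] != "|>":
            raise ValueError(f"expected '|>' but found {tokens[i]!r}")
        j = i + 1
        while j < len(tokens) and tokens[j] != "|>":
            j += 1
        if j == i + 1:
            raise ValueError("missing operator after '|>'")
        term = Pipe(term, tokens[i + 1:j])
        i = j
    return term


plan = parse_query_term(tokenize("TABLE t |> SELECT x |> LIMIT 2"))
print(plan)
# Pipe(left=Pipe(left=Table(name='t'), right=['SELECT', 'x']), right=['LIMIT', '2'])
```

Reading the printed tree from the inside out gives the same three matches described above: `TABLE t` first, then `TABLE t |> SELECT x`, then the whole query.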
So the parser generates a basic parse tree, and `AstBuilder.scala` transforms that into an unresolved logical plan? Thanks for the clear and detailed explanation! I'm adding SQL syntax too and this is very helpful.