[SPARK-51395][SQL] Refine handling of default values in procedures #50197
Conversation
```java
}

@Nullable
public Expression getExpression() {
```
Do we really need to follow the Java getter naming style?
I would personally prefer not to have `getXXX`. Unfortunately, `ColumnDefaultValue` already uses this naming and I do plan to make `ColumnDefaultValue` extend `DefaultValue` in the future. Let me know your thoughts. We can of course deprecate `getExpression` and `getSql` in `ColumnDefaultValue`, but it may be overkill given the benefit.
makes sense, let's keep it
```java
import org.apache.spark.sql.connector.expressions.Expression;

@Evolving
public class DefaultValue {
```
Should `ColumnDefaultValue` extend it?
In the future, yes. That's the whole idea.
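A minimal Scala sketch of that intended hierarchy (assumed shape only; the real classes are Java, and `ColumnDefaultValue` does not extend `DefaultValue` yet in this PR):

```scala
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression}

// Assumed shape: either accessor may return null, mirroring the @Nullable
// getters discussed above.
class DefaultValue(val getSql: String, val getExpression: V2Expression)

// The planned future refinement: column defaults reuse the same accessors.
class ColumnDefaultValue(sql: String, expression: V2Expression)
  extends DefaultValue(sql, expression)
```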
```scala
if (defaultValue.getSql != null) {
  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
```
It seems a bit wasteful, as the caller side needs to parse the SQL again: https://github.com/apache/spark/pull/50197/files#diff-ec7257bce84eb229e6d3afd47819b331560b71d238f30ee9e2dab487d83cce9eR140
Exactly, this is a temporary solution until we generalize the default-value handling framework to work with expressions.
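To make the round trip concrete (a sketch; only `parseExpression` is real Catalyst API, and the SQL text stands in for what `e.sql` would produce):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// The connector's v2 expression is rendered to SQL text on this side, and the
// caller then parses that same text back into a Catalyst expression, paying
// for an extra parse per default value.
val defaultSql = "current_date()" // stand-in for the generated e.sql
val reparsed = CatalystSqlParser.parseExpression(defaultSql)
```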
```scala
  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
    case Some(e) if !e.isInstanceOf[NonSQLExpression] => e.sql
```
This seems wrong, as `NonSQLExpression` can be nested. If we don't allow connectors to return a v2 expression as the default value for now, maybe we can fail first and support it later?
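For illustration, a check that also catches nested occurrences could look like this (a sketch, not code from the PR):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, NonSQLExpression}

// A top-level isInstanceOf check only inspects the root; a NonSQLExpression
// buried inside, e.g., an arithmetic expression would slip through, so the
// whole tree has to be walked.
def containsNonSql(e: Expression): Boolean =
  e.isInstanceOf[NonSQLExpression] || e.children.exists(containsNonSql)
```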
OK, we have a test for it. Maybe we should refactor `ResolveDefaultColumns.analyze` and make it accept an `Expression` directly.
I was thinking about this too. I updated `ResolveDefaultColumns` to accept an expression (commits 425e456 to d2a91d2).
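The shape of the new overload, as inferred from the call site below (a sketch; parameter names and validation details are assumptions):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.DataType

// Accepts an already-parsed expression instead of SQL text; defaultSQL is kept
// for error messages. Body elided here: the real method validates foldability
// and coerces the result to dataType.
def analyze(
    colName: String,
    dataType: DataType,
    expression: Expression,
    defaultSQL: String,
    statementType: String): Expression = {
  expression // placeholder; see ResolveDefaultColumns for the actual logic
}
```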
```scala
    statementType: String): Expression = {
  Option(defaultValue.getExpression)
    .flatMap(ExpressionConverter.toV1)
    .map(expr => analyze(colName, dataType, expr, expr.sql, statementType))
```
nit: shall we use `Option(defaultValue.getSql()).getOrElse(expr.sql)` instead of `expr.sql`?
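In other words (a sketch of the suggested tweak; `connectorSql` stands for a possibly-null `defaultValue.getSql()`):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression

// Prefer the SQL text the connector supplied, regenerating SQL from the
// converted v1 expression only as a fallback.
def preferredSql(connectorSql: String, expr: Expression): String =
  Option(connectorSql).getOrElse(expr.sql)
```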
```diff
@@ -205,4 +206,171 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
       None
     }
   }
+
+  def toCatalyst(expr: V2Expression): Option[Expression] = expr match {
```
@cloud-fan, I went for a simpler option and added the conversion to `V2ExpressionUtils`. I initially thought about a single utility that could do bidirectional conversion, but it probably wouldn't be worth the complexity and the risk of introducing bugs in such a critical path.
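For a sense of what such a conversion looks like, a minimal illustrative sketch (the PR's actual `toCatalyst` in `V2ExpressionUtils` covers far more cases):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, LiteralValue}

// Literals map directly onto Catalyst literals; anything unrecognized yields
// None so the caller can fail with a clear error instead of guessing.
def toCatalystSketch(expr: V2Expression): Option[Expression] = expr match {
  case LiteralValue(value, dataType) => Some(Literal(value, dataType))
  case _ => None
}
```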
The failed tests seem to be unrelated, @aokolnychyi can you re-trigger the tests?
thanks, merging to master!
It has merge conflicts with 4.0, @aokolnychyi can you open a new 4.0 PR? Thanks!
@cloud-fan, will do. Thanks!
Closes apache#50197 from aokolnychyi/spark-51395. Authored-by: Anton Okolnychyi <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 738a503)
…bs at tests

### What changes were proposed in this pull request?

This PR is a followup of #50197 that fixes the tests to pass when ANSI is off.

### Why are the changes needed?

The non-ANSI scheduled build is broken at https://github.com/apache/spark/actions/runs/14424930354/job/40452472736

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Ran the unittests:

```bash
SPARK_ANSI_SQL_MODE=false ./build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.v2.DataSourceV2StrategySuite -- -z round"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50577 from HyukjinKwon/SPARK-51395-followup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR refines handling of default values in procedures that will be released in 4.0.
### Why are the changes needed?
These changes are needed because connectors like Iceberg may not have utilities to generate SQL strings in the Spark SQL dialect. The API should allow either a DSv2 expression or a SQL string.
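For example, a connector could then supply a default either way (constructor shapes and the import path are assumptions based on this PR's discussion, not verified API):

```scala
import org.apache.spark.sql.connector.catalog.DefaultValue // path assumed
import org.apache.spark.sql.connector.expressions.LiteralValue
import org.apache.spark.sql.types.IntegerType

// Hypothetical usage: a default given as a v2 expression, no SQL text needed...
val byExpression = new DefaultValue(LiteralValue(42, IntegerType))
// ...or as a plain SQL string, as before.
val bySqlText = new DefaultValue("42")
```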
### Does this PR introduce _any_ user-facing change?
Yes, but the stored procedure API hasn't been released yet.
### How was this patch tested?
This PR comes with tests.
### Was this patch authored or co-authored using generative AI tooling?
No.