[SPARK-51395][SQL] Refine handling of default values in procedures #50197


Closed

Conversation

aokolnychyi
Contributor

What changes were proposed in this pull request?

This PR refines the handling of default values in procedures, a feature that will be released in 4.0.

Why are the changes needed?

These changes are needed because connectors like Iceberg may not have utilities to generate SQL strings in the Spark SQL dialect. The API should be changed to accept either a DSv2 expression or a SQL string.
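
As a rough illustration, here is how a connector might construct a default value either way under the refined API. This is a sketch only; the constructor shapes are assumptions inferred from the getSql/getExpression accessors discussed below:

```scala
import org.apache.spark.sql.connector.catalog.DefaultValue
import org.apache.spark.sql.connector.expressions.LiteralValue
import org.apache.spark.sql.types.IntegerType

// Option 1: a DSv2 literal expression; no Spark SQL string generation needed.
val byExpression = new DefaultValue(LiteralValue(42, IntegerType))

// Option 2: a plain Spark SQL string, for connectors that can produce one.
val bySql = new DefaultValue("42")
```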

Does this PR introduce any user-facing change?

Yes, but the stored procedure API hasn't been released yet.

How was this patch tested?

This PR comes with tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 6, 2025
@aokolnychyi aokolnychyi closed this Mar 7, 2025
@aokolnychyi aokolnychyi reopened this Mar 7, 2025
}

@Nullable
public Expression getExpression() {
Contributor

do we really need to follow the java getter naming style?

Contributor Author

I would personally prefer not to have getXXX. Unfortunately, ColumnDefaultValue already uses this naming, and I do plan to make ColumnDefaultValue extend DefaultValue in the future. Let me know your thoughts. We can of course deprecate getExpression and getSql in ColumnDefaultValue, but that may be overkill given the benefit.

Contributor

makes sense, let's keep it

import org.apache.spark.sql.connector.expressions.Expression;

@Evolving
public class DefaultValue {
Contributor

should ColumnDefaultValue extend it?

Contributor Author

In the future, yes. That's the whole idea.
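
Roughly, the intended hierarchy would look like the sketch below; the class and method shapes are illustrative assumptions based on this thread, not the actual Spark sources:

```scala
import org.apache.spark.sql.connector.expressions.{Expression, Literal}

// Base holder for a default value: a SQL string, a DSv2 expression, or both.
class DefaultValueSketch(sql: String, expr: Expression) {
  def getSql: String = sql               // may be null
  def getExpression: Expression = expr   // may be null
}

// ColumnDefaultValue would add only the pre-evaluated literal applied to
// existing rows, inheriting getSql/getExpression from the base class.
class ColumnDefaultValueSketch(sql: String, expr: Expression, value: Literal[_])
    extends DefaultValueSketch(sql, expr) {
  def getValue: Literal[_] = value
}
```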

if (defaultValue.getSql != null) {
  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
Contributor Author

Exactly, this is a temporary solution until we generalize the default value handling framework to work with expressions.

  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
    case Some(e) if !e.isInstanceOf[NonSQLExpression] => e.sql
Contributor

This seems wrong, as a NonSQLExpression can be nested.

If we don't allow connectors to return a v2 expression as the default value for now, maybe we can fail first and support it later?
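
To illustrate the concern: the guard in the snippet above inspects only the root node, while a NonSQLExpression can appear anywhere in the tree, so e.sql could still produce invalid SQL. A recursive check (an illustrative sketch, not code from this PR) would look like:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, NonSQLExpression}

// True if any node in the expression tree cannot be rendered as SQL text.
def containsNonSQL(e: Expression): Boolean =
  e.isInstanceOf[NonSQLExpression] || e.children.exists(containsNonSQL)
```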

Contributor

OK, we have a test for it. Maybe we should refactor ResolveDefaultColumns.analyze and make it accept an Expression directly.

Contributor Author

I was thinking about this too. I updated ResolveDefaultColumns to accept an expression.

    statementType: String): Expression = {
  Option(defaultValue.getExpression)
    .flatMap(ExpressionConverter.toV1)
    .map(expr => analyze(colName, dataType, expr, expr.sql, statementType))
Contributor

nit: shall we use Option(defaultValue.getSql()).getOrElse(expr.sql) instead of expr.sql?
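
That is, prefer the connector-provided SQL string and fall back to the Catalyst-generated SQL only when it is absent. A one-line sketch of the suggestion, with names taken from the surrounding snippet:

```scala
val sqlText = Option(defaultValue.getSql).getOrElse(expr.sql)
```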

@@ -205,4 +206,171 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
None
}
}

def toCatalyst(expr: V2Expression): Option[Expression] = expr match {
Contributor Author

@cloud-fan, I went for a simpler option and added the conversion to V2ExpressionUtils. I initially thought about a single utility that could do bidirectional conversion, but it probably wouldn't be worth the complexity and the risk of introducing bugs in such a critical path.
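
For context, here is a minimal sketch of the shape of such a one-directional conversion. This is an assumption for illustration; the real toCatalyst in this PR handles many more node types than the single literal case shown here:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, LiteralValue}

// Convert a DSv2 expression to a Catalyst expression; None if unsupported.
def toCatalystSketch(expr: V2Expression): Option[Expression] = expr match {
  case LiteralValue(value, dataType) => Some(Literal(value, dataType))
  case _ => None
}
```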

@cloud-fan
Contributor

The failed tests seem to be unrelated, @aokolnychyi can you re-trigger the tests?

@aokolnychyi aokolnychyi closed this Apr 7, 2025
@aokolnychyi aokolnychyi reopened this Apr 7, 2025
@aokolnychyi aokolnychyi changed the title [WIP][SPARK-51395][SQL] Refine handling of default values in procedures [SPARK-51395][SQL] Refine handling of default values in procedures Apr 7, 2025
@cloud-fan
Contributor

cloud-fan commented Apr 7, 2025

thanks, merging to master!

@cloud-fan cloud-fan closed this in 738a503 Apr 7, 2025
@cloud-fan
Contributor

It has merge conflicts with 4.0, @aokolnychyi can you open a new 4.0 PR? Thanks!

@aokolnychyi
Contributor Author

@cloud-fan, will do. Thanks!

aokolnychyi added a commit to aokolnychyi/spark that referenced this pull request Apr 9, 2025
Closes apache#50197 from aokolnychyi/spark-51395.

Authored-by: Anton Okolnychyi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 738a503)
HyukjinKwon added a commit that referenced this pull request Apr 14, 2025
…bs at tests

### What changes were proposed in this pull request?

This PR is a follow-up of #50197 that fixes the tests so they pass when ANSI is off.

### Why are the changes needed?

The non-ANSI scheduled build is broken at https://github.com/apache/spark/actions/runs/14424930354/job/40452472736

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Ran the unit tests:

```bash
SPARK_ANSI_SQL_MODE=false ./build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.v2.DataSourceV2StrategySuite -- -z round"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon added a commit that referenced this pull request Apr 14, 2025
…bs at tests

Closes #50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 53ce5c3)
Signed-off-by: Hyukjin Kwon <[email protected]>
vladimirg-db pushed a commit to vladimirg-db/spark that referenced this pull request Apr 15, 2025
…bs at tests

Closes apache#50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>