[SPARK-51395][SQL] Refine handling of default values in procedures #50197


Closed

Conversation

aokolnychyi
Contributor

What changes were proposed in this pull request?

This PR refines the handling of default values in procedures, a feature that will be released in 4.0.

Why are the changes needed?

These changes are needed because connectors like Iceberg may not have utilities to generate SQL strings in the Spark SQL dialect. The API should be changed to accept either a DSv2 expression or a SQL string.
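
As a rough illustration, here is how a connector might construct a default value either way under the refined API. This is a sketch only; the constructor shapes are assumptions inferred from the getSql/getExpression accessors discussed below:

```scala
import org.apache.spark.sql.connector.catalog.DefaultValue
import org.apache.spark.sql.connector.expressions.LiteralValue
import org.apache.spark.sql.types.IntegerType

// Option 1: a DSv2 literal expression; no Spark SQL string generation needed.
val byExpression = new DefaultValue(LiteralValue(42, IntegerType))

// Option 2: a plain Spark SQL string, for connectors that can produce one.
val bySql = new DefaultValue("42")
```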

Does this PR introduce any user-facing change?

Yes, but the stored procedure API hasn't been released yet.

How was this patch tested?

This PR comes with tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 6, 2025
@aokolnychyi aokolnychyi closed this Mar 7, 2025
@aokolnychyi aokolnychyi reopened this Mar 7, 2025
}

@Nullable
public Expression getExpression() {
Contributor

do we really need to follow the java getter naming style?

Contributor Author

I would personally prefer not to have getXXX. Unfortunately, ColumnDefaultValue already uses this naming, and I do plan to make ColumnDefaultValue extend DefaultValue in the future. Let me know your thoughts. We can of course deprecate getExpression and getSql in ColumnDefaultValue, but that may be overkill given the benefit.

Contributor

makes sense, let's keep it

import org.apache.spark.sql.connector.expressions.Expression;

@Evolving
public class DefaultValue {
Contributor

should ColumnDefaultValue extend it?

Contributor Author

In the future, yes. That's the whole idea.
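
Roughly, the intended hierarchy would look like the sketch below; the class and method shapes are illustrative assumptions based on this thread, not the actual Spark sources:

```scala
import org.apache.spark.sql.connector.expressions.{Expression, Literal}

// Base holder for a default value: a SQL string, a DSv2 expression, or both.
class DefaultValueSketch(sql: String, expr: Expression) {
  def getSql: String = sql               // may be null
  def getExpression: Expression = expr   // may be null
}

// ColumnDefaultValue would add only the pre-evaluated literal applied to
// existing rows, inheriting getSql/getExpression from the base class.
class ColumnDefaultValueSketch(sql: String, expr: Expression, value: Literal[_])
    extends DefaultValueSketch(sql, expr) {
  def getValue: Literal[_] = value
}
```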

if (defaultValue.getSql != null) {
  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
Contributor Author

Exactly, this is a temporary solution until we generalize the default value handling framework to work with expressions.

  defaultValue.getSql
} else {
  ExpressionConverter.toV1(defaultValue.getExpression) match {
    case Some(e) if !e.isInstanceOf[NonSQLExpression] => e.sql
Contributor

This seems wrong, as a NonSQLExpression can be nested.

If we don't allow connectors to return a v2 expression as the default value for now, maybe we can fail first and support it later?
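
To illustrate the concern: the guard in the snippet above inspects only the root node, while a NonSQLExpression can appear anywhere in the tree, so e.sql could still produce invalid SQL. A recursive check (an illustrative sketch, not code from this PR) would look like:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, NonSQLExpression}

// True if any node in the expression tree cannot be rendered as SQL text.
def containsNonSQL(e: Expression): Boolean =
  e.isInstanceOf[NonSQLExpression] || e.children.exists(containsNonSQL)
```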

Contributor

OK, we have a test for it. Maybe we should refactor ResolveDefaultColumns.analyze and make it accept an Expression directly.

Contributor Author

I was thinking about this too. I updated ResolveDefaultColumns to accept an expression.

    statementType: String): Expression = {
  Option(defaultValue.getExpression)
    .flatMap(ExpressionConverter.toV1)
    .map(expr => analyze(colName, dataType, expr, expr.sql, statementType))
Contributor

nit: shall we use Option(defaultValue.getSql()).getOrElse(expr.sql) instead of expr.sql?
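
That is, prefer the connector-provided SQL string and fall back to the Catalyst-generated SQL only when it is absent. A one-line sketch of the suggestion, with names taken from the surrounding snippet:

```scala
val sqlText = Option(defaultValue.getSql).getOrElse(expr.sql)
```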

@@ -205,4 +206,171 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
None
}
}

def toCatalyst(expr: V2Expression): Option[Expression] = expr match {
Contributor Author

@cloud-fan, I went for a simpler option and added the conversion to V2ExpressionUtils. I initially thought about a single utility that could do bidirectional conversion, but it probably wouldn't be worth the complexity and the risk of introducing bugs in such a critical path.
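
For context, here is a minimal sketch of the shape of such a one-directional conversion. This is an assumption for illustration; the real toCatalyst in this PR handles many more node types than the single literal case shown here:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, LiteralValue}

// Convert a DSv2 expression to a Catalyst expression; None if unsupported.
def toCatalystSketch(expr: V2Expression): Option[Expression] = expr match {
  case LiteralValue(value, dataType) => Some(Literal(value, dataType))
  case _ => None
}
```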

@cloud-fan
Contributor

The failed tests seem to be unrelated, @aokolnychyi can you re-trigger the tests?

@aokolnychyi aokolnychyi closed this Apr 7, 2025
@aokolnychyi aokolnychyi reopened this Apr 7, 2025
@aokolnychyi aokolnychyi changed the title [WIP][SPARK-51395][SQL] Refine handling of default values in procedures [SPARK-51395][SQL] Refine handling of default values in procedures Apr 7, 2025
@cloud-fan
Contributor

cloud-fan commented Apr 7, 2025

thanks, merging to master!

@cloud-fan cloud-fan closed this in 738a503 Apr 7, 2025
@cloud-fan
Contributor

It has merge conflicts with 4.0, @aokolnychyi can you open a new 4.0 PR? Thanks!

@aokolnychyi
Contributor Author

@cloud-fan, will do. Thanks!

aokolnychyi added a commit to aokolnychyi/spark that referenced this pull request Apr 9, 2025
Closes apache#50197 from aokolnychyi/spark-51395.

Authored-by: Anton Okolnychyi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 738a503)
HyukjinKwon added a commit that referenced this pull request Apr 14, 2025
…bs at tests

### What changes were proposed in this pull request?

This PR is a follow-up of #50197 that fixes the tests so they pass when ANSI is off.

### Why are the changes needed?

The non-ANSI scheduled build is broken at https://github.com/apache/spark/actions/runs/14424930354/job/40452472736

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Ran the unit tests:

```bash
SPARK_ANSI_SQL_MODE=false ./build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.v2.DataSourceV2StrategySuite -- -z round"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon added a commit that referenced this pull request Apr 14, 2025
…bs at tests

Closes #50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 53ce5c3)
Signed-off-by: Hyukjin Kwon <[email protected]>
vladimirg-db pushed a commit to vladimirg-db/spark that referenced this pull request Apr 15, 2025
…bs at tests

Closes apache#50577 from HyukjinKwon/SPARK-51395-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>