[SPARK-48966][SQL] Improve error message with invalid unresolved column reference in UDTF call

### What changes were proposed in this pull request?

This PR improves the error message produced for invalid UDTF calls. For example:

```
select * from udtf(
  observed => TABLE(select column from t),
  value_col => classic_dollars
)
```
Currently we get:

```
Unsupported subquery expression: Table arguments are used in a function where they are not supported:
'UnresolvedTableValuedFunction [udtf], [observed => table-argument#68918 [], value_col => 'classic_dollars, false
   +- Project ...
       +- SubqueryAlias ...
          +- Relation ...
```

But the real error is that the user passed the column identifier `classic_dollars` into a string argument, rather than the string literal `'classic_dollars'`.
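
For reference, a corrected call would pass a string literal instead (reusing the hypothetical function and table names from the example above):

```
select * from udtf(
  observed => TABLE(select column from t),
  value_col => 'classic_dollars'
)
```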

The core reason is that `CheckAnalysis` checks the analyzer output tree for unresolved TABLE arguments first before checking for remaining unresolved attribute references.

To fix this, we update `CheckAnalysis` so that the check for remaining `TABLE` arguments runs after the other checks. With this change, the query above returns something like:

```
{
  "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
  "sqlState" : "42703",
  "messageParameters" : {
    "objectName" : "`classic_dollars`"
  },
  "queryContext" : [ {
    "objectType" : "",
    "objectName" : "",
    "startIndex" : 93,
    "stopIndex" : 109,
    "fragment" : "classic_dollars"
  } ]
}
```
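
To illustrate the ordering idea in isolation, here is a minimal toy sketch in plain Scala (not the actual Spark internals; the names here are invented for illustration): checking for unresolved attribute references before leftover `TABLE` arguments surfaces the more actionable error first.

```
// Toy model only (not Spark code): report unresolved column references before
// flagging leftover TABLE arguments, mirroring the reordering in CheckAnalysis.
sealed trait Expr
case class UnresolvedAttribute(name: String) extends Expr
case class TableArgument(query: String) extends Expr

case class AnalysisError(errorClass: String, detail: String)

object ToyCheckAnalysis {
  def check(args: Seq[Expr]): Option[AnalysisError] = {
    // Pass 1: unresolved column references -- the more actionable error.
    args.collectFirst { case UnresolvedAttribute(name) =>
      AnalysisError("UNRESOLVED_COLUMN.WITHOUT_SUGGESTION", s"`$name`")
    }.orElse {
      // Pass 2: only if nothing above fired, flag unsupported TABLE arguments.
      args.collectFirst { case TableArgument(sql) =>
        AnalysisError(
          "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT", sql)
      }
    }
  }

  def main(argv: Array[String]): Unit = {
    val callArgs = Seq(
      TableArgument("select column from t"),   // observed => TABLE(...)
      UnresolvedAttribute("classic_dollars"))  // value_col => classic_dollars
    // Prints the UNRESOLVED_COLUMN error rather than the TABLE-argument one.
    println(check(callArgs))
  }
}
```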

### Why are the changes needed?

This improves error messages for invalid SQL queries that a user might reasonably write by accident while figuring out the syntax for calling table-valued functions.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

This PR adds and updates golden file test coverage.

### Was this patch authored or co-authored using generative AI tooling?

Just some casual, everyday GitHub Copilot usage.

Closes #47447 from dtenedor/improve-error-udtf.

Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
dtenedor authored and HyukjinKwon committed Aug 15, 2024
1 parent 470e3f9 commit 3e867a6
Showing 7 changed files with 82 additions and 28 deletions.
@@ -400,6 +400,17 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB

case _ =>
})

// Check for unresolved TABLE arguments after the main check above to allow other analysis
// errors to apply first, providing better error messages.
getAllExpressions(operator).foreach(_.foreachUp {
case expr: FunctionTableSubqueryArgumentExpression =>
expr.failAnalysis(
errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
messageParameters = Map("treeNode" -> planToString(plan)))
case _ =>
})

if (stagedError.isDefined) stagedError.get.apply()

operator match {
@@ -1078,9 +1089,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
checkCorrelationsInSubquery(expr.plan, isLateral = true)

case _: FunctionTableSubqueryArgumentExpression =>
expr.failAnalysis(
errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
messageParameters = Map("treeNode" -> planToString(plan)))
// Do nothing here, since we will check for this pattern later.

case inSubqueryOrExistsSubquery =>
plan match {
@@ -202,17 +202,21 @@ SELECT * FROM explode(collection => TABLE(v))
-- !query analysis
org.apache.spark.sql.catalyst.ExtendedAnalysisException
{
"errorClass" : "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
"sqlState" : "0A000",
"errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
"sqlState" : "42K09",
"messageParameters" : {
"treeNode" : "'Generate explode(table-argument#x []), false\n: +- SubqueryAlias v\n: +- View (`v`, [id#xL])\n: +- Project [cast(id#xL as bigint) AS id#xL]\n: +- Project [id#xL]\n: +- Range (0, 8, step=1)\n+- OneRowRelation\n"
"inputSql" : "\"functiontablesubqueryargumentexpression()\"",
"inputType" : "\"STRUCT<id: BIGINT NOT NULL>\"",
"paramIndex" : "first",
"requiredType" : "(\"ARRAY\" or \"MAP\")",
"sqlExpr" : "\"explode(functiontablesubqueryargumentexpression())\""
},
"queryContext" : [ {
"objectType" : "",
"objectName" : "",
"startIndex" : 37,
"stopIndex" : 44,
"fragment" : "TABLE(v)"
"startIndex" : 15,
"stopIndex" : 45,
"fragment" : "explode(collection => TABLE(v))"
} ]
}

@@ -924,6 +924,27 @@ SELECT * FROM UDTFPartitionByIndexingBug(
[Analyzer test output redacted due to nondeterminism]


-- !query
SELECT * FROM
InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column)
-- !query analysis
org.apache.spark.sql.catalyst.ExtendedAnalysisException
{
"errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
"sqlState" : "42703",
"messageParameters" : {
"objectName" : "`unresolved_column`"
},
"queryContext" : [ {
"objectType" : "",
"objectName" : "",
"startIndex" : 93,
"stopIndex" : 109,
"fragment" : "unresolved_column"
} ]
}


-- !query
DROP VIEW t1
-- !query analysis
6 changes: 6 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/udtf/udtf.sql
@@ -159,6 +159,12 @@ SELECT * FROM UDTFPartitionByIndexingBug(
1.0 AS double_col
)
);
-- Exercise a query with both a valid TABLE argument and an invalid unresolved column reference.
-- The 'eval' method of this UDTF would later throw an exception, but that is not relevant here
-- because the analysis of this query should fail before that point. We just want to make sure that
-- this analysis failure returns a reasonable error message.
SELECT * FROM
InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column);

-- cleanup
DROP VIEW t1;
@@ -185,17 +185,21 @@ struct<>
-- !query output
org.apache.spark.sql.catalyst.ExtendedAnalysisException
{
"errorClass" : "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
"sqlState" : "0A000",
"errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
"sqlState" : "42K09",
"messageParameters" : {
"treeNode" : "'Generate explode(table-argument#x []), false\n: +- SubqueryAlias v\n: +- View (`v`, [id#xL])\n: +- Project [cast(id#xL as bigint) AS id#xL]\n: +- Project [id#xL]\n: +- Range (0, 8, step=1)\n+- OneRowRelation\n"
"inputSql" : "\"functiontablesubqueryargumentexpression()\"",
"inputType" : "\"STRUCT<id: BIGINT NOT NULL>\"",
"paramIndex" : "first",
"requiredType" : "(\"ARRAY\" or \"MAP\")",
"sqlExpr" : "\"explode(functiontablesubqueryargumentexpression())\""
},
"queryContext" : [ {
"objectType" : "",
"objectName" : "",
"startIndex" : 37,
"stopIndex" : 44,
"fragment" : "TABLE(v)"
"startIndex" : 15,
"stopIndex" : 45,
"fragment" : "explode(collection => TABLE(v))"
} ]
}

23 changes: 23 additions & 0 deletions sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out
@@ -1095,6 +1095,29 @@ NULL 1.0
NULL 1.0


-- !query
SELECT * FROM
InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column)
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.ExtendedAnalysisException
{
"errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
"sqlState" : "42703",
"messageParameters" : {
"objectName" : "`unresolved_column`"
},
"queryContext" : [ {
"objectType" : "",
"objectName" : "",
"startIndex" : 93,
"stopIndex" : 109,
"fragment" : "unresolved_column"
} ]
}


-- !query
DROP VIEW t1
-- !query schema
@@ -317,19 +317,6 @@ class PythonUDTFSuite extends QueryTest with SharedSparkSession {
case other =>
failure(other)
}
withTable("t") {
sql("create table t(col array<int>) using parquet")
val query = "select * from explode(table(t))"
checkErrorMatchPVals(
exception = intercept[AnalysisException](sql(query)),
errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
sqlState = None,
parameters = Map("treeNode" -> "(?s).*"),
context = ExpectedContext(
fragment = "table(t)",
start = 22,
stop = 29))
}

spark.udtf.registerPython(UDTFCountSumLast.name, pythonUDTFCountSumLast)
var plan = sql(
Expand Down
