
Commit 3e867a6

dtenedor authored and HyukjinKwon committed
[SPARK-48966][SQL] Improve error message with invalid unresolved column reference in UDTF call
### What changes were proposed in this pull request?

This PR improves the error message returned for invalid UDTF calls. For example:

```
select * from udtf(
  observed => TABLE(select column from t),
  value_col => classic_dollars
)
```

Currently we get:

```
Unsupported subquery expression: Table arguments are used in a function where they are not supported:
'UnresolvedTableValuedFunction [udtf], [observed => table-argument#68918 [], value_col => 'classic_dollars, false
   +- Project ...
      +- SubqueryAlias ...
         +- Relation ...
```

But the real error is that the user passed the column identifier `classic_dollars` rather than the string `"classic_dollars"` into the string argument.

The root cause is that `CheckAnalysis` checks the analyzer output tree for unresolved `TABLE` arguments before it checks for remaining unresolved attribute references. To fix it, we update `CheckAnalysis` to run the check for remaining `TABLE` arguments later. Now that query returns something like:

```
{
  "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
  "sqlState" : "42703",
  "messageParameters" : {
    "objectName" : "`classic_dollars`"
  },
  "queryContext" : [ {
    "objectType" : "",
    "objectName" : "",
    "startIndex" : 93,
    "stopIndex" : 109,
    "fragment" : "classic_dollars"
  } ]
}
```

### Why are the changes needed?

This improves error messages for SQL queries that are invalid, but that a user might reasonably write by accident while figuring out how the syntax for calling table-valued functions works.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

This PR adds and updates golden file test coverage.

### Was this patch authored or co-authored using generative AI tooling?

Just some casual, everyday GitHub Copilot usage.

Closes #47447 from dtenedor/improve-error-udtf.

Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
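To illustrate why the order of the checks matters, here is a minimal, self-contained sketch. It is a toy model, not Spark code: `Expr`, `UnresolvedColumn`, `LeftoverTableArgument`, `Check`, and `firstError` are all hypothetical names invented for this example, and the error strings are shortened. It assumes only what the description above states: analysis reports the first failing check, so whichever check runs first decides which error the user sees.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's Catalyst classes.
object CheckOrderSketch {
  sealed trait Expr
  final case class UnresolvedColumn(name: String) extends Expr
  // A TABLE argument left over because the enclosing function call never resolved.
  case object LeftoverTableArgument extends Expr

  // A check inspects the arguments and returns an error message if it finds a problem.
  type Check = Seq[Expr] => Option[String]

  val unresolvedColumnCheck: Check = args =>
    args.collectFirst { case UnresolvedColumn(n) => s"UNRESOLVED_COLUMN: $n" }

  val tableArgumentCheck: Check = args =>
    args.collectFirst { case LeftoverTableArgument => "UNSUPPORTED_TABLE_ARGUMENT" }

  // Runs the checks in order and keeps only the first error (fail-fast).
  def firstError(args: Seq[Expr], checks: Seq[Check]): Option[String] =
    checks.foldLeft(Option.empty[String])((found, check) => found.orElse(check(args)))

  // Models the call from the description: a TABLE argument plus a bare column identifier.
  val args: Seq[Expr] = Seq(LeftoverTableArgument, UnresolvedColumn("classic_dollars"))

  val before: Seq[Check] = Seq(tableArgumentCheck, unresolvedColumnCheck)
  val after: Seq[Check] = Seq(unresolvedColumnCheck, tableArgumentCheck)

  def main(argv: Array[String]): Unit = {
    println(firstError(args, before)) // the TABLE-argument error masks the real problem
    println(firstError(args, after))  // the unresolved column is reported first
  }
}
```

With the `before` ordering the sketch reports the table-argument error; with the `after` ordering it reports the unresolved column, which is the actual mistake in the query. The table-argument check still runs last, so a call whose only problem is a leftover TABLE argument still fails.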
1 parent 470e3f9 commit 3e867a6

7 files changed: +82 -28 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

Lines changed: 12 additions & 3 deletions
@@ -400,6 +400,17 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB

           case _ =>
         })
+
+        // Check for unresolved TABLE arguments after the main check above to allow other analysis
+        // errors to apply first, providing better error messages.
+        getAllExpressions(operator).foreach(_.foreachUp {
+          case expr: FunctionTableSubqueryArgumentExpression =>
+            expr.failAnalysis(
+              errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
+              messageParameters = Map("treeNode" -> planToString(plan)))
+          case _ =>
+        })
+
         if (stagedError.isDefined) stagedError.get.apply()

         operator match {
@@ -1078,9 +1089,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
         checkCorrelationsInSubquery(expr.plan, isLateral = true)

       case _: FunctionTableSubqueryArgumentExpression =>
-        expr.failAnalysis(
-          errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
-          messageParameters = Map("treeNode" -> planToString(plan)))
+        // Do nothing here, since we will check for this pattern later.

       case inSubqueryOrExistsSubquery =>
         plan match {

sql/core/src/test/resources/sql-tests/analyzer-results/named-function-arguments.sql.out

Lines changed: 10 additions & 6 deletions
@@ -202,17 +202,21 @@ SELECT * FROM explode(collection => TABLE(v))
 -- !query analysis
 org.apache.spark.sql.catalyst.ExtendedAnalysisException
 {
-  "errorClass" : "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
-  "sqlState" : "0A000",
+  "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+  "sqlState" : "42K09",
   "messageParameters" : {
-    "treeNode" : "'Generate explode(table-argument#x []), false\n: +- SubqueryAlias v\n: +- View (`v`, [id#xL])\n: +- Project [cast(id#xL as bigint) AS id#xL]\n: +- Project [id#xL]\n: +- Range (0, 8, step=1)\n+- OneRowRelation\n"
+    "inputSql" : "\"functiontablesubqueryargumentexpression()\"",
+    "inputType" : "\"STRUCT<id: BIGINT NOT NULL>\"",
+    "paramIndex" : "first",
+    "requiredType" : "(\"ARRAY\" or \"MAP\")",
+    "sqlExpr" : "\"explode(functiontablesubqueryargumentexpression())\""
   },
   "queryContext" : [ {
     "objectType" : "",
     "objectName" : "",
-    "startIndex" : 37,
-    "stopIndex" : 44,
-    "fragment" : "TABLE(v)"
+    "startIndex" : 15,
+    "stopIndex" : 45,
+    "fragment" : "explode(collection => TABLE(v))"
   } ]
 }


sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out

Lines changed: 21 additions & 0 deletions
@@ -924,6 +924,27 @@ SELECT * FROM UDTFPartitionByIndexingBug(
 [Analyzer test output redacted due to nondeterminism]


+-- !query
+SELECT * FROM
+  InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column)
+-- !query analysis
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+  "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
+  "sqlState" : "42703",
+  "messageParameters" : {
+    "objectName" : "`unresolved_column`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 93,
+    "stopIndex" : 109,
+    "fragment" : "unresolved_column"
+  } ]
+}
+
+
 -- !query
 DROP VIEW t1
 -- !query analysis

sql/core/src/test/resources/sql-tests/inputs/udtf/udtf.sql

Lines changed: 6 additions & 0 deletions
@@ -159,6 +159,12 @@ SELECT * FROM UDTFPartitionByIndexingBug(
       1.0 AS double_col
   )
 );
+-- Exercise a query with both a valid TABLE argument and an invalid unresolved column reference.
+-- The 'eval' method of this UDTF would later throw an exception, but that is not relevant here
+-- because the analysis of this query should fail before that point. We just want to make sure that
+-- this analysis failure returns a reasonable error message.
+SELECT * FROM
+  InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column);

 -- cleanup
 DROP VIEW t1;

sql/core/src/test/resources/sql-tests/results/named-function-arguments.sql.out

Lines changed: 10 additions & 6 deletions
@@ -185,17 +185,21 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.ExtendedAnalysisException
 {
-  "errorClass" : "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
-  "sqlState" : "0A000",
+  "errorClass" : "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+  "sqlState" : "42K09",
   "messageParameters" : {
-    "treeNode" : "'Generate explode(table-argument#x []), false\n: +- SubqueryAlias v\n: +- View (`v`, [id#xL])\n: +- Project [cast(id#xL as bigint) AS id#xL]\n: +- Project [id#xL]\n: +- Range (0, 8, step=1)\n+- OneRowRelation\n"
+    "inputSql" : "\"functiontablesubqueryargumentexpression()\"",
+    "inputType" : "\"STRUCT<id: BIGINT NOT NULL>\"",
+    "paramIndex" : "first",
+    "requiredType" : "(\"ARRAY\" or \"MAP\")",
+    "sqlExpr" : "\"explode(functiontablesubqueryargumentexpression())\""
   },
   "queryContext" : [ {
     "objectType" : "",
     "objectName" : "",
-    "startIndex" : 37,
-    "stopIndex" : 44,
-    "fragment" : "TABLE(v)"
+    "startIndex" : 15,
+    "stopIndex" : 45,
+    "fragment" : "explode(collection => TABLE(v))"
   } ]
 }


sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out

Lines changed: 23 additions & 0 deletions
@@ -1095,6 +1095,29 @@ NULL 1.0
 NULL 1.0


+-- !query
+SELECT * FROM
+  InvalidEvalReturnsNoneToNonNullableColumnScalarType(TABLE(SELECT 1 AS X), unresolved_column)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.ExtendedAnalysisException
+{
+  "errorClass" : "UNRESOLVED_COLUMN.WITHOUT_SUGGESTION",
+  "sqlState" : "42703",
+  "messageParameters" : {
+    "objectName" : "`unresolved_column`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 93,
+    "stopIndex" : 109,
+    "fragment" : "unresolved_column"
+  } ]
+}
+
+
 -- !query
 DROP VIEW t1
 -- !query schema

sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala

Lines changed: 0 additions & 13 deletions
@@ -317,19 +317,6 @@ class PythonUDTFSuite extends QueryTest with SharedSparkSession {
       case other =>
         failure(other)
     }
-    withTable("t") {
-      sql("create table t(col array<int>) using parquet")
-      val query = "select * from explode(table(t))"
-      checkErrorMatchPVals(
-        exception = intercept[AnalysisException](sql(query)),
-        errorClass = "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT",
-        sqlState = None,
-        parameters = Map("treeNode" -> "(?s).*"),
-        context = ExpectedContext(
-          fragment = "table(t)",
-          start = 22,
-          stop = 29))
-    }

     spark.udtf.registerPython(UDTFCountSumLast.name, pythonUDTFCountSumLast)
     var plan = sql(
