(native): correlated scalar subquery with string projection misses multiple-row failure #27709

@pramodsatya

Description

Your Environment

  • Presto version used: presto-native-tests branch
  • Velox version used: submodule 7dcf49cee4d8988f14ee274949a0c35d9052d6ea
  • Storage (HDFS/S3/GCS..): local native test data generated under presto-native-tests/target/velox_data/PARQUET
  • Data source and connector used: Hive connector through Presto native tests, PARQUET
  • Deployment (Cloud or On-prem): local Prestissimo debug worker, WORKER_COUNT=1, sidecarEnabled=true
  • Pastebin link to the complete debug logs: N/A; focused repro output and worker log notes are included below.

Expected Behavior

All three correlated scalar subqueries below should fail with the following error, because the scalar subquery can produce more than one row for at least one outer row:

Scalar sub-query has returned multiple rows

This is the behavior asserted by AbstractTestQueries.testCorrelatedNonAggregationScalarSubqueries.

Current Behavior

The native engine fails only the integer-constant projection case (the third query). The two string-projection cases (the first and second queries) incorrectly complete with an empty result:

first=succeeded: MaterializedResult{rows=[], types=[varchar(25)], setSessionProperties={}, resetSessionProperties=[], clearTransactionId=false};
second=succeeded: MaterializedResult{rows=[], types=[varchar(25)], setSessionProperties={}, resetSessionProperties=[], clearTransactionId=false};
third=failed:  Scalar sub-query has returned multiple rows native.default.fail(28:INTEGER, Scalar sub-query has returned multiple rows:VARCHAR) Top-level Expression: and(switch(native.default.eq(true:BOOLEAN, is_distinct), true:BOOLEAN, cast((native.default.fail(28:INTEGER, Scalar sub-query has returned multiple rows:VARCHAR)) as BOOLEAN)), native.default.eq(1:INTEGER, expr))

The two successful empty results hide the cardinality violation instead of raising the scalar-subquery error.

Possible Solution

Preliminary root-cause analysis points to the native execution of the decorrelated scalar-subquery cardinality check. The integer case, which fails as expected, shows the native plan evaluating an is_distinct marker guard around native.default.fail(...). In the string cases, the outer comparison appears to evaluate to false and filter out every row before the multiple-row guard is surfaced.

The fix should ensure the scalar-subquery cardinality guard is evaluated independently of whether the outer predicate ultimately matches. A useful starting point is the Presto-to-Velox translation and expression evaluation around MarkDistinct/is_distinct and generated native.default.fail(...) predicates for correlated scalar subqueries.
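
The hypothesis above can be illustrated with a small Python model. This is only a sketch of the suspected evaluation-order effect, not Velox or Presto code: the function names, the row layout (an is_distinct flag paired with the projected subquery value), and the idea that the comparison conjunct runs before the guard are all assumptions made for illustration.

```python
class ScalarSubqueryError(Exception):
    """Stands in for the error raised by native.default.fail(...)."""


def guard(is_distinct):
    # Models switch(eq(true, is_distinct), true, cast(fail(...) as BOOLEAN)):
    # pass through when the row is the first for its outer key, fail otherwise.
    if is_distinct:
        return True
    raise ScalarSubqueryError("Scalar sub-query has returned multiple rows")


def filter_guard_first(rows, literal):
    # Guard is evaluated for every row, regardless of the comparison result.
    return [v for is_distinct, v in rows if guard(is_distinct) and v == literal]


def filter_comparison_first(rows, literal):
    # Hypothetical ordering: the comparison runs first, so the guard is
    # skipped for every row where the comparison is already false.
    return [v for is_distinct, v in rows if v == literal and guard(is_distinct)]


# One outer row whose subquery matched twice: the second row is not distinct.
string_rows = [(True, "bleh"), (False, "bleh")]
integer_rows = [(True, 1), (False, 1)]

# 'AFRICA' = 'bleh' is false for every row, so the guard never runs.
print(filter_comparison_first(string_rows, "AFRICA"))

# 1 = 1 is true for every row, so the guard runs and raises on the second row.
try:
    filter_comparison_first(integer_rows, 1)
except ScalarSubqueryError as e:
    print("failed:", e)
```

Under these assumptions the model reproduces the reported pattern: the string comparison silently yields an empty result, while the integer comparison surfaces the error. With filter_guard_first both inputs raise, which is the behavior the fix should guarantee.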

Steps to Reproduce

Run the native test repro that executes the following queries:

SELECT name
FROM nation n
WHERE 'AFRICA' = (
    SELECT 'bleh'
    FROM region
    WHERE regionkey > n.regionkey
);

SELECT name
FROM nation n
WHERE 'AFRICA' = (
    SELECT name
    FROM region
    WHERE regionkey > n.regionkey
);

SELECT name
FROM nation n
WHERE 1 = (
    SELECT 1
    FROM region
    WHERE regionkey > n.regionkey
);
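
To see why each query should hit the cardinality error, the following Python sketch counts how many region rows the subquery predicate regionkey > n.regionkey selects for each possible outer key. It assumes the standard TPC-H data, where region.regionkey takes the values 0 through 4 and every nation.regionkey is one of those values; the helper name is hypothetical.

```python
# Assumption: standard TPC-H region table with regionkey values 0..4.
REGION_KEYS = [0, 1, 2, 3, 4]


def subquery_row_count(outer_regionkey):
    """Number of region rows satisfying regionkey > n.regionkey."""
    return sum(1 for k in REGION_KEYS if k > outer_regionkey)


# Row count produced by the subquery for each possible outer regionkey.
counts = {k: subquery_row_count(k) for k in REGION_KEYS}

# Outer keys for which the scalar subquery yields more than one row.
violating = [k for k, c in counts.items() if c > 1]

print(counts)     # {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
print(violating)  # [0, 1, 2]
```

Under that assumption, any nation with regionkey 0, 1, or 2 makes the subquery return multiple rows, so all three queries should fail regardless of what the subquery projects.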

Context

This was uncovered with prestodb/presto#23671. The affected native test is AbstractTestQueriesNative.testCorrelatedNonAggregationScalarSubqueries, where two string-projection multiple-row assertions had been disabled pending investigation.
