(native): reduce_agg accepts a non-literal array initial state for grouped aggregation #27710

@pramodsatya

Your Environment

  • Presto version used: presto-native-tests branch
  • Velox version used: submodule 7dcf49cee4d8988f14ee274949a0c35d9052d6ea
  • Storage (HDFS/S3/GCS..): local native test data generated under presto-native-tests/target/velox_data/PARQUET
  • Data source and connector used: Hive connector through Presto native tests, PARQUET
  • Deployment (Cloud or On-prem): local Prestissimo debug worker, WORKER_COUNT=1, sidecarEnabled=true
  • Pastebin link to the complete debug logs: N/A; focused repro output and worker log notes are included below.

Expected Behavior

Presto Java requires the reduce_agg initial state to be a non-NULL literal and rejects the query otherwise. The grouped native query below should fail with an error matching:

REDUCE_AGG only supports non-NULL literal as the initial value

or the equivalent native validation error:

Initial value in reduce_agg must be constant

Current Behavior

The native grouped aggregation succeeds. It evaluates array[id, value] per row and uses each row's array as that row's initial state, so every input row contributes [id, value, value] to its group's result:

MaterializedResult{rows=[[1, [1, 2, 2, 1, 3, 3, 1, 4, 4]], [2, [2, 20, 20, 2, 30, 30, 2, 40, 40]]], types=[integer, array(integer)], setSessionProperties={}, resetSessionProperties=[], clearTransactionId=false}

This diverges from Presto Java semantics and is the reason the native version of testLambdaInAggregation had the array-initial-state case commented out.
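
To make the shape of that output easier to follow, here is a small Python model of the observed behavior. It is only a sketch of the semantics inferred from the result above, not the Velox implementation: the function name grouped_reduce_agg_per_row_init and the order of steps (build one state per row, then combine per group) are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce

# Same rows as the VALUES clause in the repro query: (id, value).
rows = [(1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40)]


def input_fn(state, value):
    # Models the SQL input lambda (a, b) -> a || b: append value to the array state.
    return state + [value]


def combine_fn(left, right):
    # Models the SQL combine lambda (a, b) -> a || b: concatenate two array states.
    return left + right


def grouped_reduce_agg_per_row_init(rows):
    # Hypothetical model of the buggy path: every row gets its own
    # initial state array[id, value] instead of one constant initial state.
    states_by_group = defaultdict(list)
    for id_, value in rows:
        initial_state = [id_, value]
        states_by_group[id_].append(input_fn(initial_state, value))
    # Combine the per-row states of each group in input order.
    return {key: reduce(combine_fn, states) for key, states in states_by_group.items()}


result = grouped_reduce_agg_per_row_init(rows)
print(result)
```

Under these assumptions the model reproduces the rows reported above: [1, 2, 2, 1, 3, 3, 1, 4, 4] for id 1 and [2, 20, 20, 2, 30, 30, 2, 40, 40] for id 2.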

Possible Solution

The root cause appears to be in Velox ReduceAgg: the single-group and intermediate paths validate the initial state with verifyInitialValueArg, but the grouped raw-input path converts rows to states through toStates without first validating that the initial state is constant for the selected rows.

Validate the grouped dictionary-wrapped initial state in ReduceAgg::toStates before building the lambda input, and add a grouped regression test that asserts a non-constant array initial state throws.
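
The check proposed above can be illustrated with a short Python sketch. This is not Velox code and does not reflect the real signature of verifyInitialValueArg; the helper verify_constant_initial_value is hypothetical and only shows the idea of rejecting initial states that differ across the selected rows. NULL handling is not modeled here.

```python
def verify_constant_initial_value(initial_values):
    """Raise if the initial states of the selected rows are not all equal.

    Hypothetical helper: `initial_values` is any iterable holding the
    evaluated initial state of each selected row.
    """
    iterator = iter(initial_values)
    try:
        first = next(iterator)
    except StopIteration:
        return  # No selected rows, nothing to validate.
    for value in iterator:
        if value != first:
            # Message text taken from the native validation error quoted in this issue.
            raise ValueError("Initial value in reduce_agg must be constant")


# A constant initial state passes silently.
verify_constant_initial_value([[0], [0], [0]])

# The repro's per-row array[id, value] for id = 1 is rejected.
try:
    verify_constant_initial_value([[1, 2], [1, 3], [1, 4]])
    rejected = False
except ValueError as error:
    rejected = True
    message = str(error)
```

Running the check before the lambda input is built means the grouped path would fail fast, instead of silently producing per-row states.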

Steps to Reproduce

Run the native test repro that executes this SQL:

SELECT
    id,
    reduce_agg(value, array[id, value],
        (a, b) -> a || b,
        (a, b) -> a || b)
FROM (
    VALUES
        (1, 2), (1, 3), (1, 4),
        (2, 20), (2, 30), (2, 40)
) AS t(id, value)
GROUP BY id

Context

This was found with PR prestodb/presto#23671. The affected native test is AbstractTestQueriesNative.testLambdaInAggregation, where the array-initial-state reduce_agg assertion had been disabled pending investigation.
