Your Environment
- Presto version used: presto-native-tests branch
- Velox version used: submodule 7dcf49cee4d8988f14ee274949a0c35d9052d6ea
- Storage (HDFS/S3/GCS..): local native test data generated under presto-native-tests/target/velox_data/PARQUET
- Data source and connector used: Hive connector through Presto native tests, PARQUET
- Deployment (Cloud or On-prem): local Prestissimo debug worker, WORKER_COUNT=1, sidecarEnabled=true
- Pastebin link to the complete debug logs: N/A; the focused repro output is included below.
Expected Behavior
Presto Java rejects reduce_agg when the initial state is not a non-null literal. The grouped native query below should fail with an error matching:
REDUCE_AGG only supports non-NULL literal as the initial value
or the equivalent native validation error:
Initial value in reduce_agg must be constant
Current Behavior
The native grouped aggregation succeeds and uses the per-row array expression as the initial state, producing duplicated state values:
MaterializedResult{rows=[[1, [1, 2, 2, 1, 3, 3, 1, 4, 4]], [2, [2, 20, 20, 2, 30, 30, 2, 40, 40]]], types=[integer, array(integer)], setSessionProperties={}, resetSessionProperties=[], clearTransactionId=false}
This diverges from Presto Java semantics and is the reason the native version of testLambdaInAggregation had the array-initial-state case commented out.
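To show how per-row initial states produce exactly the duplicated values above, here is a simplified C++ model. It is not Velox code, and `Array`, `concat`, and `groupedWithPerRowInitialState` are hypothetical names. It assumes the grouped raw-input path evaluates `array[id, value]` separately for each row, applies the input lambda `(a, b) -> a || b` to that row's own initial value, and then merges the per-row states with the combine lambda in row order. Under that assumption it reproduces the observed result.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model, not Velox code: an array(integer) value.
using Array = std::vector<int>;

// Models a || b by concatenating two arrays; when b is a single element,
// the caller passes it as a one-element array.
Array concat(const Array& a, const Array& b) {
  Array out = a;
  out.insert(out.end(), b.begin(), b.end());
  return out;
}

// Models the observed grouped behavior: each (id, value) row builds its own
// "initial value" array[id, value], applies the input lambda, and the
// per-row states are merged per group with the combine lambda in row order.
std::map<int, Array> groupedWithPerRowInitialState(
    const std::vector<std::pair<int, int>>& rows) {
  std::map<int, Array> states;
  for (const auto& [id, value] : rows) {
    const Array initial{id, value};                   // evaluated per row
    const Array rowState = concat(initial, {value});  // input lambda a || b
    states[id] = concat(states[id], rowState);        // combine lambda a || b
  }
  return states;
}
```

For group 1 the rows yield `[1, 2, 2]`, `[1, 3, 3]`, and `[1, 4, 4]`, which concatenate to `[1, 2, 2, 1, 3, 3, 1, 4, 4]`, matching the native output. A single constant initial value would not repeat `id` once per row.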
Possible Solution
The root cause appears to be in Velox ReduceAgg: the single-group and intermediate paths validate the initial state with verifyInitialValueArg, but the grouped raw-input path converts rows to states through toStates without first validating that the initial state is constant for the selected rows.
Validate the grouped dictionary-wrapped initial state in ReduceAgg::toStates before building the lambda input, and add a grouped regression test that asserts a non-constant array initial state throws.
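A minimal sketch of the proposed ordering, written as a standalone C++ model rather than a patch against Velox. `BaseEncoding`, `DecodedInitialValue`, `verifyConstantInitialValue`, and this `toStates` are hypothetical stand-ins. The real fix would inspect Velox's decoded initial-value vector, and relying on something like `DecodedVector::isConstantMapping()` there is an assumption, not a confirmed design. The point is only that the constant check runs before any lambda input is built.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model, not Velox code: an array(integer) value.
using Array = std::vector<int>;

// Hypothetical base encoding of the initial-value argument, assumed to be
// observed after any dictionary wrapping has been peeled off.
enum class BaseEncoding { kConstant, kFlat };

struct DecodedInitialValue {
  BaseEncoding base;
  std::vector<Array> rowValues;  // initial value as seen by each selected row
};

// Hypothetical check: reject an initial value that is not constant for the
// selected rows, using the native error text quoted in this issue.
void verifyConstantInitialValue(const DecodedInitialValue& initial) {
  if (initial.base != BaseEncoding::kConstant) {
    throw std::invalid_argument("Initial value in reduce_agg must be constant");
  }
}

// Hypothetical grouped raw-input path: validate first, then apply the input
// lambda (a, b) -> a || b to build one state per row.
std::vector<Array> toStates(const DecodedInitialValue& initial,
                            const std::vector<int>& values) {
  verifyConstantInitialValue(initial);
  if (initial.rowValues.size() != values.size()) {
    throw std::invalid_argument("row count mismatch");
  }
  std::vector<Array> states;
  states.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    Array state = initial.rowValues[i];
    state.push_back(values[i]);  // a || b with b a single element
    states.push_back(state);
  }
  return states;
}
```

The flat case mirrors the repro: `array[id, value]` differs per row, so the check throws with the native error message instead of silently building per-row states. A grouped regression test would assert that failure.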
Steps to Reproduce
Run the native test repro that executes this SQL:
SELECT
id,
reduce_agg(value, array[id, value],
(a, b) -> a || b,
(a, b) -> a || b)
FROM (
VALUES
(1, 2), (1, 3), (1, 4),
(2, 20), (2, 30), (2, 40)
) AS t(id, value)
GROUP BY id
Context
This was found with PR prestodb/presto#23671. The affected native test is AbstractTestQueriesNative.testLambdaInAggregation, where the array-initial-state reduce_agg assertion had been disabled pending investigation.