
Support array flatten() on List(LargeList(_)) types #17670

@Jefffrey

Description

Is your feature request related to a problem or challenge?

Currently, flatten() cannot flatten a List (or FixedSizeList) whose inner element type is a LargeList.

We should be able to support this, at least for queries we expect to succeed. For example, we would expect something like this to succeed in array.slt:

query ??
select flatten(arrow_cast(make_array([1], [2, 3], [null], make_array(4, null, 5)), 'FixedSizeList(4, LargeList(Int64))')),
       flatten(arrow_cast(make_array([[1.1], [2.2]], [[3.3], [4.4]]), 'List(LargeList(FixedSizeList(1, Float64)))'));
----
[1, 2, 3, NULL, 4, NULL, 5] [[1.1], [2.2], [3.3], [4.4]]

Currently it fails with:

1. query failed: DataFusion error: Execution error: flatten does not support type 'List(Field { name: "item", data_type: LargeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] select flatten(arrow_cast(make_array([1], [2, 3], [null], make_array(4, null, 5)), 'FixedSizeList(4, LargeList(Int64))')),
       flatten(arrow_cast(make_array([[1.1], [2.2]], [[3.3], [4.4]]), 'List(LargeList(FixedSizeList(1, Float64)))'));
at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/array.slt:7681

Describe the solution you'd like

We need to consider the return type. Note how LargeList is missing from the inner field match here:

List(field) | FixedSizeList(field, _) => match field.data_type() {
    List(field) | FixedSizeList(field, _) => List(Arc::clone(field)),
    _ => arg_types[0].clone(),
},
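A minimal, std-only sketch of how the inner match could gain a LargeList arm. It uses a hypothetical, simplified DataType enum (arrow's real enum wraps children in a Field/FieldRef, not a Box<DataType>), and flatten_return_type is an illustrative name, not DataFusion's API. It assumes the "best effort" option from this issue, where a List or FixedSizeList parent over a LargeList child still returns a List; the upcast alternative would return a LargeList from that arm instead.

```rust
// Hypothetical, simplified stand-in for arrow's DataType; the real enum wraps
// child types in a FieldRef rather than a Box<DataType>.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    List(Box<DataType>),
    LargeList(Box<DataType>),
    FixedSizeList(Box<DataType>, i32),
}

use DataType::*;

/// Illustrative return-type rule for flatten on List / FixedSizeList inputs.
/// The LargeList child arm assumes the "best effort" option: the result stays
/// a List (i32 offsets), so execution must still check that the offsets fit.
fn flatten_return_type(arg: &DataType) -> DataType {
    match arg {
        List(inner) | FixedSizeList(inner, _) => match &**inner {
            // Only List and FixedSizeList children were matched before;
            // LargeList is added here.
            List(child) | FixedSizeList(child, _) | LargeList(child) => List(child.clone()),
            // A list of non-list values has nothing to flatten; keep the type.
            _ => arg.clone(),
        },
        // Other outer types (e.g. an outer LargeList) are out of scope here.
        _ => arg.clone(),
    }
}

fn main() {
    // FixedSizeList(4, LargeList(Int64)) maps to List(Int64) under this rule.
    let t = FixedSizeList(Box::new(LargeList(Box::new(Int64))), 4);
    println!("{:?}", flatten_return_type(&t));
}
```

Because return_type() only sees types, this rule commits to a List result at planning time; whether the data actually fits can only be checked during execution.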

This is where the current error is raised:

LargeList(_) => {
    exec_err!("flatten does not support type '{:?}'", array.data_type())?
}

  • We just return an error without checking whether flattening is actually possible.
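Flattening one level only needs to compose offset buffers; the child's values buffer can be reused unchanged. A minimal, std-only sketch, assuming valid, unsliced, non-decreasing offsets and ignoring null buffers; flatten_offsets is a hypothetical helper, not DataFusion's implementation. A FixedSizeList parent has no offset buffer, but its implicit offsets k * size can be passed in the same way.

```rust
/// Hypothetical helper: offsets of flatten(parent), where the parent is a
/// List (i32 offsets) and its child is a LargeList (i64 offsets).
/// Row k of the result covers the child's values from inner[outer[k]]
/// to inner[outer[k + 1]], so the values buffer is reused as-is.
/// Assumes valid, unsliced offset buffers and ignores null buffers.
fn flatten_offsets(outer: &[i32], inner: &[i64]) -> Vec<i64> {
    outer.iter().map(|&o| inner[o as usize]).collect()
}

fn main() {
    // Child LargeList rows: [1], [2, 3], [NULL], [4, NULL, 5]
    // -> inner offsets [0, 1, 3, 4, 7] over 7 values.
    let inner: [i64; 5] = [0, 1, 3, 4, 7];
    // Parent rows: [[1], [2, 3]] and [[NULL], [4, NULL, 5]] -> outer offsets [0, 2, 4].
    let outer: [i32; 3] = [0, 2, 4];
    // Flattened rows: [1, 2, 3] and [NULL, 4, NULL, 5].
    println!("{:?}", flatten_offsets(&outer, &inner)); // [0, 3, 7]
}
```

The composed offsets come straight from the child, so they are i64; returning a List result means they must be narrowed to i32.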

Perhaps we can try a "best effort" approach: downcast the LargeList child to a List, and if that succeeds (i.e. all of the LargeList's i64 offsets fit inside a List's i32 offsets), flatten it into the parent List; otherwise, return an error. Alternatively, we could just upcast the parent List to a LargeList, though this might be tricky since return_type() wouldn't know about this until execution, and I don't think we want to blindly upcast all parent Lists to LargeList.
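Both options in the paragraph above come down to converting offset widths. A minimal, std-only sketch; try_narrow_offsets and widen_offsets are hypothetical names, not DataFusion or arrow-rs functions. Narrowing i64 offsets to i32 can fail, while widening i32 offsets to i64 cannot.

```rust
use std::convert::TryFrom;

/// Hypothetical "best effort" check: narrow LargeList (i64) offsets to
/// List (i32) offsets, failing if any offset does not fit in an i32.
fn try_narrow_offsets(offsets: &[i64]) -> Result<Vec<i32>, String> {
    offsets
        .iter()
        .map(|&o| {
            i32::try_from(o)
                .map_err(|_| format!("offset {} does not fit in a List (i32) offset", o))
        })
        .collect()
}

/// Hypothetical upcast alternative: widen List (i32) offsets to
/// LargeList (i64) offsets. This conversion is lossless and never fails.
fn widen_offsets(offsets: &[i32]) -> Vec<i64> {
    offsets.iter().map(|&o| i64::from(o)).collect()
}

fn main() {
    println!("{:?}", try_narrow_offsets(&[0, 3, 7])); // Ok([0, 3, 7])
    println!("{:?}", try_narrow_offsets(&[0, 1i64 << 31]).is_err()); // true
    println!("{:?}", widen_offsets(&[0, 3, 7])); // [0, 3, 7]
}
```

Since valid offsets are non-decreasing, checking only the first and last offsets would be enough; the element-wise check is used here because it is the simplest to read.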

Open to any other suggestions.

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement (New feature or request)
