Skip to content

Conversation

@xloya
Copy link
Contributor

@xloya xloya commented Nov 26, 2025

Close #3620.
Currently, Lance does not support the Map data type. Importing Lance from a data source that supports Map data type requires special handling, which incurs significant processing costs, even though this type is very common in other data sources.

This PR aligns with the Map data type in Arrow, implementing the Map logical data type. In actual encoder, it uses an Offsets Array + List<Struct<key, value>> approach, which is similar to List data type .
And in actual decoder, it will decode the struct array and offset infos to restruct to the Arrow MapArray.

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@github-actions github-actions bot added enhancement New feature or request python labels Nov 26, 2025
@xloya
Copy link
Contributor Author

xloya commented Nov 27, 2025

@westonpace @jackye1995 @Xuanwo @eddyxu PTAL when you have time, thanks!

Copy link
Collaborator

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Lance format 2.1 has been stabilized, so we can't add new data type support in its current form. Otherise older versions of Lance wouldn’t be able to read new Lance files.

We should add this feature in format version 2.2 instead.

@xloya
Copy link
Contributor Author

xloya commented Nov 27, 2025

Thank you for working on this!

Lance format 2.1 has been stabilized, so we can't add new data type support in its current form. Otherise older versions of Lance wouldn’t be able to read new Lance files.

We should add this feature in format version 2.2 instead.

@Xuanwo Thanks for your review! Currently, most of the logic for adding the Map data type is located in the Field, Encoder, and Decoder modules. These modules generally don't contain format version information. I'm unsure if it's appropriate to limit support for the Map data type to version 2.2+, since version 2.1+ uses the same logic for reading, writing, and encoding/decoding (different from version 2.0 and below). Do you have any suggestions on this?

@xloya xloya closed this Nov 27, 2025
@xloya xloya reopened this Nov 27, 2025
@Xuanwo
Copy link
Collaborator

Xuanwo commented Nov 27, 2025

@Xuanwo Thanks for your review! Currently, most of the logic for adding the Map data type is located in the Field, Encoder, and Decoder modules. These modules generally don't contain format version information. I'm unsure if it's appropriate to limit support for the Map data type to version 2.2+, since version 2.1+ uses the same logic for reading, writing, and encoding/decoding (different from version 2.0 and below). Do you have any suggestions on this?

Take Lance v1.0.0 as an example. If we add the Map type, users might find that data written in Lance v1.0.0 is not readable by Lance v0.39.0, and that's what we want to avoid.

@xloya xloya changed the title feat: support map data type in lance format version 2.1 feat: support map data type in lance format version 2.2 Nov 28, 2025
location: location!(),
});
}
if self.version < LanceFileVersion::V2_2 {
Copy link
Contributor Author

@xloya xloya Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versions 2.0 and below use a separate encoder / decoder, which inherently does not support Map types. Newer versions of the encoder / decoder disable the Map types use of version 2.1 because it is now stable.

@xloya xloya requested a review from Xuanwo November 28, 2025 07:22
Copy link
Collaborator

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here some nits about this PR. Maybe @westonpace wanna take an other look.

}
DataType::Map(entries_field, keys_sorted) => {
if *keys_sorted {
todo!("Maps are not supported keys_sorted=true yet")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to return an unsupported error instead of panic.

}

/// Create a new strategy from decoder config with file version
pub fn from_decoder_config_with_version(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an with_file_version API instead?

cache,
&FilterExpression::no_filter(),
&DecoderConfig::default(),
LanceFileVersion::V2_2, // Default to V2_2 for testing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change can introduce other 2.2 issues affecting our testing. I suggest configuring it separately for 2.2-related tests.

projection: ReaderProjection,
filter: FilterExpression,
decoder_config: DecoderConfig,
file_version: LanceFileVersion,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’ve introduced file_version in too many places, can we avoid it? In fact we just need to avoid users trying to write map data under 2.2 version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unsupported data type: Map

2 participants