
@steFaiz steFaiz (Collaborator) commented Nov 21, 2025

This PR adds support for writing key/value metadata into the data file's schema through the Java LanceFileWriter API. See #5254.

@github-actions github-actions bot added enhancement New feature or request java labels Nov 21, 2025
@github-actions (Contributor)

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error, please inspect the "PR Title Check" action.

@steFaiz steFaiz changed the title feat(java): support writing schema metadata through java LanceFileWriter API. feat(java): support writing schema metadata through java LanceFileWriter API Nov 21, 2025
@steFaiz steFaiz (Collaborator, Author) commented Nov 21, 2025

@majin1102 @westonpace This PR is ready for review, PTAL if you have some time!

* @param metadata metadata
* @throws IOException IOException
*/
public void writeSchemaMetadata(Map<String, String> metadata) throws IOException {
@majin1102 majin1102 (Contributor) Nov 21, 2025

For field_ids, I wonder if we can just use:

  public void write(VectorSchemaRoot batch) throws IOException {
    try (ArrowArray ffiArrowArray = ArrowArray.allocateNew(allocator);
        ArrowSchema ffiArrowSchema = ArrowSchema.allocateNew(allocator)) {
      Data.exportVectorSchemaRoot(
          allocator, batch, dictionaryProvider, ffiArrowArray, ffiArrowSchema);
      writeNative(ffiArrowArray.memoryAddress(), ffiArrowSchema.memoryAddress());
    }
  }

to pass the metadata map? If we already know the metadata, I think we could eliminate this unnecessary double write. What do you think?

But yes, this interface might be useful for adding comments and other metadata after the data has been written. Could you elaborate on the usage you have in mind?

@steFaiz steFaiz (Collaborator, Author) Nov 21, 2025

@majin1102 Thanks for your advice! I think the key point is that compute engines and lakehouse formats call write_data multiple times, and they may only want to write some metadata (e.g. field IDs) when opening or closing file writers. So I think keeping this method separate is also reasonable. Otherwise, users who only want to set some metadata would have to call write(null, metadata), which is a bit odd.
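To make the trade-off concrete, here is a minimal Java sketch of the two API shapes being discussed. The class SeparateMetadataCallSketch, its methods, and the metadata keys are hypothetical, and a List<String> stands in for an Arrow VectorSchemaRoot batch; this is not the Lance implementation. It shows an engine setting metadata on open, writing several batches, and setting more metadata on close, and how a combined write(batch, metadata) shape forces a null batch for metadata-only calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a file writer (not the Lance API).
 * A List<String> plays the role of an Arrow VectorSchemaRoot batch.
 */
public class SeparateMetadataCallSketch {

    private final List<List<String>> batches = new ArrayList<>();
    private final Map<String, String> schemaMetadata = new HashMap<>();

    /** Plain data write; an engine typically calls this many times. */
    public void write(List<String> batch) {
        batches.add(batch);
    }

    /** Alternative raised in review: metadata travels with a data batch. */
    public void write(List<String> batch, Map<String, String> metadata) {
        if (batch != null) { // a metadata-only call has to pass a null batch
            batches.add(batch);
        }
        schemaMetadata.putAll(metadata);
    }

    /** Separate call, as in this PR: metadata is set independently of data writes. */
    public void writeSchemaMetadata(Map<String, String> metadata) {
        schemaMetadata.putAll(metadata);
    }

    public int batchCount() {
        return batches.size();
    }

    public Map<String, String> schemaMetadata() {
        return Map.copyOf(schemaMetadata);
    }

    public static void main(String[] args) {
        SeparateMetadataCallSketch writer = new SeparateMetadataCallSketch();
        writer.writeSchemaMetadata(Map.of("field-ids", "1,2")); // on open (hypothetical key)
        writer.write(List.of("a", "b"));
        writer.write(List.of("c"));
        writer.write(null, Map.of("closed-by", "engine"));      // the awkward combined shape
        writer.writeSchemaMetadata(Map.of("row-count", "3"));   // on close
        System.out.println(writer.batchCount()); // prints 2
    }
}
```

With the separate method, metadata-only calls never touch the data path, which is the point made in the comment above.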

Contributor

I suggest adding a comment noting that the metadata parameter overrides the existing metadata.
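A sketch of the kind of Javadoc being suggested, assuming per-key override semantics: a later call replaces the value for a key that already exists and keeps all other keys, as HashMap.putAll does. Whether the real writer overrides per key or replaces the whole map is an assumption here, and the class SchemaMetadataOverrideSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical class illustrating documented override semantics (not the Lance API). */
public class SchemaMetadataOverrideSketch {

    private final Map<String, String> schemaMetadata = new HashMap<>();

    /**
     * Adds key/value pairs to the schema metadata of the file being written.
     *
     * <p>If a key already exists, its value is overridden by the value in
     * {@code metadata}; keys not present in {@code metadata} are kept.
     *
     * @param metadata the metadata entries to add; must not be null
     */
    public void writeSchemaMetadata(Map<String, String> metadata) {
        // putAll replaces values for duplicate keys and leaves other keys untouched.
        schemaMetadata.putAll(metadata);
    }

    public Map<String, String> schemaMetadata() {
        return Map.copyOf(schemaMetadata);
    }

    public static void main(String[] args) {
        SchemaMetadataOverrideSketch writer = new SchemaMetadataOverrideSketch();
        writer.writeSchemaMetadata(Map.of("comment", "draft", "owner", "etl"));
        writer.writeSchemaMetadata(Map.of("comment", "final"));
        System.out.println(writer.schemaMetadata().get("comment")); // prints "final"
        System.out.println(writer.schemaMetadata().get("owner"));   // prints "etl"
    }
}
```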

unsafe { env.get_rust_field::<_, _, BlockingFileWriter>(writer, NATIVE_WRITER) }?;
let mut writer = writer_guard.inner.lock().unwrap();
metadata.into_iter().for_each(|(k, v)| {
writer.add_schema_metadata(k, v);
@majin1102 majin1102 (Contributor) Nov 21, 2025

nit: I think it would be better to align the interface with the Rust API 'add_schema_metadata', because the underlying call doesn't actually write anything. This is different from writing data, since only the final metadata is flushed.
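A minimal sketch of the buffering behavior this comment describes: an addSchemaMetadata call only records entries in memory, and the final metadata is emitted once when the writer is closed. The class DeferredSchemaMetadataSketch and its flushedMetadata() accessor are hypothetical stand-ins, not Lance's Rust FileWriter or its Java binding.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical writer showing "add now, flush at close" semantics (not the Lance API). */
public class DeferredSchemaMetadataSketch implements AutoCloseable {

    private final Map<String, String> pending = new LinkedHashMap<>();
    private Map<String, String> flushed = Collections.emptyMap(); // what ends up in the file
    private boolean closed = false;

    /** Only records entries in memory; nothing is written yet. */
    public void addSchemaMetadata(Map<String, String> metadata) {
        if (closed) {
            throw new IllegalStateException("writer is already closed");
        }
        pending.putAll(metadata);
    }

    /** Stand-in for persisting the final schema metadata when the file is finished. */
    @Override
    public void close() {
        if (!closed) {
            flushed = Collections.unmodifiableMap(new LinkedHashMap<>(pending));
            closed = true;
        }
    }

    public Map<String, String> flushedMetadata() {
        return flushed;
    }

    public static void main(String[] args) {
        DeferredSchemaMetadataSketch writer = new DeferredSchemaMetadataSketch();
        writer.addSchemaMetadata(Map.of("a", "1"));
        System.out.println(writer.flushedMetadata()); // prints "{}"
        writer.close();
        System.out.println(writer.flushedMetadata()); // prints "{a=1}"
    }
}
```

Because nothing reaches the file until close, "add" describes the call more accurately than "write".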

@steFaiz steFaiz (Collaborator, Author) commented Nov 24, 2025

@majin1102 Thanks for your constructive advice. I've updated the code.

@majin1102 majin1102 (Contributor) left a comment

Looks good to me.
Thanks for this contribution!

@majin1102 (Contributor)

Hello, @steFaiz
I guess you need to resolve the conflicts before merging.
