Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iceberg and Delta target: sync schema field comments #575

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

emilie-wang
Copy link

Important Read

Aim to address #525

What is the purpose of the pull request

It is expected to capture the comments and sync the comments of field when syncing schema. The PR makes the comments of schema fields are captured and synced correctly.

Brief change log

  • For Delta target, extract the comment from internal schema and add to Spark schema representative.
  • For Iceberg target, make sure any difference of comments is captured during syncSchema() and the schema is updated with new comments.

Verify this pull request

This change added tests and can be verified as follows:

  • Added unit tests to test the comments in Iceberg schema sync test class
  • Added comments in Delta schema extractor test class

It fixed the issue that the comments are not synced correctly when the target table is Iceberg or Delta.
For Delta target, extract the comment from internal schema and add to Spark schema representative.
For Iceberg target, make sure any difference of comments is captured during syncSchema() and the schema is updated with new comments.
if (schema.getComment() != null) {
metadataBuilder.putString(COMMENT, schema.getComment());
}
return metadataBuilder.build();
}

public InternalSchema toInternalSchema(StructType structType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the Map keys/values or the array values have comments on the field? if so, it does not look like they are being extracted read into the InternalSchema representation currently

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, after checking the APIs, there is no comment can be added to array value or map key/values level:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/StructType.html
The name of the test is confusing, actually we tested the nested struct instead of array field: https://github.com/apache/incubator-xtable/pull/575/files#diff-9328d59be408d684281f3d5b2b7a9ca9d9b030e8e7075cd741796eb85eb2642cR328

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this, everything else looks good to me

@the-other-tim-brown the-other-tim-brown merged commit ced9223 into apache:main Nov 4, 2024
2 checks passed
@the-other-tim-brown
Copy link
Contributor

Thank you for fixing this @emilie-wang!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants