[Spark] Remove type widening metadata #4187

Open
wants to merge 1 commit into
base: master
johanl-db
Collaborator

Description

Type widening records type changes applied to a table in the table schema using the following metadata:

"metadata": {
  "delta.typeChanges": [{
    "toType": "short",
    "fromType": "byte"
  }]
}

The initial intent was to keep that metadata in schemas returned to users, e.g. via df.schema, since it may provide useful information.
That turned out to be a bad idea, for two reasons:

  • This allows type widening metadata to leak outside of the table, possibly into other tables. While this isn't expected to cause correctness issues - the type widening metadata is mostly informative - it would still be confusing.
  • It is ultimately internal Delta metadata that shouldn't be surfaced to users. Other features, such as column mapping, explicitly remove their related metadata before surfacing schemas to users.
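
To make the intended behavior concrete, here is a minimal sketch of stripping the metadata from a schema before it is handed to a user. It is not the code in this PR (Delta's implementation is in Scala); it works on the JSON form of a schema as shown in this description, and the names `strip_type_widening_metadata` and `_strip_field` are hypothetical. It assumes the metadata only ever lives on struct fields, including fields nested inside arrays and maps.

```python
TYPE_CHANGES_KEY = "delta.typeChanges"  # metadata key shown in this description


def strip_type_widening_metadata(data_type):
    """Return a copy of a schema (JSON form) without type widening metadata.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    if not isinstance(data_type, dict):
        # Primitive types are plain strings such as "integer".
        return data_type
    result = dict(data_type)
    kind = data_type.get("type")
    if kind == "struct":
        result["fields"] = [_strip_field(f) for f in data_type.get("fields", [])]
    elif kind == "array":
        result["elementType"] = strip_type_widening_metadata(data_type["elementType"])
    elif kind == "map":
        result["keyType"] = strip_type_widening_metadata(data_type["keyType"])
        result["valueType"] = strip_type_widening_metadata(data_type["valueType"])
    return result


def _strip_field(field):
    """Drop the type widening entry from one field and recurse into its type."""
    cleaned = dict(field)
    cleaned["metadata"] = {
        key: value
        for key, value in field.get("metadata", {}).items()
        if key != TYPE_CHANGES_KEY
    }
    cleaned["type"] = strip_type_widening_metadata(field["type"])
    return cleaned


# Usage: the field from the "Before" example in this description.
table_schema = {
    "type": "struct",
    "fields": [{
        "name": "a",
        "type": "integer",
        "nullable": True,
        "metadata": {
            "delta.typeChanges": [{"toType": "integer", "fromType": "short"}]
        },
    }],
}
user_schema = strip_type_widening_metadata(table_schema)
```

Any other metadata keys on a field are kept as-is; only the type widening entry is dropped, which mirrors the idea of removing feature-specific metadata rather than clearing metadata wholesale.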

How was this patch tested?

  • Updated existing tests to remove type widening metadata from the expected schema when checking returned schemas.
  • Added a test to ensure we don't leak type widening metadata.
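
As an illustration of what a leak check can look like, the sketch below walks the JSON form of a schema and reports every field that still carries the metadata. It is not the test added in this PR; the helper name `find_type_widening_metadata` and the `element`/`key`/`value` path segments are assumptions made for this example.

```python
TYPE_CHANGES_KEY = "delta.typeChanges"  # metadata key shown in this description


def find_type_widening_metadata(data_type, path=""):
    """Return dotted paths of fields whose metadata has a type widening entry.

    Hypothetical helper for illustration; an empty list means nothing leaked.
    """
    def join(segment):
        return f"{path}.{segment}" if path else segment

    leaks = []
    if not isinstance(data_type, dict):
        # Primitive types are plain strings and carry no metadata.
        return leaks
    kind = data_type.get("type")
    if kind == "struct":
        for field in data_type.get("fields", []):
            field_path = join(field["name"])
            if TYPE_CHANGES_KEY in field.get("metadata", {}):
                leaks.append(field_path)
            leaks.extend(find_type_widening_metadata(field["type"], field_path))
    elif kind == "array":
        leaks.extend(find_type_widening_metadata(data_type["elementType"], join("element")))
    elif kind == "map":
        leaks.extend(find_type_widening_metadata(data_type["keyType"], join("key")))
        leaks.extend(find_type_widening_metadata(data_type["valueType"], join("value")))
    return leaks


# Usage: a schema that still carries the metadata, and one that does not.
leaky_schema = {
    "type": "struct",
    "fields": [{
        "name": "a",
        "type": "integer",
        "nullable": True,
        "metadata": {
            "delta.typeChanges": [{"toType": "integer", "fromType": "short"}]
        },
    }],
}
clean_schema = {
    "type": "struct",
    "fields": [{"name": "a", "type": "integer", "nullable": True, "metadata": {}}],
}
leaky_paths = find_type_widening_metadata(leaky_schema)
clean_paths = find_type_widening_metadata(clean_schema)
```

Returning the offending paths rather than a boolean makes a failing assertion point directly at the field that leaked.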

Does this PR introduce any user-facing changes?

Type widening metadata that was visible in DataFrame schemas is no longer surfaced:
df.schema:
Before:

"fields": [{
  "name": "a",
  "type": "integer",
  "nullable": true,
  "metadata": {
    "delta.typeChanges": [{
      "toType": "integer",
      "fromType": "short"
    }]
  }
}]

After:

"fields": [{
  "name": "a",
  "type": "integer",
  "nullable": true,
  "metadata": {}
}]
