-
Notifications
You must be signed in to change notification settings - Fork 489
feat: introduce blob arrow extension type #5239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
…nto blob-v2-settings
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| use lance_core::{utils::address::RowAddress, Error, Result}; | ||
| use lance_io::traits::Reader; | ||
|
|
||
| pub const BLOB_VERSION_CONFIG_KEY: &str = "lance.blob.version"; | ||
|
|
||
| pub fn blob_version_from_config(config: &HashMap<String, String>) -> BlobVersion { | ||
| config | ||
| .get(BLOB_VERSION_CONFIG_KEY) | ||
| .and_then(|value| BlobVersion::from_config_value(value)) | ||
| .unwrap_or(BlobVersion::V1) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respect legacy blob version metadata when config is absent
Blob version detection now relies exclusively on the dataset config (blob_version_from_config) and BlobVersion::from_config_value, while the previous code read the lance-encoding:blob-version field metadata. Existing tables written with v2 blobs before this change store the version only in field metadata and have no config entry. After this commit such tables will always default to BlobVersion::V1, so blob descriptors are parsed with the wrong layout and reads will fail or return incorrect data. Consider falling back to the per-field metadata key when the config value is missing so older manifests remain readable.
Useful? React with 👍 / 👎.
| self | ||
| } | ||
|
|
||
| pub fn with_blob_version(mut self, blob_version: BlobVersion) -> Self { | ||
| self.blob_version = blob_version; | ||
| self | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propagate dataset blob version to all projections
The new Projection field blob_version defaults to BlobVersion::V1 and is only set when callers explicitly invoke with_blob_version. Many existing call sites still construct projections directly via Projection::full(...) or Projection::empty(...) (e.g. the scanner and DataFusion code) and will continue to operate with the default. On tables using the new blob v2 layout those projections will still expand blobs using the v1 descriptor schema, leading to mismatched field shapes and runtime errors. Either derive the blob version from the Projectable passed to Projection::full/empty or update all constructors to set the dataset’s blob version by default.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5239 +/- ##
==========================================
+ Coverage 82.21% 82.24% +0.03%
==========================================
Files 344 344
Lines 144879 144937 +58
Branches 144879 144937 +58
==========================================
+ Hits 119108 119202 +94
+ Misses 21836 21798 -38
- Partials 3935 3937 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This PR is based on #5210
This PR intends to add a blob arrow extension type (aka, logical type) in lance.
This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off, I'll fix it quickly.