-
Notifications
You must be signed in to change notification settings - Fork 490
refactor: write bitmap index statistics in file instead #5251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Xuanwo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Signed-off-by: Xuanwo <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5251 +/- ##
==========================================
- Coverage 82.25% 82.24% -0.01%
==========================================
Files 344 344
Lines 144636 145004 +368
Branches 144636 145004 +368
==========================================
+ Hits 118967 119264 +297
- Misses 21742 21806 +64
- Partials 3927 3934 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
wjones127
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach. I think we should figure out whether or not we want index_types() method or if we want something else. Weston is working on something in parallel in #5221
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
| /// Returns the index type for this plugin | ||
| fn index_type(&self) -> IndexType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say someone creates a new custom index plugin, that doesn't fit any of the enum variant we have built into our library. What do they return here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, could we have something like:
impl IndexType {
fn try_from_pb_name(pb_name: &str) -> Option<Self> { ... }
}Seems like we should be able to derive from the name, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @westonpace in case you have any thoughts on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We should be trying to move away from IndexType enums (because they cannot work with plugins). We should not add this method.
With the introduction of describe_indices every index has two names. The fully qualified type URL (unique, but not friendly, this is the name of the protobuf details message), for example /lance.index.pb.JsonIndexDetails and the short name (friendly, but not unique). The short name is provided by the ScalarIndexPlugin::name method.
The index statistics currently has a index_type. This should be updated to have both index_uri and index_typename. We can leave index_type for legacy / backwards compatibility. We can add a method near async fn index_statistics which maps from index_uri to index_type for the old indexes...
fn legacy_type_name(index_uri: &str) -> String {
match index_uri {
BTreeIndexDetails::name => IndexType::BTree.to_string(),
...
_ => "N/A".to_string() // This is a new index type, we don't populate this field any longer
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, PTAL.
wjones127
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions to modernize io tracking
Co-authored-by: Will Jones <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Close #4620
This PR will write bitmap index statistics in file instead so we don't need to load the entire index file to calculate it.
This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off, I'll fix it quickly.