Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up searches by removing repeated memsets coming from vec.resize() #2327

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion bitpacker/src/bitpacker.rs
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,8 @@ impl BitUnpacker {

// Decodes the range of bitpacked `u32` values with idx
// in [start_idx, start_idx + output.len()).
// It is guaranteed to completely fill `output` and not read from it, so passing a vector with
// un-initialized values is safe.
//
// #Panics
//
Expand Down Expand Up @@ -237,7 +239,19 @@ impl BitUnpacker {
data: &[u8],
positions: &mut Vec<u32>,
) {
positions.resize(id_range.len(), 0u32);
// We use the code below instead of positions.resize(id_range.len(), 0u32) for performance
// reasons: on some queries, the CPU cost of memsetting the array and of using a bigger
// vector than necessary is noticeable (~5%).
// In particular, searches are a few percent faster when using reserve_exact() as below
// instead of reserve().
// The un-initialized values are safe as get_batch_u32s() completely fills `positions`
// and does not read from it.
positions.clear();
positions.reserve_exact(id_range.len());
#[allow(clippy::uninit_vec)]
unsafe {
positions.set_len(id_range.len());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure this is undefined behaviour due to the validity requirements on integers which newly allocated memory does not fulfil.

If you want to apply this optimization, get_batch_u32s needs to work with the return value of Vec::spare_capacity_mut to write into the MaybeUninit<u32> values and only call set_len after all of these values have been initialized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks for pointing this out, I see that in rust-lang/unsafe-code-guidelines#439 it was decided that the mere existence of an uninitialized integer is undefined behavior. There is also an interesting discussion on that here where it was controversially concluded that it was UB.

The trick is the same as in quickwit-oss/quickwit#4712, so we should probably decide whether we do neither or both. @fulmicoton

For that quickwit PR, I see that read_buf (see rfc, function doc) could be used, but it's not stable unfortunately.

For this PR, note that this vector ends up being filled by AVX2 instructions, so what you propose does not seem possible: we would either to take a performance hit by storing the output of the AVX2 computation to a temporary variable and write() it into the uninitialized vector, or use MaybeUninit.as_mut_ptr which would lead us back to UB. Or do you have an efficient suggestion to make it work with the AVX instructions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we should probably decide whether we do neither or both. @fulmicoton

I'd say we should do it for one and not for the other.
For the Read side, the one here is simply not needed. We also own the bitpacking crate. We can just add a method that takes a MaybeUninit.

For the Read trait, this is well known problem that has a workaround but it is not stabilized yet.
The problem is that if a Read implementation were to read the buffer we would be in big troubles.

We put that in a place where the Read impl is well defined so I think we are ok.

}
self.get_batch_u32s(id_range.start, data, positions);
crate::filter_vec::filter_vec_in_place(value_range, id_range.start, positions)
}
Expand Down
Loading