WIP/RFC: shift_remove and friends #558
src/index_map.rs
Outdated
} else {
    *pos = Some(Pos::new(index, entry.hash));
    // Todo: Should this take in a parameter to allow it to only process the moved
    // elements?
Yes. Technically it's O(n) in both cases, but one is O(len()) while the other is O(len() / 2).
Done (see subsequent push). I don't fully understand the hashing logic, so I may have missed a subtlety.
This is wrong, fix incoming.
Please also add another test for the wrong logic.
I've added a test for the wrong logic. However, I'm having a hard time figuring out what the correct logic should be.
I can replace only the affected entries' slots in indices with None, and restrict the recalculation to only the affected entries. But that fails too, because we would also need to rewrite "unaffected" entries that would have "preferred" a slot that has since been vacated. (There's no longer a mismatched entry in that slot, so a probe sees it as empty and the lookup fails.)
I can revert the "optimization" of after_removal to again re-process everything, and the test then passes. But we lose the putative optimization opportunity.
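To make the failure mode above concrete, here is a minimal sketch using a toy linear-probing table. The names find, insert, and demo are hypothetical and are not the crate's internals; the sketch only assumes that a lookup stops at the first empty slot.

```rust
/// Probe linearly from the key's preferred slot until the key is found
/// or an empty slot ends the search.
fn find(slots: &[Option<usize>], entries: &[(u64, &str)], hash: u64, key: &str) -> Option<usize> {
    let cap = slots.len();
    let mut probe = (hash as usize) % cap;
    for _ in 0..cap {
        match slots[probe] {
            // An empty slot means "not present" to the lookup.
            None => return None,
            Some(i) if entries[i].1 == key => return Some(i),
            Some(_) => probe = (probe + 1) % cap,
        }
    }
    None
}

/// Place `index` in the first free slot at or after the preferred one.
/// Assumes the table is never full.
fn insert(slots: &mut [Option<usize>], hash: u64, index: usize) {
    let cap = slots.len();
    let mut probe = (hash as usize) % cap;
    while slots[probe].is_some() {
        probe = (probe + 1) % cap;
    }
    slots[probe] = Some(index);
}

/// Returns the lookup result for "b" before and after vacating only "a"'s slot.
fn demo() -> (Option<usize>, Option<usize>) {
    // Both keys prefer slot 1, so "b" is displaced to slot 2.
    let entries = [(1u64, "a"), (1u64, "b")];
    let mut slots: Vec<Option<usize>> = vec![None; 4];
    insert(&mut slots, 1, 0);
    insert(&mut slots, 1, 1);
    let before = find(&slots, &entries, 1, "b");
    // Vacate only the "affected" slot and leave "b" where it is.
    slots[1] = None;
    let after = find(&slots, &entries, 1, "b");
    (before, after)
}
```

In this toy model "b" is reachable before the slot is vacated and unreachable afterwards, even though its own slot was never touched.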
As a test, I'm going to try the re-hashing logic from remove_found. It doesn't have the "robin hood" logic in it, but when I tried just leaving that logic intact I got an infinite loop. (Or I can abandon the efficiency quest for now.)
I put the logic from retain back inline; I don't need to solve the problem of arbitrary removals, just a single removal. I switched to having just CoreMap::shift_remove_found, which now mirrors remove_found, plus an extra step to "shift" the appropriate index values after fixing up the removed slot. All IndexMap::shift_* operations are implemented in terms of shift_remove_found now. There's now moderate code duplication between remove_found and shift_remove_found, however.
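The "shift" step can be sketched in isolation. This is not the PR's implementation: ToyIndexMap is a hypothetical type, and a std HashMap stands in for the crate's own probing table so that only the index rewrite is shown.

```rust
use std::collections::HashMap;

/// Hypothetical toy model: `entries` keeps insertion order and
/// `indices` maps each key to its position in `entries`.
struct ToyIndexMap {
    entries: Vec<(&'static str, i32)>,
    indices: HashMap<&'static str, usize>,
}

impl ToyIndexMap {
    fn new() -> Self {
        ToyIndexMap { entries: Vec::new(), indices: HashMap::new() }
    }

    fn insert(&mut self, key: &'static str, value: i32) {
        if let Some(&i) = self.indices.get(key) {
            self.entries[i].1 = value;
        } else {
            self.indices.insert(key, self.entries.len());
            self.entries.push((key, value));
        }
    }

    fn get_index_of(&self, key: &str) -> Option<usize> {
        self.indices.get(key).copied()
    }

    /// Remove `key`, keeping the remaining entries in order.
    fn shift_remove(&mut self, key: &str) -> Option<i32> {
        let removed = self.indices.remove(key)?;
        // Vec::remove shifts every later entry down by one position...
        let (_, value) = self.entries.remove(removed);
        // ...so every stored index past the removed one must follow.
        for index in self.indices.values_mut() {
            if *index > removed {
                *index -= 1;
            }
        }
        Some(value)
    }
}
```

Only indices greater than the removed position change, which is why the fix-up touches about half the entries on average rather than all of them.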
Co-authored-by: Markus Reiter <[email protected]>
shift_remove_found is now the core operation. It follows the same logic as remove_found, but adds a step to re-write the shifted index positions. shift_remove_index is now implemented in IndexMap, by calling shift_remove_found.
The removal logic is still bad. I'm adding additional tests that verify that the map is fully internally consistent: every entry can be found, every index points to an entry consistent with the hash value, and the number of entries and non-empty indices is the same.
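A rough sketch of those three checks on a toy linear-probing table follows. The Slot alias and check_consistent are hypothetical names chosen for illustration; the PR's own helper may be structured differently.

```rust
/// Each occupied slot stores (index into entries, cached hash).
type Slot = Option<(usize, u64)>;

/// Returns Ok(()) if the toy table passes all three consistency checks.
fn check_consistent(entries: &[(u64, &str)], slots: &[Slot]) -> Result<(), String> {
    // Check 1: as many occupied slots as entries.
    let occupied = slots.iter().flatten().count();
    if occupied != entries.len() {
        return Err(format!("{} occupied slots but {} entries", occupied, entries.len()));
    }
    // Check 2: every slot points at an entry whose hash matches the cached one.
    for (pos, slot) in slots.iter().enumerate() {
        if let Some((index, hash)) = *slot {
            match entries.get(index) {
                Some(&(entry_hash, _)) if entry_hash == hash => {}
                _ => return Err(format!("slot {} is stale", pos)),
            }
        }
    }
    // Check 3: every entry is reachable by probing from its preferred slot.
    for (index, &(hash, key)) in entries.iter().enumerate() {
        let cap = slots.len();
        let mut probe = (hash as usize) % cap;
        let mut found = false;
        for _ in 0..cap {
            match slots[probe] {
                None => break,
                Some((i, _)) if i == index => {
                    found = true;
                    break;
                }
                Some(_) => probe = (probe + 1) % cap,
            }
        }
        if !found {
            return Err(format!("entry {:?} is unreachable", key));
        }
    }
    Ok(())
}
```

Returning a Result with a message, rather than panicking, keeps the failure reason visible when the check is called from several tests.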
Adding IndexMap::assert_internally_consistent as a #[cfg(test)]-only method was useful in troubleshooting the shift logic, but it's kind of intrusive. I'm open to other approaches. All the added Debug impls are just there to support assert_eq! instead of assert!, which maybe isn't needed.
Once this is in a place where you might want to go forward with it, I'm prepared to re-submit the PR with a less chaotic history.
Per #557, I took a stab at implementing the shift_remove_* family of functions from upstream IndexMap. I'm sure there are better ways to do the things I did; there are some todos around exactly how to do the fixup after removing from entries. I factored out the re-calculation logic from .retain, but recalculating everything is probably overly conservative.
Also, you may not even want these API functions. No worries if so! But if you're interested, I'm happy to make changes or re-do things.
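For readers unfamiliar with the distinction, the ordering difference between a swap-style and a shift-style removal can be illustrated with the standard library's Vec, which offers the analogous pair swap_remove and remove. The two helper functions below are hypothetical and exist only for this illustration.

```rust
/// Swap-style removal: the last element fills the hole, so order changes.
fn swap_remove_demo() -> Vec<char> {
    let mut v = vec!['a', 'b', 'c', 'd'];
    v.swap_remove(0);
    v
}

/// Shift-style removal: later elements move down one place, so order is kept.
fn shift_remove_demo() -> Vec<char> {
    let mut v = vec!['a', 'b', 'c', 'd'];
    v.remove(0);
    v
}
```

The swap variant is O(1) but perturbs the order, while the shift variant preserves order at O(n) cost, which is the trade-off the shift_remove_* functions accept.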