
WIP/RFC: shift_remove and friends #558

Open
wants to merge 18 commits into
base: main

Conversation

prutschman
Contributor

Per #557, I took a stab at implementing the shift_remove_* family of functions from upstream IndexMap. I'm sure there are better ways to do what I did; there are some TODOs around exactly how to do the fixup after removing from entries. I factored the recalculation logic out of .retain, but recalculating everything is probably overly conservative.
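
For context, the difference between the existing swap-style removal and the proposed shift-style removal can be sketched with a plain `Vec` standing in for the map's entry storage. This is only an illustration: the two helper names are made up, and `Vec::swap_remove` / `Vec::remove` are standard-library stand-ins, not the crate's code.

```rust
/// Swap-style removal: the last element is moved into the hole.
/// O(1), but the relative order of the remaining elements changes.
fn swap_remove_demo(mut v: Vec<&'static str>, index: usize) -> Vec<&'static str> {
    v.swap_remove(index);
    v
}

/// Shift-style removal: every later element moves down by one.
/// O(n), but the relative order of the remaining elements is preserved.
fn shift_remove_demo(mut v: Vec<&'static str>, index: usize) -> Vec<&'static str> {
    v.remove(index);
    v
}
```

Removing index 1 from `["a", "b", "c", "d"]` gives `["a", "d", "c"]` with the swap variant and `["a", "c", "d"]` with the shift variant.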

Also, you may not even want these API functions. No worries if so! But if you're interested, I'm happy to make changes or re-do things.

src/index_map.rs Outdated
} else {
*pos = Some(Pos::new(index, entry.hash));
// Todo: Should this take in a parameter to allow it to only process the moved
// elements?
Member

Yes. Technically it's O(n) in both cases, but one is O(len()) while the other is O(len() / 2).
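
A rough sketch of where the factor of two comes from, assuming the removal index is uniformly distributed. Both helper names are hypothetical and exist only for this illustration.

```rust
/// Number of entries whose position changes when the entry at `index`
/// is shift-removed from a sequence of length `len`.
fn moved_entries(len: usize, index: usize) -> usize {
    assert!(index < len, "index out of bounds");
    len - index - 1
}

/// Average number of moved entries over every possible removal index.
fn average_moved(len: usize) -> f64 {
    let total: usize = (0..len).map(|i| moved_entries(len, i)).sum();
    total as f64 / len as f64
}
```

For `len = 8` the average is 3.5 moved entries, against 8 entries touched when everything is reprocessed.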

Contributor Author

Done (see subsequent push). I don't fully understand the hashing logic, so I may have missed a subtlety.

Contributor Author

This is wrong, fix incoming.

Member

Please also add another test for the wrong logic.

Contributor Author

I've added a test for the wrong logic. However, I'm having a hard time figuring out what the correct logic should be.

I can replace only the affected entries' slots in indices with None and restrict the recalculation to only the affected entries. But that fails too, because we would also need to rewrite "unaffected" entries that would have "preferred" a slot that has since been vacated. (There's no longer a mismatched entry there, so when probed, the slot shows no value present and the lookup fails.)
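
The failure mode can be reproduced with a toy open-addressing table. This is a minimal sketch that assumes plain linear probing and uses `key % CAP` as the "hash"; `Table` and `lookup_after_vacating` are hypothetical names, and none of this is the crate's actual layout.

```rust
const CAP: usize = 8;

/// Toy open-addressing table: each slot stores a key directly.
struct Table {
    slots: [Option<u32>; CAP],
}

impl Table {
    fn new() -> Self {
        Table { slots: [None; CAP] }
    }

    /// Linear-probing insert; assumes the table never fills up.
    fn insert(&mut self, key: u32) {
        let mut pos = key as usize % CAP;
        while self.slots[pos].is_some() {
            pos = (pos + 1) % CAP;
        }
        self.slots[pos] = Some(key);
    }

    /// Probes from the key's preferred slot; an empty slot ends the search.
    fn find(&self, key: u32) -> Option<usize> {
        let mut pos = key as usize % CAP;
        for _ in 0..CAP {
            match self.slots[pos] {
                None => return None,
                Some(k) if k == key => return Some(pos),
                Some(_) => pos = (pos + 1) % CAP,
            }
        }
        None
    }
}

/// Looks up key 9 before and after the slot in front of it is vacated.
fn lookup_after_vacating() -> (Option<usize>, Option<usize>) {
    let mut t = Table::new();
    t.insert(1); // lands in slot 1
    t.insert(9); // 9 % 8 == 1 collides, so it lands in slot 2
    let before = t.find(9);
    t.slots[1] = None; // vacate slot 1 without touching key 9
    let after = t.find(9);
    (before, after)
}
```

Key 9 is "unaffected" by the removal, yet its lookup now stops at the empty slot 1 and never reaches slot 2.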

I can revert the "optimization" of after_removal to again re-process everything, and the test then passes. But we lose the putative optimization opportunity.

Contributor Author

As a test, I'm going to try the re-hashing logic from remove_found. It doesn't have the "robin hood" logic in it, but when I tried just leaving that logic intact I got an infinite loop. (Or I can abandon the efficiency quest for now.)

Contributor Author

I put the logic from retain back inline; I don't need to solve the problem of arbitrary removals, just a single removal. I switched to having just CoreMap::shift_remove_found, which now mirrors remove_found, plus an extra step to "shift" the appropriate index values after fixing up the removed slot. All IndexMap::shift_* operations are implemented in terms of shift_remove_found now. There's now moderate code duplication between remove_found and shift_remove_found, however.
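
Roughly, the shape of a single shift-removal looks like the sketch below. It is a simplified model, not the code in this PR: `ToyMap` is hypothetical, keys act as their own hashes, and it uses plain linear probing with the general backward-shift rule rather than the crate's robin hood scheme.

```rust
const CAP: usize = 8; // power of two, so `& MASK` wraps probe positions
const MASK: usize = CAP - 1;

/// Toy order-preserving set of u32 keys (a key's "hash" is the key itself).
struct ToyMap {
    entries: Vec<u32>,             // keys in insertion order
    indices: [Option<usize>; CAP], // slot -> position in `entries`
}

impl ToyMap {
    fn new() -> Self {
        ToyMap { entries: Vec::new(), indices: [None; CAP] }
    }

    fn desired(key: u32) -> usize {
        key as usize & MASK
    }

    /// Inserts a key that is assumed not to be present yet.
    fn insert(&mut self, key: u32) {
        // Always keep one slot empty so that probing terminates.
        assert!(self.entries.len() < CAP - 1, "toy table is full");
        let mut pos = Self::desired(key);
        while self.indices[pos].is_some() {
            pos = (pos + 1) & MASK;
        }
        self.indices[pos] = Some(self.entries.len());
        self.entries.push(key);
    }

    /// Returns `(slot, entry index)` for `key`, if present.
    fn find(&self, key: u32) -> Option<(usize, usize)> {
        let mut pos = Self::desired(key);
        while let Some(i) = self.indices[pos] {
            if self.entries[i] == key {
                return Some((pos, i));
            }
            pos = (pos + 1) & MASK;
        }
        None
    }

    /// Removes `key`, keeps the remaining order, and returns its old index.
    fn shift_remove(&mut self, key: u32) -> Option<usize> {
        let (mut hole, index) = self.find(key)?;
        // 1. Remove the entry; everything after it moves down by one.
        self.entries.remove(index);
        self.indices[hole] = None;
        // 2. Rewrite the stored positions of the shifted entries.
        for i in self.indices.iter_mut().flatten() {
            if *i > index {
                *i -= 1;
            }
        }
        // 3. Backward shift: pull later members of the probe run into the
        //    hole so that no lookup hits an empty slot too early.
        let mut j = (hole + 1) & MASK;
        while let Some(i) = self.indices[j] {
            let want = Self::desired(self.entries[i]);
            let dist_entry = j.wrapping_sub(want) & MASK; // how far the entry has probed
            let dist_hole = j.wrapping_sub(hole) & MASK; // how far back the hole is
            if dist_entry >= dist_hole {
                self.indices[hole] = Some(i);
                self.indices[j] = None;
                hole = j;
            }
            j = (j + 1) & MASK;
        }
        Some(index)
    }
}
```

Step 2 scans every slot for simplicity; restricting it to the shifted entries is the optimization discussed earlier in this thread.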

shift_remove_found is now the core operation. It follows the same logic as remove_found, but adds a step to re-write the shifted index positions.

shift_remove_index is now implemented in IndexMap, by calling shift_remove_found.
Contributor Author

The removal logic is still bad. I'm adding additional tests that verify that the map is fully internally consistent: every entry can be found, every index points to an entry consistent with the hash value, and the numbers of entries and non-empty indices are the same.
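
The three checks could look something like this sketch. It assumes a simplified layout where each slot stores `(entry index, cached hash)`, the "hash" is the key itself, and probing is linear; `check_consistent` is a hypothetical free function, not the method added in this PR.

```rust
/// Returns Ok(()) if the toy layout is internally consistent,
/// otherwise a short description of the first problem found.
fn check_consistent(
    entries: &[u32],
    indices: &[Option<(usize, u32)>],
) -> Result<(), String> {
    let cap = indices.len();

    // 1. The number of non-empty slots matches the number of entries.
    let occupied = indices.iter().flatten().count();
    if occupied != entries.len() {
        return Err(format!("{} slots for {} entries", occupied, entries.len()));
    }

    // 2. Every slot points at an entry whose hash matches the cached one.
    for (slot, (index, hash)) in indices
        .iter()
        .enumerate()
        .filter_map(|(s, p)| p.map(|p| (s, p)))
    {
        match entries.get(index) {
            Some(&key) if key == hash => {}
            _ => return Err(format!("slot {} is stale", slot)),
        }
    }

    // 3. Every entry is reachable by probing from its preferred slot.
    for (i, &key) in entries.iter().enumerate() {
        let mut pos = key as usize % cap;
        let mut found = false;
        for _ in 0..cap {
            match indices[pos] {
                Some((j, _)) if j == i => {
                    found = true;
                    break;
                }
                Some(_) => pos = (pos + 1) % cap,
                None => break, // an empty slot ends the probe
            }
        }
        if !found {
            return Err(format!("entry {} (key {}) is unreachable", i, key));
        }
    }
    Ok(())
}
```

A layout with a hole left in the middle of a probe run passes the first two checks but fails the third, which is the kind of breakage the removal logic kept producing.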

Contributor Author

prutschman left a comment

Adding IndexMap::assert_internally_consistent as a #[cfg(test)]-only method was useful in troubleshooting the shift logic, but it's kind of intrusive. I'm open to other approaches. All the added Debug impls are just there to support assert_eq! instead of assert!, which maybe isn't needed.

Once this is in a place where you might want to go forward with it, I'm prepared to re-submit the PR with a less chaotic history.
