Neater hashing interface #4524

Draft
wants to merge 9 commits into
base: main

@widlarizer (Collaborator) commented Aug 6, 2024

We want to be able to plug-and-play hash functions to improve collision rates and hashing speed in hashlib structures. Currently, hashes are handled incorrectly in some ways: for deep structure hashing, each substructure constructs a hash from scratch, and these hashes are then combined with addition or XOR, sometimes xorshifted to make this less iffy. Overall this is risky, as it may degrade various hash functions to varying degrees. It seems to me that the correct way to combine hashes is to have a hash state that is mutated with each datum hashed in sequence. That's what this PR does.

  • unsigned int hash() functions are replaced with hash_state_t hash_acc(hash_state_t h)
  • mkhash_add is deprecated, since it relies on an alternative hash function that preserves adjacency of its second argument's values. Such a requirement seems to be at odds with desirable hash function qualities
  • hash_state_t mkhash_init() is now a function instead of a const
  • hash_t mkhash_finish(hash_state_t h) is added
  • a run_hash wrapper covers the "just give me the hash of this one thing" use cases, replacing unsigned int hash() methods. It has the advantage of covering any type that has hash_ops implemented. It's somewhat like mkhash was, but the name mkhash collides with the core hash function, and I want to keep that split off.
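
To make the list above concrete, here is a minimal sketch of how the pieces could fit together. It is not the PR's actual code: the typedefs, the mixing function inside mkhash, the hash_ops signatures and the Wire struct are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

// Assumed for illustration: the real typedefs in the PR may differ.
using hash_state_t = uint32_t;
using hash_t = uint32_t;

// Placeholder core mixing function; any hash function could be plugged in here.
inline hash_state_t mkhash(hash_state_t a, hash_state_t b) {
    return ((a << 5) + a) ^ b;
}

inline hash_state_t mkhash_init() { return 5381; }          // fresh state
inline hash_t mkhash_finish(hash_state_t h) { return h; }   // state -> final hash

// Generic case: defer to the type's own hash_acc method (hypothetical layout).
template <typename T> struct hash_ops {
    static hash_state_t hash_acc(const T &a, hash_state_t h) { return a.hash_acc(h); }
};

template <> struct hash_ops<int> {
    static hash_state_t hash_acc(const int &a, hash_state_t h) {
        return mkhash(h, static_cast<hash_state_t>(a));
    }
};

template <> struct hash_ops<std::string> {
    static hash_state_t hash_acc(const std::string &s, hash_state_t h) {
        for (unsigned char c : s)
            h = mkhash(h, c);
        return h;
    }
};

// "Just give me the hash of this one thing": init, accumulate, finish.
template <typename T> hash_t run_hash(const T &obj) {
    return mkhash_finish(hash_ops<T>::hash_acc(obj, mkhash_init()));
}

// Hypothetical user type: each member is fed through the same running state
// instead of being hashed from scratch and combined afterwards.
struct Wire {
    std::string name;
    int width;
    hash_state_t hash_acc(hash_state_t h) const {
        h = hash_ops<std::string>::hash_acc(name, h);
        h = hash_ops<int>::hash_acc(width, h);
        return h;
    }
};
```

With this shape, swapping the hash function only means changing mkhash, mkhash_init and mkhash_finish; types like Wire never see the internals of the state.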

As it is right now, the PR isn't at odds with global state for something like xxhash; hash_state_t may become void in that case. There are no provisions for SSE/multi-lane hashing at the moment.

  • check performance impact since this PR isn't NFC as it changes how structures are hashed
  • clean up things left over from experiments with inheriting from Hashable
  • fix pyosys

hash_state_t hash_acc(hash_state_t h) const {
    hash_state_t st = h;
    st = mkhash(entries.size(), st);
    for (auto &it : entries) {
@povik (Member) commented Aug 12, 2024

I think this is order-sensitive now, so two dicts that are equal can end up with different hashes.

@widlarizer (Collaborator, Author) commented Aug 12, 2024

  • put the order-agnostic xor back

I don't expect dicts of dicts or dict comparison to be common, and therefore important for performance, so that should be fine.
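
For reference, a rough sketch of the difference being discussed, not the code in this PR. It assumes a placeholder mkhash and a dict stood in for by a vector of int pairs (entries_t); both function names are hypothetical. The ordered variant chains every entry through the running state, so iteration order leaks into the result. The unordered variant hashes each entry from a fresh state and XORs the per-entry results before folding them into the outer state.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using hash_state_t = uint32_t;

// Placeholder mixing function and initial state, assumed for illustration.
inline hash_state_t mkhash(hash_state_t a, hash_state_t b) { return ((a << 5) + a) ^ b; }
inline hash_state_t mkhash_init() { return 5381; }

// Stand-in for a dict's entry storage; iteration order depends on insertion order.
using entries_t = std::vector<std::pair<int, int>>;

// Order-sensitive: every entry mutates the running state in sequence.
inline hash_state_t hash_acc_ordered(const entries_t &entries, hash_state_t h) {
    h = mkhash(h, static_cast<hash_state_t>(entries.size()));
    for (auto &it : entries) {
        h = mkhash(h, static_cast<hash_state_t>(it.first));
        h = mkhash(h, static_cast<hash_state_t>(it.second));
    }
    return h;
}

// Order-agnostic: per-entry hashes start from a fresh state and are XORed,
// so any permutation of the same entries gives the same combined value.
inline hash_state_t hash_acc_unordered(const entries_t &entries, hash_state_t h) {
    hash_state_t combined = 0;
    for (auto &it : entries) {
        hash_state_t e = mkhash_init();
        e = mkhash(e, static_cast<hash_state_t>(it.first));
        e = mkhash(e, static_cast<hash_state_t>(it.second));
        combined ^= e;
    }
    h = mkhash(h, static_cast<hash_state_t>(entries.size()));
    return mkhash(h, combined);
}
```

The XOR step has the usual weakness of that combiner, but it is confined to the entries of one container, and the result still goes through the outer state like any other datum.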

@widlarizer (Collaborator, Author) commented

Cool, I got hit in the face with a downright gcc bug
