Pivot to foldhash for hash implementation in IndexMap #15920
mtreinish wants to merge 4 commits into Qiskit:main
This commit switches the hashing algorithm we use for IndexMap from ahash to foldhash. This mirrors the change hashbrown made to the default hashing algorithm for its hashmap. Foldhash promises faster hashing than ahash with similar quality tradeoffs. IndexSet/IndexMap default to the stdlib SipHash algorithm, which is slow for our use, so we need to set a custom hash algorithm to maintain good performance. Because the interners are built on an IndexSet, saving even a small amount of time per lookup there is especially important.
The only place ahash is retained is DAGNode in Python. That usage was specific to how ahash works, and switching it to an analogous foldhash interface caused test failures. This code path is only used by the Python API for accessing the DAG and is already not a performant path, so saving nanoseconds on hashing doesn't matter there. There is probably an alternative API in foldhash we could use, but it didn't seem critical to figure out.
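As a rough illustration of why this kind of switch is cheap to make, the sketch below shows the general mechanism: the hasher is a type parameter of the map, so it can be picked in one type alias. This is not the code from this PR. To keep it standard-library only, it uses `std::collections::HashMap` in place of `IndexMap` and a small FNV-1a hasher as a stand-in for foldhash; `Fnv1a`, `FastMap`, and `count_gates` are hypothetical names introduced for the example.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Stand-in hasher (64-bit FNV-1a). This is NOT foldhash; it is only here
/// so the sketch compiles with the standard library alone.
#[derive(Clone, Copy)]
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The hasher is chosen in exactly one place: the third type parameter.
/// Assumption: with the real crates, the analogous alias would be
/// `indexmap::IndexMap<K, V, foldhash::fast::RandomState>`.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hypothetical helper: count how often each gate name occurs.
pub fn count_gates(names: &[&str]) -> FastMap<String, usize> {
    // `default()` builds the map with the hasher named in the alias.
    let mut counts: FastMap<String, usize> = FastMap::default();
    for &name in names {
        *counts.entry(name.to_string()).or_insert(0) += 1;
    }
    counts
}
```

Callers only ever see `FastMap`, so changing the hashing algorithm means editing the alias rather than every call site, which is presumably why a change like this can stay mostly mechanical.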
Have you got benchmarks showing the improvement? Should we be concerned at all about the lack of DoS resistance in |
I'm running benchmarks now. Initial testing shows a small speedup on some benchmarks but correspondingly small regressions on others on my AMD Linux system. I'm a bit worried about system noise in these numbers, though, so I will re-run them tomorrow on an idle system. I also want to test on other platforms, because I think there might be more of a benefit on some of them (mostly thinking Arm Mac).
I'm not worried about potentially decreased HashDoS resistance. We only use these hashmaps internally, and a hash collision attack would do nothing except cause a small slowdown in internal lookups, not to a degree where it's a potential DoS vector beyond already having a massive circuit. They're also mostly not used with arbitrary user data: the only place in this PR with even a potential vector is gate names in some transpiler passes' internal caching; the rest are all just bit indices. None of the usage here is for cryptographic hashing or verification, it's primarily just caching in internal data structures. I'm also not really worried because we're already using foldhash internally everywhere we use |
The full asv run finished before I expected. It's looking like a better improvement on an idle system: |