Use carryless multiply in calculating Compressor offset vectors #1441
And adjust the region and work packet sizes.
Some issues come to mind.
Plotty and here are the geomean results for clmul-enabled relative to clmul-disabled:
To clarify, do both builds include the change for computing multiple regions in one
I tested this PR against upstream; Plotty and a tally of changes to stop-the-world times:
Some benchmarks see large improvements due to the better work balancing; others see small regressions due to worse locality with smaller region sizes. (I only found the mutator time to be consistently worse on tradesoap, which also has the worst STW regression at 5.4% slower.)


This PR introduces an algorithm for computing the offset vector in the Compressor which uses the carryless multiply instruction, based on the branch-free and bit-parallel algorithm in https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-carry-less-multiply-pclmulqdq/
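
For readers unfamiliar with the trick in the linked post, here is a minimal sketch of the core idea; it is not the code in this PR. Carryless-multiplying a word by all ones gives its prefix XOR, which turns a bitmap of paired start/end marks into a mask of the bits lying between each pair. The sketch assumes a hypothetical mark bitmap with one bit set at the first word and one at the last word of each live object, and it uses a portable software `clmul_lo` in place of the hardware instruction (`pclmulqdq` on x86-64). All function names are illustrative.

```rust
/// Portable (software) carryless multiply, keeping only the low 64 bits.
/// A hardware implementation would use an instruction such as `pclmulqdq`.
fn clmul_lo(a: u64, b: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..64u32 {
        if (b >> i) & 1 == 1 {
            // XOR instead of ADD: no carries propagate between bit positions.
            acc ^= a << i;
        }
    }
    acc
}

/// Prefix XOR via carryless multiply by all ones:
/// bit i of the result is the XOR of bits 0..=i of `x`.
fn prefix_xor_clmul(x: u64) -> u64 {
    clmul_lo(x, u64::MAX)
}

/// Reference prefix XOR using log2(64) shift-and-XOR steps, for comparison.
fn prefix_xor_shifts(x: u64) -> u64 {
    let mut y = x;
    y ^= y << 1;
    y ^= y << 2;
    y ^= y << 4;
    y ^= y << 8;
    y ^= y << 16;
    y ^= y << 32;
    y
}

/// Count the words covered by objects in one bitmap word, assuming
/// (hypothetically) that every object starts and ends inside this word.
fn live_words(marks: u64) -> u32 {
    // `inside` covers [start, end) of each object; OR-ing the marks back in
    // adds the end bit so each span is inclusive.
    let inside = prefix_xor_clmul(marks);
    (inside | marks).count_ones()
}

fn main() {
    // Two objects: words 2..=5 and words 11..=14.
    let marks: u64 = 0b0100_1000_0010_0100;
    let inside = prefix_xor_clmul(marks);
    assert_eq!(inside, prefix_xor_shifts(marks));
    println!("inside = {:016b}", inside);
    println!("live words = {}", live_words(marks));
}
```

An object that straddles a word boundary leaves an odd number of set bits in the word, so a real implementation would need to carry the parity into the next word (for example by inverting the next word's prefix XOR). That is omitted here for brevity.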