feat: Runtime detection, take 2 #86
base: main
This isn't conditionally doing the improvements for the NEON backend via the
Thanks a lot, I need time to review this.
That commit should get rid of a bunch of compile issues related to adding generic types to structs that don't take them.
I'm still trying to figure out what the best way forward is to make
Thanks a lot, I will review the PR this week. I will benchmark the performance first.
Branch updated from 9e6e4a6 to 7a3a864
@liuq19 Would you be up for benchmarking an implementation for NEON that dispatches the bitmask creation at runtime? We could cache whether NEON (or any other feature, really) is supported in a global, so the performance loss shouldn't be too bad. Otherwise, hacking in runtime dispatch for bitmask creation is really tricky.
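A minimal sketch of the caching idea, assuming a hypothetical `simd_supported()` helper (not code from this PR): the first call runs the platform-specific check and stores the result in an `AtomicU8`, so later calls are a single relaxed load. The state encoding (0 = unknown, 1 = unsupported, 2 = supported) is illustrative. Note that `std`'s detection macros already cache their result internally, so this mainly shows the shape of the idea.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical global cache: 0 = not yet detected, 1 = unsupported, 2 = supported.
static SIMD_SUPPORT: AtomicU8 = AtomicU8::new(0);

// Platform-specific detection; only one of these is compiled in.
#[cfg(target_arch = "x86_64")]
fn detect() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

#[cfg(target_arch = "aarch64")]
fn detect() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn detect() -> bool {
    false
}

/// Returns the cached result, running `detect()` only on first use.
fn simd_supported() -> bool {
    match SIMD_SUPPORT.load(Ordering::Relaxed) {
        0 => {
            let supported = detect();
            // Threads racing here all compute and store the same value, so Relaxed is enough.
            SIMD_SUPPORT.store(if supported { 2 } else { 1 }, Ordering::Relaxed);
            supported
        }
        state => state == 2,
    }
}

fn main() {
    let first = simd_supported(); // runs detection
    let second = simd_supported(); // served from the cache
    assert_eq!(first, second);
    println!("SIMD supported: {}", first);
}
```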
I hacked in a version that dispatches on each bitmask call. Maybe the performance hit is too severe to justify it.
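A minimal sketch of what per-call dispatch for bitmask creation can look like, assuming 32-byte chunks and a hypothetical `bitmask` function (not the PR's code): every call checks for AVX2 and otherwise falls back to a scalar loop. Bit `i` of the result is set when byte `i` equals the needle.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8, _mm256_set1_epi8,
};

/// Portable fallback: bit `i` is set when `chunk[i] == needle`.
fn bitmask_scalar(chunk: &[u8; 32], needle: u8) -> u32 {
    chunk
        .iter()
        .enumerate()
        .fold(0u32, |mask, (i, &b)| mask | (((b == needle) as u32) << i))
}

/// AVX2 path. Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn bitmask_avx2(chunk: &[u8; 32], needle: u8) -> u32 {
    unsafe {
        // Unaligned 32-byte load, byte-wise compare, then gather each byte's high bit.
        let data = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        let eq = _mm256_cmpeq_epi8(data, _mm256_set1_epi8(needle as i8));
        _mm256_movemask_epi8(eq) as u32
    }
}

/// Dispatches on every call: the feature check sits on the hot path.
fn bitmask(chunk: &[u8; 32], needle: u8) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { bitmask_avx2(chunk, needle) };
        }
    }
    bitmask_scalar(chunk, needle)
}

fn main() {
    let mut chunk = [b' '; 32];
    chunk[3] = b'"';
    chunk[10] = b'"';
    println!("{:#034b}", bitmask(&chunk, b'"'));
}
```

Whether the per-call check is affordable depends on how often `bitmask` is called relative to the work it does; that is what the benchmarks in this thread are measuring.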
Okay, I'm not sure why this is broken. It's on ARM64, right?
Now I just need to find a way to properly express this in trait form, preferably as generically as possible.
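One common way to express this in trait form, sketched under assumptions (`Backend`, `Scalar`, `Avx2`, and the `count_*` functions are hypothetical, not the PR's API): the algorithm is written once, generically over the backend, and then monomorphized inside a `#[target_feature(enable = "avx2")]` wrapper, so the `#[inline(always)]` backend code gets compiled with AVX2 enabled. Only the top-level `count` performs the runtime check.

```rust
use std::convert::TryInto;

/// Hypothetical backend trait for bitmask creation.
trait Backend {
    /// Bit `i` of the result is set when `chunk[i] == needle`.
    /// Safety: the CPU must support the backend's target features.
    unsafe fn eq_mask(chunk: &[u8; 32], needle: u8) -> u32;
}

struct Scalar;

impl Backend for Scalar {
    unsafe fn eq_mask(chunk: &[u8; 32], needle: u8) -> u32 {
        chunk.iter().enumerate().fold(0, |m, (i, &b)| m | (((b == needle) as u32) << i))
    }
}

#[cfg(target_arch = "x86_64")]
struct Avx2;

#[cfg(target_arch = "x86_64")]
impl Backend for Avx2 {
    #[inline(always)]
    unsafe fn eq_mask(chunk: &[u8; 32], needle: u8) -> u32 {
        use std::arch::x86_64::*;
        unsafe {
            let v = _mm256_loadu_si256(chunk.as_ptr().cast());
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, _mm256_set1_epi8(needle as i8))) as u32
        }
    }
}

/// Written once for all backends: counts `needle` in `data`, using `B` for
/// full 32-byte chunks and a scalar loop for the tail.
#[inline(always)]
unsafe fn count_generic<B: Backend>(data: &[u8], needle: u8) -> usize {
    let mut chunks = data.chunks_exact(32);
    let mut count = 0;
    for chunk in &mut chunks {
        let chunk: &[u8; 32] = chunk.try_into().unwrap();
        count += unsafe { B::eq_mask(chunk, needle) }.count_ones() as usize;
    }
    count + chunks.remainder().iter().filter(|&&b| b == needle).count()
}

/// Monomorphizing here lets the inlined AVX2 code compile with AVX2 enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_avx2(data: &[u8], needle: u8) -> usize {
    unsafe { count_generic::<Avx2>(data, needle) }
}

/// Public entry point: one runtime check, then a fully specialized body.
fn count(data: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { count_avx2(data, needle) };
        }
    }
    // SAFETY: the scalar backend has no target-feature requirements.
    unsafe { count_generic::<Scalar>(data, needle) }
}

fn main() {
    let json = br#"{"a":"b","c":["d","e"]}"#;
    println!("quotes: {}", count(json, b'"'));
}
```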
I benched on x86, and maybe the SIMD is not working in
That's weird. On my local machine, the change is roughly 3-4%, which is acceptable. (I'd need to profile it to get a closer idea of where the performance is lost; maybe there are optimization opportunities that are too opaque for the compiler with all the generics.)
Could you remove or comment out the config in
It already wasn't active due to my global Cargo config. But for the benches above I set
I added debug statements, and the runtime correctly detects that my CPU supports AVX2, with and without
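A small sketch of the two checks that can disagree here (the function names are hypothetical): `cfg!(target_feature = "avx2")` is fixed at compile time by flags such as `-C target-feature=+avx2` or `-C target-cpu=native`, while `std::arch::is_x86_feature_detected!` asks the CPU at runtime. Runtime detection can therefore report AVX2 even when the build does not enable it, which is one way code can end up slower without the flags even though detection succeeds.

```rust
/// True only if this binary was compiled with AVX2 enabled.
fn avx2_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// Asks the CPU at runtime, regardless of the compile flags.
#[cfg(target_arch = "x86_64")]
fn avx2_runtime() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

#[cfg(not(target_arch = "x86_64"))]
fn avx2_runtime() -> bool {
    false
}

fn main() {
    // Debug output comparing both views of the same feature.
    eprintln!("avx2 enabled at compile time: {}", avx2_compile_time());
    eprintln!("avx2 detected at runtime:     {}", avx2_runtime());
}
```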
So it is much slower without
Never mind, I get what you mean. Let me look into it.
Maybe we can try comparing more benchmarks.
};

use super::{Mask, Simd};
use crate::impl_lanes;

#[inline]
We do not need this optimization. The asm generated for std::arch::is_x86_feature_detected is already optimized (the detection result is cached after the first check).
https://rust.godbolt.org/z/sdqefTPxW
Ah! That's nice to know. I'll revert that, then.
@@ -81,7 +81,8 @@ name = "value_operator"
 harness = false

 [features]
-default = []
+default = ["runtime-detection"]
Runtime detection always adds some overhead, so I think it is better not to enable the feature by default.
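A sketch of the opt-in setup the review suggests, assuming a `runtime-detection` feature declared as `runtime-detection = []` under `[features]` and left out of `default`. With the feature off, the backend choice is made entirely at compile time and no check is emitted; with it on, the CPU is queried. `use_avx2` is a hypothetical name.

```rust
/// Opt-in path: only compiled when the `runtime-detection` feature is enabled.
#[cfg(all(feature = "runtime-detection", target_arch = "x86_64"))]
fn use_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// Default path: decided by the compile-time target features, zero runtime cost.
#[cfg(not(all(feature = "runtime-detection", target_arch = "x86_64")))]
fn use_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

fn main() {
    println!("using AVX2 backend: {}", use_avx2());
}
```

Users who want a single portable binary can then enable the feature explicitly, while everyone else keeps the compile-time-only behavior.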
Any updates?
Sorry, I've been busy the last two weeks, but hopefully I can do some work on this today at the airport.
What type of PR is this?
feat: A new feature
Check the PR title.
(Optional) More detailed description for this PR (en: English/zh: Chinese).
en:
This PR adds runtime detection of SIMD features. Unlike #55, it does not dispatch at the level of individual SIMD instructions; instead, it implements enum dispatch over multiple inner parsers, each of which uses AVX2, SSE2, or NEON (or the scalar fallback).
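A minimal sketch of enum dispatch over backend-generic inner parsers, as described above. Everything here (`Backend`, `InnerParser`, `Parser`, the quote-counting method) is hypothetical and not the PR's actual API; the AVX2 backend reuses the scalar body as a placeholder to keep the example short, and SSE2 or NEON variants would be added the same way behind their own `cfg`s.

```rust
use std::marker::PhantomData;

/// Hypothetical hook each SIMD backend would implement with its own intrinsics.
trait Backend {
    const NAME: &'static str;
    fn count_quotes(input: &[u8]) -> usize;
}

struct Scalar;

impl Backend for Scalar {
    const NAME: &'static str = "scalar";
    fn count_quotes(input: &[u8]) -> usize {
        input.iter().filter(|&&b| b == b'"').count()
    }
}

#[cfg(target_arch = "x86_64")]
struct Avx2;

#[cfg(target_arch = "x86_64")]
impl Backend for Avx2 {
    const NAME: &'static str = "avx2";
    fn count_quotes(input: &[u8]) -> usize {
        // Placeholder: a real backend would call `#[target_feature(enable = "avx2")]` code.
        Scalar::count_quotes(input)
    }
}

/// The inner parser is generic, so each backend gets its own monomorphized copy.
struct InnerParser<B: Backend> {
    _backend: PhantomData<B>,
}

impl<B: Backend> InnerParser<B> {
    fn new() -> Self {
        InnerParser { _backend: PhantomData }
    }
    fn backend(&self) -> &'static str {
        B::NAME
    }
    fn count_quotes(&self, input: &[u8]) -> usize {
        B::count_quotes(input)
    }
}

/// Enum dispatch: the backend is chosen once, when the parser is constructed.
enum Parser {
    #[cfg(target_arch = "x86_64")]
    Avx2(InnerParser<Avx2>),
    Scalar(InnerParser<Scalar>),
}

impl Parser {
    fn new() -> Self {
        #[cfg(target_arch = "x86_64")]
        {
            if std::arch::is_x86_feature_detected!("avx2") {
                return Parser::Avx2(InnerParser::new());
            }
        }
        Parser::Scalar(InnerParser::new())
    }

    fn backend(&self) -> &'static str {
        match self {
            #[cfg(target_arch = "x86_64")]
            Parser::Avx2(p) => p.backend(),
            Parser::Scalar(p) => p.backend(),
        }
    }

    fn count_quotes(&self, input: &[u8]) -> usize {
        match self {
            #[cfg(target_arch = "x86_64")]
            Parser::Avx2(p) => p.count_quotes(input),
            Parser::Scalar(p) => p.count_quotes(input),
        }
    }
}

fn main() {
    let parser = Parser::new();
    let n = parser.count_quotes(br#"{"key":"value"}"#);
    println!("backend = {}, quotes = {}", parser.backend(), n);
}
```

In this shape the feature check runs once in `Parser::new`; afterwards each call pays only for a `match` on the variant, and every arm calls into a fully specialized inner parser.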
(Optional) Which issue(s) this PR fixes:
Closes #14
(optional) The PR that updates user documentation: