Enable SVE support in distances_simd.cpp for L2 metric #4674
|
Hi @ThatikondV! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
|
Hi @alexanderguzhva, I submitted this PR which enables ARM SVE optimizations in distances_simd.cpp for the L2 metric. Could you please take a look when you have a moment? Thanks! |
|
Hi @ThatikondV, we are refactoring the SIMD code this half. Subhadeep will take a look after it is complete. |
alexanderguzhva
left a comment
lgtm
    };

    struct ElementOpL2 {
        static svfloat32_t op(svbool_t pg, svfloat32_t x, svfloat32_t y) {
I would add inline here, i.e. static inline.
    size_t current_min_idx = 0;
    svfloat32_t current_min_v = svdup_n_f32(HUGE_VALF);

    float tmp_buf[64];
Technically, SVE supports registers up to 2048 bits wide, which translates to 64 float32 elements, right?
I would add a comment explaining why it is exactly 64 here, for those who are not familiar with SVE.
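The arithmetic behind the buffer size can be written down as a short sketch. The constant names below are hypothetical and are not taken from the PR; only the 2048-bit architectural maximum and the 32-bit float width are assumed.

```cpp
#include <cstddef>

// Hypothetical names, used only to illustrate where the 64 comes from.
// SVE allows vector lengths from 128 up to 2048 bits (in 128-bit steps).
constexpr std::size_t kSveMaxVectorBits = 2048;
constexpr std::size_t kFloat32Bits = 32;

// Worst-case number of float32 lanes in one SVE register.
constexpr std::size_t kMaxSveF32Lanes = kSveMaxVectorBits / kFloat32Bits;

static_assert(kMaxSveF32Lanes == 64,
              "a 2048-bit SVE register holds 64 float32 lanes");

// A spill buffer sized this way is large enough for any SVE implementation:
//   float tmp_buf[kMaxSveF32Lanes];
```

Sizing the buffer for the architectural maximum keeps it on the stack with a fixed size, at the cost of leaving most of it unused on hardware with shorter vectors.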
    svst1_f32(pg, tmp_buf, res);

    size_t cnt = (size_t) svcntw();
    for (size_t lane = 0; lane < cnt && (k + lane) < ny; ++lane) {
Please add a comment that this loop could be vectorized as well, but that we don't do it because it's not worth it.
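For readers without the full diff, a scalar lane scan of this kind might look like the sketch below. The helper name `scan_lanes_for_min` and its signature are assumptions made for illustration; the loop body in the PR is not shown in this thread and may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: scans distances that were spilled from one SVE
// register into tmp_buf and updates the running minimum.
// tmp_buf[lane] is assumed to hold the distance to database vector k + lane.
inline void scan_lanes_for_min(
        const float* tmp_buf,
        std::size_t cnt, // lanes per register, e.g. svcntw()
        std::size_t k,   // index of the first vector in this batch
        std::size_t ny,  // total number of database vectors
        float& best_dist,
        std::size_t& best_idx) {
    assert(tmp_buf != nullptr);
    // This loop could be vectorized as well, but it runs over at most
    // 64 lanes per batch, so a scalar scan is kept for simplicity.
    for (std::size_t lane = 0; lane < cnt && (k + lane) < ny; ++lane) {
        if (tmp_buf[lane] < best_dist) {
            best_dist = tmp_buf[lane];
            best_idx = k + lane;
        }
    }
}
```

The `(k + lane) < ny` bound keeps the scan from reading lanes that belong to the padded tail of the last batch.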
|
@ThatikondV done. Thanks! |
Thank you for reviewing the PR and for the helpful suggestions. I really appreciate your time on this! |
|
Hi @mnorris11, hope the SIMD refactor is going well. Just following up on this to see if there’s any update on when the PR review might continue. Thanks for your time! |
Description:
This PR introduces ARM SVE (Scalable Vector Extension) optimization for L2 metric functions in faiss/utils/distances_simd.cpp. The new implementation extends the existing NEON/SVE SIMD logic with SVE vectorized kernels for distance computations, improving both indexing and search performance.
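As an illustration of what an SVE kernel for the L2 metric can look like, here is a minimal sketch of a squared L2 distance between two vectors. The function name `l2sqr_sketch` is hypothetical and this is not the code from the PR; it uses only standard ACLE SVE intrinsics and falls back to a scalar loop when the compiler does not target SVE.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical sketch: squared L2 distance between x and y, both of length d.
inline float l2sqr_sketch(const float* x, const float* y, std::size_t d) {
    assert(d == 0 || (x != nullptr && y != nullptr));
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (std::size_t i = 0; i < d; i += svcntw()) {
        // The predicate masks off lanes past the end of the vectors,
        // so no separate scalar tail loop is needed.
        svbool_t pg = svwhilelt_b32_u64(i, d);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        svfloat32_t diff = svsub_f32_x(pg, vx, vy);
        // Merging form: inactive lanes keep their accumulated value.
        acc = svmla_f32_m(pg, acc, diff, diff);
    }
    // Horizontal sum across all lanes.
    return svaddv_f32(svptrue_b32(), acc);
#else
    // Scalar fallback so the sketch also builds on non-SVE targets.
    float acc = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        acc += diff * diff;
    }
    return acc;
#endif
}
```

The loop is vector-length agnostic: the same binary should work for any SVE register width because the step comes from `svcntw()` and the tail is handled by the predicate.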
Benchmarking Setup:
To evaluate the performance impact of the new SVE-enabled L2 distance computations, a dedicated benchmarking script was developed. This script systematically compares the earlier FAISS version and the latest SVE-optimized version using the SIFT1M dataset with 128-dimensional vectors. The benchmarks cover the major index types (FLAT, IVF_FLAT, IVF_PQ, IVF_SQ8, and HNSW) and explore multiple configurations by varying parameters such as nlist, m, and efSearch.
Results:
The above results clearly demonstrate that enabling SVE vectorization yields consistent performance gains across all tested index types, where both indexing and search times improved significantly.
These improvements validate the effectiveness of the SVE implementation in accelerating L2 distance computations on modern ARM platforms.