@ThatikondV

Description:

This PR introduces ARM SVE (Scalable Vector Extension) optimization for L2 metric functions in faiss/utils/distances_simd.cpp.
The new implementation extends the existing NEON/SVE SIMD logic with SVE vectorized kernels for distance computations, improving both indexing and search performance.
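For readers unfamiliar with SVE, the kind of kernel described above might look roughly like the sketch below. It is not the code from this PR: `l2_sqr_sketch` is a hypothetical name, and the scalar branch exists only so the example also builds on machines without SVE. The point it illustrates is the predicated loop, where `svwhilelt` masks off the lanes past the end of the vector so no separate tail loop is needed.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not the PR's function): squared L2 distance
// between two float vectors x and y of length d.
float l2_sqr_sketch(const float* x, const float* y, size_t d) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    const uint64_t n = d;
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // Lane j is active only while i + j < n, which handles the tail.
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        svfloat32_t diff = svsub_f32_x(pg, vx, vy);
        // Merging form: inactive lanes keep their previous accumulator value.
        acc = svmla_f32_m(pg, acc, diff, diff);
    }
    // Horizontal sum across all lanes of the accumulator.
    return svaddv_f32(svptrue_b32(), acc);
#else
    // Portable scalar fallback so the sketch runs without SVE.
    float acc = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        acc += diff * diff;
    }
    return acc;
#endif
}
```

Because the vector length is queried at run time with `svcntw()`, the same binary would work on any SVE register width.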

Benchmarking Setup:

To evaluate the performance impact of the new SVE-enabled L2 distance computations, a dedicated benchmarking script was developed. The script compares the previous FAISS build against the SVE-optimized build on the SIFT1M dataset (128-dimensional vectors). The benchmarks cover the major index types (FLAT, IVF_FLAT, IVF_PQ, IVF_SQ8, and HNSW) and explore multiple configurations by varying parameters such as nlist, m, and efSearch.

Results:

| Index Type | FAISS-OLD Index (s) | FAISS-NEW Index (s) | Index % Improvement | FAISS-OLD Search (ms) | FAISS-NEW Search (ms) | Search % Improvement |
|---|---|---|---|---|---|---|
| Flat | 0.22 | 0.22 | 0% | 1.172 | 1.077 | 8% |
| IVF1024-Flat | 1.82 | 1.13 | 38% | 0.039 | 0.033 | 14% |
| IVF4096-Flat | 5.15 | 3.17 | 38% | 0.024 | 0.019 | 19% |
| IVF16384-Flat | 18.28 | 10.77 | 41% | 0.035 | 0.024 | 29% |
| IVF4096-PQ32x8 | 5.29 | 3.84 | 27% | 0.04 | 0.018 | 55% |
| IVF4096-PQ64x8 | 5.81 | 3.67 | 37% | 0.06 | 0.037 | 64% |
| IVF4096-PQ128x8 | 6.74 | 4.44 | 34% | 0.092 | 0.032 | 65% |
| HNSW-M16-ef64 | 12.37 | 11.47 | 7% | 0.0048 | 0.0044 | 8% |
| HNSW-M32-ef128 | 27.5 | 25.92 | 6% | 0.0117 | 0.011 | 6% |
| HNSW-M64-ef256 | 56.64 | 53.22 | 6% | 0.027 | 0.025 | 7% |
| IVF-SQ8 | 4.53 | 2.81 | 38% | 0.039 | 0.038 | 2% |

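The results show that enabling SVE vectorization reduced search time for every tested index type (by 2% to 65%) and reduced indexing time for every index type except Flat, which was unchanged. The largest indexing gains (27% to 41%) are on the IVF variants, while HNSW improved by 6% to 8% in both indexing and search.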

These improvements validate the effectiveness of the SVE implementation in accelerating L2 distance computations on modern ARM platforms.
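The description does not state how the percentage columns were derived. A minimal sketch, assuming they are the relative reduction in elapsed time rounded to the nearest integer (both function names are hypothetical):

```cpp
#include <cmath>

// Assumed formula: relative reduction in elapsed time, in percent.
double percent_improvement(double old_time, double new_time) {
    return (old_time - new_time) / old_time * 100.0;
}

// Rounded to the nearest integer, as the table appears to report it.
long rounded_improvement(double old_time, double new_time) {
    return std::lround(percent_improvement(old_time, new_time));
}
```

Under that assumption, the IVF1024-Flat indexing times (1.82 s to 1.13 s) give 38, matching the reported value.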

@meta-cla

meta-cla bot commented Nov 12, 2025

Hi @ThatikondV!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@ThatikondV ThatikondV closed this Nov 12, 2025
@ThatikondV ThatikondV deleted the feature-l2-sve branch November 12, 2025 05:59
@ThatikondV ThatikondV restored the feature-l2-sve branch November 13, 2025 01:45
@ThatikondV ThatikondV reopened this Nov 13, 2025
@meta-cla

meta-cla bot commented Nov 13, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla meta-cla bot added the CLA Signed label Nov 13, 2025
@ThatikondV
Author

Hi @alexanderguzhva,

I submitted this PR which enables ARM SVE optimizations in distances_simd.cpp for the L2 metric.
All tests pass locally, and the benchmarks show strong improvements.

Could you please take a look when you have a moment? Thanks!

@mnorris11

Hi @ThatikondV, we are refactoring the SIMD code this half. Subhadeep will take a look after it is complete.

Contributor

@alexanderguzhva alexanderguzhva left a comment

lgtm

};

struct ElementOpL2 {
static svfloat32_t op(svbool_t pg, svfloat32_t x, svfloat32_t y) {
Contributor

I would add inline, so static inline
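A small sketch of what the suggested signature could look like. The body of `op` is not visible in the snippet above, so the subtract-and-square implementation here is an assumption, and `ElementOpL2Sketch` and `accumulate_scalar` are hypothetical names. The scalar overload is included only so the example builds without SVE.

```cpp
#include <cstddef>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

struct ElementOpL2Sketch {
#if defined(__ARM_FEATURE_SVE)
    // Assumed body: per-lane (x - y)^2 on the active lanes of pg.
    static inline svfloat32_t op(svbool_t pg, svfloat32_t x, svfloat32_t y) {
        svfloat32_t diff = svsub_f32_x(pg, x, y);
        return svmul_f32_x(pg, diff, diff);
    }
#endif
    // Scalar counterpart with the same contract, for non-SVE builds.
    static inline float op(float x, float y) {
        const float diff = x - y;
        return diff * diff;
    }
};

// Generic driver: the element op is a template parameter, so the tiny
// op() call can be inlined into the hot loop.
template <typename ElementOp>
float accumulate_scalar(const float* x, const float* y, size_t d) {
    float acc = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        acc += ElementOp::op(x[i], y[i]);
    }
    return acc;
}
```

In C++ a member function defined inside the class body is already implicitly inline, so the explicit keyword mainly documents intent and keeps the style consistent with the surrounding code.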

size_t current_min_idx = 0;
svfloat32_t current_min_v = svdup_n_f32(HUGE_VALF);

float tmp_buf[64];
Contributor

Technically, SVE supports registers of up to 2048 bits, which translates to 64 float elements, right?
I would add a comment explaining why it is exactly 64 here, for those who are not familiar with SVE.
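The arithmetic behind the 64 can be written down directly. This is a sketch with hypothetical constant and function names, not the comment that ended up in the PR: the SVE architecture allows vector lengths from 128 to 2048 bits, and a 32-bit float takes 4 bytes, so one vector holds at most 2048 / 32 = 64 floats.

```cpp
#include <cstddef>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Architectural maximum SVE vector length, in bits.
constexpr size_t kSveMaxVectorBits = 2048;
// Largest number of 32-bit float lanes a single SVE vector can hold.
constexpr size_t kSveMaxF32Lanes = kSveMaxVectorBits / (8 * sizeof(float));
static_assert(kSveMaxF32Lanes == 64, "tmp_buf must hold a full 2048-bit vector");

// Number of float lanes one vector store would write on this machine.
size_t f32_lanes_per_vector() {
#if defined(__ARM_FEATURE_SVE)
    return static_cast<size_t>(svcntw());
#else
    return 4; // assumption for non-SVE builds: a 128-bit vector
#endif
}
```

A buffer sized with `kSveMaxF32Lanes` is therefore large enough for `svst1_f32` on any SVE implementation, whatever the hardware vector length turns out to be.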

svst1_f32(pg, tmp_buf, res);

size_t cnt = (size_t) svcntw();
for (size_t lane = 0; lane < cnt && (k + lane) < ny; ++lane) {
Contributor

Please add a comment that this loop could be vectorized as well, but that we won't do it because it's not worth it.
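One way the requested comment could sit on the loop is sketched below. The loop body is not shown in the snippet above, so the minimum-tracking logic and the function name `update_min_from_lanes` are assumptions made for illustration; only the loop header mirrors the code under review.

```cpp
#include <cstddef>

// Hypothetical helper: scan the distances stored from one SVE vector
// (tmp_buf[0..cnt)) and update the running minimum and its index.
// k is the index of the first candidate in this batch, ny the total count.
void update_min_from_lanes(
        const float* tmp_buf,
        size_t cnt,
        size_t k,
        size_t ny,
        float& min_dist,
        size_t& min_idx) {
    // NOTE: this loop could be vectorized as well (for example with a
    // predicated min reduction), but it runs over at most one vector's
    // worth of lanes per batch, so it is intentionally kept scalar.
    for (size_t lane = 0; lane < cnt && (k + lane) < ny; ++lane) {
        if (tmp_buf[lane] < min_dist) {
            min_dist = tmp_buf[lane];
            min_idx = k + lane;
        }
    }
}
```

The `(k + lane) < ny` bound keeps the scan from reading lanes that correspond to candidates past the end of the input.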

@alexanderguzhva
Contributor

@ThatikondV done. Thanks!

@ThatikondV
Author

> @ThatikondV done. Thanks!

@alexanderguzhva,

Thank you for reviewing the PR and for the helpful suggestions. I really appreciate your time on this!

@ThatikondV
Author

Hi @mnorris11, hope the SIMD refactor is going well. Just following up on this to see if there’s any update on when the PR review might continue. Thanks for your time!
