Skip to content

Conversation

@Ray0907
Copy link
Contributor

@Ray0907 Ray0907 commented Nov 2, 2025

Replace std::sort with std::nth_element for median calculation in
IndexLSH training.

  Replace std::sort with std::nth_element for median calculation in
  IndexLSH training.
@meta-cla meta-cla bot added the CLA Signed label Nov 2, 2025
@meta-codesync
Copy link
Contributor

meta-codesync bot commented Nov 4, 2025

@limqiying has imported this pull request. If you are a Meta employee, you can view this in D86234669.

Comment on lines 89 to 98
// Use nth_element (O(n)) instead of sort (O(n log n)) for median
if (n % 2 == 1) {
std::nth_element(xi, xi + n / 2, xi + n);
thresholds[i] = xi[n / 2];
else
thresholds[i] = (xi[n / 2 - 1] + xi[n / 2]) / 2;
} else {
std::nth_element(xi, xi + n / 2, xi + n);
float median_high = xi[n / 2];
std::nth_element(xi, xi + n / 2 - 1, xi + n);
thresholds[i] = (xi[n / 2 - 1] + median_high) / 2;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can do like this?

// Use nth_element (O(n)) instead of sort (O(n log n)) for median
std::nth_element(xi, xi + n / 2, xi + n);
float median = xi[n / 2];
if (n % 2 == 0) {
    std::nth_element(xi, xi + n / 2 - 1, xi + n);
    median = (median + xi[n / 2 - 1]) / 2;
}
thresholds[i] = median;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Your version is cleaner!

@mnorris11
Copy link

Hi @Ray0907 , thanks for your contribution. Ditto the other perf changes: did you try benchmarking it? Do you observe speedup? If yes, can you paste the script you used here?

@mdouze
Copy link
Contributor

mdouze commented Nov 12, 2025

I agree that it's the proper way of computing the median but the performance improvement could be minimal.

std::nth_element(xi, xi + n / 2, xi + n);
float median = xi[n / 2];
if (n % 2 == 0) {
std::nth_element(xi, xi + n / 2 - 1, xi + n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, unless running 2x std::nth_element() is too costly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants