lokyaan commented Sep 11, 2025

Problem
score_threshold could be applied to raw FAISS scores (before normalization) and/or at multiple layers, which led to inconsistent behavior across distance strategies.

Change

  • Remove raw-score thresholding in similarity_search_with_score_by_vector.
  • Apply the threshold once, in _similarity_search_with_relevance_scores (and its async counterpart), after raw scores are mapped to normalized relevance.
  • Clamp normalized relevance to [0, 1].

Why
score_threshold has the same meaning (a minimum normalized relevance in [0, 1]) across MAX_INNER_PRODUCT, EUCLIDEAN_DISTANCE, and COSINE, with or without L2 normalization.
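To illustrate why a raw-score threshold does not carry over between strategies, here is a small sketch with two-dimensional unit vectors. The DistanceStrategy enum, raw_score, and relevance below are hypothetical stand-ins, and the mappings ((1 + s) / 2 for inner product, 1 - d / 2 for Euclidean and cosine distances) are illustrative assumptions chosen so that every score lands in [0, 1] with "higher is more relevant"; they are not necessarily the functions LangChain uses.

```python
import math
from enum import Enum
from typing import Sequence


class DistanceStrategy(Enum):
    # Hypothetical stand-in; member names mirror the strategies named in the PR.
    MAX_INNER_PRODUCT = "max_inner_product"
    EUCLIDEAN_DISTANCE = "euclidean_distance"
    COSINE = "cosine"


def raw_score(strategy: DistanceStrategy, a: Sequence[float], b: Sequence[float]) -> float:
    """Raw score in the index's native convention (assumes unit vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    if strategy is DistanceStrategy.MAX_INNER_PRODUCT:
        return dot  # higher means more similar
    if strategy is DistanceStrategy.EUCLIDEAN_DISTANCE:
        return math.dist(a, b)  # lower means more similar
    return 1.0 - dot  # cosine distance; lower means more similar


def relevance(strategy: DistanceStrategy, raw: float) -> float:
    """Map a raw score to [0, 1] where higher always means more relevant."""
    if strategy is DistanceStrategy.MAX_INNER_PRODUCT:
        rel = (1.0 + raw) / 2.0  # unit-vector inner product lies in [-1, 1]
    else:
        rel = 1.0 - raw / 2.0  # unit-vector L2 and cosine distances lie in [0, 2]
    return max(0.0, min(1.0, rel))  # clamp away floating-point drift


a, near, far = (1.0, 0.0), (0.8, 0.6), (-0.6, 0.8)
```

With these vectors, a raw threshold of 0.5 keeps only near under inner product, only far under cosine distance, and both under Euclidean distance, while a relevance threshold of 0.5 keeps only near under all three strategies.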

Testing

  • Added a minimal local integration test (FAISS, inner-product case) confirming that only items with relevance ≥ threshold are returned and that all scores lie in [0, 1].
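The local test is not shown here. As a rough, dependency-free approximation of the same two checks, the sketch below replaces FAISS with a brute-force inner-product search over unit-normalized vectors (an assumption, as is the (1 + s) / 2 relevance mapping); search_with_relevance and the test name are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def search_with_relevance(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int = 4,
    score_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for an inner-product FAISS search plus post-processing."""
    q = _normalize(query)
    raw = [(i, sum(x * y for x, y in zip(q, _normalize(v)))) for i, v in enumerate(corpus)]
    raw.sort(key=lambda t: t[1], reverse=True)  # largest inner product first
    # Map inner product in [-1, 1] to relevance, clamp, then filter exactly once.
    rel = [(i, max(0.0, min(1.0, (1.0 + s) / 2.0))) for i, s in raw[:k]]
    if score_threshold is not None:
        rel = [(i, r) for i, r in rel if r >= score_threshold]
    return rel


def test_threshold_applies_to_normalized_relevance() -> None:
    rng = random.Random(0)
    corpus = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
    query = [rng.uniform(-1, 1) for _ in range(8)]
    unfiltered = search_with_relevance(query, corpus, k=10)
    for threshold in (0.0, 0.25, 0.5, 0.75, 1.0):
        results = search_with_relevance(query, corpus, k=10, score_threshold=threshold)
        assert all(0.0 <= r <= 1.0 for _, r in results)
        assert all(r >= threshold for _, r in results)
        # Nothing in the top-k at or above the threshold is dropped.
        assert results == [(i, r) for i, r in unfiltered if r >= threshold]
```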

Notes
Happy to move or add tests to this repo’s preferred test directory if maintainers want.

docs: add Google-style docstring for _clamp01

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>