Implement cosine for faiss engine #2242
Comments
We added an ingest pipeline to normalize the data for faiss engine users who want cosine similarity.
I prefer option 2. It is simpler, as we keep everything as is and only pass normalized data to the faiss engine. Could you tell why we need
For option 2, for re-scoring (exact search), because the vectors are not normalized, we would need to compute norms, as we do now.
@luyuncheng interesting. What are your thoughts on having this support in the k-NN plugin vs. the ingest pipeline approach?
For re-scoring, can we just use the cosine distance calculation instead of (normalization + inner product)?
Cosine distance calculation is implemented as normalization + inner product, I believe.
@vamshin @jmazanec15 @heemin32 As @jmazanec15 says, cosine distance calculation is normalization + inner product; the faiss wiki says the same. Before I read this issue, I already needed cosine distance in faiss. At first I introduced [...]; in the end, I introduced a new pipeline that does the normalization.
PROS:
CONS:
Description
Cosine similarity is one of the more popular space types. faiss does not support it directly. Instead, it expects the data to be normalized and then uses the inner product, which on unit-length vectors is equivalent to cosine: <u,v> / (||u|| ||v||) = cos(theta) (https://github.com/facebookresearch/faiss/blob/2c961cc308ade8a85b3aa10a550728ce3387f625/README.md?plain=1#L11). We should figure out how to add cosine for faiss now that it is the default (#2163).
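To make the equivalence concrete, here is a minimal pure-Python sketch (no faiss calls; the helper names `inner_product`, `l2_normalize`, and `cosine_similarity` are made up for illustration). It only checks that the inner product of L2-normalized vectors matches cosine similarity computed directly on the raw vectors.

```python
import math

def inner_product(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale v to unit L2 length; the zero vector has no direction."""
    norm = math.sqrt(inner_product(v, v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """<u, v> / (||u|| * ||v||), computed on the raw vectors."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)

u = [3.0, 4.0]
v = [4.0, 3.0]

direct = cosine_similarity(u, v)
via_normalization = inner_product(l2_normalize(u), l2_normalize(v))

# Both routes should agree up to floating-point rounding.
assert math.isclose(direct, via_normalization)
```

Normalizing once at ingest time moves the norm computation out of the search path, which is why an inner-product index over normalized data can serve cosine queries.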
We have a couple different options:
I prefer option 3, mainly because it will allow us to use knn vector values as synthetic source (see #1571), but it will also let us be efficient on search.
We would need to investigate how best to do this. One simple way to do it would be to add one extra dimension and store the normalized value there.
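One way to read the extra-dimension idea is sketched below. This is an assumption about the intent, not a design from the plugin: the stored vector holds the unit-length components plus one trailing slot with the original L2 norm, and the query is padded with 0.0 so the extra slot does not contribute to the inner product. The function names (`encode`, `decode`, `encode_query`, `score`) are hypothetical.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def encode(v):
    """Store the unit vector plus the original norm in one extra dimension."""
    norm = l2_norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v] + [norm]

def decode(stored):
    """Recover the original vector (up to rounding) from the stored form."""
    *unit, norm = stored
    return [x * norm for x in unit]

def encode_query(q):
    """Normalize the query and pad with 0.0 so the norm slot is ignored."""
    norm = l2_norm(q)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in q] + [0.0]

def score(encoded_query, stored):
    """Inner product over all d + 1 dimensions; equals cosine similarity."""
    return sum(a * b for a, b in zip(encoded_query, stored))

stored = encode([3.0, 4.0])
similarity = score(encode_query([4.0, 3.0]), stored)
restored = decode(stored)
```

Under these assumptions the index only ever sees normalized data, while the original values stay recoverable from the stored form, at the cost of one extra dimension per vector and a small rounding error on reconstruction.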