-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
is "Instant Similarity Query" not supported? #35
Comments
Hi, @bwlim , We think two aspects need to be considered on this issue, one is performance, the other is API design. When instant query is needed, the input of vector, comparison and the output of recommendation should be combined in one API call. Obviously, the cost of the call should be scale at O(n) level. According to our experience, assuming a 1-k dimensional base and cosinesq score function, the time cost of the call would be:
And one possible way to improve is to change cosinesq score to eclidean or manhattan, but the performance data need to be re-tested. On the aspect of API design, we think two API can be added:
Command qadd is designed for the scenario that we add a vector, register it and then get the recommedation result in only one api call. Before qadd call, you need vmk to create a vector set to hold the inputed vectors. While command qrec dose not store the inputed vectors, you only test an inputed vector against some predefined vector set. I think qadd or qrec should meet with your requirements. Advice are welcomed! |
Yes, qadd, qrec can be one solution using current implementation of simbase. When new vector is registered to vector DB, new vector instance is indexed to R-tree or something index structure (maybe in O(logN)), and similarity vector querying is performed by traversing R-tree or something index structure in O(log(N)) For database, query performance of O(N) is not good situation... How do you think about this? :D |
yes, @bwlim , R-tree is one possible way to speed up simbase, but also introduce much complexity of the code. We should balance gain and loss. For our own use cases:
For personalised recommendations, currently I prefer take another approach of locally sensitive hashing. Without real experiments on recommendation quality, performance and the development cost, I am not sure which approach we will finally take. In short term, we will keep current data structure, and will consider your advice in next big release of 0.2.0. Thanks. |
Sorry for closing it, I think we need more time to discuss this issue. |
Recommendation query seems to be allowed to the only registered vectors.
I think, when registering new vector (vadd command) all comparison calculation is performed (O(n)), and when querying recommendations, pre-caculated results are outputted (Constant Time Complexity?)
But in my application, user input arbitrary vector to the system for "instant" recommendation results..
Please give me some guides~
The text was updated successfully, but these errors were encountered: