is "Instant Similarity Query" not supported? #35

bwlim · 2014-06-30T02:32:06Z

Recommendation query seems to be allowed to the only registered vectors.
I think, when registering new vector (vadd command) all comparison calculation is performed (O(n)), and when querying recommendations, pre-caculated results are outputted (Constant Time Complexity?)
But in my application, user input arbitrary vector to the system for "instant" recommendation results..
Please give me some guides~

mountain · 2014-06-30T03:53:16Z

Hi, @bwlim ,

We think two aspects need to be considered on this issue, one is performance, the other is API design.

When instant query is needed, the input of vector, comparison and the output of recommendation should be combined in one API call. Obviously, the cost of the call should be scale at O(n) level.

According to our experience, assuming a 1-k dimensional base and cosinesq score function, the time cost of the call would be:

70k vectors: ~0.1s, this speed should be OK for instant input
700k vectors: ~1s, this speed should be slow for instant input

And one possible way to improve is to change cosinesq score to eclidean or manhattan, but the performance data need to be re-tested.

On the aspect of API design, we think two API can be added:

qadd: sourceVKey, targetVKey, sourceVectorComponents...
qrec: targetVKey, vectorComponents...

Command qadd is designed for the scenario that we add a vector, register it and then get the recommedation result in only one api call. Before qadd call, you need vmk to create a vector set to hold the inputed vectors.

While command qrec dose not store the inputed vectors, you only test an inputed vector against some predefined vector set.

I think qadd or qrec should meet with your requirements.

Advice are welcomed!

bwlim · 2014-06-30T04:47:49Z

Yes, qadd, qrec can be one solution using current implementation of simbase.
um.. but, isn't there another indexing structure which similarity query performance is less than O(N) (O(logN) would be great!).
Have you ever heard R-tree?
Using R-tree index structure, we can query spatial query(querying all points within a query 2D box(or 3D cube, ...) in O(logN) time complexity. R-tree or R-tree variants are the basic core technology for geospatial database. (Geospatial DB can be also considered as a low dimensional vector database)
I'm not super-expert in this area :D,
I'm wondering there are some similar technologies for similarity vector query system..

When new vector is registered to vector DB, new vector instance is indexed to R-tree or something index structure (maybe in O(logN)), and similarity vector querying is performed by traversing R-tree or something index structure in O(log(N))

For database, query performance of O(N) is not good situation...

How do you think about this? :D

mountain · 2014-06-30T06:33:06Z

yes, @bwlim , R-tree is one possible way to speed up simbase, but also introduce much complexity of the code. We should balance gain and loss.

For our own use cases:

document recommendation: the current implementation is OK for next several years
personalised recommendation: because the baseline of registered users is around 10M ~ 50M, we still need to improve performance

For personalised recommendations, currently I prefer take another approach of locally sensitive hashing. Without real experiments on recommendation quality, performance and the development cost, I am not sure which approach we will finally take.

In short term, we will keep current data structure, and will consider your advice in next big release of 0.2.0.

Thanks.

mountain · 2014-06-30T10:45:08Z

Sorry for closing it, I think we need more time to discuss this issue.

mountain added enhancement labels Jun 30, 2014

mountain closed this as completed Jun 30, 2014

mountain reopened this Jun 30, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

is "Instant Similarity Query" not supported? #35

is "Instant Similarity Query" not supported? #35

bwlim commented Jun 30, 2014

mountain commented Jun 30, 2014

bwlim commented Jun 30, 2014

mountain commented Jun 30, 2014

mountain commented Jun 30, 2014

is "Instant Similarity Query" not supported? #35

is "Instant Similarity Query" not supported? #35

Comments

bwlim commented Jun 30, 2014

mountain commented Jun 30, 2014

bwlim commented Jun 30, 2014

mountain commented Jun 30, 2014

mountain commented Jun 30, 2014