[Enhancement] Reducing double vector reading and distance calculation for Disk based Filtered Vector Search #2215

Open · navneet1v opened this issue Oct 16, 2024 · 2 comments
Labels: Enhancements, search-improvements, v2.19.0

navneet1v (Collaborator) commented:
Opportunity

Currently in the plugin, whenever we do a disk-based vector search, we first run an ANN search on the quantized index and then rescore using the full-precision vectors. ref:

```java
boolean isShardLevelRescoringEnabled = KNNSettings.isShardLevelRescoringEnabledForDiskBasedVector(knnQuery.getIndexName());
int dimension = knnQuery.getQueryVector().length;
int firstPassK = rescoreContext.getFirstPassK(finalK, isShardLevelRescoringEnabled, dimension);
perLeafResults = doSearch(indexSearcher, leafReaderContexts, knnWeight, firstPassK);
if (isShardLevelRescoringEnabled) {
    ResultUtil.reduceToTopK(perLeafResults, firstPassK);
}
StopWatch stopWatch = new StopWatch().start();
perLeafResults = doRescore(indexSearcher, leafReaderContexts, knnWeight, perLeafResults, finalK);
long rescoreTime = stopWatch.stop().totalTime().millis();
log.debug("Rescoring results took {} ms. oversampled k:{}, segments:{}", rescoreTime, firstPassK, leafReaderContexts.size());
```

But this flow of doing an ANN search first does not always hold when efficient filters are present. With efficient filters, we first check whether an exact search can be done; if it can, we do the exact search instead of the ANN search. ref:

```java
final BitSet filterBitSet = getFilteredDocsBitSet(context);
int cardinality = filterBitSet.cardinality();
// We don't need to go to JNI layer if no documents are found which satisfy the filters
// We should give this condition a deeper look that where it should be placed. For now I feel this is a good
// place,
if (filterWeight != null && cardinality == 0) {
    return Collections.emptyMap();
}
/*
 * The idea for this optimization is to get K results, we need to at least look at K vectors in the HNSW graph
 * . Hence, if filtered results are less than K and filter query is present we should shift to exact search.
 * This improves the recall.
 */
if (isFilteredExactSearchPreferred(cardinality)) {
    return doExactSearch(context, filterBitSet, k);
}
Map<Integer, Float> docIdsToScoreMap = doANNSearch(context, filterBitSet, cardinality, k);
// See whether we have to perform exact search based on approx search results
// This is required if there are no native engine files or if approximate search returned
// results less than K, though we have more than k filtered docs
if (isExactSearchRequire(context, cardinality, docIdsToScoreMap.size())) {
    final BitSet docs = filterWeight != null ? filterBitSet : null;
    return doExactSearch(context, docs, k);
}
return docIdsToScoreMap;
```

For a disk-based index, this exact search with filters first fetches the full-precision vectors from disk and then binary-quantizes them before computing Hamming distances. ref:

```java
protected float computeScore() throws IOException {
    final float[] vector = knnFloatVectorValues.getVector();
    if (segmentLevelQuantizationInfo != null && quantizedQueryVector != null) {
        byte[] quantizedVector = SegmentLevelQuantizationUtil.quantizeVector(vector, segmentLevelQuantizationInfo);
        return SpaceType.HAMMING.getKnnVectorSimilarityFunction().compare(quantizedQueryVector, quantizedVector);
    } else {
        // Calculates a similarity score between the two vectors with a specified function. Higher similarity
        // scores correspond to closer vectors.
        return spaceType.getKnnVectorSimilarityFunction().compare(queryVector, vector);
    }
}
```
The same vectors are read again when we do the rescoring. Hence, I think we can remove this double reading of vectors: during the first pass itself we should calculate the correct full-precision scores, since we already have the vectors with us.
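One way to picture the change, as a minimal sketch (none of these names exist in the plugin; `LeafResult`, `exactScored`, and `rescoreLeaf` are hypothetical): the first pass tags each leaf's results with whether they were already scored against full-precision vectors, and the rescore phase passes those leaves through untouched.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's API: per-leaf results remember whether
// their scores already came from full-precision vectors, so the rescore phase
// can skip the second disk read of the same vectors.
final class LeafResult {
    final Map<Integer, Float> docIdsToScore;
    final boolean exactScored; // true when the first pass used full-precision vectors

    LeafResult(Map<Integer, Float> docIdsToScore, boolean exactScored) {
        this.docIdsToScore = docIdsToScore;
        this.exactScored = exactScored;
    }
}

final class RescoreSketch {
    // Leaves scored exactly in the first pass go through unchanged; only
    // quantizer-scored leaves pay for the full-precision re-read.
    static List<Map<Integer, Float>> rescore(List<LeafResult> perLeafResults, int finalK) throws IOException {
        List<Map<Integer, Float>> rescored = new ArrayList<>();
        for (LeafResult leaf : perLeafResults) {
            if (leaf.exactScored) {
                rescored.add(leaf.docIdsToScore); // scores are already full precision
            } else {
                rescored.add(rescoreLeaf(leaf.docIdsToScore, finalK)); // hypothetical stand-in for today's rescore
            }
        }
        return rescored;
    }

    static Map<Integer, Float> rescoreLeaf(Map<Integer, Float> firstPassResults, int finalK) throws IOException {
        throw new UnsupportedOperationException("placeholder for the existing full-precision rescore path");
    }
}
```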

We would have to make some changes to the reduceToTopK function, since with the above code the scores of documents from different segments would no longer use the same space_type. But I think that is still an easy problem to solve: we can simply exclude from the reduction the docs from segments where Hamming distance was not used, as sketched below.
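A rough sketch of that, reusing the hypothetical `LeafResult` above (ResultUtil.reduceToTopK is the plugin utility shown earlier; everything else is illustrative): only the Hamming-scored leaves enter the first-pass reduction, since only their scores are mutually comparable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: keep reduceToTopK sound under mixed score spaces by
// reducing only the Hamming-scored leaves; exact-scored leaves keep all their
// hits and are merged into the final top-k after rescoring.
final class MixedSpaceReduceSketch {
    static void reduceFirstPass(List<LeafResult> perLeafResults, int firstPassK) {
        List<Map<Integer, Float>> hammingScored = new ArrayList<>();
        for (LeafResult leaf : perLeafResults) {
            if (!leaf.exactScored) {
                hammingScored.add(leaf.docIdsToScore); // same space type, safe to compare across leaves
            }
        }
        // The plugin's existing utility trims the maps in place, so the
        // exact-scored leaves are untouched by this reduction.
        ResultUtil.reduceToTopK(hammingScored, firstPassK);
    }
}
```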

We would need to run some benchmarks to see the performance difference. But I think in constrained environments this can really help by reducing disk reads, if not necessarily the distance computations.

navneet1v added the Enhancements and search-improvements labels on Oct 16, 2024
VijayanB self-assigned this on Oct 18, 2024
navneet1v moved this from Backlog to 2.19.0 in Vector Search RoadMap on Nov 6, 2024

frejonb commented Nov 10, 2024

This may be related. We ran some latency benchmarks on the different compression levels for mode=on_disk on 2.18:

[image: latency benchmarks across compression levels, mode=on_disk, 2.18]

Upon inspecting the segments, we saw multiple ones with fewer than 15k documents. After setting index.knn.advanced.approximate_threshold=0 for the >=8x compression levels, we recover the expected latencies:

[image: latencies recovered after setting approximate_threshold=0]
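For reference, such an update would look like the following (the index name is illustrative; since the threshold applies when segments are built, existing segments may need a force merge or reindex before it takes effect):

```
PUT /my-vector-index/_settings
{
  "index.knn.advanced.approximate_threshold": 0
}
```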

navneet1v (Collaborator, Author) commented:

@frejonb thanks for providing the details. @VijayanB is looking into the issue.
