[Enhancement] Reducing double vector reading and distance calculation for Disk based Filtered Vector Search #2215

Open · navneet1v opened this issue Oct 16, 2024 · 2 comments
Labels: Enhancements, search-improvements, v2.19.0

navneet1v (Collaborator) commented:
Opportunity

Currently in the plugin, whenever we do a disk-based vector search, we first run an ANN search on the quantized index and then rescore using the full-precision vectors. ref:

```java
boolean isShardLevelRescoringEnabled = KNNSettings.isShardLevelRescoringEnabledForDiskBasedVector(knnQuery.getIndexName());
int dimension = knnQuery.getQueryVector().length;
int firstPassK = rescoreContext.getFirstPassK(finalK, isShardLevelRescoringEnabled, dimension);
perLeafResults = doSearch(indexSearcher, leafReaderContexts, knnWeight, firstPassK);
if (isShardLevelRescoringEnabled) {
    ResultUtil.reduceToTopK(perLeafResults, firstPassK);
}
StopWatch stopWatch = new StopWatch().start();
perLeafResults = doRescore(indexSearcher, leafReaderContexts, knnWeight, perLeafResults, finalK);
long rescoreTime = stopWatch.stop().totalTime().millis();
log.debug("Rescoring results took {} ms. oversampled k:{}, segments:{}", rescoreTime, firstPassK, leafReaderContexts.size());
```

But this flow of doing an ANN search first does not always hold when efficient filters are present. With efficient filters, we first check whether an exact search can be done; if it can, we do the exact search instead of the ANN search. ref:

```java
final BitSet filterBitSet = getFilteredDocsBitSet(context);
int cardinality = filterBitSet.cardinality();
// We don't need to go to JNI layer if no documents are found which satisfy the filters
// We should give this condition a deeper look that where it should be placed. For now I feel this is a good
// place,
if (filterWeight != null && cardinality == 0) {
    return Collections.emptyMap();
}
/*
 * The idea for this optimization is to get K results, we need to at least look at K vectors in the HNSW graph
 * . Hence, if filtered results are less than K and filter query is present we should shift to exact search.
 * This improves the recall.
 */
if (isFilteredExactSearchPreferred(cardinality)) {
    return doExactSearch(context, filterBitSet, k);
}
Map<Integer, Float> docIdsToScoreMap = doANNSearch(context, filterBitSet, cardinality, k);
// See whether we have to perform exact search based on approx search results
// This is required if there are no native engine files or if approximate search returned
// results less than K, though we have more than k filtered docs
if (isExactSearchRequire(context, cardinality, docIdsToScoreMap.size())) {
    final BitSet docs = filterWeight != null ? filterBitSet : null;
    return doExactSearch(context, docs, k);
}
return docIdsToScoreMap;
```

For a disk-based index, this exact search with filters first fetches the full-precision vectors from disk and then binary-quantizes them before computing Hamming distances. ref:

```java
protected float computeScore() throws IOException {
    final float[] vector = knnFloatVectorValues.getVector();
    if (segmentLevelQuantizationInfo != null && quantizedQueryVector != null) {
        byte[] quantizedVector = SegmentLevelQuantizationUtil.quantizeVector(vector, segmentLevelQuantizationInfo);
        return SpaceType.HAMMING.getKnnVectorSimilarityFunction().compare(quantizedQueryVector, quantizedVector);
    } else {
        // Calculates a similarity score between the two vectors with a specified function. Higher similarity
        // scores correspond to closer vectors.
        return spaceType.getKnnVectorSimilarityFunction().compare(queryVector, vector);
    }
}
```
The same vectors are read again when we do the rescoring. Hence, I think we can remove this double reading of vectors: during the first pass itself we should calculate the correct full-precision scores, since we already have the vectors with us.
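One way to picture the change, as a minimal sketch (none of these names exist in the plugin; `LeafResult`, `exactScored`, and `rescoreLeaf` are hypothetical): the first pass tags each leaf's results with whether they were already scored against full-precision vectors, and the rescore phase passes those leaves through untouched.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's API: per-leaf results remember whether
// their scores already came from full-precision vectors, so the rescore phase
// can skip the second disk read of the same vectors.
final class LeafResult {
    final Map<Integer, Float> docIdsToScore;
    final boolean exactScored; // true when the first pass used full-precision vectors

    LeafResult(Map<Integer, Float> docIdsToScore, boolean exactScored) {
        this.docIdsToScore = docIdsToScore;
        this.exactScored = exactScored;
    }
}

final class RescoreSketch {
    // Leaves scored exactly in the first pass go through unchanged; only
    // quantizer-scored leaves pay for the full-precision re-read.
    static List<Map<Integer, Float>> rescore(List<LeafResult> perLeafResults, int finalK) throws IOException {
        List<Map<Integer, Float>> rescored = new ArrayList<>();
        for (LeafResult leaf : perLeafResults) {
            if (leaf.exactScored) {
                rescored.add(leaf.docIdsToScore); // scores are already full precision
            } else {
                rescored.add(rescoreLeaf(leaf.docIdsToScore, finalK)); // hypothetical stand-in for today's rescore
            }
        }
        return rescored;
    }

    static Map<Integer, Float> rescoreLeaf(Map<Integer, Float> firstPassResults, int finalK) throws IOException {
        throw new UnsupportedOperationException("placeholder for the existing full-precision rescore path");
    }
}
```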

We would have to make some changes to the reduceToTopK function, since with the above code the scores of documents from different segments would no longer use the same space_type. But I think that is still an easy problem to solve: we can simply exclude from the reduction the docs from segments where Hamming distance was not used, as sketched below.
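A rough sketch of that, reusing the hypothetical `LeafResult` above (ResultUtil.reduceToTopK is the plugin utility shown earlier; everything else is illustrative): only the Hamming-scored leaves enter the first-pass reduction, since only their scores are mutually comparable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: keep reduceToTopK sound under mixed score spaces by
// reducing only the Hamming-scored leaves; exact-scored leaves keep all their
// hits and are merged into the final top-k after rescoring.
final class MixedSpaceReduceSketch {
    static void reduceFirstPass(List<LeafResult> perLeafResults, int firstPassK) {
        List<Map<Integer, Float>> hammingScored = new ArrayList<>();
        for (LeafResult leaf : perLeafResults) {
            if (!leaf.exactScored) {
                hammingScored.add(leaf.docIdsToScore); // same space type, safe to compare across leaves
            }
        }
        // The plugin's existing utility trims the maps in place, so the
        // exact-scored leaves are untouched by this reduction.
        ResultUtil.reduceToTopK(hammingScored, firstPassK);
    }
}
```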

We would need to run some benchmarks to see the performance difference. But I think in constrained environments this can really help by reducing disk reads, if not necessarily the distance computations.

navneet1v added the Enhancements and search-improvements labels on Oct 16, 2024
VijayanB self-assigned this on Oct 18, 2024
navneet1v moved this from Backlog to 2.19.0 in Vector Search RoadMap on Nov 6, 2024

frejonb commented Nov 10, 2024

This may be related. We ran some latency benchmarks on the different compression levels for mode=on_disk on 2.18:

[image: latency benchmarks across compression levels, mode=on_disk, 2.18]

Upon inspecting the segments, we saw multiple ones with fewer than 15k documents. After setting index.knn.advanced.approximate_threshold=0 for the >=8x compression levels, we recover the expected latencies:

[image: latencies recovered after setting approximate_threshold=0]
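For reference, such an update would look like the following (the index name is illustrative; since the threshold applies when segments are built, existing segments may need a force merge or reindex before it takes effect):

```
PUT /my-vector-index/_settings
{
  "index.knn.advanced.approximate_threshold": 0
}
```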

navneet1v (Collaborator, Author) commented:

@frejonb thanks for providing the details. @VijayanB is looking into the issue.
