Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Race condition with approximate query leading to Index has already been closed exception #2262

Open
kotwanikunal opened this issue Nov 8, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@kotwanikunal
Copy link
Member

What is the bug?
There is a race condition in the current code path for the native memory cache manager. An entry from the NativeMemoryCacheManager can be evicted before being read for the query, resulting in an "Index has already been closed" exception.

The issue stems from the current code flow, which acquires a read lock after the entry is loaded into the cache. This disjoint between loading and locking operations creates a window where the cache can evict the entry before the read lock is acquired.

Code References
Load to cache: KNNWeight.java#L286-L302
Read Lock: KNNWeight.java#L313
Exception: KNNWeight.java#L316-L318

How can one reproduce the bug?
The error is reproducible with force eviction turned on along with running a OSB vectorsearch benchmark with 5+ clients

What is the expected behavior?

  • Query to go through

What is your host/environment?

  • OS: Linux AL2
  • Version OS 2.17
  • Plugins - kNN
@kotwanikunal kotwanikunal added bug Something isn't working untriaged labels Nov 8, 2024
@kotwanikunal
Copy link
Member Author

kotwanikunal commented Nov 8, 2024

Proposed Solutions

  1. Evictable Flag
  • Add an evictable flag to the allocation object
  • Acts as a lock, released only when the read lock is acquired
  • Prevents deletion and indicates if the object was used after loading
  • Requires warmup to perform a separate cleanup (potentially a new API)
  1. Synchronized Block
  • Move nativememorycachemanager.get() behind a synchronized block
  • Pros: Simple implementation
  • Cons: Reduces concurrency and may create a bottleneck for loading
  1. Reference Counter
  • Implement a reference counter mechanism
  • Increment the counter when the object is created
  • Requires explicit decrement by the user of the allocation object
  • Similar to solution 4 without introducing new constructs
  1. Modified Get API
  • Update the get API with a new parameter
  • Preemptively acquire the read lock as soon as the load is complete

@kotwanikunal
Copy link
Member Author

Sample solution:

public NativeMemoryAllocation get(
        NativeMemoryEntryContext<?> nativeMemoryEntryContext,
        boolean isAbleToTriggerEviction,
        boolean acquirePreemptiveReadlock
    ) {
    
    ...
    
    boolean lockAcquired = false;
            try(nativeMemoryEntryContext) {
                nativeMemoryEntryContext.ensureAvailabilityOfAllocationObject();
                synchronized (this) {
                    if (getCacheSizeInKilobytes() + nativeMemoryEntryContext.calculateSizeInKB() >= maxWeight) {
                        Iterator<String> lruIterator = accessRecencyQueue.iterator();
                        while (lruIterator.hasNext()
                                && (getCacheSizeInKilobytes() + nativeMemoryEntryContext.calculateSizeInKB() >= maxWeight)) {

                            String keyToRemove = lruIterator.next();
                            NativeMemoryAllocation allocationToRemove = cache.getIfPresent(keyToRemove);
                            if (allocationToRemove != null) {
                                allocationToRemove.close();
                                cache.invalidate(keyToRemove);
                            }
                            lruIterator.remove();
                        }
                    }
                result = cache.get(key, nativeMemoryEntryContext::load);
                    // ensure that we take a lock on the allocation as when it is loaded in memory to ensure that it
                    // cannot be evicted as we are going to use this allocation.
                    if (acquirePreemptiveReadlock) {
                        result.incRef();
                        lockAcquired =  true;
                    }
                    accessRecencyQueue.addLast(key);

                    return result;
                    
      }

@kotwanikunal kotwanikunal changed the title [BUG] Race condition with approximate query leading to Index has already been closed exception. [BUG] Race condition with approximate query leading to Index has already been closed exception Nov 8, 2024
@harishbhakuni harishbhakuni self-assigned this Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants