-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR. NO. 21] Fix Face Recognition Performance Bottleneck with Smart Clustering and Database Enhancements #354
base: main
Are you sure you want to change the base?
Conversation
- Updated the detect_faces function to support progress tracking. - Modified settings to define database paths and ensure necessary directories are created. - Added new HTML and SVG files for frontend assets. fixed merge conflicts
- Removed unused variable `is_from_device` from `delete_image` function. - Prefixed `is_from_device` in `delete_multiple_images` with an underscore for clarity. - Simplified error handling for file not found cases. - Enhanced response formatting in `delete_multiple_images` and `get_class_ids` functions. - Added a blank line in `facetagging.py` for consistency.
@Pranav0-0Aggarwal Would you please be able to review this PR when you have a moment? I've been keeping it up to date with the main branch, and it's ready for final review. Thank you ! |
Hey @priyankeshh, why does this method batch_insert_face_embeddings ? def batch_insert_face_embeddings(
embeddings_batch: List[Tuple[str, List[np.ndarray]]],
batch_size: int = 1000
) -> None:
"""
Efficient batch processing with memory optimization
""" As In the actual application, the face-tagging algorithm processes only a single image_path when a new image is added: So, at the moment, there doesn’t seem to be a use case for batch_insert_face_embeddings. On what basis, then, were these 📊 Performance Test Results & Key Performance Metrics derived? |
Hey @Jibesh10101011 So about the batch insertion - you're right that the current flow processes images one by one. The thing is, I noticed while testing that even though we process images individually, the bottleneck happens when these operations stack up. What was happening is that after every few images (around 5-6), the system would trigger a full reclustering operation with DBSCAN. This gets exponentially slower as more images are added to the system. So even though we're adding images one at a time, the clustering was becoming a major bottleneck. The batch insertion function is partly forward-looking - I'm planning to update the scheduler in a follow-up PR to better handle bulk operations. But even with the current flow, it helps optimize how we handle the database connections and reduces the clustering overhead. For the performance metrics, I wrote a test script that simulates adding multiple images to the system. It basically replicates what happens when users import a folder or when multiple images get processed in sequence. The 98% improvement comes mainly from avoiding unnecessary reclustering operations and optimizing the database interactions. |
I see your point about optimizing database connections, but this batch insertion approach is only beneficial when processing multiple images at once. However, in our current architecture, each image is first checked for its type—if it's a 'person,' only then does it proceed to clustering. This check happens specifically when handling @router.route("/images/add-folder"). Now, if I modify the architecture as per your suggestion, let’s analyze the impact. Suppose we have 1000 images, and 500 of them are of type 'person.' Your approach suggests inserting these 500 images in a batch. However, after insertion, all these images still need to be compared with existing face embeddings for clustering. This means the linear expansion issue persists—batching doesn't eliminate the fundamental scaling problem. |
Oh wait, I see what you're saying now. You're absolutely right - I got so focused on optimizing the database operations that I missed the bigger bottleneck in the clustering algorithm itself. I fixed the problem locally and I will updating this PR soon |
…ts for incremental clustering performance
I've now implemented incremental clustering to address the performance bottleneck: Before:
Now:
Added test file:
The changes maintain clustering accuracy while dramatically improving performance. This addresses the O(n²) complexity problem you pointed out, making the application responsive even with large image collections. Thank you @Jibesh10101011 for pointing this out |
@Pranav0-0Aggarwal can you take a look at this PR? |
Overview
This PR addresses the critical performance bottleneck in face recognition clustering while introducing comprehensive database optimizations and monitoring capabilities.
Core Changes
Performance Optimizations
Database Enhancements
Memory Management
Technical Details
Database Operations
Performance Monitoring
📊 Performance Test Results
Key Performance Metrics
Test Configuration
Breaking Changes
None. All changes are backward compatible.
Checklist
Fixes #350