@xingfan-git
Contributor

No description provided.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR speeds up document counting in MongoDB collections by switching from countDocuments() to estimatedDocumentCount(). The change trades accuracy for speed, which matters most for large collections.

Key changes:

  • Replaces exact document counting with metadata-based estimation for O(1) complexity
  • Removes filtering support from the count operation (documented as future work)
  • Updates method signatures to reflect the simplified interface
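
To illustrate the trade-off described above, here is a minimal sketch of the two counting paths. It assumes only that the collection object exposes `countDocuments()` and `estimatedDocumentCount()`, as the MongoDB Node.js driver's `Collection` does. The `CountableCollection` interface and both function bodies are hypothetical stand-ins and are not taken from the PR; the actual `estimateDocumentCount()` in `ClustersClient.ts` may look different.

```typescript
// Hypothetical structural type: only the two driver methods discussed here.
interface CountableCollection {
    countDocuments(filter?: Record<string, unknown>): Promise<number>;
    estimatedDocumentCount(): Promise<number>;
}

// Exact count: honors a filter, but has to examine matching documents.
async function countExact(
    collection: CountableCollection,
    filter: Record<string, unknown> = {},
): Promise<number> {
    return collection.countDocuments(filter);
}

// Estimated count: read from collection metadata, so it is fast,
// but it accepts no filter and the value may be inaccurate.
// (Illustrative wrapper; not the PR's actual implementation.)
async function estimateDocumentCount(collection: CountableCollection): Promise<number> {
    return collection.estimatedDocumentCount();
}
```

Because the estimated path takes no filter, any caller that previously passed one has to drop it, which matches the signature change listed above.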

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/services/tasks/copy-and-paste/documentdb/documentDbDocumentReader.ts | Removes filter parameter and switches to estimated counting with performance justification comments |
| src/documentdb/ClustersClient.ts | Adds new estimateDocumentCount() method that wraps MongoDB's estimatedDocumentCount() |

Collaborator

@tnaum-ms tnaum-ms left a comment

🚀 Perfect, thank you!

Please review what happens with our task progress reporting in case an unexpected value comes back from the estimatedDocumentCount, like a 0.

I'm not sure if we have safeguards in place where we compute the progress.
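
One possible safeguard for the case raised here, sketched under assumptions: the helper name `computeProgressPercent` and its parameters are hypothetical and do not come from the codebase. The idea is to treat a zero, negative, or non-finite estimate as "total unknown" and to clamp the result, since an estimate can also be lower than the number of documents actually processed.

```typescript
// Hypothetical helper; the real progress code in the task may be structured differently.
function computeProgressPercent(processed: number, estimatedTotal: number): number {
    // A total of 0 (or NaN, Infinity, negative) would otherwise yield
    // NaN or Infinity from the division below, so report 0 instead.
    if (!isFinite(processed) || !isFinite(estimatedTotal) || estimatedTotal <= 0) {
        return 0;
    }
    const percent = Math.round((processed / estimatedTotal) * 100);
    // Clamp: the estimate may undershoot, so processed can exceed it.
    return Math.min(100, Math.max(0, percent));
}
```

Returning 0 for an unknown total is only one design choice; a caller could instead switch to an indeterminate progress indicator when the estimate is unusable.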

@tnaum-ms tnaum-ms merged commit 1c30539 into feature/copy-and-paste Sep 9, 2025
5 checks passed
@tnaum-ms tnaum-ms deleted the dev/xingfan/estimatedCount branch September 9, 2025 07:27

Development

Successfully merging this pull request may close these issues.

Copy-and-Paste: 9. Optional Data Size and Performance Considerations

3 participants