Optimize Query Performance counts with count scans #5538

benjaminpkane · 2025-03-04T21:17:06Z

What changes are proposed in this pull request?

With Query Performance enabled, excessive aggregations were still being run from the original sidebar mode.

Removing all aggregations except Count when Query Performance is enabled, and ensuring count scans occur when possible, yields at least 4x faster DB responses for sample-level scalars. Tested with BDD100k (69,863 samples): DB time dropped from 172 ms to 39 ms (about 4.4x).

How is this patch tested? If it is not, please explain why.

Server aggregation assertion

Release Notes

Optimized sidebar counts for :ref:`Query Performance <app-optimizing-query-performance>` mode

What areas of FiftyOne does this PR affect?

App: FiftyOne application changes
Build: Build and test infrastructure changes
Core: Core fiftyone Python library changes
Documentation: FiftyOne documentation changes
Other

Summary by CodeRabbit

New Features
- Introduced a performance toggle for aggregation queries that optimizes data processing.
- Added a queryPerformance flag to aggregation forms to influence performance behavior.
Schema Updates
- Updated aggregation input forms to include a performance flag.
- Adjusted dataset and aggregation result fields to provide more flexible, optional outputs.
Tests
- Enhanced test scenarios to validate the behavior of the performance toggle in aggregation queries.

coderabbitai · 2025-03-04T21:17:13Z

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)

app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**

Walkthrough

This pull request updates aggregation-related code across multiple modules. It introduces a new query performance flag into the state management, GraphQL schema, backend aggregation logic, and associated tests. The changes add new fields and parameters (such as queryPerformance and _optimize) to manage performance aspects and adjust the control flows in aggregation resolution.

Changes

| File(s) | Change Summary |
| --- | --- |
| app/packages/state/src/recoil/aggregations.ts | Added an import for `queryPerformance` and injected the `queryPerformance` property into the `aggForm` object within the `aggregationQuery`, retrieving its value via `get(queryPerformance)`. |
| app/schema.graphql | Added a `queryPerformance: Boolean = false` field in the `AggregationForm` input type; changed `estimatedSampleCount` in `Dataset` from non-nullable to nullable; and modified `StringAggregation.values` from a non-nullable to a nullable list. |
| fiftyone/core/aggregations.py | Introduced a new optional `_optimize` parameter to the `Count` class constructor; updated the `to_mongo` method to conditionally append a match stage based on `_optimize`; and extended the `_parse_field_and_expr` function to accept an optimization flag. |
| fiftyone/server/aggregations.py | Added an optional `query_performance` attribute (default `False`) to the `AggregationForm` class; made `StringAggregation.values` optional; and updated the `_resolve_path_aggregation` function to conditionally append aggregations based on the `query_performance` flag. |
| tests/unittests/server_aggregations_tests.py | Refactored test setup in `test_group_mode_sidebar_counts` by separating the form dictionary construction and including a new test scenario with `"query_performance": True`, expecting `values` to be `None` when the flag is enabled. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Resolver
    participant Aggregator

    Client->>Server: Send aggregation request with queryPerformance flag
    Server->>Resolver: Process AggregationForm
    alt queryPerformance is False
        Resolver->>Aggregator: Append standard aggregations
    else queryPerformance is True
        Resolver->>Aggregator: Skip performance-intensive aggregations
    end
    Aggregator-->>Resolver: Return aggregation results
    Resolver-->>Server: Forward response
    Server-->>Client: Return final response

sequenceDiagram
    participant AggregationQuery
    participant CountInstance

    AggregationQuery->>CountInstance: Call to_mongo(sample_collection)
    alt _optimize is False
        CountInstance->>CountInstance: Append match stage to pipeline
    else _optimize is True
        CountInstance->>CountInstance: Skip match stage for optimization
    end
    CountInstance-->>AggregationQuery: Return aggregation data

Possibly related PRs

Added two env vars for enabling query performance #4917: The changes in the main PR are related to the addition of a queryPerformance property in the aggregation query, enhancing the handling of query performance data.
Optimize modal sample tagging #5417: The changes in the main PR align with the introduction of the query_performance attribute in the AggregationForm class.
Add select/exclude field stages to Query Performance #5460: The changes in the main PR relate to enhancements made regarding query performance, specifically in field selection and exclusion.

Suggested labels

enhancement, app

Suggested reviewers

sashankaryal
tom-vx51

Poem

Hopping through lines of code I roam,
A clever rabbit in a data home.
With flags and tests, I leap in delight,
Optimizing queries by day and night.
Code carrots crunching—oh, what a sight! 🐇✨

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

fiftyone/core/collections.py (1)
10229-10275: New async aggregation method for optimized query performance.

A new asynchronous aggregation method has been added that appears to be part of the query performance optimization. The method only processes facet-type aggregations, which aligns with the PR objective of optimizing sidebar counts in Query Performance mode.

The debug parameter is accepted but not used within the method body. This suggests it's intended for future use or debugging. Consider adding a docstring comment to clarify the purpose of this parameter or leverage it within the method implementation if it's already intended for use.
- async def _async_aggregate(self, aggregations, debug=False):
+ async def _async_aggregate(self, aggregations, debug=False):
+     """Asynchronously aggregates data for query performance optimization.
+     
+     Args:
+         aggregations: Aggregation instances to process
+         debug (False): Whether to include additional debugging information
+     
+     Returns:
+         The aggregation results
+     """
fiftyone/core/aggregations.py (1)
588-593: Implemented the key optimization logic.

This change implements the core of the optimization described in the PR objectives:

When _optimize is true, it skips adding a match stage that filters for non-null values

This significantly reduces processing time for large datasets

The code matches the logic with the business requirement of optimizing sidebar counts in query performance mode.

However, there's a minor code style improvement opportunity:
-if not self._optimize:
-    if not sample_collection._contains_videos() or path != "frames":
-        pipeline.append(
-            {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
-        )
+if not self._optimize and (not sample_collection._contains_videos() or path != "frames"):
+    pipeline.append(
+        {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
+    )
This small refactor combines the nested conditionals for better readability.
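To make the effect of the flag concrete, here is a minimal sketch of the branch discussed above, written against plain Python lists rather than FiftyOne itself. `build_count_pipeline` and its `contains_videos` argument are hypothetical stand-ins for `Count.to_mongo` and `sample_collection._contains_videos()`. The trailing `$count` stage is an assumption added for illustration; only the `$match` stage is taken from the diff.

```python
def build_count_pipeline(path, contains_videos=False, optimize=False):
    """Return a list of MongoDB-style stages for counting `path`.

    Hypothetical helper: it mirrors only the conditional shown in the diff.
    """
    pipeline = []

    # Combined form of the nested conditionals suggested in the review
    if not optimize and (not contains_videos or path != "frames"):
        # Keep only documents whose value at `path` is non-null
        pipeline.append({"$match": {"$expr": {"$gt": ["$" + path, None]}}})

    # Assumed terminal stage, for illustration only
    pipeline.append({"$count": "count"})
    return pipeline


default = build_count_pipeline("ground_truth")
optimized = build_count_pipeline("ground_truth", optimize=True)
print(len(default), len(optimized))
```

With `optimize=True` the sketch emits no `$match` stage, which is the condition the PR description relies on for count scans to occur.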

🧰 Tools

🪛 Ruff (0.8.2)

588-589: Use a single if statement instead of nested if statements

(SIM102)
fiftyone/server/aggregations.py (1)
234-254: Effective conditional aggregation logic for performance optimization.

This is the core optimization - when query_performance is True, only the basic Count aggregation is performed, skipping additional aggregations like CountValues and Bounds. This should significantly reduce the computational load during queries, which aligns with the PR objective of enhancing query performance.

Consider adding a brief comment explaining the performance implications of this change for future maintainers.
-    if not query_performance:
+    # Skip additional aggregations when query_performance is enabled to optimize query execution
+    if not query_performance:
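The selection logic described in this comment can be sketched as follows. This is not the code from `fiftyone/server/aggregations.py`: the `Agg` dataclass, the `resolve_path_aggregations` function, and the mapping from field type to extra aggregation are all hypothetical, chosen only to show that `Count` is always requested while `CountValues` and `Bounds` are skipped when the flag is set.

```python
from dataclasses import dataclass


@dataclass
class Agg:
    """Hypothetical stand-in for an aggregation instance."""

    kind: str
    path: str
    optimize: bool = False


def resolve_path_aggregations(path, field_type, query_performance=False):
    # Count is always requested; its optimize flag follows query_performance
    aggs = [Agg("Count", path, optimize=query_performance)]

    # Skip the extra aggregations when query_performance is enabled
    if not query_performance:
        if field_type == "str":
            aggs.append(Agg("CountValues", path))
        elif field_type in ("int", "float"):
            aggs.append(Agg("Bounds", path))

    return aggs


print([a.kind for a in resolve_path_aggregations("label", "str")])
print([a.kind for a in resolve_path_aggregations("label", "str", True)])
```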

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a92b96 and 1bcf43b.

⛔ Files ignored due to path filters (4)

app/packages/app/src/pages/datasets/__generated__/DatasetPageQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/fragments/__generated__/estimatedCountsFragment.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/queries/__generated__/datasetQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**

📒 Files selected for processing (6)

app/packages/state/src/recoil/aggregations.ts (2 hunks)
app/schema.graphql (4 hunks)
fiftyone/core/aggregations.py (6 hunks)
fiftyone/core/collections.py (1 hunks)
fiftyone/server/aggregations.py (5 hunks)
tests/unittests/server_aggregations_tests.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

app/packages/state/src/recoil/aggregations.ts

🪛 Ruff (0.8.2)

fiftyone/core/aggregations.py

588-589: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (6)

GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
GitHub Check: test / test-app
GitHub Check: lint / eslint
GitHub Check: build / build
GitHub Check: e2e / test-e2e
GitHub Check: build

🔇 Additional comments (21)

app/packages/state/src/recoil/aggregations.ts (2)

16-16: Added import for the new query performance feature.

The addition of this import brings in the queryPerformance state atom that will be used to optimize aggregation operations.

89-89: Added queryPerformance flag to enable performance optimization.

The newly added property forwards the query performance flag from the client state to the GraphQL server. This enables the optimization described in the PR objectives that significantly improves database response time for sidebar counts.

tests/unittests/server_aggregations_tests.py (4)

97-118: Code refactoring improves test readability.

Restructuring the form dictionary into a standalone variable improves code readability and maintains better separation of test input creation from execution.

119-120: No change in test behavior during initial execution.

This execute call still retains the same functionality as the original code, just with improved structure.

136-138: Added key test case for query performance optimization.

This new test case validates that when query_performance is enabled, the aggregation engine properly skips unnecessary processing by returning None for the values field.

139-152: Test assertions validate the optimization's correctness.

The assertions verify that with query performance enabled:

The essential count information is still returned correctly (count: 1, exists: 1)

The values array is set to None, confirming that the optimization is working as designed

This matches the PR objective of "removing all aggregations except for the Count" to improve query performance.

app/schema.graphql (4)

35-35: Added queryPerformance flag to GraphQL schema.

The new parameter queryPerformance in the AggregationForm type allows clients to explicitly request performance optimization for aggregation operations. Setting it to default false maintains backward compatibility with existing clients.

242-242: Updated estimatedSampleCount to be nullable.

Making this field nullable (by changing from Int! to Int) increases flexibility in cases where the count might not be available or calculable, especially when performance optimization is enabled.

617-621: Improved dataset query method signature formatting and nullability.

The changes here:

Improve readability by placing parameters on separate lines

Update the return type to Dataset! making the return type non-nullable when the dataset exists

This change ensures better type safety in the GraphQL schema.

803-803: Made values field nullable to support query performance optimization.

Changing values from [StringAggregationValue!]! to [StringAggregationValue!] allows the field to be null when query performance is enabled, which is consistent with the optimization strategy described in the PR objectives.

fiftyone/core/aggregations.py (5)

540-546: Added optimization parameter to Count constructor.

The addition of the _optimize parameter gives the Count aggregation the ability to skip unnecessary operations for performance improvement.

549-550: Stored optimization flag as an instance attribute.

Properly storing the flag enables its use in the to_mongo method later.

585-586: Passed optimization flag to field parser.

This ensures the optimization flag is properly forwarded to the parsing logic.

3006-3006: Added optimize parameter to parse function.

The addition of the optimize parameter to _parse_field_and_expr allows this core function to properly support the query performance optimization feature.

3113-3114: Extended optimization to avoid unnecessary projections.

This change further enhances query performance by conditionally skipping projection stages when the optimize flag is enabled. The optimization is properly scoped to non-context cases to avoid affecting other use cases.
fiftyone/server/aggregations.py (6)

45-45: Good addition of the query_performance flag.

Adding this flag allows controlling performance optimization mode while maintaining backward compatibility with the default value of False.

96-96: Appropriate update to make values optional.

Making values optional is consistent with the new behavior where CountValues aggregation may be skipped when query performance optimization is enabled.

144-149: Correctly passing query_performance flag to aggregation resolver.

The change correctly forwards the query_performance flag to the _resolve_path_aggregation function for each path.

216-217: Function signature appropriately updated.

The function signature now includes the query_performance parameter which is used to control aggregation behavior.

219-221:
✅ Verification successful

Optimization for Count aggregation.

Good use of the _optimize parameter to optimize the Count aggregation when query_performance is enabled.

🏁 Script executed:
#!/bin/bash
# Verify if the Count class in the fiftyone.core.aggregations module supports the _optimize parameter
grep -r "_optimize" --include="*.py" fiftyone/core/aggregations.py
Length of output: 194
Count Aggregation _optimize Parameter Confirmation — Approved

The verification confirms that the core aggregation code correctly supports the _optimize parameter (defaulting to False and used to control the optimization behavior). The use of the _optimize parameter in the Count aggregation is implemented as intended.

257-309:
✅ Verification successful

Review the result processing logic for missing aggregations.

The from_results function needs to handle cases where certain aggregations weren't performed due to query_performance being True. While the current implementation seems to handle it correctly because it only processes aggregations that were actually performed, verify that no assumptions are made about the presence of specific aggregation results.

For example, when query_performance is True, data["values"] won't be set for StringAggregation since CountValues isn't performed. Ensure that all client code that consumes these results can handle missing fields.

🏁 Script executed:
#!/bin/bash
# Look for places where StringAggregation is used and check if they handle the case where values is None
grep -r "StringAggregation" --include="*.py" --include="*.js" --include="*.ts" fiftyone/
grep -r "values.*StringAggregation" --include="*.py" --include="*.js" --include="*.ts" fiftyone/
Length of output: 712
Final Verification: Missing Aggregations Are Handled as Expected

After reviewing the processing logic in fiftyone/server/aggregations.py and verifying via grep that the StringAggregation results are defined with an optional values field (defaulting to None), it appears that the code correctly processes only the aggregations that were executed. In the case where query_performance is True, the absence of a values key (for instance, in a StringAggregation) is intentional, and the Optional type annotation ensures that client code should be prepared to handle a missing field.

The optional definition (values: t.Optional[t.List[StringAggregationValue]] = None) confirms that no value is expected when count values aren’t performed.

Clients consuming these aggregation results must verify the presence of a field before use, which is consistent with how the aggregation results are processed.

No changes are required here as long as consumers properly check for the None value.
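A consumer-side check along these lines might look like the sketch below. The dataclasses only mirror the field names mentioned in this thread (`count`, `exists`, `values`); the shape of `StringAggregationValue` and the `top_values` helper are assumptions for illustration, not the actual server types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringAggregationValue:
    # Assumed shape: one distinct value and how often it occurs
    value: str
    count: int


@dataclass
class StringAggregation:
    path: str
    count: int
    exists: int
    # None when CountValues was skipped (query_performance enabled)
    values: Optional[List[StringAggregationValue]] = None


def top_values(agg, limit=3):
    """Return up to `limit` (value, count) pairs, or [] if values were skipped."""
    if agg.values is None:
        return []
    ranked = sorted(agg.values, key=lambda v: v.count, reverse=True)
    return [(v.value, v.count) for v in ranked[:limit]]


skipped = StringAggregation(path="label", count=1, exists=1)
full = StringAggregation(
    path="label",
    count=3,
    exists=3,
    values=[StringAggregationValue("cat", 1), StringAggregationValue("dog", 2)],
)
print(top_values(skipped), top_values(full))
```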

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

fiftyone/core/aggregations.py (1)
588-593: Core optimization logic effectively reduces computational load.

This conditional is the key performance optimization in this PR. By skipping the match stage when _optimize is True, the aggregation pipeline becomes more efficient, which explains the 4x performance improvement mentioned in the PR objectives.

Consider simplifying the nested if statements as suggested by static analysis:
-        if not self._optimize:
-            if not sample_collection._contains_videos() or path != "frames":
-                pipeline.append(
-                    {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
-                )
+        if not self._optimize and (not sample_collection._contains_videos() or path != "frames"):
+            pipeline.append(
+                {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
+            )
🧰 Tools

🪛 Ruff (0.8.2)

588-589: Use a single if statement instead of nested if statements

(SIM102)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bcf43b and 8e580a9.

⛔ Files ignored due to path filters (4)

app/packages/app/src/pages/datasets/__generated__/DatasetPageQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/fragments/__generated__/estimatedCountsFragment.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
app/packages/relay/src/queries/__generated__/datasetQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**

📒 Files selected for processing (5)

app/packages/state/src/recoil/aggregations.ts (2 hunks)
app/schema.graphql (4 hunks)
fiftyone/core/aggregations.py (6 hunks)
fiftyone/server/aggregations.py (5 hunks)
tests/unittests/server_aggregations_tests.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

app/packages/state/src/recoil/aggregations.ts
tests/unittests/server_aggregations_tests.py
app/schema.graphql
fiftyone/server/aggregations.py

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/core/aggregations.py

588-589: Use a single if statement instead of nested if statements

(SIM102)

🔇 Additional comments (5)

fiftyone/core/aggregations.py (5)

540-547: Addition of optimization parameter enhances query performance.

The addition of _optimize parameter to the Count class constructor is a good implementation for the query performance enhancement. It provides a way to control whether to optimize the count aggregation pipeline for performance.

549-550: LGTM! Parameter storage is consistent with class style.

The parameter is properly stored as an instance attribute, consistent with the class's coding style and pattern.

585-586: LGTM! Parameter correctly passed to parsing function.

The optimize parameter is correctly passed to the _parse_field_and_expr function.

3006-3007: LGTM! Added parameter to support optimization logic.

The optimize parameter with a default value of False is correctly added to the _parse_field_and_expr function.

3113-3114: LGTM! Condition modified to account for optimization.

The condition now checks for both context and optimize flags before appending to the pipeline, which is consistent with the optimization goal.

kaixi-wang

So this removes label counts when qp is enabled? Not sure if this should go through product first

kaixi-wang · 2025-03-11T19:53:26Z

fiftyone/server/aggregations.py

@@ -42,6 +42,7 @@ class AggregationForm:
    slices: t.Optional[t.List[str]]
    view: BSONArray
    view_name: t.Optional[str] = None
+    query_performance: t.Optional[bool] = False


where does this value come from?/how is it set?

It comes from the App setting @fiftyone/state/queryPerformance atom

kaixi-wang · 2025-03-11T19:55:03Z

app/packages/relay/src/queries/__generated__/datasetQuery.graphql.ts

@@ -88,7 +88,7 @@ export type datasetQuery$data = {
      readonly slug: string | null;
    } | null;
    readonly " $fragmentSpreads": FragmentRefs<"datasetFragment">;
-  } | null;
+  };


is this safe? Could no dataset due to no permissions cause problems?

This shouldn't be changing. I need to figure out why generated output is different. Thanks!

minhtuev · 2025-03-11T19:59:39Z

fiftyone/core/aggregations.py

+        field_or_expr=None,
+        expr=None,
+        safe=False,
+        _optimize=False,


why don't we set it to true by default? :)

I will add a note. I'd like to do this in the future, but for now I am confident that the server only uses Count in a way that makes this optimization valid. I am unsure at the moment if this is true for all Count usage in the SDK.
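The caution in this reply can be illustrated without a database. In the sketch below, plain dictionaries stand in for sample documents. `count_non_null` mimics a count that keeps the match on non-null values, and `count_all` mimics a count with that match skipped. Both helpers are hypothetical and say nothing about how FiftyOne or MongoDB executes the query; they only show when the two answers agree.

```python
def count_non_null(docs, path):
    """Count documents whose value at `path` is present and not None."""
    return sum(1 for d in docs if d.get(path) is not None)


def count_all(docs):
    """Count every document, ignoring whether the field is populated."""
    return len(docs)


# A sparse collection: the field is null or missing on some documents
sparse = [{"label": "cat"}, {"label": None}, {"other": 1}]

# A dense collection: every document has a non-null value
dense = [{"label": "cat"}, {"label": "dog"}]

print(count_non_null(sparse, "label"), count_all(sparse))
print(count_non_null(dense, "label"), count_all(dense))
```

The two counts agree only when every document carries a non-null value for the field, which is one reading of why the flag is off by default until all `Count` usage in the SDK has been checked.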

benjaminpkane · 2025-03-11T20:01:17Z

app/schema.graphql

@@ -238,7 +239,7 @@ type Dataset {
  appConfig: DatasetAppConfig
  info: JSON
  estimatedFrameCount: Int
-  estimatedSampleCount: Int!
+  estimatedSampleCount: Int


It's unclear to me why the required ! was removed. I will follow up

benjaminpkane · 2025-03-11T20:07:56Z

So this removes label counts when qp is enabled? Not sure if this should go through product first

This removes nothing from the UI. It only omits queries that are not needed by QP sidebar UI, but are required by the older non-QP sidebar mode. So when QP is enabled, we omit

minhtuev

Change looks good to me, thanks Ben 🚢 small comments otherwise

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

fiftyone/core/aggregations.py (1)
553-554: Consider adding _optimize to _kwargs method

For consistency with other parameters, consider adding the _optimize parameter to the _kwargs method. This would ensure that serialization and deserialization of Count aggregations include the optimization settings.
def _kwargs(self):
-   return super()._kwargs() + [["_unwind", self._unwind]]
+   return super()._kwargs() + [["_unwind", self._unwind], ["_optimize", self._optimize]]

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e580a9 and 10c4850.

📒 Files selected for processing (3)

app/schema.graphql (3 hunks)
fiftyone/core/aggregations.py (6 hunks)
fiftyone/server/aggregations.py (7 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

app/schema.graphql
fiftyone/server/aggregations.py

🧰 Additional context used

🪛 Ruff (0.8.2)

fiftyone/core/aggregations.py

589-590: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (8)

GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
GitHub Check: test / test-app
GitHub Check: build / build
GitHub Check: lint / eslint
GitHub Check: e2e / test-e2e
GitHub Check: build

🔇 Additional comments (6)

fiftyone/core/aggregations.py (6)

539-550: Introduces optimization flag for Count aggregation

The addition of the _optimize parameter with a default value of False allows for conditionally skipping certain pipeline stages when counting samples, which should improve performance without changing the API's backward compatibility.

585-586: Propagates optimization flag to parsing function

The _optimize parameter is correctly passed to the _parse_field_and_expr function, ensuring consistent behavior throughout the pipeline construction.

3007-3008: Adds optimize parameter to parsing function

The function signature is updated to accept the new optimize parameter with default value False, maintaining backward compatibility.

3114-3115: Conditionally skip project stage when optimize is True

This optimization skips the pipeline stage that projects the path when optimize is True and there's no context, further improving performance.

545-545: Consider adding documentation for the _optimize parameter

The _optimize parameter is an important optimization feature, but it's not documented in the class docstring. Consider adding documentation to explain its purpose, usage, and impact.

Note: I see from previous review comments that you're considering making this True by default once validated for all SDK usage cases.

588-594:
✅ Verification successful

Conditionally skip None-value filtering when _optimize is True

This change allows the pipeline to skip the match stage that filters out None values when the _optimize flag is set, which should reduce computational overhead. The TODO comment suggests the team is considering if this optimization could be applied more broadly.

🏁 Script executed:
#!/bin/bash
# Search for other usages of Count aggregation in the codebase to ensure this optimization is safe
rg -A 2 "fo\.Count\(" --type py
Length of output: 1824
Review Verified: None-value Filtering Optimization Is Safe

The search results for fo.Count( indicate that this optimization only affects the filtering of None values when _optimize is set. There are no conflicting usages in either the core aggregations or test cases, and the tests confirm that the Count aggregation behaves as expected.

The optimization correctly skips the $match filtering stage when _optimize is True.

No evidence suggests that this change adversely affects any Count aggregation usage.

The TODO note about broader application of this optimization remains valid for future exploration.

🧰 Tools

🪛 Ruff (0.8.2)

589-590: Use a single if statement instead of nested if statements

(SIM102)

kaixi-wang

lgtm

kaixi-wang

👍

coderabbitai bot reviewed Mar 4, 2025

benjaminpkane self-assigned this Mar 4, 2025

optimize qp counts with count scans

8e580a9

benjaminpkane force-pushed the optimize-qp-counts branch from 1bcf43b to 8e580a9 Compare March 4, 2025 21:43

benjaminpkane added the bug Bug fixes label Mar 7, 2025

coderabbitai bot reviewed Mar 7, 2025

benjaminpkane changed the base branch from develop to release/v1.4.0 March 10, 2025 16:06

kaixi-wang reviewed Mar 11, 2025

minhtuev reviewed Mar 11, 2025

benjaminpkane commented Mar 11, 2025

minhtuev previously approved these changes Mar 11, 2025

optional fixes

10c4850

benjaminpkane dismissed minhtuev’s stale review via 10c4850 March 11, 2025 21:11

coderabbitai bot reviewed Mar 11, 2025

kaixi-wang previously approved these changes Mar 12, 2025

update gql output

0e0c5e9

benjaminpkane dismissed kaixi-wang’s stale review via 0e0c5e9 March 12, 2025 00:25

kaixi-wang approved these changes Mar 12, 2025

benjaminpkane merged commit e2af2fc into release/v1.4.0 Mar 12, 2025
14 checks passed

benjaminpkane deleted the optimize-qp-counts branch March 12, 2025 00:51

coderabbitai bot mentioned this pull request Mar 12, 2025

Merge release/v1.4.0 to develop #5569

Open
