Optimize Query Performance counts with count scans #5538

Merged
benjaminpkane merged 3 commits into release/v1.4.0 from optimize-qp-counts
Mar 12, 2025

Conversation

benjaminpkane
Contributor

@benjaminpkane benjaminpkane commented Mar 4, 2025

What changes are proposed in this pull request?

With Query Performance enabled, excessive aggregations were still being run from the original sidebar mode.

Removing all aggregations except Count when Query Performance is enabled, and ensuring count scans occur when possible, yields at least 4x faster DB responses for sample-level scalars. Tested with BDD100k (69,863 samples): DB time dropped from 172 ms to 39 ms (about 4.4x).

How is this patch tested? If it is not, please explain why.

Server aggregation assertion

Release Notes

  • Optimized sidebar counts for :ref:`Query Performance <app-optimizing-query-performance>` mode

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features

    • Introduced a performance toggle for aggregation queries that optimizes data processing.
    • Added a queryPerformance flag to aggregation forms to influence performance behavior.
  • Schema Updates

    • Updated aggregation input forms to include a performance flag.
    • Adjusted dataset and aggregation result fields to provide more flexible, optional outputs.
  • Tests

    • Enhanced test scenarios to validate the behavior of the performance toggle in aggregation queries.

Contributor

coderabbitai bot commented Mar 4, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request updates aggregation-related code across multiple modules. It introduces a new query performance flag into the state management, GraphQL schema, backend aggregation logic, and associated tests. The changes add new fields and parameters (such as queryPerformance and _optimize) to manage performance aspects and adjust the control flows in aggregation resolution.

Changes

| File(s) | Change Summary |
| --- | --- |
| app/packages/state/src/recoil/aggregations.ts | Added an import for queryPerformance and injected the queryPerformance property into the aggForm object within the aggregationQuery, retrieving its value via get(queryPerformance). |
| app/schema.graphql | Added a queryPerformance: Boolean = false field in the AggregationForm input type; changed estimatedSampleCount in Dataset from non-nullable to nullable; and modified StringAggregation.values from a non-nullable to a nullable list. |
| fiftyone/core/aggregations.py | Introduced a new optional _optimize parameter to the Count class constructor; updated the to_mongo method to conditionally append a match stage based on _optimize; and extended the _parse_field_and_expr function to accept an optimization flag. |
| fiftyone/server/aggregations.py | Added an optional query_performance attribute (default False) to the AggregationForm class; made StringAggregation.values optional; and updated the _resolve_path_aggregation function to conditionally append aggregations based on the query_performance flag. |
| tests/unittests/server_aggregations_tests.py | Refactored test setup in test_group_mode_sidebar_counts by separating the form dictionary construction and including a new test scenario with "query_performance": True, expecting values to be None when the flag is enabled. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Resolver
    participant Aggregator

    Client->>Server: Send aggregation request with queryPerformance flag
    Server->>Resolver: Process AggregationForm
    alt queryPerformance is False
        Resolver->>Aggregator: Append standard aggregations
    else queryPerformance is True
        Resolver->>Aggregator: Skip performance-intensive aggregations
    end
    Aggregator-->>Resolver: Return aggregation results
    Resolver-->>Server: Forward response
    Server-->>Client: Return final response
sequenceDiagram
    participant AggregationQuery
    participant CountInstance

    AggregationQuery->>CountInstance: Call to_mongo(sample_collection)
    alt _optimize is False
        CountInstance->>CountInstance: Append match stage to pipeline
    else _optimize is True
        CountInstance->>CountInstance: Skip match stage for optimization
    end
    CountInstance-->>AggregationQuery: Return aggregation data

Possibly related PRs

Suggested labels

enhancement, app

Suggested reviewers

  • sashankaryal
  • tom-vx51

Poem

Hopping through lines of code I roam,
A clever rabbit in a data home.
With flags and tests, I leap in delight,
Optimizing queries by day and night.
Code carrots crunching—oh, what a sight! 🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
fiftyone/core/collections.py (1)

10229-10275: New async aggregation method for optimized query performance.

A new asynchronous aggregation method has been added that appears to be part of the query performance optimization. The method only processes facet-type aggregations, which aligns with the PR objective of optimizing sidebar counts in Query Performance mode.

The debug parameter is accepted but not used within the method body. This suggests it's intended for future use or debugging. Consider adding a docstring comment to clarify the purpose of this parameter or leverage it within the method implementation if it's already intended for use.

- async def _async_aggregate(self, aggregations, debug=False):
+ async def _async_aggregate(self, aggregations, debug=False):
+     """Asynchronously aggregates data for query performance optimization.
+     
+     Args:
+         aggregations: Aggregation instances to process
+         debug (False): Whether to include additional debugging information
+     
+     Returns:
+         The aggregation results
+     """
fiftyone/core/aggregations.py (1)

588-593: Implemented the key optimization logic.

This change implements the core of the optimization described in the PR objectives:

  1. When _optimize is true, it skips adding a match stage that filters for non-null values
  2. This significantly reduces processing time for large datasets

The code matches the logic with the business requirement of optimizing sidebar counts in query performance mode.

However, there's a minor code style improvement opportunity:

-if not self._optimize:
-    if not sample_collection._contains_videos() or path != "frames":
-        pipeline.append(
-            {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
-        )
+if not self._optimize and (not sample_collection._contains_videos() or path != "frames"):
+    pipeline.append(
+        {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
+    )

This small refactor combines the nested conditionals for better readability.
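
The nested and combined forms are logically equivalent, which a brute-force check over every input combination can confirm. The sketch below is standalone and does not import FiftyOne; `optimize`, `contains_videos`, and `path_is_frames` are hypothetical stand-ins for `self._optimize`, `sample_collection._contains_videos()`, and `path == "frames"`.

```python
from itertools import product


def nested(optimize: bool, contains_videos: bool, path_is_frames: bool) -> bool:
    """Original nested form: True means the $match stage would be appended."""
    if not optimize:
        if not contains_videos or not path_is_frames:
            return True
    return False


def combined(optimize: bool, contains_videos: bool, path_is_frames: bool) -> bool:
    """Suggested single-condition form."""
    return not optimize and (not contains_videos or not path_is_frames)


def forms_agree() -> bool:
    """Compare both forms on all 8 combinations of the three booleans."""
    return all(
        nested(*args) == combined(*args)
        for args in product([False, True], repeat=3)
    )
```

Because the comparison is exhaustive, the refactor cannot change which pipelines receive the match stage.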

🧰 Tools
🪛 Ruff (0.8.2)

588-589: Use a single if statement instead of nested if statements

(SIM102)

fiftyone/server/aggregations.py (1)

234-254: Effective conditional aggregation logic for performance optimization.

This is the core optimization - when query_performance is True, only the basic Count aggregation is performed, skipping additional aggregations like CountValues and Bounds. This should significantly reduce the computational load during queries, which aligns with the PR objective of enhancing query performance.

Consider adding a brief comment explaining the performance implications of this change for future maintainers.

-    if not query_performance:
+    # Skip additional aggregations when query_performance is enabled to optimize query execution
+    if not query_performance:
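
As a rough illustration of the branch being commented on, the sketch below models the selection with plain strings rather than real aggregation objects. `select_aggregations` and the `"str"`/`"float"` field kinds are hypothetical; only `Count`, `CountValues`, and `Bounds` are named in this review, and the actual `_resolve_path_aggregation` may map field types differently.

```python
from typing import List


def select_aggregations(field_kind: str, query_performance: bool = False) -> List[str]:
    """Return the names of the aggregations to run for one sidebar path."""
    # Count is always requested
    aggregations = ["Count"]

    # Skip additional aggregations when query_performance is enabled
    if not query_performance:
        if field_kind == "str":
            aggregations.append("CountValues")
        elif field_kind == "float":
            aggregations.append("Bounds")

    return aggregations
```

With the flag on, every path resolves to a single `Count`, regardless of field kind.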
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a92b96 and 1bcf43b.

⛔ Files ignored due to path filters (4)
  • app/packages/app/src/pages/datasets/__generated__/DatasetPageQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/fragments/__generated__/estimatedCountsFragment.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/queries/__generated__/datasetQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
📒 Files selected for processing (6)
  • app/packages/state/src/recoil/aggregations.ts (2 hunks)
  • app/schema.graphql (4 hunks)
  • fiftyone/core/aggregations.py (6 hunks)
  • fiftyone/core/collections.py (1 hunks)
  • fiftyone/server/aggregations.py (5 hunks)
  • tests/unittests/server_aggregations_tests.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

  • app/packages/state/src/recoil/aggregations.ts
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (21)
app/packages/state/src/recoil/aggregations.ts (2)

16-16: Added import for the new query performance feature.

The addition of this import brings in the queryPerformance state atom that will be used to optimize aggregation operations.


89-89: Added queryPerformance flag to enable performance optimization.

The newly added property forwards the query performance flag from the client state to the GraphQL server. This enables the optimization described in the PR objectives that significantly improves database response time for sidebar counts.

tests/unittests/server_aggregations_tests.py (4)

97-118: Code refactoring improves test readability.

Restructuring the form dictionary into a standalone variable improves code readability and maintains better separation of test input creation from execution.


119-120: No change in test behavior during initial execution.

This execute call still retains the same functionality as the original code, just with improved structure.


136-138: Added key test case for query performance optimization.

This new test case validates that when query_performance is enabled, the aggregation engine properly skips unnecessary processing by returning None for the values field.


139-152: Test assertions validate the optimization's correctness.

The assertions verify that with query performance enabled:

  1. The essential count information is still returned correctly (count: 1, exists: 1)
  2. The values array is set to None, confirming that the optimization is working as designed

This matches the PR objective of "removing all aggregations except for the Count" to improve query performance.

app/schema.graphql (4)

35-35: Added queryPerformance flag to GraphQL schema.

The new parameter queryPerformance in the AggregationForm type allows clients to explicitly request performance optimization for aggregation operations. Setting it to default false maintains backward compatibility with existing clients.


242-242: Updated estimatedSampleCount to be nullable.

Making this field nullable (by changing from Int! to Int) increases flexibility in cases where the count might not be available or calculable, especially when performance optimization is enabled.


617-621: Improved dataset query method signature formatting and nullability.

The changes here:

  1. Improve readability by placing parameters on separate lines
  2. Update the return type to Dataset! making the return type non-nullable when the dataset exists

This change ensures better type safety in the GraphQL schema.


803-803: Made values field nullable to support query performance optimization.

Changing values from [StringAggregationValue!]! to [StringAggregationValue!] allows the field to be null when query performance is enabled, which is consistent with the optimization strategy described in the PR objectives.

fiftyone/core/aggregations.py (5)

540-546: Added optimization parameter to Count constructor.

The addition of the _optimize parameter gives the Count aggregation the ability to skip unnecessary operations for performance improvement.


549-550: Stored optimization flag as an instance attribute.

Properly storing the flag enables its use in the to_mongo method later.


585-586: Passed optimization flag to field parser.

This ensures the optimization flag is properly forwarded to the parsing logic.


3006-3006: Added optimize parameter to parse function.

The addition of the optimize parameter to _parse_field_and_expr allows this core function to properly support the query performance optimization feature.


3113-3114: Extended optimization to avoid unnecessary projections.

This change further enhances query performance by conditionally skipping projection stages when the optimize flag is enabled. The optimization is properly scoped to non-context cases to avoid affecting other use cases.

fiftyone/server/aggregations.py (6)

45-45: Good addition of the query_performance flag.

Adding this flag allows controlling performance optimization mode while maintaining backward compatibility with the default value of False.


96-96: Appropriate update to make values optional.

Making values optional is consistent with the new behavior where CountValues aggregation may be skipped when query performance optimization is enabled.


144-149: Correctly passing query_performance flag to aggregation resolver.

The change correctly forwards the query_performance flag to the _resolve_path_aggregation function for each path.


216-217: Function signature appropriately updated.

The function signature now includes the query_performance parameter which is used to control aggregation behavior.


219-221:

✅ Verification successful

Optimization for Count aggregation.

Good use of the _optimize parameter to optimize the Count aggregation when query_performance is enabled.


🏁 Script executed:

#!/bin/bash
# Verify if the Count class in the fiftyone.core.aggregations module supports the _optimize parameter
grep -r "_optimize" --include="*.py" fiftyone/core/aggregations.py

Length of output: 194


Count Aggregation _optimize Parameter Confirmation — Approved

The verification confirms that the core aggregation code correctly supports the _optimize parameter (defaulting to False and used to control the optimization behavior). The use of the _optimize parameter in the Count aggregation is implemented as intended.


257-309:

✅ Verification successful

Review the result processing logic for missing aggregations.

The from_results function needs to handle cases where certain aggregations weren't performed due to query_performance being True. While the current implementation seems to handle it correctly because it only processes aggregations that were actually performed, verify that no assumptions are made about the presence of specific aggregation results.

For example, when query_performance is True, data["values"] won't be set for StringAggregation since CountValues isn't performed. Ensure that all client code that consumes these results can handle missing fields.


🏁 Script executed:

#!/bin/bash
# Look for places where StringAggregation is used and check if they handle the case where values is None
grep -r "StringAggregation" --include="*.py" --include="*.js" --include="*.ts" fiftyone/
grep -r "values.*StringAggregation" --include="*.py" --include="*.js" --include="*.ts" fiftyone/

Length of output: 712


Final Verification: Missing Aggregations Are Handled as Expected

After reviewing the processing logic in fiftyone/server/aggregations.py and verifying via grep that the StringAggregation results are defined with an optional values field (defaulting to None), it appears that the code correctly processes only the aggregations that were executed. In the case where query_performance is True, the absence of a values key (for instance, in a StringAggregation) is intentional, and the Optional type annotation ensures that client code should be prepared to handle a missing field.

  • The optional definition (values: t.Optional[t.List[StringAggregationValue]] = None) confirms that no value is expected when count values aren’t performed.
  • Clients consuming these aggregation results must verify the presence of a field before use, which is consistent with how the aggregation results are processed.

No changes are required here as long as consumers properly check for the None value.
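
A consumer-side check for the `None` case might look like the following sketch. The dataclasses are simplified stand-ins, not the server's actual types: the `path` field and the shape of `StringAggregationValue` are assumptions, and `value_labels` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringAggregationValue:
    count: int
    value: str


@dataclass
class StringAggregation:
    path: str
    count: int
    exists: int
    # None when CountValues was skipped (query performance enabled)
    values: Optional[List[StringAggregationValue]] = None


def value_labels(agg: StringAggregation) -> List[str]:
    """Return the value labels, or an empty list when values were not computed."""
    if agg.values is None:
        return []
    return [v.value for v in agg.values]
```

Treating `None` as "not computed" rather than "no values" keeps the two cases distinguishable for callers that need to tell them apart.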

@benjaminpkane benjaminpkane self-assigned this Mar 4, 2025
@benjaminpkane benjaminpkane added the bug Bug fixes label Mar 7, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/core/aggregations.py (1)

588-593: Core optimization logic effectively reduces computational load.

This conditional is the key performance optimization in this PR. By skipping the match stage when _optimize is True, the aggregation pipeline becomes more efficient, which explains the 4x performance improvement mentioned in the PR objectives.

Consider simplifying the nested if statements as suggested by static analysis:

-        if not self._optimize:
-            if not sample_collection._contains_videos() or path != "frames":
-                pipeline.append(
-                    {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
-                )
+        if not self._optimize and (not sample_collection._contains_videos() or path != "frames"):
+            pipeline.append(
+                {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
+            )
🧰 Tools
🪛 Ruff (0.8.2)

588-589: Use a single if statement instead of nested if statements

(SIM102)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bcf43b and 8e580a9.

⛔ Files ignored due to path filters (4)
  • app/packages/app/src/pages/datasets/__generated__/DatasetPageQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/fragments/__generated__/estimatedCountsFragment.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/queries/__generated__/datasetQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
📒 Files selected for processing (5)
  • app/packages/state/src/recoil/aggregations.ts (2 hunks)
  • app/schema.graphql (4 hunks)
  • fiftyone/core/aggregations.py (6 hunks)
  • fiftyone/server/aggregations.py (5 hunks)
  • tests/unittests/server_aggregations_tests.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/packages/state/src/recoil/aggregations.ts
  • tests/unittests/server_aggregations_tests.py
  • app/schema.graphql
  • fiftyone/server/aggregations.py
🔇 Additional comments (5)
fiftyone/core/aggregations.py (5)

540-547: Addition of optimization parameter enhances query performance.

The addition of _optimize parameter to the Count class constructor is a good implementation for the query performance enhancement. It provides a way to control whether to optimize the count aggregation pipeline for performance.


549-550: LGTM! Parameter storage is consistent with class style.

The parameter is properly stored as an instance attribute, consistent with the class's coding style and pattern.


585-586: LGTM! Parameter correctly passed to parsing function.

The optimize parameter is correctly passed to the _parse_field_and_expr function.


3006-3007: LGTM! Added parameter to support optimization logic.

The optimize parameter with a default value of False is correctly added to the _parse_field_and_expr function.


3113-3114: LGTM! Condition modified to account for optimization.

The condition now checks for both context and optimize flags before appending to the pipeline, which is consistent with the optimization goal.

@benjaminpkane benjaminpkane changed the base branch from develop to release/v1.4.0 March 10, 2025 16:06
Contributor

@kaixi-wang kaixi-wang left a comment

So this removes label counts when qp is enabled? Not sure if this should go through product first

@@ -42,6 +42,7 @@ class AggregationForm:
     slices: t.Optional[t.List[str]]
     view: BSONArray
     view_name: t.Optional[str] = None
+    query_performance: t.Optional[bool] = False
Contributor

where does this value come from?/how is it set?

Contributor Author

It comes from the App setting @fiftyone/state/queryPerformance atom

@@ -88,7 +88,7 @@ export type datasetQuery$data = {
readonly slug: string | null;
} | null;
readonly " $fragmentSpreads": FragmentRefs<"datasetFragment">;
} | null;
};
Contributor

is this safe? Could no dataset due to no permissions cause problems?

Contributor Author

This shouldn't be changing. I need to figure out why generated output is different. Thanks!

field_or_expr=None,
expr=None,
safe=False,
_optimize=False,
Contributor

why don't we set it to true by default? :)

Contributor Author

I will add a note. I'd like to do this in the future, but for now I am confident that the server only uses Count in a way that makes this optimization valid. I am unsure at the moment if this is true for all Count usage in the SDK.

@@ -238,7 +239,7 @@ type Dataset {
   appConfig: DatasetAppConfig
   info: JSON
   estimatedFrameCount: Int
-  estimatedSampleCount: Int!
+  estimatedSampleCount: Int
Contributor Author

It's unclear to me why the required ! was removed. I will follow up

@benjaminpkane
Contributor Author

So this removes label counts when qp is enabled? Not sure if this should go through product first

This removes nothing from the UI. It only omits queries that are not needed by QP sidebar UI, but are required by the older non-QP sidebar mode. So when QP is enabled, we omit

minhtuev
minhtuev previously approved these changes Mar 11, 2025
Contributor

@minhtuev minhtuev left a comment

Change looks good to me, thanks Ben 🚢 small comments otherwise

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/core/aggregations.py (1)

553-554: Consider adding _optimize to _kwargs method

For consistency with other parameters, consider adding the _optimize parameter to the _kwargs method. This would ensure that serialization and deserialization of Count aggregations include the optimization settings.

def _kwargs(self):
-   return super()._kwargs() + [["_unwind", self._unwind]]
+   return super()._kwargs() + [["_unwind", self._unwind], ["_optimize", self._optimize]]
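
To see why the omission could matter, consider a toy round trip. `CountSketch`, `to_dict`, and `from_dict` below are hypothetical and only mimic the general idea of rebuilding an object from its `_kwargs()` list; FiftyOne's actual serialization may work differently.

```python
class CountSketch:
    """Hypothetical stand-in for an aggregation that serializes via _kwargs()."""

    def __init__(self, field=None, _unwind=True, _optimize=False):
        self.field = field
        self._unwind = _unwind
        self._optimize = _optimize

    def _kwargs(self):
        # Only the arguments listed here survive a round trip
        return [
            ["field", self.field],
            ["_unwind", self._unwind],
            ["_optimize", self._optimize],
        ]

    def to_dict(self):
        return {"kwargs": self._kwargs()}

    @classmethod
    def from_dict(cls, d):
        # Rebuild the instance from the serialized [name, value] pairs
        return cls(**{name: value for name, value in d["kwargs"]})


restored = CountSketch.from_dict(CountSketch("label", _optimize=True).to_dict())
```

If `_optimize` were left out of `_kwargs()` in this toy class, `restored._optimize` would silently fall back to the constructor default of `False`.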
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e580a9 and 10c4850.

📒 Files selected for processing (3)
  • app/schema.graphql (3 hunks)
  • fiftyone/core/aggregations.py (6 hunks)
  • fiftyone/server/aggregations.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/schema.graphql
  • fiftyone/server/aggregations.py
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/core/aggregations.py

589-590: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (6)
fiftyone/core/aggregations.py (6)

539-550: Introduces optimization flag for Count aggregation

The addition of the _optimize parameter with a default value of False allows for conditionally skipping certain pipeline stages when counting samples, which should improve performance without changing the API's backward compatibility.


585-586: Propagates optimization flag to parsing function

The _optimize parameter is correctly passed to the _parse_field_and_expr function, ensuring consistent behavior throughout the pipeline construction.


3007-3008: Adds optimize parameter to parsing function

The function signature is updated to accept the new optimize parameter with default value False, maintaining backward compatibility.


3114-3115: Conditionally skip project stage when optimize is True

This optimization skips the pipeline stage that projects the path when optimize is True and there's no context, further improving performance.


545-545: Consider adding documentation for the _optimize parameter

The _optimize parameter is an important optimization feature, but it's not documented in the class docstring. Consider adding documentation to explain its purpose, usage, and impact.

Note: I see from previous review comments that you're considering making this True by default once validated for all SDK usage cases.


588-594:

✅ Verification successful

Conditionally skip None-value filtering when _optimize is True

This change allows the pipeline to skip the match stage that filters out None values when the _optimize flag is set, which should reduce computational overhead. The TODO comment suggests the team is considering if this optimization could be applied more broadly.


🏁 Script executed:

#!/bin/bash
# Search for other usages of Count aggregation in the codebase to ensure this optimization is safe
rg -A 2 "fo\.Count\(" --type py

Length of output: 1824


Review Verified: None-value Filtering Optimization Is Safe

The search results for fo.Count( indicate that this optimization only affects the filtering of None values when _optimize is set. There are no conflicting usages in either the core aggregations or test cases, and the tests confirm that the Count aggregation behaves as expected.

  • The optimization correctly skips the $match filtering stage when _optimize is True.
  • No evidence suggests that this change adversely affects any Count aggregation usage.
  • The TODO note about broader application of this optimization remains valid for future exploration.
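
The effect on the generated pipeline can be sketched without FiftyOne. `build_count_pipeline` is a hypothetical helper; the `$match` stage is copied from the diff quoted in this review, while the trailing `$count` stage is an assumption about how the count is finalized.

```python
from typing import Any, Dict, List


def build_count_pipeline(
    path: str, optimize: bool = False, contains_videos: bool = False
) -> List[Dict[str, Any]]:
    """Build a simplified count pipeline for a field path."""
    pipeline: List[Dict[str, Any]] = []

    if not optimize and (not contains_videos or path != "frames"):
        # Drop documents whose field is missing or null before counting
        pipeline.append({"$match": {"$expr": {"$gt": ["$" + path, None]}}})

    # Assumed final stage; the review only quotes the $match stage
    pipeline.append({"$count": "count"})
    return pipeline
```

In this sketch the optimized pipeline counts every document, including those where the field is null or missing, which is why the optimization is only valid for callers that do not rely on the null filter.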
🧰 Tools
🪛 Ruff (0.8.2)

589-590: Use a single if statement instead of nested if statements

(SIM102)

kaixi-wang
kaixi-wang previously approved these changes Mar 12, 2025
Contributor

@kaixi-wang kaixi-wang left a comment

lgtm

Contributor

@kaixi-wang kaixi-wang left a comment

👍

@benjaminpkane benjaminpkane merged commit e2af2fc into release/v1.4.0 Mar 12, 2025
14 checks passed
@benjaminpkane benjaminpkane deleted the optimize-qp-counts branch March 12, 2025 00:51
Labels
bug Bug fixes
3 participants