Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge release/v1.4.0 to develop #5569

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from
Open

Conversation

voxel51-bot
Copy link
Collaborator

@voxel51-bot voxel51-bot commented Mar 12, 2025

Merge release/v1.4.0 to develop

Summary by CodeRabbit

  • New Features

    • Introduced a performance toggle for aggregation queries, allowing enhanced control over query processing.
    • Updated the API schema to support more flexible input and response structures with optional fields.
  • Tests

    • Expanded test coverage to verify the adjusted behavior when the performance option is activated.

sashankaryal and others added 16 commits March 10, 2025 11:21
* ensure read_only is accessed for fo fields only

* safe field.read_only

* another one

* re-fix, we cannot update metadata for non-FO Field

* more field.read_only protections

* Revert "more field.read_only protections"

This reverts commit a51251d.

* read_only protection that is necessary in obscure case

---------

Co-authored-by: brimoor <[email protected]>
Co-authored-by: Stuart Wheaton <[email protected]>
Fiftyone->FiftyOne

rename teams docs folders

rename teams files

teams->enterprise

add redirects from teams files to now enterprise

fix nav/footer layout
…t-so-2

[DOCS] Update docs for FiftyOne Enterprise
* optimize qp counts with count scans

* optional fixes

* update gql output
Copy link
Contributor

coderabbitai bot commented Mar 12, 2025

Walkthrough

The changes introduce a new query performance feature across several components. A new property is added to the aggregation query in the state package and mirrored in the GraphQL schema via the AggregationForm input type. Additionally, both core and server aggregation logic have been updated to support optional filtering based on optimization flags (_optimize and query_performance). The test suite is extended to cover scenarios where query performance is toggled.

Changes

File(s) Summary
app/.../recoil/aggregations.ts Added import for queryPerformance and a new property queryPerformance in the variables function of aggregationQuery.
app/schema.graphql Introduced queryPerformance: Boolean = false in AggregationForm; updated nullability for estimatedSampleCount in Dataset and values in StringAggregation.
fiftyone/core/aggregations.py Added an optional _optimize parameter to the Count class and updated the to_mongo and _parse_field_and_expr methods to conditionally adjust filtering logic.
fiftyone/server/aggregations.py Added query_performance attribute to AggregationForm; initialized default values in aggregation types; updated _resolve_path_aggregation to use the query_performance flag.
tests/unittests/server_aggregations_tests.py Modified test structure to include a new scenario with "query_performance": True, expecting the values field to be None.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GraphQL
    participant AggregationHandler
    participant AggregationLogic

    Client->>GraphQL: Send Aggregation Request (queryPerformance flag)
    GraphQL->>AggregationHandler: Map input to AggregationForm
    AggregationHandler->>AggregationLogic: Invoke aggregation with _optimize/query_performance
    AggregationLogic-->>AggregationHandler: Return aggregation result (conditional filtering)
    AggregationHandler-->>GraphQL: Return processed results
    GraphQL-->>Client: Respond with final aggregation output
Loading

Possibly related PRs

Suggested reviewers

  • findtopher
  • brimoor

Poem

Hop, hop, in lines of code I bound,
Query flags added, new paths found,
With each change I swiftly scurry,
Ensuring performance in a flurry,
Celebrate these tweaks with a happy, bouncy sound! 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/core/aggregations.py (1)

588-594: Conditional filtering based on optimization flag.

When the _optimize flag is set to True, the code now skips filtering out None values. This improves performance by removing a potentially expensive match stage from the aggregation pipeline.

This code could be slightly simplified by combining the conditional checks.

-        # todo: can we omit filtering out none in all situations
-        if not self._optimize:
-            if not sample_collection._contains_videos() or path != "frames":
-                pipeline.append(
-                    {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
-                )
+        # todo: can we omit filtering out none in all situations
+        if not self._optimize and (not sample_collection._contains_videos() or path != "frames"):
+            pipeline.append(
+                {"$match": {"$expr": {"$gt": ["$" + path, None]}}}
+            )
🧰 Tools
🪛 Ruff (0.8.2)

589-590: Use a single if statement instead of nested if statements

(SIM102)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 915badc and 83bd7ac.

⛔ Files ignored due to path filters (2)
  • app/packages/relay/src/fragments/__generated__/estimatedCountsFragment.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
  • app/packages/relay/src/queries/__generated__/aggregationsQuery.graphql.ts is excluded by !**/__generated__/**, !**/__generated__/**
📒 Files selected for processing (5)
  • app/packages/state/src/recoil/aggregations.ts (2 hunks)
  • app/schema.graphql (3 hunks)
  • fiftyone/core/aggregations.py (6 hunks)
  • fiftyone/server/aggregations.py (7 hunks)
  • tests/unittests/server_aggregations_tests.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{ts,tsx}`: Review the Typescript and React code for co...

**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

  • app/packages/state/src/recoil/aggregations.ts
🪛 Ruff (0.8.2)
fiftyone/core/aggregations.py

589-590: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: build
🔇 Additional comments (19)
app/packages/state/src/recoil/aggregations.ts (2)

16-16: New import added for query performance support.

This import brings in the queryPerformance state that will be used to control aggregation optimization behavior.


89-89: Added queryPerformance to aggregation form variables.

The queryPerformance flag is now included in the GraphQL aggregation form, enabling the server to optimize query execution based on this flag. This works in conjunction with the schema changes that add this field to the AggregationForm input type.

app/schema.graphql (3)

35-35: Added queryPerformance option to AggregationForm input type.

This boolean field (defaulting to false) allows clients to request optimized query processing. This change coordinates with the state management updates in the aggregations.ts file.


242-242: Changed estimatedSampleCount from non-nullable to nullable.

The estimatedSampleCount field is now nullable (Int instead of Int!), allowing for cases where sample count estimation may not be available.


799-799: Made StringAggregation values field nullable.

This change allows StringAggregation results to return without values when query performance is enabled. The field type has changed from [StringAggregationValue!]! to [StringAggregationValue!], supporting the optimization where values can be omitted to improve performance.

tests/unittests/server_aggregations_tests.py (2)

97-118: Refactored test to use an explicit form dictionary.

The test has been restructured to define a separate form dictionary that is then passed to the execute function. This makes the code more maintainable and sets up for testing the query performance feature.


136-152: Added test case for query performance optimization.

This new test case verifies that when query_performance is set to True, the values field in the response is set to None. This confirms the new optimization behavior where value calculations are skipped to improve performance.

fiftyone/core/aggregations.py (4)

540-550: Added _optimize parameter to Count class constructor.

The Count aggregation now supports a new _optimize parameter that controls whether certain filtering operations are performed. This enables performance improvements when query performance is prioritized.


585-585: Passing optimize parameter to _parse_field_and_expr function.

The newly added _optimize flag is now passed to the parsing function, allowing optimization behavior to propagate through the aggregation processing.


3007-3007: Added optimize parameter to _parse_field_and_expr function.

The function now accepts an optimize parameter which controls whether certain pipeline stages are included, enabling performance optimizations.


3114-3115: Skip project stage when optimize is True.

When the optimize flag is enabled and no context is present, the code now skips adding a project stage to the pipeline, further improving performance for optimized queries.

fiftyone/server/aggregations.py (8)

45-45: Good addition of query_performance parameter

The new optional query_performance parameter with a default value of False is well-structured and logically placed. This addition will allow for toggling query performance optimizations, which aligns with the PR's purpose of enhancing aggregation performance.


57-58: Improved initialization of boolean properties

The changes to initialize false and true properties with a default value of 0 is a good practice. This prevents potential issues with uninitialized variables and makes the code more robust.


74-78: Proper initialization of float-related properties

Adding default values of 0 to inf, nan, and ninf properties is a good improvement. This ensures these properties have sensible initial values, which prevents potential issues later in the code execution.


96-96: Made values property properly optional

Making the values property optional with a default of None is appropriate as these values might not always be present in the aggregation results. This change aligns with the actual behavior of the aggregation system.


145-148: Correctly passing query_performance parameter

The parameter is now properly passed to the _resolve_path_aggregation function in the list comprehension. This ensures the performance optimization settings are respected throughout the aggregation process.


216-216: Updated function signature

The function signature has been correctly updated to include the new query_performance parameter, maintaining consistency with the changes in the calling function.


219-221: Added optimization flag to Count aggregation

The _optimize parameter is now passed to the Count aggregation, controlled by the query_performance value. This enables conditional optimization for counting operations, which can significantly improve performance for large datasets.


234-254: Well-structured conditional aggregation logic

The implementation of conditional aggregation logic based on the query_performance flag is well-designed. When performance optimization is not required (query_performance=False), additional aggregations are appended based on field type:

  • CountValues for boolean fields
  • Bounds for date, datetime, and integer fields
  • Bounds with special parameters for float fields
  • CountValues with limit for string fields

This approach provides a good balance between rich data collection and performance optimization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants