Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lasso with point field #5534

Open
wants to merge 8 commits into
base: release/v1.4.0
Choose a base branch
from
Open

Lasso with point field #5534

wants to merge 8 commits into from

Conversation

ritch
Copy link
Contributor

@ritch ritch commented Mar 3, 2025

What changes are proposed in this pull request?

Uses the new points_field param added in https://github.com/voxel51/fiftyone-brain/pull/236/files to allow for mongodb only spatial querying when lassoing in the embeddings plot.

Depends on this fiftyone.brain PR

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Enhanced selection handling now incorporates detailed spatial data, improving the accuracy and responsiveness of lasso selections.
    • Introduced advanced state management that integrates additional interaction data for a more dynamic user experience.
    • Enabled refined spatial filtering capabilities, allowing users to filter results using custom polygon selections.
    • Added a new state management function for handling "pointsField" data.
  • Documentation

    • Updated documentation for the GeoWithin class to clarify the new create_index parameter.

Copy link
Contributor

coderabbitai bot commented Mar 3, 2025

Walkthrough

This pull request introduces updates to the selection handling and state management within the embeddings package. The UI now passes additional parameters (lasso points) during selection/deselection events. A new hook, usePointsField, has been added that integrates with existing effect hooks, which now include both lasso points and point field data. Additionally, the backend has been enhanced to process spatial queries from lasso selections, allowing for more comprehensive handling of selection states.

Changes

Files Change Summary
app/.../EmbeddingsPlot.tsx, app/.../usePlotSelection.tsx Modified selection events to pass an extra lassoPoints parameter; updated handleSelected signature and added a new Recoil atom for managing lasso points.
app/.../useBrainResult.tsx, app/.../useExtendedStageEffect.tsx, app/.../useViewChangeEffect.tsx Added a new hook usePointsField and integrated it into effect hooks; expanded dependencies to include pointsField and lassoPoints; updated state management on view change.
fiftyone/.../embeddings.py Enhanced _post_sync to handle lasso_points and points_field; modified control flow for spatial queries based on lasso selections.
fiftyone/.../stages.py Added create_index parameter to GeoWithin class docstring to clarify indexing behavior.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI
    participant PlotSelection
    participant ExtendedEffect
    participant Server

    User->>UI: Initiates lasso selection
    UI->>PlotSelection: onSelected(pointIds, lassoPoints)
    PlotSelection->>PlotSelection: Update state with lassoPoints
    PlotSelection->>ExtendedEffect: Notify state change
    ExtendedEffect->>Server: fetchExtendedStage(pointsField, lassoPoints)
    Server->>Server: Process lassoPoints via _get_fiftyone_geowithin()
    Server-->>ExtendedEffect: Return filtered query/result
    ExtendedEffect-->>UI: Update plot data
Loading
sequenceDiagram
    participant UI
    participant ViewEffect
    participant Server

    UI->>ViewEffect: Trigger view change
    ViewEffect->>Server: Fetch new plot data
    Server-->>ViewEffect: Return response with points_field
    ViewEffect->>ViewEffect: setPointsField(response.points_field)
Loading

Suggested reviewers

  • imanjra
  • brimoor

Poem

I'm a coding rabbit, so spry and keen,
Hopping through logic where changes are seen.
Lasso points dance with point fields in flight,
New hooks and effects twinkle bright.
In every line, a carrot of delight,
A joyful hop in our coding night!
🐇🌟

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
fiftyone/server/routes/embeddings.py (2)

244-245: Use consistent naming

lassoPoints = data.get("lassoPoints", None) is descriptive, but consider consistently naming variables (e.g., lasso_points) throughout the codebase to match Python conventions for clarity.


246-263: Ensure spatial indexing for efficient queries

When creating a Mongo $geoWithin query, confirm that an appropriate geospatial index (e.g., a 2dsphere index) is in place on the point_field. Otherwise, queries might be slow or fail.

app/packages/embeddings/src/usePlotSelection.tsx (3)

1-6: Unify import style

Imports look fine here, but confirm no default or named import collisions exist elsewhere. Also ensure Recoil dependencies match the project’s version constraints.


35-35: Leverage typed state usage

const [lassoPoints, setLassoPoints] = useRecoilState(atoms.lassoPoints); works. If you want stronger type safety, consider specifying the atom’s state interface.


47-50: Clarify “unknown cases” and fallback behavior

The TODO mentions that handleSelected might be called without lassoPoints. Provide a fallback if lassoPoints is undefined, or investigate and document those “unknown cases” for maintainability.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ad6efb and 264368c.

📒 Files selected for processing (6)
  • app/packages/embeddings/src/EmbeddingsPlot.tsx (1 hunks)
  • app/packages/embeddings/src/useBrainResult.tsx (1 hunks)
  • app/packages/embeddings/src/useExtendedStageEffect.tsx (4 hunks)
  • app/packages/embeddings/src/usePlotSelection.tsx (2 hunks)
  • app/packages/embeddings/src/useViewChangeEffect.tsx (2 hunks)
  • fiftyone/server/routes/embeddings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{ts,tsx}`: Review the Typescript and React code for co...

**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

  • app/packages/embeddings/src/useBrainResult.tsx
  • app/packages/embeddings/src/useViewChangeEffect.tsx
  • app/packages/embeddings/src/EmbeddingsPlot.tsx
  • app/packages/embeddings/src/useExtendedStageEffect.tsx
  • app/packages/embeddings/src/usePlotSelection.tsx
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: test / test-app
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (14)
app/packages/embeddings/src/useBrainResult.tsx (1)

9-9: Clean implementation of the point field hook.

The new usePointField hook follows the same well-established pattern as the existing useBrainResult hook, ensuring consistency in state management throughout the application.

app/packages/embeddings/src/EmbeddingsPlot.tsx (2)

95-95: Correctly extended selection handling to include lasso points.

The updated call to handleSelected now passes the selected point IDs along with the lasso points from the plotly selection event, enabling the spatial querying feature mentioned in the PR objectives.


98-98: Properly handles deselection case with the updated parameters.

The deselection handler now passes two null values instead of one, maintaining consistency with the updated function signature for handleSelected. This change ensures that both the point IDs and lasso points are properly cleared on deselection.

app/packages/embeddings/src/useViewChangeEffect.tsx (3)

6-6: Import updated to include the new hook.

The import statement has been updated to include the newly added usePointField hook, correctly maintaining the module's existing imports.


14-14: Correctly implemented point field state management.

The new hook is used to create state for tracking the point field, following the same pattern as other state variables in this component.


76-76: Properly updates point field state from API response.

The point field value is correctly extracted from the plot data response and stored in state using the setter function from the hook. This ensures the value is available for use in other components.

app/packages/embeddings/src/useExtendedStageEffect.tsx (4)

6-7: Added necessary imports for lasso points and point field functionality.

The imports for selection atoms and the point field hook are correctly added to support the new features.


21-22: Properly initialized state values for spatial querying.

The component now correctly retrieves both the lasso points from Recoil state and the point field using the new hook, making them available for use in the effect.


32-33: Enhanced fetchExtendedStage with spatial query parameters.

The function call has been updated to include the lasso points and point field parameters, enabling the MongoDB-specific spatial querying capability described in the PR objectives.


42-49: Correctly updated effect dependencies.

The dependency array for the useEffect hook has been properly expanded to include the new state values, ensuring the effect re-runs when the lasso points or point field change. The array is also well-formatted for readability.

fiftyone/server/routes/embeddings.py (2)

238-238: Validate pointField before usage

Currently, the code assumes the "pointField" key always exists in data. If there's any chance it could be missing, consider handling a KeyError or providing a default value.


361-401: Confirm coordinate ordering for MongoDB $geoWithin

This function combines x and y into [x, y] pairs, which might require [longitude, latitude] ordering for a 2dsphere index. Verify that this matches the expected coordinate system.

app/packages/embeddings/src/usePlotSelection.tsx (2)

13-18: Atom definition is clear

Defining lassoPoints in a dedicated atom is straightforward. Good approach for global state. No concerns here.


38-41: Signature includes optional parameter

function handleSelected(selectedResults, lassoPoints) is a good addition. Ensure calls to handleSelected always provide either null or a valid object for the second argument to avoid runtime errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
fiftyone/server/routes/embeddings.py (2)

268-286: Consider validating lasso_points structure before use.

The code assumes that lasso_points has the expected structure with 'x' and 'y' keys. Without validation, this could lead to KeyErrors if malformed data is provided.

        if (
            lasso_points is not None
            and points_field is not None
            and (patches_field is None or is_patches_view)
        ):
+           # Validate lasso_points structure
+           if not (isinstance(lasso_points, dict) and "x" in lasso_points and "y" in lasso_points):
+               raise ValueError("Invalid lasso_points format. Expected dict with 'x' and 'y' keys.")
+
            # Lasso points via spatial index
            # Unfortunately we can't use $geoWithin to filter nested arrays.
            # The user must have switched to a patches view for this

274-281: Consider closing the polygon for complete MongoDB compatibility.

MongoDB requires polygons to be closed (first point = last point) for certain geospatial operations. While $geoWithin may work with unclosed polygons, it's safer to explicitly close the polygon.

                            points_field: {
                                "$geoWithin": {
                                    "$polygon": [
                                        [x, y]
                                        for x, y in zip(
                                            lasso_points["x"],
                                            lasso_points["y"],
-                                       )
+                                       )
+                                   ] + [[lasso_points["x"][0], lasso_points["y"][0]]]  # Close the polygon
                                    ]
                                }
                            }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 264368c and 26a4114.

📒 Files selected for processing (5)
  • app/packages/embeddings/src/useBrainResult.tsx (1 hunks)
  • app/packages/embeddings/src/useExtendedStageEffect.tsx (4 hunks)
  • app/packages/embeddings/src/useViewChangeEffect.tsx (2 hunks)
  • fiftyone/core/stages.py (1 hunks)
  • fiftyone/server/routes/embeddings.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/packages/embeddings/src/useExtendedStageEffect.tsx
  • app/packages/embeddings/src/useBrainResult.tsx
  • app/packages/embeddings/src/useViewChangeEffect.tsx
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: build
🔇 Additional comments (6)
fiftyone/server/routes/embeddings.py (5)

243-244: LGTM! Good use of safe dictionary access for new parameters.

Using data.get() with a default value of None is a robust way to extract potentially missing keys from the request data, ensuring backward compatibility with existing clients.


255-263: Clear explanation of limitations with nested arrays.

The comment effectively explains why the lasso filtering can't work with nested arrays in MongoDB and the requirement for the user to switch to a patches view for this functionality to work. This is important context for future maintainers.


263-267: Correct field path resolution for patches.

This block correctly handles the resolution of the points field path when working with patches, ensuring proper integration with the existing field path resolution mechanism.


287-287: Correct modification of conditional.

The change from if to elif is necessary to make this part of the conditional chain and avoid executing this block if the lasso points condition is met. This ensures the correct logical flow through the various selection scenarios.


255-287:

❓ Verification inconclusive

Implementation of spatial queries using MongoDB's $geoWithin looks good.

The implementation correctly uses MongoDB's spatial query capabilities to filter documents based on lasso points. The code properly converts the lasso points into a polygon format suitable for MongoDB's $geoWithin operator. The conditional logic ensures this feature is only used in supported scenarios (when not dealing with nested arrays).

A few considerations:

  • There's no validation that lasso_points has the expected structure with 'x' and 'y' keys
  • For optimal performance, ensure that points_field has a geospatial index in MongoDB

🏁 Script executed:

#!/bin/bash
# Check if points_field is used in other parts of the codebase and if it's indexed

# Look for index creation on the points field
rg -A 3 -B 3 "create_index.*point" --glob "*.py"

# Check if points_field is used in other parts of the codebase
rg "points_field" --glob "*.py" | grep -v "embeddings.py"

Length of output: 106


Spatial Query Implementation in embeddings.py

  • The spatial query setup using $geoWithin is correctly implemented for converting lasso points into a polygon.
  • There is no validation to ensure that lasso_points contains the expected keys ('x' and 'y'). Adding such validation would help avoid potential runtime errors.
  • It is unclear whether the field used in the geospatial query (points_field) is indexed appropriately (e.g., a 2dsphere index). Manual verification or an explicit index creation in the codebase should be ensured for optimal performance.
fiftyone/core/stages.py (1)

3298-3299: Documentation addition looks good

The addition of the create_index parameter documentation is well-formatted and consistent with the rest of the docstring. This parameter is important for users to understand as it allows control over whether a spherical index is automatically created for geospatial queries, which can significantly impact performance.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/server/routes/embeddings.py (1)

248-248: Remove or replace this debug print statement

It’s preferable to use proper logging or remove the debug statement before committing to production to avoid unintentional output.

- print(points_field)
+ # import logging
+ # logging.debug(points_field)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26a4114 and 97e9b35.

📒 Files selected for processing (1)
  • fiftyone/server/routes/embeddings.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: build / build
  • GitHub Check: build / changes
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: build
🔇 Additional comments (4)
fiftyone/server/routes/embeddings.py (4)

79-79: No issues found with referencing points_field

This addition looks good and follows the existing pattern of reading configuration fields.


155-155: Exposing points_field in the response is beneficial

Including "points_field" in the final JSON allows consumers to easily fetch and use this data in the client side.


245-247: Validate the presence and shape of lassoPoints

Be sure to verify that lassoPoints keys ("x", "y") exist and contain matching-length arrays. Malformed inputs could lead to runtime errors when building the $polygon.

Do you want me to generate a shell script or propose a code snippet to validate lassoPoints and guard against malformed data?


259-291: Add geometry validation for lasso-based $geoWithin queries

While constructing a MongoDB $polygon, a mismatch or missing data in lasso_points["x"] or lasso_points["y"] will trigger errors. Additionally, consider closing the polygon if needed by your geometry convention.

Here is a suggested snippet to validate input before building the $polygon:

 if (
     lasso_points is not None
     and points_field is not None
     and (patches_field is None or is_patches_view)
 ):
+    # Validate lassoPoints data
+    if (
+        not isinstance(lasso_points, dict)
+        or "x" not in lasso_points
+        or "y" not in lasso_points
+        or not isinstance(lasso_points["x"], list)
+        or not isinstance(lasso_points["y"], list)
+        or len(lasso_points["x"]) != len(lasso_points["y"])
+    ):
+        raise ValueError("Invalid lassoPoints data: mismatch in x,y arrays")

     if patches_field is not None:
         _, points_field = view._get_label_field_path(patches_field, points_field)

@brimoor brimoor changed the base branch from develop to release/v1.4.0 March 11, 2025 04:11
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/packages/embeddings/src/tracesToData.tsx (1)

41-46: Consider adding type annotations to arrays

While pre-allocating arrays is efficient, consider adding TypeScript type annotations to these arrays for better type safety:

-const x = new Array(trace.length);
-const y = new Array(trace.length);
-const ids = new Array(trace.length);
-const labelsForColors =
-  isContinuous && !isUncolored ? new Array(trace.length) : null;
+const x: number[] = new Array(trace.length);
+const y: number[] = new Array(trace.length);
+const ids: string[] = new Array(trace.length);
+const labelsForColors: number[] | null =
+  isContinuous && !isUncolored ? new Array(trace.length) : null;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc590e0 and 0fbd0c1.

📒 Files selected for processing (1)
  • app/packages/embeddings/src/tracesToData.tsx (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{ts,tsx}`: Review the Typescript and React code for co...

**/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

  • app/packages/embeddings/src/tracesToData.tsx
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (6)
app/packages/embeddings/src/tracesToData.tsx (6)

23-26: Good selection optimization with Set data structure

Using a Set for fast O(1) lookups when checking if an item is selected is an efficient approach, especially with potentially large datasets in embeddings visualizations.


27-30: Performance improvement: pre-computing colorscale mapping

Moving the colorscale mapping outside the trace iteration is a good optimization. This prevents redundant calculations for each trace when the colorscale is the same for all traces.


47-64: Efficient imperative approach for data processing

Replacing functional programming methods with a direct loop is more efficient for large datasets as it avoids creating intermediate arrays. This change aligns well with handling spatial queries for the lasso selection feature.


59-63: Selection logic implementation looks correct

The selection logic correctly checks both id and sample_id against the selection set, which is important for handling various identifier types in the data.


67-80: Direct array assignment in return object is more efficient

Using pre-populated arrays directly in the return object rather than creating them inline with mapping functions is a good performance optimization.


91-91: Good handling of empty selectedpoints

Setting selectedpoints to null when empty rather than an empty array is a good practice for Plotly, as it helps the library distinguish between "no selection" and "empty selection" states.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants