
Adding map_values() stage and edit_field_values operator #5561

Open
wants to merge 1 commit into base: release/v1.4.0


@brimoor brimoor commented Mar 10, 2025

Change log

  • Adds a map_values() view stage that generalizes map_labels() to any field or embedded field
  • Adds an edit_field_values operator that allows for editing field values from the App
  • ViewExpression.map_values(mapping) now supports mapping dict with None keys

Example usage

map_values()

import random

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

ANIMALS = [
    "bear", "bird", "cat", "cow", "dog", "elephant", "giraffe",
    "horse", "sheep", "zebra"
]

dataset = foz.load_zoo_dataset("quickstart")

values = [random.choice(ANIMALS) for _ in range(len(dataset))]
dataset.set_values("str_field", values)
dataset.set_values("list_field", [[v] for v in values])
dataset.set_field("ground_truth.detections.tags", [F("label")]).save()

# Map all animals to string "animal"
mapping = {a: "animal" for a in ANIMALS}

#
# Map values in top-level fields
#

view = dataset.map_values("str_field", mapping)
print(view.count_values("str_field"))
# {"animal": 200}

view = dataset.map_values("list_field", mapping)
print(view.count_values("list_field"))
# {"animal": 200}

#
# Map values in nested fields
#

view = dataset.map_values("ground_truth.detections.label", mapping)
print(view.count_values("ground_truth.detections.label"))
# {"animal": 183, ...}

view = dataset.map_values("ground_truth.detections.tags", mapping)
print(view.count_values("ground_truth.detections.tags"))
# {"animal": 183, ...}
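The behavior illustrated above can be emulated on plain Python dicts, which makes the traversal rules easier to see. This is a rough sketch, not FiftyOne code: `map_scalar()` and `map_path()` are hypothetical helpers, and they assume (consistent with the examples and tests in this PR) that unmapped values pass through unchanged and that list fields are mapped element-wise. Frames and group slices are ignored.

```python
from typing import Any, Dict, Hashable

# Hypothetical helpers that emulate map_values() on plain dicts; they are
# NOT part of FiftyOne and ignore details such as frames and group slices.


def map_scalar(value: Any, mapping: Dict[Hashable, Any]) -> Any:
    """Map a single value; values absent from `mapping` are returned unchanged."""
    return mapping[value] if value in mapping else value


def map_path(doc: Any, path: str, mapping: Dict[Hashable, Any]) -> Any:
    """Return a copy of `doc` with the values at the dotted `path` remapped.

    Lists met along the path (e.g. `detections`) are traversed element-wise,
    and a list stored at the leaf (e.g. `tags`) is mapped element-wise too.
    """
    if isinstance(doc, list):
        return [map_path(item, path, mapping) for item in doc]

    if not path:
        # Reached the leaf: map the value itself
        return map_scalar(doc, mapping)

    if not isinstance(doc, dict):
        # Path does not exist in this document; leave it untouched
        return doc

    head, _, rest = path.partition(".")
    if head not in doc:
        return doc

    out = dict(doc)  # shallow copy so the input is not mutated
    out[head] = map_path(doc[head], rest, mapping)
    return out


# Toy sample shaped like the fields created in the example above
sample = {
    "str_field": "cat",
    "list_field": ["cat"],
    "ground_truth": {
        "detections": [
            {"label": "dog", "tags": ["dog"]},
            {"label": "person", "tags": ["person"]},
        ]
    },
}
mapping = {"cat": "animal", "dog": "animal"}

view = map_path(sample, "ground_truth.detections.label", mapping)
print([d["label"] for d in view["ground_truth"]["detections"]])
# ['animal', 'person']
```

Returning a copy rather than mutating the input loosely mirrors how a view stage leaves the underlying dataset untouched until `.save()` is called.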

edit_field_values

# continuing from the example above
session = fo.launch_app(dataset)
(Video attachment: edit-field-values.mov)

Summary by CodeRabbit

  • New Features

    • Added an enhanced field mapping capability that lets users transform and normalize dataset fields, including nested fields, with special handling for missing values.
    • Introduced a new operator for easily editing field values during data processing workflows.
  • Documentation

    • Updated user guides and examples to reflect the improved field mapping functionality and to provide clearer usage instructions.

@brimoor brimoor added the feature Work on a feature request label Mar 10, 2025

coderabbitai bot commented Mar 10, 2025

Walkthrough

The changes update FiftyOne's documentation and core functionality by adding map_values(), which generalizes map_labels() to any field or embedded field. New methods and classes have been introduced for mapping field values, including None-key support in ViewExpression.map_values() and a new MapValues view stage. Additionally, a new operator (EditFieldValues) has been added with its supporting helper functions and configuration, and tests have been implemented to verify the mapping functionality.

Changes

File(s) | Change Summary
--- | ---
docs/source/user_guide/using_aggregations.rst, docs/source/user_guide/using_views.rst | Updated documentation to reference map_values() alongside map_labels(), with revised examples for data transformation and aggregation.
fiftyone/__public__.py, fiftyone/core/collections.py, fiftyone/core/expressions.py, fiftyone/core/stages.py | Added map_values() functionality: a new method in collections, None-key handling in the expression mapping logic, and a new MapValues class registered as a view stage.
plugins/operators/__init__.py, plugins/operators/fiftyone.yml | Introduced a new EditFieldValues operator, with associated helper functions and operator registration, to support editing field values via a mapping.
tests/unittests/view_tests.py | Added tests (test_map_values and test_map_values_none) to verify the new mapping functionality and the handling of None values.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant E as EditFieldValues Operator
  participant DV as DatasetView
  participant SC as SampleCollection

  User->>E: Submit field edit request
  E->>DV: Resolve inputs & call map_values(field, mapping)
  DV->>SC: Apply mapping transformation
  SC-->>DV: Return updated values
  E->>User: Confirm changes and trigger dataset reload

Suggested labels

bug

Suggested reviewers

  • benjaminpkane
  • ritch
  • minhtuev

Poem

In the field of code, I hop with delight,
Mapping values from morning till night.
With every change, our paths grow bright,
A bunny’s dance in digital light.
Carrots and code—what a joyful sight!
🥕🐰


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
tests/unittests/view_tests.py (1)

3367-3367: Consider renaming the ambiguous variable l.

The variable name l can be easily confused with the number 1 in some fonts. Consider using a more descriptive name like label or orig_label for better readability.

-            for lv, l in f:
-                if l.label in mapping:
-                    self.assertEqual(lv.label, mapping[l.label])
-                else:
-                    self.assertEqual(lv.label, l.label)
+            for lv, orig_label in f:
+                if orig_label.label in mapping:
+                    self.assertEqual(lv.label, mapping[orig_label.label])
+                else:
+                    self.assertEqual(lv.label, orig_label.label)
🧰 Tools
🪛 Ruff (0.8.2)

3367-3367: Ambiguous variable name: l

(E741)

docs/source/user_guide/using_views.rst (1)

2413-2414: Clarify Field Transformation API Documentation

The updated introduction to field transformation methods now correctly lists the new map_values() method along with set_field() and map_labels(). This clarifies that users can temporarily modify field values in a view using these methods. It might be useful to include a brief note mentioning that map_values() supports mappings that include None keys if that detail is important for user expectations.

docs/source/user_guide/using_aggregations.rst (1)

802-805: Integrate New Transformation Methods in Aggregation Workflows

The transformation section now explicitly mentions using map_values() (alongside map_labels()) with aggregations. This addition is valuable for users who want to pre-process or normalize data before computing statistics. Consider expanding the accompanying text to note the benefits, such as support for None keys, in cases where label standardization is required.

plugins/operators/__init__.py (3)

12-13: Consider handling invalid ObjectId strings
Currently, there's no safeguard against malformed strings that could cause ObjectId(...) to raise an exception. If user input is untrusted, you may want to add explicit error handling or validation before converting into an ObjectId.
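One lightweight guard, sketched below under the assumption that only the 24-character hex string form needs to be accepted, is to validate the string before calling ObjectId(...). Where pymongo's `bson` package is available, `bson.ObjectId.is_valid()` performs an equivalent check for strings; the stdlib-only helper `is_object_id_string()` here is hypothetical.

```python
import re

# The string form of an ObjectId is 24 hexadecimal characters (12 bytes)
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id_string(value: object) -> bool:
    """Return True if `value` looks like a hex-encoded ObjectId string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


for candidate in ["5f8d0d55b54764421b7156c3", "not-an-id", "None"]:
    print(candidate, is_object_id_string(candidate))
```

Checking up front lets the operator report a clear validation error instead of surfacing a raw exception from the conversion.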


189-293: Potential performance concerns for large datasets
Loading all distinct field values in _edit_field_values_inputs (via target_view.get_field_schema and enumerating big radio groups) may become expensive for large datasets. Consider adding lazy loading of options or restricting the number of retrieved values when the field cardinality is very high.
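A sketch of the second mitigation (restricting the number of options), assuming the operator already has a value-to-count dictionary such as the one count_values() returns. `top_choices()` and its default cap of 100 are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def top_choices(
    counts: Dict[Hashable, int], max_choices: int = 100
) -> Tuple[List[Hashable], bool]:
    """Return up to `max_choices` of the most frequent values, plus a flag
    indicating whether less frequent values were dropped.

    `counts` is a value -> count dict, e.g. the output of count_values();
    the default cap of 100 is an arbitrary placeholder.
    """
    ranked = Counter(counts).most_common(max_choices)
    return [value for value, _ in ranked], len(counts) > max_choices


choices, truncated = top_choices({"cat": 5, "dog": 3, "bird": 8}, max_choices=2)
print(choices, truncated)  # ['bird', 'cat'] True
```

A form could then render only the returned values as choices and fall back to free-text input when `truncated` is True.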


295-327: Consider optimizing count_values() and distinct() usage for large fields
_build_map_values_entry calls count_values and then falls back to distinct if using the full dataset. For fields with large cardinalities, this could lead to heavy queries. Explore caching or incremental retrieval strategies if performance becomes an issue.
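If caching is pursued, one simple shape is a per-(collection, field) memo that is invalidated whenever the operator edits that field. The class below is a hypothetical sketch; `compute` stands in for an expensive query such as count_values() and is injected so the example runs without FiftyOne.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Counts = Dict[Hashable, int]


class ValueCountsCache:
    """Memoize value counts per (collection name, field path)."""

    def __init__(self, compute: Callable[[str, str], Counts]):
        self._compute = compute
        self._cache: Dict[Tuple[str, str], Counts] = {}

    def get(self, collection: str, path: str) -> Counts:
        key = (collection, path)
        if key not in self._cache:
            # Only hit the (expensive) backend on a cache miss
            self._cache[key] = self._compute(collection, path)
        return self._cache[key]

    def invalidate(self, collection: str, path: Optional[str] = None) -> None:
        """Drop cached counts, e.g. after an edit has been saved to a field."""
        if path is None:
            for key in [k for k in self._cache if k[0] == collection]:
                del self._cache[key]
        else:
            self._cache.pop((collection, path), None)


calls = []


def fake_count_values(collection: str, path: str) -> Counts:
    # Stand-in for a real query; records each call so caching is observable
    calls.append((collection, path))
    return {"cat": 2, "dog": 1}


cache = ValueCountsCache(fake_count_values)
cache.get("quickstart", "ground_truth.detections.label")
cache.get("quickstart", "ground_truth.detections.label")
print(len(calls))  # 1: the second lookup was served from the cache
```

Invalidation is the hard part of this design: any code path that writes to the field must call `invalidate()`, or the form will show stale values.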

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8610cf4 and 4f991ce.

📒 Files selected for processing (9)
  • docs/source/user_guide/using_aggregations.rst (2 hunks)
  • docs/source/user_guide/using_views.rst (3 hunks)
  • fiftyone/__public__.py (1 hunks)
  • fiftyone/core/collections.py (1 hunks)
  • fiftyone/core/expressions.py (1 hunks)
  • fiftyone/core/stages.py (2 hunks)
  • plugins/operators/__init__.py (3 hunks)
  • plugins/operators/fiftyone.yml (1 hunks)
  • tests/unittests/view_tests.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/__public__.py

214-214: .core.stages.MapValues imported but unused

Remove unused import

(F401)

tests/unittests/view_tests.py

3367-3367: Ambiguous variable name: l

(E741)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: lint / eslint
  • GitHub Check: build
🔇 Additional comments (13)
fiftyone/core/expressions.py (1)

1999-2022: Improved implementation of map_values with None key handling

The new implementation enhances the map_values method by adding support for None keys in the mapping dictionary. This is a valuable addition that enables transforming null values within datasets.

The code uses a nested conditional structure:

  1. First checks if the current value is greater than None (null sorts below every other BSON type except MinKey in MongoDB's comparison order, so this is effectively a not-None check)
  2. If true, applies the standard mapping logic
  3. If false, returns the value associated with None in the mapping

This implementation is well-structured and handles edge cases appropriately.
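As a rough illustration of those three steps, the function below reproduces the described behavior on plain Python values. It is a hypothetical stand-in: the actual implementation builds a database expression rather than running Python per value, and the fallback when the mapping has no None key is an assumption.

```python
from typing import Any, Dict, Hashable


def map_value_with_none(value: Any, mapping: Dict[Hashable, Any]) -> Any:
    """Hypothetical pure-Python mirror of the nested conditional above."""
    if value is not None:
        # Steps 1-2: the value is not None, so apply the ordinary mapping;
        # values without an entry pass through unchanged
        return mapping.get(value, value)

    # Step 3: the value is None, so use the mapping's None entry.
    # Assumption: without a None key, None is left as None.
    return mapping.get(None, None)


# Round trip in the spirit of test_map_values_none
filled = map_value_with_none(None, {None: "missing"})
restored = map_value_with_none(filled, {"missing": None})
print(filled, restored)  # missing None
```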

fiftyone/__public__.py (1)

214-214: Added MapValues to public API

The MapValues class has been exposed in the public API, making it available for users to map values in both top-level and nested fields. This is consistent with the PR objective of generalizing the existing mapping functionality.

Note that the static analysis tool flagged this import as unused, but this is a false positive since it's being exported as part of the public interface.


plugins/operators/fiftyone.yml (1)

9-9: Added edit_field_values operator

Added a new operator edit_field_values that enables users to edit field values directly from the FiftyOne App interface. This complements the existing edit_field_info operator and enhances user interaction with the dataset by providing a straightforward way to modify values.

fiftyone/core/collections.py (1)

5818-5883: Great addition! This generalized mapping function is a useful enhancement.

The new map_values() method provides a flexible way to transform any field values in a collection, generalizing the existing map_labels() functionality. This enables users to easily map both top-level and embedded field values using a dictionary mapping.

The implementation follows the established pattern for view stages, and the documentation includes comprehensive examples showing its usage with different field types. This should be immediately useful for data transformation workflows.

tests/unittests/view_tests.py (2)

3341-3372: LGTM! Good test coverage for the map_values functionality.

This test method appropriately verifies that the map_values() method correctly transforms both single labels and lists of labels according to the provided mapping dictionary. The test covers both Classification and Detection objects as well as their plural counterparts.



3373-3402: LGTM! Good handling of edge cases with None values.

This test properly verifies that map_values() can handle None as both keys and values in the mapping dictionary, which is an important edge case to cover. The test confirms that None values can be transformed to actual values and back again.

fiftyone/core/stages.py (2)

4257-4382: Well-implemented generalization of MapLabels for arbitrary field values

The MapValues stage elegantly extends the functionality of the existing MapLabels stage by allowing value mapping for any field type rather than just label fields. The implementation follows consistent patterns with other view stages in the file, with proper handling for list fields, frames, and group slices. The documentation examples clearly demonstrate the flexibility of this new stage for mapping values in both top-level and embedded fields.


9031-9031: Correctly registered MapValues in available stages list

The new MapValues stage has been properly registered in the _STAGES list, making it available for use within the FiftyOne framework.

docs/source/user_guide/using_views.rst (1)

2421-2422: Update Example for Renaming Labels with map_values()

The example now correctly demonstrates how to use map_values() to rename labels (e.g., mapping a group of animal names to a single category). This usage aligns well with the new API. Please verify that the resulting behavior (e.g. the final count of labels) matches the intended semantics of the new transformation.

docs/source/user_guide/using_aggregations.rst (1)

822-829: Validate Example for Normalized Label Aggregation

The updated example that maps "cat" and "dog" to "pet" using map_values() is clear and aligns with the intended usage for aggregating transformed data. Please check that the resulting histogram (i.e. the count for "pet") reflects the proper grouping and that it is fully consistent with the new API's behavior.

plugins/operators/__init__.py (3)

132-139: Class definition and operator config look good
The class name EditFieldValues is consistent with the operator’s purpose, and the dynamic config appropriately handles runtime form generation.


141-148: No issues with the input resolution logic
The method cleanly delegates form construction to _edit_field_values_inputs. The returned types.Property definition is consistent with other operators in this file.


2599-2599: Operator registration is correct
Registering EditFieldValues ensures that the new operator is available. This finalizes integration with the rest of the plugin system.

Comment on lines +174 to +187
def _make_parse_field_fcn(ctx, path):
    field = ctx.dataset.get_field(path)

    if isinstance(field, fo.StringField):
        return lambda v: str(v) if v != "None" else None
    elif isinstance(field, fo.BooleanField):
        return lambda v: v == "True" if v != "None" else None
    elif isinstance(field, fo.ObjectIdField):
        return lambda v: ObjectId(v) if v != "None" else None
    elif isinstance(field, fo.IntField):
        return lambda v: int(v) if v != "None" else None
    else:
        return lambda v: v


🛠️ Refactor suggestion

Gracefully handle potential conversion errors in _make_parse_field_fcn
A user providing an invalid int or object ID will trigger a runtime exception. Consider wrapping these conversions in a try/except block to provide meaningful feedback.
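A sketch of that suggestion, assuming plain strings from the form and covering only the str/bool/int cases (the `kind` argument of the hypothetical `make_parser()` stands in for the `isinstance(field, ...)` checks above). It keeps the "None" sentinel but re-raises conversion failures as a ValueError that names the offending input, which the operator could surface to the user. An ObjectId branch would need extra care: pymongo's `bson.ObjectId` raises `bson.errors.InvalidId` for malformed strings, and that class derives from `BSONError` rather than `ValueError`, so `except (ValueError, TypeError)` alone (as in the suggested `execute()` change) would not catch it.

```python
from typing import Any, Callable, Dict


def make_parser(kind: str) -> Callable[[str], Any]:
    """Build a parser that turns form strings into typed field values."""
    converters: Dict[str, Callable[[str], Any]] = {
        "str": str,
        "bool": lambda v: v == "True",
        "int": int,
    }

    if kind not in converters:
        # Mirrors the original fallback: pass values through untouched
        return lambda v: v

    convert = converters[kind]

    def parse(v: str) -> Any:
        if v == "None":
            return None
        try:
            return convert(v)
        except (TypeError, ValueError) as e:
            # Re-raise with a message that identifies the bad input
            raise ValueError(f"Cannot parse {v!r} as {kind}") from e

    return parse


parse_int = make_parser("int")
print(parse_int("42"), parse_int("None"))  # 42 None
```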

Comment on lines +150 to +172
def execute(self, ctx):
    path = ctx.params["path"]
    map = ctx.params["map"]
    target = ctx.params.get("target", None)

    target_view = _get_target_view(ctx, target)

    f = _make_parse_field_fcn(ctx, path)

    _map = {}
    for d in map:
        current = d["current"]
        new = d["new"]

        if not isinstance(current, list):
            current = [current]

        for c in current:
            _map[f(c)] = f(new)

    target_view.map_values(path, _map).save()
    ctx.trigger("reload_dataset")
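For a concrete picture of what the loop in execute() produces, here is the same flattening step in isolation. `build_mapping()` and `parse_str()` are hypothetical; `parse_str()` mimics the StringField branch of _make_parse_field_fcn() ("None" becomes None). As in the original loop, if two entries share a `current` value, the later entry wins.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping


def build_mapping(
    entries: Iterable[Mapping[str, Any]], parse: Callable[[Any], Any]
) -> Dict[Any, Any]:
    """Flatten {"current": ..., "new": ...} form entries into one dict."""
    mapping: Dict[Any, Any] = {}
    for entry in entries:
        current = entry["current"]
        if not isinstance(current, list):
            current = [current]  # a single value maps like a one-item list
        for c in current:
            mapping[parse(c)] = parse(entry["new"])
    return mapping


def parse_str(v: Any) -> Any:
    # Mimics the StringField branch: the literal "None" means None
    return str(v) if v != "None" else None


entries: List[Dict[str, Any]] = [
    {"current": ["cat", "dog"], "new": "pet"},
    {"current": "None", "new": "unknown"},
]
print(build_mapping(entries, parse_str))
# {'cat': 'pet', 'dog': 'pet', None: 'unknown'}
```

The resulting dict can contain a None key, which is why the None-key support in ViewExpression.map_values() matters for this operator.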


🛠️ Refactor suggestion

Add input validation and exception handling for missing or invalid mapping entries
While iterating over the map objects, the code assumes "current" and "new" are always present. A missing key or invalid data could cause unhandled errors. Additionally, any conversion errors from _make_parse_field_fcn (e.g., casting to int) will raise exceptions without being caught.

Here’s a possible approach to safeguard the loop:

 def execute(self, ctx):
     path = ctx.params["path"]
     map_vals = ctx.params["map"]
     target = ctx.params.get("target", None)

     target_view = _get_target_view(ctx, target)
     f = _make_parse_field_fcn(ctx, path)

     _map = {}
     for d in map_vals:
+        if "current" not in d or "new" not in d:
+            # Optionally handle or skip invalid entries
+            continue
         current = d["current"]
         new = d["new"]

         if not isinstance(current, list):
             current = [current]

         for c in current:
+            try:
                 _map[f(c)] = f(new)
+            except (ValueError, TypeError):
+                # Handle conversion errors
+                continue

     target_view.map_values(path, _map).save()
     ctx.trigger("reload_dataset")
