-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding map_values()
stage and edit_field_values
operator
#5561
base: release/v1.4.0
Are you sure you want to change the base?
Conversation
WalkthroughThe changes update FiftyOne’s documentation and core functionality by replacing Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant E as EditFieldValues Operator
participant DV as DatasetView
participant SC as SampleCollection
User->>E: Submit field edit request
E->>DV: Resolve inputs & call map_values(field, mapping)
DV->>SC: Apply mapping transformation
SC-->>DV: Return updated values
E->>User: Confirm changes and trigger dataset reload
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
tests/unittests/view_tests.py (1)
3367-3367
: Consider renaming the ambiguous variablel
.The variable name
l
can be easily confused with the number1
in some fonts. Consider using a more descriptive name likelabel
ororig_label
for better readability.- for lv, l in f: - if l.label in mapping: - self.assertEqual(lv.label, mapping[l.label]) - else: - self.assertEqual(lv.label, l.label) + for lv, orig_label in f: + if orig_label.label in mapping: + self.assertEqual(lv.label, mapping[orig_label.label]) + else: + self.assertEqual(lv.label, orig_label.label)🧰 Tools
🪛 Ruff (0.8.2)
3367-3367: Ambiguous variable name:
l
(E741)
docs/source/user_guide/using_views.rst (1)
2413-2414
: Clarify Field Transformation API DocumentationThe updated introduction to field transformation methods now correctly lists the new
map_values()
method along withset_field()
andmap_labels()
. This clarifies that users can temporarily modify field values in a view using these methods. It might be useful to include a brief note mentioning thatmap_values()
supports mappings that includeNone
keys if that detail is important for user expectations.docs/source/user_guide/using_aggregations.rst (1)
802-805
: Integrate New Transformation Methods in Aggregation WorkflowsThe transformation section now explicitly mentions using
map_values()
(alongsidemap_labels()
) with aggregations. This addition is valuable for users who want to pre-process or normalize data before computing statistics. Consider expanding the accompanying text to note the benefits—such as supportingNone
keys—in cases where label standardization is required.plugins/operators/__init__.py (3)
12-13
: Consider handling invalid ObjectId strings
Currently, there's no safeguard against malformed strings that could causeObjectId(...)
to raise an exception. If user input is untrusted, you may want to add explicit error handling or validation before converting into anObjectId
.
189-293
: Potential performance concerns for large datasets
Loading all distinct field values in_edit_field_values_inputs
(viatarget_view.get_field_schema
and enumerating big radio groups) may become expensive for large datasets. Consider adding lazy loading of options or restricting the number of retrieved values when the field cardinality is very high.
295-327
: Consider optimizingcount_values()
anddistinct()
usage for large fields
_build_map_values_entry
callscount_values
and then falls back todistinct
if using the full dataset. For fields with large cardinalities, this could lead to heavy queries. Explore caching or incremental retrieval strategies if performance becomes an issue.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
docs/source/user_guide/using_aggregations.rst
(2 hunks)docs/source/user_guide/using_views.rst
(3 hunks)fiftyone/__public__.py
(1 hunks)fiftyone/core/collections.py
(1 hunks)fiftyone/core/expressions.py
(1 hunks)fiftyone/core/stages.py
(2 hunks)plugins/operators/__init__.py
(3 hunks)plugins/operators/fiftyone.yml
(1 hunks)tests/unittests/view_tests.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/__public__.py
214-214: .core.stages.MapValues
imported but unused
Remove unused import
(F401)
tests/unittests/view_tests.py
3367-3367: Ambiguous variable name: l
(E741)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test / test-app
- GitHub Check: e2e / test-e2e
- GitHub Check: build / build
- GitHub Check: lint / eslint
- GitHub Check: build
🔇 Additional comments (13)
fiftyone/core/expressions.py (1)
1999-2022
: Improved implementation of map_values with None key handlingThe new implementation enhances the
map_values
method by adding support forNone
keys in the mapping dictionary. This is a valuable addition that enables transforming null values within datasets.The code uses a nested conditional structure:
- First checks if the current value is greater than
None
(effectively, if it's not None)- If true, applies the standard mapping logic
- If false, returns the value associated with
None
in the mappingThis implementation is well-structured and handles edge cases appropriately.
fiftyone/__public__.py (1)
214-214
: Added MapValues to public APIThe
MapValues
class has been exposed in the public API, making it available for users to map values in both top-level and nested fields. This is consistent with the PR objective of generalizing the existing mapping functionality.Note that the static analysis tool flagged this import as unused, but this is a false positive since it's being exported as part of the public interface.
🧰 Tools
🪛 Ruff (0.8.2)
214-214:
.core.stages.MapValues
imported but unusedRemove unused import
(F401)
plugins/operators/fiftyone.yml (1)
9-9
: Added edit_field_values operatorAdded a new operator
edit_field_values
that enables users to edit field values directly from the FiftyOne App interface. This complements the existingedit_field_info
operator and enhances user interaction with the dataset by providing a straightforward way to modify values.fiftyone/core/collections.py (1)
5818-5883
: Great addition! This generalized mapping function is a useful enhancement.The new
map_values()
method provides a flexible way to transform any field values in a collection, generalizing the existingmap_labels()
functionality. This enables users to easily map both top-level and embedded field values using a dictionary mapping.The implementation follows the established pattern for view stages, and the documentation includes comprehensive examples showing its usage with different field types. This should be immediately useful for data transformation workflows.
tests/unittests/view_tests.py (2)
3341-3372
: LGTM! Good test coverage for the map_values functionality.This test method appropriately verifies that the
map_values()
method correctly transforms both single labels and lists of labels according to the provided mapping dictionary. The test covers both Classification and Detection objects as well as their plural counterparts.🧰 Tools
🪛 Ruff (0.8.2)
3367-3367: Ambiguous variable name:
l
(E741)
3373-3402
: LGTM! Good handling of edge cases with None values.This test properly verifies that
map_values()
can handleNone
as both keys and values in the mapping dictionary, which is an important edge case to cover. The test confirms thatNone
values can be transformed to actual values and back again.fiftyone/core/stages.py (2)
4257-4382
: Well-implemented generalization of MapLabels for arbitrary field valuesThe
MapValues
stage elegantly extends the functionality of the existingMapLabels
stage by allowing value mapping for any field type rather than just label fields. The implementation follows consistent patterns with other view stages in the file, with proper handling for list fields, frames, and group slices. The documentation examples clearly demonstrate the flexibility of this new stage for mapping values in both top-level and embedded fields.
9031-9031
: Correctly registered MapValues in available stages listThe new
MapValues
stage has been properly registered in the_STAGES
list, making it available for use within the FiftyOne framework.docs/source/user_guide/using_views.rst (1)
2421-2422
: Update Example for Renaming Labels withmap_values()
The example now correctly demonstrates how to use
map_values()
to rename labels (e.g., mapping a group of animal names to a single category). This usage aligns well with the new API. Please verify that the resulting behavior (e.g. the final count of labels) matches the intended semantics of the new transformation.docs/source/user_guide/using_aggregations.rst (1)
822-829
: Validate Example for Normalized Label AggregationThe updated example that maps
"cat"
and"dog"
to"pet"
usingmap_values()
is clear and aligns with the intended usage for aggregating transformed data. Please check that the resulting histogram (i.e. the count for"pet"
) reflects the proper grouping and that it is fully consistent with the new API's behavior.plugins/operators/__init__.py (3)
132-139
: Class definition and operator config look good
The class nameEditFieldValues
is consistent with the operator’s purpose, and the dynamic config appropriately handles runtime form generation.
141-148
: No issues with the input resolution logic
The method cleanly delegates form construction to_edit_field_values_inputs
. The returnedtypes.Property
definition is consistent with other operators in this file.
2599-2599
: Operator registration is correct
RegisteringEditFieldValues
ensures that the new operator is available. This finalizes integration with the rest of the plugin system.
def _make_parse_field_fcn(ctx, path): | ||
field = ctx.dataset.get_field(path) | ||
|
||
if isinstance(field, fo.StringField): | ||
return lambda v: str(v) if v != "None" else None | ||
elif isinstance(field, fo.BooleanField): | ||
return lambda v: v == "True" if v != "None" else None | ||
elif isinstance(field, fo.ObjectIdField): | ||
return lambda v: ObjectId(v) if v != "None" else None | ||
elif isinstance(field, fo.IntField): | ||
return lambda v: int(v) if v != "None" else None | ||
else: | ||
return lambda v: v | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Gracefully handle potential conversion errors in _make_parse_field_fcn
A user providing an invalid int or object ID will trigger a runtime exception. Consider wrapping these conversions in a try/except block to provide meaningful feedback.
def execute(self, ctx): | ||
path = ctx.params["path"] | ||
map = ctx.params["map"] | ||
target = ctx.params.get("target", None) | ||
|
||
target_view = _get_target_view(ctx, target) | ||
|
||
f = _make_parse_field_fcn(ctx, path) | ||
|
||
_map = {} | ||
for d in map: | ||
current = d["current"] | ||
new = d["new"] | ||
|
||
if not isinstance(current, list): | ||
current = [current] | ||
|
||
for c in current: | ||
_map[f(c)] = f(new) | ||
|
||
target_view.map_values(path, _map).save() | ||
ctx.trigger("reload_dataset") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation and exception handling for missing or invalid mapping entries
While iterating over the map
objects, the code assumes "current"
and "new"
are always present. A missing key or invalid data could cause unhandled errors. Additionally, any conversion errors from _make_parse_field_fcn
(e.g., casting to int) will raise exceptions without being caught.
Here’s a possible approach to safeguard the loop:
def execute(self, ctx):
path = ctx.params["path"]
map_vals = ctx.params["map"]
target = ctx.params.get("target", None)
target_view = _get_target_view(ctx, target)
f = _make_parse_field_fcn(ctx, path)
_map = {}
for d in map_vals:
+ if "current" not in d or "new" not in d:
+ # Optionally handle or skip invalid entries
+ continue
current = d["current"]
new = d["new"]
if not isinstance(current, list):
current = [current]
for c in current:
+ try:
_map[f(c)] = f(new)
+ except (ValueError, TypeError):
+ # Handle conversion errors
+ continue
target_view.map_values(path, _map).save()
ctx.trigger("reload_dataset")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def execute(self, ctx): | |
path = ctx.params["path"] | |
map = ctx.params["map"] | |
target = ctx.params.get("target", None) | |
target_view = _get_target_view(ctx, target) | |
f = _make_parse_field_fcn(ctx, path) | |
_map = {} | |
for d in map: | |
current = d["current"] | |
new = d["new"] | |
if not isinstance(current, list): | |
current = [current] | |
for c in current: | |
_map[f(c)] = f(new) | |
target_view.map_values(path, _map).save() | |
ctx.trigger("reload_dataset") | |
def execute(self, ctx): | |
path = ctx.params["path"] | |
map_vals = ctx.params["map"] | |
target = ctx.params.get("target", None) | |
target_view = _get_target_view(ctx, target) | |
f = _make_parse_field_fcn(ctx, path) | |
_map = {} | |
for d in map_vals: | |
if "current" not in d or "new" not in d: | |
# Optionally handle or skip invalid entries | |
continue | |
current = d["current"] | |
new = d["new"] | |
if not isinstance(current, list): | |
current = [current] | |
for c in current: | |
try: | |
_map[f(c)] = f(new) | |
except (ValueError, TypeError): | |
# Handle conversion errors | |
continue | |
target_view.map_values(path, _map).save() | |
ctx.trigger("reload_dataset") |
Change log
map_values()
view stage that generalizesmap_labels()
to any field or embedded fieldedit_field_values
operator that allows for editing field values from the AppViewExpression.map_values(mapping)
now supportsmapping
dict withNone
keysExample usage
map_values()
edit_field_values
edit-field-values.mov
Summary by CodeRabbit
New Features
Documentation