Problem Statement
In Unitxt, every artifact in the catalog includes a __type__ field in its JSON representation. This field stores the class that was used to instantiate the artifact, which is necessary for loading it back into a Python instance.
Currently, Unitxt relies on a class registry that maps a prettified class name to its actual class. The __type__ field stores the prettified name, and when an artifact is loaded, this name is used to look up the original class in the registry.
However, this approach introduces several challenges:
Manual Class Registration – Any class that might appear in the catalog must be registered in advance.
Import Dependencies – Users must explicitly import all custom classes used in the catalog within any code accessing it. This can be difficult to debug and communicate to users.
Ongoing Maintenance – Users run into these registration and import problems frequently and must keep their own workarounds up to date.
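To make the failure mode concrete, here is a minimal sketch of a registry-based loader. It is not Unitxt's actual implementation: `CLASS_REGISTRY`, `register`, `prettify`, `load_artifact`, and `MyOperator` are hypothetical names, and the snake_case form of the prettified name is an assumption made for illustration.

```python
import json
import re

# Hypothetical registry mapping a prettified name to a class.
CLASS_REGISTRY = {}


def prettify(class_name):
    """Assumed prettification: CamelCase -> snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def register(cls):
    """Class decorator that adds the class to the registry."""
    CLASS_REGISTRY[prettify(cls.__name__)] = cls
    return cls


def load_artifact(json_text):
    """Rebuild an instance from JSON by looking up __type__ in the registry."""
    data = json.loads(json_text)
    type_name = data.pop("__type__")
    if type_name not in CLASS_REGISTRY:
        # This is the error users hit when the defining module was never imported.
        raise KeyError(
            f"'{type_name}' is not registered; was its module imported?"
        )
    return CLASS_REGISTRY[type_name](**data)


@register
class MyOperator:
    def __init__(self, factor=1):
        self.factor = factor


operator = load_artifact('{"__type__": "my_operator", "factor": 3}')
```

The lookup only succeeds because `MyOperator` was defined, and therefore registered, before `load_artifact` ran. If the class lived in a module that the calling code never imported, the same JSON would fail with the `KeyError` above, which is the import dependency described in the list.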
Proposed Solution
Instead of storing a prettified name, we propose changing the __type__ field to store:
A full import path (e.g., "unitxt.loaders.LoadHF") for globally available classes.
A relative import path (e.g., ".MyOperator"), resolved against a registered folder.
By default, the current working directory will be automatically registered, making the system more intuitive for small projects running locally.
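As a rough illustration of the full-import-path case, the sketch below resolves a __type__ value with the standard library's `importlib`. The function names `resolve_type` and `load_artifact` are hypothetical, and `fractions.Fraction` stands in for a catalog class so the example runs without Unitxt installed. Relative paths are left out, since the proposal does not spell out how a path such as ".MyOperator" maps onto files in a registered folder.

```python
import importlib


def resolve_type(type_path):
    """Return the class named by a full import path such as 'pkg.module.ClassName'."""
    module_path, _, class_name = type_path.rpartition(".")
    if not module_path:
        raise ValueError(f"'{type_path}' is not a full import path")
    # Importing here means the caller never has to import the module in advance.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def load_artifact(data):
    """Instantiate the class named in __type__ with the remaining fields."""
    fields = dict(data)  # copy so the caller's dict is not modified
    cls = resolve_type(fields.pop("__type__"))
    return cls(**fields)


# Stand-in artifact; a real catalog entry would name a Unitxt class instead.
artifact = load_artifact(
    {"__type__": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
```

Because the module is imported on demand, no registry and no up-front import are needed. A wrong path surfaces as a standard `ModuleNotFoundError` or `AttributeError` that names the missing module or class.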
Benefits of the Proposed Change
No More Manual Class Registration – Libraries using Unitxt will no longer need to register their classes manually.
Improved Usability for Small Projects – Projects operating within a single working directory will work seamlessly using relative imports.
Support for Larger Projects – Projects without a formal package structure can register their main directories and use relative imports.
This change will make Unitxt more user-friendly, reduce setup complexity, and improve error handling.