Eliminating Manual Class Registration in Unitxt with Import Paths #1575

Open
elronbandel opened this issue Feb 4, 2025 · 0 comments

Problem Statement

In Unitxt, every artifact in the catalog includes a __type__ field in its JSON representation. This field stores the class that was used to instantiate the artifact, which is necessary for loading it back into a Python instance.

Currently, Unitxt relies on a class registry that maps a prettified class name to its actual class. The __type__ field stores the prettified name, and when an artifact is loaded, this name is used to look up the original class in the registry.
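To make the mechanism concrete, here is a minimal sketch of a registry-based loader. It is not Unitxt's actual implementation: the names `_REGISTRY`, `prettify`, `register`, and `load` are hypothetical, and the snake_case form of the prettified name is an assumption made for illustration.

```python
import re

# Hypothetical stand-in for a class registry; not Unitxt's actual code.
_REGISTRY = {}


def prettify(class_name):
    """Convert CamelCase to snake_case (assumed prettified form)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def register(cls):
    """Class decorator that adds the class to the registry under its prettified name."""
    _REGISTRY[prettify(cls.__name__)] = cls
    return cls


def load(artifact_dict):
    """Instantiate an artifact from its JSON-like dict using the __type__ field."""
    data = dict(artifact_dict)
    type_name = data.pop("__type__")
    if type_name not in _REGISTRY:
        # This is the failure users hit when the defining module was never imported.
        raise KeyError(f"'{type_name}' is not registered; was its module imported?")
    return _REGISTRY[type_name](**data)


@register
class MyOperator:
    def __init__(self, field):
        self.field = field


op = load({"__type__": "my_operator", "field": "text"})
```

The lookup succeeds only because `MyOperator` was defined, and therefore registered, before `load` ran. If the class lived in a module that the calling code never imported, the same call would fail with the `KeyError` above.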

However, this approach introduces several challenges:

  1. Manual Class Registration – Any class that might appear in the catalog must be registered in advance.
  2. Import Dependencies – Users must explicitly import all custom classes used in the catalog within any code accessing it. This can be difficult to debug and communicate to users.
  3. Ongoing Maintenance – Users frequently run into these registration and import errors and must keep their own workarounds up to date by hand.

Proposed Solution

Instead of storing a prettified name, we propose changing the __type__ field to store:

  • A full import path (e.g., "unitxt.loaders.LoadHF") for globally available classes.
  • A relative import path (e.g., ".MyOperator"), resolved against a registered folder, for classes defined in local project code.

By default, the current working directory will be automatically registered, making the system more intuitive for small projects running locally.
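One way the lookup could work is sketched below with the standard `importlib` module. The function `resolve_type` and its `registered_dirs` parameter are hypothetical names, not part of Unitxt. The issue does not specify how a relative path maps to a file, so this sketch assumes the form ".module.ClassName", where `module.py` lives inside a registered folder.

```python
import importlib
import importlib.util
import os


def resolve_type(type_path, registered_dirs=None):
    """Resolve a __type__ string to a class (hypothetical helper)."""
    if registered_dirs is None:
        # Mirrors the proposal: the current working directory is registered by default.
        registered_dirs = [os.getcwd()]

    if not type_path.startswith("."):
        # Full import path, e.g. "package.module.ClassName".
        module_name, _, class_name = type_path.rpartition(".")
        module = importlib.import_module(module_name)
        return getattr(module, class_name)

    # Relative path. Assumption: ".module.ClassName" names a file module.py
    # located in one of the registered folders.
    module_part, _, class_name = type_path[1:].rpartition(".")
    if not module_part:
        raise ValueError(f"No module component in relative path {type_path!r}")
    rel_file = os.path.join(*module_part.split(".")) + ".py"
    for folder in registered_dirs:
        candidate = os.path.join(folder, rel_file)
        if os.path.isfile(candidate):
            spec = importlib.util.spec_from_file_location(module_part, candidate)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            return getattr(module, class_name)
    raise ImportError(f"{type_path!r} not found in any of {registered_dirs}")


# A full path needs no prior registration or import by the caller.
ordered_dict_cls = resolve_type("collections.OrderedDict")
```

With this approach the loader imports the defining module on demand, so the caller no longer has to import or register the class beforehand. A real implementation would also need to decide how to treat a bare ".MyOperator" with no module component, which this sketch rejects.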

Benefits of the Proposed Change

  1. No More Manual Class Registration – Libraries using Unitxt will no longer need to register their classes manually.
  2. Improved Usability for Small Projects – Projects operating within a single working directory will work seamlessly using relative imports.
  3. Support for Larger Projects – Projects without a formal package structure can register their main directories and use relative imports.

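This change will make Unitxt more user-friendly and reduce setup complexity. It should also make loading failures easier to diagnose, since the __type__ value names the exact import path that could not be resolved.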
