Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mmm-18320] operator create registered model registered model version #136

Conversation

paulsliusar
Copy link
Contributor

This repository is public. Do not put here any private DataRobot or customer's data: code, datasets, model artifacts, .etc.

Summary

This operator is created to cover functionality of creating a registered model version for several different model types

  • Leaderboard Model
  • Custom Model
  • External Model
    for detailed description you can follow update in readme file and for more details on client side

Rationale

@engprod-2
Copy link

engprod-2 bot commented Feb 5, 2025

The Needs Review labels were added based on the following file changes.

Team @datarobot/machine-learning-development (#machine-learning-dev) was assigned because of changes in files:

README.md
datarobot_provider/operators/model_registry.py
tests/unit/operators/test_model_registry.py

If you think that there are some issues with ownership, please discuss with C&A domain at #sdtk slack channel and create PR to update DRCODEOWNERS\CODEOWNERS file.

Copy link

@Zotkin Zotkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the comments regarding registered model name/id (The confusion is probably coming from the bug in the documentation example code that needs to be fixed).
Additional question: all model package creation endpoints accept more parameters than implemented here (for example, external models allow for training data assignment when creation model package). Is this something you are planning to address in the future?

README.md Outdated
| `model_id` | str | (Required for leaderboard) The ID of the leaderboard model. |
| `registered_model_name` | str | (Required for leaderboard) Name of the registered model. |
| `custom_model_version_id` | str | (Required for custom) The ID of the custom model version. |
| `registered_model_id` | str | (Required for external) The ID of the registered model. |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic and text description regarding the registerd_model_name and registered_model_id are incorrect. Those parameters are mutually exclusive and do not depend on underlying model type.
When providing registered model name you are saying: please take this model, create a model package (registered model version) and assign it to new registered model with given name.
When providing registered_model_id you are assigning newly created model package to the existing registered model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right will correct this, as for other parameters that are optional will extend methods to pass them as kwargs, and update readme file

README.md Outdated
| `model_type` | str | Type of model version to create (leaderboard, custom, or external).|
| `name` | str | Name of the registered model version.|
| `model_id` | str | (Required for leaderboard) The ID of the leaderboard model. |
| `registered_model_name` | str | (Required for leaderboard) Name of the registered model. |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter has nothing to do with leaderboard or not.

self.log.info(f"Successfully created model version: {version.id}")
return version.id

def create_for_leaderboard(self, model_name):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the above comments, this should probably be reworked with regards to registered_model_name/id

Copy link
Contributor

@tsdaemon tsdaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note to CHANGES.
Also, I believe there is no need to write about every operator in README. @mjnitz02 prepared autogenerated documentation, and we can release it with the first new versionn.

Comment on lines 32 to 34
template_fields_renderers: dict[str, str] = {}
template_ext: Sequence[str] = ()
ui_color = "#f4a460"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to remove this boiler plate code thanks to using the BaseDatarobotOperator 😎

Suggested change
template_fields_renderers: dict[str, str] = {}
template_ext: Sequence[str] = ()
ui_color = "#f4a460"

Comment on lines 40 to 45
datarobot_conn_id: str = "datarobot_default",
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
self.model_version_params = model_version_params
self.datarobot_conn_id = datarobot_conn_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with some boiler plate code - it could be useful to check against this PR's changes for examples:
https://github.com/datarobot/airflow-provider-datarobot/pull/135/files

Suggested change
datarobot_conn_id: str = "datarobot_default",
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
self.model_version_params = model_version_params
self.datarobot_conn_id = datarobot_conn_id
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
self.model_version_params = model_version_params


def execute(self, context: Context) -> str:
"""Executes the operator to create a registered model version."""
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also taken care of in the newer base class 👍

Suggested change
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run()

Copy link
Collaborator

@andrius-senulis andrius-senulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


def execute(self, context: Context) -> str:
"""Executes the operator to create a registered model version."""
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens in the base operator

Comment on lines 52 to 58
if not model_type:
raise ValueError("'model_type' must be specified in model_version_params.")

try:
model_type_enum = ModelType(model_type)
except ValueError:
raise AirflowException(f"Invalid model_type: {model_type}") from None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move these lines into the validate() method now provided by the newer base class?

I think lines should work as is as long as you alter:
if not model_type:
to be:
if not self.model_version_params.get("model_type"):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this new file is missing a copyright - I had assumed each new file should have that.

CC @mjnitz02 maybe we should be sure (assuming we need copyrights) that we're running a copyright checker as part of PR checks?

- External Model

:param model_version_params: Dictionary with parameters for creating model version.
:param datarobot_conn_id: Airflow connection ID for DataRobot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, probably last comment related to the new base class, but I think we can also remove this.

Suggested change
:param datarobot_conn_id: Airflow connection ID for DataRobot.

@paulsliusar paulsliusar requested a review from Zotkin February 7, 2025 11:29
Copy link

@Zotkin Zotkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in terms of implementation. Given that this PR includes a lot of customer facing strings, I'd suggest getting a review from @n-aylward .

@paulsliusar paulsliusar merged commit a46e327 into main Feb 10, 2025
4 checks passed
@svc-engprod-git1 svc-engprod-git1 deleted the paulsliusar/MMM-18320-operator-create-registered-model-registered-model-version branch February 10, 2025 14:35
@n-aylward
Copy link
Contributor

n-aylward commented Feb 10, 2025

@Zotkin @paulsliusar Apologies. I was out this morning for a doctor's appointment. Since this is already merged, perhaps we should just have someone from the doc team (potentially me) review the new content in the course of the documentation work for DOC-7718.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants