-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mmm-18320] operator create registered model registered model version #136
[mmm-18320] operator create registered model registered model version #136
Conversation
The Needs Review labels were added based on the following file changes. Team @datarobot/machine-learning-development (#machine-learning-dev) was assigned because of changes in files:README.md datarobot_provider/operators/model_registry.py tests/unit/operators/test_model_registry.py If you think that there are some issues with ownership, please discuss with C&A domain at #sdtk slack channel and create PR to update DRCODEOWNERS\CODEOWNERS file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the comments regarding registered model name/id (The confusion is probably coming from the bug in the documentation example code that needs to be fixed).
Additional question: all model package creation endpoints accept more parameters than implemented here (for example, external models allow for training data assignment when creation model package). Is this something you are planning to address in the future?
README.md
Outdated
| `model_id` | str | (Required for leaderboard) The ID of the leaderboard model. | | ||
| `registered_model_name` | str | (Required for leaderboard) Name of the registered model. | | ||
| `custom_model_version_id` | str | (Required for custom) The ID of the custom model version. | | ||
| `registered_model_id` | str | (Required for external) The ID of the registered model. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic and text description regarding the registerd_model_name and registered_model_id are incorrect. Those parameters are mutually exclusive and do not depend on underlying model type.
When providing registered model name you are saying: please take this model, create a model package (registered model version) and assign it to new registered model with given name.
When providing registered_model_id you are assigning newly created model package to the existing registered model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right will correct this, as for other parameters that are optional will extend methods to pass them as kwargs, and update readme file
README.md
Outdated
| `model_type` | str | Type of model version to create (leaderboard, custom, or external).| | ||
| `name` | str | Name of the registered model version.| | ||
| `model_id` | str | (Required for leaderboard) The ID of the leaderboard model. | | ||
| `registered_model_name` | str | (Required for leaderboard) Name of the registered model. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter has nothing to do with leaderboard or not.
self.log.info(f"Successfully created model version: {version.id}") | ||
return version.id | ||
|
||
def create_for_leaderboard(self, model_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the above comments, this should probably be reworked with regards to registered_model_name/id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a note to CHANGES.
Also, I believe there is no need to write about every operator in README. @mjnitz02 prepared autogenerated documentation, and we can release it with the first new versionn.
template_fields_renderers: dict[str, str] = {} | ||
template_ext: Sequence[str] = () | ||
ui_color = "#f4a460" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to remove this boiler plate code thanks to using the BaseDatarobotOperator
😎
template_fields_renderers: dict[str, str] = {} | |
template_ext: Sequence[str] = () | |
ui_color = "#f4a460" |
datarobot_conn_id: str = "datarobot_default", | ||
**kwargs: Any, | ||
) -> None: | ||
super().__init__(**kwargs) | ||
self.model_version_params = model_version_params | ||
self.datarobot_conn_id = datarobot_conn_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here with some boiler plate code - it could be useful to check against this PR's changes for examples:
https://github.com/datarobot/airflow-provider-datarobot/pull/135/files
datarobot_conn_id: str = "datarobot_default", | |
**kwargs: Any, | |
) -> None: | |
super().__init__(**kwargs) | |
self.model_version_params = model_version_params | |
self.datarobot_conn_id = datarobot_conn_id | |
**kwargs: Any, | |
) -> None: | |
super().__init__(**kwargs) | |
self.model_version_params = model_version_params |
|
||
def execute(self, context: Context) -> str: | ||
"""Executes the operator to create a registered model version.""" | ||
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also taken care of in the newer base class 👍
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
def execute(self, context: Context) -> str: | ||
"""Executes the operator to create a registered model version.""" | ||
DataRobotHook(datarobot_conn_id=self.datarobot_conn_id).run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens in the base operator
if not model_type: | ||
raise ValueError("'model_type' must be specified in model_version_params.") | ||
|
||
try: | ||
model_type_enum = ModelType(model_type) | ||
except ValueError: | ||
raise AirflowException(f"Invalid model_type: {model_type}") from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move these lines into the validate()
method now provided by the newer base class?
I think lines should work as is as long as you alter:
if not model_type:
to be:
if not self.model_version_params.get("model_type"):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this new file is missing a copyright - I had assumed each new file should have that.
CC @mjnitz02 maybe we should be sure (assuming we need copyrights) that we're running a copyright checker as part of PR checks?
- External Model | ||
|
||
:param model_version_params: Dictionary with parameters for creating model version. | ||
:param datarobot_conn_id: Airflow connection ID for DataRobot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, probably last comment related to the new base class, but I think we can also remove this.
:param datarobot_conn_id: Airflow connection ID for DataRobot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in terms of implementation. Given that this PR includes a lot of customer facing strings, I'd suggest getting a review from @n-aylward .
…ered-model-registered-model-version
@Zotkin @paulsliusar Apologies. I was out this morning for a doctor's appointment. Since this is already merged, perhaps we should just have someone from the doc team (potentially me) review the new content in the course of the documentation work for DOC-7718. |
This repository is public. Do not put here any private DataRobot or customer's data: code, datasets, model artifacts, .etc.
Summary
This operator is created to cover functionality of creating a registered model version for several different model types
for detailed description you can follow update in readme file and for more details on client side
Rationale