-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DM-17022] Default use case #144
Conversation
8bff6a0
to
561f976
Compare
561f976
to
2f853c9
Compare
The Needs Review labels were added based on the following file changes. Team @datarobot/machine-learning-development (#machine-learning-dev) was assigned because of changes in files:datarobot_provider/example_dags/hospital_readmissions_example.py datarobot_provider/operators/base_datarobot_operator.py datarobot_provider/operators/datarobot.py tests/unit/conftest.py tests/unit/operators/test_ai_catalog.py tests/unit/operators/test_base_datarobot_operator.py tests/unit/operators/test_datarobot.py Team @datarobot/mldata (#mldata) was assigned because of changes in files:datarobot_provider/operators/ai_catalog.py If you think that there are some issues with ownership, please discuss with C&A domain at #sdtk slack channel and create PR to update DRCODEOWNERS\CODEOWNERS file. |
class BaseUseCaseEntityOperator(BaseDatarobotOperator): | ||
"""Base class for all operators which create a Use Case assets, | ||
i.e. dataset, recipe, project, deployment, notebook, application, etc. | ||
|
||
It introduces *use_case_id* template parameter and helper methods | ||
to use a default Use Case if no Use Case was defined in the operator itself or a DAG context. | ||
""" | ||
|
||
@classmethod | ||
def __init_subclass__(cls, **kwargs): | ||
super().__init_subclass__(**kwargs) | ||
if "use_case_id" not in cls.template_fields: | ||
cls.template_fields = (*cls.template_fields, "use_case_id") | ||
cls.logger().info( | ||
"use_case_id is implicitly added into %s template_fields list.", cls.__name__ | ||
) | ||
|
||
def __init__( | ||
self, *, use_case_id: Optional[str] = "{{ params.use_case_id | default('') }}", **kwargs | ||
): | ||
super().__init__(**kwargs) | ||
self.use_case_id = use_case_id | ||
|
||
def get_use_case(self, context: Context, required=False) -> Optional[dr.UseCase]: | ||
"""Get a `dr.UseCase` entity based on self.use_case_id or a default Use Case defined at runtime. | ||
Raises an exception if no Use Case is defined and required=True | ||
""" | ||
if use_case_id := self.get_use_case_id(context, required=required): | ||
return dr.UseCase.get(use_case_id) | ||
|
||
return None | ||
|
||
def get_use_case_id(self, context: Context, required=False) -> Optional[str]: | ||
"""Get self.use_case_id or a default Use Case id defined at runtime. | ||
Raises an exception if no Use Case id is defined and required=True""" | ||
if use_case_id := ( | ||
self.use_case_id or self.xcom_pull(context, key=XCOM_DEFAULT_USE_CASE_ID) | ||
): | ||
return use_case_id | ||
|
||
if required: | ||
raise AirflowException( | ||
f"{self.__class__.__name__} requires a Use Case. Please define one of:\n" | ||
"*use_case_id* parameter in the operator\n" | ||
"*use_case_id* DAG context parameter\n" | ||
"`GetOrCreateUseCaseOperator(..., set_default=True)` as one of the previous DAG tasks" | ||
) | ||
|
||
return None | ||
|
||
def add_into_use_case( | ||
self, | ||
entity: Union[dr.Project, dr.Dataset, dr.models.Recipe, dr.Application], | ||
*, | ||
context: Context, | ||
): | ||
"""Add an *entity* into Use Case defined as self.use_case_id or a default DAG Use Case.""" | ||
if use_case := self.get_use_case(context): | ||
use_case.add(entity) | ||
self.log.info( | ||
'%s is added into use case "%s"', entity.__class__.__name__, use_case.name | ||
) | ||
|
||
else: | ||
self.log.info("The %s won't be added into any use case.", entity.__class__.__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c-h-russell-walker notebooks may find this base class useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Also I appreciate your review/comments on my draft PR - I've been sidelined by Fedramp 10.2 issues though so haven't gotten a chance to fully review/respond yet.
Label Needs Review: MLData was removed because @bdrosen96 is part of Build Data & Features domain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. A few comments and questions as I'm trying to understand everything here.
|
||
return None | ||
|
||
def get_use_case_id(self, context: Context, required=False) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should required be a property of the Operator itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like based on the operator type, required would be pre-determined, or at least defined through the init
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 ways to define a use case: an explicit parameter, a context variable and a default use case defined via xcom.
__init__
can handle only the first option.
Mandatory vs optional use_case_id could be really defined using requires_use_case: bool
class field. At the moment PR uses an explicit argument in get_use_case_id()
method. I'm ok with either option. I could switch to the class field either in this PR or in a followup if you find that option better.
*, | ||
context: Context, | ||
): | ||
"""Add an *entity* into Use Case defined as self.use_case_id or a default DAG Use Case.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this mainly for adding datasets to a use case? I guess these are defined separately and then added to a use case? Sorry this might just be my misunderstanding. I guess like, a project, has to have a use case assigned to it during creation for it to be within the context of a use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the operator implantation. Yes, at the moment we use it for datasets. Nevertheless, it's applicable for projects or deployments, I guess
self, *, use_case_id: Optional[str] = "{{ params.use_case_id | default('') }}", **kwargs | ||
): | ||
super().__init__(**kwargs) | ||
self.use_case_id = use_case_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking maybe this could auto-initialize, but after looking I guess context
only exists in the scope of execute
, so it wouldn't be present. It might be worth considering as some followup if we could sort of autowrap this so that the methods just use use_case_id = self.use_case_id
and its either populated or None
. I'm not 100% sure if thats actually possible to do in the context of how airflow works though. Just wanted to put the idea out there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__init__
is not context-aware by design because operators are initialized much before the execution.
8e7180b
to
5d70621
Compare
This repository is public. Do not put here any private DataRobot or customer's data: code, datasets, model artifacts, .etc.
Summary
BaseUseCaseEntityOperator
classCreateUseCaseOperator
intoGetOrCreateUseCaseOperator
Rationale
There are quite many DataRobot assets which have to belong to a UseCase. Define a default one in the very beginning of your DAG to avoid repetitive
use_case_id
parameters.