Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DM-17022] Default use case #144

Merged
merged 11 commits into from
Feb 10, 2025
Merged

Conversation

demidov91
Copy link
Contributor

@demidov91 demidov91 commented Feb 7, 2025

This repository is public. Do not put here any private DataRobot or customer's data: code, datasets, model artifacts, .etc.

Summary

  • Introduce a notion of a default Use Case for a DAG.
  • Add a BaseUseCaseEntityOperator class
  • Rename CreateUseCaseOperator into GetOrCreateUseCaseOperator

Rationale

There are quite many DataRobot assets which have to belong to a UseCase. Define a default one in the very beginning of your DAG to avoid repetitive use_case_id parameters.

@demidov91 demidov91 force-pushed the dzmitry/DM-17022-default-use-case branch from 8bff6a0 to 561f976 Compare February 7, 2025 14:36
@engprod-2
Copy link

engprod-2 bot commented Feb 7, 2025

The Needs Review labels were added based on the following file changes.

Team @datarobot/machine-learning-development (#machine-learning-dev) was assigned because of changes in files:

datarobot_provider/example_dags/hospital_readmissions_example.py
datarobot_provider/operators/base_datarobot_operator.py
datarobot_provider/operators/datarobot.py
tests/unit/conftest.py
tests/unit/operators/test_ai_catalog.py
tests/unit/operators/test_base_datarobot_operator.py
tests/unit/operators/test_datarobot.py

Team @datarobot/mldata (#mldata) was assigned because of changes in files:

datarobot_provider/operators/ai_catalog.py

If you think that there are some issues with ownership, please discuss with C&A domain at #sdtk slack channel and create PR to update DRCODEOWNERS\CODEOWNERS file.

Comment on lines +48 to +112
class BaseUseCaseEntityOperator(BaseDatarobotOperator):
"""Base class for all operators which create a Use Case assets,
i.e. dataset, recipe, project, deployment, notebook, application, etc.

It introduces *use_case_id* template parameter and helper methods
to use a default Use Case if no Use Case was defined in the operator itself or a DAG context.
"""

@classmethod
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
if "use_case_id" not in cls.template_fields:
cls.template_fields = (*cls.template_fields, "use_case_id")
cls.logger().info(
"use_case_id is implicitly added into %s template_fields list.", cls.__name__
)

def __init__(
self, *, use_case_id: Optional[str] = "{{ params.use_case_id | default('') }}", **kwargs
):
super().__init__(**kwargs)
self.use_case_id = use_case_id

def get_use_case(self, context: Context, required=False) -> Optional[dr.UseCase]:
"""Get a `dr.UseCase` entity based on self.use_case_id or a default Use Case defined at runtime.
Raises an exception if no Use Case is defined and required=True
"""
if use_case_id := self.get_use_case_id(context, required=required):
return dr.UseCase.get(use_case_id)

return None

def get_use_case_id(self, context: Context, required=False) -> Optional[str]:
"""Get self.use_case_id or a default Use Case id defined at runtime.
Raises an exception if no Use Case id is defined and required=True"""
if use_case_id := (
self.use_case_id or self.xcom_pull(context, key=XCOM_DEFAULT_USE_CASE_ID)
):
return use_case_id

if required:
raise AirflowException(
f"{self.__class__.__name__} requires a Use Case. Please define one of:\n"
"*use_case_id* parameter in the operator\n"
"*use_case_id* DAG context parameter\n"
"`GetOrCreateUseCaseOperator(..., set_default=True)` as one of the previous DAG tasks"
)

return None

def add_into_use_case(
self,
entity: Union[dr.Project, dr.Dataset, dr.models.Recipe, dr.Application],
*,
context: Context,
):
"""Add an *entity* into Use Case defined as self.use_case_id or a default DAG Use Case."""
if use_case := self.get_use_case(context):
use_case.add(entity)
self.log.info(
'%s is added into use case "%s"', entity.__class__.__name__, use_case.name
)

else:
self.log.info("The %s won't be added into any use case.", entity.__class__.__name__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c-h-russell-walker notebooks may find this base class useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
Also I appreciate your review/comments on my draft PR - I've been sidelined by Fedramp 10.2 issues though so haven't gotten a chance to fully review/respond yet.

@demidov91 demidov91 requested a review from mjnitz02 February 7, 2025 15:55
@devexp-slackbot
Copy link

Label Needs Review: MLData was removed because @bdrosen96 is part of Build Data & Features domain.

Copy link
Collaborator

@mjnitz02 mjnitz02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. A few comments and questions as I'm trying to understand everything here.


return None

def get_use_case_id(self, context: Context, required=False) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should required be a property of the Operator itself?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like based on the operator type, required would be pre-determined, or at least defined through the init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3 ways to define a use case: an explicit parameter, a context variable and a default use case defined via xcom.

__init__ can handle only the first option.

Mandatory vs optional use_case_id could be really defined using requires_use_case: bool class field. At the moment PR uses an explicit argument in get_use_case_id() method. I'm ok with either option. I could switch to the class field either in this PR or in a followup if you find that option better.

*,
context: Context,
):
"""Add an *entity* into Use Case defined as self.use_case_id or a default DAG Use Case."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this mainly for adding datasets to a use case? I guess these are defined separately and then added to a use case? Sorry this might just be my misunderstanding. I guess like, a project, has to have a use case assigned to it during creation for it to be within the context of a use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the operator implantation. Yes, at the moment we use it for datasets. Nevertheless, it's applicable for projects or deployments, I guess

self, *, use_case_id: Optional[str] = "{{ params.use_case_id | default('') }}", **kwargs
):
super().__init__(**kwargs)
self.use_case_id = use_case_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe this could auto-initialize, but after looking I guess context only exists in the scope of execute, so it wouldn't be present. It might be worth considering as some followup if we could sort of autowrap this so that the methods just use use_case_id = self.use_case_id and its either populated or None. I'm not 100% sure if thats actually possible to do in the context of how airflow works though. Just wanted to put the idea out there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__ is not context-aware by design because operators are initialized much before the execution.

@demidov91 demidov91 force-pushed the dzmitry/DM-17022-default-use-case branch from 8e7180b to 5d70621 Compare February 10, 2025 11:00
@demidov91 demidov91 merged commit b731e66 into main Feb 10, 2025
4 checks passed
@demidov91 demidov91 deleted the dzmitry/DM-17022-default-use-case branch February 10, 2025 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants