Refactor _register_dataset_changes #42343
Open
uranusjr wants to merge 9 commits into apache:main from astronomer:refactor-register-dataset-changes
+146 −67
boring-cyborg bot added the area:Scheduler label (including HA (high availability) scheduler) on Sep 19, 2024
uranusjr force-pushed the refactor-register-dataset-changes branch from 9f2101b to 24e7777 on September 19, 2024 10:38
Instead of fetching DatasetModel one by one, do a bulk fetch into a dict to save roundtrips to the database.
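The bulk-fetch idea can be illustrated with a small, self-contained sketch. It uses the standard-library sqlite3 module and a hypothetical `dataset` table with `id` and `uri` columns standing in for the DatasetModel table; it is not Airflow's actual code, which goes through SQLAlchemy. One `IN (...)` query replaces N single-row lookups, and the rows are indexed by URI in a dict so later lookups stay in memory.

```python
import sqlite3


def fetch_datasets_by_uri(conn: sqlite3.Connection, uris: list[str]) -> dict[str, int]:
    """Return {uri: id} for every requested URI that exists, using one query.

    Hypothetical helper: the `dataset` table and its columns are assumptions.
    """
    if not uris:
        return {}
    # Only "?" placeholders are interpolated; the values are bound safely.
    placeholders = ", ".join("?" for _ in uris)
    rows = conn.execute(
        f"SELECT uri, id FROM dataset WHERE uri IN ({placeholders})",
        list(uris),
    ).fetchall()
    # A single round trip; the dict makes each later lookup local and O(1).
    return {uri: dataset_id for uri, dataset_id in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO dataset (uri) VALUES (?)",
    [("s3://a",), ("s3://b",), ("s3://c",)],
)

found = fetch_datasets_by_uri(conn, ["s3://a", "s3://c", "s3://missing"])
print(found)  # {'s3://a': 1, 's3://c': 3}
```

URIs that are not in the table simply do not appear in the dict, so the caller can decide separately what to do about them.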
Prior to this commit, we already created DatasetModel rows only inside the manager. This commit also changes DatasetAliasModel so that it is only created inside create_dataset_aliases, and only associated with DatasetEvent in register_dataset_change. All the dataset manager functions are also changed to only accept public-facing dataset classes, instead of ORM models. The register_dataset_change function now takes an additional keyword argument 'aliases', a list of dataset aliases to associate with the DatasetEvent to be created.
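A minimal sketch of the "create alias rows only inside the manager" rule, assuming a simplified in-memory store rather than Airflow's ORM. `DatasetAlias` here is a stand-in dataclass for the public-facing class, and `AliasRow` and `AliasStore` are hypothetical names. The manager function accepts only public objects, looks at all requested names together, and creates rows only for names that are missing, so callers never construct alias rows themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class DatasetAlias:
    """Stand-in for the public-facing alias class (assumption)."""
    name: str


@dataclass
class AliasRow:
    """Hypothetical stand-in for an ORM row such as DatasetAliasModel."""
    id: int
    name: str


@dataclass
class AliasStore:
    """Hypothetical in-memory table keyed by alias name."""
    rows: dict[str, AliasRow] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create_dataset_aliases(self, aliases: list[DatasetAlias]) -> list[AliasRow]:
        """Create rows only for aliases not stored yet; return rows for all requested."""
        # Only public-facing objects are accepted, never ORM-like rows.
        for alias in aliases:
            if not isinstance(alias, DatasetAlias):
                raise TypeError(f"expected DatasetAlias, got {type(alias).__name__}")
        # Compare the whole requested set against existing names in one step.
        wanted = {alias.name for alias in aliases}
        missing = sorted(wanted - set(self.rows))
        for name in missing:
            self.rows[name] = AliasRow(id=next(self._ids), name=name)
        return [self.rows[alias.name] for alias in aliases]


store = AliasStore()
first = store.create_dataset_aliases([DatasetAlias("x"), DatasetAlias("y")])
again = store.create_dataset_aliases([DatasetAlias("y"), DatasetAlias("z")])
print([r.id for r in first], [r.id for r in again])  # [1, 2] [2, 3]
```

The second call reuses the existing row for `y` and creates only `z`, which is the property that keeps row creation centralized.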
uranusjr force-pushed the refactor-register-dataset-changes branch from 24e7777 to a106a8a on September 20, 2024 06:46
uranusjr requested review from potiuk, jedcunningham, ephraimbuddy, kaxil, XD-DENG and ashb as code owners on September 20, 2024 07:28
uranusjr commented on Sep 20, 2024
 self.log.info(
     'Creating event for %r through aliases "%s"',
     dataset_obj,
     ", ".join(alias_names),
 )
 dataset_manager.register_dataset_change(
     task_instance=self,
-    dataset=dataset_obj,
     extra=extra,
+    dataset=dataset_obj.to_public(),
This was a bug: dataset here should be passed a Dataset, not a DatasetModel. The to_public method on DatasetModel was added just for this.
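The fix relies on converting the ORM object to its public counterpart before handing it to the manager. The sketch below shows that pattern with hypothetical stand-ins: a plain `DatasetModel` class plays the role of the ORM model, a frozen `Dataset` dataclass plays the public class, and `register_dataset_change` is a simplified function that rejects anything else. Only `uri` and `extra` fields are assumed; Airflow's real classes and signatures differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Dataset:
    """Stand-in for the public-facing dataset class (assumption)."""
    uri: str
    extra: dict[str, Any] = field(default_factory=dict)


class DatasetModel:
    """Hypothetical stand-in for the ORM row; not Airflow's actual model."""

    def __init__(self, id: int, uri: str, extra: dict[str, Any] | None = None) -> None:
        self.id = id
        self.uri = uri
        self.extra = extra or {}

    def to_public(self) -> Dataset:
        # Copy only public fields; database-only attributes such as `id` are dropped.
        return Dataset(uri=self.uri, extra=dict(self.extra))


def register_dataset_change(*, dataset: Dataset) -> str:
    """Hypothetical manager entry point that only accepts the public class."""
    if not isinstance(dataset, Dataset):
        raise TypeError(f"expected Dataset, got {type(dataset).__name__}")
    return f"event for {dataset.uri}"


row = DatasetModel(id=7, uri="s3://bucket/key", extra={"owner": "team-a"})
print(register_dataset_change(dataset=row.to_public()))  # event for s3://bucket/key
```

Passing `row` directly raises `TypeError` in this sketch, which is the kind of mismatch the review comment describes.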
uranusjr force-pushed the refactor-register-dataset-changes branch from 972b885 to 7475d1b on September 20, 2024 08:36
Depends on #42245. Please review that PR first. This will be rebased when that one is merged.

Rebased. Ready!

With this change, DatasetAliasModel is only created inside DatasetManager, similar to how DatasetModel is handled. The function create_dataset_aliases is added for this. DatasetAlias is also made to only be associated with a DatasetEvent in the manager (in the function register_dataset_change). This function now takes an additional keyword argument, aliases, which is a list of dataset aliases associated with the DatasetEvent to be created.

All the dataset manager functions are also changed to only accept public-facing dataset classes, instead of ORM models.
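To make the new aliases keyword concrete, here is a sketch of a manager that creates one event and associates it with the given aliases in a single call. Everything is an in-memory stand-in: `Dataset`, `DatasetAlias`, `DatasetEvent` and `InMemoryDatasetManager` are hypothetical, and the real register_dataset_change takes further arguments (such as the task instance and session) that are omitted here. The sketch also assumes alias rows must already exist, since in this PR they are created separately by create_dataset_aliases.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Stand-in for the public dataset class (assumption)."""
    uri: str


@dataclass(frozen=True)
class DatasetAlias:
    """Stand-in for the public alias class (assumption)."""
    name: str


@dataclass
class DatasetEvent:
    """Hypothetical event record linked to zero or more alias names."""
    id: int
    dataset_uri: str
    alias_names: list[str] = field(default_factory=list)


@dataclass
class InMemoryDatasetManager:
    """Hypothetical manager: the only place that creates events and alias links."""
    known_aliases: set[str] = field(default_factory=set)
    events: list[DatasetEvent] = field(default_factory=list)

    def register_dataset_change(
        self, *, dataset: Dataset, aliases: list[DatasetAlias] | None = None
    ) -> DatasetEvent:
        # Only public-facing classes are accepted.
        if not isinstance(dataset, Dataset):
            raise TypeError("dataset must be a public Dataset")
        aliases = aliases or []
        # Assumption: aliases must already exist (created by create_dataset_aliases).
        unknown = [a.name for a in aliases if a.name not in self.known_aliases]
        if unknown:
            raise ValueError(f"unknown aliases: {unknown}")
        event = DatasetEvent(id=len(self.events) + 1, dataset_uri=dataset.uri)
        # Associate the new event with every alias it was emitted through.
        event.alias_names.extend(a.name for a in aliases)
        self.events.append(event)
        return event


manager = InMemoryDatasetManager(known_aliases={"daily-report"})
event = manager.register_dataset_change(
    dataset=Dataset("s3://bucket/report.csv"),
    aliases=[DatasetAlias("daily-report")],
)
print(event)
# DatasetEvent(id=1, dataset_uri='s3://bucket/report.csv', alias_names=['daily-report'])
```

Keeping the event creation and the alias association in the same call means callers cannot produce an event whose alias links are set up somewhere else.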