Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use set instead of list for dags' tags #41695

Merged
merged 29 commits into from
Sep 21, 2024
Merged

Conversation

Avihais12344
Copy link
Contributor

@Avihais12344 Avihais12344 commented Aug 23, 2024

Closes issue 41420.

Thank you for viewing this PR :)


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented Aug 23, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@shahar1 shahar1 added airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:core labels Aug 23, 2024
@shahar1 shahar1 added this to the Airflow 3.0.0 milestone Aug 23, 2024
Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I agree with the idea that set is more appropriate in this case, as tags should be unique. I'd be happy for more opinions whether it's worth making it as a breaking chage.

airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup! Especially as is is "non-breaking" for existing DAGs, so no effect on DAG authors but better data types. Just small proposals to pin the typing to strings

airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests fail, otherwise OK for me. Small nit as comments

newsfragments/41420.significant.rst Outdated Show resolved Hide resolved
newsfragments/41420.significant.rst Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
@uranusjr
Copy link
Member

uranusjr commented Sep 11, 2024

I do not like this, but feel free to merge this without me if others feel more strongly this i beneficial.

…it raises the error: Subscripted generics cannot be used with class and instance checks.
@Avihais12344
Copy link
Contributor Author

Avihais12344 commented Sep 12, 2024

@uranusjr why you don't like this?

@uranusjr
Copy link
Member

I mostly outlined the reasons above, but to summarise, you want to keep this mutable so you can call add after a DAG is created, but by changing this from a list to set, you are also breaking everyone that’s already adding tags after a DAG is created. The two things you are trying to do (use set instead of list, and keep the attribute mutable) have directly conflicted reasoning. IMO it is simply not worthwhile to break everyone so you can more easily handle duplicated tags. You can just do that easily in your own code.

With that said, 3.0 is the exact time to break everyone, so I am still open for this to be merged if people feel fine about it. But I will not approve nor merge this change myself.

@jscheffl
Copy link
Contributor

I mostly outlined the reasons above, but to summarise, you want to keep this mutable so you can call add after a DAG is created, but by changing this from a list to set, you are also breaking everyone that’s already adding tags after a DAG is created. The two things you are trying to do (use set instead of list, and keep the attribute mutable) have directly conflicted reasoning. IMO it is simply not worthwhile to break everyone so you can more easily handle duplicated tags. You can just do that easily in your own code.

With that said, 3.0 is the exact time to break everyone, so I am still open for this to be merged if people feel fine about it. But I will not approve nor merge this change myself.

Would it make you more happy if we keep the interface as a list (so: non breaking) but during setting/init it is temporarily converted to a set to ensure no duplicates are in the list?

@Avihais12344
Copy link
Contributor Author

@uranusjr I am sorry to hear you don't like my change. I think the solution of @jscheffl is reasonable to handle your conflict, even though I think we should break the interface to a set. Due to the release of Airflow 3, it's a great time (that may not come back) to break interfaces.

But can you (@uranusjr) at least close your conversations? As it blockes the merge request (that was approved by other people).
Thank you for your understanding.

@romsharon98 romsharon98 merged commit bdab7dc into apache:main Sep 21, 2024
51 checks passed
Copy link

boring-cyborg bot commented Sep 21, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
* Started working on dag tags, moved the tags to set, and added atest to check duplications.

* Fixed more areas of the c'tor.

* Fixed test of dag tags.

* Added a check to see if the tags are mutable.

* Added newsfragment.

* Removed unecessary check.

* Removed str specification at the type, for compatability with python 3.8.

* Removed more type specification as part of compatability with python 3.8

* Fixed the newsfragment.

* Added missing word.

* Used `` for code segemnts at the rst file.

* Reformatted the file.

* Fixed wrong method for adding tag.

* Added type hinting at the dag bag.

* Deserialized the tags to set.

* Adjusted the tests for the set type.

* Added type hinting.

* Sorting the tags by name.

* Changed to typing.

* Update newsfragments/41420.significant.rst

Co-authored-by: Jens Scheffler <[email protected]>

* Update newsfragments/41420.significant.rst

Co-authored-by: Jens Scheffler <[email protected]>

* Removed the generic specification at the dag args expected types, as it raises the error: Subscripted generics cannot be used with class and instance checks.

* Added tags to the expected serialized DAG.

* Added sorting the tags keys by the name key.

* Fixed sorting tags by name to use `sorted` instead of `.sort`

* Fixed tags comparesion, as it's now a set, and not a list.

---------

Co-authored-by: Jens Scheffler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants