Optimize dynamic DAG updates to avoid loading large serialized DAGs #57592
Conversation
When updating dynamic DAGs (those without task instances), Airflow was loading the entire `SerializedDagModel` object from the database, which could contain megabytes of JSON data, just to update a few fields, which was completely unnecessary. This change replaces the object-loading approach with a direct SQL UPDATE statement, significantly improving performance for deployments with large or frequently changing dynamic DAGs.

The optimization uses SQLAlchemy's `update()` construct to modify only the necessary columns (`_data`, `_data_compressed`, `dag_hash`) without fetching the existing row, reducing both database load and network transfer. It also removes an unnecessary `session.merge()` call on `dag_version`, as that object is already tracked by the session after being loaded.
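For illustration, here is a minimal, self-contained sketch of the pattern the description names, using SQLAlchemy's `update()` construct. The model below is a stand-in trimmed to the columns mentioned above, not Airflow's actual `SerializedDagModel` definition, and the helper function is hypothetical:

```python
from sqlalchemy import Column, LargeBinary, String, Text, update
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SerializedDagModel(Base):
    """Stand-in for Airflow's model, trimmed to the columns named above."""

    __tablename__ = "serialized_dag"

    dag_id = Column(String(250), primary_key=True)
    _data = Column("data", Text)
    _data_compressed = Column("data_compressed", LargeBinary)
    dag_hash = Column(String(32))


def write_serialized_dag(
    session: Session,
    dag_id: str,
    data: str,
    data_compressed: bytes | None,
    dag_hash: str,
) -> None:
    # Instead of session.get(SerializedDagModel, dag_id), which would pull the
    # whole serialized blob (potentially megabytes) over the wire just to
    # overwrite it, emit a single UPDATE touching only the changed columns.
    session.execute(
        update(SerializedDagModel)
        .where(SerializedDagModel.dag_id == dag_id)
        .values(_data=data, _data_compressed=data_compressed, dag_hash=dag_hash)
    )
```

Because the UPDATE never reads the existing row, the old multi-megabyte payload never leaves the database.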
tirkarthi
left a comment
Thanks @kaxil. We also had odd crashes in this code path where the latest serialised DAG was None even though a dag version was present, causing an AttributeError, but I couldn't reproduce it consistently.
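As an illustration only, the crash described would come from an unguarded dereference of that lookup result. A hedged sketch of the guard, reusing the stand-in model above (the call site is hypothetical, not Airflow's real code path):

```python
import logging

log = logging.getLogger(__name__)


def refresh_serialized_dag(session: Session, dag_id: str) -> None:
    # Hypothetical scenario: a dag version row exists, but the latest
    # serialized DAG lookup returns None, so any attribute access on the
    # result (e.g. latest.dag_hash) raises AttributeError.
    latest = session.get(SerializedDagModel, dag_id)
    if latest is None:
        log.warning("dag version present but no serialized DAG row for %s", dag_id)
        return
    # ... safe to use latest.dag_hash etc. from here on
```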
Backport failed to create: v3-1-test.
You can attempt to backport this manually by running:
`cherry_picker 27c9b94 v3-1-test`
This should apply the commit to the v3-1-test branch and leave the commit in conflict state, marking the files that need manual conflict resolution. After you have resolved the conflicts, you can continue the backport process by running:
`cherry_picker --continue`
Optimize dynamic DAG updates to avoid loading large serialized DAGs (#57592) (cherry picked from commit 27c9b94)
jscheffl
left a comment
Cool!