
Conversation

Member

@kaxil kaxil commented Oct 31, 2025

When updating dynamic DAGs (those without task instances), Airflow was loading the entire SerializedDagModel object from the database, which could contain megabytes of JSON data, just to update a few fields. This was completely unnecessary.

This change replaces the object-loading approach with a direct SQL UPDATE statement, significantly improving performance for deployments with large or frequently-changing dynamic DAGs.

The optimization uses SQLAlchemy's update() construct to modify only the necessary columns (_data, _data_compressed, dag_hash) without fetching the existing row, reducing both database load and network transfer.
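The pattern described above can be sketched as follows. This is a minimal stand-in, not Airflow's actual code: the table setup is hypothetical, though the column names (`_data`, `_data_compressed`, `dag_hash`) mirror the ones named in the PR description.

```python
# Sketch of a direct UPDATE via SQLAlchemy's update() construct,
# touching only the changed columns instead of loading the full row.
# The model here is a hypothetical stand-in for SerializedDagModel.
from sqlalchemy import create_engine, update, Column, String, LargeBinary
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class SerializedDagModel(Base):
    __tablename__ = "serialized_dag"
    dag_id = Column(String, primary_key=True)
    _data = Column(String)                          # large serialized JSON payload
    _data_compressed = Column(LargeBinary, nullable=True)
    dag_hash = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(SerializedDagModel(dag_id="d1", _data="{}", dag_hash="old"))
    session.commit()

    # Direct UPDATE: only the new column values travel over the wire;
    # the existing (potentially megabytes-large) row is never fetched.
    session.execute(
        update(SerializedDagModel)
        .where(SerializedDagModel.dag_id == "d1")
        .values(_data='{"tasks": []}', _data_compressed=None, dag_hash="new")
    )
    session.commit()
```

Compared with loading the ORM object and mutating its attributes, this emits a single UPDATE statement and skips the SELECT entirely.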



Additionally, removed an unnecessary session.merge() call on dag_version,
as the object is already tracked by the session after being loaded.
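The point about session.merge() being redundant can be shown in a small sketch. The model below is a hypothetical stand-in, not Airflow's real dag_version model: any object loaded through a Session is already in its identity map, so subsequent changes are flushed on commit without a merge() call.

```python
# Sketch: an object loaded through a Session is already tracked, so
# session.merge() on it does redundant work. Hypothetical model.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class DagVersion(Base):
    __tablename__ = "dag_version"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(DagVersion(id=1, name="v1"))
    session.commit()

    dag_version = session.get(DagVersion, 1)  # loaded -> tracked
    assert dag_version in session             # already in the identity map
    dag_version.name = "v2"                   # dirty state is tracked
    session.commit()                          # flushed without merge()
```

merge() is only needed for detached objects (e.g. ones loaded in a different session); calling it on an attached instance just re-fetches and re-copies state.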
@kaxil kaxil added this to the Airflow 3.1.2 milestone Oct 31, 2025
@kaxil kaxil requested review from XD-DENG and ashb as code owners October 31, 2025 01:06
@kaxil kaxil added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Oct 31, 2025
@kaxil kaxil requested review from ephraimbuddy and removed request for XD-DENG and ashb October 31, 2025 01:06
@kaxil kaxil requested review from jscheffl and tirkarthi October 31, 2025 01:06
Contributor

@tirkarthi tirkarthi left a comment


Thanks @kaxil. We also had odd crashes in this code path where the latest serialized DAG was None while a DAG version was present, causing an AttributeError, but I couldn't reproduce it consistently.

@kaxil kaxil merged commit 27c9b94 into apache:main Oct 31, 2025
64 checks passed
@kaxil kaxil deleted the fix-serialized-dag-n-plus-1 branch October 31, 2025 11:00
@github-actions

Backport failed to create: v3-1-test. See the failure log for run details.

You can attempt to backport this manually by running:

cherry_picker 27c9b94 v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in a conflicted state, marking the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

kaxil added a commit that referenced this pull request Oct 31, 2025
…57592)


(cherry picked from commit 27c9b94)
Contributor

@jscheffl jscheffl left a comment


Cool!


Labels

area:serialization backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch
