Enabling Read Replica and Configuring Webserver to Use a Different DB Host #40549
Replies: 1 comment
-
Not impossible, but it would make deployment configuration much more complex, and we would also have to figure out how to separate "read" access from "write" access. Unfortunately, the webserver does not only read; it also writes, and the two are often mixed in the logic. For example, the FAB views for connections and variables use the FAB mapping to models, and implementing read/write replicas there would require changing FAB. Hopefully we will get rid of that (the FAB model mapping to the UI) eventually in 3.0, but other parts of the Airflow UI also perform a number of writes. Plus, we would need a suite of tests that detects when someone uses the read replica for writing, and likely warns when the write replica is used for pure reading, to make sure that any future development keeps the split. So: not impossible, but rather complex.
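To make the read/write split and the "detect writes on the read replica" test idea concrete, here is a minimal sketch. It is not Airflow or FAB code: `ReadWriteRouter` is a hypothetical helper, and SQLite from the Python standard library stands in for the metadata database, with a read-only handle on the same file playing the role of the replica.

```python
import sqlite3
from pathlib import Path


class ReadWriteRouter:
    """Hypothetical router: reads use a read-only handle, writes use the primary."""

    def __init__(self, db_path):
        db_file = Path(db_path).resolve()
        # Make sure the file exists before opening it read-only.
        db_file.touch(exist_ok=True)
        # Primary handle: accepts both reads and writes.
        self.primary = sqlite3.connect(str(db_file))
        # Stand-in for a read replica: the same file opened with mode=ro.
        self.replica = sqlite3.connect(f"{db_file.as_uri()}?mode=ro", uri=True)

    def read(self, sql, params=()):
        # Any statement that tries to modify data here raises
        # sqlite3.OperationalError, which is what a test suite could assert on.
        return self.replica.execute(sql, params).fetchall()

    def write(self, sql, params=()):
        # The connection context manager commits on success, rolls back on error.
        with self.primary:
            self.primary.execute(sql, params)

    def close(self):
        self.replica.close()
        self.primary.close()
```

With this shape, a test can route a write through `read()` and assert that it fails. The hard part described above remains: every code path in the UI would first have to declare whether it reads or writes.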
-
Recently, I conducted an analysis of database performance in Airflow deployments. I set up a local environment with 40K DAGs using Breeze. During the parsing and execution of these DAGs, I attempted to load the home page in the Airflow UI. As the number of rows in the metadata database increased, I observed a significant slowdown in the home page loading time, which took about 7-8 seconds.
To investigate further, I ran an experiment with the scheduler container turned off. Surprisingly, with the scheduler disabled, the home page loaded in about 3-5 seconds. My hypothesis is that the scheduler frequently locks rows for its operations, causing the webserver to wait until these locks are released when it queries the same tables.
Given this, I am considering whether enabling a read replica for the webserver could mitigate the slowdown. This would involve configuring the webserver to use a different database host, specifically a read replica, to avoid contention with the scheduler's operations.
I would like to discuss the feasibility and potential benefits of this approach. Specifically, I propose:
By implementing this, we might achieve a more responsive UI and better performance for Airflow deployments with large numbers of DAGs. Looking forward to the community's thoughts and insights on this approach.