Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update path handling in get_dag_directory() to return the absolute path of the symlink #42350

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oboki
Copy link
Contributor

@oboki oboki commented Sep 19, 2024

Somtimes, I use symlinks in my local environment, like this:

❯ ls -rtlh /opt/airflow/
total 0
lrwxrwxrwx 1 oboki oboki 39 Sep 19 21:24 dags -> /home/oboki/projects/oboki-airflow/dags
lrwxrwxrwx 1 oboki oboki 42 Sep 19 21:24 plugins -> /home/oboki/projects/oboki-airflow/plugins

In this case, the get_dag_directory() function follows the symlink and returns the original path, which causes issues.

To resolve this, it should return the absolute path of the symlink itself.

Even though airflow.cfg is set with dags_folder = /opt/airflow/dags, if that path is a symlink, it returns /home/oboki/projects/oboki-airflow/dags, leading to dag_not_in_current_folder = True and causing issues. (a valid DAG is periodically deactivated.)

I changed manager.py to use the absolute path of symlink itself instead of resolved one.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Sep 19, 2024
@oboki oboki force-pushed the feature-symlink-dag-dir-returning-abspath-itself branch from 37c560c to e583098 Compare September 19, 2024 13:47
@oboki oboki changed the title fix: update path handling in get_dag_directory() to teturn the absolute path of the symlink fix: update path handling in get_dag_directory() to return the absolute path of the symlink Sep 19, 2024
@afranzi
Copy link

afranzi commented Sep 19, 2024

We are having the same issue using the gitSync in our K8s deployment 😢.
After upgrading from 2.9.3 to 2.10.1, we started having the same issue, where the dag_processor_manager adds the DAGs but the scheduler removes them in the code shared by @oboki.

Could we include this PR in the 2.10.2 please?

@bitomukesh
Copy link

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Bug Fix - Fix Symlink Path Handling in DAG Directory

manager.py - Changed get_dag_directory() to return absolute path instead of resolved path

@ephraimbuddy
Copy link
Contributor

We are having the same issue using the gitSync in our K8s deployment 😢. After upgrading from 2.9.3 to 2.10.1, we started having the same issue, where the dag_processor_manager adds the DAGs but the scheduler removes them in the code shared by @oboki.

Could we include this PR in the 2.10.2 please?

The issue you had has been fixed in 2.10.2, which will be released later today. Makes me think that this PR is also a result of the same problem.

@afranzi
Copy link

afranzi commented Sep 20, 2024

@ephraimbuddy even if the 2.10.2 is reverting the changes causing the issue, I believe this PR still adds value and robustness to airflow, so I believe it would be a good addition.

@ephraimbuddy
Copy link
Contributor

@ephraimbuddy even if the 2.10.2 is reverting the changes causing the issue, I believe this PR still adds value and robustness to airflow, so I believe it would be a good addition.

There's a similar change here: https://github.com/apache/airflow/pull/42142/files. So yeah, we will look into it

@afranzi
Copy link

afranzi commented Sep 20, 2024

I believe this one would be cleaner than the other one you mentioned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants