On Airflow with Kubernetes Executor, the task sometimes keeps running even though the executor pod has failed, and is later cleaned up by the scheduler as a daemon task #40979
Replies: 4 comments 16 replies
-
I am seeing this issue as well, although in my case it has mostly been completed tasks that for some reason don't kill the pod and still show as "Running" in Kubernetes. I have seen this on two Airflow versions (2.8.3 and 2.9.2). We are running on Kubernetes version 1.29. Could this be the cause? If anybody has solved this or has debugging tips, that would be greatly appreciated!
-
Likely database connectivity stability, or possibly timeouts: it might be that the task fails to update its status. Another option (and you can try it) is the "schedule after task execution" configuration: you can set it to false and see if it helps. You can also take a look at your database locks for suspicious things (maybe your database runs out of resources, gets restarted from time to time, or there is a temporary connectivity issue between the pods and the DB). Converting this to a discussion as it is not a concrete "issue", but the best thing for you to do is to try to find correlated events in other logs when it happens.
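For reference, the "schedule after task execution" option mentioned above corresponds to the `schedule_after_task_execution` setting in the `[scheduler]` section. A sketch of the relevant `airflow.cfg` fragment (the default is `True`):

```ini
[scheduler]
# When True, the task supervisor "mini-schedules" downstream tasks itself
# right after a task finishes, which holds a DB session open longer.
# Disabling it can help when status updates are lost to flaky DB connectivity.
schedule_after_task_execution = False
```

The same setting can be applied via the environment variable `AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION=False`.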
-
I have also seen this issue mostly with DAGs which are resource intensive. For example, if a DAG is processing a lot of data in memory and exceeds the limit, it is killed by Kubernetes and we can see the OOMKilled status. It is mostly this type of DAG that shows this behaviour: even after the pod is killed, the task's status does not seem to update and it continues to show as running.
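As a debugging aid, OOMKilled containers can be spotted programmatically from a pod's status object (the dict under the `"status"` key of `kubectl get pod <name> -o json`). This is a minimal sketch, not part of Airflow itself; the `oomkilled_containers` helper is hypothetical, but the dict shape follows the standard Kubernetes pod status schema:

```python
def oomkilled_containers(pod_status: dict) -> list[str]:
    """Return the names of containers terminated with reason OOMKilled.

    `pod_status` is a Kubernetes pod's status object, i.e. the dict found
    under the "status" key of `kubectl get pod <name> -o json`.
    """
    killed = []
    for cs in pod_status.get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated") or {}
        # Kubernetes reports a kernel OOM kill as reason "OOMKilled"
        if terminated.get("reason") == "OOMKilled":
            killed.append(cs["name"])
    return killed


# Example: a trimmed pod status as Kubernetes would report it
status = {
    "containerStatuses": [
        {"name": "base",
         "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
        {"name": "sidecar", "state": {"running": {}}},
    ]
}
print(oomkilled_containers(status))  # ['base']
```

Running this check (or `kubectl describe pod`) against the executor pods of affected tasks would confirm whether OOM kills correlate with the stuck "running" state.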
-
@ahipp13 Have you fixed the issue yet? Log from the airflow-scheduler pod:
-
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.9.2
What happened?
Hello,
After upgrading to Airflow 2.9.2 we have started seeing a strange issue with our Airflow tasks. Sometimes when a task fails, the executor pod is deleted, but somehow Airflow does not get notified of the failure, causing the task to continue running. After some time, the scheduler finds it as a daemon task and kills it. I don't know whether this is an Airflow issue or a Kubernetes provider issue, but there seems to be some drop in communication between the two.
This has been occurring very frequently since the 2.9.2 upgrade. In some rare instances, I have also seen this behavior after successful completion of a task: although the task completes successfully, Airflow is not notified of the task state, causing the task to be killed later by the scheduler, and its status changes to failed.
What you think should happen instead?
When the task fails and the pod is killed, Airflow should be notified appropriately, and the task should reflect the correct status.
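For context, the cleanup described here matches Airflow's zombie-task detection: tasks that stop heartbeating are eventually failed by the scheduler. The relevant `[scheduler]` settings, shown with their default values as a sketch of an `airflow.cfg` fragment:

```ini
[scheduler]
# How often (in seconds) the scheduler scans for zombie tasks
zombie_detection_interval = 10.0
# How long (in seconds) a task can go without a heartbeat before the
# scheduler considers it a zombie and fails it
scheduler_zombie_task_threshold = 300
```

Tuning these changes only how quickly the stuck tasks get killed, not the underlying lost notification, but it can shorten the window in which a dead pod still shows as "running".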
How to reproduce
I am not sure how to reproduce it. It does not happen every time; it happens to some jobs, and on re-run the job completes correctly.
Operating System
Ubuntu Linux (on AKS)
Versions of Apache Airflow Providers
Deployment
Official Apache Airflow Helm Chart
Deployment details
We have deployed it on AKS.
Anything else?
No response
Are you willing to submit PR?
Code of Conduct