Task pods get stuck in scheduled state after the open slot parallelism count is reached #42383
Open
Labels
area:core
area:Scheduler - including HA (high availability) scheduler
kind:bug - This is clearly a bug
needs-triage - label for new issues that we didn't triage yet
pending-response
provider:cncf-kubernetes - Kubernetes provider related issues
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.0
What happened?
I recently upgraded Airflow from 2.5.3 to 2.10.0 in our environment. Parallelism is set to 32 and there are three schedulers, giving 96 slots in total. Once more than 96 tasks have run, any task scheduled after that gets stuck in the scheduled state with the open slot count at zero, even though the earlier tasks have completed and have been cleared.
What you think should happen instead?
The open slot count should increase as tasks complete, and the queued tasks should then be scheduled.
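To make the expected behaviour concrete, here is a minimal toy model of the slot accounting described above. It is not Airflow's actual scheduler or executor code: the `SlotCounter` class and its method names are hypothetical, and it only assumes that each of the three schedulers tracks its own 32 slots.

```python
from dataclasses import dataclass


@dataclass
class SlotCounter:
    """Hypothetical stand-in for one scheduler's slot bookkeeping."""

    parallelism: int
    running: int = 0

    @property
    def open_slots(self) -> int:
        # Slots still available for new task instances.
        return self.parallelism - self.running

    def start(self) -> bool:
        # A task may only start if a slot is open; otherwise it would
        # remain in the "scheduled" state.
        if self.open_slots <= 0:
            return False
        self.running += 1
        return True

    def finish(self) -> None:
        # Completing a task should hand its slot back.
        if self.running > 0:
            self.running -= 1


# Three schedulers with parallelism 32 each, as in this report.
schedulers = [SlotCounter(parallelism=32) for _ in range(3)]
total_capacity = sum(s.parallelism for s in schedulers)

# Fill every slot on every scheduler.
for s in schedulers:
    while s.start():
        pass
open_when_full = sum(s.open_slots for s in schedulers)

# Complete every running task; the slots should reopen.
for s in schedulers:
    for _ in range(s.parallelism):
        s.finish()
open_after_completion = sum(s.open_slots for s in schedulers)

print(total_capacity, open_when_full, open_after_completion)
```

In this model the open slot count returns to the full capacity once the tasks finish. The behaviour reported here corresponds to the count staying at zero after completion.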
How to reproduce
I reproduced it by upgrading and then running 5 or 6 DAGs with 10 tasks in each DAG, with parallelism set to 32 for each scheduler. Note that the same set of DAGs worked fine on Airflow 2.5.3.
Operating System
Red Hat Linux
Versions of Apache Airflow Providers
apache-airflow-providers-postgres==5.12.0
apache-airflow-providers-apache-hive==8.2.0
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-cncf-kubernetes==8.4.1
apache-airflow-providers-apache-livy==3.9.0
apache-airflow-providers-presto==5.6.0
apache-airflow-providers-http==4.13.0
apache-airflow-providers-trino==5.8.0
apache-airflow-providers-snowflake==5.7.0
apache-airflow-providers-salesforce==5.8.0
apache-airflow-providers-papermill==3.8.0
apache-airflow-providers-google==10.22.0
apache-airflow-providers-celery==3.8.1
apache-airflow-providers-redis==3.8.0
apache-airflow-providers-dbt-cloud==3.10.0
apache-airflow-providers-openlineage==1.11.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct