Almost-starved pools preventing scheduling in pools with free slots (possible bug or improvement idea) #25139
-
I think you have the wrong configuration: you should give your small tasks a higher priority. That should solve the problem in this case.
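A minimal sketch (plain Python, not Airflow internals) of why this advice works: the scheduler examines candidate task instances in descending `priority_weight` order, capped at `max_tis`. If the small-pool tasks carry a higher `priority_weight` than the big-pool tasks, they land at the front of that capped window instead of being crowded out. The pool names, weights, and task counts below are illustrative assumptions, not values from the discussion.

```python
def examined_order(tasks, max_tis):
    """Return the pools of the first max_tis tasks, by descending priority_weight."""
    # tasks: list of (pool_name, priority_weight)
    return [pool for pool, _ in sorted(tasks, key=lambda t: -t[1])[:max_tis]]

# small_pool tasks bumped to priority_weight=3, above big_pool's 2
backlog = [("big_pool", 2)] * 1000 + [("small_pool", 3)] * 5
print(examined_order(backlog, max_tis=11))
# the 5 small_pool tasks come first, so their free slots actually get used
```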
-
Correct. If you have an idea how to solve it, with tests covering it, you are most welcome to make a PR. I think it is impossible to solve all possible cases, and Airflow focuses on "realistic" ones. But if you think you can propose a PR, a set of test cases, and the reasoning behind better scheduling, you are absolutely welcome.
-
Perhaps this can help you a bit. A minimal diff that should work for you and passes existing tests. The basic idea is that if we fill up a pool, we run another query iteration. I do not think this needs an extra parameter, but it definitely needs some benchmarking on your case and on "regular" cases. Feel free to use it if it looks like it could work.

```diff
index 1889285899..8094401219 100644
--- a/airflow/jobs/scheduler_job.py
+++ b/airflow/jobs/scheduler_job.py
@@ -336,7 +336,7 @@ class SchedulerJob(BaseJob):
                 task_filter = tuple_in_condition((TaskInstance.dag_id, TaskInstance.task_id), starved_tasks)
                 query = query.filter(not_(task_filter))

-            query = query.limit(max_tis)
+            query = query.limit(max_tis - len(executable_tis))

             task_instances_to_examine: List[TI] = with_row_locks(
                 query,
@@ -477,8 +477,10 @@ class SchedulerJob(BaseJob):
                 task_concurrency_map[(task_instance.dag_id, task_instance.task_id)] += 1

             pool_stats["open"] = open_slots
+            if open_slots <= 0:
+                starved_pools.add(pool_name)

-            is_done = executable_tis or len(task_instances_to_examine) < max_tis
+            is_done = len(task_instances_to_examine) < max_tis
             # Check this to avoid accidental infinite loops
             found_new_filters = (
                 len(starved_pools) > num_starved_pools
```

I'm 100% sure that there are some bugs here :) .
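The idea in that diff can be illustrated with a toy re-implementation (plain Python, not the real Airflow loop): after a pass fills a pool, mark it starved and query again with the remaining budget, so lower-priority pools with free slots get a chance. The pool names, slot counts, and weights below are made-up illustrative values.

```python
def schedule_with_requery(tasks, pool_slots, max_tis):
    """tasks: list of (pool_name, priority_weight); returns pools of queued tasks."""
    open_slots = dict(pool_slots)
    starved_pools = set()
    executable = []
    while True:
        budget = max_tis - len(executable)  # mirrors query.limit(max_tis - len(executable_tis))
        candidates = [t for t in sorted(tasks, key=lambda t: -t[1])
                      if t[0] not in starved_pools][:budget]
        num_starved = len(starved_pools)
        for pool, _ in candidates:
            if open_slots[pool] > 0:
                open_slots[pool] -= 1
                executable.append(pool)
            if open_slots[pool] <= 0:
                starved_pools.add(pool)  # key change: a filled pool counts as starved
        # stop if the query came back short, or no new filter was discovered
        if len(candidates) < budget or len(starved_pools) == num_starved:
            break
    return executable

# big_pool has 1 open slot, small_pool has 10; 11 candidates fit in max_tis
backlog = [("big_pool", 2)] * 1000 + [("small_pool", 1)] * 5
queued = schedule_with_requery(backlog, {"big_pool": 1, "small_pool": 10}, max_tis=11)
print(queued.count("big_pool"), queued.count("small_pool"))  # 1 1 big + 5 small
```

On the second iteration the requery excludes `big_pool`, so the 5 `small_pool` tasks get queued instead of being lost behind higher-priority `big_pool` tasks.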
-
Not sure if this is a bug or an improvement; I wanted to check my understanding of the mechanism for finding task instances ready for execution. The relevant code is in `airflow/airflow/jobs/scheduler_job.py`, line 249 at commit `3dbca5e`.
Imagine the following setup with two pools:

- `parallelism=100`
- `big_pool` (90 slots): runs ~1k hourly tasks that each take a few seconds at `priority_weight=2`. It is often near capacity.
- `small_pool` (10 slots): runs just a few short tasks each hour at `priority_weight=1`. It is often empty with all slots free.

**What I expect to happen**

Since `big_pool` only has 90 slots, `small_pool` has 10 slots, and `parallelism=100`, we should always be able to schedule tasks in `small_pool` as long as it has available slots.

**What actually happens**

Because `big_pool` is churning through lots of tasks, it is rarely if ever actually starving. Often there are just a few open slots (let's say 1 open slot for this example). This means that `max_tis` is often ~11, because `small_pool` (10) is generally free and `big_pool` has just 1 slot. The task query does not exclude `big_pool`, because it is not starving, and limits to `max_tis=11`. Because `big_pool` is higher priority, all 11 tasks returned by the task query are for `big_pool`, and so we schedule just 1 new `big_pool` task, when we could also be scheduling 10 `small_pool` tasks. And since `big_pool` has high throughput, on the next scheduler loop we are likely back in this situation again.

**How could this be improved?**

One idea is to run the task query again after filling `big_pool`. On the second loop, it would exclude `big_pool` and find it can run 10 TIs in `small_pool`. The number of extra query loops could be capped by a config option such as `max_executable_ti_queries`.

Happy to talk details or to be corrected on any of this. Please note that it's not really possible to provide a minimal reproducible example here, because it requires a large-ish cluster setup.
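The failure mode described above can at least be sketched in plain Python (a simplified model of one scheduler pass, not Airflow code): tasks are examined in `priority_weight` order, the query is capped at `max_tis`, and a task is queued only if its pool still has an open slot. The task counts and slot numbers are the illustrative ones from the scenario.

```python
def schedule_once(tasks, pool_slots, max_tis):
    """tasks: list of (pool_name, priority_weight); returns pools queued this pass."""
    examined = sorted(tasks, key=lambda t: -t[1])[:max_tis]  # priority-ordered, capped
    open_slots = dict(pool_slots)
    queued = []
    for pool, _ in examined:
        if open_slots[pool] > 0:  # only queue if the pool still has a free slot
            open_slots[pool] -= 1
            queued.append(pool)
    return queued

# big_pool: 90 slots with 89 busy -> 1 open; small_pool: 10 slots, all free.
# max_tis works out to ~11 (the total open slots).
backlog = [("big_pool", 2)] * 1000 + [("small_pool", 1)] * 5
queued = schedule_once(backlog, {"big_pool": 1, "small_pool": 10}, max_tis=11)
print(queued.count("big_pool"), queued.count("small_pool"))  # 1 0
```

All 11 examined tasks are `big_pool` tasks, so only 1 is queued and `small_pool`'s 10 free slots go unused for this loop.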
Thanks!