Fix `FileTaskHandler` only read from default executor #45631

jason810496 · 2025-01-14T01:55:08Z

Also added test_file_task_handler_with_multiple_executors to verify that the correct get_task_log method of different executors is called.

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk · 2025-01-14T16:19:12Z

cc: @o-nikolas

airflow/utils/log/file_task_handler.py

tests/utils/test_log_handlers.py

airflow/utils/log/file_task_handler.py

jason810496 · 2025-01-17T04:33:12Z

Hi @o-nikolas,
I ran the tests locally with Breeze, and they worked fine. However, the CI is failing mostly due to "Unknown executor being loaded: " errors. Do you have any idea where I should look into this? (Could it be resolved by adding some additional configuration for Breeze tests or something similar?)

Thanks!

potiuk · 2025-01-17T14:42:01Z

Likely this is a side effect of some other test that does not clean-up after itself - you can repeat what CI is doing - i.e. run the Core test type:

breeze testing core-tests --test-type Core

And they should run the tests in the same sequence the "Core" tests are run in CI in your PR (Core test type is the one that is failing.

The problem is that some of the earlier tests likely modify a state stored in memory (likely list of executors stored in memory) and do not restore it properly - and the failing tests do not set up the expected state properly.

Now - it's a bit of guessing, art and eventually bisecting to see which tests are causing it. In many cases you can guess - you know which tests you added so likely it is somehow related to your changes (but it might be that your tests simply ucovered it). What I often do in such cases is trying to bisect it - instead of runnin the whole test_type ("Core") - I just enter breeze and try to run the same sequence of tests - simply individually rin the test packages/modulles that you see printed in the output - so for example from the above I try to run (there are some packages that are before that).

pytest ... tests/models tests/ti_deps tests/utils

And see it fails

Now if you run:

pytest tests/ti_deps tests/utils

And it fails - then you know side effect comes from either ti_deps or earlier utils - then you can remove ti_deps and see if it fails, then if you see tha it comes from utils, you run tests from individual modules in ti_utils in the same sequence you see below and remove first half and see if it fails - and so on.

This is by far fastest way of narrowing down which test is creating the side-effect if you have no idea.

Here is the example output.

  ....
  tests/models/test_xcom_arg_map.py ...........                            [ 67%]
  tests/ti_deps/deps/test_dag_ti_slots_available_dep.py ..                 [ 67%]
  tests/ti_deps/deps/test_dag_unpaused_dep.py ..                           [ 67%]
  tests/ti_deps/deps/test_dagrun_exists_dep.py ..                          [ 67%]
  tests/ti_deps/deps/test_mapped_task_upstream_dep.py .................... [ 68%]
  ......................................................ssssssssssss...... [ 70%]
  ......                                                                   [ 71%]
  tests/ti_deps/deps/test_not_in_retry_period_dep.py ...                   [ 71%]
  tests/ti_deps/deps/test_not_previously_skipped_dep.py .....              [ 71%]
  tests/ti_deps/deps/test_pool_slots_available_dep.py .....                [ 71%]
  tests/ti_deps/deps/test_prev_dagrun_dep.py ...........                   [ 71%]
  tests/ti_deps/deps/test_ready_to_reschedule_dep.py ................      [ 72%]
  tests/ti_deps/deps/test_runnable_exec_date_dep.py .........              [ 72%]
  tests/ti_deps/deps/test_task_concurrency.py ..........                   [ 73%]
  tests/ti_deps/deps/test_task_not_running_dep.py ..                       [ 73%]
  tests/ti_deps/deps/test_trigger_rule_dep.py ............................ [ 74%]
  ........................................................................ [ 77%]
  ..........................                                               [ 78%]
  tests/ti_deps/deps/test_valid_state_dep.py ...                           [ 78%]
  tests/utils/log/test_colored_log.py .                                    [ 78%]
  tests/utils/log/test_file_processor_handler.py ssss                      [ 78%]
  tests/utils/log/test_json_formatter.py sssss                             [ 78%]
  tests/utils/log/test_log_reader.py ........                              [ 78%]
  tests/utils/log/test_secrets_masker.py sssssssssssssssssssssssssssssssss [ 80%]
  ssssssssssssssssssssssssss                                               [ 80%]
  tests/utils/test_cli_util.py ..................                          [ 81%]
  tests/utils/test_context.py ....sss                                      [ 81%]
  tests/utils/test_dag_cycle.py sssssssss                                  [ 82%]
  tests/utils/test_dataform.py ss                                          [ 82%]
  tests/utils/test_db.py ...................                               [ 83%]
  tests/utils/test_db_cleanup.py ......................................... [ 84%]
  ...                                                                      [ 84%]
  tests/utils/test_db_manager.py ........                                  [ 84%]
  tests/utils/test_decorators.py ssssssssssssssssssssssss                  [ 85%]
  tests/utils/test_docs.py ssss                                            [ 86%]
  tests/utils/test_dot_renderer.py ......                                  [ 86%]
  tests/utils/test_edgemodifier.py ssssssssssssss                          [ 86%]
  tests/utils/test_email.py sssssssssss..............                      [ 87%]
  tests/utils/test_entry_points.py s                                       [ 87%]
  tests/utils/test_event_scheduler.py s                                    [ 87%]
  tests/utils/test_file.py ssssssssssssssssssssss                          [ 88%]
  tests/utils/test_helpers.py .sssssssssss.ssssssssssssssssssssssssssss    [ 90%]
  tests/utils/test_json.py sssssssssssssssss                               [ 90%]
  tests/utils/test_log_handlers.py ...FFF.......................           [ 91%]

jason810496 · 2025-01-17T16:00:16Z

Thanks, @potiuk, for pointing out that the error might be caused by side effects from other tests and for providing the detailed guide to resolve it! I will try using the bisect approach to identify which test case is causing the side effect.

Update:
Luckily, I found that it was caused by the tests/ti_deps/ module after a few iterations!

tests/ti_deps/deps/test_ready_to_reschedule_dep.py

jason810496 · 2025-01-18T14:32:44Z

Update:
Luckily, I found that it was caused by the tests/ti_deps/ module after a few iterations!
tests/ti_deps/deps/test_ready_to_reschedule_dep.py

TL;DR:
The global state in the executor_loader module can cause side effects in other test cases if not properly torn down.

The issue is caused by the executor_loader module, as it stores some state in global variables:
https://github.com/apache/airflow/blob/main/airflow/executors/executor_loader.py#L48-L54
This can lead to errors in Executor.load_executor or Executor.get_default_executor , as these methods rely on looking up executor_name in global variables. If a test interacts with executor_loader and doesn't clear the global variables, it may cause side effects in subsequent tests.

In other test cases that use the executor_loader module, reload is employed.
However, in the current test case, we need to mock the Executor.load_executor and Executor.get_default_executor methods. Mixing reload(executor_loader) with mocks results in the mock object never being called, as the callable used during the test runtime bypasses the mock.

From my perspective, cleaning the global variables in executor_loader should suffice for tearing down the state. I will create another PR to investigate whether this approach is correct and to evaluate how much test speed can be improved by replacing all reload(executor_loader) calls with clean_executor_loader().

airflow/utils/log/file_task_handler.py

tests_common/test_utils/executor_loader.py

jason810496 · 2025-01-22T16:56:02Z

The CI failed due to #45917. I'll wait for it to be merged, then rebase with the main branch, which should resolve the issue.

providers/tests/cncf/kubernetes/log_handlers/test_log_handlers.py

tests/executors/test_executor_loader.py

tests_common/pytest_plugin.py

- Previous test failure is cuased by cache state of executor_instances - Should set ti.state = RUNNING after ti.run

- The side effect only show up in postgres as backend environment, as previous fix only resolve side effect in sqlite as backend environment. - Also refactor clean_executor_loader as pytest fixture with setup teardown

jason810496 · 2025-01-23T08:29:23Z

Resolved the issue, and the CI has successfully passed! 🎉
cc @o-nikolas

o-nikolas · 2025-01-23T18:35:04Z

Resolved the issue, and the CI has successfully passed! 🎉 cc @o-nikolas

Nice! LGTM, @Lee-W Has your feedback been addressed to your satisfaction?

Lee-W · 2025-01-24T02:58:40Z

Resolved the issue, and the CI has successfully passed! 🎉 cc @o-nikolas

Nice! LGTM, @Lee-W Has your feedback been addressed to your satisfaction?

Yep, looks good 🙂 let's merge it!

* Fix FileTaskHandler only read from default executor * Add cached_property back to avoid loading executors * Add test for multi-executors scenario * Allow to call load_executor without init_executors * Refactor by caching necessary executors * Refactor test with default executor case * Fix side effect from executor_loader * Fix KubernetesExecutor test - Previous test failure is cuased by cache state of executor_instances - Should set ti.state = RUNNING after ti.run * Fix side effect from executor_loader - The side effect only show up in postgres as backend environment, as previous fix only resolve side effect in sqlite as backend environment. - Also refactor clean_executor_loader as pytest fixture with setup teardown * Capitalize default executor key * Refactor clean_executor_loader fixture

boring-cyborg bot added the area:logging label Jan 14, 2025

o-nikolas reviewed Jan 15, 2025

View reviewed changes

airflow/utils/log/file_task_handler.py Show resolved Hide resolved

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from c982d33 to 5d032b6 Compare January 15, 2025 03:50

jason810496 changed the title ~~[WIP] Fix FileTaskHandler only read from default executor~~ Fix FileTaskHandler only read from default executor Jan 15, 2025

nradz reviewed Jan 15, 2025

View reviewed changes

tests/utils/test_log_handlers.py Show resolved Hide resolved

jason810496 requested review from XD-DENG, ashb, pierrejeambrun and hussein-awala as code owners January 15, 2025 22:17

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from a814e7f to 90ffcb6 Compare January 15, 2025 22:19

o-nikolas approved these changes Jan 15, 2025

View reviewed changes

airflow/utils/log/file_task_handler.py Outdated Show resolved Hide resolved

airflow/utils/log/file_task_handler.py Show resolved Hide resolved

o-nikolas added the full tests needed We need to run full set of tests for this PR to merge label Jan 15, 2025

eladkal added this to the Airflow 2.10.5 milestone Jan 16, 2025

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from 90ffcb6 to dae4480 Compare January 18, 2025 14:14

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch 3 times, most recently from 4f6c127 to d2def20 Compare January 21, 2025 17:20

Lee-W reviewed Jan 22, 2025

View reviewed changes

airflow/utils/log/file_task_handler.py Outdated Show resolved Hide resolved

tests_common/test_utils/executor_loader.py Outdated Show resolved Hide resolved

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from d2def20 to 0581159 Compare January 22, 2025 13:53

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from 0581159 to a5fdbc4 Compare January 22, 2025 17:10

Lee-W reviewed Jan 23, 2025

View reviewed changes

providers/tests/cncf/kubernetes/log_handlers/test_log_handlers.py Outdated Show resolved Hide resolved

tests/executors/test_executor_loader.py Outdated Show resolved Hide resolved

tests_common/pytest_plugin.py Outdated Show resolved Hide resolved

jason810496 added 3 commits January 23, 2025 15:05

Fix FileTaskHandler only read from default executor

1f4593a

Add cached_property back to avoid loading executors

cc5fda2

Add test for multi-executors scenario

0a5b6e4

jason810496 added 8 commits January 23, 2025 15:05

Allow to call load_executor without init_executors

04d79a5

Refactor by caching necessary executors

3ebd940

Refactor test with default executor case

c40e68d

Fix side effect from executor_loader

b1d9652

Fix KubernetesExecutor test

e37dcdd

- Previous test failure is cuased by cache state of executor_instances - Should set ti.state = RUNNING after ti.run

Fix side effect from executor_loader

d65e1a0

- The side effect only show up in postgres as backend environment, as previous fix only resolve side effect in sqlite as backend environment. - Also refactor clean_executor_loader as pytest fixture with setup teardown

Capitalize default executor key

ca36091

Refactor clean_executor_loader fixture

1bd1a30

jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from a5fdbc4 to 1bd1a30 Compare January 23, 2025 07:05

o-nikolas mentioned this pull request Jan 23, 2025

Handle Hybrid executors for running task logs #45971

Closed

Lee-W approved these changes Jan 24, 2025

View reviewed changes

Lee-W merged commit 51dbabc into apache:main Jan 24, 2025
91 checks passed

jason810496 mentioned this pull request Jan 24, 2025

[v2-10-test] Fix FileTaskHandler only read from default executor #46000

Merged

utkarsharma2 added the type:bug-fix Changelog: Bug Fixes label Jan 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `FileTaskHandler` only read from default executor #45631

Fix `FileTaskHandler` only read from default executor #45631

jason810496 commented Jan 14, 2025 •

edited

Loading

potiuk commented Jan 14, 2025

jason810496 commented Jan 17, 2025

potiuk commented Jan 17, 2025 •

edited

Loading

jason810496 commented Jan 17, 2025 •

edited

Loading

jason810496 commented Jan 18, 2025

jason810496 commented Jan 22, 2025

jason810496 commented Jan 23, 2025 •

edited

Loading

o-nikolas commented Jan 23, 2025

Lee-W commented Jan 24, 2025

Fix FileTaskHandler only read from default executor #45631

Fix FileTaskHandler only read from default executor #45631

Conversation

jason810496 commented Jan 14, 2025 • edited Loading

potiuk commented Jan 14, 2025

jason810496 commented Jan 17, 2025

potiuk commented Jan 17, 2025 • edited Loading

jason810496 commented Jan 17, 2025 • edited Loading

jason810496 commented Jan 18, 2025

jason810496 commented Jan 22, 2025

jason810496 commented Jan 23, 2025 • edited Loading

o-nikolas commented Jan 23, 2025

Lee-W commented Jan 24, 2025

Fix `FileTaskHandler` only read from default executor #45631

Fix `FileTaskHandler` only read from default executor #45631

jason810496 commented Jan 14, 2025 •

edited

Loading

potiuk commented Jan 17, 2025 •

edited

Loading

jason810496 commented Jan 17, 2025 •

edited

Loading

jason810496 commented Jan 23, 2025 •

edited

Loading