Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FileTaskHandler only read from default executor #45631

Conversation

jason810496
Copy link
Contributor

@jason810496 jason810496 commented Jan 14, 2025

closes: #45529

Also added test_file_task_handler_with_multiple_executors to verify that the correct get_task_log method of different executors is called.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member

potiuk commented Jan 14, 2025

cc: @o-nikolas

@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from c982d33 to 5d032b6 Compare January 15, 2025 03:50
@jason810496 jason810496 changed the title [WIP] Fix FileTaskHandler only read from default executor Fix FileTaskHandler only read from default executor Jan 15, 2025
@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from a814e7f to 90ffcb6 Compare January 15, 2025 22:19
@o-nikolas o-nikolas added the full tests needed We need to run full set of tests for this PR to merge label Jan 15, 2025
@eladkal eladkal added this to the Airflow 2.10.5 milestone Jan 16, 2025
@jason810496
Copy link
Contributor Author

Hi @o-nikolas,
I ran the tests locally with Breeze, and they worked fine. However, the CI is failing mostly due to "Unknown executor being loaded: " errors. Do you have any idea where I should look into this? (Could it be resolved by adding some additional configuration for Breeze tests or something similar?)

Thanks!

@potiuk
Copy link
Member

potiuk commented Jan 17, 2025

Likely this is a side effect of some other test that does not clean-up after itself - you can repeat what CI is doing - i.e. run the Core test type:

breeze testing core-tests --test-type Core

And they should run the tests in the same sequence the "Core" tests are run in CI in your PR (Core test type is the one that is failing.

The problem is that some of the earlier tests likely modify a state stored in memory (likely list of executors stored in memory) and do not restore it properly - and the failing tests do not set up the expected state properly.

Now - it's a bit of guessing, art and eventually bisecting to see which tests are causing it. In many cases you can guess - you know which tests you added so likely it is somehow related to your changes (but it might be that your tests simply ucovered it). What I often do in such cases is trying to bisect it - instead of runnin the whole test_type ("Core") - I just enter breeze and try to run the same sequence of tests - simply individually rin the test packages/modulles that you see printed in the output - so for example from the above I try to run (there are some packages that are before that).

pytest ... tests/models tests/ti_deps tests/utils 

And see it fails

Now if you run:

pytest tests/ti_deps tests/utils 

And it fails - then you know side effect comes from either ti_deps or earlier utils - then you can remove ti_deps and see if it fails, then if you see tha it comes from utils, you run tests from individual modules in ti_utils in the same sequence you see below and remove first half and see if it fails - and so on.

This is by far fastest way of narrowing down which test is creating the side-effect if you have no idea.

Here is the example output.

  ....
  tests/models/test_xcom_arg_map.py ...........                            [ 67%]
  tests/ti_deps/deps/test_dag_ti_slots_available_dep.py ..                 [ 67%]
  tests/ti_deps/deps/test_dag_unpaused_dep.py ..                           [ 67%]
  tests/ti_deps/deps/test_dagrun_exists_dep.py ..                          [ 67%]
  tests/ti_deps/deps/test_mapped_task_upstream_dep.py .................... [ 68%]
  ......................................................ssssssssssss...... [ 70%]
  ......                                                                   [ 71%]
  tests/ti_deps/deps/test_not_in_retry_period_dep.py ...                   [ 71%]
  tests/ti_deps/deps/test_not_previously_skipped_dep.py .....              [ 71%]
  tests/ti_deps/deps/test_pool_slots_available_dep.py .....                [ 71%]
  tests/ti_deps/deps/test_prev_dagrun_dep.py ...........                   [ 71%]
  tests/ti_deps/deps/test_ready_to_reschedule_dep.py ................      [ 72%]
  tests/ti_deps/deps/test_runnable_exec_date_dep.py .........              [ 72%]
  tests/ti_deps/deps/test_task_concurrency.py ..........                   [ 73%]
  tests/ti_deps/deps/test_task_not_running_dep.py ..                       [ 73%]
  tests/ti_deps/deps/test_trigger_rule_dep.py ............................ [ 74%]
  ........................................................................ [ 77%]
  ..........................                                               [ 78%]
  tests/ti_deps/deps/test_valid_state_dep.py ...                           [ 78%]
  tests/utils/log/test_colored_log.py .                                    [ 78%]
  tests/utils/log/test_file_processor_handler.py ssss                      [ 78%]
  tests/utils/log/test_json_formatter.py sssss                             [ 78%]
  tests/utils/log/test_log_reader.py ........                              [ 78%]
  tests/utils/log/test_secrets_masker.py sssssssssssssssssssssssssssssssss [ 80%]
  ssssssssssssssssssssssssss                                               [ 80%]
  tests/utils/test_cli_util.py ..................                          [ 81%]
  tests/utils/test_context.py ....sss                                      [ 81%]
  tests/utils/test_dag_cycle.py sssssssss                                  [ 82%]
  tests/utils/test_dataform.py ss                                          [ 82%]
  tests/utils/test_db.py ...................                               [ 83%]
  tests/utils/test_db_cleanup.py ......................................... [ 84%]
  ...                                                                      [ 84%]
  tests/utils/test_db_manager.py ........                                  [ 84%]
  tests/utils/test_decorators.py ssssssssssssssssssssssss                  [ 85%]
  tests/utils/test_docs.py ssss                                            [ 86%]
  tests/utils/test_dot_renderer.py ......                                  [ 86%]
  tests/utils/test_edgemodifier.py ssssssssssssss                          [ 86%]
  tests/utils/test_email.py sssssssssss..............                      [ 87%]
  tests/utils/test_entry_points.py s                                       [ 87%]
  tests/utils/test_event_scheduler.py s                                    [ 87%]
  tests/utils/test_file.py ssssssssssssssssssssss                          [ 88%]
  tests/utils/test_helpers.py .sssssssssss.ssssssssssssssssssssssssssss    [ 90%]
  tests/utils/test_json.py sssssssssssssssss                               [ 90%]
  tests/utils/test_log_handlers.py ...FFF.......................           [ 91%]

@jason810496
Copy link
Contributor Author

jason810496 commented Jan 17, 2025

Thanks, @potiuk, for pointing out that the error might be caused by side effects from other tests and for providing the detailed guide to resolve it! I will try using the bisect approach to identify which test case is causing the side effect.

Update:
Luckily, I found that it was caused by the tests/ti_deps/ module after a few iterations!

tests/ti_deps/deps/test_ready_to_reschedule_dep.py

@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from 90ffcb6 to dae4480 Compare January 18, 2025 14:14
@jason810496
Copy link
Contributor Author

Update:
Luckily, I found that it was caused by the tests/ti_deps/ module after a few iterations!
tests/ti_deps/deps/test_ready_to_reschedule_dep.py

TL;DR:
The global state in the executor_loader module can cause side effects in other test cases if not properly torn down.

The issue is caused by the executor_loader module, as it stores some state in global variables:
https://github.com/apache/airflow/blob/main/airflow/executors/executor_loader.py#L48-L54
This can lead to errors in Executor.load_executor or Executor.get_default_executor , as these methods rely on looking up executor_name in global variables. If a test interacts with executor_loader and doesn't clear the global variables, it may cause side effects in subsequent tests.


In other test cases that use the executor_loader module, reload is employed.
However, in the current test case, we need to mock the Executor.load_executor and Executor.get_default_executor methods. Mixing reload(executor_loader) with mocks results in the mock object never being called, as the callable used during the test runtime bypasses the mock.

From my perspective, cleaning the global variables in executor_loader should suffice for tearing down the state. I will create another PR to investigate whether this approach is correct and to evaluate how much test speed can be improved by replacing all reload(executor_loader) calls with clean_executor_loader().

@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch 3 times, most recently from 4f6c127 to d2def20 Compare January 21, 2025 17:20
@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from d2def20 to 0581159 Compare January 22, 2025 13:53
@jason810496
Copy link
Contributor Author

The CI failed due to #45917. I'll wait for it to be merged, then rebase with the main branch, which should resolve the issue.

@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from 0581159 to a5fdbc4 Compare January 22, 2025 17:10
- Previous test failure is cuased by cache state of executor_instances
- Should set ti.state = RUNNING after ti.run
- The side effect only show up in postgres as backend environment, as
  previous fix only resolve side effect in sqlite as backend environment.
- Also refactor clean_executor_loader as pytest fixture with setup
  teardown
@jason810496 jason810496 force-pushed the fix/file-task-handler-only-read-from-default-executor branch from a5fdbc4 to 1bd1a30 Compare January 23, 2025 07:05
@jason810496
Copy link
Contributor Author

jason810496 commented Jan 23, 2025

Resolved the issue, and the CI has successfully passed! 🎉
cc @o-nikolas

@o-nikolas
Copy link
Contributor

Resolved the issue, and the CI has successfully passed! 🎉 cc @o-nikolas

Nice! LGTM, @Lee-W Has your feedback been addressed to your satisfaction?

@Lee-W
Copy link
Member

Lee-W commented Jan 24, 2025

Resolved the issue, and the CI has successfully passed! 🎉 cc @o-nikolas

Nice! LGTM, @Lee-W Has your feedback been addressed to your satisfaction?

Yep, looks good 🙂 let's merge it!

@Lee-W Lee-W merged commit 51dbabc into apache:main Jan 24, 2025
91 checks passed
@utkarsharma2 utkarsharma2 added the type:bug-fix Changelog: Bug Fixes label Jan 27, 2025
gpathak128 pushed a commit to gpathak128/airflow that referenced this pull request Jan 29, 2025
* Fix FileTaskHandler only read from default executor

* Add cached_property back to avoid loading executors

* Add test for multi-executors scenario

* Allow to call load_executor without init_executors

* Refactor by caching necessary executors

* Refactor test with default executor case

* Fix side effect from executor_loader

* Fix KubernetesExecutor test

- Previous test failure is cuased by cache state of executor_instances
- Should set ti.state = RUNNING after ti.run

* Fix side effect from executor_loader

- The side effect only show up in postgres as backend environment, as
  previous fix only resolve side effect in sqlite as backend environment.
- Also refactor clean_executor_loader as pytest fixture with setup
  teardown

* Capitalize default executor key

* Refactor clean_executor_loader fixture
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
* Fix FileTaskHandler only read from default executor

* Add cached_property back to avoid loading executors

* Add test for multi-executors scenario

* Allow to call load_executor without init_executors

* Refactor by caching necessary executors

* Refactor test with default executor case

* Fix side effect from executor_loader

* Fix KubernetesExecutor test

- Previous test failure is cuased by cache state of executor_instances
- Should set ti.state = RUNNING after ti.run

* Fix side effect from executor_loader

- The side effect only show up in postgres as backend environment, as
  previous fix only resolve side effect in sqlite as backend environment.
- Also refactor clean_executor_loader as pytest fixture with setup
  teardown

* Capitalize default executor key

* Refactor clean_executor_loader fixture
niklasr22 pushed a commit to niklasr22/airflow that referenced this pull request Feb 8, 2025
* Fix FileTaskHandler only read from default executor

* Add cached_property back to avoid loading executors

* Add test for multi-executors scenario

* Allow to call load_executor without init_executors

* Refactor by caching necessary executors

* Refactor test with default executor case

* Fix side effect from executor_loader

* Fix KubernetesExecutor test

- Previous test failure is cuased by cache state of executor_instances
- Should set ti.state = RUNNING after ti.run

* Fix side effect from executor_loader

- The side effect only show up in postgres as backend environment, as
  previous fix only resolve side effect in sqlite as backend environment.
- Also refactor clean_executor_loader as pytest fixture with setup
  teardown

* Capitalize default executor key

* Refactor clean_executor_loader fixture
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
* Fix FileTaskHandler only read from default executor

* Add cached_property back to avoid loading executors

* Add test for multi-executors scenario

* Allow to call load_executor without init_executors

* Refactor by caching necessary executors

* Refactor test with default executor case

* Fix side effect from executor_loader

* Fix KubernetesExecutor test

- Previous test failure is cuased by cache state of executor_instances
- Should set ti.state = RUNNING after ti.run

* Fix side effect from executor_loader

- The side effect only show up in postgres as backend environment, as
  previous fix only resolve side effect in sqlite as backend environment.
- Also refactor clean_executor_loader as pytest fixture with setup
  teardown

* Capitalize default executor key

* Refactor clean_executor_loader fixture
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow FileTaskHandler get running tasks logs not only from default executor
7 participants