Stop worker when side task fails #1463
Merged
+66
−1
Hello again 👋
This time with a different problem: every so often our periodic jobs stop running, and we have been trying to find out why.
Over time we added a dead man's switch to the worker and, most recently, monitoring of the heartbeat. We now think we have found at least part of the problem:
Over the weekend, two of our applications kept sending heartbeats but stopped scheduling periodic tasks.
Digging through the code, we found that the side tasks are started but never monitored, so any problem inside a started task never surfaces. Because the tasks are never awaited, exceptions raised inside them are not propagated.
We suspect that the periodic_deferrer task stopped, probably because of a short network issue, which leaves us with a running worker that no longer schedules any periodic tasks.
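To illustrate the failure mode described above, here is a minimal sketch using plain asyncio. It is not the library's actual worker code: `periodic_deferrer` and `worker` are simplified stand-ins, and the `ConnectionError` only simulates the kind of network issue we suspect.

```python
import asyncio


async def periodic_deferrer():
    # Hypothetical stand-in for a side task; fails as if the network dropped.
    await asyncio.sleep(0.01)
    raise ConnectionError("simulated network issue")


async def worker():
    # The side task is started but never awaited or otherwise monitored.
    side_task = asyncio.create_task(periodic_deferrer())
    # The main loop keeps running and has no idea the side task died.
    for _ in range(3):
        await asyncio.sleep(0.02)
    return side_task


async def main():
    side_task = await worker()
    # The worker finished normally even though the side task had failed.
    return side_task.done() and side_task.exception() is not None


side_task_failed_silently = asyncio.run(main())
print(side_task_failed_silently)
```

The worker loop completes without any error, and the exception is only visible if someone explicitly inspects the task afterwards.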
As we can't do anything about the Azure problems, this is our suggestion for handling the case where one of the side tasks dies.
c09370b adds a test documenting the current behavior: a side task fails and the runner keeps running.
7765591 implements the new behavior (the worker stops when a side task fails), together with a test. This should be treated as a breaking change.
Because the logic in the side tasks is essential for a working application, we think stopping the worker is the right move. After shutdown, the infrastructure can restart the worker automatically.
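The general idea can be sketched with plain asyncio as follows. This is an illustration under our own assumptions, not the code from the commits above: `run_worker`, `SideTaskFailed`, `main_loop`, and `failing_side_task` are hypothetical names invented for this example.

```python
import asyncio


class SideTaskFailed(RuntimeError):
    """Hypothetical error raised when a side task dies unexpectedly."""


async def run_worker(main_coro, side_coros):
    main_task = asyncio.create_task(main_coro)
    side_tasks = [asyncio.create_task(coro) for coro in side_coros]
    pending = {main_task, *side_tasks}
    try:
        # Watch the main task and all side tasks together.
        while main_task in pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task is main_task or task.cancelled():
                    continue
                if task.exception() is not None:
                    # A side task failed: stop the whole worker.
                    raise SideTaskFailed("side task failed") from task.exception()
        return main_task.result()
    finally:
        # Cancel whatever is still running and collect the results.
        for task in (main_task, *side_tasks):
            task.cancel()
        await asyncio.gather(main_task, *side_tasks, return_exceptions=True)


async def main_loop():
    await asyncio.sleep(10)  # stand-in for job processing


async def failing_side_task():
    await asyncio.sleep(0.01)
    raise ConnectionError("simulated network issue")


async def demo():
    try:
        await run_worker(main_loop(), [failing_side_task()])
    except SideTaskFailed as exc:
        return type(exc.__cause__).__name__
    return None


result = asyncio.run(demo())
print(result)
```

With this shape, the failure of a side task ends the worker quickly instead of leaving it running in a degraded state, so a supervisor can restart it.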