Catch "crashed" tasks on task-level retry #7601
Comments
Hi @trahloff, I do not think we should retry on crashes. Crashes indicate infrastructure failure or failure of the Prefect engine itself. These errors are either [...] Thanks for the issue though, happy to chat further about it here.
Hi @madkinsz, thanks a lot for the context and lightning-fast response! Your perspective makes total sense. Are you aware of any workaround to mitigate the Prefect engine errors or make the flow fail gracefully in the meantime?
We're adding client-level retries on those HTTP errors in #7593. We think those are upstream bugs, but we're going to deliver retries as a workaround until we can find the root cause and contribute a fix to httpcore. See #7442 for more details on that issue. What would be the ideal way for the flow to fail gracefully?
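For readers unfamiliar with the idea, client-level retries on a transient protocol error could look roughly like the sketch below. This is not the code from #7593: `call_with_retries`, `flaky_send`, and the stand-in `LocalProtocolError` class are hypothetical names, defined here so that the example runs without httpcore installed.

```python
import time


class LocalProtocolError(Exception):
    """Stand-in for httpcore.LocalProtocolError (assumption: defined locally
    so the sketch runs without httpcore)."""


def call_with_retries(send, max_attempts=3, base_delay=0.0,
                      retry_on=(LocalProtocolError,), sleep=time.sleep):
    """Call send() and retry with exponential backoff on the given errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical request function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LocalProtocolError(
            "Invalid input ConnectionInputs.SEND_HEADERS in state ConnectionState.CLOSED"
        )
    return "ok"


result = call_with_retries(flaky_send)
```

A real client would use a non-zero `base_delay`; it is zero here only to keep the example fast.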
Nice, great to see that the root cause is already identified and addressed!
Now that I try to really pinpoint the requirement, I realize that "failing gracefully" might not even be the right term. Prefect gracefully fails the flow and isolates the crash right now. From my perspective, it would be extremely helpful to fail under these conditions and include more hints for different crash types if this is realistically doable. What do you think? Would that make sense within the current Prefect setup?
Yeah, we could improve that. We are already roughly doing that by changing the messaging based on the exception type at https://github.com/PrefectHQ/prefect/blob/main/src/prefect/states.py#L112
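As an illustration of what "changing the messaging based on the exception type" can mean, here is a minimal sketch. It is not the code in `states.py`: the function name `crash_message` and the message strings are assumptions made up for this example.

```python
import asyncio


def crash_message(exc: BaseException) -> str:
    """Pick a human-readable hint based on the type of the crash exception.

    Hypothetical helper; the strings are illustrative, not Prefect's own.
    """
    if isinstance(exc, asyncio.CancelledError):
        return "Execution was cancelled by the runtime environment."
    if isinstance(exc, KeyboardInterrupt):
        return "Execution was aborted by an interrupt signal."
    if isinstance(exc, SystemExit):
        return "Execution was aborted by a system exit call."
    # Fallback: include the exception type so different crashes stay distinguishable.
    return f"Execution was interrupted by an unexpected exception: {type(exc).__name__}: {exc}"
```

Adding more hints for other crash types, as requested above, would amount to adding further branches of this kind.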
This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment.
This issue was closed because it has been stale for 14 days with no activity. If this issue is important or you have more to add feel free to re-open it.
First check
Prefect Version
2.x
Describe the current behavior
A task-level retry will retry the task if it fails but not if it crashes.
Describe the proposed behavior
A task-level retry will retry the task if it fails or if it crashes.
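The difference between the current and the proposed behavior can be sketched as a toy decision function. This is not Prefect's engine code: `StateType`, `should_retry`, and the `retry_on_crash` flag are hypothetical names used only to make the two behaviors concrete.

```python
from enum import Enum


class StateType(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CRASHED = "CRASHED"


def should_retry(state_type, run_count, retries, retry_on_crash=False):
    """Decide whether a finished task run gets another attempt.

    run_count is the number of attempts made so far; retries is the number
    of extra attempts allowed. retry_on_crash=False models the current
    behavior, retry_on_crash=True models the proposed one.
    """
    if run_count > retries:
        return False  # retry budget exhausted
    if state_type is StateType.FAILED:
        return True
    if state_type is StateType.CRASHED:
        return retry_on_crash
    return False  # completed runs are never retried
```

With `retries=3`, a failed first attempt is retried in both models, while a crashed first attempt is retried only when `retry_on_crash=True`.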
Example Use
It can happen that some tasks crash because of exceptions within the Prefect client-side SDK that would be mitigated by a retry. This scenario fails the whole flow (interestingly in a `failed` state and not `crashed`) even if 1000 tasks succeed and 1 crashes.
For example, running many tasks in parallel can occasionally lead to the following exceptions: `httpcore.LocalProtocolError: Invalid input ConnectionInputs.SEND_HEADERS in state ConnectionState.CLOSED` and `httpcore.LocalProtocolError: Invalid input ConnectionInputs.RECV_PING in state ConnectionState.CLOSED`.
It is often enough to wait a couple of minutes and run the same task again.
Additional context
No response