Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xtrigger timeout #6319

Open
oliver-sanders opened this issue Aug 22, 2024 · 6 comments
Open

xtrigger timeout #6319

oliver-sanders opened this issue Aug 22, 2024 · 6 comments
Labels
could be better Not exactly a bug, but not ideal.
Milestone

Comments

@oliver-sanders
Copy link
Member

I've recently seen a workflow with an xtrigger which ran for 10 minutes.

It's likely that this clogged up the subproc pool causing delays to other commands. We should instate a default timeout for xtriggers that is more compatible with their intended usage in order to feedback this advice to workflow developers earlier in the process to avoid ending up in this situation. This would also help to protect against hanging xtriggers of course.

@oliver-sanders oliver-sanders added the could be better Not exactly a bug, but not ideal. label Aug 22, 2024
@oliver-sanders oliver-sanders added this to the 8.x milestone Aug 22, 2024
@hjoliver
Copy link
Member

hjoliver commented Aug 22, 2024

https://cylc.github.io/cylc-doc/stable/html/reference/config/global.html#global.cylc[scheduler]process%20pool%20timeout

So you're suggesting a separate timeout for xtriggers, or reducing the general process pool timeout?

If the system load is such that job submission and event handlers are taking PT10M to complete, then it's probably reasonable that xtriggers do too.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Aug 23, 2024

I'm suggesting a separate timeout for xtriggers.

Xtriggers are supposed to be simple short-running functions that return True/False and are run over and over until they are satisfied. They are not intended to be used as long-lived pollers that sit there waiting for a condition to be met.

The PT10M timeout encourages the development of long-lived pollers. Setting a shorter default would discourage this behaviour (which is not the intended usage and really not a good idea as it will choke up your subprocpool).

Async xtriggers will provide us with a way to make long-lived xtriggers a viable prospect, by making the "waiting on external condition" bit async. However, I don't think we will implement async xtriggers within the same framework due to their very different execution environment.

@hjoliver
Copy link
Member

We definitely should not encourage "long-lived poller" xtriggers.

However, I think legit one-off-check xtriggers can still be on par (in terms of execution time) with other process pool processes . Well, perhaps not batched remote submissions, but certainly event handlers. Do you think that's not the case?

@oliver-sanders
Copy link
Member Author

Are you just arguing that xtriggers shouldn't be "special" or are you arguing that a 10 minute xtrigger is "reasonable"?

@hjoliver
Copy link
Member

I'm kind of struggling to see exactly why xtriggers should time out earlier than other processes in the process pool. A legit xtrigger could easily take as long to execute as job submission or event handler processes. None of these should be taking anything like 10 minutes (the default timeout) but I think we allow that in case of extreme system load.

@oliver-sanders
Copy link
Member Author

Ok, so you are arguing that xtriggers should not be "special", not that a PT10M xtrigger is reasonable.

System load aside, we know that some subprocesses can take quite a while under normal circumstances, e.g. remote file-installation and remote log-retrieval will take as long as data transfer requires and could be impacted by local network. So the PT10M might make sense in some situations, however, it definitely does not make sense for xtriggers (which were intended to be short-lived polling functions).

So given that some subprocess might be reasonably expected to take longer to run than an xtrigger, is it not reasonable to implement a separate time limit for xtriggers?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be better Not exactly a bug, but not ideal.
Projects
None yet
Development

No branches or pull requests

2 participants