Skip to content

feat: improved job wait_for strategies#318

Open
vanyae-cqc wants to merge 6 commits intomainfrom
feat/long-running-job-monitoring
Open

feat: improved job wait_for strategies#318
vanyae-cqc wants to merge 6 commits intomainfrom
feat/long-running-job-monitoring

Conversation

@vanyae-cqc
Copy link
Copy Markdown
Collaborator

@vanyae-cqc vanyae-cqc commented Feb 24, 2026

Nexus frequently has users that want to monitor a long running job (one that will take more than 10 minutes from submission to execution, sometimes as much as several days or weeks).

This PR adds some additional strategies to the wait_for function, namely:

  • pure polling (HTTP GET, with a backoff)
  • pure websocket
  • auto (websocket before falling back to polling via HTTP GET)

Also I have set the default timeout to None, as this is what users seem to expect.

The hope being that this will handle long running jobs better than a pure websocket, which I find is often dropped if kept open for a long time.

I have also added optional logging, trying to stick to best practises for python libraries

@vanyae-cqc vanyae-cqc marked this pull request as ready for review February 25, 2026 17:10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early thoughts on the approach. I'm sorry: this proposes quite a re-write, so I've tried to justify.

If easier, we could accept that there are different ways to skin a cat and I'll pipe down and review as-is. Let me know what you think about the alternative approach I've suggested

job=job,
wait_for_status=wait_for_status,
websocket_timeout=websocket_timeout,
)
Copy link
Copy Markdown
Contributor

@quantinuum-richard-morrison quantinuum-richard-morrison Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an enum a case and three functions, these could be BaseStrategy (perhaps an ABC) and three subclasses WebsocketStrategy, PollingStrategy and HybridStrategy. The advantages would be:

  • users could implement their own variations easily (if they wanted to customise)
  • HybridStrategy could be based on PollingStrategy with WebsocketStrategy pulled in as a mixin. Fewer lines of code (maybe) (not that that's a great metric). DRYer (hopefully)
  • arguably more idiomatic python, easier to maintain and reason about

Comment on lines +362 to +365
initial_interval: float = 1.0,
max_interval_queued: float = 1200.0,
max_interval_running: float = 180.0,
backoff_factor: float = 2.0,
Copy link
Copy Markdown
Contributor

@quantinuum-richard-morrison quantinuum-richard-morrison Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(related to my other comment about using subclasses for different strategies)

The implementation of wait_for gives the user no way to choose different values for these params. They have to pass in an enum member and the code in wait_for picks a function, calling it with just the required args, so these'll always get defaults.

With a class-based approach, it could be:

class PollingStrategy(BaseStrategy):
    ...
    backoff_factor: float = 2.0
    ...

and then a user could (if they chose) do:

class MyCustomPollingStrategy(PollingStrategy):
    backoff_factor: float = 1.5

and they could pass that to wait_for

Comment on lines +370 to +371
- QUEUED: Polls less frequently (default 20 min) since queue position changes slowly
- RUNNING/SUBMITTED: Polls more frequently (default 3 min) for responsiveness
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants