[2/n] [Serve] poll outbound deployments into deployment state #58350
base: master
Signed-off-by: abrar <[email protected]>
## Summary

Adds a new method to expose all downstream deployments that a replica calls into, enabling dependency graph construction.

## Motivation

Deployments call downstream deployments via handles in two ways:

1. **Stored handles**: Passed to `__init__()` and stored as attributes → `self.model.func.remote()`
2. **Dynamic handles**: Obtained at runtime via `serve.get_deployment_handle()` → `model.func.remote()`

Previously, there was no way to programmatically discover these dependencies from a running replica.

## Implementation

### Core Changes

- **`ReplicaActor.list_outbound_deployments()`**: Returns `List[DeploymentID]` of all downstream deployments
  - Recursively inspects user callable attributes to find stored handles (including nested in dicts/lists)
  - Tracks dynamic handles created via `get_deployment_handle()` at runtime using a callback mechanism
- **Runtime tracking**: Modified `get_deployment_handle()` to register handles when called from within a replica via `ReplicaContext._handle_registration_callback`

Next PR: #58350

---------

Signed-off-by: abrar <[email protected]>
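The recursive inspection of stored handles described above could look roughly like the sketch below. It is only an illustration: `FakeHandle` and `find_handles` are hypothetical names standing in for Serve's handle type and the replica's internal traversal, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Set


@dataclass(frozen=True)
class FakeHandle:
    """Hypothetical stand-in for a Serve deployment handle."""

    app_name: str
    deployment_name: str


def find_handles(obj: Any, _seen: Optional[Set[int]] = None) -> List[FakeHandle]:
    """Collect handles stored on an object, including ones nested in containers."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # guard against reference cycles
        return []
    _seen.add(id(obj))

    if isinstance(obj, FakeHandle):
        return [obj]

    # Decide which child values to descend into.
    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple, set)):
        children = list(obj)
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).values())  # instance attributes
    else:
        children = []

    found: List[FakeHandle] = []
    for child in children:
        found.extend(find_handles(child, _seen))
    return found


class Ingress:
    """Toy user callable that stores handles directly and inside containers."""

    def __init__(self) -> None:
        self.model = FakeHandle("app", "model")
        self.stages = {"pre": [FakeHandle("app", "pre")], "post": (FakeHandle("app", "post"),)}
        self.name = "ingress"


outbound = sorted(h.deployment_name for h in find_handles(Ingress()))
```

Dynamic handles are not reachable this way, which is why the PR tracks them separately through a registration callback.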
    if self.curr_status_info.status == DeploymentStatus.HEALTHY:
        self._outbound_poll_delay = min(
            self._outbound_poll_delay * 2, self._max_outbound_poll_delay
        )
Bug: Exponential Backoff Fails for Non-Healthy Polls
The exponential backoff for outbound deployments polling only increases the delay when the deployment status is HEALTHY. This means that during deployment updates, rollouts, or any non-HEALTHY states (UPDATING, UPSCALING, DOWNSCALING, etc.), the poll delay will never increase and replicas will be polled at the initial 1-second interval indefinitely. This defeats the purpose of exponential backoff and can cause excessive polling during long-running deployment operations. The backoff should increase based on successful polls, not deployment health status.
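One way to read this suggestion is to advance the delay after every successful poll, independent of deployment status. A minimal sketch, assuming a hypothetical helper `next_poll_delay` that is not part of Ray Serve:

```python
def next_poll_delay(
    current_delay_s: float, max_delay_s: float, poll_succeeded: bool
) -> float:
    """Return the delay to wait before the next outbound-deployment poll.

    Hypothetical helper: the delay doubles after every successful poll,
    whatever the deployment status is, and stays unchanged after a failure.
    """
    if not poll_succeeded:
        return current_delay_s
    return min(current_delay_s * 2, max_delay_s)
```

Keeping the delay unchanged on failure is an assumption made for this sketch; the review comment only asks that growth not depend on the HEALTHY status.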
Update deployment state to poll the replicas to collect information about outbound deployments. Polling uses exponential backoff, starting at 1 s and capped at 10 minutes.
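To make the schedule concrete, the sketch below generates the sequence of delays for a 1 s start and a 600 s cap. The generator name `backoff_delays` is hypothetical, and the doubling factor is taken from the diff snippet in the review thread.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial_s: float = 1.0, max_s: float = 600.0) -> Iterator[float]:
    """Yield successive poll delays: double each time, never exceeding max_s."""
    delay = initial_s
    while True:
        yield delay
        delay = min(delay * 2, max_s)


# First twelve delays of the schedule.
delays = list(islice(backoff_delays(), 12))
```

With these values the first ten delays run from 1 s to 512 s and sum to 1023 s (about 17 minutes); every delay after that is the 600 s cap.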
We currently poll all replicas. This gives the most accurate result, but it can be expensive for deployments with thousands of replicas. Polling only one replica can be suboptimal; should we query a fixed-size subset for efficiency, such as 10?
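The fixed-size subset option could be sketched as follows. This only illustrates the question raised here and is not what the PR implements; `pick_replicas_to_poll` and `merge_outbound` are hypothetical helpers, and replica IDs and deployment names are modeled as plain strings.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def pick_replicas_to_poll(
    replica_ids: Sequence[str], sample_size: int = 10, seed: Optional[int] = None
) -> List[str]:
    """Choose at most sample_size replicas uniformly at random, without repeats."""
    if len(replica_ids) <= sample_size:
        return list(replica_ids)  # small deployment: poll everything
    return random.Random(seed).sample(list(replica_ids), sample_size)


def merge_outbound(results: Dict[str, List[str]]) -> Set[str]:
    """Union the outbound deployments reported by each polled replica."""
    merged: Set[str] = set()
    for deployments in results.values():
        merged.update(deployments)
    return merged


all_replicas = [f"replica-{i}" for i in range(1000)]
chosen = pick_replicas_to_poll(all_replicas, sample_size=10, seed=0)
```

The trade-off is that a dynamic handle created on a replica outside the sample would be missed until a later poll happens to include that replica.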
Next PR -> #58355