[2/n] [Serve] poll outbound deployments into deployment state #58350

abrarsheikh · 2025-11-01T01:23:07Z

Update deployment state to poll the replicas to collect information about outbound deployments. Polling uses exponential backoff, starting at 1 second and capped at 10 minutes.
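For illustration, the delay schedule described above can be sketched as a generator. This is a minimal sketch, not the code in this PR: `backoff_delays` and the two constants are hypothetical names, and the doubling factor is taken from the `* 2` in the reviewed diff.

```python
from itertools import islice
from typing import Iterator

INITIAL_DELAY_S = 1.0    # starting delay stated in the description
MAX_DELAY_S = 10 * 60.0  # 10-minute cap stated in the description


def backoff_delays(
    initial: float = INITIAL_DELAY_S,
    cap: float = MAX_DELAY_S,
    factor: float = 2.0,
) -> Iterator[float]:
    """Yield successive poll delays, multiplying by `factor` until `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


# First 12 delays in seconds:
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 600.0, 600.0]
first_delays = list(islice(backoff_delays(), 12))
```

With these values the cap is reached on the 11th poll, roughly 17 minutes after the first one.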

We currently poll all replicas. This gives the most accurate result, but it can be expensive for deployments with thousands of replicas. Polling only one replica can be suboptimal. Should we query a fixed-size subset, such as 10 replicas, for efficiency?
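A rough sketch of the fixed-subset option raised above, assuming a uniform random sample is acceptable. `pick_replicas_to_poll` and `max_polled` are hypothetical names, and the replicas are treated as opaque objects rather than real Serve replica wrappers.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_replicas_to_poll(
    replicas: Sequence[T],
    max_polled: int = 10,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return at most `max_polled` replicas, sampled without replacement."""
    if max_polled <= 0:
        raise ValueError("max_polled must be positive")
    if len(replicas) <= max_polled:
        # Small deployments: poll every replica, as the PR does today.
        return list(replicas)
    sampler = rng if rng is not None else random
    return sampler.sample(list(replicas), max_polled)
```

Drawing a fresh sample on every poll and taking the union of the results would let handles that only some replicas have created show up over time, at the cost of a slower first complete answer.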

Next PR -> #58355

Signed-off-by: abrar <[email protected]>

…yments

Signed-off-by: abrar <[email protected]>

## Summary

Adds a new method to expose all downstream deployments that a replica calls into, enabling dependency graph construction.

## Motivation

Deployments call downstream deployments via handles in two ways:

1. **Stored handles**: Passed to `__init__()` and stored as attributes → `self.model.func.remote()`
2. **Dynamic handles**: Obtained at runtime via `serve.get_deployment_handle()` → `model.func.remote()`

Previously, there was no way to programmatically discover these dependencies from a running replica.

## Implementation

### Core Changes

- **`ReplicaActor.list_outbound_deployments()`**: Returns `List[DeploymentID]` of all downstream deployments
  - Recursively inspects user callable attributes to find stored handles (including nested in dicts/lists)
  - Tracks dynamic handles created via `get_deployment_handle()` at runtime using a callback mechanism
- **Runtime tracking**: Modified `get_deployment_handle()` to register handles when called from within a replica via `ReplicaContext._handle_registration_callback`

Next PR: #58350

---------

Signed-off-by: abrar <[email protected]>
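The recursive scan for stored handles could look roughly like the sketch below. It does not reproduce the implementation in #58345: `FakeHandle` is a hypothetical stand-in for a Serve deployment handle so the example runs without Ray, and `find_stored_handles` and `Ingress` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Optional, Set


@dataclass(frozen=True)
class FakeHandle:
    """Hypothetical stand-in for a Serve deployment handle."""
    deployment_name: str
    app_name: str = "default"


def find_stored_handles(obj: Any, _seen: Optional[Set[int]] = None) -> Set[FakeHandle]:
    """Walk attributes, dicts, and sequences, collecting every handle found."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:  # guard against reference cycles
        return set()
    seen.add(id(obj))

    if isinstance(obj, FakeHandle):
        return {obj}
    if isinstance(obj, dict):
        children = list(obj.values())
    elif isinstance(obj, (list, tuple, set)):
        children = list(obj)
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).values())
    else:
        children = []

    found: Set[FakeHandle] = set()
    for child in children:
        found |= find_stored_handles(child, seen)
    return found


class Ingress:
    """Example user callable holding handles directly and in containers."""

    def __init__(self) -> None:
        self.model = FakeHandle("model")
        self.routes = {"a": [FakeHandle("a")], "b": {"nested": FakeHandle("b")}}
        self.me = self  # cycle, handled by the `seen` set


names = sorted(h.deployment_name for h in find_stored_handles(Ingress()))
# names == ['a', 'b', 'model']
```

A scan like this only sees handles reachable from attributes, which is why dynamic handles need the separate registration callback described above.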

…brar-controller

Signed-off-by: abrar <[email protected]>

python/ray/serve/_private/deployment_state.py

Signed-off-by: abrar <[email protected]>

cursor · 2025-11-06T22:13:06Z

python/ray/serve/_private/deployment_state.py

+        if self.curr_status_info.status == DeploymentStatus.HEALTHY:
+            self._outbound_poll_delay = min(
+                self._outbound_poll_delay * 2, self._max_outbound_poll_delay
+            )


Bug: Exponential Backoff Fails for Non-Healthy Polls

The exponential backoff for outbound deployments polling only increases the delay when the deployment status is HEALTHY. This means that during deployment updates, rollouts, or any non-HEALTHY states (UPDATING, UPSCALING, DOWNSCALING, etc.), the poll delay will never increase and replicas will be polled at the initial 1-second interval indefinitely. This defeats the purpose of exponential backoff and can cause excessive polling during long-running deployment operations. The backoff should increase based on successful polls, not deployment health status.
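One way to act on this suggestion is to advance the delay whenever a poll succeeds, without consulting deployment status at all. The sketch below assumes that design. `OutboundPollBackoff`, `should_poll`, and `record_success` are hypothetical names and not part of `deployment_state.py`; the injected `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable


class OutboundPollBackoff:
    """Tracks when the next outbound-deployment poll is due."""

    def __init__(
        self,
        initial_delay_s: float = 1.0,
        max_delay_s: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._delay_s = initial_delay_s
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._next_poll_at = clock()  # first poll is due immediately

    @property
    def delay_s(self) -> float:
        return self._delay_s

    def should_poll(self) -> bool:
        return self._clock() >= self._next_poll_at

    def record_success(self) -> None:
        """Schedule the next poll, then double the delay up to the cap.

        Deployment status is deliberately not checked here, so the delay
        also grows during UPDATING, UPSCALING, or DOWNSCALING.
        """
        self._next_poll_at = self._clock() + self._delay_s
        self._delay_s = min(self._delay_s * 2, self._max_delay_s)
```

A failed poll would simply not call `record_success()`, leaving the current delay unchanged. Whether the delay should reset when a new deployment version rolls out is a separate design question that this sketch leaves open.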

abrarsheikh added 2 commits October 31, 2025 18:08

expose outbound deployment ids from replica actor

782720b

Signed-off-by: abrar <[email protected]>

[Serve] poll outbound deployments into deployment state

50353d1

Signed-off-by: abrar <[email protected]>

abrarsheikh changed the base branch from master to dag-of-deployments November 1, 2025 01:23

abrarsheikh requested review from akyang-anyscale and zcin November 1, 2025 01:24

abrarsheikh added the `go` label (add ONLY when ready to merge, run all tests) Nov 1, 2025

abrarsheikh mentioned this pull request Nov 1, 2025

[1/n] expose outbound deployment ids from replica actor #58345

Merged

abrarsheikh added 5 commits November 1, 2025 19:20

fix type

5e9901c

Signed-off-by: abrar <[email protected]>

change how we scan for deployment handle

ea7d807

Signed-off-by: abrar <[email protected]>

Merge branch 'master' of github.com:ray-project/ray into dag-of-deplo…

be35418

…yments

update docstrings

297261e

Signed-off-by: abrar <[email protected]>

Merge branch 'dag-of-deployments' into SERVE-1425-abrar-controller

494fd9f

Base automatically changed from dag-of-deployments to master November 6, 2025 21:33

abrarsheikh added 2 commits November 6, 2025 21:33

Merge branch 'master' of github.com:ray-project/ray into SERVE-1425-a…

bc6a053

…brar-controller

use all replicas

c81ee07

Signed-off-by: abrar <[email protected]>

abrarsheikh marked this pull request as ready for review November 6, 2025 22:03

abrarsheikh requested a review from a team as a code owner November 6, 2025 22:03

cursor bot reviewed Nov 6, 2025

python/ray/serve/_private/deployment_state.py

address bot

a98c82a

Signed-off-by: abrar <[email protected]>

cursor bot reviewed Nov 6, 2025

ray-gardener bot added the `serve` label (Ray Serve Related Issue) Nov 7, 2025
