[Serve] Ray Serve with multiplexing with batching #58358
base: master
Conversation
Signed-off-by: manickavela29 <[email protected]>
Code Review
This pull request introduces batching capabilities to the multiplexing feature in Ray Serve, which is a valuable enhancement for performance. The changes include updates to the multiplexed decorator, the _ModelMultiplexWrapper, and the request router to support batching-aware routing. The implementation is mostly solid, but there are a few key issues to address. The batching-aware routing logic in the request router appears incomplete, as it identifies opportunities for batching but doesn't act on them. Additionally, there's a potential performance issue with how completed requests are cleaned up. The new tests are comprehensive, but one of them has flawed logic that doesn't correctly test the batching mechanism, and there are some leftover debugging statements that should be removed. Overall, this is a great feature addition, and with these fixes, it will be a strong contribution.
Signed-off-by: manickavela29 <[email protected]>
Force-pushed from 5ea4fdc to 550ee37
Signed-off-by: manickavela29 <[email protected]>
Force-pushed from 0f61147 to f014923
Signed-off-by: manickavela29 <[email protected]>
Force-pushed from f014923 to 8cdf240
```python
    )
except Exception:
    # Future might not have replica result, skip
    pass
```
Bug: Batching Optimization Fails due to Incorrect Future Check
The batching-aware routing logic is ineffective. The _get_pending_requests_for_model method filters out completed requests, but the subsequent batching-friendly replica selection logic incorrectly checks for pending_req.future.done(). This condition is always false, preventing the batching optimization from executing.
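A minimal sketch of the bug as described: the method and attribute names (`_get_pending_requests_for_model`, `pending_req.future`) follow the review text, but the surrounding router structure is assumed for illustration and is not the PR's actual code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PendingRequest:
    model_id: str
    future: asyncio.Future
    replica_id: str


class Router:
    def __init__(self):
        self._pending_requests = []

    def _get_pending_requests_for_model(self, model_id):
        # Completed requests are filtered out here, so every request
        # returned below has future.done() == False.
        return [
            req
            for req in self._pending_requests
            if req.model_id == model_id and not req.future.done()
        ]

    def _find_batching_friendly_replica(self, model_id):
        for req in self._get_pending_requests_for_model(model_id):
            # BUG: this is always False for the requests returned above,
            # so the batching-aware selection never runs. Inverting (or
            # simply dropping) the check restores the intended behavior.
            if req.future.done():
                return req.replica_id
        return None
```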
```python
    request_context=request_context,
)

batch_queue.queue.put(single_request)
```
Bug: Misuse Breaks Lazy Batch Queue Wrapper API
The _LazyBatchQueueWrapper is used incorrectly by directly accessing its internal _queue attribute and its queue attribute. This bypasses the wrapper's lazy initialization and violates its intended API, which can lead to AttributeError or other runtime errors when handling batched requests or during shutdown.
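A sketch of the lazy-initialization contract at issue. This mirrors the general pattern of a lazy queue wrapper, not Ray Serve's actual `_LazyBatchQueueWrapper` internals, which may differ.

```python
import asyncio


class LazyBatchQueueWrapper:
    """Defers queue construction until first access."""

    def __init__(self):
        self._queue = None  # internal; created lazily, never touched directly

    @property
    def queue(self):
        # All callers must go through this property so the underlying
        # queue is guaranteed to exist before it is used.
        if self._queue is None:
            self._queue = asyncio.Queue()
        return self._queue


wrapper = LazyBatchQueueWrapper()

# Incorrect (the pattern the review flags): reaching into internals.
# wrapper._queue.put_nowait("request")  # fails before first initialization

# Correct: the property lazily initializes the queue on demand.
wrapper.queue.put_nowait("request")
```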
Description
This PR introduces multiplexed model serving with batching support to Ray Serve, enabling multiple models to be hosted efficiently on a single replica with automatic batching. This significantly improves resource utilization and reduces deployment costs in multi-model scenarios.
Performance Improvements
- Asynchronous cleanup: non-blocking cleanup of completed requests
- Batching-aware routing: prioritizes replicas with pending requests for the same model (see the sketch below)
- Efficient request tracking: optimized data structures for tracking pending requests by model
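A minimal sketch of batching-aware routing and per-model request tracking as described in the list above; the class and method names here are illustrative, not the PR's actual router internals.

```python
from collections import defaultdict


class BatchingAwareRouter:
    def __init__(self, replica_ids):
        self._replica_ids = replica_ids
        # model_id -> replica_id -> count of pending requests for that model.
        self._pending_by_model = defaultdict(lambda: defaultdict(int))

    def record_dispatch(self, model_id, replica_id):
        self._pending_by_model[model_id][replica_id] += 1

    def record_completion(self, model_id, replica_id):
        # The PR does this cleanup asynchronously, off the hot path;
        # it is inlined here for brevity.
        counts = self._pending_by_model[model_id]
        counts[replica_id] -= 1
        if counts[replica_id] <= 0:
            del counts[replica_id]

    def choose_replica(self, model_id):
        # Prefer the replica that already has the most pending requests
        # for this model, so its batch queue fills and requests for the
        # same model get batched together.
        counts = self._pending_by_model.get(model_id)
        if counts:
            return max(counts, key=counts.get)
        # Otherwise fall back to a default policy (e.g., least loaded).
        return self._replica_ids[0]
```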
Related issues
model_id in Model Multiplexing #50695
Additional information
Sample Code
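The PR's original sample snippet is not reproduced here. Below is a minimal usage sketch built from Ray Serve's public `@serve.multiplexed` and `@serve.batch` decorators; that they compose this way, with all requests in a batch sharing one `model_id`, is an assumption based on this PR's description.

```python
from ray import serve


@serve.deployment
class MultiModelDeployment:
    @serve.multiplexed(max_num_models_per_replica=3)
    async def get_model(self, model_id: str):
        # Hypothetical loader; replace with real model-loading logic.
        return lambda x: f"{model_id}: {x}"

    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def predict(self, inputs):
        # Assumed behavior with this PR: all requests in one batch
        # target the same multiplexed model_id.
        model = await self.get_model(serve.get_multiplexed_model_id())
        return [model(i) for i in inputs]

    async def __call__(self, request):
        return await self.predict(request)


app = MultiModelDeployment.bind()
# Clients select a model per request, e.g.:
#   handle.options(multiplexed_model_id="model_a").remote("input")
```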
Test files
test_multiplex_batching.py - Core functionality tests
test_multiplex_batching_router.py - Request routing tests
test_multiplex_batching_utils.py - Utilities and fixtures