
Conversation

@knlnguyen1802
Contributor

@knlnguyen1802 knlnguyen1802 commented Dec 18, 2025

Purpose

Fixes #342 and partially fixes #316.
This adds support for calling sleep, wait, and load_weights on the AR ModelRunner.
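A hypothetical usage sketch; the top-level entry point and the positional (method, timeout, args, kwargs) signature are assumptions inferred from the snippets in the reviews below:

# Hypothetical sketch; the OmniLLM entry point and call signature are assumed.
llm = OmniLLM(model="...")

# Offload model weights, e.g. between RL training steps.
llm.collective_rpc("sleep", 60.0, (), {})

# ...update weights externally, then reload them on the workers.
llm.collective_rpc("load_weights", 60.0, (), {})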


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: knlnguyen1802 <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 525 to 529
_logging.getLogger(__name__).debug(
    "[Stage-%s] Executing collective_rpc: method=%s", stage_id, method
)
result = stage_engine.collective_rpc(method, timeout, args, kwargs)
out_q.put({


P1: Worker RPC handler calls missing method

The collective_rpc path in the worker invokes stage_engine.collective_rpc(...), but the stage engine is an OmniStageLLM (created at line 500) and there is no collective_rpc implementation anywhere in that class or the rest of the repo. Every collective_rpc task will therefore raise an AttributeError and be returned as an error response, so the new API never actually executes on the worker.
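One way to unblock this would be a thin forwarder on the stage engine that delegates to vLLM's own collective_rpc. A minimal sketch, assuming OmniStageLLM wraps a vLLM engine as self.llm_engine (that attribute name is an assumption):

class OmniStageLLM:
    ...
    def collective_rpc(self, method, timeout=None, args=(), kwargs=None):
        # Delegate to the wrapped vLLM engine, which fans the call out to
        # every worker in the stage; vLLM exposes collective_rpc with a
        # (method, timeout, args, kwargs) signature.
        return self.llm_engine.collective_rpc(
            method, timeout=timeout, args=args, kwargs=kwargs or {}
        )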


Comment on lines 374 to 378
result = self.try_collect()
if result is not None and result.get("type") == "collective_rpc_result" and result.get("rpc_id") == rpc_id:
    if "error" in result:
        raise RuntimeError(f"collective_rpc failed: {result['error']}")
    return result["result"]


P1: collective_rpc drops unrelated out-queue messages

While waiting for a matching collective_rpc_result, this loop consumes any item from self._out_q via try_collect() and discards it if the type/rpc_id do not match. That permanently removes normal stage outputs that arrive while the RPC is pending, causing those requests to hang because their responses are lost. The code should buffer or requeue non-RPC messages instead of dropping them.
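A minimal sketch of the buffer-and-requeue approach, assuming a queue.Queue-style self._out_q and the message schema shown above (the helper name is hypothetical):

import queue
import time

def _wait_for_rpc_result(self, rpc_id, timeout=None):
    # Hypothetical helper for OmniStage.collective_rpc: wait for one specific
    # collective_rpc_result while preserving unrelated stage outputs.
    deadline = None if timeout is None else time.monotonic() + timeout
    pending = []  # unrelated messages observed while waiting
    try:
        while True:
            remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
            try:
                msg = self._out_q.get(timeout=remaining)
            except queue.Empty:
                raise TimeoutError(f"collective_rpc {rpc_id} timed out")
            if (msg.get("type") == "collective_rpc_result"
                    and msg.get("rpc_id") == rpc_id):
                if "error" in msg:
                    raise RuntimeError(f"collective_rpc failed: {msg['error']}")
                return msg["result"]
            pending.append(msg)  # not ours: buffer instead of dropping
    finally:
        # Requeue intercepted messages so normal collection still sees them.
        for item in pending:
            self._out_q.put(item)

Note that requeuing reorders buffered messages relative to their arrival; if consumers depend on ordering, a separate side buffer drained by try_collect() would be safer.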


assert self._in_q is not None and self._out_q is not None, "Queues must be attached before collective_rpc"

# Submit collective_rpc task to worker
import uuid
Collaborator


Why do we use a lazy import here?

Contributor Author


Sorry, this should have been a draft [WIP], but thanks for the review. Will fix it.

@knlnguyen1802 knlnguyen1802 marked this pull request as draft December 18, 2025 03:40
Signed-off-by: knlnguyen1802 <[email protected]>
@knlnguyen1802 knlnguyen1802 marked this pull request as ready for review December 18, 2025 07:26
@knlnguyen1802
Contributor Author

@ZJY0516 Could you have a look again? Thanks.

@knlnguyen1802 knlnguyen1802 changed the title from "Try to do rpc" to "RPC support for OmniLLM" Dec 18, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +526 to +528
try:
    result = stage_engine.collective_rpc(method, timeout, args, kwargs)
    out_q.put({


P1: collective_rpc task calls missing engine method

The worker loop dispatches collective RPCs to stage_engine.collective_rpc(...), but OmniStageLLM (and the rest of the repo—checked with rg "collective_rpc") defines no such method. Any collective_rpc task will therefore raise an AttributeError in the worker, propagate back as an error reply, and cause OmniStage.collective_rpc to raise a RuntimeError, leaving the new API unusable.


Comment on lines +374 to +378
result = self.try_collect()
if result is not None:
    if result.get("type") == "collective_rpc_result":
        if result.get("rpc_id") == rpc_id:
            if "error" in result:


P1: RPC wait loop discards non-RPC outputs

While waiting for a matching collective_rpc_result, the new collective_rpc method pops any item from self._out_q via try_collect() and ignores everything that is not the target RPC result. That silently drops unrelated stage outputs (e.g., pending inference responses or status messages), so issuing a collective RPC while other work is in-flight will lose those messages and leave callers hanging without responses.


Signed-off-by: knlnguyen1802 <[email protected]>


Development

Successfully merging this pull request may close these issues:

  • [RFC]: Support more functionality other than generate
  • [Feature][RL]: Support Model weight offload, reload and sync model weight & Offload DIT cache
