[Feat] EPD Mooncake Store Connector #17

khuonglmhw · 2025-12-11T07:34:47Z

Purpose

Implement an encoder cache (EC) connector backed by Mooncake Store.

Test Plan

Test Result

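Results of running `disagg_1e1pd_example.sh` on an A100-40GB, using the Mooncake connector over RDMA and over TCP, and using the example (disk) connector for comparison: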

rdma:

============ Serving Benchmark Result ============
Successful requests:                     100       
Failed requests:                         0         
Benchmark duration (s):                  18.40     
Total input tokens:                      15000     
Total generated tokens:                  10000     
Request throughput (req/s):              5.43      
Output token throughput (tok/s):         543.41    
Peak output token throughput (tok/s):    681.00    
Peak concurrent requests:                100.00    
Total Token throughput (tok/s):          1358.53   
---------------Time to First Token----------------
Mean TTFT (ms):                          8403.98   
Median TTFT (ms):                        8555.08   
P99 TTFT (ms):                           16054.71  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          25.17     
Median TPOT (ms):                        25.08     
P99 TPOT (ms):                           29.13     
---------------Inter-token Latency----------------
Mean ITL (ms):                           26.79     
Median ITL (ms):                         22.37     
P99 ITL (ms):                            115.44    
==================================================

tcp:

============ Serving Benchmark Result ============
Successful requests:                     100       
Failed requests:                         0         
Benchmark duration (s):                  19.22     
Total input tokens:                      15000     
Total generated tokens:                  10000     
Request throughput (req/s):              5.20      
Output token throughput (tok/s):         520.41    
Peak output token throughput (tok/s):    722.00    
Peak concurrent requests:                100.00    
Total Token throughput (tok/s):          1301.03   
---------------Time to First Token----------------
Mean TTFT (ms):                          9057.54   
Median TTFT (ms):                        8348.85   
P99 TTFT (ms):                           15949.27  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          24.83     
Median TPOT (ms):                        24.95     
P99 TPOT (ms):                           26.65     
---------------Inter-token Latency----------------
Mean ITL (ms):                           26.29     
Median ITL (ms):                         22.53     
P99 ITL (ms):                            92.85     
==================================================

example connector (disk):

============ Serving Benchmark Result ============
Successful requests:                     100       
Failed requests:                         0         
Benchmark duration (s):                  18.57     
Total input tokens:                      15000     
Total generated tokens:                  10000     
Request throughput (req/s):              5.38      
Output token throughput (tok/s):         538.45    
Peak output token throughput (tok/s):    736.00    
Peak concurrent requests:                100.00    
Total Token throughput (tok/s):          1346.12   
---------------Time to First Token----------------
Mean TTFT (ms):                          8258.42   
Median TTFT (ms):                        8068.73   
P99 TTFT (ms):                           15228.11  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          23.78     
Median TPOT (ms):                        23.86     
P99 TPOT (ms):                           25.54     
---------------Inter-token Latency----------------
Mean ITL (ms):                           24.93     
Median ITL (ms):                         21.56     
P99 ITL (ms):                            92.76     
==================================================
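To read the three runs side by side, the headline figures above can be expressed as percent changes relative to the RDMA run. The snippet reuses only numbers reported in this PR; the `RESULTS` dict and the `relative_delta` helper are illustrative names, not part of vLLM.

```python
# Key metrics copied from the three benchmark runs above (100 requests each).
RESULTS = {
    "rdma": {"duration_s": 18.40, "req_per_s": 5.43, "mean_ttft_ms": 8403.98, "mean_tpot_ms": 25.17},
    "tcp": {"duration_s": 19.22, "req_per_s": 5.20, "mean_ttft_ms": 9057.54, "mean_tpot_ms": 24.83},
    "disk": {"duration_s": 18.57, "req_per_s": 5.38, "mean_ttft_ms": 8258.42, "mean_tpot_ms": 23.78},
}


def relative_delta(value: float, baseline: float) -> float:
    """Percent change of `value` versus `baseline` (positive means larger)."""
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = RESULTS["rdma"]
    for name, metrics in RESULTS.items():
        # Percent change of every metric against the RDMA run.
        deltas = {k: round(relative_delta(v, base[k]), 1) for k, v in metrics.items()}
        print(name, deltas)
```

With these inputs, the TCP run takes about 4.5% longer than RDMA and its mean TTFT is about 7.8% higher, while the disk-based example connector stays within about 1% of RDMA on duration and request throughput. Differences of a few percent may fall within run-to-run noise.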



knlnguyen1802

LGTM now.

knlnguyen1802 · 2025-12-12T09:52:06Z

cc @fake0fan PTAL again

Copilot

Pull request overview

This PR implements a Mooncake storage connector for the Encoder Cache (EC) disaggregated architecture, enabling distributed encoder cache transfer across vLLM instances. The connector supports both regular and zero-copy transfer modes for efficient multimodal data sharing.

Key Changes:

- Introduces `ECMooncakeStorageConnector` with support for async batch operations and zero-copy transfers using pinned memory
- Adds `TensorMemoryPool` with buddy allocation for efficient pinned memory management with FIFO eviction
- Integrates a synchronization mechanism (`wait_for_save()`) to ensure the encoder cache is persisted before request completion
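The buddy allocation mentioned for `TensorMemoryPool` can be illustrated with a small, pure-Python sketch that manages offsets within one fixed-size region, standing in for the pinned host buffer. This is not the PR's implementation: the class `BuddyAllocator`, its methods, and the omission of FIFO eviction are assumptions made for illustration.

```python
class BuddyAllocator:
    """Hypothetical sketch: buddy allocation of offsets inside one region.

    Both the region size and the minimum block size must be powers of two.
    """

    def __init__(self, total_size: int, min_block: int = 256) -> None:
        assert total_size & (total_size - 1) == 0, "total_size must be a power of two"
        assert min_block & (min_block - 1) == 0, "min_block must be a power of two"
        self.total_size = total_size
        self.min_block = min_block
        # free_lists[size] -> set of offsets of free blocks of that size
        self.free_lists: dict[int, set[int]] = {total_size: {0}}
        self.allocated: dict[int, int] = {}  # offset -> block size

    def _block_size(self, nbytes: int) -> int:
        # Round the request up to the next power-of-two block size.
        size = self.min_block
        while size < nbytes:
            size *= 2
        return size

    def allocate(self, nbytes: int) -> int:
        size = self._block_size(nbytes)
        if size > self.total_size:
            raise ValueError(f"request of {nbytes} bytes exceeds pool size")
        # Find the smallest free block that is large enough.
        candidate = size
        while candidate <= self.total_size and not self.free_lists.get(candidate):
            candidate *= 2
        if candidate > self.total_size:
            raise MemoryError(f"no free block for {nbytes} bytes")
        offset = min(self.free_lists[candidate])
        self.free_lists[candidate].remove(offset)
        # Split in halves until the block matches the requested size;
        # each upper half (the "buddy") goes back on a free list.
        while candidate > size:
            candidate //= 2
            self.free_lists.setdefault(candidate, set()).add(offset + candidate)
        self.allocated[offset] = size
        return offset

    def free(self, offset: int) -> None:
        size = self.allocated.pop(offset)
        # Merge with the buddy block for as long as the buddy is also free.
        while size < self.total_size:
            buddy = offset ^ size
            bucket = self.free_lists.get(size, set())
            if buddy not in bucket:
                break
            bucket.remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free_lists.setdefault(size, set()).add(offset)
```

Rounding every request to a power of two wastes some space but makes splitting and merging cheap, since a block's buddy is found with a single XOR. FIFO eviction, as described in the overview, would sit on top of this, for example by freeing the oldest live allocation when `allocate` raises `MemoryError`; that part is omitted here.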

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| `vllm/distributed/ec_transfer/ec_lookup_buffer/mooncake_store.py` | Core Mooncake store implementation with batch operations and zero-copy support |
| `vllm/distributed/ec_transfer/ec_connector/mooncake_storage_connector.py` | Connector interface implementation for EC transfer operations |
| `vllm/distributed/ec_transfer/utils/tensor_memory_pool.py` | Buddy allocator-based memory pool for pinned host memory management |
| `vllm/distributed/ec_transfer/ec_connector/base.py` | Added `wait_for_save()` interface method to base connector |
| `vllm/v1/worker/ec_connector_model_runner_mixin.py` | Added mixin method to wait for async save operations |
| `vllm/v1/worker/gpu_model_runner.py` | Integrated wait for save after multimodal encoding |
| `vllm/distributed/ec_transfer/ec_connector/factory.py` | Registered new Mooncake connector in factory |
| `tests/v1/ec_connector/unit/test_mooncake_store.py` | Comprehensive unit tests with fake Mooncake store implementation |
| `examples/.../mooncake_connector/disagg_1e1pd_example.sh` | Example script for 1 encoder + 1 prefill/decode setup |
| `examples/.../mooncake_connector/disagg_1e1p1d_example.sh` | Example script for 1 encoder + 1 prefill + 1 decode setup |
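The `wait_for_save()` changes in the base connector, the model-runner mixin, and the GPU model runner amount to a common pattern: launch saves asynchronously, then block until they finish before the request completes. A minimal sketch of that pattern with `concurrent.futures`; the class `AsyncSaveTracker`, its methods, and the `store_fn` callable are hypothetical stand-ins, not the connector's actual API.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable


class AsyncSaveTracker:
    """Hypothetical sketch: track in-flight saves so callers can wait on them."""

    def __init__(self, store_fn: Callable[[str, bytes], None], max_workers: int = 4) -> None:
        self._store_fn = store_fn
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: list[Future] = []

    def save_async(self, key: str, data: bytes) -> None:
        # Schedule the store write without blocking the caller.
        self._pending.append(self._executor.submit(self._store_fn, key, data))

    def wait_for_save(self) -> None:
        # Block until every scheduled save has finished, then surface errors.
        pending, self._pending = self._pending, []
        wait(pending)
        for fut in pending:
            fut.result()  # re-raises any exception raised by store_fn

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

Calling `wait_for_save()` right after encoder execution trades a little latency for the guarantee that a consumer instance never looks up an encoder cache entry that has not been written yet.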


Signed-off-by: Khuong Le <[email protected]>

Shirley125 · 2025-12-17T10:46:27Z

vllm/distributed/ec_transfer/utils/tensor_memory_pool.py

+        Raises:
+            ValueError: If tensor is not on CUDA or allocation fails
+        """
+        if not tensor.is_cuda:


Using `is_cuda` is incompatible with NPU. Also, this file is similar to `distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py`. Is it possible to reuse that file instead of adding a separate one?

Signed-off-by: Khuong Le <[email protected]>

initial

b4d1e00

khuonglmhw changed the title ~~initial~~ [Feat] EPD Mooncake Store Connector Dec 11, 2025

fix precommit

29b1c45

knlnguyen1802 reviewed Dec 12, 2025


vllm/distributed/ec_transfer/ec_lookup_buffer/mooncake_store.py (outdated, resolved)

vllm/distributed/ec_transfer/ec_lookup_buffer/mooncake_store.py (outdated, resolved)

vllm/distributed/ec_transfer/ec_connector/base.py (outdated, resolved)

resolve comments

da63740

knlnguyen1802 approved these changes Dec 12, 2025


knlnguyen1802 requested a review from Copilot December 16, 2025 06:05

Copilot started reviewing on behalf of knlnguyen1802 December 16, 2025 06:05

Copilot AI reviewed Dec 16, 2025


resolve comments

c6159e6

Signed-off-by: Khuong Le <[email protected]>

Shirley125 reviewed Dec 17, 2025


khuonglm added 2 commits December 18, 2025 12:19

resolve comments

d7bbb65

Signed-off-by: Khuong Le <[email protected]>

remove metadata & non zerocopy methods

3f4bd26

Signed-off-by: Khuong Le <[email protected]>


5 participants

[Feat] EPD Mooncake Store Connector #17

Are you sure you want to change the base?

[Feat] EPD Mooncake Store Connector #17

Uh oh!

Conversation

khuonglmhw commented Dec 11, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

Uh oh!

Uh oh!

Uh oh!

knlnguyen1802 left a comment

Choose a reason for hiding this comment

Uh oh!

knlnguyen1802 commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Shirley125 Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

khuonglmhw commented Dec 11, 2025 •

edited by github-actions bot

Loading

Shirley125 Dec 17, 2025 •

edited

Loading