[Feat] EPD Mooncake Store Connector #17
knlnguyen1802
left a comment
LGTM now.
cc @fake0fan PTAL again
Pull request overview
This PR implements a Mooncake storage connector for the Encoder Cache (EC) disaggregated architecture, enabling distributed encoder cache transfer across vLLM instances. The connector supports both regular and zero-copy transfer modes for efficient multimodal data sharing.
Key Changes:
- Introduces `ECMooncakeStorageConnector` with support for async batch operations and zero-copy transfers using pinned memory
- Adds `TensorMemoryPool` with buddy allocation for efficient pinned memory management with FIFO eviction
- Integrates synchronization mechanism (`wait_for_save()`) to ensure encoder cache persistence before request completion
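The buddy allocation and FIFO eviction mentioned for `TensorMemoryPool` can be illustrated with a minimal sketch. This is not the PR's implementation: the class names `BuddyAllocator` and `FifoPool`, their methods, and the sizes are hypothetical, and the sketch tracks integer offsets only rather than real pinned memory.

```python
from collections import OrderedDict


class BuddyAllocator:
    """Toy buddy allocator over integer offsets (hypothetical, no real memory)."""

    def __init__(self, total_size: int, min_block: int = 64):
        # Power-of-two sizes let every block split evenly into two buddies.
        assert total_size & (total_size - 1) == 0, "total_size must be a power of two"
        assert min_block & (min_block - 1) == 0, "min_block must be a power of two"
        self.total_size = total_size
        self.min_block = min_block
        self.free = {total_size: {0}}  # block size -> set of free offsets
        self.allocated = {}            # offset -> block size

    def _round_up(self, size: int) -> int:
        block = self.min_block
        while block < size:
            block *= 2
        return block

    def allocate(self, size: int) -> int:
        want = self._round_up(size)
        # Find the smallest free block that can hold the request.
        block = want
        while block <= self.total_size and not self.free.get(block):
            block *= 2
        if block > self.total_size:
            raise MemoryError("no free block large enough")
        offset = self.free[block].pop()
        # Split down to the wanted size, freeing each upper half.
        while block > want:
            block //= 2
            self.free.setdefault(block, set()).add(offset + block)
        self.allocated[offset] = want
        return offset

    def release(self, offset: int) -> None:
        block = self.allocated.pop(offset)
        # Merge with the buddy (offset XOR block size) while it is free.
        while block < self.total_size:
            buddy = offset ^ block
            peers = self.free.get(block, set())
            if buddy not in peers:
                break
            peers.remove(buddy)
            offset = min(offset, buddy)
            block *= 2
        self.free.setdefault(block, set()).add(offset)


class FifoPool:
    """Evicts the oldest entry when the allocator is full (hypothetical)."""

    def __init__(self, allocator: BuddyAllocator):
        self.allocator = allocator
        self.entries = OrderedDict()  # key -> offset, in insertion order

    def put(self, key: str, size: int) -> int:
        while True:
            try:
                offset = self.allocator.allocate(size)
                break
            except MemoryError:
                if not self.entries:
                    raise  # nothing left to evict
                _, oldest = self.entries.popitem(last=False)
                self.allocator.release(oldest)
        self.entries[key] = offset
        return offset
```

With a 256-byte pool and 128-byte entries, a third `put` evicts the first entry and reuses its offset, which is the behavior FIFO eviction is meant to provide.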
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| `vllm/distributed/ec_transfer/ec_lookup_buffer/mooncake_store.py` | Core Mooncake store implementation with batch operations and zero-copy support |
| `vllm/distributed/ec_transfer/ec_connector/mooncake_storage_connector.py` | Connector interface implementation for EC transfer operations |
| `vllm/distributed/ec_transfer/utils/tensor_memory_pool.py` | Buddy allocator-based memory pool for pinned host memory management |
| `vllm/distributed/ec_transfer/ec_connector/base.py` | Added `wait_for_save()` interface method to base connector |
| `vllm/v1/worker/ec_connector_model_runner_mixin.py` | Added mixin method to wait for async save operations |
| `vllm/v1/worker/gpu_model_runner.py` | Integrated wait for save after multimodal encoding |
| `vllm/distributed/ec_transfer/ec_connector/factory.py` | Registered new Mooncake connector in factory |
| `tests/v1/ec_connector/unit/test_mooncake_store.py` | Comprehensive unit tests with fake Mooncake store implementation |
| `examples/.../mooncake_connector/disagg_1e1pd_example.sh` | Example script for 1 encoder + 1 prefill/decode setup |
| `examples/.../mooncake_connector/disagg_1e1p1d_example.sh` | Example script for 1 encoder + 1 prefill + 1 decode setup |
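The role of `wait_for_save()` can be sketched as follows: saves are submitted asynchronously, and the caller blocks until all of them have finished before treating the request as complete. Only the method name `wait_for_save()` comes from the PR. The class `SketchECConnector`, the `save_async` method, and the dict standing in for the Mooncake store are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class SketchECConnector:
    """Hypothetical connector; only the wait_for_save() name is from the PR."""

    def __init__(self, store: dict):
        self._store = store  # stand-in for a remote store client
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._pending = []   # futures for saves that may still be running

    def save_async(self, key: str, value: bytes) -> None:
        # Hand the write to a worker thread and remember the future.
        future = self._executor.submit(self._store.__setitem__, key, value)
        self._pending.append(future)

    def wait_for_save(self) -> None:
        # Block until every outstanding save has finished.
        pending, self._pending = self._pending, []
        wait(pending)
        for future in pending:
            future.result()  # re-raise any exception from a worker


store = {}
connector = SketchECConnector(store)
connector.save_async("mm_hash_0", b"encoder-output")
connector.wait_for_save()  # after this returns, the entry is in the store
```

Calling `future.result()` after `wait` is a design choice in this sketch so that a failed save surfaces as an exception instead of being silently dropped.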
Signed-off-by: Khuong Le <[email protected]>
    Raises:
        ValueError: If tensor is not on CUDA or allocation fails
    """
    if not tensor.is_cuda:
Using `is_cuda` is incompatible with NPU. Also, this file is similar to `distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py`. Is it possible to avoid adding a separate file?
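One possible way to address the `is_cuda` concern, assuming the check is only meant to reject host (CPU) tensors, is to inspect `tensor.device.type` instead. The helper name `require_accelerator_tensor` is hypothetical, and the stand-in objects below only mimic the `device.type` attribute of a tensor so the sketch runs without any accelerator library installed.

```python
from types import SimpleNamespace


def require_accelerator_tensor(tensor) -> None:
    """Reject host tensors without naming a specific accelerator backend.

    Hypothetical helper: it relies only on `tensor.device.type`, which is
    "cpu" for host tensors and a backend name (for example "cuda") otherwise.
    """
    device_type = tensor.device.type
    if device_type == "cpu":
        raise ValueError(f"expected an accelerator tensor, got device '{device_type}'")


# Stand-ins that mimic only the `device.type` attribute (assumption).
fake_cuda = SimpleNamespace(device=SimpleNamespace(type="cuda"))
fake_npu = SimpleNamespace(device=SimpleNamespace(type="npu"))
fake_cpu = SimpleNamespace(device=SimpleNamespace(type="cpu"))

require_accelerator_tensor(fake_cuda)  # passes
require_accelerator_tensor(fake_npu)   # passes, unlike an is_cuda check
```

Whether this is the right fix depends on how the project abstracts platforms; a platform helper already present in the codebase would be preferable if one exists.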
Purpose
Implement Mooncake storage EC connector
Test Plan
Test Result
Results of running the script `disagg_1e1pd_example.sh` on an A100-40GB:
rdma:
tcp:
example connector (disk):
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.