
Conversation

@knlnguyen1802 (Collaborator) commented Dec 15, 2025

Purpose

This PR adds a "deallocate_cache" flag to the EC connector config in the start script, allowing the encoder cache to be deallocated after the prefill/decode (PD) worker has read it.

Code example (launch scripts for the encoder and prefill+decode workers):
###############################################################################
# Encoder worker
###############################################################################
CUDA_VISIBLE_DEVICES="$GPU_E" vllm serve "$MODEL" \
    --gpu-memory-utilization 0.01 \
    --port "$ENCODE_PORT" \
    --enforce-eager \
    --enable-request-id-headers \
    --no-enable-prefix-caching \
    --max-num-batched-tokens 114688 \
    --max-num-seqs 128 \
    --allowed-local-media-path "${GIT_ROOT}/tests/v1/ec_connector/integration" \
    --ec-transfer-config '{
        "ec_connector": "ECExampleConnector",
        "ec_role": "ec_producer",
        "ec_connector_extra_config": {
            "shared_storage_path": "'"$EC_SHARED_STORAGE_PATH"'",
            "deallocate_cache": true
        }
    }' \
    >"${ENC_LOG}" 2>&1 &

PIDS+=($!)

###############################################################################
# Prefill+Decode worker
###############################################################################
CUDA_VISIBLE_DEVICES="$GPU_PD" vllm serve "$MODEL" \
    --gpu-memory-utilization 0.7 \
    --port "$PREFILL_DECODE_PORT" \
    --enforce-eager \
    --enable-request-id-headers \
    --max-num-seqs 128 \
    --allowed-local-media-path "${GIT_ROOT}/tests/v1/ec_connector/integration" \
    --ec-transfer-config '{
        "ec_connector": "ECExampleConnector",
        "ec_role": "ec_consumer",
        "ec_connector_extra_config": {
            "shared_storage_path": "'"$EC_SHARED_STORAGE_PATH"'",
            "deallocate_cache": true
        }
    }' \
    >"${PD_LOG}" 2>&1 &

Design

(design diagram: "deallocate")

Test Result



@knlnguyen1802 (Collaborator, Author) commented Dec 15, 2025

cc: @fake0fan @khuonglmhw @herotai214 for review

Signed-off-by: knlnguyen1802 <[email protected]>
Copilot AI left a comment
Pull request overview

This PR introduces encoder cache deallocation capabilities for the EC (Encoder Cache) transfer system. The main purpose is to enable automatic cleanup of remote encoder cache files when they are no longer needed, based on read/write count tracking.

Key changes include:

  • Added metadata tracking (read_count, write_count) to monitor cache usage and trigger deallocation when appropriate
  • Extended the scheduler and connector interfaces to pass cache hit information (local_hit, remote_hit) through the allocation flow
  • Implemented file-based locking mechanism to prevent race conditions during concurrent metadata operations
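The read/write-count tracking described in the overview above can be sketched as follows. This is a hedged illustration only: the field names (read_count, write_count, mm_hash) follow the PR description, but the helper function and the exact deallocation rule are assumptions, not the actual connector code.

```python
import json
import tempfile

def should_deallocate(meta_path: str) -> bool:
    """Illustrative rule: deallocate once every written copy has been read back."""
    with open(meta_path) as f:
        meta = json.load(f)
    # Chained comparison: read_count >= write_count and write_count > 0.
    return meta.get("read_count", 0) >= meta.get("write_count", 0) > 0

# Example metadata as a producer might leave it after one save and one read.
with tempfile.NamedTemporaryFile("w", suffix=".meta.json", delete=False) as f:
    json.dump({"mm_hash": "abc123", "write_count": 1, "read_count": 1}, f)
    meta_path = f.name

print(should_deallocate(meta_path))  # True: reads have caught up with writes
```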

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File descriptions:
  • vllm/v1/worker/ec_connector_model_runner_mixin.py: removed a debug log and added a call to update remote cache state
  • vllm/v1/core/sched/scheduler.py: updated the scheduler to track and pass local/remote cache hit information through the allocation flow
  • vllm/distributed/ec_transfer/ec_connector/example_connector.py: implemented deallocation logic with metadata tracking, file locking, and cache cleanup
  • vllm/distributed/ec_transfer/ec_connector/base.py: added abstract methods for cache state updates with hit-information parameters
Comments suppressed due to low confidence (2):

vllm/distributed/ec_transfer/ec_connector/example_connector.py:
  • The variable name 'external_update_encoder_input' is ambiguous. Consider renaming it to 'encoder_cache_hit_tracking' or 'encoder_input_hit_states' to better reflect that it tracks cache hit status (local_hit, remote_hit) rather than just updates.
  • Corrected 'Store to option update' to 'Store to optionally update' for grammatical correctness.

Copilot AI commented Dec 16, 2025 on lines 233 to 234:

    # Iterate over a snapshot so we can modify the original safely.
    for mm_hash, num_token in list(self._mm_datas_need_saves.items()):

The _mm_datas_need_saves dictionary is accessed without acquiring the _mm_state_lock, but it is modified later under the lock (line 318). This creates a potential race condition. The lock should be acquired before creating the snapshot with list() to ensure thread-safe iteration.

Suggested change:

    # Take a snapshot under lock to ensure thread safety.
    with self._mm_state_lock:
        mm_datas_need_saves_snapshot = list(self._mm_datas_need_saves.items())
    for mm_hash, num_token in mm_datas_need_saves_snapshot:
Copilot AI commented Dec 16, 2025 on the following lines:

    # are a consumer/non-producer we increment the read_count
    # (update_mm_meta_read) so deallocation can be observed.
    processed_updates: list[str] = []
    for mm_hash in list(self._mm_datas_need_update_meta):

The _mm_datas_need_update_meta list is accessed without acquiring the _mm_state_lock, but it is modified later under the lock (line 320). This creates a potential race condition. The lock should be acquired before creating the snapshot with list() to ensure thread-safe iteration.

Suggested change:

    with self._mm_state_lock:
        mm_datas_need_update_meta_snapshot = list(self._mm_datas_need_update_meta)
    for mm_hash in mm_datas_need_update_meta_snapshot:
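The snapshot-under-lock pattern suggested above can be shown in isolation. The attribute names mirror the PR, but the class below is a stand-in for illustration, not the real connector:

```python
import threading

class MetaUpdateTracker:
    """Stand-in for the connector state touched by the review comments."""

    def __init__(self) -> None:
        self._mm_state_lock = threading.Lock()
        self._mm_datas_need_update_meta: list[str] = ["hash-a", "hash-b"]

    def process_pending(self) -> list[str]:
        # Snapshot under the lock so concurrent writers cannot mutate the
        # list mid-iteration (the race condition the comments point out).
        with self._mm_state_lock:
            snapshot = list(self._mm_datas_need_update_meta)

        processed: list[str] = []
        for mm_hash in snapshot:
            processed.append(mm_hash)  # real code would update remote metadata

        # Drop only the entries actually processed, again under the lock.
        with self._mm_state_lock:
            self._mm_datas_need_update_meta = [
                h for h in self._mm_datas_need_update_meta if h not in processed
            ]
        return processed
```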
@fake0fan (Owner) left a comment:

I've left a few suggestions for your consideration.

Please also consider what happens if such an atomic count needs to be maintained by the EC connector itself.

Would this scale easily to other external storage backends?


@fake0fan (Owner) commented on:

    logger = init_logger(__name__)

    # Cache FileLock objects per path to avoid reallocating locks repeatedly.

Therefore, we need a file lock for each mm-encoded embedding?

@knlnguyen1802 (Collaborator, Author) replied:

Right.
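A per-path lock cache like the one the comment above describes might look roughly as follows. threading.Lock stands in for a cross-process file lock (the real code presumably uses filelock.FileLock) so the sketch has no third-party dependency; all names are illustrative:

```python
import threading

# One lock object per metadata path, created lazily and reused across calls.
_PATH_LOCKS: dict[str, threading.Lock] = {}
_PATH_LOCKS_GUARD = threading.Lock()

def get_path_lock(path: str) -> threading.Lock:
    # Creation is guarded so each path gets exactly one lock object even
    # when multiple threads race to create it.
    with _PATH_LOCKS_GUARD:
        lock = _PATH_LOCKS.get(path)
        if lock is None:
            lock = threading.Lock()
            _PATH_LOCKS[path] = lock
        return lock
```

Callers then serialize metadata updates per embedding with `with get_path_lock(meta_path): ...`, so locks for different mm hashes never contend with each other.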

@fake0fan (Owner) commented on:

    external_update_encoder_input: list[tuple[int, bool, bool]] = []

    # Check remote cache first
    if self.ec_connector is not None:

I remember there was a previous PR that needed to avoid repeatedly checking all the encoder caches every time they were accessed?

@knlnguyen1802 (Collaborator, Author) replied:

It's here: knlnguyen1802#7

@fake0fan (Owner) commented on:

    with open(meta_filename, "r+") as f:
        data = json.load(f)
        data["write_count"] = data.get("write_count", 0) + 1
        data["mm_hash"] = mm_meta.mm_hash

Why is there still an update here?

@knlnguyen1802 (Collaborator, Author) replied:

To keep the Encoder and PD workers consistent, every access to the local encoder cache also needs to update the remote cache metadata.
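The read-side counterpart to the quoted write_count update might look roughly like this. The function name, filenames, and deallocation rule are assumptions for illustration, not the PR's actual code (which would also acquire the per-file lock first):

```python
import json
import os

def update_mm_meta_read(meta_filename: str, cache_filename: str) -> None:
    # Bump read_count in place, mirroring the write_count update quoted above.
    with open(meta_filename, "r+") as f:
        data = json.load(f)
        data["read_count"] = data.get("read_count", 0) + 1
        f.seek(0)
        json.dump(data, f)
        f.truncate()
    # Assumed rule: deallocate once every written copy has been read back.
    if data["read_count"] >= data.get("write_count", 0) > 0:
        if os.path.exists(cache_filename):
            os.remove(cache_filename)
```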
