fix: Update the KVBM <> TRT-LLM integration interface to match the latest TRT-LLM connector API #2979
Conversation
Walkthrough
Adds an ENABLE_KVBM build flag to Docker and wheel build paths; switches Python wheel building to maturin with conditional features; updates the shell script to defer Git LFS via GIT_LFS_SKIP_SMUDGE; updates a doc commit reference; and refactors the KVBM Python connectors to initialize from TorchLlmArgs instead of ExecutorConfig.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Dev as Image Builder
participant Docker as Dockerfile.trtllm
participant Wheel as manylinux wheel_builder
participant Cargo as Rust (block-manager)
participant Maturin as maturin
Dev->>Docker: docker build --build-arg ENABLE_KVBM={true|false}
Docker->>Wheel: Build stage (ARG ENABLE_KVBM)
alt ENABLE_KVBM=true
Wheel->>Cargo: cargo build --features block-manager
Wheel->>Maturin: uv run maturin build --features block-manager
else ENABLE_KVBM=false
Wheel->>Cargo: cargo build (default features)
Wheel->>Maturin: uv run maturin build (default)
end
note over Wheel,Maturin: Release wheels gated by RELEASE_BUILD for py3.10/3.11
sequenceDiagram
autonumber
participant App as TRT-LLM App
participant Leader as DynamoKVBMConnectorLeader
participant Worker as DynamoKVBMConnectorWorker
participant Args as TorchLlmArgs
participant Rust as RustKvConnector
App->>Leader: new(..., llm_args: TorchLlmArgs)
Leader->>Args: parallel_config.to_mapping()
Leader->>Args: kv_cache_config.tokens_per_block
Leader->>Rust: init(rank, world_size, block_size)
App->>Worker: new(..., llm_args: TorchLlmArgs)
Worker->>Args: derive rank, tokens_per_block
Worker->>Rust: init(rank, block_size)
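
To make the new initialization flow concrete, the following is a minimal, illustrative Python sketch (not the actual connector code): it assumes, as the diagram above shows, that TorchLlmArgs exposes parallel_config.to_mapping() and kv_cache_config.tokens_per_block, and that the returned mapping object carries rank and world_size attributes.

def derive_leader_params(llm_args):
    # Leader side: rank and world_size come from the parallel mapping,
    # block_size from the KV cache config (attribute names as in the diagram).
    mapping = llm_args.parallel_config.to_mapping()
    block_size = llm_args.kv_cache_config.tokens_per_block
    return mapping.rank, mapping.world_size, block_size

def derive_worker_params(llm_args):
    # Worker side: only the local rank and the tokens-per-block page size are needed.
    mapping = llm_args.parallel_config.to_mapping()
    page_size = llm_args.kv_cache_config.tokens_per_block
    return mapping.rank, page_size

These values correspond to what the diagram passes into the Rust-side init calls: init(rank, world_size, block_size) on the leader and init(rank, block_size) on the worker.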
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks (1 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)
20-20: Fix user-facing typos and clarity issues. These are in customer docs and should be clean.
Apply:
-This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in TensorRT-LLM (trtllm).
+This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in TensorRT-LLM (TRT-LLM).
-> [!Note]
+> [!Note]
> - Ensure that `etcd` and `nats` are running before starting.
> - KVBM does not currently support CUDA graphs in TensorRT-LLM.
-> - KVBM only supports TensorRT-LLM’s PyTorch backend.
+> - KVBM currently supports only TensorRT-LLM’s PyTorch backend.
-> - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
+> - To enable disk cache offloading, you must first enable CPU memory cache offloading.
-> - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
+> - Disable partial reuse (`enable_partial_reuse: false`) in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
> - KVBM requires TensorRT-LLM at commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
> - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.
-... hinting at ests that Aeloria holds ... lost familt clue is hidden.
+... hinting that Aeloria holds ... lost family clue is hidden.

Also applies to: 24-31, 91-91
container/build_trtllm_wheel.sh (1)
56-73: Harden LFS handling and optional commit checkout.
- If git-lfs isn’t installed on the runner, the script will fail at `git lfs pull`.
- `git checkout $TRTLLM_COMMIT` with an empty var will fail.
Make both robust. Apply:

(cd /tmp && \
# Clone the TensorRT-LLM repository.
if [ ! -d "TensorRT-LLM" ]; then
git clone "${TRTLLM_GIT_URL}"
fi
cd TensorRT-LLM
-# Checkout the specified commit.
-# Switch to the main branch to pull the latest changes.
-git checkout main
-git pull
-git checkout $TRTLLM_COMMIT
+# Checkout the specified commit if provided, otherwise stay on the default branch.
+git checkout main
+git pull --ff-only
+if [ -n "${TRTLLM_COMMIT:-}" ]; then
+  git checkout "$TRTLLM_COMMIT"
+fi
# Update the submodules.
git submodule update --init --recursive
-git lfs pull
+if command -v git-lfs >/dev/null 2>&1; then
+  git lfs pull
+else
+  echo "Warning: git-lfs not found; large files will not be pulled. Continuing..." >&2
+fi

Also consider adding `set -o pipefail -u` at the top for stricter error handling.
🧹 Nitpick comments (4)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)
36-39: Use TRT-LLM logger instead of print; tighten comment.
Replace stdout prints in library code with the standard logger. Also, the comment about bytes_per_block reads as if you set it, but the code doesn’t; clarify.
Apply:

+from tensorrt_llm import logger
@@
- print(f"KvConnectorLeader initialized with rank: {mappings.rank}")
+ logger.info(f"KvConnectorLeader initialized with rank: {mappings.rank}")

- # Set bytes_per_block to 0, because we will retrieve the actual value from the worker side.
+ # bytes_per_block is determined on the worker side; leader does not require it here.

Also applies to: 5-13
128-131: Spelling nit: “critical”.
Apply:

- # extract the critial aspects of the request that effect how the tokens are hashed
+ # Extract the critical aspects of the request that affect how the tokens are hashed

lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)
33-36: Drop stdout prints; rely on logger.
Keep logs consistent with TRT-LLM’s logger.
Apply:

- print(f"Register KV Caches on rank {self.rank}")
- logger.info(
-     f"KvConnectorWorker started registering the kv caches on rank {self.rank}"
- )
+ logger.info(f"Registering KV caches on rank {self.rank}")
38-41: Explicit type for page size (minor).
Small readability boost.
Apply:

- page_size = self._llm_args.kv_cache_config.tokens_per_block
+ page_size: int = self._llm_args.kv_cache_config.tokens_per_block
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
container/Dockerfile.trtllm (3 hunks)
container/build_trtllm_wheel.sh (1 hunks)
docs/guides/run_kvbm_in_trtllm.md (1 hunks)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2 hunks)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)
  lib/bindings/python/src/dynamo/_core.pyi (1)
    DistributedRuntime (31-54)
  lib/bindings/python/rust/lib.rs (1)
    detached (348-360)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)
  lib/bindings/python/src/dynamo/_core.pyi (1)
    DistributedRuntime (31-54)
  lib/bindings/python/rust/lib.rs (1)
    detached (348-360)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)
41-42: Unify the minimum TRT-LLM commit.
The Note (L30) references commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6, while the Quick Start (L41-42) uses dcd110cfac07e577ce01343c455917832b0f3d5e. Pick the correct baseline commit and use “or newer” consistently across the doc.
Please confirm whether dcd110cfac07e577ce01343c455917832b0f3d5e is the intended commit that introduces the TorchLlmArgs and connector API changes.

container/build_trtllm_wheel.sh (1)
43-47: Good call: deferring Git LFS with GIT_LFS_SKIP_SMUDGE.
This avoids rate limits and interactive prompts during clone. Consider also guarding for git-lfs availability (see next comment).
LGTM
/ok to test b5c6865
/ok to test b00cf90
/ok to test fba11c0
/ok to test c87f54a
…test TRT-LLM connector API (#2979) Signed-off-by: richardhuo-nv <[email protected]>
Overview:
Details:
Update the KVBM–TRT-LLM integration interface
TRT-LLM recently started deprecating ExecutorConfig, replacing it with TorchLlmArgs. The corresponding change has been merged in the TRT-LLM repository: PR #7493.
Fix KVBM feature enablement in the TRT-LLM Docker image build
This is a bug fix. The Dynamo TRT-LLM image was not using the Dynamo base image, so the ENABLE_KVBM option was missing from the wheel build step. The fix adds this option, ensuring that compilation of the KVBM code is controlled by the user’s selection.
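
As an illustrative aside (not part of this PR), one way a consumer could check whether an installed wheel was built with ENABLE_KVBM is to attempt importing the connector; this sketch assumes the import fails when the Rust block-manager feature was compiled out, which may not hold for every build configuration.

def kvbm_available() -> bool:
    # Hypothetical availability check: assumes the connector import raises
    # ImportError when the wheel was built with ENABLE_KVBM=false.
    try:
        from dynamo.llm.trtllm_integration.connector.kvbm_connector_leader import (
            DynamoKVBMConnectorLeader,  # noqa: F401
        )
    except ImportError:
        return False
    return True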
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)