fix: Update the KVBM <> TRT-LLM integration interface to match the latest TRT-LLM connector API #2979
Conversation
Walkthrough
Adds an ENABLE_KVBM build flag to Docker and wheel build paths; switches Python wheel building to maturin with conditional features; updates the shell script to defer Git LFS via GIT_LFS_SKIP_SMUDGE; updates a doc commit reference; and refactors the KVBM Python connectors to initialize from TorchLlmArgs instead of ExecutorConfig.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Dev as Image Builder
participant Docker as Dockerfile.trtllm
participant Wheel as manylinux wheel_builder
participant Cargo as Rust (block-manager)
participant Maturin as maturin
Dev->>Docker: docker build --build-arg ENABLE_KVBM={true|false}
Docker->>Wheel: Build stage (ARG ENABLE_KVBM)
alt ENABLE_KVBM=true
Wheel->>Cargo: cargo build --features block-manager
Wheel->>Maturin: uv run maturin build --features block-manager
else ENABLE_KVBM=false
Wheel->>Cargo: cargo build (default features)
Wheel->>Maturin: uv run maturin build (default)
end
note over Wheel,Maturin: Release wheels gated by RELEASE_BUILD for py3.10/3.11
sequenceDiagram
autonumber
participant App as TRT-LLM App
participant Leader as DynamoKVBMConnectorLeader
participant Worker as DynamoKVBMConnectorWorker
participant Args as TorchLlmArgs
participant Rust as RustKvConnector
App->>Leader: new(..., llm_args: TorchLlmArgs)
Leader->>Args: parallel_config.to_mapping()
Leader->>Args: kv_cache_config.tokens_per_block
Leader->>Rust: init(rank, world_size, block_size)
App->>Worker: new(..., llm_args: TorchLlmArgs)
Worker->>Args: derive rank, tokens_per_block
Worker->>Rust: init(rank, block_size)
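
To make the new initialization flow concrete, the following is a minimal, illustrative Python sketch (not the actual connector code): it assumes, as the diagram above shows, that TorchLlmArgs exposes parallel_config.to_mapping() and kv_cache_config.tokens_per_block, and that the returned mapping object carries rank and world_size attributes.

def derive_leader_params(llm_args):
    # Leader side: rank and world_size come from the parallel mapping,
    # block_size from the KV cache config (attribute names as in the diagram).
    mapping = llm_args.parallel_config.to_mapping()
    block_size = llm_args.kv_cache_config.tokens_per_block
    return mapping.rank, mapping.world_size, block_size

def derive_worker_params(llm_args):
    # Worker side: only the local rank and the tokens-per-block page size are needed.
    mapping = llm_args.parallel_config.to_mapping()
    page_size = llm_args.kv_cache_config.tokens_per_block
    return mapping.rank, page_size

These values correspond to what the diagram passes into the Rust-side init calls: init(rank, world_size, block_size) on the leader and init(rank, block_size) on the worker.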
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks (1 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)
20-20: Fix user-facing typos and clarity issues. These are in customer docs and should be clean.
Apply:
-This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in TensorRT-LLM (trtllm).
+This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in TensorRT-LLM (TRT-LLM).
-> [!Note]
+> [!Note]
> - Ensure that `etcd` and `nats` are running before starting.
> - KVBM does not currently support CUDA graphs in TensorRT-LLM.
-> - KVBM only supports TensorRT-LLM’s PyTorch backend.
+> - KVBM currently supports only TensorRT-LLM’s PyTorch backend.
-> - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
+> - To enable disk cache offloading, you must first enable CPU memory cache offloading.
-> - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
+> - Disable partial reuse (`enable_partial_reuse: false`) in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
> - KVBM requires TensorRT-LLM at commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
> - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.
-... hinting at ests that Aeloria holds ... lost familt clue is hidden.
+... hinting that Aeloria holds ... lost family clue is hidden.

Also applies to: 24-31, 91-91
container/build_trtllm_wheel.sh (1)
56-73: Harden LFS handling and optional commit checkout.
- If git-lfs isn’t installed on the runner, the script will fail at `git lfs pull`.
- `git checkout $TRTLLM_COMMIT` with an empty var will fail.
Make both robust. Apply:

(cd /tmp && \
# Clone the TensorRT-LLM repository.
if [ ! -d "TensorRT-LLM" ]; then
git clone "${TRTLLM_GIT_URL}"
fi
cd TensorRT-LLM
-# Checkout the specified commit.
-# Switch to the main branch to pull the latest changes.
-git checkout main
-git pull
-git checkout $TRTLLM_COMMIT
+# Checkout the specified commit if provided, otherwise stay on the default branch.
+git checkout main
+git pull --ff-only
+if [ -n "${TRTLLM_COMMIT:-}" ]; then
+  git checkout "$TRTLLM_COMMIT"
+fi
# Update the submodules.
git submodule update --init --recursive
-git lfs pull
+if command -v git-lfs >/dev/null 2>&1; then
+  git lfs pull
+else
+  echo "Warning: git-lfs not found; large files will not be pulled. Continuing..." >&2
+fi

Also consider adding `set -o pipefail -u` at the top for stricter error handling.
🧹 Nitpick comments (4)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)
36-39: Use TRT-LLM logger instead of print; tighten comment.
Replace stdout prints in library code with the standard logger. Also, the comment about bytes_per_block reads as if you set it, but the code doesn’t; clarify.
Apply:

+from tensorrt_llm import logger
@@
- print(f"KvConnectorLeader initialized with rank: {mappings.rank}")
+ logger.info(f"KvConnectorLeader initialized with rank: {mappings.rank}")

- # Set bytes_per_block to 0, because we will retrieve the actual value from the worker side.
+ # bytes_per_block is determined on the worker side; leader does not require it here.

Also applies to: 5-13
128-131: Spelling nit: “critical”.
Apply:

- # extract the critial aspects of the request that effect how the tokens are hashed
+ # Extract the critical aspects of the request that affect how the tokens are hashed

lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)
33-36: Drop stdout prints; rely on logger.
Keep logs consistent with TRT-LLM’s logger.
Apply:

- print(f"Register KV Caches on rank {self.rank}")
- logger.info(
-     f"KvConnectorWorker started registering the kv caches on rank {self.rank}"
- )
+ logger.info(f"Registering KV caches on rank {self.rank}")
38-41: Explicit type for page size (minor).
Small readability boost.
Apply:

- page_size = self._llm_args.kv_cache_config.tokens_per_block
+ page_size: int = self._llm_args.kv_cache_config.tokens_per_block
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
container/Dockerfile.trtllm (3 hunks)
container/build_trtllm_wheel.sh (1 hunks)
docs/guides/run_kvbm_in_trtllm.md (1 hunks)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2 hunks)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)
  lib/bindings/python/src/dynamo/_core.pyi (1)
    DistributedRuntime (31-54)
  lib/bindings/python/rust/lib.rs (1)
    detached (348-360)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)
  lib/bindings/python/src/dynamo/_core.pyi (1)
    DistributedRuntime (31-54)
  lib/bindings/python/rust/lib.rs (1)
    detached (348-360)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)
41-42: Unify the minimum TRT-LLM commit.
The Note (L30) references commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6, while the Quick Start (L41-42) uses dcd110cfac07e577ce01343c455917832b0f3d5e. Pick the correct baseline commit and use “or newer” consistently across the doc.
Please confirm whether dcd110cfac07e577ce01343c455917832b0f3d5e is the intended commit that introduces the TorchLlmArgs and connector API changes.

container/build_trtllm_wheel.sh (1)
43-47: Good call: deferring Git LFS with GIT_LFS_SKIP_SMUDGE.
This avoids rate limits and interactive prompts during clone. Consider also guarding for git-lfs availability (see next comment).
LGTM
/ok to test b5c6865
/ok to test b00cf90
/ok to test fba11c0
/ok to test c87f54a
…test TRT-LLM connector API (#2979) Signed-off-by: richardhuo-nv <[email protected]>
Overview:
Details:
Update the KVBM–TRT-LLM integration interface
TRT-LLM recently started deprecating ExecutorConfig, replacing it with TorchLlmArgs. The corresponding change has been merged in the TRT-LLM repository: PR #7493.
Fix KVBM feature enablement in the TRT-LLM Docker image build
This is a bug fix. The Dynamo TRT-LLM image was not using the Dynamo base image, so the ENABLE_KVBM option was missing from the wheel build step. The fix adds this option, ensuring that compilation of the KVBM code is controlled by the user’s selection.
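
As an illustrative aside (not part of this PR), one way a consumer could check whether an installed wheel was built with ENABLE_KVBM is to attempt importing the connector; this sketch assumes the import fails when the Rust block-manager feature was compiled out, which may not hold for every build configuration.

def kvbm_available() -> bool:
    # Hypothetical availability check: assumes the connector import raises
    # ImportError when the wheel was built with ENABLE_KVBM=false.
    try:
        from dynamo.llm.trtllm_integration.connector.kvbm_connector_leader import (
            DynamoKVBMConnectorLeader,  # noqa: F401
        )
    except ImportError:
        return False
    return True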
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)