
Conversation

richardhuo-nv
Contributor

@richardhuo-nv richardhuo-nv commented Sep 10, 2025

Overview:

  1. Update the KVBM–TRT-LLM integration interface to match the latest TRT-LLM connector API.
  2. Fix the issue where the KVBM feature is not enabled (or is disabled by option) in the TRT-LLM Docker image build.

Details:

  1. Update the KVBM–TRT-LLM integration interface
    TRT-LLM recently started deprecating ExecutorConfig, replacing it with TorchLlmArgs. The corresponding change has been merged in the TRT-LLM repository: PR #7493.

  2. Fix KVBM feature enablement in the TRT-LLM Docker image build
    This is a bug fix. The Dynamo TRT-LLM image was not built from the Dynamo base image, so the ENABLE_KVBM option was missing from the wheel build step. The fix adds this option so that the KVBM code is compiled only when the user opts in.
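The build-time gating described above can be sketched as a small helper; note this is an illustrative Python model of the logic, not the actual shell wiring in container/Dockerfile.trtllm (function and variable names are assumptions):

```python
def maturin_build_command(enable_kvbm: bool, release: bool = True) -> list[str]:
    """Assemble the wheel-build command, adding the Rust feature only on demand."""
    cmd = ["uv", "run", "maturin", "build"]
    if release:
        cmd.append("--release")
    if enable_kvbm:
        # ENABLE_KVBM=true compiles the block-manager feature into the wheel.
        cmd += ["--features", "block-manager"]
    return cmd


print(maturin_build_command(enable_kvbm=False))
# ['uv', 'run', 'maturin', 'build', '--release']
print(maturin_build_command(enable_kvbm=True))
# ['uv', 'run', 'maturin', 'build', '--release', '--features', 'block-manager']
```

With this shape, leaving the flag at its default (false) produces a wheel without the block-manager feature, matching the pre-fix behavior.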

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added optional flag to enable KVBM at build time; default remains off.
  • Chores
    • Switched wheel builds to maturin with conditional feature gating for Python 3.10/3.11.
    • Deferred Git LFS downloads during clone to reduce rate-limit issues.
  • Refactor
    • Updated Python KVBM connectors to initialize with unified LLM arguments; constructor signatures changed accordingly.
  • Documentation
    • Refreshed KVBM quick start to reference the latest integration commit.


copy-pr-bot bot commented Sep 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds an ENABLE_KVBM build flag to Docker and wheel build paths; switches Python wheel building to maturin with conditional features; updates shell script to defer Git LFS via GIT_LFS_SKIP_SMUDGE; updates doc commit reference; and refactors KVBM Python connectors to initialize from TorchLlmArgs instead of ExecutorConfig.

Changes

Cohort / File(s) Summary of changes
Docker build & wheel pipeline
container/Dockerfile.trtllm
Adds ARG ENABLE_KVBM (default false); propagates to wheel_builder stage; switches wheel builds to uv run maturin (with maturin[patchelf]); conditionally enables Rust feature block-manager when ENABLE_KVBM=true; updates cargo feature from dynamo-llm/block-manager to block-manager; gates release builds for Python 3.10/3.11 via RELEASE_BUILD.
Wheel helper script
container/build_trtllm_wheel.sh
Sets GIT_LFS_SKIP_SMUDGE=1 for clone to defer LFS downloads; later LFS pull remains. No other flow changes.
TRT-LLM KVBM connectors (Python)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_*.py
Constructor API change: ExecutorConfig → TorchLlmArgs; derive rank/world_size/block_size from llm_args (parallel_config and kv_cache_config); update logging and Rust connector instantiation accordingly.
Docs
docs/guides/run_kvbm_in_trtllm.md
Updates KVBM integration commit hash in quick start reference and example command.
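The connector constructor change summarized above can be illustrated with a minimal, self-contained sketch. The Stub* classes below stand in for TRT-LLM's TorchLlmArgs and related types; their fields and defaults are assumptions chosen to mirror the shape of the change, not the real API:

```python
from dataclasses import dataclass, field


@dataclass
class StubMapping:
    rank: int = 0
    world_size: int = 1


@dataclass
class StubParallelConfig:
    tp_size: int = 1

    def to_mapping(self) -> StubMapping:
        return StubMapping(rank=0, world_size=self.tp_size)


@dataclass
class StubKvCacheConfig:
    tokens_per_block: int = 32


@dataclass
class StubTorchLlmArgs:
    parallel_config: StubParallelConfig = field(default_factory=StubParallelConfig)
    kv_cache_config: StubKvCacheConfig = field(default_factory=StubKvCacheConfig)


class KvConnectorLeaderSketch:
    """Mirrors the shape of the new constructor: everything that previously
    came from ExecutorConfig is now derived from the unified llm_args."""

    def __init__(self, llm_args: StubTorchLlmArgs):
        mapping = llm_args.parallel_config.to_mapping()
        self.rank = mapping.rank
        self.world_size = mapping.world_size
        # block_size now comes from the KV cache config rather than ExecutorConfig.
        self.block_size = llm_args.kv_cache_config.tokens_per_block


leader = KvConnectorLeaderSketch(StubTorchLlmArgs())
print(leader.rank, leader.world_size, leader.block_size)  # 0 1 32
```

The point of the refactor is that a single llm_args object carries both parallelism and KV-cache settings, so the connectors no longer need a separate executor configuration path.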

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Image Builder
  participant Docker as Dockerfile.trtllm
  participant Wheel as manylinux wheel_builder
  participant Cargo as Rust (block-manager)
  participant Maturin as maturin

  Dev->>Docker: docker build --build-arg ENABLE_KVBM={true|false}
  Docker->>Wheel: Build stage (ARG ENABLE_KVBM)
  alt ENABLE_KVBM=true
    Wheel->>Cargo: cargo build --features block-manager
    Wheel->>Maturin: uv run maturin build --features block-manager
  else ENABLE_KVBM=false
    Wheel->>Cargo: cargo build (default features)
    Wheel->>Maturin: uv run maturin build (default)
  end
  note over Wheel,Maturin: Release wheels gated by RELEASE_BUILD for py3.10/3.11
sequenceDiagram
  autonumber
  participant App as TRT-LLM App
  participant Leader as DynamoKVBMConnectorLeader
  participant Worker as DynamoKVBMConnectorWorker
  participant Args as TorchLlmArgs
  participant Rust as RustKvConnector

  App->>Leader: new(..., llm_args: TorchLlmArgs)
  Leader->>Args: parallel_config.to_mapping()
  Leader->>Args: kv_cache_config.tokens_per_block
  Leader->>Rust: init(rank, world_size, block_size)

  App->>Worker: new(..., llm_args: TorchLlmArgs)
  Worker->>Args: derive rank, tokens_per_block
  Worker->>Rust: init(rank, block_size)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I hop through Docker layers light,
Flip KVBM on—feature in sight.
Maturin spins the wheels just right,
LFS waits till later bite.
Args now guide the rank’s first flight—
Connectors click, blocks snug and tight.
Thump! Release rolls into the night.


Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description Check ⚠️ Warning The pull request description follows the repository’s template structure by including Overview, Details, Where should the reviewer start, and Related Issues sections, but the “Where should the reviewer start?” section is empty and the Related Issues entry still uses a placeholder issue number (“#xxx”) instead of referencing a real issue. These omissions leave reviewers without guidance on which files to inspect and which issue the PR addresses. Please populate the “Where should the reviewer start?” section with the specific files or code areas that need close review and replace the “#xxx” placeholder in Related Issues with the actual issue number(s) this PR closes.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title succinctly describes the primary change of updating the KVBM–TRT-LLM integration interface to match the latest connector API and follows conventional commit style with the “fix:” prefix. It directly reflects the main code modifications without listing every ancillary tweak, making it clear and relevant for someone scanning the repository history.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)

20-20: Fix user-facing typos and clarity issues.

These are in customer docs and should be clean.

Apply:

-This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in TensorRT-LLM (trtllm).
+This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in TensorRT-LLM (TRT-LLM).

-> [!Note]
+> [!Note]
 > - Ensure that `etcd` and `nats` are running before starting.
 > - KVBM does not currently support CUDA graphs in TensorRT-LLM.
-> - KVBM only supports TensorRT-LLM’s PyTorch backend.
+> - KVBM currently supports only TensorRT-LLM’s PyTorch backend.
-> - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
+> - To enable disk cache offloading, you must first enable CPU memory cache offloading.
-> - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
+> - Disable partial reuse (`enable_partial_reuse: false`) in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
 > - KVBM requires TensorRT-LLM at commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
 > - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.
-... hinting at ests that Aeloria holds ... lost familt clue is hidden.
+... hinting that Aeloria holds ... lost family clue is hidden.

Also applies to: 24-31, 91-91

container/build_trtllm_wheel.sh (1)

56-73: Harden LFS handling and optional commit checkout.

  • If git-lfs isn’t installed on the runner, the script will fail at git lfs pull.
  • git checkout $TRTLLM_COMMIT with an empty var will fail. Make both robust.

Apply:

 (cd /tmp && \
 # Clone the TensorRT-LLM repository.
 if [ ! -d "TensorRT-LLM" ]; then
   git clone "${TRTLLM_GIT_URL}"
 fi

 cd TensorRT-LLM

-# Checkout the specified commit.
-# Switch to the main branch to pull the latest changes.
-git checkout main
-git pull
-git checkout $TRTLLM_COMMIT
+# Checkout the specified commit if provided, otherwise stay on the default branch.
+git checkout main
+git pull --ff-only
+if [ -n "${TRTLLM_COMMIT:-}" ]; then
+  git checkout "$TRTLLM_COMMIT"
+fi

 # Update the submodules.
 git submodule update --init --recursive
-git lfs pull
+if command -v git-lfs >/dev/null 2>&1; then
+  git lfs pull
+else
+  echo "Warning: git-lfs not found; large files will not be pulled. Continuing..." >&2
+fi

Also consider adding set -o pipefail -u at the top for stricter error handling.

🧹 Nitpick comments (4)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)

36-39: Use TRT-LLM logger instead of print; tighten comment.

Replace stdout prints in library code with the standard logger. Also the comment about bytes_per_block reads as if you set it, but the code doesn’t; clarify.

Apply:

+from tensorrt_llm import logger
@@
-        print(f"KvConnectorLeader initialized with rank: {mappings.rank}")
+        logger.info(f"KvConnectorLeader initialized with rank: {mappings.rank}")
-        # Set bytes_per_block to 0, because we will retrieve the actual value from the worker side.
+        # bytes_per_block is determined on the worker side; leader does not require it here.

Also applies to: 5-13


128-131: Spelling nit: “critical”.

Apply:

-        # extract the critial aspects of the request that effect how the tokens are hashed
+        # Extract the critical aspects of the request that affect how the tokens are hashed
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)

33-36: Drop stdout prints; rely on logger.

Keep logs consistent with TRT-LLM’s logger.

Apply:

-        print(f"Register KV Caches on rank {self.rank}")
-        logger.info(
-            f"KvConnectorWorker started registering the kv caches on rank {self.rank}"
-        )
+        logger.info(f"Registering KV caches on rank {self.rank}")

38-41: Explicit type for page size (minor).

Small readability boost.

Apply:

-        page_size = self._llm_args.kv_cache_config.tokens_per_block
+        page_size: int = self._llm_args.kv_cache_config.tokens_per_block
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae38bd4 and 78a1cdc.

📒 Files selected for processing (5)
  • container/Dockerfile.trtllm (3 hunks)
  • container/build_trtllm_wheel.sh (1 hunks)
  • docs/guides/run_kvbm_in_trtllm.md (1 hunks)
  • lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2 hunks)
  • lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_leader.py (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (31-54)
lib/bindings/python/rust/lib.rs (1)
  • detached (348-360)
lib/bindings/python/src/dynamo/llm/trtllm_integration/connector/kvbm_connector_worker.py (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (31-54)
lib/bindings/python/rust/lib.rs (1)
  • detached (348-360)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
docs/guides/run_kvbm_in_trtllm.md (1)

41-42: Unify the minimum TRT-LLM commit
The Note (L30) references commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6, while the Quick Start (L41-42) uses dcd110cfac07e577ce01343c455917832b0f3d5e. Pick the correct baseline commit and use “or newer” consistently across the doc.
Please confirm whether dcd110cfac07e577ce01343c455917832b0f3d5e is the intended commit that introduces the TorchLlmArgs and connector API changes.

container/build_trtllm_wheel.sh (1)

43-47: Good call: deferring Git LFS with GIT_LFS_SKIP_SMUDGE.

This avoids rate limits and interactive prompts during clone. Consider also guarding for git-lfs availability (see next comment).
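The deferred-LFS clone pattern praised here can be sketched as follows; this is an illustrative Python model (function name and return shape are assumptions), while the actual change is a plain shell export in container/build_trtllm_wheel.sh:

```python
import os


def clone_without_lfs(url: str, dest: str) -> tuple[list[str], dict[str, str]]:
    """Build the clone command and environment that defer LFS downloads.

    With GIT_LFS_SKIP_SMUDGE=1, `git clone` checks out LFS-tracked files as
    small pointer files instead of downloading their content; a later
    `git lfs pull` fetches the real content once the exact commit is known.
    """
    env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")
    cmd = ["git", "clone", url, dest]
    return cmd, env


cmd, env = clone_without_lfs("https://github.com/NVIDIA/TensorRT-LLM.git", "/tmp/TensorRT-LLM")
print(cmd[0:2], env["GIT_LFS_SKIP_SMUDGE"])  # ['git', 'clone'] 1
```

Deferring the LFS smudge keeps the clone fast and avoids hammering the LFS endpoint (and its rate limits) for commits whose large files may never be needed.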

Contributor

@ziqifan617 ziqifan617 left a comment


LGTM

@richardhuo-nv
Contributor Author

/ok to test b5c6865

@richardhuo-nv
Contributor Author

/ok to test b5c6865

@richardhuo-nv
Contributor Author

/ok to test b5c6865

@richardhuo-nv richardhuo-nv force-pushed the rihuo/update_llm_args branch 2 times, most recently from 73910e5 to fff33e6 on September 15, 2025 at 19:17
@richardhuo-nv
Contributor Author

/ok to test b00cf90

@richardhuo-nv
Contributor Author

/ok to test b00cf90

@richardhuo-nv
Contributor Author

/ok to test fba11c0

fix

fix optional kvbm build

Signed-off-by: richardhuo-nv <[email protected]>
@richardhuo-nv
Contributor Author

/ok to test c87f54a

@richardhuo-nv richardhuo-nv merged commit 86ce03f into main Sep 17, 2025
16 of 17 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/update_llm_args branch September 17, 2025 22:33
nv-tusharma pushed a commit that referenced this pull request Sep 17, 2025
…test TRT-LLM connector API (#2979)

Signed-off-by: richardhuo-nv <[email protected]>