[None][perf] AutoDeploy optimize _get_unique_value #8822

suyoggupta · 2025-10-31T02:07:19Z

Summary by CodeRabbit

Performance
- Optimized attention interface operations to improve efficiency and reduce memory overhead during value assignment calculations through streamlined lookup mechanisms.

Signed-off-by: Suyog Gupta <[email protected]>

coderabbitai · 2025-10-31T02:12:25Z

📝 Walkthrough

Walkthrough

Modified the _get_unique_value method in AttentionInterface to accept Sequence[int] instead of Set[int]. Implementation refactored to use linear scanning to find the smallest unoccupied value instead of materializing full range sets. Method signature, documentation, and all call sites updated accordingly.

Changes

Cohort / File(s)	Summary
Performance optimization for occupied value tracking `tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py`	Updated `_get_unique_value` method signature to accept `Sequence[int]` instead of `Set[int]`. Replaced set-based difference computation with linear scan to find smallest free value within [0, max_val). Added guard for out-of-range values. Updated all call sites to pass raw sequences (e.g., cache_loc, slot_idx) instead of converting to sets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–35 minutes

Algorithm correctness verification: the linear scan replacement for set-based diff logic requires careful validation to ensure it produces identical results across edge cases
Call site verification: confirm all invocations pass sequences in the expected format and that type conversions are removed properly
Performance impact assessment: validate that avoiding set materialization provides the intended optimization without regressing in other scenarios

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Description Check	⚠️ Warning	The pull request description is completely empty, with no content provided by the author. The repository's description template requires multiple critical sections including a detailed description explaining the issue and solution, a test coverage section listing relevant tests, and a PR checklist confirming alignment with coding guidelines and test requirements. While the PR objectives indicate a title exists ("[None][feat] AutoDeploy optimize _get_unique_value"), the actual description field contains no information to guide reviewers in understanding the motivation, implementation details, or testing approach for these code changes.	The author must provide a complete pull request description following the template structure. This should include: (1) a description explaining what the optimization does to _get_unique_value and why it improves the code, (2) a test coverage section identifying which tests validate these changes to the attention interface, and (3) confirmation of the PR checklist items to ensure adherence to coding guidelines and appropriate reviewer assignment. Given that this is a code optimization affecting public method signatures, clear documentation of the change rationale and testing approach is essential before merge.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title `[None][perf] AutoDeploy optimize _get_unique_value` is directly related to the main change in the pull request. The raw summary confirms that the changes optimize the `_get_unique_value` method by improving its signature from accepting `Set[int]` to `Sequence[int]` and replacing set-based logic with a more efficient linear scan. The title clearly conveys that this is a performance optimization to the `_get_unique_value` function in the AutoDeploy module, which is concise, specific, and accurately reflects the primary change without unnecessary verbosity.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3224cc and 1c02afb.

📒 Files selected for processing (1)

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

🧠 Learnings (2)

📓 Common learnings

Learnt from: thorjohnsen
Repo: NVIDIA/TensorRT-LLM PR: 6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.

📚 Learning: 2025-08-14T21:04:50.248Z

Learnt from: thorjohnsen
Repo: NVIDIA/TensorRT-LLM PR: 6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.

Applied to files:

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

suyoggupta · 2025-10-31T02:24:36Z

/bot run

tensorrt-cicd · 2025-10-31T02:31:07Z

PR_Github #23107 [ run ] triggered by Bot. Commit: 0ba50a4

lucaslie

Great to see that this is sufficient.

I am still gonna keep #8631 open because it seems like we eventually we should do the right thing and just get that information from the cache manager

Signed-off-by: Suyog Gupta <[email protected]>

tensorrt-cicd · 2025-10-31T07:04:02Z

PR_Github #23107 [ run ] completed with state SUCCESS. Commit: 0ba50a4
/LLM/main/L0_MergeRequest_PR pipeline #17428 completed with status: 'SUCCESS'

Signed-off-by: Suyog Gupta <[email protected]>

suyoggupta · 2025-10-31T11:31:02Z

/bot reuse-pipeline

tensorrt-cicd · 2025-10-31T11:37:15Z

PR_Github #23192 [ reuse-pipeline ] triggered by Bot. Commit: b889d3e

tensorrt-cicd · 2025-10-31T11:57:08Z

PR_Github #23192 [ reuse-pipeline ] completed with state SUCCESS. Commit: b889d3e
Reusing PR_Github #23107 for commit b889d3e

Signed-off-by: Suyog Gupta <[email protected]> Signed-off-by: FredricZ-2007 <[email protected]>

Optimize _get_unique_value

1c02afb

Signed-off-by: Suyog Gupta <[email protected]>

suyoggupta requested a review from a team as a code owner October 31, 2025 02:07

suyoggupta requested a review from nvchenghaoz October 31, 2025 02:07

github-project-automation bot added this to AutoDeploy Board Oct 31, 2025

github-project-automation bot moved this to Backlog in AutoDeploy Board Oct 31, 2025

coderabbitai bot reviewed Oct 31, 2025

View reviewed changes

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py Show resolved Hide resolved

suyoggupta changed the title ~~[None][feat] AutoDeploy optimize _get_unique_value~~ [None][perf] AutoDeploy optimize _get_unique_value Oct 31, 2025

Merge branch 'main' into sg/reduce-cpu-overhead

0ba50a4

lucaslie approved these changes Oct 31, 2025

View reviewed changes

suyoggupta added 2 commits October 30, 2025 20:20

add nvtx range

f3a24ce

Signed-off-by: Suyog Gupta <[email protected]>

add nvtx range

bd418af

Signed-off-by: Suyog Gupta <[email protected]>

suyoggupta added 2 commits October 31, 2025 04:19

use set for O(1) lookups

aa21eb3

Signed-off-by: Suyog Gupta <[email protected]>

Merge branch 'main' into sg/reduce-cpu-overhead

b889d3e

suyoggupta enabled auto-merge (squash) October 31, 2025 11:45

suyoggupta merged commit 3d0e38e into NVIDIA:main Oct 31, 2025
6 of 7 checks passed

github-project-automation bot moved this from Backlog to Done in AutoDeploy Board Oct 31, 2025

fredricz-20070104 pushed a commit to fredricz-20070104/TensorRT-LLM that referenced this pull request Nov 5, 2025

[None][perf] AutoDeploy optimize _get_unique_value (NVIDIA#8822)

f691c6a

Signed-off-by: Suyog Gupta <[email protected]> Signed-off-by: FredricZ-2007 <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[None][perf] AutoDeploy optimize _get_unique_value #8822

[None][perf] AutoDeploy optimize _get_unique_value #8822

Uh oh!

suyoggupta commented Oct 31, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Oct 31, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

suyoggupta commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

lucaslie left a comment

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

suyoggupta commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[None][perf] AutoDeploy optimize _get_unique_value #8822

[None][perf] AutoDeploy optimize _get_unique_value #8822

Uh oh!

Conversation

suyoggupta commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

suyoggupta commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

lucaslie left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

suyoggupta commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

tensorrt-cicd commented Oct 31, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

suyoggupta commented Oct 31, 2025 •

edited

Loading

coderabbitai bot commented Oct 31, 2025 •

edited

Loading