
Conversation

yiliu30 (Contributor) commented on Feb 4, 2026

Adapts the update from vllm-project/vllm#30141:

        # llm-compressor models need cache_dtype set to "fp8" manually.
        if getattr(quant_config, "kv_cache_scheme", None) is not None:
            kv_cache_dtype = "fp8"
            calculate_kv_scales = False
            if cache_config is not None:
                cache_config.cache_dtype = "fp8"
                cache_config.calculate_kv_scales = False

        self.kv_cache_torch_dtype = kv_cache_dtype_str_to_dtype(
            kv_cache_dtype, vllm_config.model_config
        )
        self.kv_cache_dtype = kv_cache_dtype
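
For context, the branch above fires when an llm-compressor checkpoint carries a kv_cache_scheme entry in its quantization config; in that case the KV cache dtype is forced to "fp8" and runtime scale calculation is disabled. Below is a minimal, self-contained sketch of that logic; the function name resolve_kv_cache_dtype and the SimpleNamespace stand-ins for vLLM's config objects are illustrative, and the scheme values are made up:

    from types import SimpleNamespace

    def resolve_kv_cache_dtype(quant_config, cache_config, default="auto"):
        # llm-compressor models ship a kv_cache_scheme; force the fp8 path.
        kv_cache_dtype = default
        if getattr(quant_config, "kv_cache_scheme", None) is not None:
            kv_cache_dtype = "fp8"
            if cache_config is not None:
                cache_config.cache_dtype = "fp8"
                cache_config.calculate_kv_scales = False
        return kv_cache_dtype

    # A quant config carrying an FP8 KV cache scheme triggers the override.
    quant_config = SimpleNamespace(kv_cache_scheme={"type": "float", "num_bits": 8})
    cache_config = SimpleNamespace(cache_dtype="auto", calculate_kv_scales=True)
    assert resolve_kv_cache_dtype(quant_config, cache_config) == "fp8"
    assert cache_config.cache_dtype == "fp8" and not cache_config.calculate_kv_scales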

cc @hshen14 @thuang6 @lkk12014402

Signed-off-by: yiliu30 <[email protected]>
Copilot AI left a comment

Pull request overview

This PR fixes a configuration issue in the Compressed Tensors implementation for HPU (Habana Processing Unit) to properly handle the fp8_inc KV cache dtype instead of the default fp8 format.

Changes:

  • Added a custom __init__ method to HPUCompressedTensorsConfig that overrides the KV cache settings after parent initialization (see the sketch below)
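
The class body itself isn't shown on this page; the following is a minimal sketch of the described override, with an illustrative stub standing in for the real parent class. Only the name HPUCompressedTensorsConfig and the fp8_inc dtype come from this PR; everything else is assumed:

    class CompressedTensorsConfigStub:
        """Illustrative stand-in for the upstream compressed-tensors config."""
        def __init__(self, kv_cache_scheme=None):
            self.kv_cache_scheme = kv_cache_scheme
            self.kv_cache_dtype = "fp8"  # assumed default set by the parent

    class HPUCompressedTensorsConfig(CompressedTensorsConfigStub):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # After the parent has initialized, override the KV cache dtype
            # so HPU takes its fp8_inc path instead of plain fp8.
            if self.kv_cache_scheme is not None:
                self.kv_cache_dtype = "fp8_inc"

    cfg = HPUCompressedTensorsConfig(kv_cache_scheme={"type": "float", "num_bits": 8})
    assert cfg.kv_cache_dtype == "fp8_inc"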

