Block matmul and kv_cache in dynamic quantization #673

HolyFalafel · 2025-12-03T12:06:37Z

Currently disabling matmul and kv_cache in dynamic quantization mode

Copilot

Pull request overview

This PR configures dynamic quantization to exclude specific matrix multiplication and key-value cache operations from quantization. The changes prevent quantization of attention mechanism components that may be sensitive to quantization effects.

Key Changes:

- Added a blocklist to the quantization configuration
- Excluded Matmul and KVCache operation types from quantization
- Excluded specific attention-related operations (matmul_qk, matmul_av, k_cache, v_cache) by name
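To make the summary concrete, here is a minimal sketch of what such a blocklist could look like and how a quantizer might consult it. Only the type and name entries come from the summary above. The `blocklist` / `types` / `names` key layout, the substring matching on names, and the `is_blocked` helper are assumptions made for illustration; they are not taken from the PR diff or confirmed against the quantization library.

```python
import json

# Hypothetical config fragment: only the listed types and names come from the
# PR summary; the surrounding key layout is an assumption.
config = {
    "blocklist": {
        "types": ["Matmul", "KVCache"],
        "names": ["matmul_qk", "matmul_av", "k_cache", "v_cache"],
    },
}


def is_blocked(module_name: str, module_type: str, blocklist: dict) -> bool:
    """Return True if a module should be skipped by quantization.

    Hypothetical helper: a module is blocked when its type is listed, or when
    any listed name occurs as a substring of its qualified module name.
    """
    if module_type in blocklist.get("types", []):
        return True
    return any(pattern in module_name for pattern in blocklist.get("names", []))


if __name__ == "__main__":
    # Show the fragment as it might appear in a JSON config file.
    print(json.dumps(config, indent=2))
```

Under these assumptions, a module such as `model.layers.0.self_attn.k_cache` would be skipped either by its type or by its name, while an unrelated linear layer would still be quantized.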


github-actions · 2025-12-03T14:52:48Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
3a7751485b71ce5ef927e4aa03b28602cb90811c

xuechendi · 2025-12-03T16:08:23Z

Is it for performance reason?

HolyFalafel · 2025-12-04T10:19:43Z

> Is it for performance reason?

No, dynamic quantization for KVCache+Matmul is a new feature that we don't want to enable by default.

github-actions · 2025-12-04T13:26:07Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
899e2ef558e7345b99bc0d53c2e1c60ffdca7470

Signed-off-by: Danny Semiat <[email protected]>

github-actions · 2025-12-08T09:38:27Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

Block matmul and kv_cache in dynamic quantization

69866c8

Copilot AI review requested due to automatic review settings December 3, 2025 12:06

HolyFalafel requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, kzawora-intel, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners December 3, 2025 12:06

Copilot AI reviewed Dec 3, 2025


Merge branch 'main' into dev/dsemiat/block_matmul_kv_cache_in_dynamic

f7673a7

Update maxabs_quant_dynamic_quantization.json

817de14

Signed-off-by: Danny Semiat <[email protected]>

github-actions bot mentioned this pull request Dec 8, 2025

🚦 Team Review Dashboard #701

Open


2 participants
