[GAUDISW-244752] add dynamic scale for V-Cache on Hidden dim #749

dudilester · 2025-12-21T14:48:38Z

No description provided.

Copilot

Pull request overview

This PR introduces dynamic scaling for the V-Cache (value cache) on the hidden dimension, extending the existing dynamic scaling support beyond just the sequence length (T) dimension. The change modifies the value cache scaling mechanism to use a tuple of two scale tensors instead of a single tensor.
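The PR does not show how the two V-Cache scales are computed. The following is a purely illustrative sketch of one plausible scheme. It assumes a value tensor of shape [T, hidden] and a symmetric quantized range `Q_MAX` (448.0, the FP8 E4M3 maximum, is an assumption here). One scale per hidden channel and one scale per token are combined so that `v[t][h] ≈ q[t][h] * scales_on_t[t] * scales_on_hidden[h]`. Plain Python lists stand in for tensors, and integer rounding stands in for the FP8 cast. `quantize_two_scales` and `dequantize` are hypothetical names, not the vllm_gaudi API.

```python
# Hypothetical sketch (not the vllm_gaudi implementation): dynamic scales
# for a value tensor on both the token (T) axis and the hidden axis.
# Plain lists stand in for tensors; integer rounding stands in for FP8.
from typing import List, Tuple

Matrix = List[List[float]]
Q_MAX = 448.0  # assumed quantized range, e.g. the FP8 E4M3 maximum


def _safe(x: float) -> float:
    # Avoid division by zero for all-zero rows or columns.
    return x if x > 0.0 else 1.0


def quantize_two_scales(v: Matrix) -> Tuple[Matrix, Tuple[List[float], List[float]]]:
    num_tokens, hidden = len(v), len(v[0])
    # One scale per hidden channel: the channel's max magnitude over all tokens.
    scales_on_hidden = [_safe(max(abs(v[t][h]) for t in range(num_tokens)))
                        for h in range(hidden)]
    # One scale per token, computed after the per-channel scale is removed,
    # so each row's largest normalized value maps to Q_MAX.
    scales_on_t = [
        _safe(max(abs(v[t][h]) / scales_on_hidden[h] for h in range(hidden)) / Q_MAX)
        for t in range(num_tokens)
    ]
    # Quantize and clip to the representable range.
    q = [[max(-Q_MAX, min(Q_MAX, round(v[t][h] / (scales_on_t[t] * scales_on_hidden[h]))))
          for h in range(hidden)]
         for t in range(num_tokens)]
    # Return the scales as an (on_T, on_hidden) pair.
    return q, (scales_on_t, scales_on_hidden)


def dequantize(q: Matrix, scales: Tuple[List[float], List[float]]) -> Matrix:
    scales_on_t, scales_on_hidden = scales
    return [[q[t][h] * scales_on_t[t] * scales_on_hidden[h] for h in range(len(q[0]))]
            for t in range(len(q))]
```

Returning the scales as an `(on_T, on_hidden)` pair mirrors the tuple structure the PR introduces for `value_scales`. A per-channel scale helps when a few hidden channels carry much larger magnitudes than the rest, which a single per-token scale cannot capture.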

Key Changes:

Extended value cache scaling to support two dimensions: sequence length and hidden dimension
Modified value_scales from a single tensor to a tuple of two tensors (value_scales_on_T, value_scales_on_hidden)
Updated all related cache operations and type signatures to handle the new tuple structure

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

File	Description
vllm_gaudi/v1/worker/hpu_model_runner.py	Initializes two separate scale tensors for V-Cache and combines them into a tuple
vllm_gaudi/extension/ops.py	Updates unflatten operation to handle tuple structure for v_scales
vllm_gaudi/extension/cache_ops.py	Modifies copy_blocks to access first element of v_scales tuple
vllm_gaudi/attention/ops/hpu_paged_attn.py	Updates type hints to reflect tuple structure for value scales
vllm_gaudi/attention/backends/hpu_attn.py	Updates kv_cache type signature to reflect new tuple structure
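The table notes that `copy_blocks` reads only the first element of the `v_scales` tuple and that `unflatten` now handles the tuple. The following minimal sketch shows why that split can make sense. It assumes, without confirmation from the PR, that `value_scales_on_T` is stored per cache block while `value_scales_on_hidden` is a single vector shared by all blocks. Under that assumption, block-level operations only move the T-axis scales and pass the hidden-axis scales through unchanged. `copy_value_blocks` and `unflatten_scales` are hypothetical plain-Python stand-ins, not the functions in `cache_ops.py` or `ops.py`.

```python
# Hypothetical sketch of how cache helpers might treat a V-scale tuple.
# Assumption: value_scales_on_T is stored per cache block (one entry per
# token slot), while value_scales_on_hidden is one vector shared by all
# blocks, so block-level operations only need the first tuple element.
from typing import List, Tuple

Block = List[List[float]]                               # [block_size][hidden]
ValueScales = Tuple[List[List[float]], List[float]]     # (on_T per block, on_hidden)


def copy_value_blocks(value_cache: List[Block], v_scales: ValueScales,
                      block_mapping: List[Tuple[int, int]]) -> None:
    # Only the block-indexed scales move with the data.
    scales_on_t = v_scales[0]
    for src, dst in block_mapping:
        # Copy rows so the destination block does not alias the source.
        value_cache[dst] = [row[:] for row in value_cache[src]]
        scales_on_t[dst] = scales_on_t[src][:]
    # v_scales[1] (hidden-axis scales) is intentionally left untouched.


def unflatten_scales(flat_on_t: List[float], on_hidden: List[float],
                     block_size: int) -> ValueScales:
    # Split the flat per-token scales into blocks; pass the hidden scales through.
    blocks = [flat_on_t[i:i + block_size] for i in range(0, len(flat_on_t), block_size)]
    return blocks, on_hidden
```

Keeping the pair as a tuple, rather than concatenating both scales into one tensor, lets each element keep its own shape, so only the operations that care about blocks need to index into it.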

vllm_gaudi/extension/cache_ops.py

github-actions · 2025-12-30T06:39:53Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2025-12-30T06:55:45Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2025-12-30T09:05:11Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

vllm_gaudi/attention/backends/hpu_attn.py

vllm_gaudi/extension/cache_ops.py

vllm_gaudi/extension/ops.py

github-actions · 2026-01-01T08:54:35Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2026-01-01T10:21:54Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

github-actions · 2026-01-05T16:17:56Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

github-actions · 2026-01-06T12:46:49Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

github-actions · 2026-01-07T18:19:45Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

github-actions · 2026-01-12T15:41:38Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2026-01-12T17:17:38Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
aa125ecf0edb9cd67656553d11d643aeb444ff9e

github-actions · 2026-01-15T15:08:34Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2026-01-16T03:32:46Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
4c1c501a7ee1d5efbad945ea62a702ce5cefb799

Copilot AI review requested due to automatic review settings December 21, 2025 14:48

dudilester requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, kzawora-intel, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners December 21, 2025 14:48

Copilot AI reviewed Dec 21, 2025

vllm_gaudi/extension/cache_ops.py (outdated, resolved)

github-actions bot mentioned this pull request Dec 21, 2025

🚦 Team Review Dashboard #701

Open

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch 4 times, most recently from a38f37c to 348722d on December 30, 2025 06:39

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch from 316fef7 to 500c8ba on December 30, 2025 06:55

[GAUDISW-244752] add dynamic scale for V-Cache on Hiddden dim

db8e006

Signed-off-by: Dudi Lester <[email protected]>

linoybu reviewed Dec 31, 2025

vllm_gaudi/attention/backends/hpu_attn.py (outdated, resolved)

linoybu reviewed Dec 31, 2025

vllm_gaudi/extension/cache_ops.py (resolved)

linoybu reviewed Dec 31, 2025

vllm_gaudi/extension/ops.py (outdated, resolved)

Fix review comments

86ee21a

Signed-off-by: Dudi Lester <[email protected]>

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch from b695d45 to 86ee21a on January 1, 2026 08:54

linoybu approved these changes Jan 6, 2026

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch 2 times, most recently from e16b4c6 to 4d88cd6 on January 6, 2026 08:30

fix _create_dummy_decode_input_data to num_blocks value

b3aaad1

Signed-off-by: Dudi Lester <[email protected]>

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch from d765839 to b3aaad1 on January 6, 2026 09:02

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch from 2f61c6e to 686dba2 on January 12, 2026 15:41

Add block_size to VLLMKVCache forward call

b02f2cc

Signed-off-by: Dudi Lester <[email protected]>

dudilester force-pushed the dev/dudilester/dynamic_kv_on_h_dim branch from 88e44f5 to b02f2cc on January 15, 2026 15:08

Merge branch 'main' into dev/dudilester/dynamic_kv_on_h_dim

1d2ad8e
