[GAUDISW-244752] add dynamic scale for V-Cache on Hidden dim #749
base: main
Pull request overview
This PR introduces dynamic scaling for the V-Cache (value cache) on the hidden dimension, extending the existing dynamic scaling support beyond just the sequence length (T) dimension. The change modifies the value cache scaling mechanism to use a tuple of two scale tensors instead of a single tensor.
Key Changes:
- Extended value cache scaling to support two dimensions: sequence length and hidden dimension
- Modified value_scales from a single tensor to a tuple of two tensors (value_scales_on_T, value_scales_on_hidden)
- Updated all related cache operations and type signatures to handle the new tuple structure
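The change from a single tensor to a tuple can be sketched roughly as follows. This is an illustration only, not the PR's code: nested lists stand in for tensors, and the helper names `ones` and `init_value_scales`, as well as the buffer shapes, are assumptions. Only the tuple layout `(value_scales_on_T, value_scales_on_hidden)` comes from the overview above.

```python
from typing import List, Tuple

# Stand-in for a 2-D tensor; the real code presumably uses tensors instead.
Matrix = List[List[float]]

# New layout described in the overview: two scale buffers instead of one.
ValueScales = Tuple[Matrix, Matrix]  # (value_scales_on_T, value_scales_on_hidden)


def ones(rows: int, cols: int) -> Matrix:
    """Return a rows x cols matrix filled with 1.0 (a neutral scale)."""
    return [[1.0] * cols for _ in range(rows)]


def init_value_scales(num_blocks: int, block_size: int, hidden_size: int) -> ValueScales:
    """Build the two scale buffers and combine them into one tuple.

    The shapes used here are hypothetical and chosen only for illustration.
    """
    value_scales_on_T = ones(num_blocks, block_size)
    value_scales_on_hidden = ones(num_blocks, hidden_size)
    return (value_scales_on_T, value_scales_on_hidden)


value_scales = init_value_scales(num_blocks=4, block_size=2, hidden_size=3)

# Callers that previously received a single buffer now unpack two.
scales_on_T, scales_on_hidden = value_scales
```

Returning a plain tuple keeps the change small for callers that only need one of the two buffers, since they can index it directly.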
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Initializes two separate scale tensors for V-Cache and combines them into a tuple |
| vllm_gaudi/extension/ops.py | Updates unflatten operation to handle tuple structure for v_scales |
| vllm_gaudi/extension/cache_ops.py | Modifies copy_blocks to access first element of v_scales tuple |
| vllm_gaudi/attention/ops/hpu_paged_attn.py | Updates type hints to reflect tuple structure for value scales |
| vllm_gaudi/attention/backends/hpu_attn.py | Updates kv_cache type signature to reflect new tuple structure |
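The `cache_ops.py` row says that `copy_blocks` now reads the first element of the `v_scales` tuple. A minimal sketch of that access pattern is shown below. It assumes a hypothetical helper `copy_scale_blocks`, a source-to-destination block mapping, and nested lists in place of tensors; it is not the repository's implementation. In this sketch the second tuple element is left untouched, and the summary does not state whether the actual change behaves the same way.

```python
from typing import Dict, List, Tuple

# Stand-in for a 2-D tensor indexed by block number.
Matrix = List[List[float]]


def copy_scale_blocks(v_scales: Tuple[Matrix, Matrix],
                      block_mapping: Dict[int, int]) -> None:
    """Copy per-block scale rows from source to destination blocks.

    Only v_scales[0] (the scales on T) is accessed, mirroring the
    "first element of the tuple" access described in the table.
    """
    scales_on_T = v_scales[0]
    for src, dst in block_mapping.items():
        # Copy the row so source and destination do not share storage.
        scales_on_T[dst] = list(scales_on_T[src])


# Hypothetical data: three blocks, with block 0 copied onto block 2.
v_scales = ([[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]], [[0.5], [0.5], [0.5]])
copy_scale_blocks(v_scales, {0: 2})
```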
a38f37c to 348722d
🚧 CI Blocked: The main CI workflow was not started for the following reason:
316fef7 to 500c8ba
🚧 CI Blocked: The main CI workflow was not started for the following reason:
✅ CI Passed: All checks passed successfully against the following vllm commit:
Signed-off-by: Dudi Lester <[email protected]>
Signed-off-by: Dudi Lester <[email protected]>
b695d45 to 86ee21a
🚧 CI Blocked: The main CI workflow was not started for the following reason:
✅ CI Passed: All checks passed successfully against the following vllm commit:
e16b4c6 to 4d88cd6
Signed-off-by: Dudi Lester <[email protected]>
d765839 to b3aaad1
✅ CI Passed: All checks passed successfully against the following vllm commit:
2f61c6e to 686dba2
🚧 CI Blocked: The main CI workflow was not started for the following reason:
✅ CI Passed: All checks passed successfully against the following vllm commit:
Signed-off-by: Dudi Lester <[email protected]>
88e44f5 to b02f2cc
🚧 CI Blocked: The main CI workflow was not started for the following reason:
✅ CI Passed: All checks passed successfully against the following vllm commit:
No description provided.