Conversation

@testdig testdig commented Jan 21, 2026

No description provided.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
6218034dd7f9a56596e4fd8c8c8fc1d8011ed9c2

@@ -0,0 +1,12 @@
from vllm.config import VllmConfig
Collaborator

Copyright header missing
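
For illustration, a hedged example of the kind of license header typically expected at the top of a new Python file in vLLM repositories; the exact wording should be copied from the repository's existing files rather than from this sketch:

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
from vllm.config import VllmConfig
```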

@testdig testdig Jan 29, 2026

updated, thanks for the review

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

testdig and others added 14 commits January 29, 2026 09:34
Signed-off-by: Wang, Zheng W <[email protected]>
Adds a Qwen3 model test case for image input

Signed-off-by: slokesha <[email protected]>
Co-authored-by: Iryna Boiko <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Following the reasoning stated in PR:
vllm-project#616

Signed-off-by: Radoslaw Smyrek <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
…ect#837)

Signed-off-by: linoy buchnik <[email protected]>
Signed-off-by: Iryna Boiko <[email protected]>
Co-authored-by: Iryna Boiko <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Adds support for cross-layer KV cache sharing on HPU, enabling models
like Gemma-3n that share KV cache between layers to run on Gaudi.

**Changes**
- hpu_attn.py: Store kv_sharing_target_layer_name and skip KV cache writes for sharing layers (sketched below)
- hpu_model_runner.py: Track shared layers, validate config, and set up
tensor sharing during initialization
- test_hpu_model_runner.py: Enable KV sharing unit tests

**Expected Benefits**
- Reduced KV cache memory usage for models with layer sharing
- Lower TTFT for long-context scenarios in supported models (e.g., Gemma-3n)

**Testing**
- Unit tests pass
- E2E validation with a KV-sharing model (e.g., Gemma-3n) pending
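
A minimal, hypothetical sketch of the skip-write behavior described in the changes above; this is not the actual hpu_attn.py code, and the helper name, cache layout, and `slots` indexing are assumptions:

```python
import torch

def maybe_write_kv(layer_name, kv_sharing_target_layer_name, key, value,
                   kv_caches, slots):
    """Return the (k_cache, v_cache) pair this layer should attend over.

    A sharing layer skips its own cache write and reuses the target layer's
    cache, which the model runner aliased during initialization.
    """
    if kv_sharing_target_layer_name is not None:
        # Sharing layer (e.g. in Gemma-3n): no write, attend over the shared cache.
        return kv_caches[kv_sharing_target_layer_name]
    k_cache, v_cache = kv_caches[layer_name]
    # Normal layer: write this step's K/V into its own cache slots.
    k_cache.index_copy_(0, slots, key)
    v_cache.index_copy_(0, slots, value)
    return k_cache, v_cache
```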

---------

Signed-off-by: jakub-sochacki <[email protected]>
Co-authored-by: jakub-sochacki <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Signed-off-by: Iryna Boiko <[email protected]>
Co-authored-by: Chendi.Xue <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
## Motivation
Qwen2.5-VL models have lower accuracy than expected, and this accuracy regressed due to PR vllm-project#698 (commit 18105cc on main). This PR introduces two changes that boost accuracy of Qwen2.5-VL-7B-Instruct on the MMMU dataset from ~42% to 51%. The accuracy matches that seen with the GPU version of vLLM (build 0.13.0) under similar test conditions.

## Changes
- The first change fixes the regression: the attn_mask was not being applied in HPUQwen2_5_VisionBlock.
- The second change enables fp32_softmax for qwen2_5_vl models (both are sketched below).
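
A hedged sketch of both changes with assumed shapes; this is not the actual HPUQwen2_5_VisionBlock code:

```python
import torch

def vision_attention(q, k, v, attn_mask=None, fp32_softmax=True):
    # q, k, v: [batch, heads, seq, head_dim]; attn_mask is broadcastable to the scores.
    scores = torch.matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    if attn_mask is not None:
        scores = scores + attn_mask  # the step that was missing after the regression
    if fp32_softmax:
        # Compute the softmax in fp32 for accuracy, then cast back to the activation dtype.
        probs = torch.softmax(scores.float(), dim=-1).to(q.dtype)
    else:
        probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)
```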

---------

Signed-off-by: Tanner Voas <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Signed-off-by: Xinyu Chen <[email protected]>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Chendi.Xue <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Signed-off-by: Milosz Grunwald <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Cherry-pick of

vllm-project@6e1be4e
but adapted to recent changes in
vllm-project#526

---------

Signed-off-by: Katarzyna Fojcik <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
…llm-project#855)

For `max_model_len > 32k`, Llama4 enables temperature adjustment:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L719.
The enabled adjustment changes the shape of tensor `q` from 2D to 3D:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L307.
This tensor is passed to `UnquantizedFusedMoEMethod -> forward`:
https://github.com/vllm-project/vllm-gaudi/blob/main/vllm_gaudi/ops/hpu_fused_moe.py#L163
causing invalid reshaping: we try to return a 3D `output.view` based on a 2D output tensor (see the sketch below).

The bug was introduced by the following PRs: vllm-project#680 and vllm-project#684

Cherry-picked from `releases/v0.13.0`
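
A hedged sketch of one way to handle the shape mismatch (not the actual hpu_fused_moe.py code; `experts_fn` is a hypothetical stand-in for the fused expert computation):

```python
import torch

def moe_forward_shape_safe(hidden_states: torch.Tensor, experts_fn) -> torch.Tensor:
    orig_shape = hidden_states.shape
    # Collapse any leading dims (3D when temperature adjustment is enabled) to 2D.
    flat = hidden_states.reshape(-1, orig_shape[-1])
    output = experts_fn(flat)          # expert computation sees [tokens, hidden]
    # Restore the caller's original shape instead of view()-ing with stale dims.
    return output.reshape(orig_shape)
```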

---------

Signed-off-by: Artur Fierka <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Reverts vllm-project#780

---------

Signed-off-by: Agata Dobrzyniewicz <[email protected]>
Co-authored-by: Chendi.Xue <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
Due to MambaMixer2 implementation requirements, all buckets used for mamba must be a multiple of the mamba chunk size.
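
A hedged illustration of the constraint; the helper name is hypothetical, not the actual bucketing code:

```python
def align_to_chunk(bucket: int, chunk_size: int) -> int:
    """Round a bucket size up to the nearest multiple of the mamba chunk size."""
    return ((bucket + chunk_size - 1) // chunk_size) * chunk_size

# e.g. with chunk_size=256: align_to_chunk(300, 256) == 512, align_to_chunk(512, 256) == 512
```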

Signed-off-by: Jakub Byczkowski <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
kzawora-intel and others added 2 commits January 29, 2026 09:34
…ject#785)

Further experiments on top of vllm-project#784: I wanted to check whether we can avoid some OOMs by performing FlashAttention rescaling online rather than after computing all the parts, which should save memory on some intermediate buffers. Accuracy is surprisingly okay-ish, but I haven't tested this too thoroughly.
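
A hedged, simplified sketch of online rescaling in plain PyTorch (not the actual HPU kernel): partial outputs are rescaled as each key/value block is processed, so unnormalized intermediate parts never have to be kept around all at once:

```python
import torch

def attention_online(q, k, v, block_size=128):
    # q: [num_q, d]; k, v: [num_k, d]; computed in fp32 for simplicity.
    scale = q.shape[-1] ** -0.5
    m = torch.full((q.shape[0], 1), float("-inf"))   # running row max
    l = torch.zeros(q.shape[0], 1)                   # running softmax denominator
    acc = torch.zeros(q.shape[0], v.shape[-1])       # running weighted sum
    for start in range(0, k.shape[0], block_size):
        kb = k[start:start + block_size].float()
        vb = v[start:start + block_size].float()
        s = (q.float() @ kb.T) * scale
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        alpha = torch.exp(m - m_new)                 # rescale previous partials
        p = torch.exp(s - m_new)
        l = l * alpha + p.sum(dim=-1, keepdim=True)
        acc = acc * alpha + p @ vb
        m = m_new
    return (acc / l).to(q.dtype)
```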

---------

Signed-off-by: Konrad Zawora <[email protected]>
Co-authored-by: Agata Dobrzyniewicz <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
…-project#867)

1. Update the example to support prefill HND and agreed_block_size
2. Enable prefill-side kv_layout and block_size updates

Port vllm-project/vllm#30448 to vllm-gaudi

---------

Signed-off-by: Chendi Xue <[email protected]>
Signed-off-by: Yeonsil Yoon <[email protected]>
Signed-off-by: Wang, Zheng W <[email protected]>
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.


testdig commented Jan 29, 2026

I don't think my PR breaks the CI; can anyone help here?

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
6218034dd7f9a56596e4fd8c8c8fc1d8011ed9c2

Signed-off-by: Wang, Zheng W <[email protected]>
@github-actions

github-actions bot commented Feb 2, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@github-actions

github-actions bot commented Feb 4, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51
