README.md: 2 changes (1 addition, 1 deletion)
@@ -14,7 +14,7 @@ vLLM Hardware Plugin for Intel® Gaudi®

---
*Latest News* 🔥

+- [2026/02] Version 0.14.1 is now available, built on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces support for Granite 4.0h and Qwen 3 VL models.
- [2026/01] Version 0.13.0 is now available, built on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance and also supports additional models.
- [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
-- [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.
docs/getting_started/compatibility_matrix.md: 8 changes (4 additions, 4 deletions)
@@ -5,8 +5,8 @@ title: Compatibility Matrix

The following table details the supported vLLM versions for Intel® Gaudi® 2 and Intel® Gaudi® 3 AI accelerators.

-| Intel Gaudi Software | vLLM v0.10.0 | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.12.0 |
+| Intel Gaudi Software | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.13.0 | vLLM v0.14.1 |
| :------------------- | :----------: | :----------: | :----------: | :------------: |
-| 1.22.1 | ✅ Alfa | ✅ Beta | ❌ | ❌ |
-| 1.22.2 | ❌ | | | ❌ |
-| 1.23.0 | ❌ | | | In development |
+| 1.22.1 | ✅ Beta | | ❌ | ❌ |
+| 1.22.2 | ❌ | | | ❌ |
+| 1.23.0 | ❌ | | | |
docs/getting_started/installation.md: 4 changes (2 additions, 2 deletions)
@@ -61,8 +61,8 @@ There are two ways to install vLLM Hardware Plugin for Intel® Gaudi® from source

2. Run the latest Docker image from the Intel® Gaudi® vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel® Gaudi®, operating system, and PyTorch. Ensure that these versions are supported, according to the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html).

-docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
+docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
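As an editorial illustration, not part of this change: substituting the values from the deleted lines above, the templated command expands as follows. The shell variables here merely stand in for the docs' `{{ ... }}` macros.

```bash
# Assumed values, taken from the hardcoded lines this change removes
VERSION=1.23.0
PT_VERSION=2.9.0

docker pull "vault.habana.ai/gaudi-docker/${VERSION}/ubuntu24.04/habanalabs/pytorch-installer-${PT_VERSION}:latest"
```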

For more information, see the [Intel Gaudi documentation](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#pull-prebuilt-containers).

docs/getting_started/validated_models.md: 7 changes (7 additions, 0 deletions)
@@ -39,6 +39,13 @@ The following configurations have been validated to function with Intel® Gaudi
| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | BF16 |Gaudi 3|
| [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 1 | BF16 | Gaudi 3 |
| [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3|
+| [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) | 1 | BF16, FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking) | 1 | BF16, FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) | 8 | BF16 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8) | 4 | FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) | 8 | BF16 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8) | 4 | FP8 | Gaudi 3|
+| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 1 | BF16 | Gaudi 3|
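As an editorial aside, not part of the diff: assuming the table's card count maps to the tensor-parallel degree, a newly validated row such as Qwen3-VL-235B-A22B-Instruct (8 cards, BF16) could be launched roughly as sketched below; exact Gaudi-specific flags and environment setup may differ.

```bash
# Hedged sketch: serve a newly validated configuration with vLLM.
# Assumes 8 cards are used as tensor parallelism and that the Gaudi
# plugin and runtime environment are already installed.
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
    --tensor-parallel-size 8 \
    --dtype bfloat16
```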

Validation of the following configurations is currently in progress:

docs/release_notes.md: 13 changes (13 additions, 0 deletions)
@@ -2,6 +2,19 @@

This document provides an overview of the features, changes, and fixes introduced in each release of the vLLM Hardware Plugin for Intel® Gaudi®.

+## 0.14.1
+
+This version is based on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1), supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html), and introduces support for the following models on Gaudi 3:
+
+- [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small)
+- [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+- [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct)
+- [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8)
+
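Editorial note, not part of the diff: a minimal sketch of serving one of these models through vLLM's OpenAI-compatible endpoint, assuming a single Gaudi 3 card in BF16 as in the validated-models table.

```bash
# Hedged sketch: serve the Granite model and query it.
# Assumes the Gaudi plugin is installed and one card suffices (per the
# validated single-card BF16 configuration); port 8000 is vLLM's default.
vllm serve ibm-granite/granite-4.0-h-small --dtype bfloat16 &

# Once the server is up, send an OpenAI-style chat request:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-4.0-h-small",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```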
## 0.13.0

This version is based on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).