README.md: 2 changes (1 addition, 1 deletion)
@@ -14,7 +14,7 @@ vLLM Hardware Plugin for Intel® Gaudi®

---
*Latest News* 🔥

+- [2026/02] Version 0.14.1 is now available, built on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces support for Granite 4.0h and Qwen 3 VL models.
- [2026/01] Version 0.13.0 is now available, built on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance and also supports additional models.
- [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
-- [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.
docs/getting_started/compatibility_matrix.md: 8 changes (4 additions, 4 deletions)
@@ -5,8 +5,8 @@ title: Compatibility Matrix

The following table details the supported vLLM versions for Intel® Gaudi® 2 and Intel® Gaudi® 3 AI accelerators.

-| Intel Gaudi Software | vLLM v0.10.0 | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.12.0 |
+| Intel Gaudi Software | vLLM v0.10.1 | vLLM v0.11.2 | vLLM v0.13.0 | vLLM v0.14.1 |
| :------------------- | :----------: | :----------: | :----------: | :------------: |
-| 1.22.1 | ✅ Alfa | ✅ Beta | ❌ | ❌ |
-| 1.22.2 | ❌ | | | ❌ |
-| 1.23.0 | ❌ | | | In development |
+| 1.22.1 | ✅ Beta | | ❌ | ❌ |
+| 1.22.2 | ❌ | | | ❌ |
+| 1.23.0 | ❌ | | | |
docs/getting_started/installation.md: 4 changes (2 additions, 2 deletions)
@@ -61,8 +61,8 @@ There are two ways to install vLLM Hardware Plugin for Intel® Gaudi® from source

2. Run the latest Docker image from the Intel® Gaudi® vault as in the following code sample. Make sure to provide your versions of vLLM Hardware Plugin for Intel® Gaudi®, operating system, and PyTorch. Ensure that these versions are supported, according to the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html).

-docker pull vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.23.0/ubuntu24.04/habanalabs/pytorch-installer-2.9.0:latest
+docker pull vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/pytorch-installer-{{ PT_VERSION }}:latest
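As an editorial illustration, not part of this change: substituting the values from the deleted lines above, the templated command expands as follows. The shell variables here merely stand in for the docs' `{{ ... }}` macros.

```bash
# Assumed values, taken from the hardcoded lines this change removes
VERSION=1.23.0
PT_VERSION=2.9.0

docker pull "vault.habana.ai/gaudi-docker/${VERSION}/ubuntu24.04/habanalabs/pytorch-installer-${PT_VERSION}:latest"
```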

For more information, see the [Intel Gaudi documentation](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#pull-prebuilt-containers).

docs/getting_started/validated_models.md: 7 changes (7 additions, 0 deletions)
@@ -39,6 +39,13 @@ The following configurations have been validated to function with Intel® Gaudi
| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | BF16 |Gaudi 3|
| [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 1 | BF16 | Gaudi 3 |
| [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3|
+| [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) | 1 | BF16, FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking) | 1 | BF16, FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) | 8 | BF16 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8) | 4 | FP8 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking) | 8 | BF16 | Gaudi 3|
+| [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8) | 4 | FP8 | Gaudi 3|
+| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 1 | BF16 | Gaudi 3|
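As an editorial aside, not part of the diff: assuming the table's card count maps to the tensor-parallel degree, a newly validated row such as Qwen3-VL-235B-A22B-Instruct (8 cards, BF16) could be launched roughly as sketched below; exact Gaudi-specific flags and environment setup may differ.

```bash
# Hedged sketch: serve a newly validated configuration with vLLM.
# Assumes 8 cards are used as tensor parallelism and that the Gaudi
# plugin and runtime environment are already installed.
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
    --tensor-parallel-size 8 \
    --dtype bfloat16
```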

Validation of the following configurations is currently in progress:

docs/release_notes.md: 13 changes (13 additions, 0 deletions)
@@ -2,6 +2,19 @@

This document provides an overview of the features, changes, and fixes introduced in each release of the vLLM Hardware Plugin for Intel® Gaudi®.

+## 0.14.1
+
+This version is based on [vLLM 0.14.1](https://github.com/vllm-project/vllm/releases/tag/v0.14.1), supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html), and introduces support for the following models on Gaudi 3:
+
+- [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small)
+- [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
+- [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct)
+- [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)
+- [Qwen/Qwen3-VL-235B-A22B-Instruct-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking)
+- [Qwen/Qwen3-VL-235B-A22B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking-FP8)
+
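Editorial note, not part of the diff: a minimal sketch of serving one of these models through vLLM's OpenAI-compatible endpoint, assuming a single Gaudi 3 card in BF16 as in the validated-models table.

```bash
# Hedged sketch: serve the Granite model and query it.
# Assumes the Gaudi plugin is installed and one card suffices (per the
# validated single-card BF16 configuration); port 8000 is vLLM's default.
vllm serve ibm-granite/granite-4.0-h-small --dtype bfloat16 &

# Once the server is up, send an OpenAI-style chat request:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-4.0-h-small",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```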
## 0.13.0

This version is based on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).