docs/general/faq.md (2 changes: 1 addition & 1 deletion)
@@ -10,7 +10,7 @@ title: Frequently Asked Questions
- Ubuntu 22.04 LTS OS.
- Python 3.10.
- Intel Gaudi 2 or Intel Gaudi 3 AI accelerator.
-- Intel Gaudi software version 1.22.2 and above.
+- Intel Gaudi software version {{ VERSION }} and above.

### What is the vLLM plugin and where can I find its GitHub repository?

docs/getting_started/installation.md (2 changes: 1 addition & 1 deletion)
@@ -18,7 +18,7 @@ Before you start, ensure that your environment meets the following requirements:

- Python 3.10
- Intel® Gaudi® 2 or 3 AI accelerator
-- Intel® Gaudi® software version 1.21.0 or later
+- Intel® Gaudi® software version {{ VERSION }} or later

Additionally, ensure that the Gaudi execution environment is properly set up. If
it is not, complete the setup by using the [Gaudi Installation
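A quick way to confirm the execution environment is set up is the sketch below. It is a minimal check, assuming the `hl-smi` utility and the `habana_frameworks` PyTorch bridge that ship with the Intel Gaudi software stack are installed; exact module paths can vary between releases.

```bash
# List the Gaudi accelerators visible to the driver (analogous to nvidia-smi).
hl-smi

# Confirm the PyTorch bridge loads and reports available HPU devices.
# Assumes the habana_frameworks package from the Gaudi software stack.
python3 -c "import habana_frameworks.torch.hpu as hthpu; print(hthpu.is_available(), hthpu.device_count())"
```

If both commands succeed, the environment is ready for the installation steps.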
docs/getting_started/quickstart/quickstart.md (6 changes: 3 additions & 3 deletions)
@@ -26,7 +26,7 @@ Before you start, ensure that your environment meets the following requirements:
- Ubuntu 22.04 or 24.04
- Python 3.10
- Intel® Gaudi® 2 or 3 AI accelerator
-- Intel® Gaudi® software version 1.21.0 or later
+- Intel® Gaudi® software version {{ VERSION }} or later

Additionally, ensure that the Intel® Gaudi® execution environment is properly set up. If
it is not, complete the setup by following the [Installation
@@ -54,7 +54,7 @@ Follow these steps to run the vLLM server or launch benchmarks on Gaudi using Do
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `MODEL` | Preferred large language model. For a list of the available models, see the next table. |
| `HF_TOKEN` | Hugging Face token generated from <https://huggingface.co>. |
-| `DOCKER_IMAGE` | Docker image name or URL for the vLLM Gaudi container. When using the Gaudi repository, make sure to select Docker images with the *vllm-installer* prefix in the file name. |
+| `DOCKER_IMAGE` | Docker image name or URL for the vLLM Gaudi container. When using the Gaudi repository, make sure to select Docker images with the *vllm-plugin* prefix in the file name. |

The following table lists the supported vLLM models:

@@ -81,7 +81,7 @@ Follow these steps to run the vLLM server or launch benchmarks on Gaudi using Do
```bash
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest"
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest"
```

5. Run the vLLM server using Docker Compose.
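As a rough sketch of steps 4 and 5 combined, assuming the Compose file from the repository is in the current directory and publishes vLLM's default port 8000, the server can be started and then smoke-tested through vLLM's OpenAI-compatible API:

```bash
# Start the server with the variables from the previous step (sketch only;
# the compose file name and port mapping come from the vllm-gaudi repository).
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
docker compose up

# From another shell, query the OpenAI-compatible completions endpoint.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-14B-Instruct", "prompt": "Hello, Gaudi!", "max_tokens": 16}'
```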
docs/getting_started/quickstart/quickstart_configuration.md (6 changes: 3 additions & 3 deletions)
@@ -35,7 +35,7 @@ Set the preferred variable when running the vLLM server using Docker Compose, as
```bash
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
TENSOR_PARALLEL_SIZE=1 \
MAX_MODEL_LEN=2048 \
docker compose up
@@ -59,7 +59,7 @@ Set the preferred variable when running the vLLM server using Docker Compose, as
```bash
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
INPUT_TOK=128 \
OUTPUT_TOK=128 \
CON_REQ=16 \
@@ -76,7 +76,7 @@ This configuration allows you to launch the vLLM server and benchmark together.
```bash
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
TENSOR_PARALLEL_SIZE=1 \
MAX_MODEL_LEN=2048 \
INPUT_TOK=128 \