diff --git a/docs/general/faq.md b/docs/general/faq.md
index 9b36f1506..3da0a2687 100644
--- a/docs/general/faq.md
+++ b/docs/general/faq.md
@@ -10,7 +10,7 @@ title: Frequently Asked Questions
 - Ubuntu 22.04 LTS OS.
 - Python 3.10.
 - Intel Gaudi 2 or Intel Gaudi 3 AI accelerator.
-- Intel Gaudi software version 1.22.2 and above.
+- Intel Gaudi software version {{ VERSION }} and above.
 
 ### What is the vLLM plugin and where can I find its GitHub repository?
 
diff --git a/docs/getting_started/installation.md b/docs/getting_started/installation.md
index f2b104097..5cb1364fe 100644
--- a/docs/getting_started/installation.md
+++ b/docs/getting_started/installation.md
@@ -18,7 +18,7 @@ Before you start, ensure that your environment meets the following requirements:
 
 - Python 3.10
 - Intel® Gaudi® 2 or 3 AI accelerator
-- Intel® Gaudi® software version 1.21.0 or later
+- Intel® Gaudi® software version {{ VERSION }} or later
 
 Additionally, ensure that the Gaudi execution environment is properly set
 up. If it is not, complete the setup by using the [Gaudi Installation
diff --git a/docs/getting_started/quickstart/quickstart.md b/docs/getting_started/quickstart/quickstart.md
index 6cd609db8..4313c78fc 100644
--- a/docs/getting_started/quickstart/quickstart.md
+++ b/docs/getting_started/quickstart/quickstart.md
@@ -26,7 +26,7 @@ Before you start, ensure that your environment meets the following requirements:
 - Ubuntu 22.04 or 24.04
 - Python 3.10
 - Intel® Gaudi® 2 or 3 AI accelerator
-- Intel® Gaudi® software version 1.21.0 or later
+- Intel® Gaudi® software version {{ VERSION }} or later
 
 Additionally, ensure that the Intel® Gaudi® execution environment is properly set
 up. If it is not, complete the setup by following the [Installation
@@ -54,7 +54,7 @@ Follow these steps to run the vLLM server or launch benchmarks on Gaudi using Do
     | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
     | `MODEL`        | Preferred large language model. For a list of the available models, see the next table. |
     | `HF_TOKEN`     | Hugging Face token generated from <https://huggingface.co/settings/tokens>. |
-    | `DOCKER_IMAGE` | Docker image name or URL for the vLLM Gaudi container. When using the Gaudi repository, make sure to select Docker images with the *vllm-installer* prefix in the file name. |
+    | `DOCKER_IMAGE` | Docker image name or URL for the vLLM Gaudi container. When using the Gaudi repository, make sure to select Docker images with the *vllm-plugin* prefix in the file name. |
 
     The following table lists the supported vLLM models:
 
@@ -81,7 +81,7 @@ Follow these steps to run the vLLM server or launch benchmarks on Gaudi using Do
     ```bash
     MODEL="Qwen/Qwen2.5-14B-Instruct" \
     HF_TOKEN="" \
-    DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest"
+    DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest"
     ```
 
 5. Run the vLLM server using Docker Compose.
diff --git a/docs/getting_started/quickstart/quickstart_configuration.md b/docs/getting_started/quickstart/quickstart_configuration.md
index 0f1f92257..93c9e11d5 100644
--- a/docs/getting_started/quickstart/quickstart_configuration.md
+++ b/docs/getting_started/quickstart/quickstart_configuration.md
@@ -35,7 +35,7 @@ Set the preferred variable when running the vLLM server using Docker Compose, as
 ```bash
 MODEL="Qwen/Qwen2.5-14B-Instruct" \
 HF_TOKEN="" \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
+DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
 TENSOR_PARALLEL_SIZE=1 \
 MAX_MODEL_LEN=2048 \
 docker compose up
@@ -59,7 +59,7 @@ Set the preferred variable when running the vLLM server using Docker Compose, as
 ```bash
 MODEL="Qwen/Qwen2.5-14B-Instruct" \
 HF_TOKEN="" \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
+DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
 INPUT_TOK=128 \
 OUTPUT_TOK=128 \
 CON_REQ=16 \
@@ -76,7 +76,7 @@ This configuration allows you to launch the vLLM server and benchmark together.
 ```bash
 MODEL="Qwen/Qwen2.5-14B-Instruct" \
 HF_TOKEN="" \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-installer-{{ PT_VERSION }}:latest" \
+DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
 TENSOR_PARALLEL_SIZE=1 \
 MAX_MODEL_LEN=2048 \
 INPUT_TOK=128 \
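Taken together, the renamed *vllm-plugin* image feeds a combined server-plus-benchmark launch roughly as sketched below. This is not part of the patch: the `HF_TOKEN` value is a placeholder, `{{ VERSION }}` and `{{ PT_VERSION }}` are left for the docs templating to expand, and the variables after `INPUT_TOK` fall outside the last hunk's context window, so they are assumed from the server-only and benchmark-only blocks above.

```bash
# Hypothetical end-to-end invocation combining the server and benchmark
# variables touched by this patch. HF_TOKEN's value is a placeholder, and
# everything from OUTPUT_TOK onward is an assumed continuation mirroring
# the benchmark-only block; replace the {{ ... }} template variables by
# hand when copying, since Docker Compose will not expand them.
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your Hugging Face token>" \
DOCKER_IMAGE="vault.habana.ai/gaudi-docker/{{ VERSION }}/ubuntu24.04/habanalabs/vllm-plugin-{{ PT_VERSION }}:latest" \
TENSOR_PARALLEL_SIZE=1 \
MAX_MODEL_LEN=2048 \
INPUT_TOK=128 \
OUTPUT_TOK=128 \
CON_REQ=16 \
docker compose up
```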