Commit 6eebb68

PatrykWo, mgawarkiewicz-intel, adobrzyni, boiko-habana, and xuechendi committed
Bulk docs cherrypick (#523)
It's a bulk cherry-pick of the documentation from main to the v0.10.2 release.

---------

Signed-off-by: PatrykWo <[email protected]>
Signed-off-by: Iryna Boiko <[email protected]>
Signed-off-by: Michal Adamczyk <[email protected]>
Signed-off-by: Jacek Czaja <[email protected]>
Signed-off-by: Krzysztof Smusz <[email protected]>
Signed-off-by: Agata Dobrzyniewicz <[email protected]>
Signed-off-by: Paweł Olejniczak <[email protected]>
Signed-off-by: mhelf-intel <[email protected]>
Co-authored-by: Michal Gawarkiewicz <[email protected]>
Co-authored-by: Agata Dobrzyniewicz <[email protected]>
Co-authored-by: Iryna Boiko <[email protected]>
Co-authored-by: Chendi.Xue <[email protected]>
Co-authored-by: Michal Adamczyk <[email protected]>
Co-authored-by: Jacek Czaja <[email protected]>
Co-authored-by: Krzysztof Smusz <[email protected]>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Michał Kuligowski <[email protected]>
Co-authored-by: Paweł Olejniczak <[email protected]>
Co-authored-by: Monika Helfer <[email protected]>
1 parent a8371fa commit 6eebb68

26 files changed: +1117 −849 lines

.cd/README.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -148,10 +148,9 @@ cd vllm-gaudi/.cd/
 
 ```bash
 HF_TOKEN=<your huggingface token> \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/vllm-installer-2.7.1:latest" \
-VLLM_SERVER_CONFIG_FILE=server/server_text.yaml \
+VLLM_SERVER_CONFIG_FILE=server/server_scenarios_text.yaml \
 VLLM_SERVER_CONFIG_NAME=llama31_8b_instruct \
-VLLM_BENCHMARK_CONFIG_FILE=benchmark/benchmark_text.yaml \
+VLLM_BENCHMARK_CONFIG_FILE=benchmark/benchmark_scenarios_text.yaml \
 VLLM_BENCHMARK_CONFIG_NAME=llama31_8b_instruct \
 docker compose --profile benchmark up
 ```
````
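The multi-line command above uses the shell's inline environment-variable prefix: assignments placed before a command apply only to that single invocation and do not leak into the surrounding shell. A minimal sketch of the pattern (the variable names here are illustrative, not part of the vllm-gaudi configuration):

```shell
# VAR=value prefixes before a command set those variables only for that
# one invocation; the parent shell's environment is left untouched.
GREETING=hello TARGET=world \
sh -c 'echo "$GREETING $TARGET"'
# -> hello world
```

This is why the snippet can be re-run with different `VLLM_*_CONFIG_*` values without exporting anything or editing the compose file.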

README.md

Lines changed: 3 additions & 5 deletions
````diff
@@ -24,10 +24,7 @@ vLLM Gaudi plugin (vllm-gaudi) integrates Intel Gaudi accelerators with vLLM to
 
 This plugin follows the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162) and [[RFC]: Enhancing vLLM Plugin Architecture](https://github.com/vllm-project/vllm/issues/19161) principles, providing a modular interface for Intel Gaudi hardware.
 
-Learn more: 🚀 [vLLM Plugin System Overview](https://docs.vllm.ai/en/latest/design/plugin_system.html)
-
-## Running vLLM on Gaudi with Docker Compose
-We are delivering ready-to-run container images that include both vLLM and Gaudi software. Please follow the [instruction](https://github.com/vllm-project/vllm-gaudi/tree/releases/v0.11.0/.cd) to quickly launch vLLM on Gaudi using a prebuilt Docker image and Docker Compose, with options for custom parameters and benchmarking.
+Learn more: 🚀 [vLLM Plugin System Overview](https://vllm-gaudi.readthedocs.io/en/latest/design/plugin_system.html)
 
 ## Getting Started
 0. Preparation of the Setup
@@ -45,6 +42,7 @@ We are delivering ready-to-run container images that include both vLLM and Gaudi
 git clone https://github.com/vllm-project/vllm-gaudi
 cd vllm-gaudi
 export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
+cd ..
 ```
 
 2. Install vLLM with `pip` or [from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source):
@@ -54,7 +52,7 @@ We are delivering ready-to-run container images that include both vLLM and Gaudi
 git clone https://github.com/vllm-project/vllm
 cd vllm
 git checkout $VLLM_COMMIT_HASH
-pip install -r <(sed '/^[torch]/d' requirements/build.txt)
+pip install -r <(sed '/^torch/d' requirements/build.txt)
 VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
 cd ..
 ```
````
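The `sed` change in the last hunk above is a genuine bug fix, not a style tweak: `[torch]` is a bracket expression, so `/^[torch]/d` deletes every line beginning with any of the letters t, o, r, c, or h, while `/^torch/d` deletes only lines beginning with the literal string `torch`. A quick sketch with a made-up requirements file (the package names below are illustrative):

```shell
# Hypothetical requirements file to show the difference
printf 'torch==2.7\nrequests\nninja\ncmake\n' > reqs.txt

# Buggy: the character class also drops "requests" (r) and "cmake" (c)
sed '/^[torch]/d' reqs.txt    # prints only: ninja

# Fixed: literal prefix match drops only the torch line
sed '/^torch/d' reqs.txt      # prints: requests, ninja, cmake
```

With the old pattern, build dependencies such as `cmake` and `ninja` would silently be filtered out of the install, which is exactly what the corrected command avoids.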

docs/.nav.yml

Lines changed: 5 additions & 2 deletions
```diff
@@ -2,8 +2,11 @@ nav:
   - Home:
     - vLLM x Intel Gaudi: README.md
   - Getting Started:
-    - getting_started/quickstart.md
-    - getting_started/installation.md
+    - Quick Start:
+      - getting_started/quickstart.md
+      - getting_started/quickstart_configuration.md
+      - getting_started/quickstart_inference.md
+    - Installation: getting_started/installation.md
   - Quick Links:
     - User Guide: user_guide/README.md
     - Developer Guide: dev_guide/README.md
```

docs/README.md

Lines changed: 22 additions & 8 deletions
```diff
@@ -1,4 +1,4 @@
-# Welcome to vLLM x Intel Gaudi
+# Intel® Gaudi® vLLM Plugin
 
 <figure markdown="span" style="display: flex; justify-content: center; align-items: center; gap: 10px; margin: auto;">
   <img src="./assets/logos/vllm-logo-text-light.png" alt="vLLM" style="width: 30%; margin: 0;"> x
@@ -15,14 +15,28 @@
 <a class="github-button" href="https://github.com/vllm-project/vllm-gaudi/fork" data-show-count="true" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
 </p>
 
-vLLM Gaudi plugin (vllm-gaudi) integrates Intel Gaudi accelerators with vLLM to optimize large language model inference.
+Welcome to the **vLLM-Gaudi plugin**, a community-maintained integration layer that enables high-performance large language model (LLM) inference on Intel® Gaudi® AI accelerators.
 
-This plugin follows the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162) and [[RFC]: Enhancing vLLM Plugin Architecture](https://github.com/vllm-project/vllm/issues/19161) principles, providing a modular interface for Intel Gaudi hardware.
+## 🔍 What is vLLM-Gaudi?
 
-Learn more:
+The **vLLM-Gaudi plugin** connects the vLLM serving engine with Intel Gaudi hardware, offering optimized inference capabilities for enterprise-scale LLM workloads. It is developed and maintained by the Intel/Gaudi team and follows the Hardware Pluggable [RFC](https://github.com/vllm-project/vllm/issues/11162) and vLLM Plugin Architecture [RFC](https://github.com/vllm-project/vllm/issues/19161) for modular integration.
 
-📚 [Intel Gaudi Documentation](https://docs.habana.ai/en/v1.21.1/index.html)
-🚀 [vLLM Plugin System Overview](https://docs.vllm.ai/en/latest/design/plugin_system.html)
+## 🚀 Why Use It?
 
-## Running vLLM on Gaudi with Docker Compose
-We are delivering ready-to-run container images that include both vLLM and Gaudi software. Please follow the [instruction](https://github.com/vllm-project/vllm-gaudi/tree/releases/v0.11.0/.cd) to quickly launch vLLM on Gaudi using a prebuilt Docker image and Docker Compose, with options for custom parameters and benchmarking.
+- **Optimized for Gaudi**: Supports advanced features like the bucketing mechanism, FP8 quantization, and custom graph caching for fast warm-up and efficient memory use.
+- **Scalable and Efficient**: Designed to maximize throughput and minimize latency for large-scale deployments, making it ideal for production-grade LLM inference.
+- **Community-Ready**: Actively maintained on [GitHub](https://github.com/vllm-project/vllm-gaudi) with contributions from Intel, the Gaudi team, and the broader vLLM ecosystem.
+
+## ✅ Action Items
+
+To get started with the Intel® Gaudi® vLLM Plugin:
+
+- [ ] **Set up your environment** using the [quickstart](getting_started/quickstart.md), locally or in a containerized environment.
+- [ ] **Run inference** using supported models like Llama 3.1, Mixtral, or DeepSeek.
+- [ ] **Explore advanced features** such as FP8 quantization, recipe caching, and expert parallelism.
+- [ ] **Join the community** by contributing to the [vLLM-Gaudi GitHub repo](https://github.com/vllm-project/vllm-gaudi).
+
+### Learn more
+
+📚 [Intel Gaudi Documentation](https://docs.habana.ai/en/latest/index.html)
+📦 [vLLM Plugin System Overview](https://docs.vllm.ai/en/latest/design/plugin_system.html)
```
(5 binary image files changed — 12.3 KB, 17.2 KB, 11.3 KB, 11.4 KB, 10.2 KB — diffs not rendered.)
