Commit 397ec53
Docs: Getting started llava update + vllm execution modes table fix (#788)
1 parent 1428268 commit 397ec53

1 file changed
docs/source/getting_started/installation/ai_accelerator/hpu-gaudi.inc.md

Lines changed: 10 additions & 23 deletions
```diff
@@ -149,37 +149,23 @@ The following configurations have been validated to function with Gaudi2 devices
 - [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) with tensor parallelism on 8x HPU, BF16 datatype with random or greedy sampling
 - [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on single HPU or with tensor parallelism on 2x HPU, BF16 datatype with random or greedy sampling
 - [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) with tensor parallelism on 2x HPU, BF16 datatype with random or greedy sampling
+- [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) on single HPU or with tensor parallelism on 8x HPU, BF16 datatype
 
 ## Performance Tuning
 
 ### Execution Modes
 
 Currently in vLLM for HPU we support four execution modes, depending on the selected HPU PyTorch Bridge backend (via the `PT_HPU_LAZY_MODE` environment variable) and the `--enforce-eager` flag.
 
-:::{list-table} vLLM execution modes
-:widths: 25 25 50
-:header-rows: 1
-
-- * `PT_HPU_LAZY_MODE`
-  * `enforce_eager`
-  * execution mode
-- * 0
-  * 0
-  * torch.compile
-- * 0
-  * 1
-  * PyTorch eager mode
-- * 1
-  * 0
-  * HPU Graphs
-- * 1
-  * 1
-  * PyTorch lazy mode
-:::
+| `PT_HPU_LAZY_MODE` | `enforce_eager` | Execution Mode     |
+| ------------------ | --------------- | ------------------ |
+| 0                  | 0               | torch.compile      |
+| 0                  | 1               | PyTorch eager mode |
+| 1                  | 0               | HPU Graphs         |
+| 1                  | 1               | PyTorch lazy mode  |
 
-:::{warning}
-All modes using PT_HPU_LAZY_MODE=0 are experimental and should only be used for validating functional correctness. To achieve the best performance, use HPU Graphs or PyTorch Lazy Mode. Performance improvements are planned for future releases.
-:::
+> [!WARNING]
+> All modes using PT_HPU_LAZY_MODE=0 are experimental and should only be used for validating functional correctness. To achieve the best performance, use HPU Graphs or PyTorch Lazy Mode. Performance improvements are planned for future releases.
 
 ### Bucketing Mechanism
```
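The mode selection described by the table in this hunk can be sketched as a tiny helper. This is an illustrative sketch of the table only; `hpu_execution_mode` is a made-up name, not vLLM's internal API:

```python
def hpu_execution_mode(pt_hpu_lazy_mode: int, enforce_eager: bool) -> str:
    """Map PT_HPU_LAZY_MODE and --enforce-eager to the resulting execution mode.

    Illustrative only -- mirrors the docs table, not vLLM internals.
    """
    if pt_hpu_lazy_mode == 0:
        # Eager bridge backend: torch.compile unless eager execution is enforced.
        return "PyTorch eager mode" if enforce_eager else "torch.compile"
    # Lazy bridge backend: HPU Graphs unless eager execution is enforced.
    return "PyTorch lazy mode" if enforce_eager else "HPU Graphs"
```

Note that `PT_HPU_LAZY_MODE=0` rows are the experimental ones flagged by the warning above.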
```diff
@@ -373,6 +359,7 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM
 - `PT_HPU_LAZY_MODE`: if `0`, the PyTorch Eager backend for Gaudi will be used; if `1`, the PyTorch Lazy backend for Gaudi will be used. `1` is the default.
 - `PT_HPU_ENABLE_LAZY_COLLECTIVES` must be set to `true` for tensor parallel inference with HPU Graphs.
+- `PT_HPUGRAPH_DISABLE_TENSOR_CACHE` must be set to `false` for the llava model.
 
 ## Quantization, FP8 Inference and Model Calibration Process
```
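Combining the environment variables above, a launch of the newly validated llava model might look like the following. This is a hypothetical invocation under the defaults this commit documents; the commented `vllm serve` command line is an assumption, so adjust it to your environment:

```shell
# Lazy bridge backend with HPU Graphs (the recommended fast path).
export PT_HPU_LAZY_MODE=1
# Required for tensor parallel inference with HPU Graphs.
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
# Required for the llava model, per the change above.
export PT_HPUGRAPH_DISABLE_TENSOR_CACHE=false

# Hypothetical launch on 8 HPUs; adjust to your setup:
# vllm serve llava-hf/llava-1.5-7b-hf --tensor-parallel-size 8
```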