Commit b885b51

Add Multi-GPU inference note in deployment apps

2 parents 2d79280 + f8d5164
File tree

4 files changed: +29 −5 lines changed

docs/webapp/applications/apps_embed_model_deployment.md

Lines changed: 7 additions & 1 deletion

@@ -92,7 +92,13 @@ values from the file, which can be modified before launching the app instance
 * **Instance name** - Name for the Embedding Model Deployment instance. This will appear in the instance list
 * **Service Project** - ClearML Project where your Embedding Model Deployment app instance will be stored
 * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the Embedding Model
-  Deployment app instance task will be enqueued (make sure an agent is assigned to it)
+  Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+  :::tip Multi-GPU inference
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  :::
+
 * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**
   * Model - A ClearML Model ID or a Hugging Face model name (e.g. `openai-community/gpt2`)

docs/webapp/applications/apps_llama_deployment.md

Lines changed: 8 additions & 2 deletions

@@ -88,8 +88,14 @@ values from the file, which can be modified before launching the app instance
 * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
   project-level permissions (i.e. users with read access can use the app instance).
 * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the
-  llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it)
-  **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
+  llama.cpp Model Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+  :::tip Multi-GPU inference
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  :::
+
+* **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**: Configure the behavior and performance of the model serving engine.
   * CLI: Llama.cpp CLI arguments. If set, these arguments will be passed to Llama.cpp and all following entries will be
     ignored, except for the `Model` field.

docs/webapp/applications/apps_model_deployment.md

Lines changed: 7 additions & 1 deletion

@@ -91,7 +91,13 @@ values from the file, which can be modified before launching the app instance
 * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
   project-level permissions (i.e. users with read access can use the app).
 * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the vLLM Model Deployment app
-  instance task will be enqueued (make sure an agent is assigned to that queue)
+  instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+  :::tip Multi-GPU inference
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  :::
+
 * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**: Configure the behavior and performance of the model engine.
   * Trust Remote Code: Select to set Hugging Face [`trust_remote_code`](https://huggingface.co/docs/text-generation-inference/main/en/reference/launcher#trustremotecode)

docs/webapp/applications/apps_sglang.md

Lines changed: 7 additions & 1 deletion

@@ -90,7 +90,13 @@ values from the file, which can be modified before launching the app instance
 * **Service Project - Access Control** - The ClearML project where the app instance is created. Access is determined by
   project-level permissions (i.e. users with read access can use the app).
 * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the SGLang Model Deployment app
-  instance task will be enqueued (make sure an agent is assigned to that queue)
+  instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+  :::tip Multi-GPU inference
+  To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+  for an example configuration of a queue that allocates multiple GPUs and shared memory.
+  :::
+
 * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model** - A ClearML Model ID or a HuggingFace model name (e.g. `openai-community/gpt2`)
 * **Model Configuration**: Configure the behavior and performance of the language model engine. This allows you to
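The tip added across all four files points at a queue whose Kubernetes pod template requests multiple GPUs plus an enlarged shared-memory volume. A minimal sketch of what such a queue definition might look like in the ClearML Agent Helm values is shown below; the queue name `multi-gpu` and the resource sizes are illustrative assumptions, not values taken from the linked page:

```yaml
# Hypothetical ClearML Agent (Kubernetes) Helm values fragment.
# Queue name and sizes are examples only; adjust to your cluster.
agentk8sglue:
  createQueues: true
  queues:
    multi-gpu:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2        # pod is allocated two GPUs for multi-GPU inference
        volumeMounts:
          - name: dshm
            mountPath: /dev/shm      # enlarged shared memory for inter-GPU tensor exchange
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory         # RAM-backed volume replacing the default 64Mi /dev/shm
              sizeLimit: 16Gi
```

Enqueuing a deployment app instance to such a queue gives the serving engine both GPUs in one pod, which is what tensor-parallel inference backends typically require.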
