Merged
Changes from 4 commits
4 changes: 2 additions & 2 deletions docs/webapp/applications/apps_embed_model_deployment.md
@@ -92,10 +92,10 @@ values from the file, which can be modified before launching the app instance
* **Instance name** - Name for the Embedding Model Deployment instance. This will appear in the instance list
* **Service Project** - ClearML Project where your Embedding Model Deployment app instance will be stored
* **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the Embedding Model
- Deployment app instance task will be enqueued (make sure an agent is assigned to it)
+ Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.

:::tip Multi-GPU inference
- To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+ To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
for an example configuration of a queue that allocates multiple GPUs and shared memory.
:::
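
To make the reworded tip concrete, here is a minimal sketch of such a queue definition, assuming the `agentk8sglue.queues.<queue-name>.templateOverrides` layout used by the ClearML Agent Helm chart; the queue name, GPU count, and shared-memory size are illustrative only, and the linked "GPU Queues with Shared Memory" page remains the authoritative example:

```yaml
agentk8sglue:
  queues:
    multi-gpu-queue:            # hypothetical queue name
      templateOverrides:
        # Request multiple GPUs for pods spawned from this queue
        resources:
          limits:
            nvidia.com/gpu: 2
        # Mount a memory-backed volume at /dev/shm for shared memory
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 16Gi
        volumeMounts:
          - name: dshm
            mountPath: /dev/shm
```

With this in place, enqueuing the app instance task to `multi-gpu-queue` would schedule it on a pod that sees both GPUs and has enlarged shared memory.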

4 changes: 2 additions & 2 deletions docs/webapp/applications/apps_llama_deployment.md
@@ -88,10 +88,10 @@ values from the file, which can be modified before launching the app instance
* **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
project-level permissions (i.e. users with read access can use the app instance).
* **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the
- llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it)
+ llama.cpp Model Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.

:::tip Multi-GPU inference
- To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+ To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
for an example configuration of a queue that allocates multiple GPUs and shared memory.
:::

4 changes: 2 additions & 2 deletions docs/webapp/applications/apps_model_deployment.md
@@ -91,10 +91,10 @@ values from the file, which can be modified before launching the app instance
* **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
project-level permissions (i.e. users with read access can use the app).
* **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the vLLM Model Deployment app
- instance task will be enqueued (make sure an agent is assigned to that queue)
+ instance task will be enqueued. Make sure an agent is assigned to that queue.

:::tip Multi-GPU inference
- To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+ To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
for an example configuration of a queue that allocates multiple GPUs and shared memory.
:::

2 changes: 1 addition & 1 deletion docs/webapp/applications/apps_sglang.md
@@ -93,7 +93,7 @@ values from the file, which can be modified before launching the app instance
instance task will be enqueued. Make sure an agent is assigned to that queue.

:::tip Multi-GPU inference
- To run multi-GPU inference, the queue must be associated with a multi-GPU pod template. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+ To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) requests multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
for an example configuration of a queue that allocates multiple GPUs and shared memory.
:::
