8 changes: 7 additions & 1 deletion docs/webapp/applications/apps_embed_model_deployment.md
@@ -92,7 +92,13 @@ values from the file, which can be modified before launching the app instance
 * **Instance name** - Name for the Embedding Model Deployment instance. This will appear in the instance list
 * **Service Project** - ClearML Project where your Embedding Model Deployment app instance will be stored
 * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the Embedding Model
-Deployment app instance task will be enqueued (make sure an agent is assigned to it)
+Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+:::tip Multi-GPU inference
+To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+for an example configuration of a queue that allocates multiple GPUs and shared memory.
+:::
+
 * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**
 * Model - A ClearML Model ID or a Hugging Face model name (e.g. `openai-community/gpt2`)
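The tip added above assumes the target queue's pod specification requests more than one GPU and provides shared memory. As a rough sketch only (not part of this PR), under the clearml-agent Helm chart such a queue could be declared with a per-queue `templateOverrides`; the queue name, GPU count, and `/dev/shm` size below are illustrative assumptions:

```yaml
agentk8sglue:
  queues:
    multi-gpu-queue:              # hypothetical queue name
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2     # pods launched from this queue get two GPUs
        volumes:
          - name: dshm            # back /dev/shm with RAM for inter-GPU IPC
            emptyDir:
              medium: Memory
              sizeLimit: 16Gi
        volumeMounts:
          - name: dshm
            mountPath: /dev/shm
```

The linked GPU Queues with Shared Memory page shows the documented configuration; the fragment above is only meant to make the tip concrete.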
10 changes: 8 additions & 2 deletions docs/webapp/applications/apps_llama_deployment.md
@@ -88,8 +88,14 @@ values from the file, which can be modified before launching the app instance
 * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
 project-level permissions (i.e. users with read access can use the app instance).
 * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the
-llama.cpp Model Deployment app instance task will be enqueued (make sure an agent is assigned to it)
-**AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
+llama.cpp Model Deployment app instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+:::tip Multi-GPU inference
+To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+for an example configuration of a queue that allocates multiple GPUs and shared memory.
+:::
+
+* **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**: Configure the behavior and performance of the model serving engine.
 * CLI: Llama.cpp CLI arguments. If set, these arguments will be passed to Llama.cpp and all following entries will be
 ignored, except for the `Model` field.
8 changes: 7 additions & 1 deletion docs/webapp/applications/apps_model_deployment.md
@@ -91,7 +91,13 @@ values from the file, which can be modified before launching the app instance
 * **Service Project (Access Control)**: The ClearML project where the app instance is created. Access is determined by
 project-level permissions (i.e. users with read access can use the app).
 * **Queue**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the vLLM Model Deployment app
-instance task will be enqueued (make sure an agent is assigned to that queue)
+instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+:::tip Multi-GPU inference
+To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+for an example configuration of a queue that allocates multiple GPUs and shared memory.
+:::
+
 * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model Configuration**: Configure the behavior and performance of the model engine.
 * Trust Remote Code: Select to set Hugging Face [`trust_remote_code`](https://huggingface.co/docs/text-generation-inference/main/en/reference/launcher#trustremotecode)
8 changes: 7 additions & 1 deletion docs/webapp/applications/apps_sglang.md
@@ -90,7 +90,13 @@ values from the file, which can be modified before launching the app instance
 * **Service Project - Access Control** - The ClearML project where the app instance is created. Access is determined by
 project-level permissions (i.e. users with read access can use the app).
 * **Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the SGLang Model Deployment app
-instance task will be enqueued (make sure an agent is assigned to that queue)
+instance task will be enqueued. Make sure an agent is assigned to that queue.
+
+:::tip Multi-GPU inference
+To run multi-GPU inference, ensure the queue's pod specification (from the base template and/or `templateOverrides`) defines multiple GPUs. See [GPU Queues with Shared Memory](../../clearml_agent/clearml_agent_custom_workload.md#example-gpu-queues-with-shared-memory)
+for an example configuration of a queue that allocates multiple GPUs and shared memory.
+:::
+
 * **AI Gateway Route** - Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Model** - A ClearML Model ID or a HuggingFace model name (e.g. `openai-community/gpt2`)
 * **Model Configuration**: Configure the behavior and performance of the language model engine. This allows you to