16 changes: 11 additions & 5 deletions Dockerfile
@@ -1,8 +1,8 @@
# SPDX-FileCopyrightText: Copyright (c) 2020-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

-ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:26.03-py3
-ARG TRITONSDK_BASE_IMAGE=nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:26.04-py3
+ARG TRITONSDK_BASE_IMAGE=nvcr.io/nvidia/tritonserver:26.04-py3-sdk

ARG MODEL_ANALYZER_VERSION=1.54.0dev
ARG MODEL_ANALYZER_CONTAINER_VERSION=26.05dev
@@ -20,9 +20,15 @@ RUN apt update -qq && apt install -y docker.io wkhtmltopdf

# Install tritonclient
COPY --from=sdk /workspace/install/python /tmp/tritonclient
-RUN find /tmp/tritonclient -maxdepth 1 -type f -name \
-    "tritonclient-*-manylinux*.whl" | xargs printf -- '%s[all]' | \
-    xargs pip3 install --upgrade && rm -rf /tmp/tritonclient/
+
+RUN --mount=type=secret,id=triton_ci_pip_extra_values,env=TRITON_CI_PYPI_EXTRA_VALUES \
+    if [ -n "${TRITON_CI_PYPI_EXTRA_VALUES}" ]; then \
+    find /tmp/tritonclient -maxdepth 1 -type f -name \
+    "tritonclient-*-any*.whl" -exec pip3 install --upgrade ${TRITON_CI_PYPI_EXTRA_VALUES} {}[all] \; ; \
+    else \
+    find /tmp/tritonclient -maxdepth 1 -type f -name \
+    "tritonclient-*-any*.whl" -exec pip3 install --upgrade {}[all] \; ; \
+    fi

WORKDIR /opt/triton-model-analyzer
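
The new `RUN --mount=type=secret` step exposes the optional extra pip values to that single build step as an environment variable, so the value never lands in an image layer. A minimal sketch of how such a build might be invoked (assuming BuildKit is available; the secret file name and image tag are illustrative):

```
DOCKER_BUILDKIT=1 docker build \
  --secret id=triton_ci_pip_extra_values,src=./pip_extra_values.txt \
  -t model-analyzer:local .
```

If the secret is not supplied, the `else` branch installs the wheel without the extra values, so the build still succeeds.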

12 changes: 6 additions & 6 deletions README.md
@@ -17,14 +17,14 @@ Triton Model Analyzer is a CLI tool which can help you find a more optimal confi

- [Optuna Search](docs/config_search.md#optuna-search-mode) **_-ALPHA RELEASE-_** allows you to search for every parameter that can be specified in the model configuration, using a hyperparameter optimization framework. Please see the [Optuna](https://optuna.org/) website if you are interested in specific details on how the algorithm functions.

-- [Quick Search](docs/config_search.md#quick-search-mode) will **sparsely** search the [Max Batch Size](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_configuration.md#maximum-batch-size),
-  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/batcher.md#dynamic-batcher), and
-  [Instance Group](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_configuration.md#instance-groups) spaces by utilizing a heuristic hill-climbing algorithm to help you quickly find a more optimal configuration
+- [Quick Search](docs/config_search.md#quick-search-mode) will **sparsely** search the [Max Batch Size](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_configuration.md#maximum-batch-size),
+  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/batcher.md#dynamic-batcher), and
+  [Instance Group](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_configuration.md#instance-groups) spaces by utilizing a heuristic hill-climbing algorithm to help you quickly find a more optimal configuration

- [Automatic Brute Search](docs/config_search.md#automatic-brute-search) will **exhaustively** search the
-  [Max Batch Size](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_configuration.md#maximum-batch-size),
-  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/batcher.md#dynamic-batcher), and
-  [Instance Group](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_configuration.md#instance-groups)
+  [Max Batch Size](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_configuration.md#maximum-batch-size),
+  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/batcher.md#dynamic-batcher), and
+  [Instance Group](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_configuration.md#instance-groups)
parameters of your model configuration

- [Manual Brute Search](docs/config_search.md#manual-brute-search) allows you to create manual sweeps for every parameter that can be specified in the model configuration
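
The search mode is selected at profile time. A hypothetical invocation sketch (flag names assumed from the Model Analyzer CLI; verify against `model-analyzer profile --help`):

```
model-analyzer profile \
  --model-repository /path/to/model_repository \
  --profile-models add_sub \
  --run-config-search-mode quick
```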
6 changes: 3 additions & 3 deletions docs/bls_quick_start.md
@@ -38,7 +38,7 @@ git pull origin main
**1. Pull the SDK container:**

```
-docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**2. Run the SDK container**
@@ -48,7 +48,7 @@ docker run -it --gpus 1 \
--shm-size 2G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-  --net=host nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+  --net=host nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
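For example, on a two-GPU machine you might pass `--gpus 2 --shm-size 4G`; the size needed depends on your models, and these values are only illustrative.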
@@ -57,7 +57,7 @@ docker run -it --gpus 1 \

---

-The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_repository.md) that contains the BLS model `bls` which calculates the sum of two inputs using `add` model.
+The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_repository.md) that contains the BLS model `bls` which calculates the sum of two inputs using `add` model.

An example model analyzer YAML config that performs a BLS model search

2 changes: 1 addition & 1 deletion docs/config.md
@@ -142,7 +142,7 @@ cpu_only_composing_models: <comma-delimited-string-list>
[ reload_model_disable: <bool> | default: false]

# Triton Docker image tag used when launching using Docker mode
-[ triton_docker_image: <string> | default: nvcr.io/nvidia/tritonserver:26.03-py3 ]
+[ triton_docker_image: <string> | default: nvcr.io/nvidia/tritonserver:26.04-py3 ]

# Triton Server HTTP endpoint url used by Model Analyzer client"
[ triton_http_endpoint: <string> | default: localhost:8000 ]
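
As a sketch, a minimal config that pins the Docker-mode image can be written out and passed with the `-f`/`--config-file` option (the repository path and model name are placeholders):

```
cat > config.yaml <<'EOF'
model_repository: /path/to/model_repository
profile_models: add_sub
triton_docker_image: nvcr.io/nvidia/tritonserver:26.04-py3
EOF
model-analyzer profile -f config.yaml
```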
6 changes: 3 additions & 3 deletions docs/ensemble_quick_start.md
@@ -38,7 +38,7 @@ git pull origin main
**1. Pull the SDK container:**

```
-docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**2. Run the SDK container**
@@ -48,7 +48,7 @@ docker run -it --gpus 1 \
--shm-size 1G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-  --net=host nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+  --net=host nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
@@ -57,7 +57,7 @@ docker run -it --gpus 1 \

---

-The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using `add` and `sub` models.
+The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using `add` and `sub` models.

Run the Model Analyzer `profile` subcommand inside the container with:

2 changes: 1 addition & 1 deletion docs/kubernetes_deploy.md
@@ -68,7 +68,7 @@ images:

triton:
image: nvcr.io/nvidia/tritonserver
-  tag: 26.03-py3
+  tag: 26.04-py3
```

The model analyzer executable uses the config file defined in `helm-chart/templates/config-map.yaml`. This config can be modified to supply arguments to model analyzer. Only the content under the `config.yaml` section of the file should be modified.
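
As an alternative sketch, the tag can also be overridden at install time without editing the chart files (the release name and chart path here are illustrative):

```
helm install model-analyzer ./helm-chart \
  --set images.triton.tag=26.04-py3
```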
6 changes: 3 additions & 3 deletions docs/mm_quick_start.md
@@ -38,7 +38,7 @@ git pull origin main
**1. Pull the SDK container:**

```
-docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**2. Run the SDK container**
@@ -47,15 +47,15 @@ docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-  --net=host nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+  --net=host nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

## `Step 3:` Profile both models concurrently

---

The [examples/quick-start](../examples/quick-start) directory is an example
-[Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_repository.md) that contains two libtorch models: `add_sub` & `resnet50_python`
+[Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_repository.md) that contains two libtorch models: `add_sub` & `resnet50_python`

Run the Model Analyzer `profile` subcommand inside the container with:
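
A minimal sketch of such an invocation (flag names assumed from the Model Analyzer CLI; the documented command may include additional options):

```
model-analyzer profile \
  --model-repository $(pwd)/examples/quick-start \
  --profile-models add_sub,resnet50_python \
  --run-config-profile-models-concurrently-enable
```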

6 changes: 3 additions & 3 deletions docs/quick_start.md
@@ -38,7 +38,7 @@ git pull origin main
**1. Pull the SDK container:**

```
-docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

**2. Run the SDK container**
@@ -47,15 +47,15 @@ docker pull nvcr.io/nvidia/tritonserver:26.03-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-  --net=host nvcr.io/nvidia/tritonserver:26.03-py3-sdk
+  --net=host nvcr.io/nvidia/tritonserver:26.04-py3-sdk
```

## `Step 3:` Profile the `add_sub` model

---

The [examples/quick-start](../examples/quick-start) directory is an example
-[Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.03/docs/user_guide/model_repository.md) that contains a simple libtorch model which calculates
+[Triton Model Repository](https://github.com/triton-inference-server/server/blob/r26.04/docs/user_guide/model_repository.md) that contains a simple libtorch model which calculates
the sum and difference of two inputs.

Run the Model Analyzer `profile` subcommand inside the container with:
2 changes: 1 addition & 1 deletion helm-chart/values.yaml
@@ -26,4 +26,4 @@ images:

triton:
image: nvcr.io/nvidia/tritonserver
-  tag: 26.03-py3
+  tag: 26.04-py3
2 changes: 1 addition & 1 deletion model_analyzer/config/input/config_defaults.py
@@ -52,7 +52,7 @@
DEFAULT_CONCURRENCY_SWEEP_DISABLE = False
DEFAULT_DCGM_DISABLE = False
DEFAULT_TRITON_LAUNCH_MODE = "local"
-DEFAULT_TRITON_DOCKER_IMAGE = "nvcr.io/nvidia/tritonserver:26.03-py3"
+DEFAULT_TRITON_DOCKER_IMAGE = "nvcr.io/nvidia/tritonserver:26.04-py3"
DEFAULT_TRITON_HTTP_ENDPOINT = "localhost:8000"
DEFAULT_TRITON_GRPC_ENDPOINT = "localhost:8001"
DEFAULT_TRITON_METRICS_URL = "http://localhost:8002/metrics"
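
Each of these defaults can be overridden per run via the corresponding CLI option rather than by editing `config_defaults.py`; a sketch, with flag spellings assumed from the defaults' names:

```
model-analyzer profile \
  --model-repository /path/to/model_repository \
  --profile-models add_sub \
  --triton-docker-image nvcr.io/nvidia/tritonserver:26.04-py3 \
  --triton-http-endpoint localhost:8000
```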