Container image env var #94
Conversation
📝 Walkthrough

Adds environment-variable driven container image overrides.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml (1)
18-19: Consider using a versioned tag instead of `latest` for reproducibility.

Using the `latest` tag can lead to non-reproducible benchmark results if the image is updated between runs. The role's internal default at line 46 uses `v0.5.3`, creating an inconsistency.

Suggested change for consistency:

```diff
 # Using GuideLLM official container image
 # Can be overridden with environment variable: export GUIDELLM_CONTAINER_IMAGE=...
-container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:latest', true) }}"
+container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:v0.5.3', true) }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml` around lines 18 - 19, The container_image variable currently defaults to the unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility; update the default in the container_image definition (while keeping the GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and the role default are consistent and benchmark runs are reproducible.
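One detail worth noting in the suggested line: the `true` second argument to Jinja2's `default` filter makes the fallback apply when the env var is set but empty, not only when it is unset. A minimal Python sketch of that resolution logic (the `resolve_image` helper is hypothetical, not part of the repo):

```python
import os

def resolve_image(env_name: str, fallback: str) -> str:
    # Mirrors lookup('env', NAME) | default(FALLBACK, true):
    # an unset OR empty environment variable falls back to the default.
    value = os.environ.get(env_name, "")
    return value if value else fallback

os.environ.pop("GUIDELLM_CONTAINER_IMAGE", None)
print(resolve_image("GUIDELLM_CONTAINER_IMAGE",
                    "ghcr.io/vllm-project/guidellm:v0.5.3"))

os.environ["GUIDELLM_CONTAINER_IMAGE"] = ""  # set but empty: still falls back
print(resolve_image("GUIDELLM_CONTAINER_IMAGE",
                    "ghcr.io/vllm-project/guidellm:v0.5.3"))
```

Without the boolean argument, `default` would only fire when the variable is undefined, so an empty `GUIDELLM_CONTAINER_IMAGE` would yield an empty image name.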
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 94-97: The validation currently allows any key from test_configs
(via base_workload in test_configs.keys()) which lets invalid types through;
change it to only accept true base workloads (e.g., restrict to ['chat','code'])
by replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.
---
Nitpick comments:
In
`@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml`:
- Around line 18-19: The container_image variable currently defaults to the
unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility;
update the default in the container_image definition (while keeping the
GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by
the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and
the role default are consistent and benchmark runs are reproducible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 684522b9-f263-44b4-b2e8-f6a2147363b8
📒 Files selected for processing (6)
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- docs/getting-started.md
I reviewed this PR in two steps.

1. "-e $IMAGE" support. Using this syntax I had success:

```shell
for VLLM_IMAGE in "${image_array[@]}"; do
  -e "VLLM_CONTAINER_IMAGE={'image': '${VLLM_IMAGE}'}"
```

2. New workload_type support. I added a new workload_type to automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml. THEN use of this syntax resulted in failure:

```shell
llm-benchmark-concurrent-load.yml \
  -e "base_workload=chat_lite" \
```

```
TASK [vllm_server : Validate workload type] ************************************
fatal: [vllm-server]: FAILED! => {
  "assertion": "workload_type in ['summarization', 'chat', 'code', 'rag', 'embedding', 'chat_var', 'code_var']",
  "changed": false,
  "evaluated_to": false,
  "msg": "Invalid workload_type 'chat_lite'. Must be one of: summarization, chat, code, rag, embedding, chat_var, code_var"
}
```

NOTE that I made no edits to vllm-cpu-perf-eval/automation/test-execution/ansible/llm-benchmark-concurrent-load.yml:94
Hi John, I think I saw something similar when I just added the workload to the end of the file. But it needs to be in the test_configs section. Was your new definition in that section?
yes, I added it in test_configs using this syntax:

$ vi automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml

```yaml
test_configs:
  # NEW Chat-lite workload
  chat_lite:
    workload_type: "chat_lite"
    isl: 256                            # Standard input: user query
    osl: 128                            # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                  # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"    # Baseline mode: no prefix caching
      - "--max-model-len=2048"          # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"             # Fallback value - should be overridden by model-specific kv_cache_sizes
```
Eventually I hijacked 'chat_var', reduced the ISL and OSL values, and ran a test using that workload_type:

```yaml
  # NEW Chat-lite workload
  chat_var:
    workload_type: "chat_var"
    isl: 256                            # Standard input: user query
    osl: 128                            # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                  # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"    # Baseline mode: no prefix caching
      - "--max-model-len=2048"          # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"             # Fallback value - should be overridden by model-specific kv_cache_sizes
```
- John
Retested using the changes implemented above. SUCCESS: I was able to add a new workload_type 'chat_lite' and the test ran successfully.
Hold on, I'm seeing an issue with the "-e $IMAGE" support: comparing my script syntax against `podman ps` output on the DUT, the designated image is not being used.
This PR looks good. I changed my script syntax to use "export" and now the test runs are using the designated env-var container image.
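The reason `export` works here: `lookup('env', ...)` reads the environment of the ansible-playbook process on the controller, so the variable must be exported into that process rather than passed some other way. A hedged Python sketch of the same mechanics (the child process stands in for ansible-playbook; the image name is illustrative):

```python
import os
import subprocess
import sys

# The child process stands in for ansible-playbook doing lookup('env', ...).
child = [sys.executable, "-c",
         "import os; print(os.environ.get('VLLM_CONTAINER_IMAGE', '<default>'))"]

# Not exported: the child process sees nothing and the default wins.
env_without = {k: v for k, v in os.environ.items()
               if k != "VLLM_CONTAINER_IMAGE"}
unexported = subprocess.run(child, capture_output=True, text=True,
                            env=env_without).stdout.strip()

# Exported: the child process inherits the variable.
env_with = {**os.environ, "VLLM_CONTAINER_IMAGE": "quay.io/example/vllm:test"}
exported = subprocess.run(child, capture_output=True, text=True,
                          env=env_with).stdout.strip()

print(unexported)  # <default>
print(exported)    # quay.io/example/vllm:test
```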
Add support for configuring container images via environment variables:
- VLLM_CONTAINER_IMAGE: vLLM server image
- GUIDELLM_CONTAINER_IMAGE: GuideLLM benchmark tool image
- VLLM_BENCH_CONTAINER_IMAGE: vLLM bench tool image

All variables include sensible defaults matching the current configuration, allowing users to easily override images without editing config files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add container image display to GuideLLM configuration output to match the vLLM server display, making it easier to verify which image is being used during test execution.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replace hardcoded workload type validation with a dynamic check against test_configs.keys(), matching the approach used in llm-benchmark-auto.yml. This allows users to add custom workloads to test-workloads.yml and automatically use them in concurrent load testing without modifying the playbook validation logic.

Changes:
- base_workload validation now checks test_configs.keys()
- variable workload check is now dynamic instead of a hardcoded list
- Updated documentation to reflect workload flexibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Update workload type validation in the vllm_server role to dynamically check against test_configs.keys() instead of a hardcoded workload list. This allows users to add custom workloads to test-workloads.yml without modifying role code.

Changes:
- main.yml: Validate against test_configs.keys()
- start-llm.yml: Validate non-embedding workloads dynamically
- Improved error messages to show available workloads

Fixes an issue where new workloads in test-workloads.yml were rejected by hardcoded validation, even after the concurrent-load playbook was made dynamic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
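The difference between the old hardcoded check and the new dynamic one can be sketched in a few lines of Python (the dict contents are illustrative; the real configs live in test-workloads.yml):

```python
# Illustrative stand-in for the test_configs mapping in test-workloads.yml.
test_configs = {
    "summarization": {}, "chat": {}, "code": {}, "rag": {},
    "embedding": {}, "chat_var": {}, "code_var": {},
}

# The old assert baked the allowed names into the playbook.
HARDCODED = ["summarization", "chat", "code", "rag",
             "embedding", "chat_var", "code_var"]

def valid_hardcoded(workload_type: str) -> bool:
    return workload_type in HARDCODED

def valid_dynamic(workload_type: str) -> bool:
    # Mirrors: workload_type in test_configs.keys()
    return workload_type in test_configs

# A user adds a custom workload, as John did with chat_lite:
test_configs["chat_lite"] = {"isl": 256, "osl": 128}

print(valid_hardcoded("chat_lite"))  # False - rejected by the old assert
print(valid_dynamic("chat_lite"))    # True - accepted after this change
```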
Force-pushed from 770fc59 to 6841dfe
🧹 Nitpick comments (1)
automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml (1)
24-61: Move workload validation before `test_configs[workload_type]` access.

Line 26 reads `test_configs[workload_type]` before the assert at lines 55-60. If this task file is called directly with an invalid `workload_type`, execution fails early with a dict-key error and skips your clearer validation message.

As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Proposed reordering:

```diff
-- name: Get workload and core configuration
-  ansible.builtin.set_fact:
-    workload_cfg: "{{ test_configs[workload_type] }}"
-    core_cfg: "{{ core_configuration }}"
-    container_cfg: "{{ container_runtime }}"
-
 # ============================================================================
 # Caching Mode Configuration
 # ============================================================================
@@
 # ============================================================================
 # Workload Validation
 # ============================================================================
 - name: Validate workload type
   ansible.builtin.assert:
     that:
       - workload_type in test_configs.keys()
       - workload_type != 'embedding'
     fail_msg: "Invalid workload_type: {{ workload_type }}. Must be a non-embedding workload from: {{ test_configs.keys() | list | select('ne', 'embedding') | sort | join(', ') }}"
+
+- name: Get workload and core configuration
+  ansible.builtin.set_fact:
+    workload_cfg: "{{ test_configs[workload_type] }}"
+    core_cfg: "{{ core_configuration }}"
+    container_cfg: "{{ container_runtime }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml` around lines 24 - 61, The workload validation currently runs after the task that sets workload_cfg using test_configs[workload_type], causing a dict-key error for invalid inputs; move the "Validate workload type" ansible.builtin.assert task (the block that checks workload_type in test_configs.keys() and != 'embedding') so it appears before the "Get workload and core configuration" ansible.builtin.set_fact (which sets workload_cfg: "{{ test_configs[workload_type] }}"), ensuring the assert runs first and prevents accessing test_configs with an invalid key.
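The ordering bug generalizes beyond Ansible: indexing a mapping before validating the key turns a friendly assertion into an opaque key error. A small Python sketch of the two orderings (function and variable names are illustrative):

```python
test_configs = {"chat": {"isl": 512}, "code": {"isl": 2048}, "embedding": {}}

def load_cfg_unvalidated(workload_type: str) -> dict:
    cfg = test_configs[workload_type]  # KeyError fires here for bad input...
    assert workload_type in test_configs, "this clearer message is never reached"
    return cfg

def load_cfg_validated(workload_type: str) -> dict:
    # Validate first, so callers get the descriptive failure message.
    if workload_type not in test_configs or workload_type == "embedding":
        allowed = sorted(k for k in test_configs if k != "embedding")
        raise ValueError(f"Invalid workload_type: {workload_type}. Must be a "
                         f"non-embedding workload from: {', '.join(allowed)}")
    return test_configs[workload_type]

try:
    load_cfg_unvalidated("chat_lite")
except KeyError as e:
    print("opaque:", e)        # opaque: 'chat_lite'

try:
    load_cfg_validated("chat_lite")
except ValueError as e:
    print("descriptive:", e)   # lists the allowed workloads
```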
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml`:
- Around line 24-61: The workload validation currently runs after the task that
sets workload_cfg using test_configs[workload_type], causing a dict-key error
for invalid inputs; move the "Validate workload type" ansible.builtin.assert
task (the block that checks workload_type in test_configs.keys() and !=
'embedding') so it appears before the "Get workload and core configuration"
ansible.builtin.set_fact (which sets workload_cfg: "{{
test_configs[workload_type] }}"), ensuring the assert runs first and prevents
accessing test_configs with an invalid key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5b227a5c-4c65-4688-9b9c-dd4444bc0673
📒 Files selected for processing (8)
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/main.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
- docs/getting-started.md
✅ Files skipped from review due to trivial changes (3)
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
🚧 Files skipped from review as they are similar to previous changes (4)
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/main.yml
- automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
- docs/getting-started.md
Summary by CodeRabbit
New Features
Documentation
Chores