Conversation
…rdening

This commit improves the existing Ansible playbook infrastructure for vLLM CPU performance evaluation with enhanced AWX compatibility, security hardening, and comprehensive testing.

- Fix type normalization to handle AnsibleUnsafeText from AWX
- Fix allocated_nodes to return integers instead of strings
- Handle empty strings and Jinja2 None conversions properly
- Simplify node eligibility checking and allocation logic
- Improve error messages for better validation feedback
- Add no_log: true to all tasks handling HF_TOKEN
- Prevent token exposure in container start operations
- Secure environment variable handling in AWX jobs
- Add comprehensive unit tests for cpu_utils filter plugin (598 lines)
- Test coverage for: CPU range conversion, NUMA extraction, multi-NUMA allocation, OMP binding, and real-world scenarios
- Support for both pytest and standalone execution
- Add collections/requirements.yml for Ansible collection dependencies
- Better parameter validation for AWX jobs in concurrent load testing
- AWX detection for result path handling
- Improved NUMA topology detection in core sweep
- Enhanced result path consistency in main benchmark
- Better workload configuration handling
- Simplify prerequisites section
- Update examples with current best practices
- Clearer workflow documentation

Files changed: 13 files, 772 insertions(+), 323 deletions(-)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
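The type-normalization behavior described above is easiest to see outside Jinja2. A minimal plain-Python sketch (the helper name is hypothetical; AWX's AnsibleUnsafeText is a str subclass, so plain strings stand in for it here):

```python
def normalize_count(value):
    """Coerce an AWX-supplied extra var to int, or None when unset.

    AWX delivers extra vars as AnsibleUnsafeText (a str subclass), so
    values arrive as strings: '' and 'None' mean "not provided", while
    numeric strings should become real integers rather than stay strings.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in ("", "none", "null"):
        return None
    return int(text)


# Strings from AWX and real integers from CLI runs normalize identically.
assert normalize_count("16") == 16
assert normalize_count(16) == 16
assert normalize_count("") is None
assert normalize_count("None") is None
```

Returning real integers here is what lets downstream facts like allocated_nodes carry ints instead of strings.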
Remove undefined 'short_codegen' workload from validation lists to prevent KeyError during test execution. Add CPU allocation validation to detect under-allocation early with clear error messages. Fix pytest.raises fallback to properly suppress exceptions when pytest is unavailable. Pin exact collection versions for reproducible AWX/CI runs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Fix inconsistent output filename reference in ansible.md (benchmarks.html → benchmarks.json)
- Improve requested_cores validation to handle non-numeric input safely
- Document requirements files usage (AWX/production vs CLI/development)
- Fix hardcoded log_collection_dest to use centralized local_results_base
- Remove obsolete vLLM configuration fields from test metadata docs
- Improve node capacity validation with sorting and per-node checks in cpu_utils.py
- Refactor duplicate AWX_JOB_ID lookup in llm-core-sweep-auto.yml
- Add NUMA node id type coercion for consistent integer types

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
The previous commit changed NUMA node ids from strings to integers, but the selectattr filters were still trying to match them as strings, causing "No first item, sequence was empty" errors. Remove the `| string` filter from selectattr comparisons to match integer ids correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
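The underlying failure is ordinary Python equality, which Jinja2's `equalto` test (used by `selectattr`) relies on. A minimal sketch with hypothetical node data:

```python
# Hypothetical NUMA topology facts after the ids became integers.
numa_nodes = [{"id": 0, "cpus": "0-15"}, {"id": 1, "cpus": "16-31"}]

# The old template coerced the wanted id to a string before matching.
# With integer ids, int 0 != str "0", so the selection comes back empty,
# which is exactly the "No first item, sequence was empty" symptom:
assert [n for n in numa_nodes if n["id"] == "0"] == []

# Dropping the string coercion compares int to int and matches again:
assert [n for n in numa_nodes if n["id"] == 0] == [{"id": 0, "cpus": "0-15"}]
```

The same int-vs-str mismatch is invisible in YAML source, which is why type coercion at the topology-detection boundary matters.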
- Fix allocate_with_fixed_tp to filter nodes by capacity (cpu_utils.py)
- Update Results summary to use local_results_base (llm-benchmark-auto.yml)
- Implement core_sweep_counts parameter handling (llm-benchmark-concurrent-load.yml)
  - Normalize core_sweep_counts to requested_cores for single-element lists
  - Reject multi-element lists with helpful error directing to llm-core-sweep-auto.yml
  - Pass effective_requested_cores to all 3 phases
- Restore model configuration facts in start-llm.yml for metadata collection
  - Set model_kv_cache, model_dtype, model_max_len, vllm_caching_mode
  - Fixes compatibility with get-vllm-config-from-dut.yml assertions
- Add regression tests for serialized TP inputs (test_cpu_utils.py)
  - Test empty string '' behaves like auto-TP
  - Test string 'None' behaves like auto-TP

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
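The core_sweep_counts normalization described above reduces to a small decision rule. A plain-Python sketch (the function name is hypothetical, not from the playbook):

```python
def effective_requested_cores(requested_cores=None, core_sweep_counts=None):
    """Collapse the two core-count inputs into one integer.

    Mirrors the described playbook behavior: a single-element
    core_sweep_counts list is treated like requested_cores, while longer
    lists are rejected with a pointer to the dedicated sweep playbook.
    """
    if core_sweep_counts:
        if len(core_sweep_counts) > 1:
            raise ValueError(
                "core_sweep_counts with multiple entries is not supported "
                "here; use llm-core-sweep-auto.yml for sweeps"
            )
        return int(core_sweep_counts[0])
    if requested_cores:
        return int(requested_cores)
    raise ValueError("requested_cores or core_sweep_counts is required")


assert effective_requested_cores(core_sweep_counts=[16]) == 16
assert effective_requested_cores(requested_cores="32") == 32
```

All three phases then consume the single normalized value instead of re-interpreting the raw inputs.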
…ainer start

Addresses two critical issues in vLLM configuration:

1. Per-model overrides (model_kv_cache_space, model_dtype) now properly override workload defaults instead of being ignored
2. Added preflight validation of --max-model-len against workload requirements (ISL + OSL) to fail early with clear error messages

Both fixes ensure configuration errors are caught early and per-model settings take precedence over workload fallbacks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
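Both fixes boil down to a precedence rule plus an arithmetic preflight. A minimal plain-Python sketch (function names are hypothetical illustrations, not the playbook's tasks):

```python
def resolve_setting(model_value, workload_default):
    """Per-model override wins; fall back to the workload default only
    when the model leaves the setting unset (None or empty string)."""
    if model_value not in (None, ""):
        return model_value
    return workload_default


def check_max_model_len(max_model_len, isl, osl):
    """Preflight: the context window must hold input plus output tokens."""
    required = isl + osl
    if max_model_len < required:
        raise ValueError(
            f"--max-model-len={max_model_len} is smaller than the workload's "
            f"ISL+OSL requirement ({isl}+{osl}={required})"
        )


assert resolve_setting("bfloat16", "auto") == "bfloat16"  # override wins
assert resolve_setting(None, "auto") == "auto"            # fallback applies
check_max_model_len(4096, isl=1024, osl=512)              # passes silently
```

Running the length check before container start is what turns a confusing runtime truncation into an early, explicit failure.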
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
📝 Walkthrough

This pull request implements AWX-aware result path handling across multiple Ansible playbooks, refactors NUMA node core allocation logic with improved type normalization, expands test coverage with comprehensive unit and smoke tests including pytest configuration, and updates documentation while removing vLLM configuration collection from benchmark execution.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml (1)
18-27: ⚠️ Potential issue | 🟠 Major

Exclude string `'None'` from validation to prevent premature int conversion failure.

The guards at lines 24-27 don't exclude the string `'None'`, which commonly comes from CLI arguments. This allows string `'None'` to reach line 21's `| int` filter, causing an unnecessary failure before the Python function's auto-TP logic can handle it. Add explicit string handling to the when conditions.

Suggested fix

```diff
 - name: Validate requested_tensor_parallel if provided
   ansible.builtin.assert:
     that:
       - requested_tensor_parallel | int in [1, 2, 4, 8]
     fail_msg: "Invalid tensor_parallel: {{ requested_tensor_parallel }}. Valid values: 1, 2, 4, 8"
   when:
     - requested_tensor_parallel is defined
     - requested_tensor_parallel != omit
-    - requested_tensor_parallel != None
-    - requested_tensor_parallel != ""
+    - requested_tensor_parallel is not none
+    - (requested_tensor_parallel | string | trim | lower) not in ["", "none"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml` around lines 18 - 27, The assert task validating requested_tensor_parallel (ansible.builtin.assert) can receive the literal string 'None' from CLI and fail on the `| int` conversion prematurely; update the task's when guards for the assert (the block referencing requested_tensor_parallel) to explicitly exclude the string 'None' (e.g., add a condition like requested_tensor_parallel != 'None') so that string 'None' doesn't reach the `requested_tensor_parallel | int` check and allows the downstream auto-TP logic to run.
🧹 Nitpick comments (6)
automation/test-execution/ansible/requirements.yml (1)
7-10: Align dev collection versions to AWX pins or add a CI parity check.

Flexible ranges (`>=1.9.0,<2.0.0` for containers.podman, `>=1.4.0,<1.6.0` for ansible.posix) allow versions to drift from the exact AWX pins (1.9.4 and 1.5.4 respectively). This creates a risk of "works locally, fails in AWX" regressions. Either lock dev ranges to match AWX pins or add a CI check to prevent parity divergence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/requirements.yml` around lines 7 - 10, Update the flexible version ranges in requirements.yml so dev collection versions match AWX pins or add a CI parity check: either change the containers.podman and ansible.posix entries to exact pins 1.9.4 and 1.5.4 respectively (replace >=1.9.0,<2.0.0 and >=1.4.0,<1.6.0) or add a CI job that compares this requirements.yml to the canonical AWX pin file (collections/requirements.yml) and fails on divergence; reference the collection names containers.podman and ansible.posix and the AWX pin file when implementing the change..github/workflows/unit-tests.yml (1)
26-30: Consider using a pinned test requirements file for reproducible CI runs.

Lines 29 and 58 both install floating versions (`pytest ansible pyyaml`), which can cause non-deterministic failures over time. Create a shared pinned `requirements-test.txt` and use it in both the unit-tests and smoke-tests jobs for better maintainability and reproducibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/unit-tests.yml around lines 26 - 30, Replace the floating pip installs with a pinned requirements file: add a committed requirements-test.txt listing exact versions for pytest, ansible, pyyaml (and any other test deps), then update the workflow "Install dependencies" steps (the ones currently running "pip install pytest ansible pyyaml") in both the unit-tests and smoke-tests jobs to use "pip install -r requirements-test.txt" so CI uses reproducible, pinned test dependencies.automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py (1)
28-28: Remove unnecessary f-string prefixes.

Multiple assertion messages use an f-string prefix without placeholders before the concatenation, e.g. `f"YAML validation errors:\n" + "\n".join(errors)`. The `f` prefix is unnecessary since there are no interpolations in that literal.

💡 Proposed fix pattern (apply to all similar lines)

```diff
- assert not errors, f"YAML validation errors:\n" + "\n".join(errors)
+ assert not errors, "YAML validation errors:\n" + "\n".join(errors)
```

Or use a single f-string:

```diff
- assert not errors, f"YAML validation errors:\n" + "\n".join(errors)
+ assert not errors, f"YAML validation errors:\n{chr(10).join(errors)}"
```

Also applies to: 60-60, 91-91, 110-110, 137-137, 165-165
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py` at line 28, Remove the unnecessary f-string prefixes in the assertion messages in test_playbook_syntax (the assertions that use f"YAML validation errors:\n" + "\n".join(errors) and similar patterns); either drop the leading "f" from the first literal (making it a normal string concatenated with "\n".join(...)) or combine into a single f-string that interpolates the join (e.g., f"YAML validation errors:\n{'\n'.join(errors)}"). Update all occurrences matching that pattern (the assertion lines that build messages by concatenating a literal with "\n".join(...)) so there are no unused f-prefixes.automation/test-execution/ansible/tests/smoke/test_model_matrix.py (1)
61-61: Remove unnecessary f-string prefixes (same pattern as other test files).

Same issue as `test_playbook_syntax.py`: the f-string prefix adds no value when there are no placeholders before the concatenation.

Also applies to: 88-88, 120-120, 136-136, 154-154, 177-177, 193-193
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/tests/smoke/test_model_matrix.py` at line 61, Several assertions in test_model_matrix.py use an unnecessary f-string prefix before a plain string that's concatenated with "\n".join(errors) (e.g., the assertion that starts with assert not errors, f"Model validation errors:\n" + "\n".join(errors)); remove the leading f from those string literals so they become regular strings (e.g., "Model validation errors:\n" + "\n".join(errors)). Apply the same change to the other similar assertions in this file that use the f-prefix at lines referenced (the patterns containing "Model validation errors" + "\n".join(errors) and the other error message strings) so they match the style used in other test files.automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml (1)
302-307: Verify `model_dtype` fallback behavior when no `--dtype` argument is present.

If `vllm_args_merged` contains no `--dtype=*` argument and `model_dtype` was not explicitly set upstream, the `default('--dtype=auto')` ensures a safe fallback. However, this also means `model_dtype` in metadata may show `auto` even when vLLM internally selects a specific dtype. Consider documenting this behavior if downstream tooling expects the actual runtime dtype.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml` around lines 302 - 307, The current set_fact sets model_dtype from vllm_args_merged with default('--dtype=auto'), which can misrepresent the actual runtime dtype when vLLM chooses dtype automatically; update the task that sets model_dtype (and related facts) so that it does not blindly record "auto" as the dtype—either set model_dtype to an empty/nullable value when no --dtype is present or add a separate fact like model_dtype_note/runtime_dtype_hint that records "auto (runtime-selected)" so downstream tooling knows this is a fallback, and keep references to vllm_args_merged and model_dtype in the updated logic and documentation.automation/test-execution/ansible/tests/smoke/test_container_config.py (1)
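For reference, the Jinja2 selection the comment describes is equivalent to this plain-Python sketch (hypothetical helper name):

```python
def extract_dtype(vllm_args):
    """Return the value of the --dtype=<x> argument, or 'auto' when absent.

    Mirrors the described Jinja2 pattern of selecting a --dtype=* entry
    from the merged argument list with a default('--dtype=auto') fallback.
    """
    for arg in vllm_args:
        if arg.startswith("--dtype="):
            return arg.split("=", 1)[1]
    return "auto"  # fallback: vLLM picks the dtype at runtime


assert extract_dtype(["--dtype=bfloat16", "--max-model-len=4096"]) == "bfloat16"
assert extract_dtype(["--max-model-len=4096"]) == "auto"
```

The second case is exactly the ambiguity flagged above: metadata records "auto" even though the server may resolve a concrete dtype at startup.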
181-186: Incomplete port structure validation.

The test loads and validates that `endpoints.yml` exists and is non-empty, but the comment indicates port structure validation is pending. Consider adding the port checks or removing the placeholder comment.
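If the pending port checks are added, they could reuse the smoke tests' collect-errors style. A sketch under an assumed endpoints.yml shape (endpoint name mapping to a dict with a `port` key; both the schema and helper name are assumptions):

```python
def validate_endpoint_ports(endpoints):
    """Collect an error for any endpoint whose port is missing, non-integer,
    or outside the valid TCP range."""
    errors = []
    for name, spec in endpoints.items():
        port = spec.get("port")
        if not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append(f"{name}: invalid port {port!r}")
    return errors


# A string port (common YAML quoting mistake) is caught, an int passes.
sample = {"vllm": {"port": 8000}, "metrics": {"port": "9090"}}
assert validate_endpoint_ports(sample) == ["metrics: invalid port '9090'"]
```

Accumulating errors rather than asserting per entry keeps one failing test run informative about every bad endpoint at once.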
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/ansible.md`:
- Around line 66-69: Update the Quick Start in ansible.md to instruct users to
install Ansible collections before running setup-platform.yml: add a step after
the Ansible install note that tells the user to change into the
automation/test-execution/ansible folder on the control machine and run
ansible-galaxy collection install -r requirements.yml so the collections listed
in requirements.yml are present prior to running setup-platform.yml.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 59-73: Update the "Validate required parameters" task's fail_msg
to reflect the new single-entry behavior: mention that core_sweep_counts must be
a single-element list (e.g., core_sweep_counts=[16]) or recommend using
requested_cores=<N>, and remove the example suggesting multiple values (e.g.,
[16,32,64]); reference the variables requested_cores and core_sweep_counts in
the message so users know which inputs are accepted under the new validation in
the Validate required parameters assertion.
In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`:
- Around line 239-253: The shell task "Stream GuideLLM container logs directly
to file" is redirecting podman output to an unquoted filename built from
workload_type, core_cfg.name and test_run_id which can break if those variables
contain spaces or metacharacters; update the ansible.builtin.shell cmd to quote
the redirection target (e.g. use the Jinja2 quote filter or wrap the generated
path in single quotes) so the >/2>&1 target is treated as a single literal path,
and ensure the subsequent ansible.builtin.fetch src uses the identical
quoted/generated filename so Fetch GuideLLM logs to controller finds the file.
In `@automation/test-execution/ansible/tests/smoke/test_container_config.py`:
- Around line 108-116: The test currently checks only for the Docker-specific
"unable to find image" string in result.stderr before skipping; update the check
in the test_container_config.py test to broaden the stderr matching to include
Podman variants (e.g., "image not known", "no such image", "not found") or use a
regex/any-of-substrings approach against result.stderr.lower() before calling
pytest.skip, so the failure-to-find-image logic around result and
pytest.skip("vLLM image not available locally") correctly handles both Docker
and Podman outputs.
In `@automation/test-execution/ansible/tests/smoke/test_model_matrix.py`:
- Around line 179-193: The test test_embedding_matrix_structure asserts each
entry in embedding_matrix["matrix"]["embedding_models"] contains the fields in
required_fields (name, full_name, dimensions, max_sequence_length); currently
the model definitions lack dimensions and max_sequence_length. Fix by adding
those two keys to every embedding model definition in the embedding model matrix
(embedding_models) with appropriate integer values (e.g., dimensions: <embedding
vector size>, max_sequence_length: <token limit>) or, if those fields are not
applicable, remove them from required_fields in test_embedding_matrix_structure
to match the actual model schema.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py`:
- Around line 596-602: The test runner currently calls pytest.main([...]) but
does not propagate its exit code, so update the __main__ block to pass
pytest.main(...) into sys.exit so failures yield non-zero process exit;
specifically, in the if HAS_PYTEST branch replace the standalone call to
pytest.main([__file__, "-v"]) with sys.exit(pytest.main([__file__, "-v"]))
(ensure sys is imported and that the change is made inside the existing if
__name__ == "__main__" / if HAS_PYTEST block referencing HAS_PYTEST,
pytest.main, and sys.exit).
- Around line 20-37: The fallback pytest shim defines class pytest with nested
raises but misses the mark attribute used by decorators like `@pytest.mark.unit`;
add a mark object to the pytest shim (e.g., add an attribute named mark on the
pytest class or module that exposes a unit attribute usable as a decorator) so
imports that reference `@pytest.mark.unit` do not raise AttributeError; ensure the
unit attribute is a callable/decorator that returns the original function
(no-op) and keep the existing raises implementation (refer to the pytest class
and its nested raises in the current shim).
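Put together, a shim along the lines this prompt describes might look like the following (a sketch, not the repository's actual fallback):

```python
import contextlib


class pytest:  # minimal stand-in used only when the real pytest is absent
    class mark:
        # `@pytest.mark.unit` becomes a no-op decorator returning the function.
        unit = staticmethod(lambda fn: fn)

    @staticmethod
    @contextlib.contextmanager
    def raises(exc_type):
        """Suppress the expected exception; fail if nothing was raised."""
        try:
            yield
        except exc_type:
            pass  # expected path: exception occurred and is suppressed
        else:
            raise AssertionError(f"{exc_type.__name__} was not raised")


@pytest.mark.unit
def sample_test():
    with pytest.raises(ZeroDivisionError):
        1 / 0


sample_test()  # runs cleanly under the shim
```

Only `mark.unit` is stubbed here because that is the marker the prompt mentions; a `__getattr__`-based `mark` would generalize to arbitrary marker names.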
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15043db5-287d-468d-b6b2-d6948c171ad5
📒 Files selected for processing (22)
- .github/workflows/unit-tests.yml
- automation/test-execution/ansible/README.md
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/filter_plugins/cpu_utils.py
- automation/test-execution/ansible/llm-benchmark-auto.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/llm-benchmark.yml
- automation/test-execution/ansible/llm-core-sweep-auto.yml
- automation/test-execution/ansible/requirements.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml
- automation/test-execution/ansible/roles/common/tasks/detect-numa-topology.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
- automation/test-execution/ansible/tests/README.md
- automation/test-execution/ansible/tests/pytest.ini
- automation/test-execution/ansible/tests/smoke/__init__.py
- automation/test-execution/ansible/tests/smoke/test_container_config.py
- automation/test-execution/ansible/tests/smoke/test_model_matrix.py
- automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py
- automation/test-execution/ansible/tests/unit/test_cpu_utils.py
- collections/requirements.yml
💤 Files with no reviewable changes (1)
- automation/test-execution/ansible/README.md
> - **Option 2 (setup-platform.yml)**: Automatically installs Podman, Python 3,
>   and all performance tools on **DUT and Load Generator hosts**. Your Ansible
>   control machine only needs Ansible itself and the collections from
>   requirements.yml.
Add the collection install command to Quick Start.
These notes now say the control machine needs the collections from requirements.yml, but the setup steps never show how to install them. A first-time user can follow this page exactly and still hit missing collection errors on the first setup-platform.yml run.
Suggested addition near the Ansible install step

```sh
cd automation/test-execution/ansible
ansible-galaxy collection install -r requirements.yml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/ansible.md` around lines 66 - 69, Update
the Quick Start in ansible.md to instruct users to install Ansible collections
before running setup-platform.yml: add a step after the Ansible install note
that tells the user to change into the automation/test-execution/ansible folder
on the control machine and run ansible-galaxy collection install -r
requirements.yml so the collections listed in requirements.yml are present prior
to running setup-platform.yml.
```yaml
- name: Validate required parameters
  ansible.builtin.assert:
    that:
      - >-
        (requested_cores | default(0) | int > 0) or
        (core_sweep_counts is defined and core_sweep_counts is not none and core_sweep_counts | length > 0)
    fail_msg: |
      Missing required parameter: requested_cores or core_sweep_counts
      Please provide either:
      - Single core count: -e "requested_cores=<8|16|32|64|...>"
      - Core sweep: -e "core_sweep_counts=[16,32,64]"

      The concurrent load test requires at least one core configuration to test.
      This playbook will run 3 phases with the specified core configuration(s).
```
Make the missing-parameter help text match the new single-entry behavior.
This message still tells users to pass core_sweep_counts=[16,32,64], but Lines 78-90 now reject more than one value. That makes the primary remediation path fail immediately.
Suggested fix
```diff
-      - Core sweep: -e "core_sweep_counts=[16,32,64]"
+      - Single-entry core list: -e "core_sweep_counts=[16]"
@@
-      This playbook will run 3 phases with the specified core configuration(s).
+      This playbook will run 3 phases with the specified core configuration.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Validate required parameters
  ansible.builtin.assert:
    that:
      - >-
        (requested_cores | default(0) | int > 0) or
        (core_sweep_counts is defined and core_sweep_counts is not none and core_sweep_counts | length > 0)
    fail_msg: |
      Missing required parameter: requested_cores or core_sweep_counts
      Please provide either:
      - Single core count: -e "requested_cores=<8|16|32|64|...>"
      - Single-entry core list: -e "core_sweep_counts=[16]"

      The concurrent load test requires at least one core configuration to test.
      This playbook will run 3 phases with the specified core configuration.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around
lines 59 - 73, Update the "Validate required parameters" task's fail_msg to
reflect the new single-entry behavior: mention that core_sweep_counts must be a
single-element list (e.g., core_sweep_counts=[16]) or recommend using
requested_cores=<N>, and remove the example suggesting multiple values (e.g.,
[16,32,64]); reference the variables requested_cores and core_sweep_counts in
the message so users know which inputs are accepted under the new validation in
the Validate required parameters assertion.
```diff
 - name: Stream GuideLLM container logs directly to file
   ansible.builtin.shell:
-    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} 2>&1"
+    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
   args:
     executable: /bin/bash
   register: guidellm_full_logs
   changed_when: false
   failed_when: false
   when:
     - use_guidellm_container | bool
     - container_exit_code.stdout is defined

 - name: Save GuideLLM logs to temporary file on remote
   ansible.builtin.copy:
     content: "{{ guidellm_full_logs.stdout }}"
     dest: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
     mode: "0644"
   when:
     - use_guidellm_container | bool
     - guidellm_full_logs is defined
     - guidellm_full_logs.stdout is defined

 - name: Fetch GuideLLM logs to controller
   ansible.builtin.fetch:
-    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
+    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
     dest: "{{ results_path }}/guidellm.log"
```
🧩 Analysis chain

🏁 Script executed:

```sh
cd automation/test-execution/ansible/roles/benchmark_guidellm/tasks && cat -n main.yml | sed -n '230,260p'
```

Repository: redhat-et/vllm-cpu-perf-eval
Length of output: 1271

🏁 Script executed:

```sh
# Check how workload_type, core_cfg.name, and test_run_id are defined/used
rg -n 'workload_type|core_cfg\.name|test_run_id' automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml | head -30
```

Repository: redhat-et/vllm-cpu-perf-eval
Length of output: 1466

🏁 Script executed:

```sh
# Check the broader context to understand variable sources
rg -n 'set_fact|register|vars:' automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml | head -20
```

Repository: redhat-et/vllm-cpu-perf-eval
Length of output: 729
Quote the redirected log filename to prevent shell interpretation of special characters.
The container name is quoted on line 241, but the output path is not. Since core_cfg.name, workload_type, and test_run_id are configuration-derived variables, they may contain spaces or shell metacharacters that would break the redirect or cause unintended file creation.
Suggested fix

```diff
 - name: Stream GuideLLM container logs directly to file
   ansible.builtin.shell:
-    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
+    cmd: >-
+      podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }}
+      > {{ ('/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log') | quote }} 2>&1
 @@
 - name: Fetch GuideLLM logs to controller
   ansible.builtin.fetch:
-    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
+    src: "{{ '/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log' }}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Stream GuideLLM container logs directly to file
  ansible.builtin.shell:
    cmd: >-
      podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }}
      > {{ ('/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log') | quote }} 2>&1
  args:
    executable: /bin/bash
  changed_when: false
  failed_when: false
  when:
    - use_guidellm_container | bool
    - container_exit_code.stdout is defined

- name: Fetch GuideLLM logs to controller
  ansible.builtin.fetch:
    src: "{{ '/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log' }}"
    dest: "{{ results_path }}/guidellm.log"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`
around lines 239 - 253, The shell task "Stream GuideLLM container logs directly
to file" is redirecting podman output to an unquoted filename built from
workload_type, core_cfg.name and test_run_id which can break if those variables
contain spaces or metacharacters; update the ansible.builtin.shell cmd to quote
the redirection target (e.g. use the Jinja2 quote filter or wrap the generated
path in single quotes) so the >/2>&1 target is treated as a single literal path,
and ensure the subsequent ansible.builtin.fetch src uses the identical
quoted/generated filename so Fetch GuideLLM logs to controller finds the file.
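The quoting concern can be reproduced outside Ansible. A minimal Python sketch using the stdlib `shlex.quote` function, which behaves like Jinja2's `| quote` filter; the variable values below are hypothetical, not taken from the playbook:

```python
import shlex

# Hypothetical values: a space in core_cfg.name is enough to split an
# unquoted redirect target into two shell words.
workload_type = "chat"
core_name = "all cores"  # contains a space
test_run_id = "run1"

path = f"/tmp/guidellm-{workload_type}-{core_name}-{test_run_id}.log"

# shlex.quote wraps unsafe strings in single quotes so the shell treats
# the whole path as one token after the '>' redirect.
print(shlex.quote(path))  # → '/tmp/guidellm-chat-all cores-run1.log'
```

Safe paths pass through unchanged, so quoting unconditionally is harmless.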
```python
# If image doesn't exist, skip (don't auto-pull in tests)
if "unable to find image" in result.stderr.lower():
    pytest.skip("vLLM image not available locally")

if result.returncode == 0:
    assert (
        "vllm serve" in result.stdout
        or "usage:" in result.stdout.lower()
    )
```
Docker-specific error message may not match Podman output.
The check for "unable to find image" is a Docker-specific error message. Podman typically uses different wording (e.g., "image not known" or "no such image"). This may cause the test to fail differently on Podman-only systems.
💡 Proposed fix for cross-runtime compatibility

```diff
 # If image doesn't exist, skip (don't auto-pull in tests)
-if "unable to find image" in result.stderr.lower():
+if any(msg in result.stderr.lower() for msg in [
+    "unable to find image",  # docker
+    "image not known",       # podman
+    "no such image",         # podman alternative
+]):
     pytest.skip("vLLM image not available locally")
```

📝 Committable suggestion
```python
# If image doesn't exist, skip (don't auto-pull in tests)
if any(msg in result.stderr.lower() for msg in [
    "unable to find image",  # docker
    "image not known",       # podman
    "no such image",         # podman alternative
]):
    pytest.skip("vLLM image not available locally")

if result.returncode == 0:
    assert (
        "vllm serve" in result.stdout
        or "usage:" in result.stdout.lower()
    )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/tests/smoke/test_container_config.py`
around lines 108 - 116, The test currently checks only for the Docker-specific
"unable to find image" string in result.stderr before skipping; update the check
in the test_container_config.py test to broaden the stderr matching to include
Podman variants (e.g., "image not known", "no such image", "not found") or use a
regex/any-of-substrings approach against result.stderr.lower() before calling
pytest.skip, so the failure-to-find-image logic around result and
pytest.skip("vLLM image not available locally") correctly handles both Docker
and Podman outputs.
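The any-of-substrings approach can be exercised in isolation. In this sketch the sample stderr strings are illustrative, not captured from real docker/podman runs:

```python
# Marker substrings for missing-image errors across container runtimes.
MISSING_IMAGE_MARKERS = [
    "unable to find image",  # docker
    "image not known",       # podman
    "no such image",         # podman alternative
]

def image_missing(stderr: str) -> bool:
    """Return True if stderr looks like a missing-image error from either runtime."""
    lowered = stderr.lower()
    return any(marker in lowered for marker in MISSING_IMAGE_MARKERS)

print(image_missing("Unable to find image 'vllm:latest' locally"))  # → True
print(image_missing("Error: vllm:latest: image not known"))         # → True
print(image_missing("Error: permission denied"))                    # → False
```

Because matching is done on the lowercased text, the markers cover both Docker's capitalized message and Podman's lowercase wording.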
```python
# Mock pytest.raises for fallback tests
class pytest:

    class raises:

        def __init__(self, exc):
            self.exc = exc

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            # Check if exception matches expected type
            if exc_type is not None and issubclass(exc_type, self.exc):
                # Suppress the exception by returning True
                return True
            # Let unexpected exceptions propagate
            return False
```
🧩 Analysis chain

🏁 Scripts executed against `redhat-et/vllm-cpu-perf-eval`:

```shell
# First, locate and read the file
fd -t f "test_cpu_utils.py" --exec wc -l {} \;

# Read lines 15-50 to see the fallback shim and context
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '15,50p'

# Check for @pytest.mark.unit decorators in the file
rg "@pytest\.mark\.unit" automation/test-execution/ansible/tests/unit/test_cpu_utils.py

# Find the exact lines where @pytest.mark.unit is used
rg -n "@pytest\.mark\.unit" automation/test-execution/ansible/tests/unit/test_cpu_utils.py | head -5

# Show the context around the first occurrence to confirm the issue
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '110,125p'

# Verify if there are any other uses of pytest.mark besides .unit
rg -n "pytest\.mark\." automation/test-execution/ansible/tests/unit/test_cpu_utils.py
```
Add mark.unit to the fallback pytest shim to prevent import failure.
When pytest is not installed, the module cannot be imported because @pytest.mark.unit decorators start at line 113, but the fallback shim (lines 20–37) only defines raises, not mark. This causes AttributeError during module import, so the fallback runner never runs.
Suggested fix

```diff
 except ImportError:
     HAS_PYTEST = False

     # Mock pytest.raises for fallback tests
     class pytest:
+        class mark:
+            @staticmethod
+            def unit(obj):
+                return obj
+
         class raises:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines
20 - 37, The fallback pytest shim defines class pytest with nested raises but
misses the mark attribute used by decorators like `@pytest.mark.unit`; add a mark
object to the pytest shim (e.g., add an attribute named mark on the pytest class
or module that exposes a unit attribute usable as a decorator) so imports that
reference `@pytest.mark.unit` do not raise AttributeError; ensure the unit
attribute is a callable/decorator that returns the original function (no-op) and
keep the existing raises implementation (refer to the pytest class and its
nested raises in the current shim).
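A self-contained sketch of what the repaired shim looks like with `mark` added; it assumes pytest is absent, so the class shadows the real module name, and the sample test below is illustrative:

```python
class pytest:

    class mark:
        @staticmethod
        def unit(obj):
            # No-op decorator: return the decorated function/class unchanged.
            return obj

    class raises:

        def __init__(self, exc):
            self.exc = exc

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            # Suppress only the expected exception type; let others propagate.
            return exc_type is not None and issubclass(exc_type, self.exc)


@pytest.mark.unit
def sample_test():
    with pytest.raises(ValueError):
        raise ValueError("expected")
    return "ok"

print(sample_test())  # → ok
```

With `mark.unit` defined as a pass-through, the `@pytest.mark.unit` decorators no longer raise `AttributeError` at import time, so the fallback runner can collect and execute the tests.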
| if __name__ == "__main__": | ||
| # Run tests if pytest not available | ||
| import sys | ||
|
|
||
| if HAS_PYTEST: | ||
| pytest.main([__file__, "-v"]) | ||
| else: |
🧩 Analysis chain

🏁 Scripts executed against `redhat-et/vllm-cpu-perf-eval`:

```shell
# First, locate and read the specific file and lines
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '596,602p'

# Get more context around these lines to understand the full picture
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '590,610p'
```
🌐 Web query: `pytest.main()` return value exit code

💡 Result: `pytest.main` returns an exit code as either an `int` or a `pytest.ExitCode` enum value (since pytest 5.0). These correspond to the standard pytest exit codes:

- 0 / `ExitCode.OK`: All tests passed
- 1 / `ExitCode.TESTS_FAILED`: Some tests failed
- 2 / `ExitCode.INTERRUPTED`: Interrupted by user
- 3 / `ExitCode.INTERNAL_ERROR`: Internal error
- 4 / `ExitCode.USAGE_ERROR`: Command line usage error
- 5 / `ExitCode.NO_TESTS_COLLECTED`: No tests collected

Unlike command-line invocation, `pytest.main` does not raise `SystemExit` but returns the code directly for programmatic use. The enum values match these integer codes and can be imported via `from pytest import ExitCode`.

Citations:
- 1: https://stackoverflow.com/questions/62648688/capture-exit-code-for-pytest-from-python
- 2: https://docs.pytest.org/en/4.6.x/how-to/usage.html
- 3: https://happytest.readthedocs.io/en/latest/usage/
- 4: https://github.com/pytest-dev/pytest/blob/main/doc/en/reference/exit-codes.rst
- 5: https://docs.pytest.org/en/stable/reference.html
- 6: https://happytest-apidoc.readthedocs.io/en/latest/_modules/pytest/
- 7: https://happytest-apidoc.readthedocs.io/en/latest/api/pytest/
Return pytest.main()'s exit status.
pytest.main() returns an exit code that should be passed to sys.exit(). Without it, python3 test_cpu_utils.py exits with status 0 even when tests fail, breaking CI/CD integration.
Suggested fix

```diff
 if __name__ == "__main__":
     # Run tests if pytest not available
     import sys

     if HAS_PYTEST:
-        pytest.main([__file__, "-v"])
+        sys.exit(pytest.main([__file__, "-v"]))
```

📝 Committable suggestion
| if __name__ == "__main__": | |
| # Run tests if pytest not available | |
| import sys | |
| if HAS_PYTEST: | |
| pytest.main([__file__, "-v"]) | |
| else: | |
| if __name__ == "__main__": | |
| # Run tests if pytest not available | |
| import sys | |
| if HAS_PYTEST: | |
| sys.exit(pytest.main([__file__, "-v"])) | |
| else: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines
596 - 602, The test runner currently calls pytest.main([...]) but does not
propagate its exit code, so update the __main__ block to pass pytest.main(...)
into sys.exit so failures yield non-zero process exit; specifically, in the if
HAS_PYTEST branch replace the standalone call to pytest.main([__file__, "-v"])
with sys.exit(pytest.main([__file__, "-v"])) (ensure sys is imported and that
the change is made inside the existing if __name__ == "__main__" / if HAS_PYTEST
block referencing HAS_PYTEST, pytest.main, and sys.exit).
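Why forwarding the return value matters can be demonstrated end to end. In this sketch, `fake_pytest_main` stands in for `pytest.main`, and its hard-coded return value of 1 (`ExitCode.TESTS_FAILED`) is an assumption for the demonstration:

```python
import subprocess
import sys
import textwrap

# A tiny script run in a child interpreter: its runner forwards the
# fake pytest exit code to sys.exit, so the process exit status is non-zero.
script = textwrap.dedent("""
    import sys

    def fake_pytest_main(args):
        return 1  # stands in for ExitCode.TESTS_FAILED

    if __name__ == "__main__":
        sys.exit(fake_pytest_main(["-v"]))
""")

proc = subprocess.run([sys.executable, "-c", script])
print(proc.returncode)  # → 1, so CI sees the failure
```

Without the `sys.exit(...)` wrapper, the child process would finish the script normally and exit 0 even though the tests failed.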
This commit fixes two configuration bugs found by smoke tests:

Bug #1: Remove OPT models (opt-125m, opt-1.3b)
- Context length (2048) incompatible with summarization workload (4096)
- Legacy models causing validation failures
- Reduced model count from 8 to 6 LLM models

Changes:
- Removed opt-125m and opt-1.3b from model-matrix.yaml
- Updated README.md: 8 → 6 LLM models
- Updated docs/methodology/overview.md: 8 → 6 LLM models
- Removed OPT Family section from models/models.md
- Removed all OPT entries from model tables and test scenarios
- Removed "Decode-Heavy Models" section (was OPT-only)

Bug #2: Add missing embedding model fields
- granite-embedding-english-r2: added dimensions=384, max_sequence_length=512
- granite-embedding-278m-multilingual: added dimensions=768, max_sequence_length=512

These fixes ensure all model configurations pass validation tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Force-pushed fce2e05 to 3e4fa16
PR Summary: Add smoke tests
Summary
This PR adds a comprehensive smoke test suite that validates Ansible playbooks, model configurations, and container settings before deployment. The tests run in < 5 seconds, require no infrastructure, and have already caught 2 critical configuration bugs!
What's Changed
New Test Suite (`automation/test-execution/ansible/tests/smoke/`)

1. Playbook Syntax Validation (`test_playbook_syntax.py`) - 12 tests (`ansible-playbook --syntax-check`)
2. Model Matrix Validation (`test_model_matrix.py`) - 10 tests
3. Container Configuration (`test_container_config.py`) - 7 tests

Infrastructure Updates

- GitHub Workflow: Updated `.github/workflows/unit-tests.yml` with a `smoke-tests` job
- Pytest Configuration: Added `pytest.ini` with `@pytest.mark.smoke`, `@pytest.mark.unit`, and `@pytest.mark.slow` markers
- Documentation: `tests/README.md` (comprehensive testing guide) and `TESTING-ROADMAP.md` (future testing phases)
- Test Markers: Added `@pytest.mark.unit` to all existing unit tests

🎉 Bugs Found Immediately!
The smoke tests caught 2 real configuration bugs on the first run:
⚠️ Bug #1: OPT Models - Context Length Mismatch
Impact: These models would fail at runtime when attempting to run the summarization workload.

⚠️ Bug #2: Embedding Models - Missing Required Fields
Impact: Incomplete model definitions that could cause issues during test execution.
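The context-length mismatch behind Bug #1 comes down to a simple comparison; a hypothetical sketch of the kind of check the smoke tests perform (field names and the workload table are illustrative, not the repo's actual schema):

```python
# Minimum context a workload needs (illustrative values matching the bug report).
WORKLOAD_MIN_CONTEXT = {"summarization": 4096}

# Illustrative model entries; opt-125m mirrors the removed OPT models.
models = [
    {"name": "opt-125m", "max_context": 2048},
    {"name": "granite-3b", "max_context": 8192},
]

def incompatible(model: dict, workload: str) -> bool:
    """A model fails validation if its context window is too small for the workload."""
    return model["max_context"] < WORKLOAD_MIN_CONTEXT[workload]

bad = [m["name"] for m in models if incompatible(m, "summarization")]
print(bad)  # → ['opt-125m']
```

Catching this at test time is cheap; at runtime it would surface only after a model had been deployed and a summarization request exceeded its window.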
Test Results
Before This PR
After This PR
Performance
How to Test Locally
CI Integration
Tests run automatically on:
Workflow: `.github/workflows/unit-tests.yml`
- `unit-tests` (runs unit tests)
- `smoke-tests` (runs smoke tests, depends on `unit-tests`)

Files Changed
New Files:
Modified Files:
Why This Matters
Before: Manual Validation
After: Automated Validation
Future Work
Breaking Changes
None. This PR only adds new tests and doesn't modify any runtime behavior.
Checklist
(`README.md`, `TESTING-ROADMAP.md`)

Demo: Smoke Tests in Action
🤖 Generated with Claude Code