Skip to content

Smoke tests#86

Open
maryamtahhan wants to merge 9 commits intoredhat-et:mainfrom
maryamtahhan:smoke-tests
Open

Smoke tests#86
maryamtahhan wants to merge 9 commits intoredhat-et:mainfrom
maryamtahhan:smoke-tests

Conversation

@maryamtahhan
Copy link
Copy Markdown
Collaborator

@maryamtahhan maryamtahhan commented Mar 31, 2026

PR Summary: Add smoke tests

⚠️ This PR depends on #87 - Please merge #87 first to avoid conflicts.

Summary

This PR adds a comprehensive smoke test suite that validates Ansible playbooks, model configurations, and container settings before deployment. The tests run in < 5 seconds, require no infrastructure, and have already caught 2 critical configuration bugs!

What's Changed

New Test Suite (automation/test-execution/ansible/tests/smoke/)

1. Playbook Syntax Validation (test_playbook_syntax.py) - 12 tests

  • ✅ YAML syntax validation for all playbooks
  • ✅ Ansible syntax checking (ansible-playbook --syntax-check)
  • ✅ Inventory file validation
  • ✅ Role structure validation
  • ✅ Filter plugin syntax checking

2. Model Matrix Validation (test_model_matrix.py) - 10 tests

  • ✅ Model configuration completeness
  • ✅ Workload compatibility checks
  • ✅ Context length validation
  • ✅ KV cache size validation
  • ✅ Duplicate detection

3. Container Configuration (test_container_config.py) - 7 tests

  • ✅ Container runtime availability
  • ✅ Role existence validation
  • ✅ Image accessibility checks

Infrastructure Updates

  • GitHub Workflow: Updated .github/workflows/unit-tests.yml

    • Renamed to "Tests" (more accurate)
    • Added separate smoke-tests job
    • Runs on every PR and push to main
  • Pytest Configuration: Added pytest.ini

    • Custom markers: @pytest.mark.smoke, @pytest.mark.unit, @pytest.mark.slow
    • Standardized test discovery
  • Documentation:

    • tests/README.md - comprehensive testing guide
    • TESTING-ROADMAP.md - future testing phases
  • Test Markers: Added @pytest.mark.unit to all existing unit tests

🎉 Bugs Found Immediately!

The smoke tests caught 2 real configuration bugs on the first run:

Bug #1: OPT Models - Context Length Mismatch ⚠️

opt-125m: workload 'summarization' max_model_len (4096) > context_length (2048)
opt-1.3b: workload 'summarization' max_model_len (4096) > context_length (2048)

Impact: These models would fail at runtime when attempting to run the summarization workload.

Bug #2: Embedding Models - Missing Required Fields ⚠️

granite-embedding-english-r2: missing fields {'dimensions', 'max_sequence_length'}
granite-embedding-278m-multilingual: missing fields {'dimensions', 'max_sequence_length'}

Impact: Incomplete model definitions that could cause issues during test execution.

Test Results

Before This PR

  • Unit tests: 51 tests (no markers, manual file selection)
  • Smoke tests: 0 tests
  • CI validation: Linting only

After This PR

  • Unit tests: ✅ 51/51 passing (now with @pytest.mark.unit)
  • Smoke tests: ⚠️ 18/20 passing (2 expected failures = real bugs found!)
  • CI validation: Linting + syntax + configuration validation

Performance

# Fast smoke tests only (excludes slow container pulls)
pytest -m "smoke and not slow"

# Result: 18 tests in 4.69 seconds ⚡

How to Test Locally

cd automation/test-execution/ansible/tests

# Install dependencies (one-time)
pip install pytest ansible pyyaml

# Run all fast tests
pytest -m "smoke and not slow"

# Run all smoke tests (includes slow container tests)
pytest -m smoke

# Run specific test file
pytest smoke/test_model_matrix.py -v

# Run both unit and smoke tests
pytest -m "unit or smoke"

CI Integration

Tests run automatically on:

  • ✅ Pull requests
  • ✅ Pushes to main branch
  • ✅ Manual workflow dispatch

Workflow: .github/workflows/unit-tests.yml

  • Job 1: unit-tests (runs unit tests)
  • Job 2: smoke-tests (runs smoke tests, depends on unit-tests)

Files Changed

New Files:

automation/test-execution/ansible/tests/
├── smoke/
│   ├── __init__.py
│   ├── test_playbook_syntax.py
│   ├── test_model_matrix.py
│   └── test_container_config.py
├── pytest.ini
└── README.md

TESTING-ROADMAP.md

Modified Files:

.github/workflows/unit-tests.yml  (renamed from "Unit Tests" to "Tests", added smoke-tests job)
automation/test-execution/ansible/tests/unit/test_cpu_utils.py  (added @pytest.mark.unit decorators)

Why This Matters

Before: Manual Validation

  • Configuration errors discovered at runtime ❌
  • Long feedback loop (deploy → run → fail → fix) ⏱️
  • No automated validation in CI ❌

After: Automated Validation

  • Configuration errors caught in seconds ✅
  • Fast feedback loop (commit → CI → instant results) ⚡
  • Runs on every PR automatically ✅
  • Prevents deployment of broken configs 🛡️

Future Work

  • Fix configuration issues found by smoke tests
  • Add integration tests for NUMA configurations
  • Add container integration tests (full startup with small models)
  • Add ansible-lint integration

Breaking Changes

None. This PR only adds new tests and doesn't modify any runtime behavior.

Checklist

  • Tests added and passing locally
  • Documentation added (README.md, TESTING-ROADMAP.md)
  • CI workflow updated
  • Pytest configuration added
  • Test markers applied to existing tests
  • No breaking changes

Demo: Smoke Tests in Action

$ cd automation/test-execution/ansible/tests
$ pytest -m "smoke and not slow" -v

============================= test session starts ==============================
collected 23 items / 3 deselected / 20 selected

smoke/test_container_config.py::TestContainerBasics::test_container_runtime_available PASSED
smoke/test_container_config.py::TestContainerConfiguration::test_vllm_server_role_exists PASSED
smoke/test_container_config.py::TestContainerConfiguration::test_benchmark_roles_exist PASSED
smoke/test_container_config.py::TestContainerNetworking::test_default_ports_defined PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_llm_matrix_has_required_structure PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_all_models_have_required_fields PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_workloads_match_model_requirements FAILED
  ❌ opt-125m: workload 'summarization' max_model_len (4096) > context_length (2048)
  ❌ opt-1.3b: workload 'summarization' max_model_len (4096) > context_length (2048)
smoke/test_model_matrix.py::TestModelMatrix::test_all_workloads_have_required_fields PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_kv_cache_sizes_defined PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_test_suites_are_valid PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_gated_models_marked_correctly PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_embedding_matrix_structure FAILED
  ❌ granite-embedding-english-r2: missing fields {'dimensions', 'max_sequence_length'}
  ❌ granite-embedding-278m-multilingual: missing fields {'dimensions', 'max_sequence_length'}
smoke/test_model_matrix.py::TestModelMatrix::test_no_duplicate_model_names PASSED
smoke/test_model_matrix.py::TestModelMatrix::test_no_duplicate_workload_names PASSED
smoke/test_playbook_syntax.py::TestPlaybookSyntax::test_all_playbooks_valid_yaml PASSED
smoke/test_playbook_syntax.py::TestPlaybookSyntax::test_ansible_syntax_check PASSED
smoke/test_playbook_syntax.py::TestPlaybookSyntax::test_inventory_files_valid_yaml PASSED
smoke/test_playbook_syntax.py::TestPlaybookSyntax::test_role_defaults_valid_yaml PASSED
smoke/test_playbook_syntax.py::TestRoleStructure::test_all_roles_have_required_structure PASSED
smoke/test_playbook_syntax.py::TestFilterPlugins::test_filter_plugins_importable PASSED

=================== 2 failed, 18 passed, 3 deselected in 4.69s ===================

🤖 Generated with Claude Code

maryamtahhan and others added 7 commits March 31, 2026 09:11
…rdening

This commit improves the existing Ansible playbook infrastructure for vLLM
CPU performance evaluation with enhanced AWX compatibility, security hardening,
and comprehensive testing.

- Fix type normalization to handle AnsibleUnsafeText from AWX
- Fix allocated_nodes to return integers instead of strings
- Handle empty strings and Jinja2 None conversions properly
- Simplify node eligibility checking and allocation logic
- Improve error messages for better validation feedback

- Add no_log: true to all tasks handling HF_TOKEN
- Prevent token exposure in container start operations
- Secure environment variable handling in AWX jobs

- Add comprehensive unit tests for cpu_utils filter plugin (598 lines)
- Test coverage for: CPU range conversion, NUMA extraction, multi-NUMA
  allocation, OMP binding, and real-world scenarios
- Support for both pytest and standalone execution
- Add collections/requirements.yml for Ansible collection dependencies

- Better parameter validation for AWX jobs in concurrent load testing
- AWX detection for result path handling
- Improved NUMA topology detection in core sweep
- Enhanced result path consistency in main benchmark
- Better workload configuration handling

- Simplify prerequisites section
- Update examples with current best practices
- Clearer workflow documentation

Files changed: 13 files, 772 insertions(+), 323 deletions(-)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Remove undefined 'short_codegen' workload from validation lists to prevent
KeyError during test execution. Add CPU allocation validation to detect
under-allocation early with clear error messages. Fix pytest.raises fallback
to properly suppress exceptions when pytest is unavailable. Pin exact collection
versions for reproducible AWX/CI runs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Fix inconsistent output filename reference in ansible.md (benchmarks.html → benchmarks.json)
- Improve requested_cores validation to handle non-numeric input safely
- Document requirements files usage (AWX/production vs CLI/development)
- Fix hardcoded log_collection_dest to use centralized local_results_base
- Remove obsolete vLLM configuration fields from test metadata docs
- Improve node capacity validation with sorting and per-node checks in cpu_utils.py
- Refactor duplicate AWX_JOB_ID lookup in llm-core-sweep-auto.yml
- Add NUMA node id type coercion for consistent integer types

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
The previous commit changed NUMA node ids from strings to integers,
but the selectattr filters were still trying to match them as strings,
causing "No first item, sequence was empty" errors.

Remove the | string filter from selectattr comparisons to match
integer ids correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Fix allocate_with_fixed_tp to filter nodes by capacity (cpu_utils.py)
- Update Results summary to use local_results_base (llm-benchmark-auto.yml)
- Implement core_sweep_counts parameter handling (llm-benchmark-concurrent-load.yml)
  - Normalize core_sweep_counts to requested_cores for single-element lists
  - Reject multi-element lists with helpful error directing to llm-core-sweep-auto.yml
  - Pass effective_requested_cores to all 3 phases
- Restore model configuration facts in start-llm.yml for metadata collection
  - Set model_kv_cache, model_dtype, model_max_len, vllm_caching_mode
  - Fixes compatibility with get-vllm-config-from-dut.yml assertions
- Add regression tests for serialized TP inputs (test_cpu_utils.py)
  - Test empty string '' behaves like auto-TP
  - Test string 'None' behaves like auto-TP

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
…ainer start

Addresses two critical issues in vLLM configuration:

1. Per-model overrides (model_kv_cache_space, model_dtype) now properly
   override workload defaults instead of being ignored
2. Added preflight validation of --max-model-len against workload
   requirements (ISL + OSL) to fail early with clear error messages

Both fixes ensure configuration errors are caught early and per-model
settings take precedence over workload fallbacks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

Warning

Rate limit exceeded

@maryamtahhan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 39 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9e159c33-abc0-4c7b-b6e8-a548c5a71a11

📥 Commits

Reviewing files that changed from the base of the PR and between fce2e05 and 3e4fa16.

📒 Files selected for processing (13)
  • .github/workflows/unit-tests.yml
  • README.md
  • automation/test-execution/ansible/tests/README.md
  • automation/test-execution/ansible/tests/pytest.ini
  • automation/test-execution/ansible/tests/smoke/__init__.py
  • automation/test-execution/ansible/tests/smoke/test_container_config.py
  • automation/test-execution/ansible/tests/smoke/test_model_matrix.py
  • automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py
  • automation/test-execution/ansible/tests/unit/test_cpu_utils.py
  • docs/methodology/overview.md
  • models/embedding-models/model-matrix.yaml
  • models/llm-models/model-matrix.yaml
  • models/models.md
📝 Walkthrough

Walkthrough

This pull request implements AWX-aware result path handling across multiple Ansible playbooks, refactors NUMA node core allocation logic with improved type normalization, expands test coverage with comprehensive unit and smoke tests including pytest configuration, and updates documentation while removing vLLM configuration collection from benchmark execution.

Changes

Cohort / File(s) Summary
CI/CD Workflow
.github/workflows/unit-tests.yml
Renamed workflow from "Unit Tests" to "Tests", added workflow_dispatch for manual execution, split test jobs into unit-tests (running unit/ directory) and new smoke-tests job (running smoke/ directory with marker restrictions).
Test Infrastructure
automation/test-execution/ansible/tests/pytest.ini, automation/test-execution/ansible/tests/smoke/__init__.py
Added pytest configuration with marker definitions (smoke, integration, slow, unit), discovery patterns, and filter warnings; added smoke package docstring.
Smoke Tests
automation/test-execution/ansible/tests/smoke/test_container_config.py, automation/test-execution/ansible/tests/smoke/test_model_matrix.py, automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py
Added three new smoke test modules validating container runtime availability/configuration, model matrix YAML structure and field requirements, and Ansible playbook/role YAML syntax and structure.
Unit Tests
automation/test-execution/ansible/tests/unit/test_cpu_utils.py
Added comprehensive unit test module (634 lines) covering filter parsing/formatting, NUMA node allocation, tensor-parallel selection, and multi-NUMA scenarios.
Core Logic - CPU Allocation
automation/test-execution/ansible/filter_plugins/cpu_utils.py
Refactored NUMA node handling with type normalization (idstr, physical_coresint); reworked allocate_with_auto_tp and allocate_with_fixed_tp to sort nodes by capacity and validate before allocation; improved error messaging; added empty-string/'None' handling for requested_tp.
Playbook AWX Integration
automation/test-execution/ansible/llm-benchmark-auto.yml, automation/test-execution/ansible/llm-benchmark.yml, automation/test-execution/ansible/llm-core-sweep-auto.yml
Added AWX job detection via AWX_JOB_ID, dynamic local_results_base switching to HOME/benchmark-results under AWX (else playbook_dir/../../../results/llm); removed vLLM configuration collection tasks; updated all result path references to use hostvars['localhost']['local_results_base'].
Playbook Core Config
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
Added AWX detection and is_awx_job fact; enforced non-empty core configuration validation; normalized core_sweep_counts input to single effective_requested_cores; restricted concurrent load to single-entry sweeps; updated results directory path for AWX execution.
Role Tasks - Core Allocation
automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml, automation/test-execution/ansible/roles/common/tasks/detect-numa-topology.yml
Unified tensor-parallel handling by removing conditional branches in allocate-cores-from-count.yml; in detect-numa-topology.yml, changed NUMA node id to integer storage, refactored backward-compatibility field derivation via selectattr lookups, and simplified physical_cores casting.
Role Tasks - Server Configuration
automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml, automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
Added no_log: true to environment variable and container start tasks in embedding role to suppress HF_TOKEN; in LLM role, removed model-matrix.yaml loading, consolidated dtype/KV cache handling into effective-value facts, replaced per-model derivation with conditional dtype override step, and added metadata population for later collection.
Role Tasks - GuideLLM
automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
Added rate normalization via _rate_value and _rate_list intermediate variables; renamed timeout variable from guidellm_wait_timeout to guidellm_wait_timeout_seconds; refactored log capture to stream podman logs directly to file with test_run_id naming; changed podman wait failure detection to check any non-zero exit code.
Documentation
automation/test-execution/ansible/README.md, automation/test-execution/ansible/ansible.md, automation/test-execution/ansible/requirements.yml, automation/test-execution/ansible/tests/README.md
Updated prerequisites/setup sections; removed vLLM metadata field references; clarified requirements.yml for CLI vs. production (referencing collections/requirements.yml for pinned versions); added comprehensive test documentation with marker usage and troubleshooting.
Production Dependencies
collections/requirements.yml
Added new requirements file with pinned Ansible Galaxy collection versions: containers.podman 1.9.4 and ansible.posix 1.5.4 for AWX/production deployments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The PR title 'Smoke tests' is a generic label that does not clearly describe the main changes in the changeset. Revise the title to be more specific about the primary objective, such as 'Add smoke tests for configuration and playbook validation' to provide better context about what changes are included.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 96.59% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml (1)

18-27: ⚠️ Potential issue | 🟠 Major

Exclude string 'None' from validation to prevent premature int conversion failure.

The guards at lines 24-27 don't exclude the string 'None', which commonly comes from CLI arguments. This allows string 'None' to reach line 21's | int filter, causing an unnecessary failure before the Python function's auto-TP logic can handle it. Add explicit string handling to the when conditions.

Suggested fix
 - name: Validate requested_tensor_parallel if provided
   ansible.builtin.assert:
     that:
       - requested_tensor_parallel | int in [1, 2, 4, 8]
     fail_msg: "Invalid tensor_parallel: {{ requested_tensor_parallel }}. Valid values: 1, 2, 4, 8"
   when:
     - requested_tensor_parallel is defined
     - requested_tensor_parallel != omit
-    - requested_tensor_parallel != None
-    - requested_tensor_parallel != ""
+    - requested_tensor_parallel is not none
+    - (requested_tensor_parallel | string | trim | lower) not in ["", "none"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml`
around lines 18 - 27, The assert task validating requested_tensor_parallel
(ansible.builtin.assert) can receive the literal string 'None' from CLI and fail
on the `| int` conversion prematurely; update the task's when guards for the
assert (the block referencing requested_tensor_parallel) to explicitly exclude
the string 'None' (e.g., add a condition like requested_tensor_parallel !=
'None') so that string 'None' doesn't reach the `requested_tensor_parallel |
int` check and allows the downstream auto-TP logic to run.
🧹 Nitpick comments (6)
automation/test-execution/ansible/requirements.yml (1)

7-10: Align dev collection versions to AWX pins or add CI parity check.

Flexible ranges (>=1.9.0,<2.0.0 for containers.podman, >=1.4.0,<1.6.0 for ansible.posix) allow versions to drift from the exact AWX pins (1.9.4 and 1.5.4 respectively). This creates a risk of "works locally, fails in AWX" regressions. Either lock dev ranges to match AWX pins or add a CI check to prevent parity divergence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/requirements.yml` around lines 7 - 10,
Update the flexible version ranges in requirements.yml so dev collection
versions match AWX pins or add a CI parity check: either change the
containers.podman and ansible.posix entries to exact pins 1.9.4 and 1.5.4
respectively (replace >=1.9.0,<2.0.0 and >=1.4.0,<1.6.0) or add a CI job that
compares this requirements.yml to the canonical AWX pin file
(collections/requirements.yml) and fails on divergence; reference the collection
names containers.podman and ansible.posix and the AWX pin file when implementing
the change.
.github/workflows/unit-tests.yml (1)

26-30: Consider using a pinned test requirements file for reproducible CI runs.

Lines 29 and 58 both install floating versions (pytest ansible pyyaml), which can cause non-deterministic failures over time. Create a shared pinned requirements-test.txt and use it in both the unit-tests and smoke-tests jobs for better maintainability and reproducibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unit-tests.yml around lines 26 - 30, Replace the floating
pip installs with a pinned requirements file: add a committed
requirements-test.txt listing exact versions for pytest, ansible, pyyaml (and
any other test deps), then update the workflow "Install dependencies" steps (the
ones currently running "pip install pytest ansible pyyaml") in both the
unit-tests and smoke-tests jobs to use "pip install -r requirements-test.txt" so
CI uses reproducible, pinned test dependencies.
automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py (1)

28-28: Remove unnecessary f-string prefixes.

Multiple assertion messages use an f-string prefix without placeholders in the first part: f"Message:\n" + "\n".join(errors). The f prefix is unnecessary since there are no interpolations before the concatenation.

💡 Proposed fix pattern (apply to all similar lines)
-        assert not errors, f"YAML validation errors:\n" + "\n".join(errors)
+        assert not errors, "YAML validation errors:\n" + "\n".join(errors)

Or use a single f-string:

-        assert not errors, f"YAML validation errors:\n" + "\n".join(errors)
+        assert not errors, f"YAML validation errors:\n{chr(10).join(errors)}"

Also applies to: 60-60, 91-91, 110-110, 137-137, 165-165

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py` at
line 28, Remove the unnecessary f-string prefixes in the assertion messages in
test_playbook_syntax (the assertions that use f"YAML validation errors:\n" +
"\n".join(errors) and similar patterns); either drop the leading "f" from the
first literal (making it a normal string concatenated with "\n".join(...)) or
combine into a single f-string that interpolates the join (e.g., f"YAML
validation errors:\n{'\n'.join(errors)}"). Update all occurrences matching that
pattern (the assertion lines that build messages by concatenating a literal with
"\n".join(...)) so there are no unused f-prefixes.
automation/test-execution/ansible/tests/smoke/test_model_matrix.py (1)

61-61: Remove unnecessary f-string prefixes (same pattern as other test files).

Same issue as test_playbook_syntax.py - the f-string prefix adds no value when there are no placeholders before the concatenation.

Also applies to: 88-88, 120-120, 136-136, 154-154, 177-177, 193-193

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/tests/smoke/test_model_matrix.py` at line
61, Several assertions in test_model_matrix.py use an unnecessary f-string
prefix before a plain string that's concatenated with "\n".join(errors) (e.g.,
the assertion that starts with assert not errors, f"Model validation errors:\n"
+ "\n".join(errors)); remove the leading f from those string literals so they
become regular strings (e.g., "Model validation errors:\n" + "\n".join(errors)).
Apply the same change to the other similar assertions in this file that use the
f-prefix at lines referenced (the patterns containing "Model validation errors"
+ "\n".join(errors) and the other error message strings) so they match the style
used in other test files.
automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml (1)

302-307: Verify model_dtype fallback behavior when no --dtype argument is present.

If vllm_args_merged contains no --dtype=* argument and model_dtype was not explicitly set upstream, the default('--dtype=auto') ensures a safe fallback. However, this also means model_dtype in metadata may show auto even when vLLM internally selects a specific dtype. Consider documenting this behavior if downstream tooling expects the actual runtime dtype.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml`
around lines 302 - 307, The current set_fact sets model_dtype from
vllm_args_merged with default('--dtype=auto'), which can misrepresent the actual
runtime dtype when vLLM chooses dtype automatically; update the task that sets
model_dtype (and related facts) so that it does not blindly record "auto" as the
dtype—either set model_dtype to an empty/nullable value when no --dtype is
present or add a separate fact like model_dtype_note/runtime_dtype_hint that
records "auto (runtime-selected)" so downstream tooling knows this is a
fallback, and keep references to vllm_args_merged and model_dtype in the updated
logic and documentation.
automation/test-execution/ansible/tests/smoke/test_container_config.py (1)

181-186: Incomplete port structure validation.

The test loads and validates that endpoints.yml exists and is non-empty, but the comment indicates port structure validation is pending. Consider adding the port checks or removing the placeholder comment.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automation/test-execution/ansible/ansible.md`:
- Around line 66-69: Update the Quick Start in ansible.md to instruct users to
install Ansible collections before running setup-platform.yml: add a step after
the Ansible install note that tells the user to change into the
automation/test-execution/ansible folder on the control machine and run
ansible-galaxy collection install -r requirements.yml so the collections listed
in requirements.yml are present prior to running setup-platform.yml.

In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 59-73: Update the "Validate required parameters" task's fail_msg
to reflect the new single-entry behavior: mention that core_sweep_counts must be
a single-element list (e.g., core_sweep_counts=[16]) or recommend using
requested_cores=<N>, and remove the example suggesting multiple values (e.g.,
[16,32,64]); reference the variables requested_cores and core_sweep_counts in
the message so users know which inputs are accepted under the new validation in
the Validate required parameters assertion.

In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`:
- Around line 239-253: The shell task "Stream GuideLLM container logs directly
to file" is redirecting podman output to an unquoted filename built from
workload_type, core_cfg.name and test_run_id which can break if those variables
contain spaces or metacharacters; update the ansible.builtin.shell cmd to quote
the redirection target (e.g. use the Jinja2 quote filter or wrap the generated
path in single quotes) so the >/2>&1 target is treated as a single literal path,
and ensure the subsequent ansible.builtin.fetch src uses the identical
quoted/generated filename so Fetch GuideLLM logs to controller finds the file.

In `@automation/test-execution/ansible/tests/smoke/test_container_config.py`:
- Around line 108-116: The test currently checks only for the Docker-specific
"unable to find image" string in result.stderr before skipping; update the check
in the test_container_config.py test to broaden the stderr matching to include
Podman variants (e.g., "image not known", "no such image", "not found") or use a
regex/any-of-substrings approach against result.stderr.lower() before calling
pytest.skip, so the failure-to-find-image logic around result and
pytest.skip("vLLM image not available locally") correctly handles both Docker
and Podman outputs.

In `@automation/test-execution/ansible/tests/smoke/test_model_matrix.py`:
- Around line 179-193: The test test_embedding_matrix_structure asserts each
entry in embedding_matrix["matrix"]["embedding_models"] contains the fields in
required_fields (name, full_name, dimensions, max_sequence_length); currently
the model definitions lack dimensions and max_sequence_length. Fix by adding
those two keys to every embedding model definition in the embedding model matrix
(embedding_models) with appropriate integer values (e.g., dimensions: <embedding
vector size>, max_sequence_length: <token limit>) or, if those fields are not
applicable, remove them from required_fields in test_embedding_matrix_structure
to match the actual model schema.

In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py`:
- Around line 596-602: The test runner currently calls pytest.main([...]) but
does not propagate its exit code, so update the __main__ block to pass
pytest.main(...) into sys.exit so failures yield non-zero process exit;
specifically, in the if HAS_PYTEST branch replace the standalone call to
pytest.main([__file__, "-v"]) with sys.exit(pytest.main([__file__, "-v"]))
(ensure sys is imported and that the change is made inside the existing if
__name__ == "__main__" / if HAS_PYTEST block referencing HAS_PYTEST,
pytest.main, and sys.exit).
- Around line 20-37: The fallback pytest shim defines class pytest with nested
raises but misses the mark attribute used by decorators like `@pytest.mark.unit`;
add a mark object to the pytest shim (e.g., add an attribute named mark on the
pytest class or module that exposes a unit attribute usable as a decorator) so
imports that reference `@pytest.mark.unit` do not raise AttributeError; ensure the
unit attribute is a callable/decorator that returns the original function
(no-op) and keep the existing raises implementation (refer to the pytest class
and its nested raises in the current shim).

---

Outside diff comments:
In
`@automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml`:
- Around line 18-27: The assert task validating requested_tensor_parallel
(ansible.builtin.assert) can receive the literal string 'None' from CLI and fail
on the `| int` conversion prematurely; update the task's when guards for the
assert (the block referencing requested_tensor_parallel) to explicitly exclude
the string 'None' (e.g., add a condition like requested_tensor_parallel !=
'None') so that string 'None' doesn't reach the `requested_tensor_parallel |
int` check and allows the downstream auto-TP logic to run.

---

Nitpick comments:
In @.github/workflows/unit-tests.yml:
- Around line 26-30: Replace the floating pip installs with a pinned
requirements file: add a committed requirements-test.txt listing exact versions
for pytest, ansible, pyyaml (and any other test deps), then update the workflow
"Install dependencies" steps (the ones currently running "pip install pytest
ansible pyyaml") in both the unit-tests and smoke-tests jobs to use "pip install
-r requirements-test.txt" so CI uses reproducible, pinned test dependencies.

In `@automation/test-execution/ansible/requirements.yml`:
- Around line 7-10: Update the flexible version ranges in requirements.yml so
dev collection versions match AWX pins or add a CI parity check: either change
the containers.podman and ansible.posix entries to exact pins 1.9.4 and 1.5.4
respectively (replace >=1.9.0,<2.0.0 and >=1.4.0,<1.6.0) or add a CI job that
compares this requirements.yml to the canonical AWX pin file
(collections/requirements.yml) and fails on divergence; reference the collection
names containers.podman and ansible.posix and the AWX pin file when implementing
the change.

In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml`:
- Around line 302-307: The current set_fact sets model_dtype from
vllm_args_merged with default('--dtype=auto'), which can misrepresent the actual
runtime dtype when vLLM chooses dtype automatically; update the task that sets
model_dtype (and related facts) so that it does not blindly record "auto" as the
dtype—either set model_dtype to an empty/nullable value when no --dtype is
present or add a separate fact like model_dtype_note/runtime_dtype_hint that
records "auto (runtime-selected)" so downstream tooling knows this is a
fallback, and keep references to vllm_args_merged and model_dtype in the updated
logic and documentation.

In `@automation/test-execution/ansible/tests/smoke/test_model_matrix.py`:
- Line 61: Several assertions in test_model_matrix.py use an unnecessary
f-string prefix before a plain string that's concatenated with "\n".join(errors)
(e.g., the assertion that starts with assert not errors, f"Model validation
errors:\n" + "\n".join(errors)); remove the leading f from those string literals
so they become regular strings (e.g., "Model validation errors:\n" +
"\n".join(errors)). Apply the same change to the other similar assertions in
this file that use the f-prefix at lines referenced (the patterns containing
"Model validation errors" + "\n".join(errors) and the other error message
strings) so they match the style used in other test files.

In `@automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py`:
- Line 28: Remove the unnecessary f-string prefixes in the assertion messages in
test_playbook_syntax (the assertions that use f"YAML validation errors:\n" +
"\n".join(errors) and similar patterns); either drop the leading "f" from the
first literal (making it a normal string concatenated with "\n".join(...)) or
combine into a single f-string that interpolates the join (e.g., f"YAML
validation errors:\n{'\n'.join(errors)}"). Update all occurrences matching that
pattern (the assertion lines that build messages by concatenating a literal with
"\n".join(...)) so there are no unused f-prefixes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15043db5-287d-468d-b6b2-d6948c171ad5

📥 Commits

Reviewing files that changed from the base of the PR and between b4126fb and fce2e05.

📒 Files selected for processing (22)
  • .github/workflows/unit-tests.yml
  • automation/test-execution/ansible/README.md
  • automation/test-execution/ansible/ansible.md
  • automation/test-execution/ansible/filter_plugins/cpu_utils.py
  • automation/test-execution/ansible/llm-benchmark-auto.yml
  • automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
  • automation/test-execution/ansible/llm-benchmark.yml
  • automation/test-execution/ansible/llm-core-sweep-auto.yml
  • automation/test-execution/ansible/requirements.yml
  • automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
  • automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml
  • automation/test-execution/ansible/roles/common/tasks/detect-numa-topology.yml
  • automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml
  • automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
  • automation/test-execution/ansible/tests/README.md
  • automation/test-execution/ansible/tests/pytest.ini
  • automation/test-execution/ansible/tests/smoke/__init__.py
  • automation/test-execution/ansible/tests/smoke/test_container_config.py
  • automation/test-execution/ansible/tests/smoke/test_model_matrix.py
  • automation/test-execution/ansible/tests/smoke/test_playbook_syntax.py
  • automation/test-execution/ansible/tests/unit/test_cpu_utils.py
  • collections/requirements.yml
💤 Files with no reviewable changes (1)
  • automation/test-execution/ansible/README.md

Comment on lines +66 to +69
- **Option 2 (setup-platform.yml)**: Automatically installs Podman, Python 3,
and all performance tools on **DUT and Load Generator hosts**. Your Ansible
control machine only needs Ansible itself and the collections from
requirements.yml.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add the collection install command to Quick Start.

These notes now say the control machine needs the collections from requirements.yml, but the setup steps never show how to install them. A first-time user can follow this page exactly and still hit missing collection errors on the first setup-platform.yml run.

Suggested addition near the Ansible install step
cd automation/test-execution/ansible
ansible-galaxy collection install -r requirements.yml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/ansible.md` around lines 66 - 69, Update
the Quick Start in ansible.md to instruct users to install Ansible collections
before running setup-platform.yml: add a step after the Ansible install note
that tells the user to change into the automation/test-execution/ansible folder
on the control machine and run ansible-galaxy collection install -r
requirements.yml so the collections listed in requirements.yml are present prior
to running setup-platform.yml.

Comment on lines +59 to +73
- name: Validate required parameters
ansible.builtin.assert:
that:
- >-
(requested_cores | default(0) | int > 0) or
(core_sweep_counts is defined and core_sweep_counts is not none and core_sweep_counts | length > 0)
fail_msg: |
Missing required parameter: requested_cores or core_sweep_counts
Please provide either:
- Single core count: -e "requested_cores=<8|16|32|64|...>"
- Core sweep: -e "core_sweep_counts=[16,32,64]"

The concurrent load test requires at least one core configuration to test.
This playbook will run 3 phases with the specified core configuration(s).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make the missing-parameter help text match the new single-entry behavior.

This message still tells users to pass core_sweep_counts=[16,32,64], but Lines 78-90 now reject more than one value. That makes the primary remediation path fail immediately.

Suggested fix
-            - Core sweep: -e "core_sweep_counts=[16,32,64]"
+            - Single-entry core list: -e "core_sweep_counts=[16]"
@@
-          This playbook will run 3 phases with the specified core configuration(s).
+          This playbook will run 3 phases with the specified core configuration.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Validate required parameters
ansible.builtin.assert:
that:
- >-
(requested_cores | default(0) | int > 0) or
(core_sweep_counts is defined and core_sweep_counts is not none and core_sweep_counts | length > 0)
fail_msg: |
Missing required parameter: requested_cores or core_sweep_counts
Please provide either:
- Single core count: -e "requested_cores=<8|16|32|64|...>"
- Core sweep: -e "core_sweep_counts=[16,32,64]"
The concurrent load test requires at least one core configuration to test.
This playbook will run 3 phases with the specified core configuration(s).
- name: Validate required parameters
ansible.builtin.assert:
that:
- >-
(requested_cores | default(0) | int > 0) or
(core_sweep_counts is defined and core_sweep_counts is not none and core_sweep_counts | length > 0)
fail_msg: |
Missing required parameter: requested_cores or core_sweep_counts
Please provide either:
- Single core count: -e "requested_cores=<8|16|32|64|...>"
- Single-entry core list: -e "core_sweep_counts=[16]"
The concurrent load test requires at least one core configuration to test.
This playbook will run 3 phases with the specified core configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around
lines 59 - 73, Update the "Validate required parameters" task's fail_msg to
reflect the new single-entry behavior: mention that core_sweep_counts must be a
single-element list (e.g., core_sweep_counts=[16]) or recommend using
requested_cores=<N>, and remove the example suggesting multiple values (e.g.,
[16,32,64]); reference the variables requested_cores and core_sweep_counts in
the message so users know which inputs are accepted under the new validation in
the Validate required parameters assertion.

Comment on lines +239 to 253
- name: Stream GuideLLM container logs directly to file
ansible.builtin.shell:
cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} 2>&1"
cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
args:
executable: /bin/bash
register: guidellm_full_logs
changed_when: false
failed_when: false
when:
- use_guidellm_container | bool
- container_exit_code.stdout is defined

- name: Save GuideLLM logs to temporary file on remote
ansible.builtin.copy:
content: "{{ guidellm_full_logs.stdout }}"
dest: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
mode: "0644"
when:
- use_guidellm_container | bool
- guidellm_full_logs is defined
- guidellm_full_logs.stdout is defined

- name: Fetch GuideLLM logs to controller
ansible.builtin.fetch:
src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
dest: "{{ results_path }}/guidellm.log"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd automation/test-execution/ansible/roles/benchmark_guidellm/tasks && cat -n main.yml | sed -n '230,260p'

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 1271


🏁 Script executed:

# Check how workload_type, core_cfg.name, and test_run_id are defined/used
rg -n 'workload_type|core_cfg\.name|test_run_id' automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml | head -30

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 1466


🏁 Script executed:

# Check the broader context to understand variable sources
rg -n 'set_fact|register|vars:' automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml | head -20

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 729


Quote the redirected log filename to prevent shell interpretation of special characters.

The container name is quoted on line 241, but the output path is not. Since core_cfg.name, workload_type, and test_run_id are configuration-derived variables, they may contain spaces or shell metacharacters that would break the redirect or cause unintended file creation.

Suggested fix
 - name: Stream GuideLLM container logs directly to file
   ansible.builtin.shell:
-    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
+    cmd: >-
+      podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }}
+      > {{ ('/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log') | quote }} 2>&1
@@
 - name: Fetch GuideLLM logs to controller
   ansible.builtin.fetch:
-    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
+    src: "{{ '/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log' }}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Stream GuideLLM container logs directly to file
ansible.builtin.shell:
cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} 2>&1"
cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
args:
executable: /bin/bash
register: guidellm_full_logs
changed_when: false
failed_when: false
when:
- use_guidellm_container | bool
- container_exit_code.stdout is defined
- name: Save GuideLLM logs to temporary file on remote
ansible.builtin.copy:
content: "{{ guidellm_full_logs.stdout }}"
dest: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
mode: "0644"
when:
- use_guidellm_container | bool
- guidellm_full_logs is defined
- guidellm_full_logs.stdout is defined
- name: Fetch GuideLLM logs to controller
ansible.builtin.fetch:
src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
dest: "{{ results_path }}/guidellm.log"
- name: Stream GuideLLM container logs directly to file
ansible.builtin.shell:
cmd: >-
podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }}
> {{ ('/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log') | quote }} 2>&1
args:
executable: /bin/bash
changed_when: false
failed_when: false
when:
- use_guidellm_container | bool
- container_exit_code.stdout is defined
- name: Fetch GuideLLM logs to controller
ansible.builtin.fetch:
src: "{{ '/tmp/guidellm-' ~ workload_type ~ '-' ~ core_cfg.name ~ '-' ~ test_run_id ~ '.log' }}"
dest: "{{ results_path }}/guidellm.log"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`
around lines 239 - 253, The shell task "Stream GuideLLM container logs directly
to file" is redirecting podman output to an unquoted filename built from
workload_type, core_cfg.name and test_run_id which can break if those variables
contain spaces or metacharacters; update the ansible.builtin.shell cmd to quote
the redirection target (e.g. use the Jinja2 quote filter or wrap the generated
path in single quotes) so the >/2>&1 target is treated as a single literal path,
and ensure the subsequent ansible.builtin.fetch src uses the identical
quoted/generated filename so Fetch GuideLLM logs to controller finds the file.

Comment on lines +108 to +116
# If image doesn't exist, skip (don't auto-pull in tests)
if "unable to find image" in result.stderr.lower():
pytest.skip("vLLM image not available locally")

if result.returncode == 0:
assert (
"vllm serve" in result.stdout
or "usage:" in result.stdout.lower()
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docker-specific error message may not match Podman output.

The check for "unable to find image" is a Docker-specific error message. Podman typically uses different wording (e.g., "image not known" or "no such image"). This may cause the test to fail differently on Podman-only systems.

💡 Proposed fix for cross-runtime compatibility
         # If image doesn't exist, skip (don't auto-pull in tests)
-        if "unable to find image" in result.stderr.lower():
+        if any(msg in result.stderr.lower() for msg in [
+            "unable to find image",  # docker
+            "image not known",        # podman
+            "no such image",          # podman alternative
+        ]):
             pytest.skip("vLLM image not available locally")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# If image doesn't exist, skip (don't auto-pull in tests)
if "unable to find image" in result.stderr.lower():
pytest.skip("vLLM image not available locally")
if result.returncode == 0:
assert (
"vllm serve" in result.stdout
or "usage:" in result.stdout.lower()
)
# If image doesn't exist, skip (don't auto-pull in tests)
if any(msg in result.stderr.lower() for msg in [
"unable to find image", # docker
"image not known", # podman
"no such image", # podman alternative
]):
pytest.skip("vLLM image not available locally")
if result.returncode == 0:
assert (
"vllm serve" in result.stdout
or "usage:" in result.stdout.lower()
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/tests/smoke/test_container_config.py`
around lines 108 - 116, The test currently checks only for the Docker-specific
"unable to find image" string in result.stderr before skipping; update the check
in the test_container_config.py test to broaden the stderr matching to include
Podman variants (e.g., "image not known", "no such image", "not found") or use a
regex/any-of-substrings approach against result.stderr.lower() before calling
pytest.skip, so the failure-to-find-image logic around result and
pytest.skip("vLLM image not available locally") correctly handles both Docker
and Podman outputs.

Comment on lines +20 to +37
# Mock pytest.raises for fallback tests
class pytest:

class raises:

def __init__(self, exc):
self.exc = exc

def __enter__(self):
return self

def __exit__(self, exc_type, exc_value, traceback):
# Check if exception matches expected type
if exc_type is not None and issubclass(exc_type, self.exc):
# Suppress the exception by returning True
return True
# Let unexpected exceptions propagate
return False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the file
fd -t f "test_cpu_utils.py" --exec wc -l {} \;

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 204


🏁 Script executed:

# Read lines 15-50 to see the fallback shim and context
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '15,50p'

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 1288


🏁 Script executed:

# Check for `@pytest.mark.unit` decorators in the file
rg "@pytest\.mark\.unit" automation/test-execution/ansible/tests/unit/test_cpu_utils.py

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 233


🏁 Script executed:

# Find the exact lines where `@pytest.mark.unit` is used
rg -n "@pytest\.mark\.unit" automation/test-execution/ansible/tests/unit/test_cpu_utils.py | head -5

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 181


🏁 Script executed:

# Show the context around the first occurrence of `@pytest.mark.unit` to confirm the issue
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '110,125p'

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 720


🏁 Script executed:

# Verify if there are any other uses of pytest.mark besides .unit
rg -n "pytest\.mark\." automation/test-execution/ansible/tests/unit/test_cpu_utils.py

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 269


Add mark.unit to the fallback pytest shim to prevent import failure.

When pytest is not installed, the module cannot be imported because @pytest.mark.unit decorators start at line 113, but the fallback shim (lines 20–37) only defines raises, not mark. This causes AttributeError during module import, so the fallback runner never runs.

Suggested fix
 except ImportError:
     HAS_PYTEST = False

     # Mock pytest.raises for fallback tests
     class pytest:
+        class mark:
+            `@staticmethod`
+            def unit(obj):
+                return obj

         class raises:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines
20 - 37, The fallback pytest shim defines class pytest with nested raises but
misses the mark attribute used by decorators like `@pytest.mark.unit`; add a mark
object to the pytest shim (e.g., add an attribute named mark on the pytest class
or module that exposes a unit attribute usable as a decorator) so imports that
reference `@pytest.mark.unit` do not raise AttributeError; ensure the unit
attribute is a callable/decorator that returns the original function (no-op) and
keep the existing raises implementation (refer to the pytest class and its
nested raises in the current shim).

Comment on lines +596 to +602
if __name__ == "__main__":
# Run tests if pytest not available
import sys

if HAS_PYTEST:
pytest.main([__file__, "-v"])
else:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate and read the specific file and lines
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '596,602p'

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 270


🏁 Script executed:

# Get more context around these lines to understand the full picture
cat -n automation/test-execution/ansible/tests/unit/test_cpu_utils.py | sed -n '590,610p'

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 791


🌐 Web query:

pytest.main() return value exit code

💡 Result:

pytest.main returns an exit code as either an int or a pytest.ExitCode enum value (since pytest 5.0). These correspond to the standard pytest exit codes: - 0 / ExitCode.OK: All tests passed - 1 / ExitCode.TESTS_FAILED: Some tests failed - 2 / ExitCode.INTERRUPTED: Interrupted by user - 3 / ExitCode.INTERNAL_ERROR: Internal error - 4 / ExitCode.USAGE_ERROR: Command line usage error - 5 / ExitCode.NO_TESTS_COLLECTED: No tests collected Unlike command-line invocation, pytest.main does not raise SystemExit but returns the code directly for programmatic use. The enum values have integer values matching these codes and can be imported via from pytest import ExitCode.

Citations:


Return pytest.main()'s exit status.

pytest.main() returns an exit code that should be passed to sys.exit(). Without it, python3 test_cpu_utils.py exits with status 0 even when tests fail, breaking CI/CD integration.

Suggested fix
 if __name__ == "__main__":
-    # Run tests if pytest not available
-    import sys
-
     if HAS_PYTEST:
-        pytest.main([__file__, "-v"])
+        sys.exit(pytest.main([__file__, "-v"]))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if __name__ == "__main__":
# Run tests if pytest not available
import sys
if HAS_PYTEST:
pytest.main([__file__, "-v"])
else:
if __name__ == "__main__":
# Run tests if pytest not available
import sys
if HAS_PYTEST:
sys.exit(pytest.main([__file__, "-v"]))
else:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines
596 - 602, The test runner currently calls pytest.main([...]) but does not
propagate its exit code, so update the __main__ block to pass pytest.main(...)
into sys.exit so failures yield non-zero process exit; specifically, in the if
HAS_PYTEST branch replace the standalone call to pytest.main([__file__, "-v"])
with sys.exit(pytest.main([__file__, "-v"])) (ensure sys is imported and that
the change is made inside the existing if __name__ == "__main__" / if HAS_PYTEST
block referencing HAS_PYTEST, pytest.main, and sys.exit).

maryamtahhan and others added 2 commits March 31, 2026 11:02
This commit fixes two configuration bugs found by smoke tests:

Bug #1: Remove OPT models (opt-125m, opt-1.3b)
- Context length (2048) incompatible with summarization workload (4096)
- Legacy models causing validation failures
- Reduced model count from 8 to 6 LLM models

Changes:
- Removed opt-125m and opt-1.3b from model-matrix.yaml
- Updated README.md: 8 → 6 LLM models
- Updated docs/methodology/overview.md: 8 → 6 LLM models
- Removed OPT Family section from models/models.md
- Removed all OPT entries from model tables and test scenarios
- Removed "Decode-Heavy Models" section (was OPT-only)

Bug #2: Add missing embedding model fields
- granite-embedding-english-r2: added dimensions=384, max_sequence_length=512
- granite-embedding-278m-multilingual: added dimensions=768, max_sequence_length=512

These fixes ensure all model configurations pass validation tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant