Fix/model matrix validation errors #87
Conversation
…rdening

This commit improves the existing Ansible playbook infrastructure for vLLM CPU performance evaluation with enhanced AWX compatibility, security hardening, and comprehensive testing.

- Fix type normalization to handle AnsibleUnsafeText from AWX
- Fix allocated_nodes to return integers instead of strings
- Handle empty strings and Jinja2 None conversions properly
- Simplify node eligibility checking and allocation logic
- Improve error messages for better validation feedback
- Add no_log: true to all tasks handling HF_TOKEN
- Prevent token exposure in container start operations
- Secure environment variable handling in AWX jobs
- Add comprehensive unit tests for cpu_utils filter plugin (598 lines)
- Test coverage for: CPU range conversion, NUMA extraction, multi-NUMA allocation, OMP binding, and real-world scenarios
- Support for both pytest and standalone execution
- Add collections/requirements.yml for Ansible collection dependencies
- Better parameter validation for AWX jobs in concurrent load testing
- AWX detection for result path handling
- Improved NUMA topology detection in core sweep
- Enhanced result path consistency in main benchmark
- Better workload configuration handling
- Simplify prerequisites section
- Update examples with current best practices
- Clearer workflow documentation

Files changed: 13 files, 772 insertions(+), 323 deletions(-)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
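The AWX type-normalization fix described above can be sketched in Python. This is an illustrative helper, not the repository's actual cpu_utils.py code; the function name and the exact coercion rules are assumptions based on the commit message (AnsibleUnsafeText is a `str` subclass, and Jinja2 renders unset values as `''` or `'None'`):

```python
def normalize_int(value, default=None):
    """Coerce AWX/Jinja2-serialized values to int (hypothetical helper).

    AWX passes extra_vars as AnsibleUnsafeText (a str subclass), and Jinja2
    renders undefined/None values as '' or 'None', so a filter plugin cannot
    assume it receives real integers.
    """
    if value is None:
        return default
    # Real ints pass through; bool is excluded since bool subclasses int.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    text = str(value).strip()  # AnsibleUnsafeText -> plain str
    if text in ("", "None", "none", "null"):
        return default  # treat Jinja2 "empty" renderings as unset
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"expected an integer-like value, got {value!r}")
```

The same normalization explains the allocated_nodes fix: returning `int(node_id)` instead of the raw string keeps downstream comparisons type-consistent.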
Remove undefined 'short_codegen' workload from validation lists to prevent KeyError during test execution. Add CPU allocation validation to detect under-allocation early with clear error messages. Fix pytest.raises fallback to properly suppress exceptions when pytest is unavailable. Pin exact collection versions for reproducible AWX/CI runs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
- Fix inconsistent output filename reference in ansible.md (benchmarks.html → benchmarks.json)
- Improve requested_cores validation to handle non-numeric input safely
- Document requirements files usage (AWX/production vs CLI/development)
- Fix hardcoded log_collection_dest to use centralized local_results_base
- Remove obsolete vLLM configuration fields from test metadata docs
- Improve node capacity validation with sorting and per-node checks in cpu_utils.py
- Refactor duplicate AWX_JOB_ID lookup in llm-core-sweep-auto.yml
- Add NUMA node id type coercion for consistent integer types

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
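A minimal sketch of what the sorted, per-node capacity check might look like. The helper name and the shape of the `nodes` mapping are hypothetical, not the actual cpu_utils.py interface:

```python
def validate_node_capacity(nodes, cores_per_node):
    """Check every selected NUMA node can supply the cores asked of it.

    nodes: mapping of NUMA node id -> number of available cores
    cores_per_node: cores the allocation wants from each node
    Returns the node ids (sorted) that will be used, or raises ValueError
    naming the first node that falls short, so under-allocation is caught
    before any container is started.
    """
    for node_id in sorted(nodes):  # deterministic order for stable errors
        available = nodes[node_id]
        if available < cores_per_node:
            raise ValueError(
                f"NUMA node {node_id} has {available} cores available, "
                f"but {cores_per_node} were requested per node"
            )
    return sorted(nodes)
```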
The previous commit changed NUMA node ids from strings to integers, but the selectattr filters were still trying to match them as strings, causing "No first item, sequence was empty" errors. Remove the `| string` filter from selectattr comparisons to match integer ids correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
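The failure mode can be illustrated with a hypothetical topology fact (variable names below are illustrative, not the playbook's actual ones). Once node ids are integers, coercing the comparison value to a string means `selectattr` never matches anything, and `first` then raises "No first item, sequence was empty":

```yaml
# Given ids that are now integers:
#   numa_nodes: [{id: 0, cpus: "0-15"}, {id: 1, cpus: "16-31"}]

# Before: node_id coerced to a string never equals an integer id
# node_cpus: "{{ (numa_nodes | selectattr('id', 'equalto', node_id | string) | first).cpus }}"

# After: compare against the integer id directly
node_cpus: "{{ (numa_nodes | selectattr('id', 'equalto', node_id | int) | first).cpus }}"
```

Jinja2's `equalto` test uses `==`, and in Python `0 == "0"` is False, which is why the string coercion silently empties the sequence.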
- Fix allocate_with_fixed_tp to filter nodes by capacity (cpu_utils.py)
- Update Results summary to use local_results_base (llm-benchmark-auto.yml)
- Implement core_sweep_counts parameter handling (llm-benchmark-concurrent-load.yml)
  - Normalize core_sweep_counts to requested_cores for single-element lists
  - Reject multi-element lists with helpful error directing to llm-core-sweep-auto.yml
  - Pass effective_requested_cores to all 3 phases
- Restore model configuration facts in start-llm.yml for metadata collection
  - Set model_kv_cache, model_dtype, model_max_len, vllm_caching_mode
  - Fixes compatibility with get-vllm-config-from-dut.yml assertions
- Add regression tests for serialized TP inputs (test_cpu_utils.py)
  - Test empty string '' behaves like auto-TP
  - Test string 'None' behaves like auto-TP

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
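The core_sweep_counts normalization rule reads well as a small Python sketch. The helper is hypothetical; the playbook implements this logic in Jinja2/Ansible, not Python:

```python
def effective_requested_cores(core_sweep_counts, requested_cores=None):
    """Normalize core_sweep_counts for the concurrent-load playbook.

    A single-element list collapses to a plain requested_cores value;
    multi-element sweeps are rejected with an error pointing at the
    dedicated core-sweep playbook instead.
    """
    if not core_sweep_counts:
        return requested_cores  # nothing to normalize; keep the explicit value
    if len(core_sweep_counts) == 1:
        return int(core_sweep_counts[0])  # [16] behaves like requested_cores=16
    raise ValueError(
        "core_sweep_counts has multiple values; "
        "use llm-core-sweep-auto.yml for core sweeps"
    )
```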
…ainer start

Addresses two critical issues in vLLM configuration:

1. Per-model overrides (model_kv_cache_space, model_dtype) now properly override workload defaults instead of being ignored
2. Added preflight validation of --max-model-len against workload requirements (ISL + OSL) to fail early with clear error messages

Both fixes ensure configuration errors are caught early and per-model settings take precedence over workload fallbacks.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
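The preflight check in item 2 amounts to one comparison. A sketch under assumptions (the helper name is hypothetical; the real validation lives in the container-start tasks, and ISL/OSL are the workload's input and output sequence lengths):

```python
def check_max_model_len(max_model_len, isl, osl, model_name="model"):
    """Preflight: the context window must hold a full request.

    If ISL + OSL exceeds --max-model-len, vLLM would fail mid-benchmark,
    so fail early with a message naming the shortfall.
    """
    required = isl + osl
    if max_model_len < required:
        raise ValueError(
            f"{model_name}: --max-model-len {max_model_len} is smaller than "
            f"workload requirement ISL({isl}) + OSL({osl}) = {required}"
        )
```

This is also the check the OPT removal below falls out of: a 2048-token context can never satisfy a summarization workload that needs 4096 tokens per request.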
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
This commit fixes two configuration bugs found by smoke tests:

Bug #1: Remove OPT models (opt-125m, opt-1.3b)

- Context length (2048) incompatible with summarization workload (4096)
- Legacy models causing validation failures
- Reduced model count from 8 to 6 LLM models

Changes:

- Removed opt-125m and opt-1.3b from model-matrix.yaml
- Updated README.md: 8 → 6 LLM models
- Updated docs/methodology/overview.md: 8 → 6 LLM models
- Removed OPT Family section from models/models.md
- Removed all OPT entries from model tables and test scenarios
- Removed "Decode-Heavy Models" section (was OPT-only)

Bug #2: Add missing embedding model fields

- granite-embedding-english-r2: added dimensions=384, max_sequence_length=512
- granite-embedding-278m-multilingual: added dimensions=768, max_sequence_length=512

These fixes ensure all model configurations pass validation tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
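For reference, the added embedding fields take roughly this shape in a model-matrix entry. The `dimensions` and `max_sequence_length` values are from the commit message; the surrounding keys and list name are illustrative, not the file's exact schema:

```yaml
# models/embedding-models/model-matrix.yaml (shape only)
embedding_models:
  - name: granite-embedding-english-r2
    dimensions: 384            # added by this commit
    max_sequence_length: 512   # added by this commit
  - name: granite-embedding-278m-multilingual
    dimensions: 768            # added by this commit
    max_sequence_length: 512   # added by this commit
```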
📝 Walkthrough

This pull request updates Ansible playbooks to detect AWX runtime environments and dynamically configure result paths, removes legacy OPT models from the model matrix, refactors CPU core allocation logic with improved NUMA handling, eliminates vLLM configuration metadata collection from playbooks, and adds comprehensive unit test coverage for CPU utilities.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 inconclusive)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
models/models.md (1)
128-138: ⚠️ Potential issue | 🟡 Minor: Model count mismatch in documentation.
The header states "Current LLM Models (9 total)" but the table only lists 7 models:
- Llama-3.2-1B-Instruct
- Llama-3.2-3B-Instruct
- TinyLlama-1.1B-Chat
- granite-3.2-2b-instruct
- Qwen/Qwen3-0.6B
- Qwen/Qwen2.5-3B-Instruct
- openai/gpt-oss-20b
Update the header to reflect the actual count.
📝 Proposed fix
```diff
-## Current LLM Models (9 total)
+## Current LLM Models (7 total)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@models/models.md` around lines 128 - 138, The header "Current LLM Models (9 total)" is incorrect relative to the table rows; update that header to "Current LLM Models (7 total)" so it matches the listed models (Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, TinyLlama-1.1B-Chat, granite-3.2-2b-instruct, Qwen/Qwen3-0.6B, Qwen/Qwen2.5-3B-Instruct, openai/gpt-oss-20b) by editing the header text in the models.md file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Around line 144-146: The display logic in allocate-cores-from-count.yml relies
on checking "requested_tensor_parallel is not defined" to decide between
"(auto-calculated)" and "(user-specified)", but llm-benchmark-auto.yml sets
requested_tensor_parallel (and related requested_omp_* variables) using
default(None), which means the variable is always defined and causes the message
to incorrectly show "(user-specified)". Fix by restoring the original omission
behavior: change the defaults for requested_tensor_parallel,
requested_omp_threads, and requested_omp_bind back to default(omit) in the
llm-benchmark-auto.yml diff so the "is not defined" check in
allocate-cores-from-count.yml works as intended (alternatively, update the
display logic to test for "is not none" if you prefer defined-but-None
semantics).
In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`:
- Around line 208-214: The wait task "Wait for GuideLLM benchmark to complete
(containerized)" currently aborts the play on podman wait errors; change it to
be non-fatal by removing/setting failed_when: false (keep changed_when: false)
while still registering guidellm_exit_code_container (using timeout {{
guidellm_wait_timeout_seconds }} podman wait guidellm-{{ workload_type }}-{{
core_cfg.name }} and when: use_guidellm_container | bool); then update the
unified failure check after the diagnostic/cleanup steps to evaluate
guidellm_exit_code_container.rc and treat nonzero as a failure only once
logs/inspect/fetch/state: absent have run.
- Around line 239-257: The "Stream GuideLLM container logs directly to file"
task builds an unsafe /tmp filename using test_run_id
(command-injection/arbitrary-write) and leaves files behind; replace this by
creating a managed temp file with ansible.builtin.tempfile (register e.g.
guidellm_log_file), update the shell task to redirect podman logs into
guidellm_log_file.path instead of the interpolated /tmp path, update the "Fetch
GuideLLM logs to controller" ansible.builtin.fetch src to use
guidellm_log_file.path, and add a cleanup task to remove the temp file on the
remote host after fetching to avoid disk leaks.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py`:
- Around line 20-37: The mock pytest.raises currently returns True when no
exception is raised, causing false positives; update the pytest.raises.__exit__
to raise an AssertionError (or appropriate test failure) when exc_type is None
so it fails if no exception occurred, while still suppressing the expected
exception when issubclass(exc_type, self.exc) returns True and letting
unexpected exceptions propagate; modify the __exit__ logic in the pytest.raises
class to: if exc_type is None: raise AssertionError(f"Did not raise
{self.exc}"), elif issubclass(exc_type, self.exc): return True, else return
False.
In `@docs/methodology/overview.md`:
- Line 174: The documentation line that reads "Tests 6 LLM models + 2 embedding
models" is out of sync with the test matrix which lists seven LLMs; update that
text to "Tests 7 LLM models + 2 embedding models" (or otherwise change the
sentence to reflect 7 LLM models) so it matches
models/llm-models/model-matrix.yaml and the listed model names (e.g.,
meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct,
TinyLlama/TinyLlama-1.1B-Chat-v1.0, ibm-granite/granite-3.2-2b-instruct,
Qwen/Qwen3-0.6B, Qwen/Qwen2.5-3B-Instruct, openai/gpt-oss-20b).
In `@README.md`:
- Line 161: The LLM count line "6 LLM models + 2 embedding models" is
inconsistent with the listed models because gpt-oss-20b is included; update that
summary to "7 LLM models + 2 embedding models" (or remove gpt-oss-20b from the
list if the intent is to keep six) by editing the summary string that contains
"6 LLM models + 2 embedding models" and ensure it matches the presence of
"gpt-oss-20b" in the model list.
- Around line 181-186: The bulleted LLM list and header "LLM Models (6 total):"
are inconsistent with models/llm-models/model-matrix.yaml which includes
gpt-oss-20b; either add a new bullet "gpt-oss-20b - Large open-source (20B)" to
the list and change the header to "LLM Models (7 total):", or keep the existing
bullets but update the header and any summary count (the summary near the "Large
model support" section) to "7" so the README and model-matrix.yaml match; update
the "LLM Models" header and the bulleted list (the lines containing the listed
model names) accordingly.
---
Outside diff comments:
In `@models/models.md`:
- Around line 128-138: The header "Current LLM Models (9 total)" is incorrect
relative to the table rows; update that header to "Current LLM Models (7 total)"
so it matches the listed models (Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct,
TinyLlama-1.1B-Chat, granite-3.2-2b-instruct, Qwen/Qwen3-0.6B,
Qwen/Qwen2.5-3B-Instruct, openai/gpt-oss-20b) by editing the header text in the
models.md file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5bdc56bb-5f27-46f5-89c3-67029da1c3ca
📒 Files selected for processing (20)

- README.md
- automation/test-execution/ansible/README.md
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/filter_plugins/cpu_utils.py
- automation/test-execution/ansible/llm-benchmark-auto.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/llm-benchmark.yml
- automation/test-execution/ansible/llm-core-sweep-auto.yml
- automation/test-execution/ansible/requirements.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml
- automation/test-execution/ansible/roles/common/tasks/detect-numa-topology.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
- automation/test-execution/ansible/tests/unit/test_cpu_utils.py
- collections/requirements.yml
- docs/methodology/overview.md
- models/embedding-models/model-matrix.yaml
- models/llm-models/model-matrix.yaml
- models/models.md
💤 Files with no reviewable changes (2)
- automation/test-execution/ansible/README.md
- models/llm-models/model-matrix.yaml
```yaml
requested_tensor_parallel: "{{ hostvars['localhost']['requested_tensor_parallel'] | default(None) }}"
requested_omp_threads: "{{ hostvars['localhost']['requested_omp_threads'] | default(None) }}"
requested_omp_bind: "{{ hostvars['localhost']['requested_omp_bind'] | default(None) }}"
```
Display message may incorrectly indicate "user-specified" tensor parallel.
Changing from default(omit) to default(None) affects the display logic at allocate-cores-from-count.yml line 65, which uses requested_tensor_parallel is not defined to show "(auto-calculated)" vs "(user-specified)". With default(None), the variable is always defined, so the message will incorrectly show "(user-specified)" even when the user didn't provide a value.
This is a minor display issue that doesn't affect functionality, but may confuse users reviewing benchmark output.
Suggested fix in allocate-cores-from-count.yml line 65
```diff
-  - "Tensor Parallel: {{ core_allocation_result.tensor_parallel }}{{ ' (auto-calculated)' if requested_tensor_parallel is not defined else ' (user-specified)' }}"
+  - "Tensor Parallel: {{ core_allocation_result.tensor_parallel }}{{ ' (auto-calculated)' if (requested_tensor_parallel is not defined or requested_tensor_parallel == None or requested_tensor_parallel == '') else ' (user-specified)' }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-auto.yml` around lines 144 -
146, The display logic in allocate-cores-from-count.yml relies on checking
"requested_tensor_parallel is not defined" to decide between "(auto-calculated)"
and "(user-specified)", but llm-benchmark-auto.yml sets
requested_tensor_parallel (and related requested_omp_* variables) using
default(None), which means the variable is always defined and causes the message
to incorrectly show "(user-specified)". Fix by restoring the original omission
behavior: change the defaults for requested_tensor_parallel,
requested_omp_threads, and requested_omp_bind back to default(omit) in the
llm-benchmark-auto.yml diff so the "is not defined" check in
allocate-cores-from-count.yml works as intended (alternatively, update the
display logic to test for "is not none" if you prefer defined-but-None
semantics).
```diff
 - name: Wait for GuideLLM benchmark to complete (containerized)
   ansible.builtin.command:
-    cmd: "timeout {{ guidellm_wait_timeout_seconds | default(guidellm_wait_timeout) }} podman wait guidellm-{{ workload_type }}-{{ core_cfg.name }}"
+    cmd: "timeout {{ guidellm_wait_timeout_seconds }} podman wait guidellm-{{ workload_type }}-{{ core_cfg.name }}"
   register: guidellm_exit_code_container
   changed_when: false
-  failed_when: guidellm_exit_code_container.rc == 124
+  failed_when: guidellm_exit_code_container.rc != 0
   when: use_guidellm_container | bool
```
Don’t fail the play before the diagnostic and cleanup tasks run.
Line 213 makes any podman wait error terminal, so the later inspect, log capture, fetch, and state: absent tasks never execute on the cases you most need them. Keep this step non-fatal, then fold guidellm_exit_code_container.rc into the unified failure path after cleanup finishes.
🩹 Minimal change

```diff
   register: guidellm_exit_code_container
   changed_when: false
-  failed_when: guidellm_exit_code_container.rc != 0
+  failed_when: false
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Wait for GuideLLM benchmark to complete (containerized)
  ansible.builtin.command:
    cmd: "timeout {{ guidellm_wait_timeout_seconds }} podman wait guidellm-{{ workload_type }}-{{ core_cfg.name }}"
  register: guidellm_exit_code_container
  changed_when: false
  failed_when: false
  when: use_guidellm_container | bool
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`
around lines 208 - 214, The wait task "Wait for GuideLLM benchmark to complete
(containerized)" currently aborts the play on podman wait errors; change it to
be non-fatal by removing/setting failed_when: false (keep changed_when: false)
while still registering guidellm_exit_code_container (using timeout {{
guidellm_wait_timeout_seconds }} podman wait guidellm-{{ workload_type }}-{{
core_cfg.name }} and when: use_guidellm_container | bool); then update the
unified failure check after the diagnostic/cleanup steps to evaluate
guidellm_exit_code_container.rc and treat nonzero as a failure only once
logs/inspect/fetch/state: absent have run.
```diff
 - name: Stream GuideLLM container logs directly to file
   ansible.builtin.shell:
-    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} 2>&1"
+    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
   args:
     executable: /bin/bash
   register: guidellm_full_logs
   changed_when: false
   failed_when: false
   when:
     - use_guidellm_container | bool
     - container_exit_code.stdout is defined

-- name: Save GuideLLM logs to temporary file on remote
-  ansible.builtin.copy:
-    content: "{{ guidellm_full_logs.stdout }}"
-    dest: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
-    mode: "0644"
-  when:
-    - use_guidellm_container | bool
-    - guidellm_full_logs is defined
-    - guidellm_full_logs.stdout is defined

 - name: Fetch GuideLLM logs to controller
   ansible.builtin.fetch:
-    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}.log"
+    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
     dest: "{{ results_path }}/guidellm.log"
     flat: true
   when:
     - use_guidellm_container | bool
     - guidellm_full_logs is defined
     - guidellm_full_logs.stdout is defined
```
Use a managed temp file here instead of building /tmp/...{{ test_run_id }}.log by hand.
test_run_id is caller-overridable in automation/test-execution/ansible/llm-benchmark-auto.yml, then passed into this role from automation/test-execution/ansible/llm-benchmark.yml. Interpolating it into an unquoted root shell redirection path creates a command-injection/arbitrary-write risk, and the file is never removed after fetch, so repeated runs will leak disk on the DUT/AWX worker.
🔐 Safer pattern
```diff
-    cmd: "podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }} > /tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log 2>&1"
+    cmd: >-
+      podman logs {{ ('guidellm-' ~ workload_type ~ '-' ~ core_cfg.name) | quote }}
+      > {{ guidellm_log_file.path | quote }} 2>&1
 ...
-    src: "/tmp/guidellm-{{ workload_type }}-{{ core_cfg.name }}-{{ test_run_id }}.log"
+    src: "{{ guidellm_log_file.path }}"
```

```yaml
- name: Create GuideLLM temp log file
  ansible.builtin.tempfile:
    state: file
    prefix: "guidellm-"
    suffix: ".log"
  register: guidellm_log_file
```

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml`
around lines 239 - 257, The "Stream GuideLLM container logs directly to file"
task builds an unsafe /tmp filename using test_run_id
(command-injection/arbitrary-write) and leaves files behind; replace this by
creating a managed temp file with ansible.builtin.tempfile (register e.g.
guidellm_log_file), update the shell task to redirect podman logs into
guidellm_log_file.path instead of the interpolated /tmp path, update the "Fetch
GuideLLM logs to controller" ansible.builtin.fetch src to use
guidellm_log_file.path, and add a cleanup task to remove the temp file on the
remote host after fetching to avoid disk leaks.
```python
# Mock pytest.raises for fallback tests
class pytest:

    class raises:

        def __init__(self, exc):
            self.exc = exc

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            # Check if exception matches expected type
            if exc_type is not None and issubclass(exc_type, self.exc):
                # Suppress the exception by returning True
                return True
            # Let unexpected exceptions propagate
            return False
```
Mock pytest.raises doesn't fail when no exception is raised.
The mock silently passes when the expected exception isn't thrown. Real pytest.raises fails if no exception occurs. While the fallback smoke tests (lines 596-624) avoid exception-testing methods, this could cause false passes if someone extends them.
Proposed fix

```diff
 class pytest:
     class raises:
         def __init__(self, exc):
             self.exc = exc
+            self.value = None

         def __enter__(self):
             return self

         def __exit__(self, exc_type, exc_value, traceback):
-            # Check if exception matches expected type
-            if exc_type is not None and issubclass(exc_type, self.exc):
-                # Suppress the exception by returning True
+            if exc_type is None:
+                raise AssertionError(
+                    f"Expected {self.exc.__name__} but no exception was raised"
+                )
+            if issubclass(exc_type, self.exc):
+                self.value = exc_value
                 return True
-            # Let unexpected exceptions propagate
             return False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines
20 - 37, The mock pytest.raises currently returns True when no exception is
raised, causing false positives; update the pytest.raises.__exit__ to raise an
AssertionError (or appropriate test failure) when exc_type is None so it fails
if no exception occurred, while still suppressing the expected exception when
issubclass(exc_type, self.exc) returns True and letting unexpected exceptions
propagate; modify the __exit__ logic in the pytest.raises class to: if exc_type
is None: raise AssertionError(f"Did not raise {self.exc}"), elif
issubclass(exc_type, self.exc): return True, else return False.
```diff
 **Goal:** Measure P95 latency and throughput scaling under concurrent load

-- Tests 8 LLM models + 2 embedding models
+- Tests 6 LLM models + 2 embedding models
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify actual LLM model count in model-matrix.yaml
echo "=== LLM models in model-matrix.yaml ==="
rg -n "full_name:" models/llm-models/model-matrix.yaml | head -20
echo ""
echo "=== Model counts mentioned in docs ==="
rg -n "LLM models" --type=md -C1
```

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 1552
Update model count: documentation claims 6 LLM models but model-matrix.yaml contains 7.
The file states "Tests 6 LLM models" (matching README.md), but models/llm-models/model-matrix.yaml lists 7 LLM models:
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- ibm-granite/granite-3.2-2b-instruct
- Qwen/Qwen3-0.6B
- Qwen/Qwen2.5-3B-Instruct
- openai/gpt-oss-20b
Update the count to 7 to match the actual models in the test matrix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/methodology/overview.md` at line 174, The documentation line that reads
"Tests 6 LLM models + 2 embedding models" is out of sync with the test matrix
which lists seven LLMs; update that text to "Tests 7 LLM models + 2 embedding
models" (or otherwise change the sentence to reflect 7 LLM models) so it matches
models/llm-models/model-matrix.yaml and the listed model names (e.g.,
meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct,
TinyLlama/TinyLlama-1.1B-Chat-v1.0, ibm-granite/granite-3.2-2b-instruct,
Qwen/Qwen3-0.6B, Qwen/Qwen2.5-3B-Instruct, openai/gpt-oss-20b).
```diff
 - Concurrency levels: 1, 2, 4, 8, 16, 32
-- 8 LLM models + 2 embedding models
+- 6 LLM models + 2 embedding models
```
Count appears inconsistent with model list.
The "6 LLM models" count at line 161 should match the actual models listed. If gpt-oss-20b is included (as referenced at line 130), this should be 7 LLM models.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` at line 161, The LLM count line "6 LLM models + 2 embedding
models" is inconsistent with the listed models because gpt-oss-20b is included;
update that summary to "7 LLM models + 2 embedding models" (or remove
gpt-oss-20b from the list if the intent is to keep six) by editing the summary
string that contains "6 LLM models + 2 embedding models" and ensure it matches
the presence of "gpt-oss-20b" in the model list.
```diff
 **LLM Models (6 total):**

 - Llama-3.2 (1B, 3B) - Prefill-heavy
 - TinyLlama-1.1B - Balanced small-scale
-- OPT (125M, 1.3B) - Decode-heavy legacy baseline
 - Granite-3.2-2B - Balanced enterprise
 - Qwen3-0.6B, Qwen2.5-3B - High-efficiency balanced
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if gpt-oss-20b is referenced elsewhere in README
echo "=== gpt-oss-20b references in README ==="
rg -n "gpt-oss" README.md
echo ""
echo "=== All LLM models in model-matrix.yaml ==="
rg "full_name:" models/llm-models/model-matrix.yaml
```

Repository: redhat-et/vllm-cpu-perf-eval

Length of output: 574
Add gpt-oss-20b to the LLM models list or correct the count.
The README mentions gpt-oss-20b in the "Large model support" feature (line 130) and includes it in models/llm-models/model-matrix.yaml, but it's missing from the model list at lines 183-186. The header claims "(6 total)" but the model matrix contains 7 models. Either include gpt-oss-20b in the bulleted list or update the count and summary at line 161 to reflect the actual number.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 181 - 186, The bulleted LLM list and header "LLM
Models (6 total):" are inconsistent with models/llm-models/model-matrix.yaml
which includes gpt-oss-20b; either add a new bullet "gpt-oss-20b - Large
open-source (20B)" to the list and change the header to "LLM Models (7 total):",
or keep the existing bullets but update the header and any summary count (the
summary near the "Large model support" section) to "7" so the README and
model-matrix.yaml match; update the "LLM Models" header and the bulleted list
(the lines containing the listed model names) accordingly.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation