feat: add benchmark repeatability analysis with Coefficient of Variation by maryamtahhan · Pull Request #112 · redhat-et/vllm-cpu-perf-eval

maryamtahhan · 2026-04-28T08:03:38Z

Add comprehensive repeatability analysis tooling to measure benchmark consistency using Coefficient of Variation (CV) metrics.

New features:

- analyze_repeatability.py: Calculate CV for metrics across multiple runs
- Automatic grouping by platform, model, cores, TP, vLLM version, concurrency
- Repeatability grading (Excellent/Good/Acceptable/Poor) based on CV thresholds
- Markdown and JSON report generation with detailed CV tables and rankings
- Dashboard utilities for integrating CV metrics into visualizations
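
As a rough illustration of the automatic grouping step, the sketch below buckets run records by the six dimensions listed above so that variation is only measured across runs with the same configuration. The record layout, the field names in `GROUP_KEYS`, and the helper `collect_metric` are assumptions made for this example; they are not taken from analyze_repeatability.py.

```python
from collections import defaultdict

# Hypothetical field names for the grouping dimensions named in the PR
# description; the real script may use different keys.
GROUP_KEYS = ("platform", "model", "cores", "tp", "vllm_version", "concurrency")


def collect_metric(runs, metric, keys=GROUP_KEYS):
    """Map each configuration tuple to the list of `metric` values observed.

    `runs` is an iterable of dicts, one per benchmark run. Runs that share
    the same values for every key in `keys` land in the same group.
    """
    groups = defaultdict(list)
    for run in runs:
        config = tuple(run[k] for k in keys)
        groups[config].append(run[metric])
    return dict(groups)


# Illustrative records only; the numbers are made up for the example.
runs = [
    {"platform": "x86", "model": "m", "cores": 16, "tp": 1,
     "vllm_version": "v1", "concurrency": 8, "ttft_mean": 101.0},
    {"platform": "x86", "model": "m", "cores": 16, "tp": 1,
     "vllm_version": "v1", "concurrency": 8, "ttft_mean": 99.0},
    {"platform": "x86", "model": "m", "cores": 32, "tp": 1,
     "vllm_version": "v1", "concurrency": 8, "ttft_mean": 60.0},
]
grouped = collect_metric(runs, "ttft_mean")
```

Here the two 16-core runs form one group with two samples, while the 32-core run sits alone in its own group and would need more repetitions before any variation could be estimated.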

CV interpretation:

- CV < 1%: Excellent (ideal for regression testing)
- CV 1-3%: Good (suitable for performance comparisons)
- CV 3-5%: Acceptable (moderate variance)
- CV > 5%: Poor (high variance, unreliable results)
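
The grading above can be sketched as follows. CV is the standard deviation divided by the mean, expressed as a percentage. Two details are assumptions here, since the description does not state them: the sketch uses the sample standard deviation (n - 1 in the denominator), and it assigns the boundary values 1% and 3% to the higher-variance band while treating exactly 5% as Acceptable. The function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV of `values` as a percentage (sample stdev / mean * 100)."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / abs(mean) * 100.0


def grade(cv_percent):
    """Map a CV percentage to the repeatability grades from the PR description.

    Boundary handling (1%, 3%, 5%) is an assumption of this sketch.
    """
    if cv_percent < 1.0:
        return "Excellent"
    if cv_percent < 3.0:
        return "Good"
    if cv_percent <= 5.0:
        return "Acceptable"
    return "Poor"


# Three hypothetical throughput samples with mean 100 and sample stdev 2.
cv = coefficient_of_variation([100.0, 102.0, 98.0])
label = grade(cv)
```

For these three samples the CV works out to 2%, which falls in the Good band. With only a handful of runs per group the estimate itself is noisy, so a grade near a threshold is best read as approximate.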

Metrics analyzed:

- Request latency (mean, P90, P95)
- TTFT (mean, P90, P95)
- TPoT/ITL (mean, P90, P95)
- Output throughput

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

coderabbitai · 2026-04-28T08:03:46Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: fbdf597a-7fc6-4d9c-99b6-1e8342a5ff37

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

maryamtahhan wants to merge 1 commit into redhat-et:main from maryamtahhan:feat/cv
