
ci: enable performance quality gates #5571

Draft
igoragoli wants to merge 15 commits into augusto/add-perf-quality-gate-dd-octo-sts-policy from augusto/enable-perf-quality-gates

Conversation

@igoragoli
Contributor

@igoragoli igoragoli commented Apr 9, 2026

What does this PR do?

Enables pre-release performance quality gates on dd-trace-rb.

  • Microbenchmarks: microbenchmarks-check-big-regressions job (20% threshold via fail_on_regression)
  • Macrobenchmarks: macrobenchmarks-check-slo-breaches + macrobenchmarks-notify-slo-breaches jobs with SLO thresholds via fail_on_breach
    • 36 scenarios, 66 thresholds
    • normal_operation: p50/p99 latency
    • high_load: throughput
    • utilization monitors: CPU% and RSS
    • baseline scenarios excluded (not actionable)
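
For orientation, the macrobenchmark gate sketched as a minimal `.gitlab-ci.yml` job (the job and stage names come from this PR; the `bp-runner` invocation is an assumption, since the real job is included from a benchmarking-platform-tools template):

```yaml
# Illustrative sketch only; the actual job is provided by the
# check-slo-breaches template in benchmarking-platform-tools.
macrobenchmarks-check-slo-breaches:
  stage: macrobenchmarks-gates
  allow_failure: true            # non-blocking until thresholds are validated
  script:
    - bp-runner fail_on_breach   # assumed CLI shape; the PR only names the step
```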

Motivation:

Catch performance regressions before release. Aligns dd-trace-rb with dd-trace-go and dd-trace-py.

Change log entry

None.

Additional Notes:

SLO generation:

  • Generated with benchmark_analyzer generate slos --strategy tight --significant-impact-threshold 0.10 (T=10%)
  • Source: single pipeline run of all 8 macrobenchmark configurations
  • One RSS threshold manually bumped (high_load--profiling-and-tracing-and-appsec--puma-utilization: 2.73 GB → 3.25 GB) due to cross-run variance
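
For illustration, the manually adjusted entry might look roughly like this (the file layout and key names are hypothetical; only the scenario name and the two values come from the PR):

```yaml
# Hypothetical SLO file shape; the real format is whatever
# `benchmark_analyzer generate slos` emits.
high_load--profiling-and-tracing-and-appsec--puma-utilization:
  rss:
    max_gb: 3.25   # bumped from the generated 2.73 GB to absorb cross-run variance
```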

Quality gates setup:

  • All gate jobs use allow_failure: true until thresholds are validated
  • Slack notifications go to apm-dcs-performance-alerts (TODO: switch to #guild-dd-ruby)
  • tracing-and-appsec macrobenchmark produced no k6 results, so it has no SLO thresholds yet
  • Depends on ci: add dd-octo-sts policy for GitLab SLO change tracking #5570
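
In `.gitlab-ci.yml` terms, the notification job might be wired up roughly like this (the variable name is an assumption; the channel and the `allow_failure` setting come from the PR):

```yaml
# Sketch; the real job comes from the notify-slo-breaches template.
macrobenchmarks-notify-slo-breaches:
  stage: macrobenchmarks-notify
  allow_failure: true
  variables:
    SLACK_CHANNEL: apm-dcs-performance-alerts  # TODO: switch to #guild-dd-ruby
```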

How to test the change?

The CI pipeline validates that the gate jobs run correctly after the benchmarks complete.

Add macrobenchmarks-gates and macrobenchmarks-notify stages. Include
check-slo-breaches and notify-slo-breaches templates from
benchmarking-platform-tools. Add placeholder check-slo-breaches job
that depends on all 8 macrobenchmark jobs.

Temporarily set macrobenchmarks to auto-trigger on all branches to
collect baseline artifacts for SLO threshold generation.
Adds a quality gate that fails on microbenchmark regressions exceeding
20%. Uses bp-runner fail_on_regression step from benchmarking-platform.
Runs after microbenchmarks with when: always to catch failures too.
Set to allow_failure: true until thresholds are validated.
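As a sketch, the microbenchmark gate described in this commit might look like the following (everything beyond the job name is an assumption):

```yaml
microbenchmarks-check-big-regressions:
  needs: [microbenchmarks]           # assumed upstream job name
  when: always                       # run even if the microbenchmarks job failed
  allow_failure: true                # non-blocking until thresholds are validated
  script:
    - bp-runner fail_on_regression   # assumed CLI shape; 20% threshold lives in bp-runner config
```

Note that a later commit in this PR moves `when: always` into `rules:`.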
@github-actions

github-actions bot commented Apr 9, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-04-09 14:50:40 UTC

@igoragoli igoragoli changed the title from "ci: scaffold macrobenchmark quality gates and auto-trigger benchmarks" to "ci: enable performance quality gates" on Apr 9, 2026
Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@igoragoli igoragoli added the "AI Generated" label (largely based on code generated by an AI or LLM; this label is the same across all dd-trace-* repos) on Apr 9, 2026
@pr-commenter

pr-commenter bot commented Apr 9, 2026

Benchmarks

Benchmark execution time: 2026-04-10 11:41:04

Comparing candidate commit d1e7605 in PR branch augusto/enable-perf-quality-gates with baseline commit 1595023 in branch augusto/add-perf-quality-gate-dd-octo-sts-policy.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.
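
In symbols: if the confidence interval over the relative difference of means is $[\ell, u]$ and the configured threshold is $T$, the change is flagged as significant exactly when the whole interval clears the threshold on one side:

```math
\ell > T \quad \text{or} \quad u < -T
```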

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

Replace check-slo-breaches placeholder with real fail_on_breach
implementation. Add notify-slo-breaches job to alert on
apm-dcs-performance-alerts. Generate 209 SLO thresholds across
42 scenarios using tight strategy (T=5%).

Revert macrobenchmarks to manual trigger on non-master branches.
Move microbenchmarks before macrobenchmarks so macro gates and notify
stages are adjacent. Restrict check-slo-breaches and notify-slo-breaches
to master only since non-master branches use manual macrobenchmarks.
@igoragoli igoragoli force-pushed the augusto/enable-perf-quality-gates branch from 2e12e39 to efb574d on April 9, 2026 at 17:38
Drop rules: block from check-slo-breaches and notify-slo-breaches.
GitLab ignores top-level when: when rules: is present. Follow
dd-trace-py pattern: use when: always with no rules.
Use rules: with when: always on master, default on_success on branches.
Remove conflicting top-level when: always which GitLab ignores when
rules: is present.
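The resulting pattern, sketched (the branch condition is an assumed rendering of the master/non-master split the commit describes):

```yaml
check-slo-breaches:
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'
      when: always        # always gate on master
    - when: on_success    # default behavior on other branches
```

Per the commit message, a job-level `when:` alongside `rules:` is ignored by GitLab, so the `when:` lives inside each rule instead.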
@igoragoli igoragoli force-pushed the augusto/enable-perf-quality-gates branch from 0568f25 to c866a4b on April 10, 2026 at 08:29
Remove baseline scenarios (not actionable). Keep only:
- normal_operation: agg_http_req_duration p50/p99
- high_load: throughput
- utilization monitors: cpu_usage_percentage, rss

Drop data_received, data_sent, dropped_iterations, http_req_duration.
Reduces from 209 to 66 thresholds across 36 scenarios.
@igoragoli igoragoli force-pushed the augusto/enable-perf-quality-gates branch from c866a4b to c3caecc on April 10, 2026 at 08:31
Fix macrobenchmarks-notify-slo-breaches referencing wrong job name.
Move when: always into rules for microbenchmarks-check-big-regressions
since GitLab ignores top-level when: when rules: is present.
Single-run SLO generation produced a tight RSS threshold (2.73 GB)
that doesn't account for cross-run variance. Bump to 3.25 GB based
on observed values across multiple runs.
