
Record the total time spent waiting for the GVL as a metric#5569

Draft
eregon wants to merge 5 commits into master from bd/gvl-metrics

Conversation

@eregon
Member

@eregon eregon commented Apr 9, 2026

What does this PR do?

Record the total time spent waiting for the GVL as a metric.
See #3433.

Motivation:

That issue.

Change log entry

Yes. Added metric for total time spent waiting for the GVL.

Additional Notes:

Yes, I'm experimenting with Cursor.

How to test the change?

@github-actions

github-actions bot commented Apr 9, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-04-09 14:13:34 UTC

@github-actions github-actions bot added the profiling Involves Datadog profiling label Apr 9, 2026
@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 bot commented Apr 9, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.36% (+0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a125c7d

@eregon eregon force-pushed the bd/gvl-metrics branch 2 times, most recently from 5aee7cb to 472c7ee Compare April 9, 2026 13:51
@pr-commenter

pr-commenter bot commented Apr 9, 2026

Benchmarks

Benchmark execution time: 2026-04-09 14:16:50

Comparing candidate commit 472c7ee in PR branch bd/gvl-metrics with baseline commit e191a14 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because changes are often not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
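The significance rule described above can be sketched as a small predicate. This is a sketch only, not the benchmarking platform's actual implementation; it assumes the threshold band is symmetric around 0%, and uses the relative-percentage values from the two example diagrams:

```ruby
# Sketch of the significance rule: a confidence interval [lower, upper]
# (relative % difference of means vs. baseline) counts as significant only
# when it lies entirely outside the threshold band [-threshold, +threshold].
def significant_change?(lower, upper, threshold)
  lower > threshold || upper < -threshold
end

significant_change?(-0.6, 1.2, 1.0) # => false: CI straddles 0%, not significant
significant_change?(1.3, 3.1, 1.0)  # => true: entirely above +1%, significantly worse
```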

Member

@ivoanjo ivoanjo left a comment


Looks good!

Comment on lines +73 to +75
metrics = {
  gvl_waiting_time_ns_total: worker_stats.delete(:gvl_waiting_time_ns_total),
}

So it looks to me like the metric can be nil, e.g. on old Rubies or when DD_PROFILING_GVL_ENABLED is set to false.

I believe in the backend this maps to JsonMetricsComputer.java and at a quick glance I'm not sure it likes null. Maybe better to not emit it at all if it's nil?

(Or try submitting to staging / poking at the backend code / asking in #profiling-backend what's the best option here)
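One way to follow that suggestion is to `compact` the hash so that nil-valued metrics are never serialized as JSON null. This is a hedged sketch, not the PR's actual code: `worker_stats` here is a stand-in hash, and the real serialization path in the library may differ:

```ruby
require "json"

# Stand-in for the worker's stats hash; per the review comment, on old
# Rubies or with DD_PROFILING_GVL_ENABLED=false the GVL metric is nil.
worker_stats = {gvl_waiting_time_ns_total: nil}

metrics = {
  gvl_waiting_time_ns_total: worker_stats.delete(:gvl_waiting_time_ns_total),
}.compact # drop nil entries so the backend never receives a JSON null

JSON.generate(metrics) # => "{}" when the metric is unavailable
```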

Comment on lines 54 to +63
)
expect(JSON.parse(flush.metrics, symbolize_names: true)).to eq(gvl_waiting_time_ns_total: nil)
end

context "when metrics include a numeric gvl_waiting_time_ns_total" do
  let(:metrics) { {gvl_waiting_time_ns_total: 12_345} }

  it "stores them as metrics.json payload" do
    expect(JSON.parse(flush.metrics, symbolize_names: true)).to eq(gvl_waiting_time_ns_total: 12_345)
  end
Minor: This suspiciously looks like "AI doing verbose tests" haha, I would recommend just changing the have_attributes above to have a metrics: '{"gvl_waiting_time_ns_total": 12345}' or something like that? Same coverage, less code ;)
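For reference, the string comparison the reviewer suggests relies on the serialized payload having a predictable form. A plain-Ruby sketch of what that expected string would be, assuming the flush serializes with `JSON.generate` (which emits symbol keys as strings, with no spaces):

```ruby
require "json"

# Sketch of the suggestion: compare the serialized payload directly
# instead of parsing it back. The exact serialized form below is an
# assumption about how the library builds the metrics string.
metrics = {gvl_waiting_time_ns_total: 12_345}
payload = JSON.generate(metrics)
payload # => '{"gvl_waiting_time_ns_total":12345}'
```

In the spec, that would collapse the extra context block into something like `have_attributes(metrics: '{"gvl_waiting_time_ns_total":12345}')`, matching the reviewer's suggestion.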

