Skip profile samples for threads without the GVL since the last sample #5751
eregon wants to merge 7 commits into
Conversation
…ample

Hook RUBY_INTERNAL_THREAD_EVENT_SUSPENDED to record, in a new gvl_suspended_at TLS slot, when a thread last released the GVL. In update_metrics_and_sample, return early without advancing wall_time_at_previous_sample_ns when the thread has been continuously suspended since its previous sample (currently suspended, with the same timestamp as last time).

The accumulated wall-time gets picked up by an extended after-resume sample: on_gvl_running_with_threshold now also returns ON_GVL_RUNNING_SAMPLE when gvl_suspended_at exceeds the threshold, so the existing postponed-job pipeline emits a sample with a near-idle stack covering the whole idle window.

For `bundle exec ruby benchmarks/profiling_sample_overhead.rb` (sleep 10), this skips ~3 per-thread samples per tick (the idle background threads), tracked via the new inactive_thread_samples_skipped counter surfaced in worker_stats.

Gated to Ruby 3.3+ (the existing GVL profiling stack works there without the 3.2 workarounds) and to gvl_profiling_enabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
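The skip condition described above ("continuously suspended since its previous sample") can be sketched as a small predicate. This is a minimal illustration, not the actual dd-trace-rb code: the struct and field names are hypothetical stand-ins for the real per-thread context and TLS slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping; names are illustrative, not the
 * actual dd-trace-rb structs. */
typedef struct {
  uint64_t gvl_suspended_at;         /* TLS value read at this sample (0 = not suspended) */
  uint64_t suspended_at_last_sample; /* snapshot taken at the previous sample */
} thread_sample_state;

/* Returns 1 when the sample can be skipped: the thread is currently
 * suspended AND the suspension timestamp is unchanged since the previous
 * sample, i.e. it never re-acquired the GVL in between. */
static int should_skip_sample(const thread_sample_state *st) {
  return st->gvl_suspended_at != 0 &&
         st->gvl_suspended_at == st->suspended_at_last_sample;
}
```

The early return deliberately leaves wall_time_at_previous_sample_ns untouched, so the whole idle window is attributed in one later sample instead of many per-tick ones.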
👋 Hey @eregon, please fill the "Change log entry" section in the pull request description. If changes need to be present in CHANGELOG.md you can state it this way:

**Change log entry**
Yes. A brief summary to be placed into the CHANGELOG.md. (possible answers: Yes/Yep/Yeah)

Or you can opt out like that:

**Change log entry**
None. (possible answers: No/Nope/None)
The skip optimization relies on the after-resume sample to deposit the accumulated wall-time of a long idle period into the profile. If a thread stays suspended across the entire reporting period (e.g. a `sleep` longer than the export interval), RESUMED never fires within that period and the wall-time is missing from the profile, and from every subsequent profile until the thread eventually wakes up.

Fix by adding thread_context_collector_flush_inactive_threads, called from Exporter#flush before pprof_recorder.serialize. It walks the per-thread context map and force-samples any thread still in the would-be-skipped state (gvl_suspended_at > 0 and equal to the stored value), so the accumulated wall-time lands in the active slot before the slot flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
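The flush described above can be sketched as a walk over the per-thread contexts, force-sampling every thread still in the would-be-skipped state. Again a hypothetical sketch: the `ctx` struct, the array-based "map", and the `force_sampled` flag stand in for the real context map and sample emission.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-thread context; not the real dd-trace-rb layout. */
typedef struct {
  uint64_t gvl_suspended_at;         /* 0 = not currently suspended */
  uint64_t suspended_at_last_sample; /* snapshot from the previous sample */
  int force_sampled;                 /* stand-in for emitting a catch-up sample */
} ctx;

/* Sketch of flush_inactive_threads: before serialize, emit a sample for
 * each thread still suspended since its previous sample, so a full
 * reporting period of sleep is not lost. Returns how many were flushed. */
static size_t flush_inactive_threads(ctx *threads, size_t count) {
  size_t flushed = 0;
  for (size_t i = 0; i < count; i++) {
    if (threads[i].gvl_suspended_at != 0 &&
        threads[i].gvl_suspended_at == threads[i].suspended_at_last_sample) {
      threads[i].force_sampled = 1; /* real code samples with a near-idle stack */
      flushed++;
    }
  }
  return flushed;
}
```

Running this just before pprof_recorder.serialize means the accumulated wall-time lands in the active slot before the slot flip, matching the fix above.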
Benchmarks

Benchmark execution time: 2026-05-13 19:22:41. Comparing candidate commit fed2240 in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metric.
On Rubies where the GVL profiling APIs aren't available (NO_GVL_INSTRUMENTATION / USE_GVL_PROFILING_3_2_WORKAROUNDS), the body of _native_flush_inactive_threads is preprocessed out and `instance` goes unused, which `-Werror=unused-parameter` rejects.

Mark it DDTRACE_UNUSED: the attribute is a "may be unused" hint, so it remains valid on Rubies where the parameter is used.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
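The pattern can be shown with a self-contained stand-in. `MAYBE_UNUSED` here approximates what a macro like DDTRACE_UNUSED typically expands to on GCC/Clang; the function body and the `GVL_PROFILING_AVAILABLE` guard are hypothetical.

```c
#include <assert.h>

/* Stand-in for DDTRACE_UNUSED: a "may be unused" hint, so the build stays
 * clean under -Werror=unused-parameter whether or not the body uses the
 * parameter. */
#if defined(__GNUC__)
  #define MAYBE_UNUSED __attribute__((unused))
#else
  #define MAYBE_UNUSED
#endif

/* On builds where GVL profiling is compiled out, `instance` goes unused;
 * the attribute keeps both variants warning-free. */
static int native_flush_inactive_threads(MAYBE_UNUSED void *instance) {
#ifdef GVL_PROFILING_AVAILABLE
  /* ...real body would dereference instance here... */
  return instance != 0;
#else
  return 0; /* no-op on unsupported Rubies */
#endif
}
```

Because the attribute only permits the parameter to be unused (rather than asserting it is), the same signature compiles on Rubies where the body does use `instance`.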
🎉 All green! ❄️ No new flaky tests detected. 🎯 Code Coverage passing. 🔗 Commit SHA: fed2240
The SUSPENDED internal-thread-event hook can fire very frequently (every
GVL release: sleep, IO, Thread.pass, lock contention, ...). Calling
monotonic_wall_time_now_ns() there — even though it's a vDSO call —
adds up. The skip optimization only needed two pieces of information
out of that timestamp ("did the thread acquire the GVL since the last
sample" and "is it currently suspended"), so a counter is enough.
The TLS slot now holds:
- low bit: current state (1 = suspended, 0 = running)
- bits 1+: monotonic event counter, incremented on every SUSPENDED
or RESUMED for a tracked thread
SUSPENDED sets the bit and bumps the counter; RESUMED clears the bit
and bumps the counter. The sampler reads the word once: the low bit is
the current state, equality with its per-thread snapshot means no
transitions since the last sample. Together that's exactly "did not
have the GVL last sample and did not acquire it since then" — no
wall-time read in the hot path.
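The packed word described above can be sketched with a few pure functions. This is an illustration of the encoding only (hypothetical names, no atomics): the real hooks would update the TLS slot in place.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the TLS word: low bit = current state (1 = suspended),
 * bits 1+ = monotonic event counter bumped on every transition. */
enum { STATE_SUSPENDED = 1u };

static uint32_t on_suspended(uint32_t word) {
  /* bump the counter and set the suspended bit */
  return (((word >> 1) + 1) << 1) | STATE_SUSPENDED;
}

static uint32_t on_resumed(uint32_t word) {
  /* bump the counter and clear the suspended bit */
  return ((word >> 1) + 1) << 1;
}

/* The sampler reads the word once: skip iff currently suspended AND no
 * transitions since its per-thread snapshot (word unchanged). */
static int can_skip(uint32_t word, uint32_t snapshot) {
  return (word & STATE_SUSPENDED) && word == snapshot;
}
```

Equality of the whole word covers both conditions at once: a changed counter means the thread acquired the GVL at some point, even if it is suspended again by sampling time.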
The previous duration-based after-resume trigger goes away with the
timestamp (we don't track when the suspension started anymore). To
keep the catch-up sample at end-of-period, per_thread_context gains a
`was_skipped_at_last_sample` flag that flush_inactive_threads now keys
off — it also handles the case where a long-suspended thread RESUMES
just before serialize, which the old timestamp equality check would
have missed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SUSPENDED and RESUMED alternate for any given thread (the VM never RESUMES a thread without a prior SUSPENDED and vice-versa), so a single counter that bumps on each transition is enough: its low bit naturally encodes the current state (1 after SUSPENDED, 0 after RESUMED). No more explicit state-bit + counter packing.

The test helpers were the one place that didn't honor the invariant: the simulated `on_gvl_running` was called without a prior simulated `on_gvl_suspended`, which would have left the counter parity off by one. Add a matching `_native_on_gvl_suspended` test entry point and call it from the "thread is ready to run again" before-block so the test path mirrors the real lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
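The parity scheme, and the way an unpaired simulated event breaks it, can be shown in a few lines. A hypothetical sketch; the real hooks update a per-thread TLS word.

```c
#include <assert.h>
#include <stdint.h>

/* Pure-counter scheme: SUSPENDED and RESUMED strictly alternate, so one
 * counter bumped on each transition carries the state in its low bit
 * (odd = last event was SUSPENDED, even = last event was RESUMED). */
static uint32_t on_transition(uint32_t counter) { return counter + 1; }

static int is_suspended(uint32_t counter) { return counter & 1u; }
```

The scheme is only sound while the alternation invariant holds, which is exactly why the test helpers needed the matching simulated on_gvl_suspended call.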
…flake

The TLS slot persists across worker instances within the same process, so a spec that exercised the SUSPENDED hook would leave the counter in a "suspended" state for the test-runner thread, and a subsequent spec running without GVL profiling would see counter == per_thread_context's freshly-zeroed snapshot AND the low bit set, and skip almost every sample for that thread. cpu_and_wall_time_worker_spec's "sampling from signal handler" was the visible victim: it expects ~20 samples but only sees ~1.

Clear the slot in initialize_context next to the existing gvl_profiling_state reset, so each worker starts each thread from a known-zero state.

Restore the explicit-state-bit encoding (low bit = current state, bits 1+ = monotonic event counter) so the hooks can correct the state on the very next event after the clear, even for threads that happened to be mid-suspension when initialize_context cleared them; the pure-counter-parity scheme would have ended up off by one in that window. With explicit state bits the simulated test helpers no longer need a matching on_gvl_suspended call before on_gvl_running, so drop that.

Flaky spec passes 8/8 runs after the fix; full thread_context + worker specs green on Ruby 3.2 and 4.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
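The off-by-one window can be demonstrated by contrasting the two encodings after the word is zeroed while a thread is mid-suspension. Hypothetical names; each function models the RESUMED hook under one encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Pure parity: one counter, low bit = state (odd = suspended). A RESUMED
 * event just bumps the counter. */
static uint32_t parity_resumed(uint32_t counter) { return counter + 1; }

/* Explicit bit: low bit = state, bits 1+ = event counter. A RESUMED event
 * bumps the counter and forces the state bit to "running". */
static uint32_t explicit_resumed(uint32_t word) { return ((word >> 1) + 1) << 1; }
```

Starting from a cleared word (0) on a thread that was actually suspended, the next RESUMED leaves the parity scheme claiming "suspended" forever after by one, while the explicit bit is self-correcting on that very first event.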
What does this PR do?
Motivation:
Change log entry
Additional Notes:
How to test the change?