Validate fix for flaky: shutdown 'executes only once' #5586

Draft
p-datadog wants to merge 4 commits into master from validate-flaky-shutdown-executes-only-once

Conversation

@p-datadog
Member

What does this PR do?

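Validates that the fix neutralizes the forced failure. This branch merges the reproducer branch (reproduce-flaky-shutdown-executes-only-once) with the fix branch (fix-flaky-shutdown-executes-only-once).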

If CI passes, the fix addresses the exact failure mode the reproducer forces.
If CI fails, the fix does not solve the problem.

Do not merge — this PR exists for validation only.

Change log entry

None.

How to test the change?

CI passes = fix works against the forced failure. CI fails = fix is insufficient.

p-ddsign and others added 4 commits April 10, 2026 20:33
Hypothesis: the test fails when the HTTP round-trip to the agent takes
longer than DEFAULT_SHUTDOWN_TIMEOUT (1 second). The worker thread is
mid-HTTP-call when shutdown! calls join(1), the join times out, and
traces_flushed is still 0 when the assertion runs.

This commit forces the race condition by stubbing the transport's
send_traces to add a 2-second sleep, exceeding the 1-second shutdown
timeout. The test should fail deterministically in CI.

Co-Authored-By: Claude <noreply@anthropic.com>
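
The hypothesized race can be illustrated outside the test suite. The following is a minimal sketch, not the actual dd-trace-rb code: `SlowWorker`, `send_delay`, and the scaled-down timings (0.3s "HTTP call", 0.1s join timeout) are hypothetical stand-ins for the stubbed send_traces and DEFAULT_SHUTDOWN_TIMEOUT. It relies only on `Thread#join(limit)` returning nil when the limit expires.

```ruby
# Hypothetical stand-in for the writer's worker; not the real dd-trace-rb classes.
class SlowWorker
  attr_reader :traces_flushed

  def initialize(send_delay:)
    @send_delay = send_delay
    @traces_flushed = 0
    @thread = nil
  end

  def start
    @thread = Thread.new do
      sleep(@send_delay)    # simulates the slow HTTP round-trip to the agent
      @traces_flushed += 1  # counted only once the "send" completes
    end
  end

  # Returns true if the worker finished within the timeout, false otherwise.
  # Thread#join(limit) returns nil when the limit expires first.
  def shutdown!(timeout)
    !@thread.join(timeout).nil?
  end

  # Blocks until the worker has finished, with no timeout.
  def wait
    @thread.join
  end
end

worker = SlowWorker.new(send_delay: 0.3)
worker.start

stopped_in_time = worker.shutdown!(0.1)      # join times out mid-"send"
flushed_at_assertion = worker.traces_flushed # what an immediate assertion would see

worker.wait
flushed_eventually = worker.traces_flushed   # the flush does complete, just later
```

In this sketch the counter is still 0 at the point where an immediate assertion would run, and reaches 1 only after the worker is allowed to finish, which matches the failure mode described in the commit message.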
Root cause: the test asserts traces_flushed: 1 immediately after joining
the threads that call shutdown! concurrently. When the HTTP round-trip
to the agent takes longer than DEFAULT_SHUTDOWN_TIMEOUT (1s), the join
on the worker thread times out while the flush is still in progress, so
the assertion sees traces_flushed: 0 and fails.

Wait for traces_flushed >= 1 AFTER the shutdown threads join, not
before. This preserves the test's coverage: the concurrent shutdowns
still race with an in-flight flush (the buffer has data when shutdowns
begin), so the shutdown guard and worker stop logic are exercised under
contention. The wait just gives the orphaned worker thread time to
complete its HTTP call before the assertion checks final stats.

Co-Authored-By: Claude <noreply@anthropic.com>
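
The wait-after-join approach described above can be sketched as follows. This is an illustration under assumptions, not the spec's actual code: `wait_until` is a hypothetical polling helper (the real test may use a different retry utility), and the timings are scaled down so the example runs quickly.

```ruby
# Hypothetical polling helper: yields until the block returns truthy or the
# timeout elapses. Returns true on success, false on timeout.
def wait_until(timeout:, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

traces_flushed = 0

# Simulated worker whose "HTTP call" outlasts the shutdown timeout.
worker = Thread.new do
  sleep(0.2)
  traces_flushed += 1
end

# Concurrent shutdowns: each one joins the worker with a short timeout,
# so they all return while the flush is still in flight.
shutdowns = Array.new(3) { Thread.new { worker.join(0.05) } }
shutdowns.each(&:join)

# Wait AFTER the shutdown threads have joined, then check the final count.
flushed_in_time = wait_until(timeout: 2) { traces_flushed >= 1 }
```

Placing the wait after the shutdown threads join keeps the shutdowns racing against the in-flight flush; the wait only delays the final check of the counter.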
* reproduce-flaky-shutdown-executes-only-once:
  Add reproducer for flaky shutdown 'executes only once' test
* fix-flaky-shutdown-executes-only-once:
  Fix flaky shutdown 'executes only once' test
@p-datadog added the "AI Generated" label (Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos) Apr 11, 2026
@github-actions bot added the "dev/testing" label (Involves testing processes, e.g. RSpec) Apr 11, 2026
@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 bot commented Apr 11, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.34% (-0.03%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 8434186
