
symbol_database: add upload metadata fields to upload event message#5720

Open
andreimatei wants to merge 1 commit into master from andrei/symdb-upload-fields

Conversation

@andreimatei

Add the following fields to the SymDB upload event message that accompanies each multipart upload:

  • "version" (top-level): the service version
  • "language" (top-level): "ruby"
  • "upload_id" (top-level): a UUID generated once per process and shared by all batches uploaded by the process. Detection is by Process.pid comparison, so a forked child observes a fresh PID and gets a new upload_id and batch counter.
  • "batch_num" (top-level): 1-indexed counter incremented per upload, reset alongside upload_id when the PID changes.
  • "final" (top-level): always false; the Ruby tracer continuously uploads new code as files are loaded, so there is no defined end-of-upload point.
  • "attachment_size" (top-level): size in bytes of the gzipped attachment.

build_event_metadata is now called per upload with the attachment size and a freshly-incremented batch number.

Some of these fields are new, to be used by the backend in the future. Others duplicate info that was already included in the attachment; by duplicating some metadata out of the SymDB attachment body into the EvP event body, the backend can populate per-attachment bookkeeping without downloading the attachment.
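The per-upload metadata above can be sketched as a small helper. This is a hedged illustration only: the exact signature of build_event_metadata is an assumption, and the field casing follows this description (a later review comment settles on camelCase for the event body).

```ruby
require 'securerandom'
require 'json'

# Hypothetical sketch: assemble the per-upload event metadata fields
# listed above. Signature and casing are assumptions, not the actual
# dd-trace-rb API.
def build_event_metadata(service_version:, upload_id:, batch_num:, attachment_size:)
  {
    'version' => service_version,
    'language' => 'ruby',
    'upload_id' => upload_id,
    'batch_num' => batch_num,
    # Always false: the Ruby tracer keeps uploading as new files are loaded.
    'final' => false,
    'attachment_size' => attachment_size
  }
end

metadata = build_event_metadata(
  service_version: '1.2.3',
  upload_id: SecureRandom.uuid,
  batch_num: 1,
  attachment_size: 2048
)
puts JSON.generate(metadata)
```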

@andreimatei andreimatei requested a review from p-datadog May 8, 2026 17:53
@andreimatei andreimatei requested a review from a team as a code owner May 8, 2026 17:53
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2026-05-08 17:53:28 UTC

@dd-octo-sts dd-octo-sts Bot added the debugger Live Debugger (+Dynamic Instrumentation, +Symbol Database) label May 8, 2026
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

steep:ignore comments

This PR introduces 1 steep:ignore comment, and clears 1 steep:ignore comment.

steep:ignore comments (+1, -1)

Introduced:
lib/datadog/symbol_database/uploader.rb:201
Cleared:
lib/datadog/symbol_database/uploader.rb:148

Untyped methods

This PR introduces 2 partially typed methods, and clears 2 partially typed methods. It increases the percentage of typed methods from 62.24% to 62.26% (+0.02%).

Partially typed methods (+2, -2)

Introduced:
sig/datadog/symbol_database/service_version.rbs:46
└── def to_h: () -> Hash[::Symbol, untyped]
sig/datadog/symbol_database/service_version.rbs:48
└── def to_json: (?untyped _state) -> String
Cleared:
sig/datadog/symbol_database/service_version.rbs:31
└── def to_h: () -> Hash[::Symbol, untyped]
sig/datadog/symbol_database/service_version.rbs:33
└── def to_json: (?untyped _state) -> String

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08b3dc0c2c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/datadog/symbol_database/uploader.rb Outdated
Comment on lines +155 to +162
def next_upload_metadata
if @upload_pid != Process.pid
@upload_pid = Process.pid
@upload_id = SecureRandom.uuid.freeze
@batch_num = 0
end
@batch_num += 1
[@upload_id, @batch_num]


P2: Serialize upload metadata updates

When two batch flushes call upload_scopes concurrently (for example, a size-triggered flush and the timer/shutdown path, since ScopeBatcher performs uploads outside its mutex), this lazy state update can race: both threads can observe an uninitialized or old @upload_pid/@batch_num, generate different upload_ids for the same process, or emit the same batch_num. That violates the new contract that all batches in a process share one upload id with monotonically increasing batch numbers, so protect this section with a mutex or initialize the process metadata eagerly.

Useful? React with 👍 / 👎.

Author


fixed by adding a mutex
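The fix described above can be sketched as follows, assuming the class and method names from the quoted diff; the lazy per-process state is guarded by a mutex so concurrent flushes can neither generate distinct upload ids in one process nor emit duplicate batch numbers.

```ruby
require 'securerandom'

# Sketch of the mutex fix (names assumed from the quoted diff, not the
# actual dd-trace-rb class layout).
class UploadMetadata
  def initialize
    @mutex = Mutex.new
    @upload_pid = nil
  end

  # Returns [upload_id, batch_num]. A forked child sees a fresh PID, so
  # it regenerates the UUID and restarts the 1-indexed batch counter.
  def next_upload_metadata
    @mutex.synchronize do
      if @upload_pid != Process.pid
        @upload_pid = Process.pid
        @upload_id = SecureRandom.uuid.freeze
        @batch_num = 0
      end
      @batch_num += 1
      [@upload_id, @batch_num]
    end
  end
end

m = UploadMetadata.new
id1, n1 = m.next_upload_metadata
id2, n2 = m.next_upload_metadata
```

Within one process the id is stable and the counter is monotonic: both calls return the same UUID with batch numbers 1 and 2.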

@datadog-official

datadog-official Bot commented May 8, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 97.15% (-0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 10bedf5 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter

pr-commenter Bot commented May 8, 2026

Benchmarks

Benchmark execution time: 2026-05-13 16:08:10

Comparing candidate commit 10bedf5 in PR branch andrei/symdb-upload-fields with baseline commit 0da179b in branch master.

Found 0 performance improvements and 0 performance regressions. Performance is the same for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
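The significance rule above reduces to a one-line check, illustrated here with made-up threshold and CI values: a change is significant only when the whole confidence interval lies outside the threshold band.

```ruby
# Toy illustration of the rule described above (values are made up):
# significant iff the entire CI is above +threshold or below -threshold.
def significant?(ci_low, ci_high, threshold)
  ci_low > threshold || ci_high < -threshold
end

threshold = 1.0
worse   = significant?(1.3, 3.1, threshold)   # CI entirely above +1.0%
neutral = significant?(-0.6, 1.2, threshold)  # CI straddles 0%
```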

@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch from 08b3dc0 to 3d78f32 Compare May 8, 2026 22:58
Comment on lines +156 to +161
uploadId: upload_id,
batchNum: batch_num,
# Always false: the Ruby tracer continuously uploads new code
# as files are loaded; there is no defined end-of-upload point.
final: false,
attachmentSize: attachment_size,
Contributor


according to the PR description this should be in snake_case?

Author


The commit message was wrong; we want camelCase in the event, and snake_case in the attachment. Fixed it now, thanks.

@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch 3 times, most recently from e1aee62 to abad187 Compare May 12, 2026 19:43
@p-datadog p-datadog self-assigned this May 12, 2026
…nd attachment

Add the following fields to the SymDB upload event message that
accompanies each multipart upload (camelCase, matching the rest of
the EvP event schema):

- "version" (top-level): the service version
- "language" (top-level): "ruby"
- "uploadId" (top-level): a UUID generated once per process and
  shared by all batches uploaded by the process. Detection is by
  Process.pid comparison, so a forked child observes a fresh PID and
  gets a new uploadId and batch counter.
- "batchNum" (top-level): 1-indexed counter incremented per upload,
  reset alongside uploadId when the PID changes.
- "final" (top-level): always false; the Ruby tracer continuously
  uploads new code as files are loaded, so there is no defined
  end-of-upload point.
- "attachmentSize" (top-level): size in bytes of the gzipped
  attachment.

Also add the same metadata to the gzipped attachment body via the
ServiceVersion wrapper (snake_case to match the rest of the
attachment scope schema):

- "upload_id"
- "batch_num"
- "final"

uploadId/batchNum are computed once per upload_scopes call so both
the attachment and the event JSON carry the same values.

Some of these fields are new, to be used by the backend in the future.
Others duplicate info that was already included in the attachment; by
duplicating some metadata out of the SymDB attachment body into the EvP
event body, the backend can populate per-attachment bookkeeping without
downloading the attachment.
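The compute-once flow in the commit message can be sketched as follows (variable names are assumptions): uploadId/batchNum are derived once per upload, so the camelCase event JSON and the snake_case attachment body are guaranteed to carry the same values.

```ruby
require 'securerandom'

# Sketch of the flow the commit message describes (names assumed):
# compute the shared values once per upload_scopes call...
upload_id = SecureRandom.uuid
batch_num = 1

# ...then reuse them in both payloads, with per-schema casing.
event_body = { uploadId: upload_id, batchNum: batch_num, final: false }
attachment_body = { upload_id: upload_id, batch_num: batch_num, final: false }
```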
@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch from abad187 to 10bedf5 Compare May 13, 2026 15:43
