
symbol_database: add upload metadata fields to upload event message#5720

Open
andreimatei wants to merge 1 commit into master from andrei/symdb-upload-fields

Conversation

@andreimatei

Add the following fields to the SymDB upload event message that accompanies each multipart upload:

  • "version" (top-level): the service version
  • "language" (top-level): "ruby"
  • "upload_id" (top-level): a UUID generated once per process and shared by all batches uploaded by the process. Detection is by Process.pid comparison, so a forked child observes a fresh PID and gets a new upload_id and batch counter.
  • "batch_num" (top-level): 1-indexed counter incremented per upload, reset alongside upload_id when the PID changes.
  • "final" (top-level): always false; the Ruby tracer continuously uploads new code as files are loaded, so there is no defined end-of-upload point.
  • "attachment_size" (top-level): size in bytes of the gzipped attachment.

build_event_metadata is now called per upload with the attachment size and a freshly-incremented batch number.

Some of these fields are new, to be used by the backend in the future. Others duplicate info that was already included in the attachment; by duplicating some metadata out of the SymDB attachment body into the EvP event body, the backend can populate per-attachment bookkeeping without downloading the attachment.
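The per-upload metadata above can be sketched as a small helper. This is a hedged illustration only: the exact signature of build_event_metadata is an assumption, and the field casing follows this description (a later review comment settles on camelCase for the event body).

```ruby
require 'securerandom'
require 'json'

# Hypothetical sketch: assemble the per-upload event metadata fields
# listed above. Signature and casing are assumptions, not the actual
# dd-trace-rb API.
def build_event_metadata(service_version:, upload_id:, batch_num:, attachment_size:)
  {
    'version' => service_version,
    'language' => 'ruby',
    'upload_id' => upload_id,
    'batch_num' => batch_num,
    # Always false: the Ruby tracer keeps uploading as new files are loaded.
    'final' => false,
    'attachment_size' => attachment_size
  }
end

metadata = build_event_metadata(
  service_version: '1.2.3',
  upload_id: SecureRandom.uuid,
  batch_num: 1,
  attachment_size: 2048
)
puts JSON.generate(metadata)
```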

@andreimatei andreimatei requested a review from p-datadog May 8, 2026 17:53
@andreimatei andreimatei requested a review from a team as a code owner May 8, 2026 17:53
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2026-05-08 17:53:28 UTC

@dd-octo-sts dd-octo-sts Bot added the debugger Live Debugger (+Dynamic Instrumentation, +Symbol Database) label May 8, 2026
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 8, 2026

Typing analysis

Note: Ignored files are excluded from the next sections.

steep:ignore comments

This PR introduces 1 steep:ignore comment, and clears 1 steep:ignore comment.

steep:ignore comments (+1, -1)

Introduced:
lib/datadog/symbol_database/uploader.rb:201
Cleared:
lib/datadog/symbol_database/uploader.rb:148

Untyped methods

This PR introduces 2 partially typed methods, and clears 2 partially typed methods. It increases the percentage of typed methods from 62.24% to 62.26% (+0.02%).

Partially typed methods (+2, -2)

Introduced:
sig/datadog/symbol_database/service_version.rbs:46
└── def to_h: () -> Hash[::Symbol, untyped]
sig/datadog/symbol_database/service_version.rbs:48
└── def to_json: (?untyped _state) -> String
Cleared:
sig/datadog/symbol_database/service_version.rbs:31
└── def to_h: () -> Hash[::Symbol, untyped]
sig/datadog/symbol_database/service_version.rbs:33
└── def to_json: (?untyped _state) -> String

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept on the line before the definition to remove it from the stats.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08b3dc0c2c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/datadog/symbol_database/uploader.rb Outdated
Comment on lines +155 to +162
def next_upload_metadata
if @upload_pid != Process.pid
@upload_pid = Process.pid
@upload_id = SecureRandom.uuid.freeze
@batch_num = 0
end
@batch_num += 1
[@upload_id, @batch_num]


P2: Serialize upload metadata updates

When two batch flushes call upload_scopes concurrently (for example, a size-triggered flush and the timer/shutdown path, since ScopeBatcher performs uploads outside its mutex), this lazy state update can race: both threads can observe an uninitialized or old @upload_pid/@batch_num, generate different upload_ids for the same process, or emit the same batch_num. That violates the new contract that all batches in a process share one upload id with monotonically increasing batch numbers, so protect this section with a mutex or initialize the process metadata eagerly.

Useful? React with 👍 / 👎.

Author


fixed by adding a mutex
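The fix described above can be sketched as follows, assuming the class and method names from the quoted diff; the lazy per-process state is guarded by a mutex so concurrent flushes can neither generate distinct upload ids in one process nor emit duplicate batch numbers.

```ruby
require 'securerandom'

# Sketch of the mutex fix (names assumed from the quoted diff, not the
# actual dd-trace-rb class layout).
class UploadMetadata
  def initialize
    @mutex = Mutex.new
    @upload_pid = nil
  end

  # Returns [upload_id, batch_num]. A forked child sees a fresh PID, so
  # it regenerates the UUID and restarts the 1-indexed batch counter.
  def next_upload_metadata
    @mutex.synchronize do
      if @upload_pid != Process.pid
        @upload_pid = Process.pid
        @upload_id = SecureRandom.uuid.freeze
        @batch_num = 0
      end
      @batch_num += 1
      [@upload_id, @batch_num]
    end
  end
end

m = UploadMetadata.new
id1, n1 = m.next_upload_metadata
id2, n2 = m.next_upload_metadata
```

Within one process the id is stable and the counter is monotonic: both calls return the same UUID with batch numbers 1 and 2.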

@datadog-official

datadog-official Bot commented May 8, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 97.15% (-0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 10bedf5 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter

pr-commenter Bot commented May 8, 2026

Benchmarks

Benchmark execution time: 2026-05-13 16:08:10

Comparing candidate commit 10bedf5 in PR branch andrei/symdb-upload-fields with baseline commit 0da179b in branch master.

Found 0 performance improvements and 0 performance regressions. Performance is the same for 45 metrics; 1 metric was unstable.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'
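The significance rule above reduces to a one-line check, illustrated here with made-up threshold and CI values: a change is significant only when the whole confidence interval lies outside the threshold band.

```ruby
# Toy illustration of the rule described above (values are made up):
# significant iff the entire CI is above +threshold or below -threshold.
def significant?(ci_low, ci_high, threshold)
  ci_low > threshold || ci_high < -threshold
end

threshold = 1.0
worse   = significant?(1.3, 3.1, threshold)   # CI entirely above +1.0%
neutral = significant?(-0.6, 1.2, threshold)  # CI straddles 0%
```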

@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch from 08b3dc0 to 3d78f32 Compare May 8, 2026 22:58
Comment on lines +156 to +161
uploadId: upload_id,
batchNum: batch_num,
# Always false: the Ruby tracer continuously uploads new code
# as files are loaded; there is no defined end-of-upload point.
final: false,
attachmentSize: attachment_size,
Contributor


according to the PR description this should be in snake_case?

Author


The commit message was wrong; we want camelCase in the event, and snake_case in the attachment. Fixed it now, thanks.

@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch 3 times, most recently from e1aee62 to abad187 Compare May 12, 2026 19:43
@p-datadog p-datadog self-assigned this May 12, 2026
…nd attachment

Add the following fields to the SymDB upload event message that
accompanies each multipart upload (camelCase, matching the rest of
the EvP event schema):

- "version" (top-level): the service version
- "language" (top-level): "ruby"
- "uploadId" (top-level): a UUID generated once per process and
  shared by all batches uploaded by the process. Detection is by
  Process.pid comparison, so a forked child observes a fresh PID and
  gets a new uploadId and batch counter.
- "batchNum" (top-level): 1-indexed counter incremented per upload,
  reset alongside uploadId when the PID changes.
- "final" (top-level): always false; the Ruby tracer continuously
  uploads new code as files are loaded, so there is no defined
  end-of-upload point.
- "attachmentSize" (top-level): size in bytes of the gzipped
  attachment.

Also add the same metadata to the gzipped attachment body via the
ServiceVersion wrapper (snake_case to match the rest of the
attachment scope schema):

- "upload_id"
- "batch_num"
- "final"

uploadId/batchNum are computed once per upload_scopes call so both
the attachment and the event JSON carry the same values.

Some of these fields are new, to be used by the backend in the future.
Others duplicate info that was already included in the attachment; by
duplicating some metadata out of the SymDB attachment body into the EvP
event body, the backend can populate per-attachment bookkeeping without
downloading the attachment.
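The compute-once flow in the commit message can be sketched as follows (variable names are assumptions): uploadId/batchNum are derived once per upload, so the camelCase event JSON and the snake_case attachment body are guaranteed to carry the same values.

```ruby
require 'securerandom'

# Sketch of the flow the commit message describes (names assumed):
# compute the shared values once per upload_scopes call...
upload_id = SecureRandom.uuid
batch_num = 1

# ...then reuse them in both payloads, with per-schema casing.
event_body = { uploadId: upload_id, batchNum: batch_num, final: false }
attachment_body = { upload_id: upload_id, batch_num: batch_num, final: false }
```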
@andreimatei andreimatei force-pushed the andrei/symdb-upload-fields branch from abad187 to 10bedf5 Compare May 13, 2026 15:43
