symbol_database: add upload metadata fields to upload event message #5720
andreimatei wants to merge 1 commit into
Conversation
👋 Hey @DataDog/ruby-guild, please fill in the "Change log entry" section in the pull request description. If changes need to be present in CHANGELOG.md, you can state it this way:

**Change log entry**
Yes. A brief summary to be placed into the CHANGELOG.md. (Possible answers: Yes/Yep/Yeah)

Or you can opt out like this:

**Change log entry**
None. (Possible answers: No/Nope/None)
Typing analysis
Note: Ignored files are excluded from the next sections.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08b3dc0c2c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ruby
def next_upload_metadata
  if @upload_pid != Process.pid
    @upload_pid = Process.pid
    @upload_id = SecureRandom.uuid.freeze
    @batch_num = 0
  end
  @batch_num += 1
  [@upload_id, @batch_num]
end
```
Serialize upload metadata updates
When two batch flushes call upload_scopes concurrently (for example, a size-triggered flush and the timer/shutdown path, since ScopeBatcher performs uploads outside its mutex), this lazy state update can race: both threads can observe an uninitialized or stale @upload_pid/@batch_num, generate different upload ids for the same process, or emit the same batch_num. That violates the new contract that all batches in a process share one upload id with monotonically increasing batch numbers. Protect this section with a mutex, or initialize the process metadata eagerly.
Useful? React with 👍 / 👎.
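The fix Codex suggests can be sketched by wrapping the lazy update in a `Mutex#synchronize` block. This is an illustrative standalone sketch, not the tracer's actual code; the class name `UploadMetadata` is hypothetical:

```ruby
require 'securerandom'

# Hypothetical sketch: serialize the lazy upload-metadata update so
# concurrent flushes cannot race on @upload_pid/@upload_id/@batch_num.
class UploadMetadata
  def initialize
    @mutex = Mutex.new
    @upload_pid = nil
    @upload_id = nil
    @batch_num = 0
  end

  # Returns [upload_id, batch_num]. A forked child observes a new
  # Process.pid and therefore gets a fresh upload_id and batch counter.
  def next_upload_metadata
    @mutex.synchronize do
      if @upload_pid != Process.pid
        @upload_pid = Process.pid
        @upload_id = SecureRandom.uuid.freeze
        @batch_num = 0
      end
      @batch_num += 1
      [@upload_id, @batch_num]
    end
  end
end
```

With the mutex, two threads entering at once still produce one shared upload id and distinct, monotonically increasing batch numbers.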
🎉 All green!
❄️ No new flaky tests detected
🎯 Code Coverage (details)
🔗 Commit SHA: 10bedf5 | Docs | Datadog PR Page | Give us feedback!
Benchmarks
Benchmark execution time: 2026-05-13 16:08:10
Comparing candidate commit 10bedf5 in PR branch.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics; 1 unstable metric.
Force-pushed: 08b3dc0 to 3d78f32 (Compare)
```ruby
uploadId: upload_id,
batchNum: batch_num,
# Always false: the Ruby tracer continuously uploads new code
# as files are loaded; there is no defined end-of-upload point.
final: false,
attachmentSize: attachment_size,
```
according to the PR description this should be in snake_case?
The commit message was wrong; we want camelCase in the event, and snake_case in the attachment. Fixed it now, thanks.
Force-pushed: e1aee62 to abad187 (Compare)
…nd attachment

Add the following fields to the SymDB upload event message that accompanies each multipart upload (camelCase, matching the rest of the EvP event schema):

- "version" (top-level): the service version.
- "language" (top-level): "ruby".
- "uploadId" (top-level): a UUID generated once per process and shared by all batches uploaded by the process. Detection is by Process.pid comparison, so a forked child observes a fresh PID and gets a new uploadId and batch counter.
- "batchNum" (top-level): 1-indexed counter incremented per upload, reset alongside uploadId when the PID changes.
- "final" (top-level): always false; the Ruby tracer continuously uploads new code as files are loaded, so there is no defined end-of-upload point.
- "attachmentSize" (top-level): size in bytes of the gzipped attachment.

Also add the same metadata to the gzipped attachment body via the ServiceVersion wrapper (snake_case to match the rest of the attachment scope schema):

- "upload_id"
- "batch_num"
- "final"

uploadId/batchNum are computed once per upload_scopes call so both the attachment and the event JSON carry the same values.

Some of these fields are new, to be used by the backend in the future. Others duplicate info that was already included in the attachment; by duplicating some metadata out of the SymDB attachment body into the EvP event body, the backend can populate per-attachment bookkeeping without downloading the attachment.
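To make the camelCase-vs-snake_case split concrete, here is a hypothetical sketch of the two payload shapes described in the commit message. The helper names `build_event_json` and `build_attachment_wrapper` are illustrative, not the tracer's actual API:

```ruby
require 'json'

# Sketch of the EvP event body: keys are camelCase, matching the rest
# of the event schema.
def build_event_json(service_version:, upload_id:, batch_num:, attachment_size:)
  {
    version: service_version,
    language: 'ruby',
    uploadId: upload_id,
    batchNum: batch_num,
    final: false, # no defined end-of-upload point in the Ruby tracer
    attachmentSize: attachment_size
  }.to_json
end

# Sketch of the attachment-body wrapper: keys are snake_case, matching
# the rest of the attachment scope schema.
def build_attachment_wrapper(scopes:, upload_id:, batch_num:)
  {
    upload_id: upload_id,
    batch_num: batch_num,
    final: false,
    scopes: scopes
  }
end
```

Both builders take the same upload_id/batch_num pair, mirroring the point that the metadata is computed once per upload_scopes call so the event and the attachment agree.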
Force-pushed: abad187 to 10bedf5 (Compare)
Add the following fields to the SymDB upload event message that accompanies each multipart upload:
build_event_metadata is now called per upload with the attachment size and a freshly-incremented batch number.
Some of these fields are new, to be used by the backend in the future. Others duplicate info that was already included in the attachment; by duplicating some metadata out of the SymDB attachment body into the EvP event body, the backend can populate per-attachment bookkeeping without downloading the attachment.
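Since "attachmentSize" is the size of the gzipped attachment, a minimal sketch of how that value could be computed follows. The helper `gzip_bytes` is hypothetical and stands in for whatever compression path the tracer actually uses:

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Hypothetical sketch: gzip the attachment body and take the compressed
# byte size, i.e. the value reported as "attachmentSize" in the event.
def gzip_bytes(str)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.write(str)
  gz.close
  io.string
end

body = { scopes: [] }.to_json
compressed = gzip_bytes(body)
attachment_size = compressed.bytesize
```

Measuring the size after compression keeps the event metadata consistent with the bytes the backend would actually download.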