fix: increase OP_MSG overhead margin in BufferedBulkInserter#981

Open
jslay-webflow wants to merge 2 commits into mongodb:master from jslay-webflow:fix-max-message-size-overhead

@jslay-webflow

BufferedBulkInserter subtracts 100 bytes from maxMessageSizeBytes (48000000) to leave room for the OP_MSG wire overhead. This margin is too small.

We hit this running mongorestore against a ~27GB mongo 7.0 instance. Two collections consistently produce wire messages of exactly 48000149 bytes, which is 149 bytes over the limit. The server rejects them:

recv(): message msgLen 48000149 is invalid. Min 16 Max: 48000000

The byteCount in addModel tracks raw BSON document sizes only. The actual wire message also includes the OP_MSG header (16 bytes), flagBits (4 bytes), the insert command body with the collection name and options, and the document sequence section headers. For collections with longer namespaces or write concern options, this overhead easily exceeds 100 bytes.
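
As a rough illustration of where those bytes come from, here is a minimal sketch that adds up the framing for an insert sent as OP_MSG. It assumes a command body containing only `insert`, `ordered`, and `$db`, and a kind 1 section with the identifier `documents`. The function names are hypothetical and not part of mongo-tools. Real messages typically carry additional fields (write concern, session and cluster-time data), so treat the result as a lower bound rather than the exact overhead.

```go
package main

import "fmt"

// bsonStringElem returns the encoded size of a BSON string element:
// type byte + key cstring + int32 length + value bytes + NUL.
func bsonStringElem(key, value string) int {
	return 1 + len(key) + 1 + 4 + len(value) + 1
}

// bsonBoolElem returns the encoded size of a BSON boolean element:
// type byte + key cstring + one value byte.
func bsonBoolElem(key string) int {
	return 1 + len(key) + 1 + 1
}

// minInsertOverhead estimates the non-document bytes in an OP_MSG insert,
// assuming a minimal command body (insert, ordered, $db). Hypothetical
// helper: a real driver adds more fields, so this is a lower bound.
func minInsertOverhead(db, coll string) int {
	const header = 16  // messageLength, requestID, responseTo, opCode
	const flagBits = 4 // OP_MSG flag bits

	// BSON document: int32 length + elements + trailing NUL.
	body := 4 +
		bsonStringElem("insert", coll) +
		bsonBoolElem("ordered") +
		bsonStringElem("$db", db) +
		1

	kind0 := 1 + body                     // section kind byte + command body
	kind1 := 1 + 4 + len("documents") + 1 // kind byte + int32 size + identifier cstring

	return header + flagBits + kind0 + kind1
}

func main() {
	fmt.Println(minInsertOverhead("testdb", "large_documents"))
}
```

Even this minimal body leaves very little of a 100-byte margin unused, and every extra character in the namespace or every extra command field eats into what remains.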

Bumped the margin from 100 to 1024 bytes. This is conservative but still allows batches very close to the theoretical max. The alternative is tracking the exact overhead per message, but that would require awareness of the collection namespace length and write concern at the BufferedBulkInserter level, which seems like overkill for a safety margin.
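
To make the arithmetic concrete, the sketch below compares the two margins against the overhead implied by the failing message. The names `byteLimit` and `fitsOnWire` are hypothetical and not the actual BufferedBulkInserter code. The 249-byte figure is derived as 48000149 - (48000000 - 100), which assumes the failing batch was filled exactly to the old document budget.

```go
package main

import "fmt"

// maxMessageSizeBytes is the server limit quoted in the error message.
const maxMessageSizeBytes = 48000000

// byteLimit is the document-byte budget left after reserving a margin.
func byteLimit(margin int) int {
	return maxMessageSizeBytes - margin
}

// fitsOnWire reports whether a batch filled to byteLimit(margin) still
// fits under the server limit once overhead bytes are added.
func fitsOnWire(margin, overhead int) bool {
	return byteLimit(margin)+overhead <= maxMessageSizeBytes
}

func main() {
	// Assumed overhead: the reported wire size minus the old document
	// budget, i.e. the batch was filled exactly to 48000000 - 100.
	const observedOverhead = 48000149 - (maxMessageSizeBytes - 100)

	for _, margin := range []int{100, 1024} {
		fmt.Printf("margin=%d budget=%d fits=%v\n",
			margin, byteLimit(margin), fitsOnWire(margin, observedOverhead))
	}
}
```

Under that assumption the 1024-byte margin costs about 0.002% of the batch capacity while leaving several hundred bytes of headroom over the observed overhead.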

Found this while setting up nightly mongodump/mongorestore backups for a dev cluster. Both archive-format and directory-format restores failed reliably on certain collections with mongo 7.0 tools. Mongo 5.0 tools worked against the same dump, presumably because of smaller default batch sizes that never got close enough to the limit. The 100-byte margin has probably been insufficient for a while, but the problem only surfaces when you have enough large documents to fill a batch right up to the limit.

how to reproduce

Any collection at a namespace like testdb.large_documents (total namespace > ~11 chars) with enough ~48KB documents to fill a batch to the byte limit will trigger this. The OP_MSG overhead for that namespace is ~107 bytes, exceeding the 100-byte margin.

tests

Added unit tests that verify the 1024-byte margin covers the OP_MSG overhead for namespaces of various lengths, and that the old 100-byte margin was insufficient for realistic namespaces.

The previous margin of 100 bytes subtracted from
MAX_MESSAGE_SIZE_BYTES was insufficient to account for the full
OP_MSG wire protocol overhead. The overhead includes:

- OP_MSG header (16 bytes)
- flagBits (4 bytes)
- Section kind 0: insert command body with collection name,
  ordered flag, write concern (~200+ bytes depending on names)
- Section kind 1: document sequence header (4 + 4 + identifier)

When restoring collections with many large documents (e.g. ~48KB
each), the accumulated overhead pushed the wire message to
48000149 bytes, exceeding the server's maxMessageSizeBytes limit
of 48000000. This caused mongorestore to fail with:

  recv(): message msgLen 48000149 is invalid. Max: 48000000

Increase the margin from 100 to 1024 bytes to safely accommodate
the worst-case overhead for any collection name length and write
concern configuration.

Verify that the byte limit margin (1024 bytes) is sufficient to
accommodate the OP_MSG wire protocol overhead for any realistic
namespace length. Also demonstrate that the previous 100-byte
margin was too small for namespaces longer than ~11 characters
combined (db + collection).
@jslay-webflow jslay-webflow requested a review from a team as a code owner April 3, 2026 09:31
@jslay-webflow jslay-webflow requested review from mankawal and removed request for a team April 3, 2026 09:31
@mmcclimon mmcclimon requested review from mmcclimon and removed request for mankawal April 3, 2026 19:34
@mmcclimon
Contributor

Hi @jslay-webflow, thanks for this! We require that external contributors sign the MongoDB Contributor Agreement. This will allow us to review and accept your contributions. Can you let us know when you've signed this? Then we'll see about reviewing it properly.

@jslay-webflow
Author

Completed and signed.

@mmcclimon
Contributor

mmcclimon commented Apr 6, 2026

Hi @jslay-webflow, thanks again for this! Before I see about merging it, can you confirm what version of the tools you were testing with? I ask because we released 100.16.0 last week, which had a fix for TOOLS-4145 in it; I wonder if that fixes the problem you were seeing.

Contributor

@mmcclimon mmcclimon left a comment

(Waiting to do anything more on this because I think this will have been fixed by #952.)

@tdq45gj
Contributor

tdq45gj commented Apr 7, 2026

@jslay-webflow Thanks for the PR! I think the size of the overhead depends on the length of the namespace (255 bytes at most) and the number of documents in the buffer. If there are many documents in the buffer, a margin of 1024 bytes might not be enough. I would suggest increasing the margin more aggressively to 1MB, since it's not easy to predict the message size in the tools.
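
One way the document count could affect the overhead is if the documents are carried as a BSON array inside the command body instead of as a kind 1 document sequence. Whether that encoding applies to the tools here is an assumption, not something confirmed in this thread. In a BSON array, each element costs a type byte plus its decimal index as a NUL-terminated key, so the framing grows with the number of documents. The hypothetical helper below counts only that per-element framing.

```go
package main

import (
	"fmt"
	"strconv"
)

// arrayKeyOverhead returns the bytes BSON spends on per-element framing
// when n documents are stored as array elements: one type byte, the
// decimal index as the key, and the key's terminating NUL.
// Assumption: documents are encoded as a BSON array, not a kind 1 sequence.
func arrayKeyOverhead(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += 1 + len(strconv.Itoa(i)) + 1
	}
	return total
}

func main() {
	for _, n := range []int{1000, 100000} {
		fmt.Printf("%d docs: %d bytes of array framing\n", n, arrayKeyOverhead(n))
	}
}
```

Under that assumption, a thousand documents already need more than 1024 bytes of framing, while even a hundred thousand small documents stay below 1MB.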

3 participants