fix: increase OP_MSG overhead margin in BufferedBulkInserter #981
jslay-webflow wants to merge 2 commits into mongodb:master
The previous margin of 100 bytes subtracted from MAX_MESSAGE_SIZE_BYTES was insufficient to account for the full OP_MSG wire protocol overhead. The overhead includes:

- OP_MSG header (16 bytes)
- flagBits (4 bytes)
- Section kind 0: insert command body with collection name, ordered flag, write concern (~200+ bytes depending on names)
- Section kind 1: document sequence header (4 + 4 + identifier)

When restoring collections with many large documents (e.g. ~48KB each), the accumulated overhead pushed the wire message to 48000149 bytes, exceeding the server's maxMessageSizeBytes limit of 48000000. This caused mongorestore to fail with:

    recv(): message msgLen 48000149 is invalid. Max: 48000000

Increase the margin from 100 to 1024 bytes to safely accommodate the worst-case overhead for any collection name length and write concern configuration.
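The overhead components listed above can be added up in a few lines of Go. This is a rough sketch rather than the tools' or the driver's actual code: the helper names are hypothetical, and it assumes a minimal insert command body containing only `insert`, `ordered`, and `$db`. For the document sequence section it counts a 1-byte kind marker, a 4-byte size, and the NUL-terminated identifier `documents`. Write concern, session ids, and any other fields the driver appends are left out, so the result is a lower bound on the real overhead.

```go
package main

import "fmt"

const (
	msgHeaderBytes  = 16 // messageLength, requestID, responseTo, opCode (4 x int32)
	flagBitsBytes   = 4  // OP_MSG flagBits (uint32)
	sectionKindByte = 1  // every section starts with a 1-byte kind marker
)

// bsonStringElem returns the encoded size of a BSON string element:
// type byte + key cstring + int32 length + value bytes + NUL.
func bsonStringElem(key, val string) int {
	return 1 + len(key) + 1 + 4 + len(val) + 1
}

// bsonBoolElem returns the encoded size of a BSON boolean element:
// type byte + key cstring + 1 value byte.
func bsonBoolElem(key string) int {
	return 1 + len(key) + 1 + 1
}

// insertBodyBytes sizes a minimal (assumed) command body:
// {insert: coll, ordered: <bool>, $db: db}.
func insertBodyBytes(db, coll string) int {
	return 4 + // int32 document length
		bsonStringElem("insert", coll) +
		bsonBoolElem("ordered") +
		bsonStringElem("$db", db) +
		1 // document terminator
}

// opMsgOverhead estimates the bytes in an OP_MSG insert that are not
// raw document BSON. It is a lower bound under the assumptions above.
func opMsgOverhead(db, coll string) int {
	docSeqHeader := sectionKindByte + 4 + len("documents") + 1
	return msgHeaderBytes + flagBitsBytes +
		sectionKindByte + insertBodyBytes(db, coll) +
		docSeqHeader
}

func main() {
	for _, ns := range [][2]string{{"db", "c"}, {"testdb", "large_documents"}} {
		fmt.Printf("%s.%s: at least %d bytes of overhead\n",
			ns[0], ns[1], opMsgOverhead(ns[0], ns[1]))
	}
}
```

Even this stripped-down body grows byte for byte with the database and collection names, which is why a fixed 100-byte margin leaves little room once optional command fields are added.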
Verify that the byte limit margin (1024 bytes) is sufficient to accommodate the OP_MSG wire protocol overhead for any realistic namespace length. Also demonstrates that the previous 100-byte margin was too small for namespaces longer than ~11 characters combined (db + collection).
Hi @jslay-webflow, thanks for this! We require that external contributors sign the MongoDB Contributor Agreement. This will allow us to review and accept your contributions. Can you let us know when you've signed this? Then we'll see about reviewing it properly.
Completed and signed.
Hi @jslay-webflow, thanks again for this! Before I see about merging it, can you confirm what version of the tools you were testing with? I ask because we released 100.16.0 last week, which had a fix for TOOLS-4145 in it; I wonder if that fixes the problem you were seeing.
@jslay-webflow Thanks for the PR! I think the size of the overhead depends on the length of the namespace (255 bytes at max) and the number of documents in the buffer. If there are many documents in the buffer, a margin of 1024 bytes might not be enough. I would suggest increasing the margin more aggressively, to 1MB, since it's not easy to predict the message size in the tools.
`BufferedBulkInserter` subtracts 100 bytes from `maxMessageSizeBytes` (48000000) to leave room for the OP_MSG wire overhead. this margin is too small.

we hit this doing mongorestore on a ~27GB mongo 7.0 instance. two collections consistently produce wire messages of exactly 48000149 bytes -- 149 bytes over the limit. the server rejects them:

    recv(): message msgLen 48000149 is invalid. Max: 48000000

the `byteCount` in `addModel` tracks raw BSON document sizes, but the actual wire message includes the OP_MSG header (16 bytes), flagBits (4), the insert command body with collection name and options, and document sequence section headers. for collections with longer namespaces or write concern options, this overhead easily exceeds 100 bytes.

bumped the margin from 100 to 1024 bytes. this is conservative but still allows batches very close to the theoretical max. the alternative is tracking the exact overhead per-message, but that would require awareness of the collection namespace length and write concern at the `BufferedBulkInserter` level, which seems like overkill for a safety margin.

found this while setting up nightly mongodump/mongorestore backups for a dev cluster. both archive-format and directory-format restores failed reliably on certain collections with mongo 7.0 tools. mongo 5.0 tools worked against the same dump, presumably because of smaller default batch sizes that never got close enough to the limit. the 100-byte margin has probably been insufficient for a while but only surfaces when you have enough large documents to fill a batch right up to the limit.
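To illustrate how a fixed margin interacts with batching, here is a minimal Go sketch of a byte-limited buffer. It is not the actual `BufferedBulkInserter`: the type, fields, and flush condition are hypothetical, and only the 48000000 limit and the 100 and 1024 byte margins come from the description above. It counts raw document bytes only, which is the point: whatever the wire message adds has to fit in the margin.

```go
package main

import "fmt"

// maxMessageSizeBytes is the server limit quoted in the PR description.
const maxMessageSizeBytes = 48000000

// sketchInserter is a hypothetical stand-in for a buffered bulk inserter.
// It tracks raw BSON bytes only, with no knowledge of wire overhead.
type sketchInserter struct {
	byteLimit    int // maxMessageSizeBytes minus the safety margin
	byteCount    int // raw BSON bytes currently buffered
	docCount     int // documents currently buffered
	flushes      int // number of non-empty batches sent
	largestBatch int // largest raw payload sent in one batch
}

func newSketchInserter(margin int) *sketchInserter {
	return &sketchInserter{byteLimit: maxMessageSizeBytes - margin}
}

// add buffers one document of docSize raw BSON bytes, flushing first
// if the document would push the batch past the byte limit.
func (s *sketchInserter) add(docSize int) {
	if s.docCount > 0 && s.byteCount+docSize > s.byteLimit {
		s.flush()
	}
	s.byteCount += docSize
	s.docCount++
}

// flush records the batch size and resets the buffer; empty flushes are ignored.
func (s *sketchInserter) flush() {
	if s.docCount == 0 {
		return
	}
	if s.byteCount > s.largestBatch {
		s.largestBatch = s.byteCount
	}
	s.flushes++
	s.byteCount, s.docCount = 0, 0
}

func main() {
	for _, margin := range []int{100, 1024} {
		s := newSketchInserter(margin)
		for i := 0; i < 2500; i++ {
			s.add(48000) // roughly the ~48KB documents from the report
		}
		s.flush()
		fmt.Printf("margin %d: largest batch %d bytes, %d bytes left for wire overhead\n",
			margin, s.largestBatch, maxMessageSizeBytes-s.largestBatch)
	}
}
```

With uniform document sizes the batch rarely lands right at the limit; the failure needs a mix of sizes that fills the buffer to within a few dozen bytes of `byteLimit`, which matches the report that only certain collections were affected.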
how to reproduce
any collection at a namespace like `testdb.large_documents` (total namespace > ~11 chars) with enough ~48KB documents to fill a batch to the byte limit will trigger this. the OP_MSG overhead for that namespace is ~107 bytes, exceeding the 100-byte margin.

tests
added unit tests that verify the 1024-byte margin covers the OP_MSG overhead for namespaces of various lengths, and that the old 100-byte margin was insufficient for realistic namespaces.