-
Notifications
You must be signed in to change notification settings - Fork 33
DOCSP-47835 Clarify batchsize behavior #215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
4d29070
bf79b9e
b02a39a
248115f
6f46523
a7bd71c
269819f
76d9bd5
69d3ed4
c18ef44
e9aab32
6183d86
89b304c
635e787
debcfe1
380a16e
892f7ea
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -180,7 +180,9 @@ you can set in the array: | |||||||||||||||||||
- Description | ||||||||||||||||||||
|
||||||||||||||||||||
* - ``batchSize`` | ||||||||||||||||||||
- | The number of documents to return per batch. The default value is ``101``. | ||||||||||||||||||||
- | The limit for each batch returned in a query result. The ``find`` method has an initial batch size of ``101`` documents by default and returns a maximum of 16 | ||||||||||||||||||||
megabytes per batch. Subsequent ``getMore`` operations issued against the resulting cursor have no default batch size, | ||||||||||||||||||||
so they are limited only by the ``16`` megabyte message size. batchSize can enforce a smaller limit than ``16`` megabytes, but not a larger one. | ||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. S: this revision seems to address the issue most clearly. Please also install the VS code extension for line wrapping
Suggested change
|
||||||||||||||||||||
| **Type**: ``integer`` | ||||||||||||||||||||
|
||||||||||||||||||||
* - ``collation`` | ||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,8 +52,8 @@ Parameters | |
|
||
* - batchSize | ||
- integer | ||
- The number of documents to return in the first batch. Defaults to | ||
``101``. A batchSize of ``0`` means that the cursor will be | ||
- Sets the number of documents to return in the first batch if the number of documents to return is over ``101``. | ||
Defaults to ``101``. A batchSize of ``0`` means that the cursor will be | ||
established, but no documents will be returned in the first batch. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The special behavior of |
||
|
||
Unlike the previous wire protocol version, a batchSize of ``1`` for the | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,8 +56,16 @@ Parameters | |
- integer | ||
- Specifies the batch size for the cursor, which will apply to both the | ||
initial ``aggregate`` command and any subsequent ``getMore`` commands. | ||
This determines the maximum number of documents to return in each | ||
response from the server. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can discuss There is also a special directive in the CRUD spec for drivers to intentionally omit |
||
|
||
The ``aggregate`` command has an initial batch size of 101 documents by default. | ||
Subsequent ``getMore`` operations issued against the resulting cursor have no default batch size, | ||
so they are limited only by the ``16`` megabyte message size. | ||
|
||
If batchSize is set, ``aggregate`` and any subsequent ``getMore`` commands returns the smaller of 16 megabytes of data | ||
or batchSize documents. Operations of type ``aggregate`` return a maximum of ``16`` megabytes per batch. | ||
batchSize can enforce a smaller limit than ``16`` megabytes, but not a larger one. | ||
|
||
If batchSize is not set, ``aggregate`` and ``getMore`` commands return up to ``16`` megabytes of data. | ||
|
||
A batchSize of ``0`` is special in that and will only apply to the | ||
initial ``aggregate`` command; subsequent ``getMore`` commands will use | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S: these middle sentences dont communicate the information effectively.
Mu understanding is that this description should explain the following points:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: to my knowledge, the default size that applies to the initial command (101 documents) and the default size (up to 16 mb) for requests for more batches is different. Should I clarify that as well? I was a little confused on the sections about batchsize in the find() and aggregate() function pages especially, because those to my knowledge are only initial commands (as getmore is the request for more batches). So in the find() and aggregate() pages, do you think it makes sense to only talk about the initial command and not the requests for more batches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it works like this -
When using the driver, users do not need to manually run a
getMore
command. The driver takes care of this internally. Thus, even though it is under the hood, the batch size would still apply to both the first and subsequent batches AFAIK. @jmikola do you have any insight into this clarification?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting, so getMore is not applicable at all in drivers contexts? Good to know.
Also, by total data size, do you mean added up the total size of all of the batches, or that the size of each individual batch has to be under 16 mb but the total size of batches combined can be > 16 mb? I interpreted it as each individual batch has to be under 16 mb but the total size doesn't matter.
I also thought that if we had a custom setting it would apply only to documents after the first batch, given this line from the engineer discussing issues in the batchSize docs in the slack thread linked in the ticket: "batchSize is about setting a ceiling for all batches after the first (which is 101)". However, would love clarification from Jeremy as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, there is no documented restriction on the total size of all batches. The 16 MiB limit applies to each batch, and is discussed in the
batchSize
docs for thegetMore
command.I can confirm that applications do not manually run
getMore
commands. Drivers take care of that when iterating a cursor.@rustagir's understanding about the default and custom setting in this comment is correct.
Relevant docs within drivers for this would
batchSize
within the AggregateOptions and FindOptions sections of our CRUD spec, which talk about inheriting the option for subsequentgetMore
commands.Additionally, our find/getMore/killCursors spec goes into more detail about how
batchSize
interacts withlimit
andskip
. For example, when alimit
is specified drivers may use a calculated value forbatchSize
to ensure we don't return more documents than necessary and guarantee that the server closes the cursor. This isn't something we need to rehash in driver docs, though, since it's handled internally. Furthermore, the server docs don't seem to talk about that either (incursor.limit()
andcursor.batchSize()
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jeremy! Followup about MiB vs MB: is that difference salient in the docs? The internal engineering slack convo/docs refer to MiB rather than MB, but the php driver docs all refer to MB (16 MB vs 16 MiB). In your knowledge, would that difference be important to engineers using the batchsize command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the references to "MB" in the existing PHP docs are just old, and could pre-date when the server manual started using "MiB" consistently. There is no reason these can't all be changed to "MiB", although that needn't be done in this PR.
Anything relevant to this PR should use "MiB", though, so we stop introducing more instances of "MB".