DOCSP-47835 Clarify batchsize behavior #215

Merged
merged 17 commits into from
Mar 11, 2025
16 changes: 8 additions & 8 deletions source/includes/extracts-watch-option.yaml
@@ -1,14 +1,14 @@
ref: watch-option-batchSize
content: |
Specifies the batch size for the cursor, which will apply to both the initial
``aggregate`` command and any subsequent ``getMore`` commands. This determines
the maximum number of change events to return in each response from the
The maximum number of documents within each batch returned in a query result, which applies
to the ``aggregate`` command. By default, the ``aggregate`` command has an initial batch size of
``101`` documents and a maximum size of 16 mebibytes for each subsequent batch. This
Contributor

Suggested change
- ``101`` documents and a maximum size of 16 mebibytes for each subsequent batch. This
+ ``101`` documents and a maximum size of 16 mebibytes (MiB) for each subsequent batch. This

option can enforce a smaller limit than 16 mebibytes, but not a larger
Contributor

Suggested change
- option can enforce a smaller limit than 16 mebibytes, but not a larger
+ option can enforce a smaller limit than 16 MiB, but not a larger

one. If you set ``batchSize`` to a limit that results in batches larger than
16 MiB, this option has no effect.

This command determines the maximum number of change events to return in each response from the
server.
Contributor

Q: should you also remove this sentence?

Contributor Author

Good callout, I'll remove it!


Irrespective of the ``batchSize`` option, the initial ``aggregate`` command
response for a change stream generally does not include any documents
unless another option is used to configure its starting point (e.g.
``startAfter``).
Member

I didn't notice this earlier, but this paragraph seems like a relevant bit of information to preserve.

---
ref: watch-option-fullDocument
content: |
7 changes: 6 additions & 1 deletion source/read/retrieve.txt
@@ -180,7 +180,12 @@ you can set in the array:
- Description

* - ``batchSize``
- | The number of documents to return per batch. The default value is ``101``.
- | The maximum number of documents within each batch returned in a query result. By default,
the ``find()`` method has an initial batch size of ``101`` documents
Member

Since this section is talking about both find() and findOne(), I think this can just refer to "the find command" (used by both helpers).

Therefore, this snippet will just duplicate the first paragraph from the batchSize docs in MongoDBCollection-find.txt.

and a maximum size of 16 mebibytes for each subsequent batch. This
option can enforce a smaller limit than 16 mebibytes, but not a larger
one. If you set ``batchSize`` to a limit that results in batches larger than
16 MiB, this option has no effect.
Contributor

Suggested change
- and a maximum size of 16 mebibytes for each subsequent batch. This
- option can enforce a smaller limit than 16 mebibytes, but not a larger
- one. If you set ``batchSize`` to a limit that results in batches larger than
- 16 MiB, this option has no effect.
+ and a maximum size of 16 mebibytes (MiB) for each subsequent batch. This
+ option can enforce a smaller limit than 16 MiB, but not a larger
+ one. If you set ``batchSize`` to a limit that results in batches larger than
+ 16 MiB, this option has no effect.

| **Type**: ``integer``

* - ``collation``
12 changes: 9 additions & 3 deletions source/reference/method/MongoDBCollection-find.txt
@@ -52,9 +52,15 @@ Parameters

* - batchSize
- integer
- The number of documents to return in the first batch. Defaults to
``101``. A batchSize of ``0`` means that the cursor will be
established, but no documents will be returned in the first batch.
Member

The special behavior of batchSize: 0 is notable, and also discussed in the find command docs. Particularly with aggregate operations, there is a legitimate use case for wanting to start the aggregation on the server without waiting for the first result to become available. Allowing a cursor to be created immediately and iterated later (via batchSize: 0) satisfies that.
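
For illustration, the deferred-iteration behavior described above can be modeled with a small, self-contained Python sketch. This is a hypothetical simulation, not the driver's or the server's API: `SketchCursor`, `round_trips`, and the lazily evaluated `results` iterable are names invented here, and the per-batch 16 MiB cap is ignored for brevity.

```python
from itertools import islice

DEFAULT_FIRST_BATCH = 101  # default initial batch size, in documents


class SketchCursor:
    """Hypothetical model of a cursor; not the driver's or server's API."""

    def __init__(self, results, batch_size=None):
        self._results = iter(results)
        self.round_trips = []  # (simulated command, documents returned)
        first = DEFAULT_FIRST_BATCH if batch_size is None else batch_size
        # With batch_size=0 the cursor exists, but the first batch is empty
        # and no result has been pulled from the source yet.
        self._buffer = list(islice(self._results, first))
        self.round_trips.append(("initial", len(self._buffer)))
        # Assumption: with no positive batch size, later batches are not
        # limited by document count (the byte cap is not modeled here).
        self._get_more_size = batch_size or None

    def __iter__(self):
        while True:
            yield from self._buffer
            self._buffer = list(islice(self._results, self._get_more_size))
            if not self._buffer:
                return
            self.round_trips.append(("getMore", len(self._buffer)))
```

In this model, constructing `SketchCursor(results, batch_size=0)` records a single `("initial", 0)` round trip, and results are only pulled once the cursor is iterated.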

- The maximum number of documents to return in the first batch. By default, the ``find()``
command has an initial batch size of ``101`` documents. This
Member

This should definitely specify:

...and a maximum size of 16 mebibytes for each subsequent batch

As-is, it's inconsistent with the other docs snippets and could mislead users into thinking this has no effect on getMore for subsequent batches.
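
To make the point about subsequent batches concrete, here is a minimal Python sketch of the limits as the proposed wording describes them: a default first batch of 101 documents, a 16 MiB cap on every batch, and an optional `batchSize` that applies to every batch, not just the first. The function name `split_into_batches` and the greedy grouping are assumptions made for illustration; the server's actual batching logic may differ, and `batchSize: 0` is deliberately not modeled.

```python
MAX_BATCH_BYTES = 16 * 1024 * 1024  # 16 MiB cap described in the docs
DEFAULT_FIRST_BATCH = 101           # default initial batch size, in documents


def split_into_batches(doc_sizes, batch_size=None):
    """Group document sizes (in bytes) into batches under a simplified model."""
    if batch_size is not None and batch_size <= 0:
        raise ValueError("this sketch only models a positive batch_size")
    batches = []
    current, current_bytes = [], 0
    for size in doc_sizes:
        # Document-count limit for the batch currently being filled.
        if batch_size is not None:
            limit = batch_size           # applies to every batch
        elif not batches:
            limit = DEFAULT_FIRST_BATCH  # default first batch
        else:
            limit = None                 # later batches: byte cap only
        full_by_count = limit is not None and len(current) >= limit
        full_by_bytes = bool(current) and current_bytes + size > MAX_BATCH_BYTES
        if full_by_count or full_by_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Under this model, a large `batch_size` cannot push a batch past 16 MiB (the byte cap closes the batch first), while a small `batch_size` shortens the first batch and every later one.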

option can enforce a smaller limit than 16 mebibytes, but not a larger
one. If you set ``batchSize`` to a limit that results in batches larger than
16 MiB, this option has no effect.
Contributor

Suggested change
- command has an initial batch size of ``101`` documents. This
- option can enforce a smaller limit than 16 mebibytes, but not a larger
- one. If you set ``batchSize`` to a limit that results in batches larger than
- 16 MiB, this option has no effect.
+ command has an initial batch size of ``101`` documents. This
+ option can enforce a smaller limit than 16 mebibytes (MiB), but not a larger
+ one. If you set ``batchSize`` to a limit that results in batches larger than
+ 16 MiB, this option has no effect.


A batchSize of ``0`` means that the cursor will be established, but no documents
will be returned in the first batch. This may be useful for quickly returning a cursor
or failure from ``aggregate`` without doing significant server-side work.
Contributor

I: Remove aggregate from this description, as it's about the find method. I think JM meant that setting this option is useful in aggregation contexts, but users might use it for finds too.


Unlike the previous wire protocol version, a batchSize of ``1`` for the
:dbcommand:`find` command does not close the cursor.
20 changes: 10 additions & 10 deletions source/reference/method/MongoDBCollection-listSearchIndexes.txt
@@ -40,16 +40,16 @@ Parameters

* - batchSize
- integer
- Specifies the batch size for the cursor, which will apply to both the
initial ``aggregate`` command and any subsequent ``getMore`` commands.
This determines the maximum number of documents to return in each
response from the server.

A batchSize of ``0`` is special in that and will only apply to the
initial ``aggregate`` command; subsequent ``getMore`` commands will use
the server's default batch size. This may be useful for quickly
returning a cursor or failure from ``aggregate`` without doing
significant server-side work.
- The maximum number of documents within each batch returned in a query result, which applies
to the ``aggregate`` command. By default, the ``aggregate`` command has an initial batch size of
``101`` documents and a maximum size of 16 mebibytes for each subsequent batch. This
option can enforce a smaller limit than 16 mebibytes, but not a larger
one. If you set ``batchSize`` to a limit that results in batches larger than
16 MiB, this option has no effect.

A batchSize of ``0`` means that the cursor will be established, but no documents
will be returned in the first batch. This may be useful for quickly returning a cursor
or failure from ``aggregate`` without doing significant server-side work.

* - codec
- MongoDB\\Codec\\DocumentCodec
20 changes: 10 additions & 10 deletions source/reference/method/MongoDBDatabase-aggregate.txt
@@ -54,16 +54,16 @@ Parameters

* - batchSize
- integer
- Specifies the batch size for the cursor, which will apply to both the
initial ``aggregate`` command and any subsequent ``getMore`` commands.
This determines the maximum number of documents to return in each
response from the server.
Member

@jmikola jmikola Mar 7, 2025

I think we can discuss batchSize: 0 here as we do for the find() option, since there is a legitimate use case for doing so. I'm OK with keeping this in the API reference and out of the prose docs (e.g. read/retrieve.txt), since it's more technical.

There is also a special directive in the CRUD spec for drivers to intentionally omit batchSize for pipelines that include $out or $merge, since it might prevent the pipeline from executing (and writing output). I don't think we need to mention that, though, as it is enforced internally (see: Aggregate.php).
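
As a rough illustration of the directive mentioned above, the following Python sketch drops `batchSize` whenever a pipeline contains `$out` or `$merge`. It is not the logic in Aggregate.php; `build_cursor_options` and `WRITE_STAGES` are hypothetical names, and the pipeline is assumed to be a list of single-key stage documents.

```python
WRITE_STAGES = ("$out", "$merge")  # stages that write pipeline output


def build_cursor_options(pipeline, batch_size=None):
    """Return cursor options, omitting batchSize for pipelines that write output."""
    writes_output = any(
        stage_name in stage
        for stage in pipeline
        for stage_name in WRITE_STAGES
    )
    if batch_size is None or writes_output:
        # Leave batchSize unset so it cannot prevent the pipeline from
        # executing and writing its output.
        return {}
    return {"batchSize": batch_size}
```

For example, `build_cursor_options([{"$match": {}}, {"$out": "archive"}], 50)` returns an empty dictionary, while the same call without the `$out` stage returns `{"batchSize": 50}`.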


A batchSize of ``0`` is special in that and will only apply to the
initial ``aggregate`` command; subsequent ``getMore`` commands will use
the server's default batch size. This may be useful for quickly
returning a cursor or failure from ``aggregate`` without doing
significant server-side work.
- The maximum number of documents within each batch returned in a query result.
By default, the ``aggregate`` command has an initial batch size of
``101`` documents and a maximum size of 16 mebibytes for each subsequent batch. This
option can enforce a smaller limit than 16 mebibytes, but not a larger
one. If you set ``batchSize`` to a limit that results in batches larger than
16 MiB, this option has no effect.

A batchSize of ``0`` means that the cursor will be established, but no documents
will be returned in the first batch. This may be useful for quickly returning a cursor
or failure from ``aggregate`` without doing significant server-side work.

* - bypassDocumentValidation
- boolean