Commit 4c30e46

mongoKart authored and github-actions[bot] committed
DOCSP-41381 - Compound Keys (#210)
(cherry picked from commit 615defc)
1 parent 2b0f1a0 commit 4c30e46

File tree

1 file changed: +50 −1 lines changed

source/batch-mode/batch-read-config.txt (+50 −1)

@@ -151,13 +151,14 @@ Partitioners change the read behavior of batch reads that use the {+connector-sh
 dividing the data into partitions, you can run transformations in parallel.

 This section contains configuration information for the following
-partitioners:
+partitioner:

 - :ref:`SamplePartitioner <conf-samplepartitioner>`
 - :ref:`ShardedPartitioner <conf-shardedpartitioner>`
 - :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
 - :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
 - :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
+- :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`

 .. note:: Batch Reads Only

@@ -302,6 +303,54 @@ The ``SinglePartitionPartitioner`` configuration creates a single partition.
 To use this configuration, set the ``partitioner`` configuration option to
 ``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.

+.. _conf-autobucketpartitioner:
+
+``AutoBucketPartitioner`` Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``AutoBucketPartitioner`` configuration is similar to the
+:ref:`SamplePartitioner <conf-samplepartitioner>`
+configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
+aggregation stage to paginate the data. By using this configuration,
+you can partition the data across single or multiple fields, including nested fields.
+
+To use this configuration, set the ``partitioner`` configuration option to
+``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 65
+
+   * - Property name
+     - Description
+
+   * - ``partitioner.options.partition.fieldList``
+     - The list of fields to use for partitioning. The value can be either a single field
+       name or a list of comma-separated fields.
+
+       **Default:** ``_id``
+
+   * - ``partitioner.options.partition.chunkSize``
+     - The average size (MB) for each partition. Smaller partition sizes
+       create more partitions containing fewer documents.
+       Because this configuration uses the average document size to determine the number of
+       documents per partition, partitions might not be the same size.
+
+       **Default:** ``64``
+
+   * - ``partitioner.options.partition.samplesPerPartition``
+     - The number of samples to take per partition.
+
+       **Default:** ``100``
+
+   * - ``partitioner.options.partition.partitionKeyProjectionField``
+     - The field name to use for a projected field that contains all the
+       fields used to partition the collection.
+       We recommend changing the value of this property only if each document already
+       contains the ``__idx`` field.
+
+       **Default:** ``__idx``
+
 Specifying Properties in ``connection.uri``
 -------------------------------------------
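The configuration added in this commit can be exercised from PySpark by passing the partitioner class and its options to a batch read. The sketch below is illustrative only: the option keys and partitioner class name come from the diff above, but the connection URI, database, collection, and field names are placeholder assumptions, and running it requires a MongoDB deployment plus the Spark Connector on the classpath.

```python
# Sketch: reading with the AutoBucketPartitioner added in this commit.
# connection.uri, database, collection, and the fieldList values are
# hypothetical placeholders, not values from the documentation.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("autobucket-read-example")
    .getOrCreate()
)

df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://localhost:27017")
    .option("database", "example_db")        # placeholder
    .option("collection", "example_coll")    # placeholder
    # Select the AutoBucketPartitioner:
    .option("partitioner",
            "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner")
    # Partition across multiple fields, including a nested field:
    .option("partitioner.options.partition.fieldList", "customer.region,orderDate")
    # Target roughly 32 MB per partition instead of the 64 MB default:
    .option("partitioner.options.partition.chunkSize", "32")
    .load()
)

# Inspect how many partitions the partitioner produced:
print(df.rdd.getNumPartitions())
```

Because ``chunkSize`` is an average derived from sampled document sizes, the resulting partitions may vary in size, as the documentation notes.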
