@@ -151,13 +151,14 @@ Partitioners change the read behavior of batch reads that use the {+connector-sh
dividing the data into partitions, you can run transformations in parallel.
This section contains configuration information for the following
- partitioners :
+ partitioners:
- :ref:`SamplePartitioner <conf-samplepartitioner>`
- :ref:`ShardedPartitioner <conf-shardedpartitioner>`
- :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
- :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
- :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
+ - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
.. note:: Batch Reads Only
@@ -302,6 +303,54 @@ The ``SinglePartitionPartitioner`` configuration creates a single partition.
To use this configuration, set the ``partitioner`` configuration option to
``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.
+ .. _conf-autobucketpartitioner:
+
+ ``AutoBucketPartitioner`` Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The ``AutoBucketPartitioner`` configuration is similar to the
+ :ref:`SamplePartitioner <conf-samplepartitioner>`
+ configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
+ aggregation stage to paginate the data. By using this configuration,
+ you can partition the data across a single field or multiple fields, including nested fields.
+
+ To use this configuration, set the ``partitioner`` configuration option to
+ ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
+
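+ The following sketch shows one way to enable this partitioner on a PySpark
+ batch read. The connection string, database, and collection names are
+ placeholders, and the example assumes the connector package is already
+ available to the Spark session.
+
+ .. code-block:: python
+
+    from pyspark.sql import SparkSession
+
+    # Placeholder URI; replace it with your deployment's connection string.
+    spark = (
+        SparkSession.builder.appName("autobucket-read")
+        .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/")
+        .getOrCreate()
+    )
+
+    # Select the AutoBucketPartitioner for this batch read.
+    df = (
+        spark.read.format("mongodb")
+        .option("database", "<database>")
+        .option("collection", "<collection>")
+        .option(
+            "partitioner",
+            "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner",
+        )
+        .load()
+    )
+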
+ .. list-table::
+    :header-rows: 1
+    :widths: 35 65
+
+    * - Property name
+      - Description
+
+    * - ``partitioner.options.partition.fieldList``
+      - The list of fields to use for partitioning. The value can be either a single field
+        name or a list of comma-separated fields.
+
+        **Default:** ``_id``
+
+    * - ``partitioner.options.partition.chunkSize``
+      - The average size (MB) for each partition. Smaller partition sizes
+        create more partitions containing fewer documents.
+        Because this configuration uses the average document size to determine the number of
+        documents per partition, partitions might not be the same size.
+
+        **Default:** ``64``
+
+    * - ``partitioner.options.partition.samplesPerPartition``
+      - The number of samples to take per partition.
+
+        **Default:** ``100``
+
+    * - ``partitioner.options.partition.partitionKeyProjectionField``
+      - The field name to use for a projected field that contains all the
+        fields used to partition the collection.
+        We recommend changing the value of this property only if each document already
+        contains the ``__idx`` field.
+
+        **Default:** ``__idx``
+
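+ As a rough sketch of how these options fit together, the following PySpark
+ read sets the field list, chunk size, and samples per partition with
+ illustrative values; the field names are placeholders, and the ``spark``
+ session is assumed to be configured as in the previous example.
+
+ .. code-block:: python
+
+    df = (
+        spark.read.format("mongodb")
+        .option("database", "<database>")
+        .option("collection", "<collection>")
+        .option(
+            "partitioner",
+            "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner",
+        )
+        # Partition on two example fields; a single field name is also valid.
+        .option("partitioner.options.partition.fieldList", "<field1>,<field2>")
+        # Target an average of about 32 MB of documents per partition.
+        .option("partitioner.options.partition.chunkSize", "32")
+        # Take 100 samples per partition (the default).
+        .option("partitioner.options.partition.samplesPerPartition", "100")
+        .load()
+    )
+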
Specifying Properties in ``connection.uri``
-------------------------------------------