DOCSP-48557 Update Spark streaming write configuration to include all batch options #261

Open. Wants to merge 5 commits into the master base branch.

source/batch-mode/batch-write-config.txt (48 changes: 25 additions, 23 deletions)
@@ -58,7 +58,7 @@ You can configure the following properties when writing data to MongoDB in batch
  | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``

  * - ``convertJson``
- - | Specifies whether the connector parses the string and converts extended JSON
+ - | Specifies if the connector parses string values and converts extended JSON
  into BSON.
  |
  | This setting accepts the following values:
@@ -85,7 +85,7 @@ You can configure the following properties when writing data to MongoDB in batch
  | **Default:** ``false``

  * - ``idFieldList``
- - | Field or list of fields by which to split the collection data. To
+ - | Specifies a field or list of fields by which to split the collection data. To
  specify more than one field, separate them using a comma as shown
  in the following example:

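As an illustration of the ``idFieldList`` format in this hunk, here is a minimal Python sketch that builds the comma-separated value and pairs it with an ``operationType`` that matches on it (``replace`` or ``update``, per the option table). The helper ``matching_write_options`` is hypothetical, and the usage comment assumes the connector accepts these property names unprefixed in ``DataFrameWriter.options()``.

```python
# Hypothetical helper (not part of the MongoDB Spark connector): builds the
# write options that make replace/update operations match existing documents
# on one or more fields instead of the default _id.

MATCHING_OPERATION_TYPES = ("replace", "update")


def matching_write_options(fields, operation_type):
    """Return connector write options as a dict of option name -> string."""
    if operation_type not in MATCHING_OPERATION_TYPES:
        raise ValueError(f"operationType must be one of {MATCHING_OPERATION_TYPES}")
    if not fields:
        raise ValueError("idFieldList needs at least one field name")
    for name in fields:
        # idFieldList is a single comma-separated string, so an empty name or
        # a name containing a comma cannot be represented unambiguously.
        if not name or "," in name:
            raise ValueError(f"invalid field name for idFieldList: {name!r}")
    return {
        "operationType": operation_type,
        "idFieldList": ",".join(fields),
    }


# Usage sketch (requires PySpark and the connector; `df` is an existing
# DataFrame and the connection settings are configured elsewhere):
#
# (df.write.format("mongodb")
#     .mode("append")
#     .options(**matching_write_options(["fieldName1", "fieldName2"], "update"))
#     .save())
```

Rejecting names that contain commas is a design choice of the sketch: the table defines ``idFieldList`` only as a comma-separated string, so such names could not be expressed in it.
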
@@ -131,41 +131,43 @@ You can configure the following properties when writing data to MongoDB in batch
  | **Default:** ``true``

  * - ``upsertDocument``
- - | When ``true``, replace and update operations will insert the data
+ - | When ``true``, replace and update operations insert the data
  if no match exists.
  |
  | For time series collections, you must set ``upsertDocument`` to
  ``false``.
  |
  | **Default:** ``true``

- * - ``writeConcern.journal``
- - | Specifies ``j``, a write-concern option to enable request for
- acknowledgment that the data is confirmed on on-disk journal for
- the criteria specified in the ``w`` option. You can specify
- either ``true`` or ``false``.
- |
- | For more information on ``j`` values, see the MongoDB server
- guide on the
- :manual:`WriteConcern j option </reference/write-concern/#j-option>`.
-
  * - ``writeConcern.w``
- - | Specifies ``w``, a write-concern option to request acknowledgment
- that the write operation has propagated to a specified number of
- MongoDB nodes. For a list
- of allowed values for this option, see :manual:`WriteConcern
- </reference/write-concern/#w-option>` in the MongoDB manual.
+ - | Specifies ``w``, a write-concern option requesting acknowledgment that
+ the write operation has propagated to a specified number of MongoDB
+ nodes.
+ |
+ | For a list of allowed values for this option, see :manual:`WriteConcern
+ w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
+ manual.
+ |
+ | **Default:** ``majority`` or ``1``
+
+ * - ``writeConcern.journal``
+ - | Specifies ``j``, a write-concern option requesting acknowledgment that
+ the data has been written to the on-disk journal for the criteria
+ specified in the ``w`` option. You can specify either ``true`` or
+ ``false``.
  |
- | **Default:** ``1``
+ | For more information on ``j`` values, see :manual:`WriteConcern j
+ Option </reference/write-concern/#j-option>` in the {+mdb-server+}
+ manual.

  * - ``writeConcern.wTimeoutMS``
  - | Specifies ``wTimeoutMS``, a write-concern option to return an error
- when a write operation exceeds the number of milliseconds. If you
+ when a write operation exceeds the specified number of milliseconds. If you
  use this optional setting, you must specify a nonnegative integer.
  |
- | For more information on ``wTimeoutMS`` values, see the MongoDB server
- guide on the
- :manual:`WriteConcern wtimeout option </reference/write-concern/#wtimeout>`.
+ | For more information on ``wTimeoutMS`` values, see
+ :manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
+ the {+mdb-server+} manual.

  Specifying Properties in ``connection.uri``
  -------------------------------------------
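
The write-concern rows in this hunk map onto three write properties. Below is a minimal Python sketch that renders them as option strings, checking only what the table states (``writeConcern.journal`` is ``true`` or ``false``; ``writeConcern.wTimeoutMS`` is a nonnegative integer). The helper name ``write_concern_options`` is hypothetical, and the usage comment assumes the properties can be passed unprefixed through ``DataFrameWriter.options()``.

```python
# Hypothetical helper (not part of the MongoDB Spark connector): renders the
# writeConcern.* write properties as the string values Spark options carry.
# Arguments left as None are omitted so the connector's defaults apply.


def write_concern_options(w=None, journal=None, wtimeout_ms=None):
    """Return a dict of writeConcern.* option names to string values."""
    opts = {}
    if w is not None:
        # w accepts a node count (such as 1) or a string such as "majority".
        opts["writeConcern.w"] = str(w)
    if journal is not None:
        if not isinstance(journal, bool):
            raise TypeError("writeConcern.journal must be True or False")
        opts["writeConcern.journal"] = "true" if journal else "false"
    if wtimeout_ms is not None:
        # The table requires a nonnegative integer number of milliseconds;
        # bool is rejected explicitly because it is a subclass of int.
        if isinstance(wtimeout_ms, bool) or not isinstance(wtimeout_ms, int) or wtimeout_ms < 0:
            raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")
        opts["writeConcern.wTimeoutMS"] = str(wtimeout_ms)
    return opts


# Usage sketch (requires PySpark and the connector; `df` is an existing
# DataFrame and the connection settings are configured elsewhere):
#
# (df.write.format("mongodb")
#     .mode("append")
#     .options(**write_concern_options(w="majority", journal=True, wtimeout_ms=5000))
#     .save())
```

Omitting unset values, rather than filling in defaults, avoids hard-coding a ``w`` default that the table says can be either ``majority`` or ``1``.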

source/streaming-mode/streaming-write-config.txt (119 changes: 116 additions, 3 deletions)
@@ -56,13 +56,126 @@ You can configure the following properties when writing data to MongoDB in strea
  interface.
  |
  | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``

+ * - ``convertJson``
+ - | Specifies if the connector parses string values and converts extended JSON
+ into BSON.
+ |
+ | This setting accepts the following values:
+
+ - ``any``: The connector converts all JSON values to BSON.
+
+ - ``"{a: 1}"`` becomes ``{a: 1}``.
+ - ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
+ - ``"true"`` becomes ``true``.
+ - ``"01234"`` becomes ``1234``.
+ - ``"{a:b:c}"`` doesn't change.
+
+ - ``objectOrArrayOnly``: The connector converts only JSON objects and arrays to
+ BSON.
+
+ - ``"{a: 1}"`` becomes ``{a: 1}``.
+ - ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
+ - ``"true"`` doesn't change.
+ - ``"01234"`` doesn't change.
+ - ``"{a:b:c}"`` doesn't change.
+
+ - ``false``: The connector leaves all values as strings.
+
+ | **Default:** ``false``
+
+ * - ``idFieldList``
+ - | Specifies a field or list of fields by which to split the collection data. To
+ specify more than one field, separate them using a comma as shown
+ in the following example:
+
+ .. code-block:: none
+ :copyable: false
+
+ "fieldName1,fieldName2"
+
+ | **Default:** ``_id``
+
+ * - ``ignoreNullValues``
+ - | When ``true``, the connector ignores any ``null`` values when writing,
+ including ``null`` values in arrays and nested documents.
+ |
+ | **Default:** ``false``
+
+ * - ``maxBatchSize``
+ - | Specifies the maximum number of operations to batch in bulk
+ operations.
+ |
+ | **Default:** ``512``
+
+ * - ``operationType``
+ - | Specifies the type of write operation to perform. You can set
+ this to one of the following values:
+
+ - ``insert``: Insert the data.
+ - ``replace``: Replace an existing document that matches the
+ ``idFieldList`` value with the new data. If no match exists, the
+ value of ``upsertDocument`` indicates whether the connector
+ inserts a new document.
+ - ``update``: Update an existing document that matches the
+ ``idFieldList`` value with the new data. If no match exists, the
+ value of ``upsertDocument`` indicates whether the connector
+ inserts a new document.
+
+ |
+ | **Default:** ``replace``
+
+ * - ``ordered``
+ - | Specifies whether to perform ordered bulk operations.
+ |
+ | **Default:** ``true``
+
+ * - ``upsertDocument``
+ - | When ``true``, replace and update operations insert the data
+ if no match exists.
+ |
+ | For time series collections, you must set ``upsertDocument`` to
+ ``false``.
+ |
+ | **Default:** ``true``
+
+ * - ``writeConcern.w``
+ - | Specifies ``w``, a write-concern option requesting acknowledgment that
+ the write operation has propagated to a specified number of MongoDB
+ nodes.
+ |
+ | For a list of allowed values for this option, see :manual:`WriteConcern
+ w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
+ manual.
+ |
+ | **Default:** ``majority`` or ``1``
+
+ * - ``writeConcern.journal``
+ - | Specifies ``j``, a write-concern option requesting acknowledgment that
+ the data has been written to the on-disk journal for the criteria
+ specified in the ``w`` option. You can specify either ``true`` or
+ ``false``.
+ |
+ | For more information on ``j`` values, see :manual:`WriteConcern j
+ Option </reference/write-concern/#j-option>` in the {+mdb-server+}
+ manual.
+
+ * - ``writeConcern.wTimeoutMS``
+ - | Specifies ``wTimeoutMS``, a write-concern option to return an error
+ when a write operation exceeds the specified number of milliseconds. If you
+ use this optional setting, you must specify a nonnegative integer.
+ |
+ | For more information on ``wTimeoutMS`` values, see
+ :manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
+ the {+mdb-server+} manual.
+
  * - ``checkpointLocation``
- - | The absolute file path of the directory to which the connector writes checkpoint
+ - | The absolute file path of the directory where the connector writes checkpoint
  information.
  |
- | For more information about checkpoints, see the
- `Spark Structured Streaming Programming Guide <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing>`__
+ | For more information about checkpoints, see the `Spark Structured
+ Streaming Programming Guide
+ <https://spark.apache.org/docs/latest/streaming/index.html>`__
  |
  | **Default:** None

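This hunk makes the streaming table list the batch write properties alongside ``checkpointLocation``. Below is a minimal Python sketch of assembling such options for a streaming write, assuming they can be passed unprefixed through PySpark's ``DataStreamWriter.options()``. The helper ``streaming_write_options``, the checkpoint path, and the ``orderId`` field are hypothetical, and the sketch does not verify that the connector recognizes each key.

```python
import os

# Hypothetical helper (not part of the MongoDB Spark connector): combines the
# streaming-only checkpointLocation property with any of the batch write
# properties, rendering Python values as the strings Spark options carry.


def streaming_write_options(checkpoint_location, write_options=None):
    """Return checkpointLocation plus the given write options as strings."""
    # The table asks for an absolute path. Letting URIs such as s3a://... or
    # hdfs://... through unchanged is an assumption of this sketch.
    if "://" not in checkpoint_location and not os.path.isabs(checkpoint_location):
        raise ValueError("checkpointLocation must be an absolute path")
    opts = {"checkpointLocation": checkpoint_location}
    for key, value in (write_options or {}).items():
        if isinstance(value, bool):
            # Render booleans as "true"/"false", matching the tables.
            opts[key] = "true" if value else "false"
        else:
            opts[key] = str(value)
    return opts


# Usage sketch (requires PySpark and the connector; `df` is a streaming
# DataFrame and the angle-bracket values are placeholders):
#
# opts = streaming_write_options(
#     "/tmp/checkpoints/orders",
#     {"operationType": "update", "idFieldList": "orderId", "maxBatchSize": 256},
# )
# query = (
#     df.writeStream.format("mongodb")
#     .option("spark.mongodb.connection.uri", "<connection-string>")
#     .option("spark.mongodb.database", "<database>")
#     .option("spark.mongodb.collection", "<collection>")
#     .options(**opts)
#     .outputMode("append")
#     .start()
# )
```

Converting values to strings up front keeps the option map inspectable before the query starts; PySpark would otherwise convert primitive values itself when ``.options()`` is called.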