[Backport v10.4] DOCSP-41467 - Handle bad reads #213

Merged
merged 1 commit into from
Sep 11, 2024
25 changes: 25 additions & 0 deletions source/batch-mode/batch-read-config.txt
@@ -53,6 +53,31 @@ You can configure the following properties when reading data from MongoDB in bat
|
| **Default:** None

* - ``mode``
- | The parsing strategy to use when handling documents that don't match the
expected schema. This option accepts the following values:

- ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
doesn't match the schema.
- ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
the schema. To store each invalid document as an extended JSON string,
combine this value with the ``columnNameOfCorruptRecord`` option.
- ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
the schema.

|
| **Default:** ``ReadConfig.ParseMode.FAILFAST``

* - ``columnNameOfCorruptRecord``
- | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
this option specifies the name of the new column that stores the invalid
document as extended JSON. If you're using an explicit schema, it must
include the name of the new column. If you're
using an inferred schema, the {+connector-short+} adds the new column to the
end of the schema.
|
| **Default:** None

* - ``mongoClientFactory``
- | MongoClientFactory configuration key.
| You can specify a custom implementation which must implement the
27 changes: 26 additions & 1 deletion source/streaming-mode/streaming-read-config.txt
@@ -50,13 +50,38 @@ You can configure the following properties when reading data from MongoDB in str
with a comma.
|
| To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`.

* - ``comment``
- | The comment to append to the read operation. Comments appear in the
:manual:`output of the Database Profiler. </reference/database-profiler>`
|
| **Default:** None

* - ``mode``
- | The parsing strategy to use when handling documents that don't match the
expected schema. This option accepts the following values:

- ``ReadConfig.ParseMode.FAILFAST``: Throws an exception when parsing a document that
doesn't match the schema.
- ``ReadConfig.ParseMode.PERMISSIVE``: Sets fields to ``null`` when data types don't match
the schema. To store each invalid document as an extended JSON string,
combine this value with the ``columnNameOfCorruptRecord`` option.
- ``ReadConfig.ParseMode.DROPMALFORMED``: Ignores any document that doesn't match
the schema.

|
| **Default:** ``ReadConfig.ParseMode.FAILFAST``

* - ``columnNameOfCorruptRecord``
- | If you set the ``mode`` option to ``ReadConfig.ParseMode.PERMISSIVE``,
this option specifies the name of the new column that stores the invalid
document as extended JSON. If you're using an explicit schema, it must
include the name of the new column. If you're
using an inferred schema, the {+connector-short+} adds the new column to the
end of the schema.
|
| **Default:** None

* - ``mongoClientFactory``
- | MongoClientFactory configuration key.
| You can specify a custom implementation, which must implement the
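Illustration of the three ``mode`` values documented in this change. This is a minimal pure-Python sketch of the described behavior, not the connector's implementation. The ``parse_documents`` function, the dict-based schema, and the short mode strings are hypothetical names chosen for the example. It models the inferred-schema case, where the corrupt-record column is added at the end of each row, and it uses plain ``json.dumps`` as a stand-in for extended JSON.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical schema representation: field name -> expected Python type.
Schema = Dict[str, type]

VALID_MODES = ("FAILFAST", "PERMISSIVE", "DROPMALFORMED")


def parse_documents(
    docs: Iterable[Dict[str, Any]],
    schema: Schema,
    mode: str = "FAILFAST",
    corrupt_column: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Model how each parse mode treats documents that don't match the schema."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")

    rows: List[Dict[str, Any]] = []
    for doc in docs:
        # A field mismatches when it is present but has the wrong type.
        mismatched = [
            name
            for name, expected in schema.items()
            if doc.get(name) is not None and not isinstance(doc[name], expected)
        ]

        if mismatched and mode == "FAILFAST":
            # FAILFAST: throw as soon as a document doesn't match.
            raise ValueError(f"document does not match schema: {mismatched}")
        if mismatched and mode == "DROPMALFORMED":
            # DROPMALFORMED: ignore the document entirely.
            continue

        # Valid document, or PERMISSIVE: mismatched fields become None.
        row = {
            name: (None if name in mismatched else doc.get(name))
            for name in schema
        }
        if corrupt_column is not None:
            # The extra column holds the invalid document as a JSON string
            # and stays None for documents that parsed cleanly.
            row[corrupt_column] = json.dumps(doc) if mismatched else None
        rows.append(row)
    return rows


docs = [{"name": "a", "qty": 1}, {"name": "b", "qty": "two"}]
schema = {"name": str, "qty": int}

permissive_rows = parse_documents(docs, schema, "PERMISSIVE", "_corrupt_record")
dropped_rows = parse_documents(docs, schema, "DROPMALFORMED")
```

With the sample input, ``permissive_rows`` keeps both documents and nulls the mismatched ``qty`` field in the second one, ``dropped_rows`` keeps only the first document, and the default ``FAILFAST`` mode raises on the second document.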