From a5087203013c399719cbed8d678da6f86536829d Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 13 Feb 2024 12:38:19 -0800 Subject: [PATCH 01/13] first draft --- source/release-notes.txt | 30 +++++++++++++++++++ .../streaming-mode/streaming-read-config.txt | 20 ++++++++++++- source/streaming-mode/streaming-read.txt | 14 +++++---- 3 files changed, 57 insertions(+), 7 deletions(-) diff --git a/source/release-notes.txt b/source/release-notes.txt index ba5a8354..7983ff81 100644 --- a/source/release-notes.txt +++ b/source/release-notes.txt @@ -2,6 +2,36 @@ Release Notes ============= +MongoDB Connector for Spark 10.3 +-------------------------------- + +The 10.2 connector release includes the following new features: + +- Added support for scanning multiple collections when using micro-batch or + continuous streaming modes. + + .. warning:: Breaking Change + + Support for scanning multiple collections introduces the following breaking + changes: + + - If the name of a collection used in your ``collection`` configuration + option contains a comma (","), the + {+connector-short+} treats it as two different collections. To avoid + this, you must escape the comma by preceding it with a backslash ("\"). + + - If the name of a collection used in your ``collection`` configuration + option is "*", the {+connector-short+} interprets it as a specification + to scan all collections. To avoid this, you must escape the asterisk by preceding it + with a backslash ("\"). + + - If the name of a collection used in your ``collection`` configuration + option contains a backslash ("\"), the + {+connector-short+} treats the backslash as an escape character, which + may change how the value is interpreted. To avoid this, you must escape + the backslash by preceding it with another backslash. + + MongoDB Connector for Spark 10.2 -------------------------------- diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index 621a412a..6e849a1c 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -46,6 +46,24 @@ You can configure the following properties when reading data from MongoDB in str * - ``collection`` - | **Required.** | The collection name configuration. + | You can specify multiple collections by separating each collection name + with a comma (","). Do not add a space between the collections + unless the space is a part of the collection name. + | Specify multiple collections as follows: + + .. code-block:: java + + ... + .option("spark.mongodb.collection", "collectionOne,collectionTwo") + + | You can specify all collections in the database by using an asterisk + ("*"). + | Specify all collections as follows: + + .. code-block:: java + + ... + .option("spark.mongodb.collection", "*") * - ``comment`` - | The comment to append to the read operation. Comments appear in the @@ -168,7 +186,7 @@ You can configure the following properties when reading a change stream from Mon omit the ``fullDocument`` field and publishes only the value of the field. - If you don't specify a schema, the connector infers the schema - from the change stream document rather than from the underlying collection. + from the change stream document. 
**Default**: ``false`` diff --git a/source/streaming-mode/streaming-read.txt b/source/streaming-mode/streaming-read.txt index d7433cc7..547072bf 100644 --- a/source/streaming-mode/streaming-read.txt +++ b/source/streaming-mode/streaming-read.txt @@ -344,12 +344,14 @@ The following example shows how to stream data from MongoDB to your console. .. important:: Inferring the Schema of a Change Stream - When the {+connector-short+} infers the schema of a DataFrame - read from a change stream, by default, - it uses the schema of the underlying collection rather than that - of the change stream. If you set the ``change.stream.publish.full.document.only`` - option to ``true``, the connector uses the schema of the - change stream instead. + If you set the ``change.stream.publish.full.document.only`` + option to ``true``, the {+connector-short+} infers the schema of a DataFrame + read from a change stream by using the schema of the + scanned documents. If you set the option to ``false`` you must specify a + schema. + + Schema inference happens at the beginning of scanning, and does not take into + account collections that are created while scanning. For more information about this setting, and to see a full list of change stream configuration options, see the From 2f9efbd3210ccdf422b08e1504d175913e927011 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 13 Feb 2024 13:21:10 -0800 Subject: [PATCH 02/13] clarifications and rendering issues --- source/release-notes.txt | 18 ++++++++++-------- .../streaming-mode/streaming-read-config.txt | 8 +++++--- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/source/release-notes.txt b/source/release-notes.txt index 7983ff81..bcf292c6 100644 --- a/source/release-notes.txt +++ b/source/release-notes.txt @@ -5,32 +5,34 @@ Release Notes MongoDB Connector for Spark 10.3 -------------------------------- -The 10.2 connector release includes the following new features: +The 10.3 connector release includes the following new features: -- Added support for scanning multiple collections when using micro-batch or +- Added support for reading multiple collections when using micro-batch or continuous streaming modes. .. warning:: Breaking Change - Support for scanning multiple collections introduces the following breaking + Support for reading multiple collections introduces the following breaking changes: - If the name of a collection used in your ``collection`` configuration - option contains a comma (","), the + option contains a comma (,), the {+connector-short+} treats it as two different collections. To avoid - this, you must escape the comma by preceding it with a backslash ("\"). + this, you must escape the comma by preceding it with a backslash (\\). - If the name of a collection used in your ``collection`` configuration option is "*", the {+connector-short+} interprets it as a specification to scan all collections. To avoid this, you must escape the asterisk by preceding it - with a backslash ("\"). + with a backslash (\\). - If the name of a collection used in your ``collection`` configuration - option contains a backslash ("\"), the + option contains a backslash (\\), the {+connector-short+} treats the backslash as an escape character, which - may change how the value is interpreted. To avoid this, you must escape + might change how the value is interpreted. To avoid this, you must escape the backslash by preceding it with another backslash. 
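+
+   For example, the following option value is a sketch that assumes a
+   hypothetical collection named ``my,collection``; the escaped comma makes
+   the {+connector-short+} read it as a single collection, and the Java
+   string literal doubles the backslash:
+
+   .. code-block:: java
+
+      ...
+      .option("spark.mongodb.collection", "my\\,collection")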
+ To learn more about scanning multiple collections, see the :ref:`collection
+ configuration property <spark-specify-multiple-collections>` description.

MongoDB Connector for Spark 10.2
--------------------------------

diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt
index 6e849a1c..028e9c82 100644
--- a/source/streaming-mode/streaming-read-config.txt
+++ b/source/streaming-mode/streaming-read-config.txt
@@ -47,7 +47,7 @@ You can configure the following properties when reading data from MongoDB in str
   - | **Required.**
     | The collection name configuration.
     | You can specify multiple collections by separating each collection name
-      with a comma (","). Do not add a space between the collections
+      with a comma (,). Do not add a space between the collections
       unless the space is a part of the collection name.
     | Specify multiple collections as follows:

@@ -56,8 +56,7 @@ You can configure the following properties when reading data from MongoDB in str
          ...
          .option("spark.mongodb.collection", "collectionOne,collectionTwo")

-    | You can specify all collections in the database by using an asterisk
-      ("*").
+    | You can scan all collections in the database by using an asterisk (*).
     | Specify all collections as follows:

@@ -65,6 +64,9 @@ You can configure the following properties when reading data from MongoDB in str
          ...
          .option("spark.mongodb.collection", "*")

+     If you create a collection while scanning from all collections, it is
+     automatically picked up for scanning.
+
   * - ``comment``
     - | The comment to append to the read operation. Comments appear in the
       :manual:`output of the Database Profiler. `

From ada0f3eae8361533f8fdec30b2669c75defb3d95 Mon Sep 17 00:00:00 2001
From: Jordan Smith
Date: Tue, 13 Feb 2024 13:36:23 -0800
Subject: [PATCH 03/13] typos

---
 source/streaming-mode/streaming-read-config.txt | 4 ++--
 source/streaming-mode/streaming-read.txt | 7 +++----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt
index 028e9c82..47a31375 100644
--- a/source/streaming-mode/streaming-read-config.txt
+++ b/source/streaming-mode/streaming-read-config.txt
@@ -64,8 +64,8 @@ You can configure the following properties when reading data from MongoDB in str
          ...
          .option("spark.mongodb.collection", "*")

-     If you create a collection while scanning from all collections, it is
-     automatically picked up for scanning.
+     If you create a collection while scanning from all collections, it is
+     automatically picked up for scanning.

   * - ``comment``
     - | The comment to append to the read operation. Comments appear in the

diff --git a/source/streaming-mode/streaming-read.txt b/source/streaming-mode/streaming-read.txt
index 547072bf..45ec6133 100644
--- a/source/streaming-mode/streaming-read.txt
+++ b/source/streaming-mode/streaming-read.txt
@@ -345,10 +345,9 @@ The following example shows how to stream data from MongoDB to your console.
 .. important:: Inferring the Schema of a Change Stream

    If you set the ``change.stream.publish.full.document.only``
-   option to ``true``, the {+connector-short+} infers the schema of a DataFrame
-   read from a change stream by using the schema of the
-   scanned documents. If you set the option to
-   ``false`` you must specify a schema.
+   option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame``
+   by using the schema of the scanned documents. If you set the option to
+   ``false`` you must specify a schema.
Schema inference happens at the beginning of scanning, and does not take into account collections that are created while scanning. From 4b725e8a3875400c42b115b60de32e981cef1cf0 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 13 Feb 2024 13:44:42 -0800 Subject: [PATCH 04/13] typo --- source/streaming-mode/streaming-read-config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index 47a31375..3ebc3ecf 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -56,7 +56,7 @@ You can configure the following properties when reading data from MongoDB in str ... .option("spark.mongodb.collection", "collectionOne,collectionTwo") - | You can scan all collections in the database by using an asterisk (*). + | You can scan from all collections in the database by using an asterisk (*). | Specify all collections as follows: .. code-block:: java From a2f5b51fd02c0af899e7c4379792fb9d61bd6976 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 13 Feb 2024 13:48:03 -0800 Subject: [PATCH 05/13] taxonomy --- source/streaming-mode/streaming-read.txt | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/source/streaming-mode/streaming-read.txt b/source/streaming-mode/streaming-read.txt index 45ec6133..e5b664b7 100644 --- a/source/streaming-mode/streaming-read.txt +++ b/source/streaming-mode/streaming-read.txt @@ -15,6 +15,13 @@ Read from MongoDB in Streaming Mode :depth: 1 :class: singlecol +.. facet:: + :name: genre + :values: reference + +.. meta:: + :keywords: change stream + Overview -------- From eddf93c3d5ebd1f8af38afe29cd6c4ec152c454a Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Wed, 14 Feb 2024 07:11:52 -0800 Subject: [PATCH 06/13] Mike feedback --- source/release-notes.txt | 4 ++-- source/streaming-mode/streaming-read-config.txt | 13 +++++++------ source/streaming-mode/streaming-read.txt | 6 +++--- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/source/release-notes.txt b/source/release-notes.txt index bcf292c6..bc0d6533 100644 --- a/source/release-notes.txt +++ b/source/release-notes.txt @@ -16,7 +16,7 @@ The 10.3 connector release includes the following new features: changes: - If the name of a collection used in your ``collection`` configuration - option contains a comma (,), the + option contains a comma, the {+connector-short+} treats it as two different collections. To avoid this, you must escape the comma by preceding it with a backslash (\\). @@ -28,7 +28,7 @@ The 10.3 connector release includes the following new features: - If the name of a collection used in your ``collection`` configuration option contains a backslash (\\), the {+connector-short+} treats the backslash as an escape character, which - might change how the value is interpreted. To avoid this, you must escape + might change how it interprets the value. To avoid this, you must escape the backslash by preceding it with another backslash. 
To learn more about scanning multiple collections, see the :ref:`collection
configuration property <spark-specify-multiple-collections>` description.

MongoDB Connector for Spark 10.2
--------------------------------

diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt
index 3ebc3ecf..b7fa099d 100644
--- a/source/streaming-mode/streaming-read-config.txt
+++ b/source/streaming-mode/streaming-read-config.txt
@@ -46,9 +46,9 @@ You can configure the following properties when reading data from MongoDB in str
   * - ``collection``
     - | **Required.**
       | The collection name configuration.
-      | You can specify multiple collections by separating each collection name
-        with a comma (,). Do not add a space between the collections
-        unless the space is a part of the collection name.
+      | You can specify multiple collections by separating the collection names
+        with a comma. The collections must be in the same database. Do not add
+        a space between the collections unless the space is a part of the collection name.
       | Specify multiple collections as follows:

       .. code-block:: java

          ...
          .option("spark.mongodb.collection", "collectionOne,collectionTwo")

-      | You can scan from all collections in the database by using an asterisk (*).
+      | You can stream from all collections in the database by passing an
+        asterisk (*) as a string for the collection name.
       | Specify all collections as follows:

       .. code-block:: java

          ...
          .option("spark.mongodb.collection", "*")

-     If you create a collection while scanning from all collections, it is
-     automatically picked up for scanning.
+     If you create a collection while streaming from all collections, the new
+     collection is automatically included in the stream.

   * - ``comment``
     - | The comment to append to the read operation. Comments appear in the

diff --git a/source/streaming-mode/streaming-read.txt b/source/streaming-mode/streaming-read.txt
index e5b664b7..16e22d8b 100644
--- a/source/streaming-mode/streaming-read.txt
+++ b/source/streaming-mode/streaming-read.txt
@@ -354,10 +354,10 @@ The following example shows how to stream data from MongoDB to your console.
    If you set the ``change.stream.publish.full.document.only``
    option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame``
    by using the schema of the scanned documents. If you set the option to
-   ``false`` you must specify a schema.
+   ``false``, you must specify a schema.

-   Schema inference happens at the beginning of scanning, and does not take into
-   account collections that are created while scanning.
+   Schema inference happens at the beginning of streaming, and does not take into
+   account collections that are created during streaming.
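
   For example, the following option setting is a sketch that omits the
   rest of the stream configuration and assumes the ``spark.mongodb``
   option prefix used by the other examples in this guide; it publishes
   only the full document so that the connector can infer the schema:

   .. code-block:: java

      ...
      .option("spark.mongodb.change.stream.publish.full.document.only", "true")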
For more information about this setting, and to see a full list of change stream configuration options, see the From fa8bd86c4dae4194dbacbc32f62207f08c07e065 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 20 Feb 2024 10:57:35 -0800 Subject: [PATCH 07/13] Add missing info and move collection info to its own section --- .../streaming-mode/streaming-read-config.txt | 83 ++++++++++++++----- 1 file changed, 62 insertions(+), 21 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index b7fa099d..a82fe6af 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -47,26 +47,8 @@ You can configure the following properties when reading data from MongoDB in str - | **Required.** | The collection name configuration. | You can specify multiple collections by separating the collection names - with a comma. The collections must be in the same database. Do not add - a space between the collections unless the space is a part of the collection name. - | Specify multiple collections as follows: - - .. code-block:: java - - ... - .option("spark.mongodb.collection", "collectionOne,collectionTwo") - - | You can stream from all collections in the database by passing an - asterisk (*) as a string for the collection name. - | Specify all collections as follows: - - .. code-block:: java - - ... - .option("spark.mongodb.collection", "*") - - If you create a collection while streaming from all collections, the new - collection is automatically included in the stream. + with a comma. + | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`. * - ``comment`` - | The comment to append to the read operation. Comments appear in the @@ -224,4 +206,63 @@ You can configure the following properties when reading a change stream from Mon Specifying Properties in ``connection.uri`` ------------------------------------------- -.. include:: /includes/connection-read-config.rst \ No newline at end of file +.. include:: /includes/connection-read-config.rst + +.. _spark-specify-multiple-collections: + +Specifying Multiple Collections in the ``collection`` Property +-------------------------------------------------------------- + +You can specify multiple collections in the ``collection`` change stream +configuration property by separating the collection names +with a comma. Do not add a space between the collections unless the space is a +part of the collection name. + +.. note:: Performance Considerations + +When streaming from multiple collections, the connector reads from +each collection sequentially. Streaming from a large number of +collections cause slower performance. + +Specify multiple collections as follows: + +.. code-block:: java + +... +.option("spark.mongodb.collection", "collectionOne,collectionTwo") + +.. note:: + + If a collection name is "*", or if the name includes a comma or a backslash (\\), + you must escape the character as follows: + + - If the name of a collection used in your ``collection`` configuration + option contains a comma, the {+connector-short+} treats it as two different + collections. To avoid this, you must escape the comma by preceding it with + a backslash (\\). + + - If the name of a collection used in your ``collection`` configuration + option is "*", the {+connector-short+} interprets it as a specification + to scan all collections. 
To avoid this, you must escape the asterisk by preceding it + with a backslash (\\). + + - If the name of a collection used in your ``collection`` configuration + option contains a backslash (\\), the + {+connector-short+} treats the backslash as an escape character, which + might change how it interprets the value. To avoid this, you must escape + the backslash by preceding it with another backslash. + +You can stream from all collections in the database by passing an +asterisk (*) as a string for the collection name. + +Specify all collections as follows: + +.. code-block:: java + +... +.option("spark.mongodb.collection", "*") + +If you create a collection while streaming from all collections, the new +collection is automatically included in the stream. + +You can drop collections while streaming from multiple collections. From e81bc1e2421572498499ca7beaa90e5249ba0cd1 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 20 Feb 2024 11:11:11 -0800 Subject: [PATCH 08/13] indentation errors --- source/streaming-mode/streaming-read-config.txt | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index a82fe6af..e49caaa5 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -220,16 +220,16 @@ part of the collection name. .. note:: Performance Considerations -When streaming from multiple collections, the connector reads from -each collection sequentially. Streaming from a large number of -collections cause slower performance. + When streaming from multiple collections, the connector reads from + each collection sequentially. Streaming from a large number of + collections cause slower performance. Specify multiple collections as follows: .. code-block:: java -... -.option("spark.mongodb.collection", "collectionOne,collectionTwo") + ... + .option("spark.mongodb.collection", "collectionOne,collectionTwo") .. note:: @@ -259,10 +259,10 @@ Specify all collections as follows: .. code-block:: java -... -.option("spark.mongodb.collection", "*") + ... + .option("spark.mongodb.collection", "*") If you create a collection while streaming from all collections, the new collection is automatically included in the stream. -You can drop collections while streaming from multiple collections. +You can drop collections at any time while streaming from multiple collections. From 6f80c56bf309f664c7ca2a842708c1448718a1fa Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 20 Feb 2024 11:14:56 -0800 Subject: [PATCH 09/13] wording --- source/streaming-mode/streaming-read-config.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index e49caaa5..b8b86e0f 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -48,6 +48,7 @@ You can configure the following properties when reading data from MongoDB in str | The collection name configuration. | You can specify multiple collections by separating the collection names with a comma. + | | To learn more about specifying multiple collections, see :ref:`spark-specify-multiple-collections`. * - ``comment`` @@ -224,7 +225,7 @@ part of the collection name. each collection sequentially. Streaming from a large number of collections cause slower performance. 
-Specify multiple collections as follows: +Specify multiple collections as shown in the following example: .. code-block:: java @@ -255,7 +256,7 @@ Specify multiple collections as follows: You can stream from all collections in the database by passing an asterisk (*) as a string for the collection name. -Specify all collections as follows: +Specify all collections as shown in the following example: .. code-block:: java From b3e46c1f3d8ef603990013ce944e4a416faead9f Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Tue, 20 Feb 2024 11:18:17 -0800 Subject: [PATCH 10/13] typo --- source/streaming-mode/streaming-read-config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index b8b86e0f..cea48495 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -223,7 +223,7 @@ part of the collection name. When streaming from multiple collections, the connector reads from each collection sequentially. Streaming from a large number of - collections cause slower performance. + collections can cause slower performance. Specify multiple collections as shown in the following example: From 55ed099ef412e15bcdaaa18c166579edcce8f737 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Mon, 26 Feb 2024 08:59:18 -0800 Subject: [PATCH 11/13] adding additional information and examples --- .../streaming-mode/streaming-read-config.txt | 73 +++++++++++++------ source/streaming-mode/streaming-read.txt | 3 - 2 files changed, 51 insertions(+), 25 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index cea48495..b7d1e407 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -219,12 +219,6 @@ configuration property by separating the collection names with a comma. Do not add a space between the collections unless the space is a part of the collection name. -.. note:: Performance Considerations - - When streaming from multiple collections, the connector reads from - each collection sequentially. Streaming from a large number of - collections can cause slower performance. - Specify multiple collections as shown in the following example: .. code-block:: java @@ -232,26 +226,46 @@ Specify multiple collections as shown in the following example: ... .option("spark.mongodb.collection", "collectionOne,collectionTwo") -.. note:: +If a collection name is "*", or if the name includes a comma or a backslash (\\), +you must escape the character as follows: + +- If the name of a collection used in your ``collection`` configuration + option contains a comma, the {+connector-short+} treats it as two different + collections. To avoid this, you must escape the comma by preceding it with + a backslash (\\). Escape a collection named "my,collection" as follows: + + .. code-block:: java + + "my\\,collection" + +- If the name of a collection used in your ``collection`` configuration + option is "*", the {+connector-short+} interprets it as a specification + to scan all collections. To avoid this, you must escape the asterisk by preceding it + with a backslash (\\). Escape a collection named "*" as follows: + + .. 
code-block:: java - If a collection name is "*", or if the name includes a comma or a backslash (\\), - you must escape the character as follows: + "\\*" - - If the name of a collection used in your ``collection`` configuration - option contains a comma, the {+connector-short+} treats it as two different - collections. To avoid this, you must escape the comma by preceding it with - a backslash (\\). +- If the name of a collection used in your ``collection`` configuration + option contains a backslash (\\), the + {+connector-short+} treats the backslash as an escape character, which + might change how it interprets the value. To avoid this, you must escape + the backslash by preceding it with another backslash. Escape a collection named "\\collection" as follows: - - If the name of a collection used in your ``collection`` configuration - option is "*", the {+connector-short+} interprets it as a specification - to scan all collections. To avoid this, you must escape the asterisk by preceding it - with a backslash (\\). + .. code-block:: java - - If the name of a collection used in your ``collection`` configuration - option contains a backslash (\\), the - {+connector-short+} treats the backslash as an escape character, which - might change how it interprets the value. To avoid this, you must escape - the backslash by preceding it with another backslash. + "\\\\collection" + + .. note:: + + When specifying the collection name as a string literal in Java, you must + further escape each backslash with another one. For example, escape a collection + named "\\collection" as follows: + + .. code-block:: java + + "\\\\\\\\collection" You can stream from all collections in the database by passing an asterisk (*) as a string for the collection name. @@ -267,3 +281,18 @@ If you create a collection while streaming from all collections, the new collection is automatically included in the stream. You can drop collections at any time while streaming from multiple collections. + +.. important:: Inferring the Schema of a Change Stream + + If you set the ``change.stream.publish.full.document.only`` + option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame`` + by using the schema of the scanned documents. If you set the option to + ``false``, you must specify a schema. + + Schema inference happens at the beginning of streaming, and does not take into + account collections that are created during streaming. + + When streaming from multiple collections, the connector samples + each collection sequentially. Streaming from a large number of + collections can cause the schema inference to have noticeably slower + performance. This performance impact occurs only while inferring the schema. diff --git a/source/streaming-mode/streaming-read.txt b/source/streaming-mode/streaming-read.txt index 16e22d8b..ac8fb7ba 100644 --- a/source/streaming-mode/streaming-read.txt +++ b/source/streaming-mode/streaming-read.txt @@ -355,9 +355,6 @@ The following example shows how to stream data from MongoDB to your console. option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame`` by using the schema of the scanned documents. If you set the option to ``false``, you must specify a schema. - - Schema inference happens at the beginning of streaming, and does not take into - account collections that are created during streaming. 
For more information about this setting, and to see a full list of change stream configuration options, see the From 2553f1370e896645b7c261f55a3da2528af45564 Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Mon, 26 Feb 2024 09:19:44 -0800 Subject: [PATCH 12/13] fix escaping examples --- source/streaming-mode/streaming-read-config.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index b7d1e407..0b56c127 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -236,7 +236,7 @@ you must escape the character as follows: .. code-block:: java - "my\\,collection" + "my\,collection" - If the name of a collection used in your ``collection`` configuration option is "*", the {+connector-short+} interprets it as a specification @@ -245,7 +245,7 @@ you must escape the character as follows: .. code-block:: java - "\\*" + "\*" - If the name of a collection used in your ``collection`` configuration option contains a backslash (\\), the @@ -255,7 +255,7 @@ you must escape the character as follows: .. code-block:: java - "\\\\collection" + "\\collection" .. note:: @@ -265,7 +265,7 @@ you must escape the character as follows: .. code-block:: java - "\\\\\\\\collection" + "\\\\collection" You can stream from all collections in the database by passing an asterisk (*) as a string for the collection name. From c737f065c51403784ed24a912191bf946687d4ec Mon Sep 17 00:00:00 2001 From: Jordan Smith Date: Wed, 28 Feb 2024 09:16:56 -0800 Subject: [PATCH 13/13] small fixes --- source/streaming-mode/streaming-read-config.txt | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt index 0b56c127..997d175d 100644 --- a/source/streaming-mode/streaming-read-config.txt +++ b/source/streaming-mode/streaming-read-config.txt @@ -282,17 +282,16 @@ collection is automatically included in the stream. You can drop collections at any time while streaming from multiple collections. -.. important:: Inferring the Schema of a Change Stream +.. important:: Inferring the Schema with Multiple Collections If you set the ``change.stream.publish.full.document.only`` option to ``true``, the {+connector-short+} infers the schema of a ``DataFrame`` - by using the schema of the scanned documents. If you set the option to - ``false``, you must specify a schema. + by using the schema of the scanned documents. - Schema inference happens at the beginning of streaming, and does not take into - account collections that are created during streaming. + Schema inference happens at the beginning of streaming, and does not take + into account collections that are created during streaming. - When streaming from multiple collections, the connector samples + When streaming from multiple collections and inferring the schema, the connector samples each collection sequentially. Streaming from a large number of collections can cause the schema inference to have noticeably slower performance. This performance impact occurs only while inferring the schema.
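
For example, the following sketch reads a stream from two collections and
lets the connector infer the schema from the returned documents. The
connection string, database, and collection names are placeholders, and
depending on your configuration the option keys may require the
``spark.mongodb.read.`` prefix:

.. code-block:: java

   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   // Build or reuse a SparkSession for the streaming read.
   SparkSession spark = SparkSession.builder()
       .appName("streamMultipleCollections")
       .getOrCreate();

   // Stream from two collections in the same database; the connector
   // infers the schema because full-document publishing is enabled.
   Dataset<Row> streamingDataset = spark.readStream()
       .format("mongodb")
       .option("spark.mongodb.connection.uri", "<connection string>")
       .option("spark.mongodb.database", "<database name>")
       .option("spark.mongodb.collection", "collectionOne,collectionTwo")
       .option("spark.mongodb.change.stream.publish.full.document.only", "true")
       .load();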