From e00a5f520cf16188f30d8b2c4b9a7eadf66fed68 Mon Sep 17 00:00:00 2001
From: Michael Morisi
Date: Wed, 29 May 2024 16:05:37 -0400
Subject: [PATCH 01/10] DOCSP-29861: Cleanup unused files (#200)

(cherry picked from commit 05f312577a0d7ec29ea00ab33b37954d81baecf5)
---
 source/includes/data-source.rst                |  5 -----
 source/includes/scala-java-explicit-schema.rst | 13 -------------
 2 files changed, 18 deletions(-)
 delete mode 100644 source/includes/data-source.rst
 delete mode 100644 source/includes/scala-java-explicit-schema.rst

diff --git a/source/includes/data-source.rst b/source/includes/data-source.rst
deleted file mode 100644
index 2f18028e..00000000
--- a/source/includes/data-source.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-.. note::
-
-   The empty argument ("") refers to a file to use as a data source.
-   In this case our data source is a MongoDB collection, so the data
-   source argument is empty.
\ No newline at end of file
diff --git a/source/includes/scala-java-explicit-schema.rst b/source/includes/scala-java-explicit-schema.rst
deleted file mode 100644
index 3b682cb1..00000000
--- a/source/includes/scala-java-explicit-schema.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-By default, reading from MongoDB in a ``SparkSession`` infers the
-schema by sampling documents from the collection. You can also use a
-|class| to define the schema explicitly, thus removing the extra
-queries needed for sampling.
-
-.. note::
-
-   If you provide a case class for the schema, MongoDB returns **only
-   the declared fields**. This helps minimize the data sent across the
-   wire.
-
-The following statement creates a ``Character`` |class| and then
-uses it to define the schema for the DataFrame:

From 621324901f3a6e9f73d43d87b90f815b0ddd5480 Mon Sep 17 00:00:00 2001
From: Mike Woofter <108414937+mongoKart@users.noreply.github.com>
Date: Wed, 5 Jun 2024 14:09:49 -0500
Subject: [PATCH 02/10] DOCSP-40130 - Note on Sharded Partitioner (#201)

Co-authored-by: Nora Reidy
(cherry picked from commit e25e13fb24dd1d0834c216dfe7b3e3df1b9f6ff7)
---
 source/batch-mode/batch-read-config.txt | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/source/batch-mode/batch-read-config.txt b/source/batch-mode/batch-read-config.txt
index 7233fb2f..d97de93b 100644
--- a/source/batch-mode/batch-read-config.txt
+++ b/source/batch-mode/batch-read-config.txt
@@ -10,6 +10,13 @@ Batch Read Configuration Options
    :depth: 1
    :class: singlecol
 
+.. facet::
+   :name: genre
+   :values: reference
+
+.. meta::
+   :keywords: partitioner, customize, settings
+
 .. _spark-batch-input-conf:
 
 Overview
@@ -212,9 +219,12 @@ based on your shard configuration.
 To use this configuration, set the ``partitioner`` configuration
 option to ``com.mongodb.spark.sql.connector.read.partitioner.ShardedPartitioner``.
 
-.. warning::
-
-   This partitioner is not compatible with hashed shard keys.
+.. important:: ShardedPartitioner Restrictions
+
+   1. In MongoDB Server v6.0 and later, the sharding operation creates one large initial
+      chunk to cover all shard key values, making the sharded partitioner inefficient.
+      We do not recommend using the sharded partitioner when connected to MongoDB v6.0 and later.
+   2. The sharded partitioner is not compatible with hashed shard keys.
 
 .. _conf-mongopaginatebysizepartitioner:
 .. _conf-paginatebysizepartitioner:
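The patch above only names the ``partitioner`` option in prose, so a short illustration may help. The following is a rough PySpark sketch, not part of the patch: the connection URI, database, and collection names are placeholders, and only the partitioner class string comes from the documented setting.

.. code-block:: python

   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder.appName("sharded-partitioner-example")
       # Placeholder URI; replace with your deployment's connection string.
       .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
       .getOrCreate()
   )

   # Partition the batch read along the collection's shard configuration
   # instead of sampling the collection.
   df = (
       spark.read.format("mongodb")
       .option("database", "examples")        # hypothetical database
       .option("collection", "characters")    # hypothetical collection
       .option(
           "partitioner",
           "com.mongodb.spark.sql.connector.read.partitioner.ShardedPartitioner",
       )
       .load()
   )

As the new admonition notes, this partitioner is a poor fit for MongoDB Server 6.0 and later and does not work with hashed shard keys.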
From 0c729ac900c3b4ecd296c0306c8ed8b53e654a47 Mon Sep 17 00:00:00 2001
From: anabellabuckvar <41971124+anabellabuckvar@users.noreply.github.com>
Date: Fri, 16 Aug 2024 14:30:13 -0400
Subject: [PATCH 03/10] Add Netlify config files via upload

---
 build.sh     | 7 +++++++
 netlify.toml | 6 ++++++
 2 files changed, 13 insertions(+)
 create mode 100644 build.sh
 create mode 100644 netlify.toml

diff --git a/build.sh b/build.sh
new file mode 100644
index 00000000..a5e15032
--- /dev/null
+++ b/build.sh
@@ -0,0 +1,7 @@
+# ensures that we always use the latest version of the script
+if [ -f build-site.sh ]; then
+  rm build-site.sh
+fi
+
+curl https://raw.githubusercontent.com/mongodb/docs-worker-pool/netlify-poc/scripts/build-site.sh -o build-site.sh
+sh build-site.sh
diff --git a/netlify.toml b/netlify.toml
new file mode 100644
index 00000000..d0c89040
--- /dev/null
+++ b/netlify.toml
@@ -0,0 +1,6 @@
+[[integrations]]
+name = "snooty-cache-plugin"
+
+[build]
+publish = "snooty/public"
+command = ". ./build.sh"
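The next patch reworks admonitions in the streaming read configuration reference. For context, here is a hedged PySpark sketch of how two of the options it touches, ``aggregation.pipeline`` and ``change.stream.lookup.full.document``, might be applied to a streaming read. The URI, database, and collection names are assumptions, and the pipeline is a shortened form of the example used in the docs.

.. code-block:: python

   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder.appName("streaming-read-example")
       # Placeholder URI; replace with your deployment's connection string.
       .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
       .getOrCreate()
   )

   # Open a change stream that returns the full updated document for update
   # events and filters out closed records with a custom pipeline. As the
   # docs note, the pipeline must stay compatible with the partitioner
   # strategy.
   stream_df = (
       spark.readStream.format("mongodb")
       .option("database", "examples")   # hypothetical database
       .option("collection", "orders")   # hypothetical collection
       .option("change.stream.lookup.full.document", "updateLookup")
       .option("aggregation.pipeline", '[{"$match": {"closed": false}}]')
       .load()
   )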
From 649c6f6874d67569ad7a00345bdc277f142b6456 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Wed, 4 Sep 2024 08:53:17 -0500
Subject: [PATCH 04/10] DOCSP-42969 - remove nested admonitions (#204) (#208)

(cherry picked from commit b37226e457978e048953f89f93f897c2ab6235b1)

Co-authored-by: Mike Woofter <108414937+mongoKart@users.noreply.github.com>
---
 source/includes/note-trigger-method.rst      |  4 ---
 .../streaming-mode/streaming-read-config.txt | 36 ++++++++-----------
 source/streaming-mode/streaming-write.txt    | 15 ++++----
 3 files changed, 20 insertions(+), 35 deletions(-)
 delete mode 100644 source/includes/note-trigger-method.rst

diff --git a/source/includes/note-trigger-method.rst b/source/includes/note-trigger-method.rst
deleted file mode 100644
index f9ad2d1d..00000000
--- a/source/includes/note-trigger-method.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-.. note::
-
-   Call the ``trigger()`` method on the ``DataStreamWriter`` you create
-   from the ``DataStreamReader`` you configure.
diff --git a/source/streaming-mode/streaming-read-config.txt b/source/streaming-mode/streaming-read-config.txt
index 997d175d..dd185fe1 100644
--- a/source/streaming-mode/streaming-read-config.txt
+++ b/source/streaming-mode/streaming-read-config.txt
@@ -82,12 +82,10 @@ You can configure the following properties when reading data from MongoDB in str
 
          [{"$match": {"closed": false}}, {"$project": {"status": 1, "name": 1, "description": 1}}]
 
-      .. important::
-
-         Custom aggregation pipelines must be compatible with the
-         partitioner strategy. For example, aggregation stages such as
-         ``$group`` do not work with any partitioner that creates more than
-         one partition.
+      Custom aggregation pipelines must be compatible with the
+      partitioner strategy. For example, aggregation stages such as
+      ``$group`` do not work with any partitioner that creates more than
+      one partition.
 
    * - ``aggregation.allowDiskUse``
      - | Specifies whether to allow storage to disk when running the
@@ -135,14 +133,12 @@ You can configure the following properties when reading a change stream from Mon
         original document and updated document, but it also includes a
         copy of the entire updated document.
 
+      For more information on how this change stream option works,
+      see the MongoDB server manual guide
+      :manual:`Lookup Full Document for Update Operation `.
+
       **Default:** "default"
 
-      .. tip::
-
-         For more information on how this change stream option works,
-         see the MongoDB server manual guide
-         :manual:`Lookup Full Document for Update Operation `.
-
    * - ``change.stream.micro.batch.max.partition.count``
      - | The maximum number of partitions the {+connector-short+} divides each
          micro-batch into. Spark workers can process these partitions in parallel.
       |
       | **Default**: ``1``
 
-      .. warning:: Event Order
-
-         Specifying a value larger than ``1`` can alter the order in which
-         the {+connector-short+} processes change events. Avoid this setting
-         if out-of-order processing could create data inconsistencies downstream.
+      :red:`WARNING:` Specifying a value larger than ``1`` can alter the order in which
+      the {+connector-short+} processes change events. Avoid this setting
+      if out-of-order processing could create data inconsistencies downstream.
 
   * - ``change.stream.publish.full.document.only``
     - | Specifies whether to publish the changed document or the full
        change stream document.
 
      - If you don't specify a schema, the connector infers the schema
        from the change stream document.
 
-      **Default**: ``false``
+      This setting overrides the ``change.stream.lookup.full.document``
+      setting.
 
-      .. note::
-
-         This setting overrides the ``change.stream.lookup.full.document``
-         setting.
+      **Default**: ``false``
 
   * - ``change.stream.startup.mode``
     - | Specifies how the connector starts up when no offset is available.
diff --git a/source/streaming-mode/streaming-write.txt b/source/streaming-mode/streaming-write.txt
index 60a6aa3f..815c1f27 100644
--- a/source/streaming-mode/streaming-write.txt
+++ b/source/streaming-mode/streaming-write.txt
@@ -51,7 +51,8 @@ Write to MongoDB in Streaming Mode
 
    * - ``writeStream.trigger()``
 
     - Specifies how often the {+connector-short+} writes results
-      to the streaming sink.
+      to the streaming sink. Call this method on the ``DataStreamWriter`` object
+      you create from the ``DataStreamReader`` you configure.
 
       To use continuous processing, pass ``Trigger.Continuous(
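To round out the ``writeStream.trigger()`` guidance above, here is a hedged sketch of a streaming write that calls ``trigger()`` on the ``DataStreamWriter`` built from a streaming read such as the one sketched earlier. The URI, checkpoint path, database, and collection names are placeholders, and the one-second continuous interval is only an example value.

.. code-block:: python

   # Continue from a configured streaming read (stream_df) and control how
   # often results are committed to the MongoDB sink.
   query = (
       stream_df.writeStream.format("mongodb")
       # Placeholder URI and checkpoint path.
       .option("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
       .option("checkpointLocation", "/tmp/streaming-write-checkpoint")
       .option("database", "examples")        # hypothetical database
       .option("collection", "orders_copy")   # hypothetical collection
       .outputMode("append")
       # PySpark's analogue of Trigger.Continuous: request continuous
       # processing with a one-second checkpoint interval.
       .trigger(continuous="1 second")
       .start()
   )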