fix(docs): correct snowflake options for bulk ingest (#2004)
It was brought up in #1997 that the currently published options for
snowflake bulk ingestion are incorrect in the docs. This corrects them
to match the values used in the implementation.

This also adds new constants to the python `StatementOptions` enum for
the snowflake driver for users to reference.
zeroshade authored Jul 12, 2024
1 parent 6c7ad99 commit a5f8474
Showing 2 changed files with 22 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/source/driver/snowflake.rst
@@ -285,23 +285,23 @@ The following informal benchmark demonstrates expected performance using default
The default settings for ingestion should be well balanced for many real-world configurations. If required, performance
and resource usage may be tuned with the following options on the :cpp:class:`AdbcStatement` object:

-``adbc.snowflake.rpc.ingest_writer_concurrency``
+``adbc.snowflake.statement.ingest_writer_concurrency``
Number of Parquet files to write in parallel. Default attempts to maximize workers based on logical cores detected,
but may need to be adjusted if running in a constrained environment. If set to 0, default value is used. Cannot be negative.

-``adbc.snowflake.rpc.ingest_upload_concurrency``
+``adbc.snowflake.statement.ingest_upload_concurrency``
Number of Parquet files to upload in parallel. Greater concurrency can smooth out TCP congestion and help make
use of available network bandwidth, but will increase memory utilization. Default is 8. If set to 0, default value is used.
Cannot be negative.

-``adbc.snowflake.rpc.ingest_copy_concurrency``
+``adbc.snowflake.statement.ingest_copy_concurrency``
Maximum number of COPY operations to run concurrently. Bulk ingestion performance is optimized by executing COPY
queries as files are still being uploaded. Snowflake COPY speed scales with warehouse size, so smaller warehouses
may benefit from setting this value higher to ensure long-running COPY queries do not block newly uploaded files
from being loaded. Default is 4. If set to 0, only a single COPY query will be executed as part of ingestion,
once all files have finished uploading. Cannot be negative.

-``adbc.snowflake.rpc.ingest_target_file_size``
+``adbc.snowflake.statement.ingest_target_file_size``
Approximate size of Parquet files written during ingestion. Actual size will be slightly larger, depending on
size of footer/metadata. Default is 10 MB. If set to 0, file size has no limit. Cannot be negative.
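
The options above are plain string key/value pairs set on the statement before ingestion. As an illustration only (not part of this commit), here is a minimal Python sketch via the DB-API layer; the URI, credentials, table name, and sample data are placeholders:

import pyarrow as pa

import adbc_driver_snowflake.dbapi

uri = "user:password@account/database/schema"  # placeholder, not a real account
data = pa.table({"ints": [1, 2, 3]})           # placeholder payload

with adbc_driver_snowflake.dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        # Option values are strings; see the descriptions above for what
        # a value of 0 means for each option.
        cur.adbc_statement.set_options(
            **{
                "adbc.snowflake.statement.ingest_writer_concurrency": "4",
                "adbc.snowflake.statement.ingest_upload_concurrency": "8",
                "adbc.snowflake.statement.ingest_copy_concurrency": "4",
                "adbc.snowflake.statement.ingest_target_file_size": str(10 * 1024 * 1024),
            }
        )
        # Write the table to Snowflake, creating the target table.
        cur.adbc_ingest("my_table", data, mode="create")
    conn.commit()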

18 changes: 18 additions & 0 deletions python/adbc_driver_snowflake/adbc_driver_snowflake/__init__.py
@@ -112,6 +112,24 @@ class StatementOptions(enum.Enum):
    #: Number of concurrent streams being prefetched for a result set.
    #: Defaults to 10.
    PREFETCH_CONCURRENCY = "adbc.snowflake.rpc.prefetch_concurrency"
+    #: Number of Parquet files to write in parallel for bulk ingestion.
+    #: Defaults to NumCPU.
+    INGEST_WRITER_CONCURRENCY = "adbc.snowflake.statement.ingest_writer_concurrency"
+    #: Number of Parquet files to upload in parallel. Greater concurrency can
+    #: smooth out congestion and make use of available network bandwidth, but will
+    #: increase memory utilization. Cannot be negative. Defaults to 8.
+    INGEST_UPLOAD_CONCURRENCY = "adbc.snowflake.statement.ingest_upload_concurrency"
+    #: Maximum number of COPY operations to run concurrently for bulk ingestion.
+    #: Bulk ingestion performance is optimized by executing COPY queries as files
+    #: are still being uploaded. Snowflake COPY speed scales with warehouse size,
+    #: so smaller warehouses might benefit from a higher setting to prevent a
+    #: long-running COPY query from blocking newly uploaded files. Default is 4.
+    INGEST_COPY_CONCURRENCY = "adbc.snowflake.statement.ingest_copy_concurrency"
+    #: Approximate size of Parquet files written during ingestion. Actual size
+    #: will be slightly larger due to the size of the footer/metadata. Does not
+    #: account for batch size, so very large batches in the input stream will
+    #: produce similarly large Parquet files. Default is 10 MB.
+    INGEST_TARGET_FILE_SIZE = "adbc.snowflake.statement.ingest_target_file_size"


def connect(
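As a usage note (illustrative, not part of this commit): the new constants are importable from the package root, and each member's value is the option key string, so they can replace hard-coded keys:

from adbc_driver_snowflake import StatementOptions

options = {
    StatementOptions.INGEST_WRITER_CONCURRENCY.value: "4",
    StatementOptions.INGEST_TARGET_FILE_SIZE.value: str(100 * 1024 * 1024),
}
# Pass to a statement as in the sketch above:
# cur.adbc_statement.set_options(**options)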
