Snowflake export update #3124
Conversation
Codecov Report
❌ Patch coverage is …
Additional details and impacted files:
@@             Coverage Diff              @@
##             master    #3124      +/-   ##
============================================
+ Coverage     63.67%   68.99%     +5.32%
- Complexity     4344     5020       +676
============================================
  Files           621      638        +17
  Lines         23286    24015       +729
  Branches       2859     2965       +106
============================================
+ Hits          14827    16569      +1742
+ Misses         7070     6015      -1055
- Partials       1389     1431        +42
☔ View full report in Codecov by Sentry.
- Update partition logic for the S3 export path
- Introduce a SQL string method in the SQL builder; remove duplicate logic in the Snowflake SQL string builder
- Fix query failure when the query contains a predicate
- Fix queries returning empty results when multiple sources appear within the same query
- Use zero-copy reads to load the export file into our spill file
- Remove the row restriction per block
- Fix Snowflake recomputing the environment variable on every isS3Enable call
- Add a vector converter from Timestamp to a datetime-milli vector (see the sketch below)
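A minimal sketch of what the Timestamp-to-datetime-milli conversion could look like, assuming an Arrow TimeStampMilliVector target; the class and method names are illustrative, not the PR's actual implementation:

import java.sql.Timestamp;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.TimeStampMilliVector;

// Illustrative helper: copies JDBC timestamps into an Arrow TimeStampMilliVector
// as epoch milliseconds.
public final class TimestampToMilliVectorSketch
{
    public static TimeStampMilliVector convert(Timestamp[] values, BufferAllocator allocator)
    {
        TimeStampMilliVector vector = new TimeStampMilliVector("timestamp_milli", allocator);
        vector.allocateNew(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                vector.setNull(i);
            }
            else {
                vector.setSafe(i, values[i].getTime()); // epoch millis
            }
        }
        vector.setValueCount(values.length);
        return vector;
    }

    private TimestampToMilliVectorSketch() {}
}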
Force-pushed from 622ac37 to f84587f
        conjuncts.add(toPredicate(column.getName(), valueSet, type, accumulator));
    }
}
if (sql == null) {
nit: shouldn't this check be for null or empty?
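e.g. something along these lines (a sketch of the suggested check, nothing more):

// Hypothetical variant of the check the comment suggests: treat an empty
// pre-built SQL string the same as a missing one.
if (sql == null || sql.isEmpty()) {
    // build the SQL string here
}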
VARCHAR(Types.MinorType.VARCHAR),
STRUCT(Types.MinorType.STRUCT),
LIST(Types.MinorType.LIST),
TIMESTAMPNTZ(Types.MinorType.TIMESTAMPMILLI),
q: is this mapping going to hold across all connectors? Can we test the other connectors to make sure the before/after behavior is reflected correctly?
No, we don't have extractor support for this; it was coming as a separate CR. But I agree, let me separate this out.
for (Field next : schema.getFields()) {
    vectors.add(next.createVector(rootAllocator));
    FieldVector vector = next.createVector(rootAllocator);
    vector.allocateNew();
Discussing this offline, but just to keep this PR in the loop: is allocateNew() required now in the newer version? We never had to do this before. And is the default allocation good enough initially?
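For reference, a minimal sketch of the pattern under discussion, i.e. creating one vector per schema field and explicitly pre-allocating it (illustrative names, not the PR's code):

import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

// Illustrative only: create a vector per field and reserve its buffers up front.
static List<FieldVector> createVectors(Schema schema, BufferAllocator rootAllocator)
{
    List<FieldVector> vectors = new ArrayList<>();
    for (Field field : schema.getFields()) {
        FieldVector vector = field.createVector(rootAllocator);
        vector.allocateNew(); // reserves a default-sized initial buffer
        vectors.add(vector);
    }
    return vectors;
}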
throw (ex instanceof RuntimeException) ? (RuntimeException) ex : new RuntimeException(ex);
}

if (rows > maxRowsPerCall) {
Why is this no longer needed? Or, better question: why did we need maxRowsPerCall initially, and now we don't?
And it seems to be used only within this method; if it's not needed, should we clean up the rest of the class and the constructor? But only after we answer the first question, i.e. why we no longer need max rows per call.
why did we need maxRowsPerCall initially and now we don't?
The initial intent was to prevent blocks that exceed the max block size. We should not need this to begin with: from the engine side we don't care about the block size (S3BlockSpillReader). I think it was there to protect against the block getting too large and causing issues with S3 PutObject. However, with SDK v2 we can use the high-level multi-part upload, so this is less of a concern now, and multi-part upload is coming next.
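A rough sketch of what the SDK v2 high-level multi-part upload could look like for the spill file (this is an assumption about the follow-up work, not part of this PR; the method name and parameters are illustrative):

import java.nio.file.Path;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

// Illustrative only: the transfer manager splits large files into multi-part
// uploads automatically, so block size no longer needs to be capped up front.
static void uploadSpillFile(Path spillFile, String bucket, String key)
{
    try (S3TransferManager transferManager = S3TransferManager.create()) {
        UploadFileRequest uploadRequest = UploadFileRequest.builder()
                .putObjectRequest(PutObjectRequest.builder().bucket(bucket).key(key).build())
                .source(spillFile)
                .build();
        transferManager.uploadFile(uploadRequest).completionFuture().join();
    }
}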
 * @param scale Decimal scale.
 * @return Arrow type. See {@link ArrowType}.
 */
public static Optional<ArrowType> toArrowType(String name, final int jdbcType, final int precision, final int scale, java.util.Map<String, String> configOptions)
This seems to be more or less an exact copy of athena-jdbc/src/main/java/com/amazonaws/athena/connectors/jdbc/manager/JdbcArrowTypeConverter.java.
I know we are trying to handle the special Snowflake type; can't we just add it in the original JDBC class instead, so we don't have to maintain two?
I can try to deduplicate this, but the main idea is that we treat decimals completely differently: Snowflake treats NUMBER as decimal(38, x, 128), hence the slight adjustment here, along with the UTC timestamp handling.
I meant that the one line that treats Snowflake differently could just be moved to the original file; it should still work, no? I can talk to you offline to explain what I mean.
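For illustration, the kind of one-line special case being discussed might look roughly like this inside the shared JDBC converter (a sketch only: the idea is that Snowflake's NUMBER(38, x) needs the 128-bit Arrow decimal):

import java.sql.Types;
import java.util.Optional;
import org.apache.arrow.vector.types.pojo.ArrowType;

// Illustrative only: map JDBC DECIMAL/NUMERIC to a 128-bit Arrow decimal so that
// Snowflake's NUMBER(38, x) columns fit without overflow.
static Optional<ArrowType> toDecimalArrowType(int jdbcType, int precision, int scale)
{
    if (jdbcType == Types.DECIMAL || jdbcType == Types.NUMERIC) {
        return Optional.of(new ArrowType.Decimal(precision, scale, 128));
    }
    return Optional.empty();
}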
public static final String SEPARATOR = "/";
static final String BLOCK_PARTITION_COLUMN_NAME = "partition";
private static final int MAX_SPLITS_PER_REQUEST = 1000_000;
private static final String STORAGE_INTEGRATION_CONFIG_KEY = "snowflake_storage_integration_name";
Should all of these be moved to the constants class for the connector?
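e.g. a constants holder along these lines (illustrative class name; the fields mirror the snippet above):

// Illustrative only: centralize the connector's export-related constants.
public final class SnowflakeExportConstants
{
    public static final String SEPARATOR = "/";
    public static final String BLOCK_PARTITION_COLUMN_NAME = "partition";
    public static final int MAX_SPLITS_PER_REQUEST = 1_000_000;
    public static final String STORAGE_INTEGRATION_CONFIG_KEY = "snowflake_storage_integration_name";

    private SnowflakeExportConstants() {}
}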
 * Get Snowflake storage integration name from config
 * @return
 */
private String getStorageIntegrationName()
qq: what exactly is this used for? (Still going through the PR, so it might be answered later in this class.)
GetFunctionResponse response = lambdaClient.getFunction(request);
return response.configuration().role();
@VisibleForTesting
Map<String, String> getStorageIntegrationProperties(Connection connection, String integrationName) throws SQLException
A couple of thoughts on getStorageIntegrationProperties and isSFStorageIntegrationExistAndValid:
I think we can make isSFStorageIntegrationExistAndValid return an Optional; that way we don't have to fetch the same environment properties twice by calling this in both methods:
Optional.ofNullable(properties.get(STORAGE_INTEGRATION_BUCKET_KEY))
I'd write it something like this:
private String requireProperty(Map<String, String> properties, String key)
{
return Optional.ofNullable(properties.get(key))
.orElseThrow(() -> new IllegalArgumentException(
String.format("Snowflake Storage Integration, field:%s cannot be null", key)));
}
This should catch any missing field, and the following call should make the two methods simpler to read:
private String resolveAndValidateS3BucketPath(Connection connection, String integrationName) throws SQLException
{
Map<String, String> properties = getStorageIntegrationProperties(connection, integrationName);
if (properties.isEmpty()) {
throw new IllegalArgumentException(
String.format("Snowflake Storage Integration: name:%s not found", integrationName));
}
String bucketPath = requireProperty(properties, STORAGE_INTEGRATION_BUCKET_KEY);
String provider = requireProperty(properties, STORAGE_INTEGRATION_STORAGE_PROVIDER_KEY);
if (!"S3".equalsIgnoreCase(provider)) {
throw new IllegalArgumentException(
String.format("Snowflake Storage Integration, field:%s must be S3",
STORAGE_INTEGRATION_STORAGE_PROVIDER_KEY));
}
// Single path only
if (bucketPath.split(",").length != 1) {
throw new IllegalArgumentException(
String.format("Snowflake Storage Integration, field:%s must be a single S3 path",
STORAGE_INTEGRATION_BUCKET_KEY));
}
// Validate it's an S3 path
if (!bucketPath.startsWith("s3://")) {
throw new IllegalArgumentException(
String.format("Storage integration bucket path must be an S3 path: %s", bucketPath));
}
// Normalize trailing slash
if (bucketPath.endsWith("/")) {
bucketPath = bucketPath.substring(0, bucketPath.length() - 1);
}
return bucketPath;
}
updated
LOGGER.error("Error checking for integration {}: {}", integrationName, e.getMessage());
throw new SQLException("Failed to check for integration existence: " + e.getMessage(), e);
}
return constraints.getQueryPassthroughArguments().get(JdbcQueryPassthrough.QUERY);
Should we add a check for the case where the request is QPT and S3 integration is enabled, and throw an error? Meaning users shouldn't be able to use S3 integration directly with QPT, given that we need to do a good amount of work beforehand.
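Something like the following guard could capture that (hypothetical: isS3ExportEnabled() is a made-up name standing in for whatever check the connector exposes):

// Hypothetical guard: reject query passthrough while the S3 export integration is
// active, since the export path needs preparation that QPT would skip.
if (!constraints.getQueryPassthroughArguments().isEmpty() && isS3ExportEnabled()) {
    throw new UnsupportedOperationException(
            "Query passthrough is not supported when the Snowflake S3 export integration is enabled");
}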
ScanOptions options = new ScanOptions(/*batchSize*/ 32768);

// do a scan projection, only getting the column we want
ScanOptions options = new ScanOptions(/*batchSize*/ 32768,
Where does this 32768 come from? And can we make it static?
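i.e. hoisting it into a named constant, roughly (illustrative names):

import java.util.Optional;
import org.apache.arrow.dataset.scanner.ScanOptions;

// Illustrative only: give the Arrow Dataset batch size a name instead of a magic number.
public final class ExportScanDefaults
{
    public static final long EXPORT_SCAN_BATCH_SIZE = 32_768L; // rows per Arrow batch

    public static ScanOptions projectedScan(String[] requestedColumns)
    {
        return new ScanOptions(EXPORT_SCAN_BATCH_SIZE, Optional.of(requestedColumns));
    }

    private ExportScanDefaults() {}
}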
mapOfNamesAndTypes.put(field.getName(), minorTypeForArrowType);
mapOfCols.put(field.getName(), null);
}
if (s3ObjectKey.isEmpty()) {
Maybe a warning is warranted here, just in case customers want to look at their logs. That might be confusing, though, so maybe a debug?
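e.g. (illustrative; the message and the queryId field are made up):

// Illustrative only: surface the empty export at debug level so it is visible in logs
// without alarming customers.
if (s3ObjectKey.isEmpty()) {
    LOGGER.debug("Snowflake export produced no objects for query {}; returning an empty result", queryId);
    return;
}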
- Update the Teradata unit test
- Code cleanup
- Add QPT support
- Remove an unnecessary call
Force-pushed from 0a616b0 to b1f5640
- Add support for specifying a storage integration
- Update partition logic for the S3 export path
- Introduce a SQL string method in the SQL builder; remove duplicate logic in the Snowflake SQL string builder
- Fix query failure when the query contains a predicate
- Fix queries returning empty results when multiple sources appear within the same query
- Fix support for datetime, timestamp, and datetimemilli predicates on the S3 export path
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.