
BinaryType Column Not Supported when use_parquet_in_write Argument Set #602

@barino-carvana

Description

I am using PySpark with Spark 3.4.1 and version 3.1.1 of the Snowflake connector.

I am trying to write a DataFrame with a BinaryType column from Spark to Snowflake and keep running into the following error:

ERROR StageWriter$: Error occurred while loading files to Snowflake: java.sql.SQLException: Status of query associated with resultSet is FAILED_WITH_ERROR. SQL compilation error:
Expression type does not match column data type, expecting BINARY(8388608) but got VARIANT for column KEY_HASH Results not generated.

This happens with both the Append and Overwrite save modes, and with usestagingtable set to either true or false. I also tried setting the BINARY_INPUT_FORMAT = "BASE64" session parameter, but the error persists.
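For reference, here is a minimal sketch of how the write is issued. Credentials, warehouse, table name and the sample data are placeholders; the relevant parts are the BinaryType column and the use_parquet_in_write option.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, BinaryType

spark = SparkSession.builder.appName("binary-write-repro").getOrCreate()

# Connection options are placeholders.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "AZURE_SQL_ADS_RAW",
    "sfSchema": "DBO",
    "sfWarehouse": "<warehouse>",
}

schema = StructType([
    StructField("MIGRATION_ID", StringType()),
    StructField("KEY_HASH", BinaryType()),  # the column that triggers the error
])
df = spark.createDataFrame([("1", bytearray(b"\x01\x02\x03"))], schema)

(df.write
    .format("snowflake")                    # i.e. net.snowflake.spark.snowflake
    .options(**sf_options)
    .option("dbtable", "_RAW_TBL_NAME_")
    .option("use_parquet_in_write", "true")
    .option("usestagingtable", "false")     # fails with true as well
    .mode("append")                         # fails with overwrite as well
    .save())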

I believe I see the issue: the COPY INTO queries the connector generates don't perform any TO_BINARY conversion:

copy into "AZURE_SQL_ADS_RAW"."DBO"."_RAW_TBL_NAME_staging_95236350"
( "MIGRATION_ID", "MIGRATION_JOB_RUN_ID", "DATA", "KEY_HASH", "LAST_MODIFIED_DATETIME", "METADATA" )
from (
    select $1:"_MIGRATION_ID_",
        $1:"_MIGRATION_JOB_RUN_ID_",
        $1:"_DATA_",
        $1:"_KEY_HASH_",
        $1:"_LAST_MODIFIED_DATETIME_",
        $1:"_METADATA_"
     FROM @spark_connector_load_stage_buhsX4u9oa/m6q9yLC6Do/ tmp
) 
FILE_FORMAT = (
    TYPE=PARQUET
    USE_VECTORIZED_SCANNER=TRUE
  )

In the above, KEY_HASH is a BinaryType field.
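I would expect the generated query to convert that column explicitly, along the lines of the sketch below. TO_BINARY does accept a VARIANT argument, but whether this exact expression is the right fix inside the connector is an assumption on my part.

copy into "AZURE_SQL_ADS_RAW"."DBO"."_RAW_TBL_NAME_staging_95236350"
( "MIGRATION_ID", "MIGRATION_JOB_RUN_ID", "DATA", "KEY_HASH", "LAST_MODIFIED_DATETIME", "METADATA" )
from (
    select $1:"_MIGRATION_ID_",
        $1:"_MIGRATION_JOB_RUN_ID_",
        $1:"_DATA_",
        to_binary($1:"_KEY_HASH_"),   -- explicit conversion instead of the bare VARIANT
        $1:"_LAST_MODIFIED_DATETIME_",
        $1:"_METADATA_"
     FROM @spark_connector_load_stage_buhsX4u9oa/m6q9yLC6Do/ tmp
)
FILE_FORMAT = (
    TYPE=PARQUET
    USE_VECTORIZED_SCANNER=TRUE
  )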
