[SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files #50215

Status: Open. ganeshashree wants to merge 1 commit into master.

What changes were proposed in this pull request?

This PR makes INT64 the default timestamp type that Spark uses when writing Parquet files.

Why are the changes needed?

The INT96 timestamp type was deprecated in PARQUET-323, but Apache Spark still uses INT96 as the default outputTimestampType for Parquet files. Parquet data written by Spark can therefore be incompatible with readers that do not support the INT96 type. We should consider changing the default timestamp type to INT64 in future versions.
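To make the difference between the two encodings concrete, here is a minimal Python sketch of how a reader might decode each one. It assumes the commonly used INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and assumes the INT64 value holds microseconds since the Unix epoch; the PR text only says "INT64", so the unit is an assumption. The function names are hypothetical and are not part of Spark or Parquet.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, used to rebase INT96 values onto the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (assumed layout: nanos of day + Julian day)."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamps are exactly 12 bytes")
    # "<qi" = little-endian 8-byte integer followed by a 4-byte integer, no padding.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes carry microsecond precision, so sub-microsecond nanos are dropped.
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def decode_int64_micros(value: int) -> datetime:
    """Decode an INT64 timestamp, assuming microseconds since the Unix epoch."""
    return EPOCH + timedelta(microseconds=value)


# The same instant (1970-01-02 00:00:01 UTC) in both encodings.
int96_raw = struct.pack("<qi", 1_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
int64_value = 86_401_000_000
same_instant = decode_int96(int96_raw) == decode_int64_micros(int64_value)
```

A reader that only understands the INT64 form has no way to interpret the 12-byte value, which is the incompatibility described above.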

Does this PR introduce any user-facing change?

Yes. The default timestamp type for Parquet files changes from INT96 to INT64. Applications that still depend on INT96 can restore the previous behavior by setting spark.sql.parquet.int96AsTimestamp to true and spark.sql.parquet.outputTimestampType to INT96.
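As an illustration of the opt-out described above, the following Python sketch turns the two settings into `--conf key=value` arguments for `spark-submit`. The configuration keys and values come from this description; the dictionary name and helper function are hypothetical and exist only for this example.

```python
# Settings named in this PR description for keeping the INT96 behavior.
LEGACY_INT96_CONF = {
    "spark.sql.parquet.int96AsTimestamp": "true",
    "spark.sql.parquet.outputTimestampType": "INT96",
}


def to_spark_submit_args(conf: dict) -> list:
    """Render a config mapping as repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


legacy_args = to_spark_submit_args(LEGACY_INT96_CONF)
```

The resulting list can be appended to an existing `spark-submit` command line; the same key/value pairs could equally be set on a session at runtime.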

How was this patch tested?

Updated the unit tests in ParquetSchemaSuite and SQLConfSuite to reflect INT64 as the default timestamp type for Parquet files.

Was this patch authored or co-authored using generative AI tooling?

github-actions bot added the SQL label on Mar 8, 2025.