[SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files #50215
+27
−18
What changes were proposed in this pull request?
This PR changes the default timestamp type for Parquet files written by Spark from INT96 to INT64.
Why are the changes needed?
The INT96 timestamp type was deprecated in PARQUET-323. However, Apache Spark still uses INT96 as the default value of spark.sql.parquet.outputTimestampType (code link). This can cause incompatibilities when Parquet data written by Spark is read by readers that do not support INT96. This PR changes the default timestamp type to INT64 to avoid these incompatibilities.
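To make the difference between the two physical types concrete, the sketch below encodes the same UTC timestamp both ways. It assumes the legacy INT96 layout used by Spark and Impala (8-byte little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day number) and assumes INT64 means microseconds since the Unix epoch (TIMESTAMP_MICROS). The helper names to_int64_micros, to_int96 and int96_to_micros are hypothetical and are not part of any Spark or Parquet API.

```python
import struct
from datetime import datetime, timezone

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_EPOCH = 2440588
MICROS_PER_DAY = 86_400 * 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_int64_micros(ts: datetime) -> int:
    """INT64 (assumed TIMESTAMP_MICROS): microseconds since the Unix epoch.

    ts must be timezone-aware; naive datetimes raise TypeError.
    """
    delta = ts - EPOCH
    # timedelta normalizes to days (may be negative), seconds and microseconds (non-negative).
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_int96(ts: datetime) -> bytes:
    """Legacy INT96: 8-byte LE nanos-of-day, then 4-byte LE Julian day (12 bytes)."""
    micros = to_int64_micros(ts)
    # Floor division keeps micros_of_day non-negative for pre-1970 timestamps.
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1000, days + JULIAN_DAY_OF_EPOCH)


def int96_to_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 value back to microseconds since the epoch."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day // 1000


if __name__ == "__main__":
    ts = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(to_int64_micros(ts))  # 1704110400000000
    print(to_int96(ts).hex())   # 12 raw bytes in the legacy layout
```

The INT64 form is a single standard integer that any reader supporting the Parquet TIMESTAMP logical type can interpret, while the INT96 form needs reader-specific knowledge of the nanos-of-day plus Julian-day layout, which is the compatibility gap the change addresses.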
Does this PR introduce any user-facing change?
The default timestamp type for Parquet files changes to INT64. Users who need to keep writing INT96, for example to stay compatible with older applications that read INT96 timestamps, can restore the previous behavior by setting spark.sql.parquet.int96AsTimestamp to true and spark.sql.parquet.outputTimestampType to INT96.
How was this patch tested?
Updated the unit tests in ParquetSchemaSuite and SQLConfSuite to expect INT64 as the default timestamp type for Parquet files.
Was this patch authored or co-authored using generative AI tooling?