Apache Iceberg 1.4.0

  • API
    • Implement bound expression sanitization (#8149)
    • Remove overflow checks in DefaultCounter causing performance issues (#8297)
    • Support incremental scanning with branch (#5984)
    • Add a validation API to DeleteFiles which validates files exist (#8525)
  • Core
    • Use V2 format by default in new tables (#8381)
    • Use zstd compression for Parquet by default in new tables (#8593)
    • Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
    • Avoid generating huge manifests during commits (#6335)
    • Add a writer for unordered position deletes (#7692)
    • Optimize DeleteFileIndex (#8157)
    • Optimize lookup in DeleteFileIndex without useful bounds (#8278)
    • Optimize split offsets handling (#8336)
    • Optimize computing user-facing state in data tasks (#8346)
    • Don't persist useless file and position bounds for deletes (#8360)
    • Don't persist counts for paths and positions in position delete files (#8590)
    • Support setting system-level properties via environment variables (#5659)
    • Add JSON parser for ContentFile and FileScanTask (#6934)
    • Add REST spec and request for commits to multiple tables (#7741)
    • Add REST API for committing changes against multiple tables (#7569)
    • Default to exponential retry strategy in REST client (#8366)
    • Support registering tables with REST session catalog (#6512)
    • Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
    • Add total data size to partitions metadata table (#7920)
    • Extend ResolvingFileIO to support bulk operations (#7976)
    • Key metadata in Avro format (#6450)
    • Add AES GCM encryption stream (#3231)
    • Fix a connection leak in streaming delete filters (#8132)
    • Fix lazy snapshot loading history (#8470)
    • Fix unicode handling in HTTPClient (#8046)
    • Fix paths for unpartitioned specs in writers (#7685)
    • Fix OOM caused by Avro decoder caching (#7791)
  • Spark
    • Added support for Spark 3.5
      • Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
      • Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
      • Column pruning in merge-on-read operations.
      • Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
    • Dropped support for Spark 3.1
    • Deprecated support for Spark 3.2
    • Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
    • Increase default advisory partition size for writes in Spark 3.5 (#8660)
    • Support distributed planning in Spark 3.4 and 3.5 (#8123)
    • Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
    • Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
    • Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
    • Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
    • Output net changes across snapshots for carryover rows in CDC (#7326)
    • Display read metrics on Spark SQL UI (#7447) (#8445)
    • Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
    • Add fast_forward procedure (#8081)
    • Support filters when rewriting position deletes (#7582)
    • Support setting current snapshot with ref (#8163)
    • Make backup table name configurable during migration (#8227)
    • Add write and SQL options to override compression config (#8313)
    • Correct partition transform functions to match the spec (#8192)
    • Enable extra commit properties with metadata delete (#7649)
  • Flink
    • Add the option to order splits by file sequence number (#7661)
    • Fix serialization in TableSink with anonymous object (#7866)
    • Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
    • Custom partitioner for bucket partitions (#7161)
    • Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
    • Support alter table column (#7628)
  • Parquet
    • Add encryption config to read and write builders (#2639)
    • Skip writing bloom filters for deletes (#7617)
    • Cache codecs by name and level (#8182)
    • Fix decimal data reading from ParquetAvroValueReaders (#8246)
    • Handle filters with transforms by assuming data must be scanned (#8243)
  • ORC
    • Handle filters with transforms by assuming the filter matches (#8244)
  • Vendor Integrations
    • GCP: Fix single byte read in GCSInputStream (#8071)
    • GCP: Add properties for OAuth2 and update library (#8073)
    • GCP: Add prefix and bulk operations to GCSFileIO (#8168)
    • GCP: Add bundle jar for GCP-related dependencies (#8231)
    • GCP: Add range reads to GCSInputStream (#8301)
    • AWS: Add bundle jar for AWS-related dependencies (#8261)
    • AWS: Support configuring the storage class for S3FileIO (#8154)
    • AWS: Add FileIO tracker/closer to Glue catalog (#8315)
    • AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
    • Azure: Add FileIO that supports ADLSv2 storage (#8303)
    • Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
    • Nessie: Provide better commit message on table registration (#8385)
  • Dependencies
    • Bump Nessie to 0.71.0
    • Bump ORC to 1.9.1
    • Bump Arrow to 12.0.1
    • Bump AWS Java SDK to 2.20.131
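
Two of the Core changes listed above alter defaults for new tables: V2 format (#8381) and zstd compression for Parquet (#8593). The sketch below shows one way a table could pin its own values instead of inheriting the new defaults. It assumes the Iceberg table properties `format-version` and `write.parquet.compression-codec`, and it assumes `gzip` as the earlier Parquet codec. The helper `create_table_sql`, the table name `db.events`, and the columns are hypothetical and only serve the illustration.

```python
def create_table_sql(table, columns, properties):
    """Render a Spark SQL CREATE TABLE statement for an Iceberg table.

    Hypothetical helper for illustration; it only builds a string and
    does not talk to Spark or to a catalog.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    # Sort the properties so the rendered statement is deterministic.
    props = ", ".join(f"'{key}' = '{value}'" for key, value in sorted(properties.items()))
    return f"CREATE TABLE {table} ({cols}) USING iceberg TBLPROPERTIES ({props})"


# Assumed property names and values: pin V1 and gzip explicitly
# rather than relying on the defaults that changed in this release.
pinned_props = {
    "format-version": "1",
    "write.parquet.compression-codec": "gzip",
}

sql = create_table_sql(
    "db.events",
    [("id", "bigint"), ("ts", "timestamp")],
    pinned_props,
)
print(sql)
```

Setting the properties explicitly at table creation keeps the table's behavior independent of the library version used to create it, which may matter when readers of the table do not yet support V2 or zstd.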