Skip to content

Conversation

@shangxinli
Copy link
Contributor

Add SnapshotUpdate interface that extends PendingUpdateTyped to provide common methods for all updates that create a new table snapshot. This follows the Java Iceberg API pattern where SnapshotUpdate provides a fluent API for operations like AppendFiles, DeleteFiles, and OverwriteFiles.

Key features:

  • Uses CRTP pattern for type-safe fluent API and method chaining
  • Set() method to add summary properties to snapshots
  • StageOnly() method to stage snapshots without updating current
  • Extends PendingUpdateTyped to inherit Apply() and Commit()
  • Protected members allow derived classes to access summary and stage_only state

This interface will be extended by concrete snapshot operations like:

  • AppendFiles: Add new data files to the table
  • DeleteFiles: Remove data files from the table
  • OverwriteFiles: Replace data files in the table
  • RewriteFiles: Compact and optimize data files

Changes:

  • Add src/iceberg/snapshot_update.h with SnapshotUpdate interface
  • Add forward declaration to type_fwd.h
  • Add comprehensive unit tests in snapshot_update_test.cc (10 test cases)
  • Update CMakeLists.txt to include new test file (alphabetically sorted)

Add SnapshotUpdate<Derived> interface that extends PendingUpdateTyped<Snapshot>
to provide common methods for all updates that create a new table snapshot.
This follows the Java Iceberg API pattern where SnapshotUpdate<ThisT> provides
a fluent API for operations like AppendFiles, DeleteFiles, and OverwriteFiles.

Methods implemented:
- Set(): Set summary properties on the snapshot
- StageOnly(): Stage snapshot without updating table's current snapshot
- DeleteWith(): Set custom file deletion callback for tracking or custom retention
- ToBranch(): Commit snapshot to a specific branch (for WAP workflows)

Method deferred:
- ScanManifestsWith(): Deferred until executor/thread pool infrastructure is
  available in the codebase. Documented in class comments for future addition.

Key features:
- Uses CRTP pattern for type-safe fluent API and method chaining
- Extends PendingUpdateTyped<Snapshot> to inherit Apply() and Commit()
- Protected members allow derived classes to access state
- All methods return Derived& for method chaining

Testing approach:
- Tests verify behavior through public API only (Apply/Commit)
- Avoids exposing internal state (no getter methods for protected members)
- Mock implementation returns Snapshot with configured properties
- 11 comprehensive test cases covering all methods and error paths

This interface will be extended by concrete snapshot operations like:
- AppendFiles: Add new data files to the table
- DeleteFiles: Remove data files from the table
- OverwriteFiles: Replace data files in the table
- RewriteFiles: Compact and optimize data files
Add detailed comparison between C++ and Java SnapshotUpdate implementations.
Documents all 6 Java methods and their implementation status in C++.

Implementation status:
- 4/6 methods implemented (Set, DeleteWith, StageOnly, ToBranch)
- 2/6 methods deferred:
  * ScanManifestsWith: requires executor/thread pool infrastructure
  * ValidateWith: requires SnapshotAncestryValidator infrastructure

Both missing methods have default implementations in Java that throw
UnsupportedOperationException, so they are not critical for initial usage.

The documentation serves as a reference for:
- Current implementation completeness
- Future work items
- Design decisions and trade-offs
- Comparison with Java API patterns
Fix two CI failures:

1. cpp-linter: Replace push_back with emplace_back
   - Line 116: deleted_files.emplace_back(path) instead of push_back
   - Line 156: deleted_files.emplace_back(path) instead of push_back
   - emplace_back is more efficient as it constructs in place

2. License check: Add Apache license header to markdown file
   - Add proper HTML comment-style license header to
     SNAPSHOT_UPDATE_API_COVERAGE.md
   - Matches the format used in other markdown files in the project

All tests still pass (11/11).
clang-format prefers single-line lambdas for simple callbacks.
Reformatted DeleteWith() lambda calls to match project style.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant