Initial Iceberg sink implementation #46

Merged (1 commit) on May 6, 2024

Conversation

sauliusvl
Contributor

Adding a sink for Apache Iceberg tables.

The implementation achieves exactly-once delivery by committing data files and storing offsets as table properties in a single Iceberg transaction. Because Iceberg only supports table-level transactions, the sink can only write to one table per transaction, so e.g. it's not possible to branch out a single Kafka topic to multiple tables.
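To illustrate why committing files and offsets together gives exactly-once behaviour, here is a minimal sketch in Python. It is a toy in-memory model, not the code in this PR and not the Iceberg API: `ToyTable`, `run_sink`, the `crash_on_batch` parameter and the `sink.committed-offset` property name are all made up for illustration.

```python
from dataclasses import dataclass, field

OFFSET_KEY = "sink.committed-offset"  # hypothetical property name


@dataclass
class ToyTable:
    """In-memory stand-in for a single table (assumption: not the Iceberg API)."""
    files: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)

    def commit(self, new_files, new_properties):
        # Stage the next table state first, then swap both parts in together,
        # so data files and offsets become visible as one unit or not at all.
        staged_files = self.files + list(new_files)
        staged_props = {**self.properties, **new_properties}
        self.files, self.properties = staged_files, staged_props


def run_sink(table, topic, batch_size, crash_on_batch=None):
    """Consume `topic` in batches, resuming from the offset stored in the table."""
    # The stored offset is the position of the next record to read.
    offset = int(table.properties.get(OFFSET_KEY, "0"))
    batch_no = 0
    while offset < len(topic):
        batch = topic[offset:offset + batch_size]
        data_file = tuple(batch)  # pretend this is a Parquet file already written
        next_offset = offset + len(batch)
        if batch_no == crash_on_batch:
            # The file was written but never committed, so the table is unchanged.
            raise RuntimeError("simulated crash before commit")
        table.commit([data_file], {OFFSET_KEY: str(next_offset)})
        offset = next_offset
        batch_no += 1
```

Running the sink with a simulated crash and then restarting it with the same `ToyTable` resumes from the stored offset: the uncommitted batch is simply re-read, and every record ends up in the table exactly once. Since the offset lives in the properties of the one table that the commit covers, the same trick does not extend to several tables at once.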

The Iceberg loader integration test uses a local Hadoop catalog, i.e. it writes Parquet files to local disk. To validate the output we query the table with DuckDB.

@sauliusvl sauliusvl force-pushed the iceberg_sink branch 5 times, most recently from 5f9a90d to d2b3203 Compare April 25, 2024 19:38
@sauliusvl
Contributor Author

@shivam247 could you please take a look? Once merged this should probably be released as 0.3.0, an iceberg topic tag next to the repo description would also be nice :)

@sauliusvl sauliusvl force-pushed the iceberg_sink branch 8 times, most recently from d6d0aee to 8676d01 Compare April 26, 2024 19:38
@shivam247 shivam247 merged commit 7f82a10 into adform:master May 6, 2024
1 check passed
@sauliusvl sauliusvl deleted the iceberg_sink branch May 20, 2024 12:21
@sauliusvl sauliusvl restored the iceberg_sink branch July 29, 2024 07:55
@sauliusvl sauliusvl deleted the iceberg_sink branch July 29, 2024 07:56