Add codec support for column addition in schema changes #486
Pull Request Overview
This PR adds support for preserving column codec configurations during schema changes in ClickHouse when using `on_schema_change: append_new_columns` or `on_schema_change: sync_all_columns`. Previously, new columns were added without their specified codecs, even when defined in the model's schema configuration.
Key changes:
- Updated the `clickhouse__add_columns` macro to read codec configurations from model column definitions
- Modified `ALTER TABLE ADD COLUMN` statements to include `CODEC(...)` clauses when specified
- Added comprehensive integration tests covering both incremental and distributed incremental materializations
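The behavior described in the second bullet can be illustrated outside of Jinja. What follows is a minimal Python sketch, not the adapter's actual macro: the function names `codec_clause` and `add_column_sql`, the table name, and the shape of the column dictionary are all assumptions made for illustration. It shows the intended rule that the `CODEC(...)` clause is emitted only when the column definition carries a codec.

```python
from typing import Optional


def codec_clause(codec: Optional[str]) -> str:
    """Return ' CODEC(...)' when a codec is configured, otherwise ''."""
    if not codec:
        return ""
    return f" CODEC({codec})"


def add_column_sql(table: str, column: dict) -> str:
    """Build an ALTER TABLE ... ADD COLUMN statement for a single column.

    `column` is a hypothetical dict with 'name', 'data_type' and an optional
    'codec' key, standing in for a model column definition.
    """
    return (
        f"ALTER TABLE {table} ADD COLUMN "
        f"{column['name']} {column['data_type']}"
        f"{codec_clause(column.get('codec'))}"
    )


# Column with a codec: the clause is appended after the data type.
print(add_column_sql("events", {"name": "payload", "data_type": "String", "codec": "ZSTD(3)"}))
# Column without a codec: the statement is unchanged.
print(add_column_sql("events", {"name": "flag", "data_type": "UInt8"}))
```

Keeping the clause in its own small helper mirrors the idea of a dedicated clause macro, so the statement builder stays the same whether or not a codec is present.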
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| dbt/include/clickhouse/macros/materializations/incremental/schema_changes.sql | Enhanced clickhouse__add_columns macro to preserve codec configurations during column additions |
| tests/integration/adapter/incremental/test_schema_change_codec.py | Added comprehensive test coverage for codec preservation in schema changes |
assert all(len(row) == 3 for row in result)
assert result[0][2] == 0
assert result[3][2] == 5
Index out of bounds error. The test expects 4 rows (index 3), but line 74 asserts only 3 rows exist after the first run. After the second run with incremental data from numbers(2, 3), there should be 5 total rows, so the assertion should use index 4 instead of 3.
- assert all(len(row) == 3 for row in result)
- assert result[0][2] == 0
- assert result[3][2] == 5
+ assert len(result) == 5
+ assert all(len(row) == 3 for row in result)
+ assert result[0][2] == 0
+ assert result[4][2] == 5
assert all(len(row) == 2 for row in result)
assert result[0][1] == 0
assert result[3][1] == 5
Index out of bounds error. The test expects 4 rows (index 3), but line 161 asserts only 3 rows exist. Since sync_all_columns replaces the schema and the incremental run uses numbers(2, 3), there should only be 3 rows total, so this assertion will fail.
- assert result[3][1] == 5
+ # Removed out-of-bounds assertion; only 3 rows exist after sync_all_columns
Hi @Nikita1198,
Thank you for this contribution!
Could you please create a dedicated macro for the codec clause? That would make the usage more elegant and align it with the other clause macros, for instance:
{% macro on_cluster_clause(relation, force_sync) %}
dcbf870 to 97ac652
cd5c67e to d1e908d
@Nikita1198, could you please address the lint issues?
We have moved the macro. @BentsiLeviav, could you please restart the tests?
@BentsiLeviav we fixed our test in this PR, but
We run all the tests several times and get this error in
Summary
This pull request introduces support for preserving column codec configurations during incremental schema changes in ClickHouse. Previously, when using `on_schema_change: append_new_columns` or `on_schema_change: sync_all_columns`, new columns were added without their specified codec, even if defined in the model's schema configuration. This PR resolves that issue, ensuring codecs are applied as intended.
Changes Made
- Updated the `clickhouse__add_columns` macro in `schema_changes.sql` to read codec configurations from model column definitions.
- Included a `CODEC(...)` clause in `ALTER TABLE ADD COLUMN` statements when a codec is specified.
Example
This change ensures that codec configurations, such as `ZSTD`, are retained during incremental schema evolution, addressing the issue of codecs being lost.
Checklist
Test Coverage
- Added tests in `test_schema_change_codec.py`.
- Covered both `append_new_columns` and `sync_all_columns` scenarios.
- Verified the resulting table definition with `SHOW CREATE TABLE`.
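The test coverage notes mention `SHOW CREATE TABLE`. Assuming it is used to confirm that a newly added column kept its codec, a check of that kind might look like the sketch below. The helper `column_codecs`, the sample DDL, and the table and column names are hypothetical and are not taken from the PR's test file. The sketch assumes the statement lists one column per line, so a client that returns the DDL in another layout would need different parsing.

```python
import re


def column_codecs(create_table_sql: str) -> dict:
    """Map column name -> codec expression for columns that declare a CODEC.

    Assumes one column definition per line, e.g. "`col` String CODEC(ZSTD(3)),".
    """
    codecs = {}
    for raw_line in create_table_sql.splitlines():
        line = raw_line.strip().rstrip(",")
        # Column name (optionally backtick-quoted), anything, then CODEC(...) at end of line.
        match = re.match(r"`?(\w+)`?\s+.*?\bCODEC\((.*)\)$", line)
        if match:
            codecs[match.group(1)] = match.group(2)
    return codecs


# Hypothetical SHOW CREATE TABLE output, used only for illustration.
sample_ddl = """CREATE TABLE default.codec_model
(
    `id` UInt64,
    `payload` String CODEC(ZSTD(3)),
    `ts` DateTime CODEC(Delta(4), LZ4)
)
ENGINE = MergeTree
ORDER BY id"""

codecs = column_codecs(sample_ddl)
assert codecs["payload"] == "ZSTD(3)"
assert "id" not in codecs  # columns without a codec are not reported
```

In an integration test, `sample_ddl` would be replaced by the string returned from running `SHOW CREATE TABLE` against the model's relation after the incremental run.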