Skip to content

Conversation

Nikita1198
Copy link

@Nikita1198 Nikita1198 commented Aug 7, 2025

Summary

This pull request introduces support for preserving column codec configurations during incremental schema changes in ClickHouse. Previously, when using on_schema_change: append_new_columns or on_schema_change: sync_all_columns, new columns were added without their specified codec, even if defined in the model's schema configuration. This PR resolves that issue, ensuring codecs are applied as intended.

Changes Made

  • Updated the clickhouse__add_columns macro in schema_changes.sql to read codec configurations from model column definitions.
  • Added logic to include the CODEC(...) clause in ALTER TABLE ADD COLUMN statements when a codec is specified.
  • Ensured backward compatibility: columns without codec specifications function as before.

Example

# schema.yml
models:
  - name: my_model
    columns:
      - name: my_column
        data_type: String
        codec: ZSTD  # This codec is now preserved during schema changes

This change ensures that codec configurations, such as ZSTD, are retained during incremental schema evolution, addressing the issue of codecs being lost.

Checklist

  • Added unit and integration tests covering common scenarios.
  • Provided a human-readable description of the changes for inclusion in the CHANGELOG.
  • Updated documentation in ClickHouse Docs with detailed explanations and tutorials for significant changes.

Test Coverage

  • Added comprehensive integration tests in test_schema_change_codec.py.
  • Tests cover both append_new_columns and sync_all_columns scenarios.
  • Verified codec presence in the table structure using SHOW CREATE TABLE.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for preserving column codec configurations during schema changes in ClickHouse when using on_schema_change: append_new_columns or on_schema_change: sync_all_columns. Previously, new columns were added without their specified codecs, even when defined in the model's schema configuration.

Key changes:

  • Updated the clickhouse__add_columns macro to read codec configurations from model column definitions
  • Modified ALTER TABLE ADD COLUMN statements to include CODEC(...) clauses when specified
  • Added comprehensive integration tests covering both incremental and distributed incremental materializations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
dbt/include/clickhouse/macros/materializations/incremental/schema_changes.sql Enhanced clickhouse__add_columns macro to preserve codec configurations during column additions
tests/integration/adapter/incremental/test_schema_change_codec.py Added comprehensive test coverage for codec preservation in schema changes

Comment on lines +80 to +82
assert all(len(row) == 3 for row in result)
assert result[0][2] == 0
assert result[3][2] == 5
Copy link
Preview

Copilot AI Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index out of bounds error. The test expects 4 rows (index 3), but line 74 asserts only 3 rows exist after the first run. After the second run with incremental data from numbers(2, 3), there should be 5 total rows, so the assertion should use index 4 instead of 3.

Suggested change
assert all(len(row) == 3 for row in result)
assert result[0][2] == 0
assert result[3][2] == 5
assert len(result) == 5
assert all(len(row) == 3 for row in result)
assert result[0][2] == 0
assert result[4][2] == 5

Copilot uses AI. Check for mistakes.


assert all(len(row) == 2 for row in result)
assert result[0][1] == 0
assert result[3][1] == 5
Copy link
Preview

Copilot AI Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index out of bounds error. The test expects 4 rows (index 3), but line 161 asserts only 3 rows exist. Since sync_all_columns replaces the schema and the incremental run uses numbers(2, 3), there should only be 3 rows total, so this assertion will fail.

Suggested change
assert result[3][1] == 5
# Removed out-of-bounds assertion; only 3 rows exist after sync_all_columns

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@BentsiLeviav BentsiLeviav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nikita1198

Thank you for this contribution!

Could you please create a dedicated macro for the codec clause? That way the usage would be more elegant and align with other clauses like this, for instance

{% macro on_cluster_clause(relation, force_sync) %}

@CLAassistant
Copy link

CLAassistant commented Aug 12, 2025

CLA assistant check
All committers have signed the CLA.

@Romano-arist Romano-arist force-pushed the fix/codec-add-columns branch from dcbf870 to 97ac652 Compare August 12, 2025 13:17
@BentsiLeviav
Copy link
Contributor

@Nikita1198, could you please address the lint issues?
I would also encourage you to move your new macro to the table materialization file (it is more relevant there)

@Nikita1198
Copy link
Author

We have moved the macro. @BentsiLeviav, could you please restart the tests?

@Nikita1198
Copy link
Author

@BentsiLeviav we fixed our test in this PR, but test_snapshot_check_cols may or may not fall at launch, which is not part of our contribution (We don't see where we were related to this error).

FAILED tests/integration/adapter/basic/test_snapshot_check_cols.py::TestSnapshotCheckCols::test_snapshot_check_cols - assert 40 == 30

We run all the tests several times and get this error in test_snapshot_check_cols. Maybe you have an idea what it could be? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants