sqlserver_column.py: Handle string dtype of nvarchar #606
base: master
Conversation
nvarchar is not considered a string type in either the base adapter or the fabric adapter. This leads to problems when dealing with nvarchar columns, for example in https://github.com/dbt-msft/dbt-sqlserver-utils/blob/master/macros/sql/union.sql. In this commit we add nvarchar to the list of string dtypes, return the correct string_size for nvarchar (whose reported char_size is doubled by default), and use the correct dtype in string_type.
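The change described above can be sketched roughly as follows. This is a minimal illustration modeled on dbt's Column interface (`is_string` / `string_size` / `string_type`); the class name, constructor, and the default width of 30 are assumptions for the sketch, not the exact dbt-sqlserver implementation.

```python
from typing import Optional


class SQLServerColumn:
    # nvarchar and nchar added alongside the existing string dtypes
    STRING_DATATYPES = {"char", "varchar", "nchar", "nvarchar"}

    def __init__(self, column: str, dtype: str, char_size: Optional[int] = None):
        self.column = column
        self.dtype = dtype
        self.char_size = char_size

    def is_string(self) -> bool:
        return self.dtype.lower() in self.STRING_DATATYPES

    def string_size(self) -> int:
        if not self.is_string():
            raise ValueError("Called string_size() on non-string field!")
        if self.char_size is None:
            return 30  # dbt's historical default width
        if self.dtype.lower() in ("nvarchar", "nchar"):
            # sys.columns reports max_length in bytes; nvarchar/nchar use
            # two bytes per character, so halve to get the character count
            return self.char_size // 2
        return self.char_size

    def string_type(self, size: int) -> str:
        # use this column's own dtype rather than the base class's
        # hard-coded "character varying"
        return f"{self.dtype}({size})"
```

With this, an `nvarchar` column whose reported `char_size` is 100 would render as `nvarchar(50)` instead of falling back to the 30-character default.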
Apologies for the noise, I renamed the branch, which closed the first pull request.
As you say, nvarchar uses twice the bytes to store the same length. Looking at the macros, we grab the max_length value from the sys.columns view, and I suspect that ends up being reported as the char_size. That means for nvarchar columns it's actually reporting double the size of the string that can be stored, and the code you're adding would handle this. It might instead need to be handled in the macro itself if the rest of the code expects char_size to be the number of characters a column can store, so that it's reported correctly throughout the codebase. I assume this means nchar columns would also be affected.
My reading of the existing code is that char_size should be the declared maximum number of characters allowed, not the number of bytes allocated; everywhere it is used, it is assumed to be a character count. To date I doubt many are using it at all, since nvarchar is not currently interpreted as a string type, so the width of any columns being created falls back to the default (which I think is 30 everywhere?). As per https://stackoverflow.com/questions/37187654/alternative-to-sys-columns-max-length we could potentially use INFORMATION_SCHEMA.COLUMNS instead of sys.columns in https://github.com/microsoft/dbt-fabric/blob/d3b8665f4e116d0fcbd14e544071f5d3cf7eb15e/dbt/include/fabric/macros/adapters/columns.sql#L9. This would probably require us to maintain a fork of that macro in https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/adapter/columns.sql unless we can get it fixed in the fabric adapter. Fabric doesn't have nvarchar datatypes, so I'm not sure whether it makes sense for them to fix it there.
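For illustration, the metadata query such a forked macro could run might look like the following. INFORMATION_SCHEMA.COLUMNS defines CHARACTER_MAXIMUM_LENGTH in characters rather than bytes, which sidesteps the nvarchar doubling entirely; the exact column list and placeholder names here are assumptions, not the macro's actual contents.

```python
# Sketch of a sys.columns replacement query for a forked
# get_columns_in_relation-style macro. CHARACTER_MAXIMUM_LENGTH is a
# character count, so nvarchar(50) reports 50, not 100.
GET_COLUMNS_IN_RELATION_SQL = """
select
    COLUMN_NAME,
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH,
    NUMERIC_PRECISION,
    NUMERIC_SCALE
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA = '{schema}' and TABLE_NAME = '{table}'
order by ORDINAL_POSITION
"""
```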
A SQL Server specific version of the macro that gets the column data seems the safest approach to me; as you say, it will not be implemented at the Fabric level. That should mean the only code change needed is the update to is_string to handle nchar and nvarchar. We will also want a variation of https://github.com/microsoft/dbt-fabric/blob/main/tests/functional/adapter/test_column_types.py to cover the new nvarchar and nchar handling.
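A sketch of the kind of assertion such a test would make, reduced to its core. `rendered_string_type` is a hypothetical helper standing in for the adapter's column rendering, and it assumes sys.columns-style byte lengths as input; the real dbt functional test wires this up through models and schema checks instead.

```python
def rendered_string_type(dtype: str, max_length_bytes: int) -> str:
    """Hypothetical helper: render a column type from the byte length
    sys.columns reports, halving for the two-byte-per-character types."""
    if dtype in ("nvarchar", "nchar"):
        chars = max_length_bytes // 2
    else:
        chars = max_length_bytes
    return f"{dtype}({chars})"


# the cases a dbt-sqlserver variant of test_column_types.py would cover
cases = {
    ("varchar", 20): "varchar(20)",
    ("nvarchar", 40): "nvarchar(20)",
    ("nchar", 40): "nchar(20)",
}
for (dtype, size), expected in cases.items():
    assert rendered_string_type(dtype, size) == expected
```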
I believe this mishandling of nvarchar is the root cause of #446
I ran into this issue when unioning multiple relations which used nvarchar. The generated code had no string length specified for the nvarchar fields, which resulted in the default string length of 30 being used.
I'm not sure my handling of string_type is best: it is a class method in the base class, but I think we want it to be instance-specific, since that is how the dtypes are handled.
Additionally, I'm not sure why char_size is twice the string length for nvarchar, but I noticed it in my testing, which is why I halve it in string_size(). I do understand that nvarchar uses twice the bytes of varchar, so I suspect it is related, but I don't know whether we should fix char_size to be the actual number of characters or just fix it in string_size.
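The factor of two almost certainly comes from storage: SQL Server stores nvarchar as UTF-16, i.e. two bytes per character for characters in the Basic Multilingual Plane, and sys.columns.max_length is a byte count. A quick illustration of the byte-versus-character relationship:

```python
# nvarchar storage is UTF-16: two bytes per (BMP) character, which is
# why a byte-based max_length is double the declared character count.
s = "hello"
assert len(s.encode("utf-16-le")) == 2 * len(s)  # 10 bytes for 5 characters
```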