@IgorStepanov

  • Do only one thing
  • Non breaking API changes
  • Tested

What did this pull request do?

Fixes #7513

User Case Description

PostgreSQL has a 63-character limit for identifiers (including select field aliases).
In a long chain of joins this can cause problems:

err = db.Model(&Table1{}).Debug().
	Joins("Looooooooooooooooooooooooooooooooooooooooooooooooooooooooong").
	Joins("Looooooooooooooooooooooooooooooooooooooooooooooooooooooooong.Table2").
	Joins("Looooooooooooooooooooooooooooooooooooooooooooooooooooooooong.Table2.Table3").
	Find(&objList).Error

This code truncates long aliases, similar to Namer.UniqueName, and stores a truncated-to-original map in db.Statement, which leaves the existing field mapping strategy untouched.

@IgorStepanov changed the title from "Fix: Long column identifiers in deeply nested joins cause fields to be omitted #7513" to "Fix: Long column identifiers in deeply nested joins cause fields to be omitted" on Jul 16, 2025
@IgorStepanov force-pushed the check-max-ident-size-for-joins branch 3 times, most recently from d8589e6 to 19746ea, on July 17, 2025 12:25
@maxwey maxwey left a comment

Thank you for taking a crack at this bug!

I left a couple comments, but am not a maintainer (or even very familiar with Gorm internals)

schema/naming.go Outdated
ns.IdentifierMaxLength = 64
}

if utf8.RuneCountInString(formattedName) > ns.IdentifierMaxLength {

Postgres (can't speak for others) does not limit names to 63 characters; it limits them to 63 bytes. Counting runes instead of bytes can therefore produce an incorrect truncation limit.

For example, using the query:

SELECT id AS "aręęęľlly_long_name_that_isnt_64_runes_long_is_still_truncated" FROM forms LIMIT 1;

Yields the results:

aręęęľlly_long_name_that_isnt_64_runes_long_is_still_trunca
1

Note that the alias is still truncated, despite being only 62 runes long.
Here's a Go playground with the differences.
In the playground linked above, the truncation that matches Postgres is the simple 63-byte truncation, not the 63-rune truncation.
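
A minimal Go sketch of the same comparison, in case the playground link is unavailable. overLimit is a hypothetical helper written for this illustration, not part of GORM, and the limit of 63 is the PostgreSQL default assumed throughout this thread.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// overLimit reports whether name exceeds limit when measured in runes
// and when measured in bytes.
func overLimit(name string, limit int) (byRunes, byBytes bool) {
	return utf8.RuneCountInString(name) > limit, len(name) > limit
}

func main() {
	name := "aręęęľlly_long_name_that_isnt_64_runes_long_is_still_truncated"
	byRunes, byBytes := overLimit(name, 63)

	// The rune count stays under the limit while the byte length does not,
	// which is why a rune-based check never triggers truncation here.
	fmt.Println("runes:", utf8.RuneCountInString(name), "over limit:", byRunes)
	fmt.Println("bytes:", len(name), "over limit:", byBytes)
}
```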

As a "fun" edge case, if a multi-byte UTF-8 character is the final character in the string and falls on the truncation boundary, Postgres appears to drop the whole character (whereas a simple identifier[:63] would leave an invalid partial byte sequence at the end).

e.g.:

SELECT id AS "a_multi_byte_character_that_is_64_runes__long__is__truncated__漢" FROM forms LIMIT 1;

Results

a_multi_byte_character_that_is_64_runes__long__is__truncated__
1

Multi-byte ending - Go playground

(All my testing was done with Postgres 13.20)

Author

Hmm. Thanks. I'll fix it today.

Author

Fixed

if parentTableName != clause.CurrentTable {
curAliasName = utils.NestedRelationName(parentTableName, curAliasName)
aliasName := db.NamingStrategy.JoinNestedRelationNames([]string{parentTableName, curAliasName})
db.Statement.TruncatedFields[aliasName] = utils.NestedRelationName(parentTableName, curAliasName)

Would it be clearer to name this TruncatedAliases since not all aliases being added to the map are field aliases, as seen here?

Author

Done

@IgorStepanov force-pushed the check-max-ident-size-for-joins branch 2 times, most recently from bc99639 to a4270cc, on July 18, 2025 02:27
@jinzhu
Member

jinzhu commented Jul 21, 2025

Would it be possible to reimplement this feature using ColumnMapping in combination with NamingStrategy? Also, the current truncateName implementation seems to share a lot of duplicated logic with formatName.

@IgorStepanov force-pushed the check-max-ident-size-for-joins branch from a3054ec to 654aecb on August 9, 2025 23:37
@IgorStepanov
Author

@jinzhu

Hello. Sorry for the delay.
I switched to ColumnMapping (a temporary truncatedTableAliases map is still used for table aliases, but only as a local variable).
I also reworked formatName to use my truncateName function. However, I changed the hash function (as the linter advised), so I had to update the tests for it.

The current linter issue does not appear to be caused by my changes.

@IgorStepanov
Author

@jinzhu any comments?

@IgorStepanov force-pushed the check-max-ident-size-for-joins branch from 654aecb to 9a48c44 on September 11, 2025 14:12
@propel-code-bot changed the title from "Fix: Long column identifiers in deeply nested joins cause fields to be omitted" to "Fix omission of fields in deeply nested joins with long identifiers (#7513)" on Sep 11, 2025
@propel-code-bot
Contributor

Fix: Truncation Handling for Long Column Identifiers and Aliases in Nested Joins

This pull request addresses an issue where long column and table aliases resulting from deeply nested joins in GORM exceed PostgreSQL's identifier byte limit (63 bytes), causing fields to be omitted or SQL errors. The patch introduces systematic truncation and mapping for long identifiers using a byte-aware approach, improving the reliability of queries that generate lengthy aliases in complex join scenarios. The changes affect the naming strategies, query building, and associated tests to ensure compatibility and coverage of edge cases (including multibyte UTF-8 characters).

Key Changes

• Introduced truncation logic in NamingStrategy.truncateName() using byte length rather than rune count, ensuring strict PostgreSQL identifier compliance.
• Switched hash algorithm from SHA-1 to SHA-224 for generating suffixes when truncating names.
• Added function brokenTailSize() to handle clean truncation of UTF-8 strings to valid byte slices.
• Extended Namer interface and NamingStrategy to support JoinNestedRelationNames, facilitating consistent alias generation for nested relations.
• Modified column aliasing logic in callbacks/query.go to store and map truncated aliases, preventing field omission for long join chains.
• Refined handling of table/alias mapping (truncatedTableAliases), and adjusted code to map truncated-to-original table aliases in the statement context.
• Added and updated tests in tests/joins_test.go, schema/naming_test.go, and schema/relationship_test.go to verify correct behavior, hash outputs, and edge cases (including multibyte characters).

Affected Areas

schema/naming.go
callbacks/query.go
schema/naming_test.go
schema/relationship_test.go
tests/joins_test.go

This summary was automatically generated by @propel-code-bot

Development

Successfully merging this pull request may close these issues.

Long column identifiers in deeply nested joins cause fields to be omitted

3 participants