-
Notifications
You must be signed in to change notification settings - Fork 960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacing \u0000 with nothing for PostgreSqlDatabaseDialect #1264
base: master
Are you sure you want to change the base?
Replacing \u0000 with nothing for PostgreSqlDatabaseDialect #1264
Conversation
|
How are your producers adding that byte sequence in the first place? Wouldn't it make more sense to fix your strings at the producer? |
@OneCricketeer In our case, our producer is a Kafka Connector which reads data from MySQL database. Null character \u0000 is supported by MySQL. In any case, we need to change one of the connectors to make this thing work. |
According to: https://stackoverflow.com/a/1348551, 0x00 is never a supported character and MySQL produces it because it is a supported char. PostgreSQL doesn't support storing NULL (\0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported). Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE Often time we don't control the source, so it needs to be done at the consumer. @oleksandrkovalenko, maybe we could put this behind a flag so we can ensure no behavior change? |
If replacing with nothing the correct thing to do? We've elected to replace with the "Replace Character" ( |
Stumbled across the same issue. As @FreCap mentioned NULL (\0x00) is not a valid character for postgres. As the suggested change will replace this \0x00 with an empty string is technically not the same. The suggested PostgreSQL solution would be to change the column definition from text to bytea to store hex values. For my personal usecase this is not an option. I'd benefit from opt-in replacement from \0x00 to empty string. As the change is limited to the PostgreSqlDatabaseDialect the expected behavior change would be that if the opt-int option is taken then there is no more error when processing a message containing a \0x00 value for a text field. This would keep the connector up and running. That would be great :) |
@FreCap Would you be able to share an example how can I put this change behind a flag? |
Problem
\u0000 UFT character is not supported by Postgres text field. An attempt to insert fails with error:
org.postgresql.util.PSQLException:
ERROR: invalid byte sequence for encoding "UTF8": 0x00`Solution
We are replacing \u0000 character with nothing
Does this solution apply anywhere else?
If yes, where?
Test Strategy
Testing done:
Release Plan