

[Bug report] Table comment doesn't support UTF-8 #6612

Open
shaofengshi opened this issue Mar 5, 2025 · 5 comments · May be fixed by #6625
Labels
bug Something isn't working

Comments

@shaofengshi
Contributor

Version

main branch

Describe what's wrong

I have a Hive table and entered a comment containing some Chinese characters. After saving the table, the UI shows the comment as "?" characters.


I checked the HTTP response, and the JSON content also contains only "?".

Error message and/or stacktrace

No error.

How to reproduce

Enter some Chinese characters in a table's comment field and save the table; the comment is then shown as "?".


Additional context

No response

shaofengshi added the bug (Something isn't working) label on Mar 5, 2025
@shaofengshi
Contributor Author

A MySQL table doesn't have this problem, so it seems to affect only Hive tables.

@yuqi1129
Contributor

yuqi1129 commented Mar 5, 2025

Yes, this is what is stored in the MySQL database backing the Hive cluster:

mysql> SELECT PARAM_VALUE FROM TABLE_PARAMS WHERE PARAM_KEY = 'comment' AND TBL_ID IN (SELECT TBL_ID FROM TBLS WHERE TBL_NAME = 't1');
+-------------+
| PARAM_VALUE |
+-------------+
| ??          |
+-------------+
1 row in set (0.01 sec)

We need to change the charset of the MySQL cluster, but that did not work when I tried it. Let me dig into it.

@shaofengshi
Contributor Author


I also changed the connection string in hive-site.xml to "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false&characterEncoding=UTF-8", but that didn't help either.
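To see why setting `characterEncoding=UTF-8` on the JDBC URL is not enough by itself, here is a simplified Python model (not the actual JDBC driver or MySQL code). The connection charset only governs how the client's string travels to the server; the server still converts the value to the column's charset before storing it. The helper names `jdbc_send` and `server_store` are hypothetical, Python's `latin-1` codec stands in for the metastore column's `latin1` charset, and the `?` substitution mirrors what the `TABLE_PARAMS` query above returns.

```python
# Simplified model (assumption): the JDBC driver encodes the string with the
# connection charset, then the MySQL server converts it to the column charset.


def jdbc_send(text: str, connection_charset: str) -> bytes:
    """Client side: encode the comment with the connection charset."""
    return text.encode(connection_charset)


def server_store(payload: bytes, connection_charset: str, column_charset: str) -> str:
    """Server side: decode the payload, then convert it to the column charset.

    Characters the column charset cannot represent become '?', mirroring the
    '??' value seen in TABLE_PARAMS.
    """
    text = payload.decode(connection_charset)
    stored = text.encode(column_charset, errors="replace")
    return stored.decode(column_charset)


comment = "中文"  # hypothetical two-character Chinese comment
payload = jdbc_send(comment, "utf-8")  # characterEncoding=UTF-8 on the connection
print(server_store(payload, "utf-8", "latin-1"))  # prints ??
print(server_store(payload, "utf-8", "utf-8"))    # prints 中文
```

The transfer itself is lossless in this model; the data is lost only at the final conversion into a latin1 column, which is why a client-side connection setting cannot recover it.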

@yuqi1129
Contributor

yuqi1129 commented Mar 6, 2025

This is currently a limitation of the Hive Metastore: with MySQL as the backing database, the Hive Metastore supports only latin1, not UTF-8. See https://issues.apache.org/jira/browse/HIVE-18083.
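Because latin1 is a single-byte charset with no code points for CJK characters, such comments cannot survive a write to a latin1 column. A client could detect this before saving. Below is a minimal Python sketch of such a check. The function `unsupported_chars` is hypothetical, and it uses Python's `cp1252` codec as an approximation of MySQL's `latin1`, which MySQL defines as cp1252 West European rather than strict ISO-8859-1.

```python
def unsupported_chars(text: str, charset: str = "cp1252") -> list[str]:
    """Return the characters in `text` that `charset` cannot encode.

    The cp1252 default approximates MySQL's latin1 (an assumption; edge cases
    in the 0x80-0x9F range may differ slightly from MySQL's mapping).
    """
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            # This character would be stored as '?' in a latin1 column.
            bad.append(ch)
    return bad


comment = "表注释"  # hypothetical table comment
print(unsupported_chars(comment))  # prints ['表', '注', '释']
print(unsupported_chars("café €"))  # prints []
```

Western European text such as "café €" passes the check, which matches the report that the problem only surfaces with characters like Chinese.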

@yuqi1129
Contributor

yuqi1129 commented Mar 6, 2025

The problem lies in the storage layer itself; the Gravitino server and web UI have no character-encoding issues.
