[Improvement] Optimize fetching table by name identifier #6638

Open
yuqi1129 opened this issue Mar 7, 2025 · 0 comments · May be fixed by #6640
Labels
improvement Improvements on everything


yuqi1129 commented Mar 7, 2025

What would you like to be improved?

According to the CPU profiler (screenshot not reproduced here), the following code is very time-consuming. We should merge the table-fetching logic as much as possible:

public TableEntity getTableByIdentifier(NameIdentifier identifier) {
  NameIdentifierUtil.checkTable(identifier);
  Long schemaId =
      CommonMetaService.getInstance().getParentEntityIdByNamespace(identifier.namespace());
  TablePO tablePO = getTablePOBySchemaIdAndName(schemaId, identifier.name());
  List<ColumnPO> columnPOs =
      TableColumnMetaService.getInstance()
          .getColumnsByTableIdAndVersion(tablePO.getTableId(), tablePO.getCurrentVersion());
  return POConverters.fromTableAndColumnPOs(tablePO, columnPOs, identifier.namespace());
}

The SQL corresponding to getColumnsByTableIdAndVersion can be optimized:

https://github.com/apache/gravitino/blob/1297713992dfd376fc2a6fba805a6cdee61c4373/core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableColumnBaseSQLProvider.java#L28C17-L48

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version;
8 rows in set (0.28 sec)

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version
    ->   and table_id = 2716478369449788787;
8 rows in set (0.00 sec)

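If we append a condition like table_id = xxxx to the end of the query, as in the second query above, it is much more efficient: the same 8 rows come back in 0.00 sec instead of 0.28 sec.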
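
As a rough illustration only, the rewritten query could be produced by a provider-style method like the one below. This is a sketch under assumptions, not the actual code in TableColumnBaseSQLProvider: the class name ColumnQuerySketch, the method name listColumnsSql, and the MyBatis-style placeholders #{tableId} and #{tableVersion} are all hypothetical. The only change relative to the first query is the final predicate on t1.table_id.

```java
// Hypothetical sketch; names and placeholders are assumptions, not Gravitino's API.
final class ColumnQuerySketch {
    private ColumnQuerySketch() {}

    // Builds the column lookup SQL for the given (assumed) table name.
    static String listColumnsSql(String tableName) {
        return "SELECT * FROM " + tableName + " t1"
            + " INNER JOIN ("
            + "SELECT column_id, MAX(table_version) AS max_table_version"
            + " FROM " + tableName
            + " WHERE table_id = #{tableId}"
            + " AND table_version <= #{tableVersion}"
            + " AND deleted_at = 0"
            + " GROUP BY column_id"
            + ") t2 ON t1.column_id = t2.column_id"
            + " AND t1.table_version = t2.max_table_version"
            // Extra predicate proposed in this issue: also restrict the outer table by table_id.
            + " AND t1.table_id = #{tableId}";
    }

    public static void main(String[] args) {
        // Print the generated SQL so it can be compared with the queries in this issue.
        System.out.println(listColumnsSql("table_column_version_info"));
    }
}
```

Qualifying the new predicate as t1.table_id is a deliberate choice in this sketch: the unqualified form in the query above works because the subquery t2 exposes only column_id and max_table_version, but the explicit alias keeps the intent clear if t2 ever changes.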

How should we improve?

No response

@yuqi1129 yuqi1129 added the improvement Improvements on everything label Mar 7, 2025