[Improvement] Optimize fetching table by name identifier #6638

yuqi1129 · 2025-03-07T10:52:57Z

What would you like to be improved?

According to the CPU profiler, the following code is very time-consuming. We should merge the table-fetching logic as much as possible:

gravitino/core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java

Lines 80 to 92 in 1297713

    
           public TableEntity getTableByIdentifier(NameIdentifier identifier) { 
        
             NameIdentifierUtil.checkTable(identifier); 
        
             Long schemaId = 
        
                 CommonMetaService.getInstance().getParentEntityIdByNamespace(identifier.namespace()); 
        
             TablePO tablePO = getTablePOBySchemaIdAndName(schemaId, identifier.name()); 
        
             List<ColumnPO> columnPOs = 
        
                 TableColumnMetaService.getInstance() 
        
                     .getColumnsByTableIdAndVersion(tablePO.getTableId(), tablePO.getCurrentVersion()); 
        
             return POConverters.fromTableAndColumnPOs(tablePO, columnPOs, identifier.namespace()); 
        
           }

The SQL corresponding to getColumnsByTableIdAndVersion can be optimized:

https://github.com/apache/gravitino/blob/1297713992dfd376fc2a6fba805a6cdee61c4373/core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableColumnBaseSQLProvider.java#L28C17-L48

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version;
8 rows in set (0.28 sec)

mysql> select
    ->   *
    -> from
    ->   table_column_version_info t1
    ->   inner join (
    ->     SELECT
    ->       column_id,
    ->       MAX(table_version) AS max_table_version
    ->     from
    ->       table_column_version_info
    ->     where
    ->       table_id = 2716478369449788787
    ->       and table_version <= 10
    ->       and deleted_at = 0
    ->     group by
    ->       column_id
    ->   ) t2 on t1.column_id = t2.column_id
    ->   AND t1.table_version = t2.max_table_version
    ->   and table_id = 2716478369449788787;
8 rows in set (0.00 sec)

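If we add a condition like table_id = xxxx at the end of the join, the query becomes much more efficient: in the runs above, the same 8 rows come back in 0.00 sec instead of 0.28 sec.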
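To make the proposed change concrete, here is a minimal sketch of how the query string could look with the extra predicate added. It is only an illustration: the class name `ColumnQuerySketch` and the method `listColumnsSql` are hypothetical, and the `#{tableId}` / `#{tableVersion}` placeholders are assumed MyBatis-style parameters, not copied from the actual `TableColumnBaseSQLProvider`.

```java
// Hypothetical sketch: class and method names are illustrative only and are
// not taken from Gravitino's actual SQL provider.
final class ColumnQuerySketch {

  // Builds the column lookup with the additional outer table_id predicate.
  // #{tableId} and #{tableVersion} are assumed MyBatis-style placeholders.
  static String listColumnsSql() {
    return "SELECT *"
        + " FROM table_column_version_info t1"
        + " INNER JOIN ("
        + " SELECT column_id, MAX(table_version) AS max_table_version"
        + " FROM table_column_version_info"
        + " WHERE table_id = #{tableId}"
        + " AND table_version <= #{tableVersion}"
        + " AND deleted_at = 0"
        + " GROUP BY column_id"
        + " ) t2 ON t1.column_id = t2.column_id"
        + " AND t1.table_version = t2.max_table_version"
        // The extra predicate from the second query, qualified with t1.
        + " AND t1.table_id = #{tableId}";
  }

  public static void main(String[] args) {
    // Print the statement so it can be pasted into EXPLAIN for comparison.
    System.out.println(listColumnsSql());
  }
}
```

Running both variants through `EXPLAIN` in MySQL would be a reasonable way to confirm whether the extra predicate lets the outer scan of `t1` be restricted by `table_id`.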

How should we improve?

No response

yuqi1129 added the improvement Improvements on everything label Mar 7, 2025

yuqi1129 linked a pull request Mar 8, 2025 that will close this issue

[#6638]Optimize method listColumnPOsByTableIdAndVersion to make it more efficient #6640

Open