
Conversation

lukasz-stec
Member

@lukasz-stec lukasz-stec commented Oct 9, 2025

Description

A slow metastore can be a root cause of slow analysis or planning. This change adds explicit metrics to QueryStats with stats about the remote metastore calls made for a given query.

This is how it looks in query.json:

{
  "queryStats": {
    ...
    "catalogMetadataMetrics": {
      "hive": {
        "metastore.getDatabase.time.distribution": {
          "@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
          "total": 1,
          "min": 281684250,
          "max": 281684250,
          "p01": 281684250,
          "p05": 281684250,
          "p10": 281684250,
          "p25": 281684250,
          "p50": 281684250,
          "p75": 281684250,
          "p90": 281684250,
          "p95": 281684250,
          "p99": 281684250
        },
        "metastore.getTable.time.distribution": {
          "@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
          "total": 1,
          "min": 22330833,
          "max": 22330833,
          "p01": 22330833,
          "p05": 22330833,
          "p10": 22330833,
          "p25": 22330833,
          "p50": 22330833,
          "p75": 22330833,
          "p90": 22330833,
          "p95": 22330833,
          "p99": 22330833
        },
        "metastore.all.time.distribution": {
          "@class": "io.trino.plugin.base.metrics.DistributionSnapshot",
          "total": 2,
          "min": 22330833,
          "max": 281684250,
          "p01": 22330833,
          "p05": 22330833,
          "p10": 22330833,
          "p25": 22330833,
          "p50": 281684250,
          "p75": 281684250,
          "p90": 281684250,
          "p95": 281684250,
          "p99": 281684250
        },
        "metastore.getDatabase.time.total": {
          "@class": "io.trino.plugin.base.metrics.LongCount",
          "total": 281684250
        },
        "metastore.all.time.total": {
          "@class": "io.trino.plugin.base.metrics.LongCount",
          "total": 304015083
        },
        "metastore.getTable.time.total": {
          "@class": "io.trino.plugin.base.metrics.LongCount",
          "total": 22330833
        }
      }
    },
    ...
  }
}
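
For context, here is a hedged sketch of how a test could read these per-catalog metrics back from the coordinator's final query info. The query-runner and query-manager lookup are assumptions about the usual testing wiring, not the exact helper added in this PR; only `QueryStats#getCatalogMetadataMetrics` comes from this change.

```java
// Hedged sketch, not the exact test helper from this PR: reads the new per-catalog
// metrics back from the coordinator's final query info.
import io.trino.execution.QueryInfo;
import io.trino.spi.QueryId;
import io.trino.spi.metrics.Metrics;
import io.trino.testing.DistributedQueryRunner;

import java.util.Map;

final class CatalogMetadataMetricsReader
{
    private CatalogMetadataMetricsReader() {}

    static Map<String, Metrics> read(DistributedQueryRunner queryRunner, QueryId queryId)
    {
        // Assumes the usual testing wiring: coordinator -> query manager -> full query info
        QueryInfo queryInfo = queryRunner.getCoordinator()
                .getQueryManager()
                .getFullQueryInfo(queryId);
        // catalogMetadataMetrics is the new QueryStats field shown in the JSON above,
        // keyed by catalog name (e.g. "hive")
        return queryInfo.getQueryStats().getCatalogMetadataMetrics();
    }
}
```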

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Add per-query metastore call statistics to `QueryInfo` for the Hive and Iceberg connectors.

Summary by Sourcery

Add per-catalog metastore metrics to QueryStats by extending the Metadata API and collecting metrics from each active catalog on query completion or failure, instrument connectors with MeasuredHiveMetastore, and verify behavior with new connector tests.

New Features:

  • Expose per-catalog metastore call metrics in QueryStats and QueryInfo
  • Instrument Hive and Iceberg connectors with MeasuredHiveMetastore to record per-method API call durations and failures

Enhancements:

  • Extend Metadata SPI and implementations (TracingMetadata, MetadataManager) to support getMetrics and listActiveCatalogs
  • Collect and attach catalog metadata metrics in QueryStateMachine upon query completion or failure
  • Refactor QueryStateMachineBuilder to use a beforeQueryCleanup hook instead of custom Metadata wrapper

Build:

  • Add dependencies on io.airlift.stats and trino-plugin-toolkit in trino-metastore pom.xml

Tests:

  • Add tests in BaseHiveConnectorTest and BaseIcebergConnectorTest verifying metastore metrics are present for both successful and timed-out queries
  • Update existing tests and fixtures to account for the new catalogMetadataMetrics field and DistributionSnapshot class


@cla-bot cla-bot bot added the cla-signed label Oct 9, 2025

sourcery-ai bot commented Oct 9, 2025

Reviewer's Guide

This PR extends the metadata API to collect and expose per-catalog metastore call metrics in QueryStats and QueryInfo. It adds SPI methods for listing active catalogs and fetching connector metrics, wraps Hive metastore calls with timing and failure counting, integrates metrics capture into QueryStateMachine on query completion or failure, and updates connectors and tests to support and validate the new metrics field.

Class diagram for MeasuredHiveMetastore and metastore metrics integration

classDiagram
    class MeasuredHiveMetastore {
        -HiveMetastore delegate
        -MetastoreApiCallStats allApiCallsStats
        -Map<String, MetastoreApiCallStats> apiCallStats
        -Ticker ticker
        +Metrics getMetrics()
        +<all HiveMetastore methods> (wrapped)
    }
    class MetastoreApiCallStats {
        -TDigest timeNanosDistribution
        -long totalTimeNanos
        -long totalFailures
        +addTime(long)
        +addFailure()
        +put(ImmutableMap.Builder<String, Metric<?>>, String)
    }
    class MeasuredMetastoreFactory {
        -HiveMetastoreFactory metastoreFactory
        +createMetastore(Optional<ConnectorIdentity>)
        +isImpersonationEnabled()
    }
    MeasuredHiveMetastore --> HiveMetastore : delegates
    MeasuredHiveMetastore --> MetastoreApiCallStats : uses
    MeasuredMetastoreFactory --> MeasuredHiveMetastore : creates
    MeasuredMetastoreFactory --> HiveMetastoreFactory : delegates

    class HiveMetastore {
        <<interface>>
        +getMetrics() : Metrics
        +<other methods>
    }
    MeasuredHiveMetastore ..|> HiveMetastore

    class Metrics {
        +Map<String, Metric<?>> metrics
    }
    MeasuredHiveMetastore --> Metrics : returns
    MetastoreApiCallStats --> Metric : builds

    class Metric {
        <<interface>>
    }
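To make the diagram concrete, here is a minimal sketch of the wrap-and-measure pattern it describes. The `wrapCall` helper and its `Supplier`-based signature are assumptions introduced for brevity; the real MeasuredHiveMetastore wraps every HiveMetastore method explicitly and also feeds a TDigest to produce the `*.time.distribution` metrics shown above.

```java
// Minimal sketch of the wrap-and-measure pattern (not the actual MeasuredHiveMetastore).
// wrapCall and its Supplier-based signature are assumptions for brevity.
import com.google.common.base.Ticker;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class MeasuredDelegateSketch
{
    private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
    private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
    private final Ticker ticker = Ticker.systemTicker();

    // Every delegated metastore method would route through a helper like this,
    // e.g. getTable(...) -> wrapCall("getTable", () -> delegate.getTable(...))
    <T> T wrapCall(String apiName, Supplier<T> call)
    {
        MetastoreApiCallStats stats = apiCallStats.computeIfAbsent(apiName, name -> new MetastoreApiCallStats());
        long start = ticker.read();
        try {
            return call.get();
        }
        catch (RuntimeException e) {
            stats.addFailure();
            allApiCallsStats.addFailure();
            throw e;
        }
        finally {
            long elapsedNanos = ticker.read() - start;
            stats.addTime(elapsedNanos);
            allApiCallsStats.addTime(elapsedNanos);
        }
    }

    static final class MetastoreApiCallStats
    {
        private long totalTimeNanos;
        private long totalFailures;

        // The real class also records elapsedNanos into a TDigest to produce
        // the *.time.distribution metrics, in addition to the running totals
        synchronized void addTime(long elapsedNanos)
        {
            totalTimeNanos += elapsedNanos;
        }

        synchronized void addFailure()
        {
            totalFailures++;
        }
    }
}
```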

Class diagram for QueryStats and catalogMetadataMetrics field

classDiagram
    class QueryStats {
        +Map<String, Metrics> catalogMetadataMetrics
        +getCatalogMetadataMetrics()
        +<other fields and methods>
    }
    QueryStats --> Metrics : contains

    class Metrics {
        +Map<String, Metric<?>> metrics
    }
    Metrics --> Metric : contains

    class Metric {
        <<interface>>
    }

File-Level Changes

Change Details Files
Extend Metadata SPI to support connector-specific metrics and active catalog listing
  • Add getMetrics and listActiveCatalogs methods to Metadata and ConnectorMetadata interfaces
  • Implement new SPI methods in MetadataManager, TracingMetadata, TracingConnectorMetadata, HiveMetadata, IcebergMetadata, LakehouseMetadata, and classloader-safe wrappers
  • Propagate SPI changes through core and plugin code
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
core/trino-main/src/main/java/io/trino/metadata/Metadata.java
core/trino-main/src/main/java/io/trino/metadata/MetadataManager.java
core/trino-main/src/main/java/io/trino/tracing/TracingMetadata.java
core/trino-main/src/main/java/io/trino/tracing/TracingConnectorMetadata.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseMetadata.java
lib/trino-plugin-toolkit/src/main/java/io/trino/plugin/base/classloader/ClassLoaderSafeConnectorMetadata.java
Instrument QueryStateMachine to collect and serialize per-catalog metrics
  • Add catalogMetadataMetrics field and collectCatalogMetadataMetrics method
  • Invoke metrics collection in transitionToFinishing and transitionToFailed
  • Include catalogMetadataMetrics in QueryStats and JSON serialization
core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java
core/trino-main/src/main/java/io/trino/execution/QueryStats.java
Introduce MeasuredHiveMetastore wrapper for capturing metastore API call statistics
  • Create MeasuredHiveMetastore and MetastoreApiCallStats to record timings and failures
  • Wrap original HiveMetastoreFactory with MeasuredMetastoreFactory in CachingHiveMetastoreModule
  • Implement getMetrics in CachingHiveMetastore and TracingHiveMetastore
lib/trino-metastore/src/main/java/io/trino/metastore/MeasuredHiveMetastore.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/CachingHiveMetastoreModule.java
lib/trino-metastore/src/main/java/io/trino/metastore/cache/CachingHiveMetastore.java
lib/trino-metastore/src/main/java/io/trino/metastore/tracing/TracingHiveMetastore.java
Refactor QueryStateMachineBuilder for beforeQueryCleanup hook
  • Replace withMetadata override in tests with beforeQueryCleanup callback
  • Update TestQueryStateMachine builder logic
core/trino-main/src/test/java/io/trino/execution/TestQueryStateMachine.java
Update tests and dependencies to validate new metrics
  • Add catalogMetadataMetrics assertions in Hive and Iceberg connector tests
  • Update system and information schema connector tests to ignore getMetrics calls
  • Add getCatalogMetadataMetrics helper and adjust TestQueryStats/TestStageStats/TestQueryInfo constructors
  • Add dependencies on io.airlift.stats and trino-plugin-toolkit in metastore POM
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java
core/trino-main/src/test/java/io/trino/execution/TestQueryStats.java
core/trino-main/src/test/java/io/trino/execution/TestStageStats.java
core/trino-main/src/test/java/io/trino/execution/TestQueryInfo.java
core/trino-main/src/test/java/io/trino/server/TestBasicQueryInfo.java
testing/trino-tests/src/test/java/io/trino/connector/informationschema/TestInformationSchemaConnector.java
testing/trino-tests/src/test/java/io/trino/connector/system/metadata/TestSystemMetadataConnector.java
lib/trino-metastore/pom.xml
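
As a rough illustration of the SPI extension listed in the first change group above, a connector-side getMetrics override could simply surface the measured metastore's metrics. This is a hedged sketch, not the actual HiveMetadata diff: the new SPI method's parameter list and the `metastore` field are assumptions, since the diff itself is not part of this conversation.

```java
// Hedged sketch only: the connector-side override described above could delegate
// straight to the measured metastore. The parameter list of the new ConnectorMetadata
// getMetrics method and the `metastore` field are assumptions.
import io.trino.metastore.HiveMetastore;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.metrics.Metrics;

import static java.util.Objects.requireNonNull;

public class HiveMetadataMetricsSketch
{
    private final HiveMetastore metastore;

    public HiveMetadataMetricsSketch(HiveMetastore metastore)
    {
        this.metastore = requireNonNull(metastore, "metastore is null");
    }

    // In HiveMetadata this would be an @Override of the new ConnectorMetadata#getMetrics
    public Metrics getMetrics(ConnectorSession session)
    {
        // Surfaces the per-call stats accumulated by the MeasuredHiveMetastore wrapper;
        // the engine then attaches them to QueryStats.catalogMetadataMetrics per catalog
        return metastore.getMetrics();
    }
}
```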


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • Consolidate the duplicated assertCountMetricExists/assertDistributionMetricExists helpers in BaseHiveConnectorTest and BaseIcebergConnectorTest into a shared test utility to reduce code duplication.
  • Centralize the collectCatalogMetadataMetrics invocation in QueryStateMachine (rather than calling it separately in both transitionToFinishing and transitionToFailed) to DRY up the code and ensure consistency.
  • Consider refactoring the very large MeasuredHiveMetastore class by extracting the stats‐collection logic into smaller, focused components or utility classes to improve readability and maintainability.

## Individual Comments

### Comment 1
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java:9225-9226` </location>
<code_context>
         assertQuerySucceeds("CALL system.flush_metadata_cache()");
     }

+    @Test
+    public void testCatalogMetadataMetrics()
+    {
+        MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
</code_context>

<issue_to_address>
**suggestion (testing):** Missing test for metrics with multiple catalogs.

Please add a test that runs a query across multiple catalogs to ensure metrics are tracked separately for each.

Suggested implementation:

```java
    @Test
    public void testCatalogMetadataMetrics()
    {
        MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
                getSession(),
                "SELECT count(*) FROM region r, nation n WHERE r.regionkey = n.regionkey");
        Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());
        assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
        assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
        assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
        assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");
    }

    @Test
    public void testCatalogMetadataMetricsWithMultipleCatalogs()
    {
        // Assume "iceberg" and "tpch" catalogs are available for testing
        MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
                getSession(),
                "SELECT count(*) FROM iceberg.region r JOIN tpch.nation n ON r.regionkey = n.regionkey");
        Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());

        // Assert metrics for iceberg catalog
        assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
        assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
        assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
        assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");

        // Assert metrics for tpch catalog (replace with actual metric names if different)
        assertCountMetricExists(metrics, "tpch", "metastore.all.time.total");
        assertDistributionMetricExists(metrics, "tpch", "metastore.all.time.distribution");
        assertCountMetricExists(metrics, "tpch", "metastore.getTable.time.total");
        assertDistributionMetricExists(metrics, "tpch", "metastore.getTable.time.distribution");
    }

```

- If the "tpch" catalog does not support the same metrics, adjust the metric names or assertions accordingly.
- Ensure that the catalogs "iceberg" and "tpch" are available and configured in your test environment.
- If you use different catalogs, update the catalog names in the test.
</issue_to_address>


@github-actions github-actions bot added iceberg Iceberg connector hive Hive connector labels Oct 9, 2025
@lukasz-stec lukasz-stec marked this pull request as draft October 10, 2025 07:06
@lukasz-stec
Copy link
Member Author

There are related CI failures. Moving to draft until I fix them.

@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 57c65e9 to 2b8cd5c Compare October 10, 2025 08:07
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 2b8cd5c to 0c78e13 Compare October 10, 2025 08:58
@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label Oct 10, 2025
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch 2 times, most recently from a120650 to 75ab2b8 Compare October 10, 2025 12:26
@lukasz-stec lukasz-stec marked this pull request as ready for review October 10, 2025 15:10

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The MeasuredHiveMetastore class manually wraps every HiveMetastore method, which leads to a lot of boilerplate; consider using a dynamic proxy or an abstract base wrapper to automatically instrument all methods and reduce duplication.
  • The Hive and Iceberg connector tests duplicate the same metric‐assertion logic; extracting the assertCountMetricExists and assertDistributionMetricExists helpers into a shared base test would DRY up the code and centralize metric validation.
  • Since connectors now rely on the new getMetrics/listActiveCatalogs SPI methods, add a quick check or lint to ensure every connector overrides these (or explicitly opts out) so no catalog is left without metrics by accident.

## Individual Comments

### Comment 1
<location> `core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java:392-390` </location>
<code_context>
         return queryStateMachine;
     }

+    private void collectCatalogMetadataMetrics()
+    {
+        // collect the metrics only once. This avoid issue with transaction being removed
+        // after the check but before the metrics collection
+        if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
+            if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
+                // The metrics collection depends on active transaction as the metrics
+                // are stored in the transactional ConnectorMetadata, but the collection can be
+                // run after the query has failed e.g., via cancel.
+                return;
+            }
+
+            ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider handling exceptions during metrics collection to avoid masking query state transitions.

Unexpected exceptions during metrics collection may interrupt query state transitions. Wrapping this logic in a try-catch and logging errors will improve robustness, especially during transitions like finishing or failure.

Suggested implementation:

```java
    private void collectCatalogMetadataMetrics()
    {
        // collect the metrics only once. This avoid issue with transaction being removed
        // after the check but before the metrics collection
        if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
            if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
                // The metrics collection depends on active transaction as the metrics
                // are stored in the transactional ConnectorMetadata, but the collection can be
                // run after the query has failed e.g., via cancel.
                return;
            }

            try {
                ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
                for (CatalogInfo activeCatalog : metadata.listActiveCatalogs(session)) {
                    Metrics metrics = metadata.getMetrics(session, activeCatalog.catalogName());
                    if (!metrics.getMetrics().isEmpty()) {
                        catalogMetadataMetrics.put(activeCatalog.catalogName(), metrics);
                    }
                }

                this.catalogMetadataMetrics.set(catalogMetadataMetrics.buildOrThrow());
            }
            catch (Exception e) {
                log.error(e, "Error collecting catalog metadata metrics for query %s", queryId);
            }
        }
    }

```

If the logger (`log`) is not already defined in this class, you should add:

```java
private static final Logger log = Logger.get(QueryStateMachine.class);
```

at the top of the class, with the appropriate import:

```java
import io.airlift.log.Logger;
```
</issue_to_address>

### Comment 2
<location> `lib/trino-metastore/src/main/java/io/trino/metastore/MeasuredHiveMetastore.java:46-54` </location>
<code_context>
+    private final HiveMetastore delegate;
+    private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
+    private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
+    private final Ticker ticker = Ticker.systemTicker();
+
+    public MeasuredHiveMetastore(HiveMetastore delegate)
</code_context>

<issue_to_address>
**suggestion:** Consider allowing ticker injection for testability.

Injecting the ticker through the constructor would make it easier to test time-dependent logic.

```suggestion
    private final HiveMetastore delegate;
    private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
    private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
    private final Ticker ticker;

    public MeasuredHiveMetastore(HiveMetastore delegate)
    {
        this(delegate, Ticker.systemTicker());
    }

    public MeasuredHiveMetastore(HiveMetastore delegate, Ticker ticker)
    {
        this.delegate = requireNonNull(delegate, "delegate is null");
        this.ticker = requireNonNull(ticker, "ticker is null");
    }
```
</issue_to_address>


@lukasz-stec
Member Author

@findepi @raunaqmorarka This is ready for review. There is one CI failure, but it is unrelated.

@findepi
Member

findepi commented Oct 13, 2025

The test (plugin/trino-lakehouse) job hung; it contains a couple of errors like this:

Treating the bytes as signed or unsigned?

It's worth a code comment explaining that Trino string codepoint-based collation is equivalent to sorting (unsigned) bytes ... in UTF-8 encoding.
When we sent `COLLATE BINARY` to oracle, are we assuming the data is compared byte-wise on Utf-8 encoding, or can it be something else?

and then

2025-10-10T06:42:14.693-0600	WARN	TestHangMonitor	io.trino.testing.services.junit.LogTestDurationListener	No test started or completed in 8.00m. Running tests:
	TestLakehouseConnectorTest running for 9.33m
	JUnit Jupiter running for 9.41m
	TestLakehouseDeltaConnectorSmokeTest running for 9.41m
	TestLakehouseHiveConnectorSmokeTest running for 9.26m
	TestLakehouseIcebergConnectorSmokeTest running for 9.38m
	TestLakehouseFileConnectorSmokeTest running for 9.41m.

Are they related?

@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 75ab2b8 to 15f1657 Compare October 13, 2025 13:38
@lukasz-stec
Member Author

test (plugin/trino-lakehouse) this job hanged it contains a couple errors like this
...

are they related?

Yes, the implementation of io.trino.plugin.lakehouse.LakehouseMetadata#getMetrics was broken. I fixed it and added proper tests.

// collect the metrics only once. This avoid issue with transaction being removed
// after the check but before the metrics collection
if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
Member


In theory this is racy. A check in the TM for a transaction on this line does not guarantee the TM still knows about the transaction during listActiveCatalogs. Perhaps the current query timed out and cleanup happens asynchronously.

Perhaps instead we could just validate the transaction only once?

List<CatalogInfo> activeCatalogs;
try {
    activeCatalogs = metadata.listActiveCatalogs(session);
} catch (NotInTransactionException e) {
    // explanation
    return;
}

Member Author


That won't work, unfortunately, as we also need the transaction to access the ConnectorMetadata that keeps the metrics.
I moved the invocation of collectCatalogMetadataMetrics() to places that only a single thread can reach before the transaction is committed or aborted.

Member


Do we need this if?
I realized there is already a catch (NotInTransactionException) below.

Member Author


I just added the catch for NotInTransactionException yesterday; it was not there before. Now the if is not strictly necessary, but since this method may be called without a transaction running, and that is not an exceptional flow (e.g., a START TRANSACTION statement does not itself have a transaction, and an async updateQueryInfo can happen after the query is finished), I thought I would leave it in place.
I'm fine with dropping it as well.

{
queryStateTimer.endQuery();

collectCatalogMetadataMetrics();
Member


Especially in case of query failure we would want to be defensive and suppress exceptions coming from here.

The easiest way would be to put the metrics collection code inside cleanupQuery. Are there any ordering requirements between metadata.cleanupQuery and metadata.getMetrics?

Member Author


There should be no exception thrown here except for a bug. IMO, it is better to fail the query in that case, but since this is connector-specific, I understand the urge to be defensive here. I will add a try/catch.
There is no ordering requirement with cleanupQuery directly, but cleanupQuery can be executed by multiple threads, so it is not a good place for collectCatalogMetadataMetrics unless I move it after the:

QueryState oldState = queryState.trySet(FAILED);
        if (oldState.isDone()) {
            if (log) {
                QUERY_STATE_LOG.debug(throwable, "Failure after query %s finished", queryId);
            }
            return false;
        }

}

public synchronized void put(ImmutableMap.Builder<String, Metric<?>> metrics, String prefix)
{
Member


nit

 DistributionSnapshot distributionSnapshot;
            long totalTimeNanos;
            long totalFailures;
            synchronized (this) {
                // DistributionSnapshot does not retain reference to the histogram 
                distributionSnapshot = new DistributionSnapshot(new TDigestHistogram(timeNanosDistribution));
                totalTimeNanos = this.totalTimeNanos;
                totalFailures = this.totalFailures;
            }

Perhaps this is not needed, because collecting metrics will generally not coincide with metastore calls.
Feel free to dismiss, but then add an explanatory comment.

Member Author


I'm sorry, I don't get it. What is this change about?

Member


reducing synchronization scope

Member Author


I will leave it as is: ImmutableMap.Builder.put is pretty simple, so the synchronization scope would not be reduced much, and there should not be a lot of contention here anyway, since in normal operation the metrics updates and the metrics collection happen at different times.
I added a comment about DistributionSnapshot.

@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 15f1657 to c3d6dcb Compare October 13, 2025 14:55
Metadata and QueryStateMachine must use the same
`TransactionManager` instance.
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from c3d6dcb to 8be4339 Compare October 13, 2025 15:50
Member Author

@lukasz-stec lukasz-stec left a comment


Thanks for the review @findepi! I addressed the comments.


@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 8be4339 to 879926b Compare October 13, 2025 18:19
Member Author

@lukasz-stec lukasz-stec left a comment


Missed some comments


@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from 879926b to 19f603d Compare October 13, 2025 20:58
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch 3 times, most recently from a4aa981 to a5d9fe1 Compare October 14, 2025 15:19
@lukasz-stec lukasz-stec requested a review from findepi October 14, 2025 19:03
@lukasz-stec
Member Author

@findepi @raunaqmorarka There were issues with collecting the metadata metrics concurrently with updating the final query info. The final query info update could be triggered asynchronously and thus miss the collected metrics. To fix this, I added metadata metrics collection before every query stats collection. This also makes the metadata metrics available before the query is done, which is an additional benefit.


this.catalogMetadataMetrics.set(catalogMetadataMetrics.buildOrThrow());
}
catch (NotInTransactionException e) {
// ignore
Member


Explain why this should be ignored, i.e., what are the legitimate cases where this can happen?

Member Author


Comment added.

The goal is to expose in QueryStats, per catalog,
connector-specific metrics like metastore api call stats.
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from a44749e to c0fddfe Compare October 15, 2025 14:15
@lukasz-stec lukasz-stec force-pushed the ls/2510/01-catalog-metastore-metrics branch from c0fddfe to 10d41ae Compare October 15, 2025 14:21

Labels

cla-signed · hive (Hive connector) · iceberg (Iceberg connector) · jdbc (Relates to Trino JDBC driver) · lakehouse
