
Iceberg queries stuck in the planning phase for a long time #23451

Open · sriharshaj opened this issue Sep 16, 2024 · 15 comments

sriharshaj commented Sep 16, 2024

Our Iceberg queries get stuck in the planning phase for 2 to 3 minutes, although they eventually run successfully. We are currently upgrading Trino from version 444 to 454. The issue has been present since Trino version 451 (we went back and tried different versions to narrow it down).

Our Iceberg stack:
Storage: S3
File format: Parquet
Catalog: Hive

Query explain:

 Trino version: 454
 Fragment 0 [SINGLE]
     Output layout: [expr]
     Output partitioning: SINGLE []
     Output[columnNames = [_col0]]
     │   Layout: [expr:integer]
     │   Estimates: {rows: 1 (5B), cpu: 0, memory: 0B, network: 0B}
     │   _col0 := expr
     └─ Project[]
        │   Layout: [expr:integer]
        │   Estimates: {rows: 1 (5B), cpu: 5, memory: 0B, network: 0B}
        │   expr := integer '1'
        └─ Limit[count = 1]
           │   Layout: []
           │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
           └─ LocalExchange[partitioning = SINGLE]
              │   Layout: []
              │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
              └─ RemoteSource[sourceFragmentIds = [1]]
                     Layout: []

 Fragment 1 [SOURCE]
     Output layout: []
     Output partitioning: SINGLE []
     LimitPartial[count = 1]
     │   Layout: []
     │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
     └─ TableScan[table = table_name$data@2183018889107447028 constraint on [dt] LIMIT 1]
            Layout: []
            Estimates: {rows: 8499944 (0B), cpu: 0, memory: 0B, network: 0B}
            83:dt:varchar
                :: [[2024-08-30T13:00:00Z, 2024-08-30T14:00:00Z)]

(1 row)

Query 20240830_150248_35329_wgfii, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
3:23 [0 rows, 0B] [0 rows/s, 0B/s]

Query info:
https://gist.github.com/sriharshaj/f26e655f233b84754da8216be2ae0172
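
For reference, the same query info JSON can be pulled straight from the coordinator. A rough sketch, assuming an unsecured cluster where the /v1/query endpoint only needs the X-Trino-User header, and field paths as they appear in the gist:

    # Fetch the full query info and extract the planning-related stats.
    # Host, user, and jq paths are placeholders; adjust for your deployment.
    curl -s -H "X-Trino-User: admin" \
      "http://localhost:8080/v1/query/20240830_150248_35329_wgfii" \
      | jq '{planning: .queryStats.planningTime, topRules: .queryStats.optimizerRulesSummaries[:3]}'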


wendigo commented Sep 17, 2024

This could be related to #23384


wendigo commented Sep 17, 2024

We will release a new Trino version this week. I'll postpone further investigation so we can check whether the fix in this new version helps.

sriharshaj commented

@wendigo We tested Trino version 458, and while the planning time has improved, it remains unusually high, ranging from 30 to 50 seconds.

sriharshaj changed the title from "Iceberg queries are getting stuck in the planning phase for 2 to 3 minutes" to "Iceberg queries stuck in the planning phase for 2-3 minutes" on Sep 18, 2024

wendigo commented Sep 18, 2024

@sriharshaj do you know what accounts for this number? Metadata retrieval from storage? Metastore calls?

sriharshaj commented

Here are the optimizer rule summaries from the query info:

   "optimizerRulesSummaries": [
      {
        "rule": "io.trino.sql.planner.optimizations.AddExchanges",
        "invocations": 1,
        "applied": 1,
        "totalTime": 5120641837,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.DetermineTableScanNodePartitioning",
        "invocations": 3,
        "applied": 1,
        "totalTime": 4029729665,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PushPredicateIntoTableScan",
        "invocations": 1,
        "applied": 1,
        "totalTime": 3586457350,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.FilterExpressionRewrite",
        "invocations": 58,
        "applied": 1,
        "totalTime": 2148574,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.ProjectExpressionRewrite",
        "invocations": 92,
        "applied": 0,
        "totalTime": 921214,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.optimizations.PredicatePushDown",
        "invocations": 7,
        "applied": 7,
        "totalTime": 442238,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PushLimitIntoTableScan",
        "invocations": 4,
        "applied": 1,
        "totalTime": 293083,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneTableScanColumns",
        "invocations": 7,
        "applied": 1,
        "totalTime": 273248,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneOutputSourceColumns",
        "invocations": 16,
        "applied": 1,
        "totalTime": 268922,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneProjectColumns",
        "invocations": 7,
        "applied": 4,
        "totalTime": 224337,
        "failures": 0
      }
    ],

Is there a way to analyze why the optimizers are taking so long? Additionally, where can I find details on metadata retrieval and Metastore calls?


wendigo commented Sep 18, 2024

@sriharshaj You can enable tracing (OpenTelemetry) and capture what the cluster is doing.
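
For anyone landing here later, a minimal sketch of what that looks like in config.properties, assuming the tracing.* properties available in recent Trino releases and an OTLP collector reachable over gRPC; verify the property names against the docs for your version:

    # Enable OpenTelemetry tracing on the coordinator and workers.
    tracing.enabled=true
    # OTLP/gRPC endpoint of the collector (Jaeger, otel-collector, etc.).
    tracing.exporter.endpoint=http://otel-collector.example.com:4317

Spans for planning, metastore calls, and storage reads should then show up per query in the tracing backend.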

sriharshaj commented

@wendigo Can I capture those metrics with JMX? We don't have an OpenTelemetry setup.


wendigo commented Sep 19, 2024

@sriharshaj JMX keeps aggregates, not individual events.

sriharshaj commented

@wendigo I installed Trino locally and ran the same query. The planning phase took approximately 15 seconds.

The ConnectorMetadata.getTableProperties method is taking around 1.5 to 2.0 seconds to retrieve the table metadata.

During query optimization, Trino fetches the table metadata five times, and during the fragment generation phase it retrieves it three more times.
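
In case someone wants to reproduce that measurement, here is a rough sketch of how the call can be timed in a locally patched build. The helper class is hypothetical and only for illustration; just the ConnectorMetadata.getTableProperties signature comes from the Trino SPI:

    import io.trino.spi.connector.ConnectorMetadata;
    import io.trino.spi.connector.ConnectorSession;
    import io.trino.spi.connector.ConnectorTableHandle;
    import io.trino.spi.connector.ConnectorTableProperties;

    // Hypothetical helper: wraps a single getTableProperties call with
    // wall-clock timing so repeated invocations during planning and
    // fragment generation show up in the logs.
    final class TablePropertiesTimer
    {
        private TablePropertiesTimer() {}

        static ConnectorTableProperties timedGetTableProperties(
                ConnectorMetadata metadata,
                ConnectorSession session,
                ConnectorTableHandle table)
        {
            long start = System.nanoTime();
            ConnectorTableProperties properties = metadata.getTableProperties(session, table);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.err.printf("getTableProperties(%s) took %d ms%n", table, elapsedMillis);
            return properties;
        }
    }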

sriharshaj commented

This issue occurs exclusively with Iceberg queries. For Hive, everything works as expected.


wendigo commented Sep 19, 2024

@sriharshaj I believe @raunaqmorarka recently added a cache for metadata files.


wendigo commented Sep 19, 2024

Which version are you using, @sriharshaj?

sriharshaj commented

@wendigo We have been facing this issue since 451.

I traced the issue down to this change: https://github.com/trinodb/trino/pull/15712/files#diff-e1cb17efec6787989f9df9ee40c4f2809ff3fe946cd2ec721ff8932b131997b8R618.

Our schema contains a large number of nested fields, which results in every nested field being mapped to an IcebergColumnHandle. When I debugged a specific table, approximately 2,260 nested columns were being mapped to IcebergColumnHandle, which may be significantly impacting planning time.
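
As a rough illustration of the scale (hypothetical numbers, not our actual schema): 20 top-level struct columns that each flatten to about 113 leaf fields already produce 20 × 113 = 2,260 column handles, which is the order of magnitude I saw while debugging.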


wendigo commented Sep 19, 2024

@krvikash @raunaqmorarka can you take a look?

sriharshaj changed the title from "Iceberg queries stuck in the planning phase for 2-3 minutes" to "Iceberg queries stuck in the planning phase for a long time" on Sep 19, 2024
sriharshaj commented

Thank you, @wendigo, for your guidance in helping me identify the issue.
