[Enhancement] Recycle Bin Thread Lock Contention Causes High Latency in Drop Partition Operations #48224

Open
2 tasks done
luwei16 opened this issue Feb 24, 2025 · 0 comments

Search before asking

  • I have searched the issues and found no similar issues.

Description

When the catalog recycle bin holds a large number of partitions, the recycle bin daemon thread keeps the CatalogRecycleBin monitor <0x000000046c2bcc68> locked for an extended period. Other operations that need the same monitor, such as DROP PARTITION, block until it is released. The thread dump below shows a DROP PARTITION request blocked in recyclePartition while the recycle bin thread runs erasePartition with the monitor held:

"thrift-server-pool-86" #19384 daemon prio=5 os_prio=0 tid=0x00007f6ddc088000 nid=0x3815aa waiting for monitor entry [0x00007f6973bac000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.doris.catalog.CatalogRecycleBin.recyclePartition(CatalogRecycleBin.java:187)
        - waiting to lock <0x000000046c2bcc68> (a org.apache.doris.catalog.CatalogRecycleBin)
        at org.apache.doris.catalog.OlapTable.dropPartition(OlapTable.java:950)
        at org.apache.doris.catalog.OlapTable.dropPartition(OlapTable.java:973)
        at org.apache.doris.datasource.InternalCatalog.dropPartitionWithoutCheck(InternalCatalog.java:1879)
        at org.apache.doris.datasource.InternalCatalog.dropPartition(InternalCatalog.java:1868)
        at org.apache.doris.catalog.Env.dropPartition(Env.java:3179)
        at org.apache.doris.alter.Alter.processAlterOlapTable(Alter.java:224)
        at org.apache.doris.alter.Alter.processAlterTable(Alter.java:467)
        at org.apache.doris.catalog.Env.alterTable(Env.java:4456)
        at org.apache.doris.qe.DdlExecutor.execute(DdlExecutor.java:170)
        at org.apache.doris.qe.StmtExecutor.handleDdlStmt(StmtExecutor.java:2801)
        at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:963)
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:595)
        at org.apache.doris.qe.ConnectProcessor.proxyExecute(ConnectProcessor.java:704)
        at org.apache.doris.service.FrontendServiceImpl.forward(FrontendServiceImpl.java:1060)
        at sun.reflect.GeneratedMethodAccessor464.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.doris.service.FeServer.lambda$start$0(FeServer.java:60)
        at org.apache.doris.service.FeServer$$Lambda$193/1612883140.invoke(Unknown Source)
        at com.sun.proxy.$Proxy28.forward(Unknown Source)
        at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:3792)
        at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:3772)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

"recycle bin" #38 daemon prio=5 os_prio=0 tid=0x00007f6e64116000 nid=0x380d62 runnable [0x00007f69402df000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.doris.catalog.CatalogRecycleBin.getSameNamePartitionIdListToErase(CatalogRecycleBin.java:527)
        - locked <0x000000046c2bcc68> (a org.apache.doris.catalog.CatalogRecycleBin)
        at org.apache.doris.catalog.CatalogRecycleBin.erasePartitionWithSameName(CatalogRecycleBin.java:556)
        - eliminated <0x000000046c2bcc68> (a org.apache.doris.catalog.CatalogRecycleBin)
        at org.apache.doris.catalog.CatalogRecycleBin.erasePartition(CatalogRecycleBin.java:510)
        - locked <0x000000046c2bcc68> (a org.apache.doris.catalog.CatalogRecycleBin)
        at org.apache.doris.catalog.CatalogRecycleBin.runAfterCatalogReady(CatalogRecycleBin.java:1010)
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116)

Reproduction Steps

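Create a table and run frequent DROP PARTITION / CREATE PARTITION operations against it.
Set catalog_trash_expire_second to a large value so that dropped partitions accumulate in the recycle bin.
Capture thread dumps with jstack and look for threads blocked on the CatalogRecycleBin monitor.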
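
To make the last step easier to repeat, the following is a minimal sketch of a helper that scans jstack text output and reports, for each monitor address, which thread holds it and which threads are waiting for it. The class name MonitorContention and its methods are hypothetical and not part of Doris. The sketch only looks at "- locked" and "- waiting to lock" lines, so it does not account for threads that have released a monitor inside Object.wait().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: summarizes monitor contention found in jstack text output. */
class MonitorContention {
    // Thread header lines start with the quoted thread name.
    private static final Pattern THREAD = Pattern.compile("^\"([^\"]+)\"");
    // "- locked <0x...>" marks a held monitor.
    private static final Pattern LOCKED = Pattern.compile("- locked <(0x[0-9a-f]+)>");
    // "- waiting to lock <0x...>" marks a blocked acquire.
    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(0x[0-9a-f]+)>");

    /** Maps each monitor address to the name of the thread that holds it. */
    static Map<String, String> holders(String dump) {
        Map<String, String> result = new LinkedHashMap<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher t = THREAD.matcher(line);
            if (t.find()) {
                current = t.group(1);
                continue;
            }
            Matcher m = LOCKED.matcher(line);
            if (current != null && m.find()) {
                result.put(m.group(1), current);
            }
        }
        return result;
    }

    /** Maps each monitor address to the names of the threads blocked on it. */
    static Map<String, List<String>> waiters(String dump) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher t = THREAD.matcher(line);
            if (t.find()) {
                current = t.group(1);
                continue;
            }
            Matcher m = WAITING.matcher(line);
            if (current != null && m.find()) {
                result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(current);
            }
        }
        return result;
    }
}
```

Applied to the dump in the description, holders would map 0x000000046c2bcc68 to "recycle bin" and waiters would list "thrift-server-pool-86" under the same address.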

Expected Behavior

DDL operations should complete within predictable timeframes regardless of recycle bin size.
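
One possible way to approach this, offered only as a sketch and not as the fix chosen by the maintainers, is to bound the work done per lock acquisition so that lock hold time no longer grows with the recycle bin size. The class below is a hypothetical stand-in, not the Doris CatalogRecycleBin: BatchedRecycleBin, eraseBatch, eraseExpired and the batchSize parameter are all assumptions made for illustration. It also assumes entries are kept in recycle-time order, so each batch can stop at the first entry that has not expired yet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a recycle bin; not the Doris CatalogRecycleBin code. */
class BatchedRecycleBin {
    // partition id -> time (ms) at which it was recycled, oldest entry first
    private final Map<Long, Long> recycleTimeMs = new LinkedHashMap<>();

    /** Called by DDL threads; holds the monitor only for one map update. */
    synchronized void recyclePartition(long partitionId, long nowMs) {
        recycleTimeMs.remove(partitionId); // re-insert so iteration order follows recycle time
        recycleTimeMs.put(partitionId, nowMs);
    }

    synchronized int size() {
        return recycleTimeMs.size();
    }

    /** Erases at most batchSize expired entries while holding the monitor. */
    private synchronized List<Long> eraseBatch(long nowMs, long expireMs, int batchSize) {
        List<Long> erased = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = recycleTimeMs.entrySet().iterator();
        while (it.hasNext() && erased.size() < batchSize) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMs - entry.getValue() < expireMs) {
                break; // entries are ordered by time, so the rest have not expired either
            }
            erased.add(entry.getKey());
            it.remove();
        }
        return erased;
    }

    /** Daemon cycle: the monitor is released between batches so blocked threads can enter. */
    int eraseExpired(long nowMs, long expireMs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<Long> batch;
        do {
            batch = eraseBatch(nowMs, expireMs, batchSize);
            total += batch.size();
        } while (batch.size() == batchSize);
        return total;
    }
}
```

Intrinsic monitors in Java are not fair, so releasing the monitor between batches gives waiting DDL threads a chance to enter but does not guarantee their order. A fair java.util.concurrent.locks.ReentrantLock would be one option if ordering matters.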
