-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat](lock)add deadlock detection tool and monitored lock implementations #39015 #40856
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
WIP... |
2b248c6
to
f7e43ed
Compare
run buildall |
TPC-H: Total hot run time: 48871 ms
|
TPC-DS: Total hot run time: 211492 ms
|
ClickBench: Total hot run time: 30.93 s
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
Delay merging this pr since it's a feature and large. |
…tions apache#39015 (apache#39099) This issue proposes the addition of new features to the project, including a deadlock detection tool and monitored lock implementations. These features will help in identifying and debugging potential deadlocks and monitoring lock usage. Features: A monitored version of Lock that tracks and logs lock acquisition and release times. Functionality: Overrides lock(), unlock(), tryLock(), and tryLock(long timeout, TimeUnit unit) methods. Logs information about lock acquisition time, release time, and any failure to acquire the lock within the specified timeout. ##### eg ```log 2024-08-07 12:02:59 [ Thread-2:2006 ] - [ WARN ] Thread ID: 12, Thread Name: Thread-2 - Lock held for 1912 ms, exceeding hold timeout of 1000 ms Thread stack trace: at java.lang.Thread.getStackTrace(Thread.java:1564) at org.example.lock.AbstractMonitoredLock.afterUnlock(AbstractMonitoredLock.java:49) at org.example.lock.MonitoredReentrantLock.unlock(MonitoredReentrantLock.java:32) at org.example.ExampleService.timeout(ExampleService.java:17) at org.example.Main.lambda$test2$1(Main.java:39) at java.lang.Thread.run(Thread.java:750) ``` Uses ScheduledExecutorService for periodic deadlock checks. Logs deadlock information including thread names, states, lock info, and stack traces. **ThreadMXBean accesses thread information in the local JVM, which is already in memory, so accessing it is less expensive than fetching data from external resources such as disk or network. Thread state cache: The JVM typically maintains a cache of thread states, reducing the need for real-time calculations or additional data processing.** ##### eg ```log Thread Name: Thread-0 Thread State: WAITING Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213 Lock Owner Name: Thread-1 Lock Owner Id: 12 Waited Time: -1 Blocked Time: -1 Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213 Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213 Stack Trace: at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22) at org.example.Main.lambda$testDeadLock$3(Main.java:79) at org.example.Main$$Lambda$1/1221555852.run(Unknown Source) at java.lang.Thread.run(Thread.java:750) 2024-08-07 14:11:28 [ pool-1-thread-1:2001 ] - [ WARN ] Deadlocks detected: Thread Name: Thread-1 Thread State: WAITING Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf Lock Owner Name: Thread-0 Lock Owner Id: 11 Waited Time: -1 Blocked Time: -1 Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf Stack Trace: at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22) at org.example.Main.lambda$testDeadLock$4(Main.java:93) at org.example.Main$$Lambda$2/1556956098.run(Unknown Source) at java.lang.Thread.run(Thread.java:750) ``` ``` @WarmUp(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) @measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS) @threads(1) Benchmark Mode Cnt Score Error Units LockBenchmark.testMonitoredLock thrpt 2 15889.407 ops/ms LockBenchmark.testMonitoredLock:·gc.alloc.rate thrpt 2 678.061 MB/sec LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm thrpt 2 56.000 B/op LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space thrpt 2 668.249 MB/sec LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm thrpt 2 55.080 B/op LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space thrpt 2 0.075 MB/sec LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm thrpt 2 0.006 B/op LockBenchmark.testMonitoredLock:·gc.count thrpt 2 20.000 counts LockBenchmark.testMonitoredLock:·gc.time thrpt 2 6.000 ms LockBenchmark.testNativeLock thrpt 2 103130.635 ops/ms LockBenchmark.testNativeLock:·gc.alloc.rate thrpt 2 ≈ 10⁻⁴ MB/sec LockBenchmark.testNativeLock:·gc.alloc.rate.norm thrpt 2 ≈ 10⁻⁶ B/op LockBenchmark.testNativeLock:·gc.count thrpt 2 ≈ 0 counts @WarmUp(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS) @measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS) @threads(100) Benchmark Mode Cnt Score Error Units LockBenchmark.testMonitoredLock thrpt 2 10994.606 ops/ms LockBenchmark.testMonitoredLock:·gc.alloc.rate thrpt 2 488.508 MB/sec LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm thrpt 2 56.002 B/op LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space thrpt 2 481.390 MB/sec LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm thrpt 2 55.163 B/op LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space thrpt 2 0.020 MB/sec LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm thrpt 2 0.002 B/op LockBenchmark.testMonitoredLock:·gc.count thrpt 2 18.000 counts LockBenchmark.testMonitoredLock:·gc.time thrpt 2 9.000 ms LockBenchmark.testNativeLock thrpt 2 558652.036 ops/ms LockBenchmark.testNativeLock:·gc.alloc.rate thrpt 2 0.016 MB/sec LockBenchmark.testNativeLock:·gc.alloc.rate.norm thrpt 2 ≈ 10⁻⁴ B/op LockBenchmark.testNativeLock:·gc.count thrpt 2 ≈ 0 counts ``` Issue Number: close #xxx <!--Describe your changes.-->
f7e43ed
to
0cc13d5
Compare
This issue proposes the addition of new features to the project, including a deadlock detection tool and monitored lock implementations. These features will help in identifying and debugging potential deadlocks and monitoring lock usage. Features:
A monitored version of Lock that tracks and logs lock acquisition and release times. Functionality:
Overrides lock(), unlock(), tryLock(), and tryLock(long timeout, TimeUnit unit) methods. Logs information about lock acquisition time, release time, and any failure to acquire the lock within the specified timeout. ##### eg
Uses ScheduledExecutorService for periodic deadlock checks. Logs deadlock information including thread names, states, lock info, and stack traces.
ThreadMXBean accesses thread information in the local JVM, which is already in memory, so accessing it is less expensive than fetching data from external resources such as disk or network. Thread state cache: The JVM typically maintains a cache of thread states, reducing the need for real-time calculations or additional data processing. ##### eg
Issue Number: close #xxx
Proposed changes
Issue Number: close #xxx