Most threads of an Elasticsearch process hang when using the watch command #2938

Open
1 task
cfangpp opened this issue Nov 7, 2024 · 2 comments


cfangpp commented Nov 7, 2024

  • I have searched the issues and found no duplicate.

Environment

  • Version of arthas-boot.jar or as.sh: xxx
  • Arthas version: 3.5.4
  • Operating system version: CentOS
  • JVM version of the target process:
    java version "1.8.0_201"
    Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
    Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
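  • Version used to run arthas-boot: xxx

Steps to reproduce

  1. xxx
  2. xxx
  3. xxx

Expected result

Why?

Actual result

The actual result, ideally with detailed logs and exception stack traces. Paste text where possible.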

"[search][T#4]" #421 daemon prio=5 os_prio=0 tid=0x00007fe2b0101000 nid=0x179d waiting for monitor entry [0x00007fe0bd89a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
        at java.security.ProtectionDomain$2$1.get(ProtectionDomain.java:473)
        at sun.security.provider.PolicyFile.implies(PolicyFile.java:1080)
        at sun.security.provider.PolicySpiFile.engineImplies(PolicySpiFile.java:75)
        at java.security.Policy$PolicyDelegate.implies(Policy.java:780)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java:102)
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
        at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)

Why is the thread that holds the lock itself in the BLOCKED state?
"[search][T#20]" #458 daemon prio=5 os_prio=0 tid=0x00007fe30c10f000 nid=0x5167 waiting for monitor entry [0x00007fe263cfc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
        at java.security.ProtectionDomain$2$1.get(ProtectionDomain.java:473)
        at sun.security.provider.PolicyFile.implies(PolicyFile.java:1080)
        at sun.security.provider.PolicySpiFile.engineImplies(PolicySpiFile.java:75)
        at java.security.Policy$PolicyDelegate.implies(Policy.java:780)
        at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java:102)
        at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
        at java.security.AccessController.checkPermission(AccessController.java:884)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
        at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
        at java.lang.Class.getClassLoader(Class.java:683)
        at com.taobao.arthas.core.advisor.SpyImpl.atExit(SpyImpl.java:53)
        at java.arthas.SpyAPI.atExit(SpyAPI.java:64)

	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- locked <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)
	- waiting to lock <0x00007fe52f60ba78> (a java.util.Collections$SynchronizedMap)

cfangpp changed the title from "Using the watch command hung most threads of the JVM process" to "Most threads of the JVM process were hung by the watch command" Nov 8, 2024

cfangpp commented Nov 8, 2024

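The target process entered an infinite loop (in effect, unbounded recursion). The sequence is as follows:

  1. Arthas first intercepts the implies method of the java.security.Policy implementation in the target process. The target process here is ES, and the implementing class is org.elasticsearch.bootstrap.ESPolicy.
  2. Execution enters com.taobao.arthas.core.advisor.SpyImpl and reaches clazz.getClassLoader(), which performs a permission check for java.lang.RuntimePermission "getClassLoader".
  3. That check calls java.security.Policy.implies in the target process, which re-enters the Arthas SpyImpl.
  4. The calls repeat until the stack overflows.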

java.lang.StackOverflowError: null
at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
at java.lang.Class.getClassLoader(Class.java:683)
at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)
at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java)
at java.security.ProtectionDomain.implies(ProtectionDomain.java:279)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:450)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.ClassLoader.checkClassLoaderPermission(ClassLoader.java:1528)
at java.lang.Class.getClassLoader(Class.java:683)
at com.taobao.arthas.core.advisor.SpyImpl.atEnter(SpyImpl.java:28)
at java.arthas.SpyAPI.atEnter(SpyAPI.java:59)
at org.elasticsearch.bootstrap.ESPolicy.implies(ESPolicy.java)
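
The cycle described in the steps above can be reproduced in miniature without Arthas or a SecurityManager. The following is only a sketch under stated assumptions: ReentrancySketch, implies, atEnter and IN_HOOK are hypothetical stand-ins for ESPolicy.implies and SpyImpl.atEnter, not Arthas code. It also shows one generic way to break such a cycle, a per-thread reentrancy flag, which is not the approach taken in this issue and is not claimed to be how Arthas handles it.

```java
final class ReentrancySketch {
    // Per-thread flag: true while the hook is already running on this thread.
    static final ThreadLocal<Boolean> IN_HOOK = ThreadLocal.withInitial(() -> false);
    static int hookRuns = 0;

    // Stand-in for the instrumented ESPolicy.implies: the hook runs on entry.
    static boolean implies(boolean guarded) {
        atEnter(guarded);
        return true;
    }

    // Stand-in for SpyImpl.atEnter: its own work triggers a permission check,
    // and that check calls implies() again.
    static void atEnter(boolean guarded) {
        if (guarded) {
            if (IN_HOOK.get()) {
                return; // re-entered from our own permission check: skip
            }
            IN_HOOK.set(true);
        }
        try {
            hookRuns++;
            // Models getClassLoader() -> checkPermission -> Policy.implies
            implies(guarded);
        } finally {
            if (guarded) {
                IN_HOOK.set(false);
            }
        }
    }

    // Runs the unguarded variant and reports whether the stack overflowed.
    static boolean unguardedOverflows() {
        try {
            implies(false);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("guarded returns: " + implies(true) + ", hook runs: " + hookRuns);
        System.out.println("unguarded overflows: " + unguardedOverflows());
    }
}
```

With the flag enabled, the hook body runs once per outer call and the nested call returns immediately. Without it, the two methods call each other until the stack is exhausted, which matches the repeating frames in the trace above.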

cfangpp changed the title from "Most threads of the JVM process were hung by the watch command" to "Most threads of an Elasticsearch process hang when using the watch command" Nov 8, 2024

cfangpp commented Nov 8, 2024

Workaround: customize the checkPermission method of SecureSM so that it skips the check.

@Override
public void checkPermission(Permission perm) {
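    // Arthas only: skip the "getClassLoader" check when the call originates
    // from the Arthas spy, so the check does not re-enter Policy.implies.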
    if (perm instanceof RuntimePermission && "getClassLoader".equals(perm.getName())) {
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            if ("java.arthas.SpyAPI".equals(element.getClassName())) {
                return;
            }
        }
    }
    super.checkPermission(perm);
}
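
The workaround relies on scanning the current thread's stack for a class name. A minimal, self-contained sketch of that technique follows; StackCheckSketch, calledFrom and FakeSpy are hypothetical names used only for illustration, with FakeSpy standing in for java.arthas.SpyAPI.

```java
final class StackCheckSketch {
    // Returns true if any frame on the current thread's stack belongs to className.
    static boolean calledFrom(String className) {
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            if (className.equals(element.getClassName())) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical caller that plays the role of java.arthas.SpyAPI.
    static final class FakeSpy {
        static boolean probe() {
            // FakeSpy.probe is on the stack when calledFrom runs.
            return calledFrom(FakeSpy.class.getName());
        }
    }

    public static void main(String[] args) {
        System.out.println("via FakeSpy: " + FakeSpy.probe());
        System.out.println("direct: " + calledFrom(FakeSpy.class.getName()));
    }
}
```

Capturing a stack trace is comparatively expensive, so the workaround above only does it after the cheaper test that the permission is RuntimePermission "getClassLoader".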
