Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for loom #230

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Support for loom #230

wants to merge 7 commits into from

Conversation

zenghu1chen
Copy link

My project uses jedis component with loom, but there are cases of pinning and even deadlocks when redis times out. By replacing synchronized with ReentrantLock in the code, the above situations will be avoided. Below is the stacktrace when pinning occurs.

Thread[#253,ForkJoinPool-1-worker-2,5,CarrierThreads]
    java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:180)
    java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:398)
    java.base/jdk.internal.vm.Continuation.yield0(Continuation.java:390)
    java.base/jdk.internal.vm.Continuation.yield(Continuation.java:357)
    java.base/java.lang.VirtualThread.yieldContinuation(VirtualThread.java:370)
    java.base/java.lang.VirtualThread.parkNanos(VirtualThread.java:532)
    java.base/java.lang.System$2.parkVirtualThread(System.java:2615)
    java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:67)
    java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:408)
    java.base/sun.nio.ch.Poller.poll2(Poller.java:137)
    java.base/sun.nio.ch.Poller.poll(Poller.java:102)
    java.base/sun.nio.ch.Poller.poll(Poller.java:87)
    java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:175)
    java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:275)
    java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:299)
    java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:340)
    java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:789)
    java.base/java.net.Socket$SocketInputStream.read(Socket.java:1025)
    java.base/java.io.InputStream.read(InputStream.java:217)
    redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:195)
    redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
    redis.clients.jedis.Protocol.process(Protocol.java:141)
    redis.clients.jedis.Protocol.read(Protocol.java:205)
    redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:306)
    redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:200)
    redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:163)
    credis.java.client.pool.impl.CRedisJedisFactory.destroyObject(CRedisJedisFactory.java:57)
    org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:886)
    org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:634) <== monitors:1
    redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:101)
    redis.clients.util.Pool.returnBrokenResource(Pool.java:80)
    redis.clients.jedis.Jedis.close(Jedis.java:3359)
    ...

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@312fb14). Click here to learn what that means.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master     #230   +/-   ##
=========================================
  Coverage          ?   83.19%           
  Complexity        ?      787           
=========================================
  Files             ?       42           
  Lines             ?     3076           
  Branches          ?      312           
=========================================
  Hits              ?     2559           
  Misses            ?      410           
  Partials          ?      107           
Impacted Files Coverage Δ
...in/java/org/apache/commons/pool2/PooledObject.java 47.05% <100.00%> (ø)
...ache/commons/pool2/impl/BaseGenericObjectPool.java 88.36% <100.00%> (ø)
...che/commons/pool2/impl/GenericKeyedObjectPool.java 85.14% <100.00%> (ø)
...g/apache/commons/pool2/impl/GenericObjectPool.java 86.47% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@garydgregory
Copy link
Member

@zenghu1chen
Thank you for the PR.
Please see the build failure and update the PR.
There are no unit tests here, I'd really like to see a test that can demonstrate a dead lock in order to prove that the changes actually fix an issue rather than relying on some external empirical evidence. The test need not use Redis for example but be built in whatever way effectively causes a test to fail or hang.
CC: @psteitz

@@ -33,6 +34,8 @@
*/
public interface PooledObject<T> extends Comparable<PooledObject<T>> {

ReentrantLock lock = new ReentrantLock();
Copy link
Member

@garydgregory garydgregory Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A global variable? That looks like an anti-pattern. -1 unless you can explain what I am misunderstanding.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock is to replace synchronized(PooledObject) with ReentrantLock in the GenericObjectPool,BaseGenericObjectPool and GenericKeyedObjectPool class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @zenghu1chen
A mutable global variable in an interface is an anti-pattern IMO, you've not addressed this issue in your comment.

@zenghu1chen
Copy link
Author

@zenghu1chen Thank you for the PR. Please see the build failure and update the PR. There are no unit tests here, I'd really like to see a test that can demonstrate a dead lock in order to prove that the changes actually fix an issue rather than relying on some external empirical evidence. The test need not use Redis for example but be built in whatever way effectively causes a test to fail or hang. CC: @psteitz

I'm trying to write test cases to reproduce the deadlock situation. Please wait for a while, thank you.

@psteitz
Copy link
Contributor

psteitz commented Jul 17, 2023

To help with tracing, could you provide the versions of both Jedis and Commons Pool that you are using?

@zenghu1chen
Copy link
Author

To help with tracing, could you provide the versions of both Jedis and Commons Pool that you are using?

Hello @psteitz
The version of Jedis I'm using is 2.8.1.3, the version of Commons Pool is 2.4.2, and the Java version is 19.0.2.
My application was running stably for two months, but one day Redis had a failure, which caused timeouts or other exceptions (I'm not sure about the details of the failure). After that, all requests were hung up and did not recover for several hours, and it was not until after restarting the application instance that it resumed normal operation.

@garydgregory
Copy link
Member

@zenghu1chen
I think the first step would be to test your use case with the current version of Pool, 2.11.1, not 2.4.2. Also consider testing with git branch POOL_2_X. 2.4.2 is very old!

@zenghu1chen
Copy link
Author

@zenghu1chen I think the first step would be to test your use case with the current version of Pool, 2.11.1, not 2.4.2. Also consider testing with git branch POOL_2_X. 2.4.2 is very old!

@garydgregory
I tried to test Jedis to reproduce deadlock situations, but I failed because I was not clear about what kind of Redis failure would lead to deadlock.
However, since I observed the occurrence of "pinning" in the GenericObjectPool class, I started testing this class. Currently, there is an imperfect test case that can reproduce the deadlock situation. The usage is a bit strange and relies on probability to generate deadlock, but it should also demonstrate the possibility of deadlock.
You can find the test case at the following link:
https://github.com/zenghu1chen/loom-compatibility-test/blob/main/src/main/java/org/example/commonspool2/GenericObjectPoolTest.java

@garydgregory
Copy link
Member

Hi @zenghu1chen
Thank for your update, I might be able to look this weekend.
Do note that I am still -1 on this PR due to the use of a global variable.

@psteitz
Copy link
Contributor

psteitz commented Jul 20, 2023

Many thanks for the test and analysis. I will look at this in the next couple of days too. The test looks interesting as it is doing something that is not expected - invalidating an object after returning it to the pool. Generally, once you return an object to the pool, you should not reference it subsequently as it may have been handed out to another client. To check and invalidate idle instances in the pool, its best to enable testWhileIdle, enable the evictor and let the pool handle it. But there should be protection at the PooledObject level to prevent a deadklock here, so worth investigating. Is the Jedis code doing this? It would be good to open a JIRA for this issue and we should probably separate the deadlock issue (if it turns out there is one) from the use of intrinsic locks. A catchall issue to replace all intrinsic locks might make sense at this point. I agree with Gary on the smelliness of putting the lock in the PooledObject interface. An alternative might be to expose a lock() method and implement it in DefaultPooledObject using a ReentrantLock. Unfortunately, without a default implementation that would be an incompatible change, so not possible in 2.x. Maybe someone else has a clever idea on how to provide a default that would work.

@psteitz
Copy link
Contributor

psteitz commented Jul 23, 2023

I looked at the test case. I would expect to see the test threads getting a lot of IllegalStateExceptions due to what amounts to a misuse of the API. We should probably clarify that invalidating an object after you have returned it to the pool (or doing anything else with it) violates the pool contract. The borrow - invalidate sequence in the test could be intermediated by a borrow by another thread. That will cause ISE when the other thread returns it if that happens after the invalidate's destroy completes. The invalidates will throw ISE if another thread has destroyed the instance (which the test setup allows). All of this is to say, you can't do what the test is doing. Does the Jedis code do this?

@psteitz
Copy link
Contributor

psteitz commented Jul 24, 2023

One more comment on this. Your Jedis version also looks very old. You might try updating it along with the pool version. this part of the stack trace above looks odd to me:

    credis.java.client.pool.impl.CRedisJedisFactory.destroyObject(CRedisJedisFactory.java:57)
    org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:886)

I downloaded the tag for the version of Jedis that you mentioned and I can't find the sourses for the class above in it (or anywhere else online). What is odd is that the GOP destroy method is private. Whatever the thing above is, it should not be calling that method directly, but instead using invalidateObject. The intrinsic lock on the pooled object makes sense there and I can see the pinning, but I don't see any evidence or cause for deadlock here.

@zenghu1chen
Copy link
Author

@psteitz Thank you for your reply.
This case was assembled by me randomly without being able to confirm the cause of the fault. There were indeed many unreasonable calls, but the final result was that the program hung up, and even after 60s timeout, it did not time out as expected, so I think there is a deadlock. I am currently unable to explain the principle behind it, and I have also encountered similar deadlock issues in other components, and there may be some common principles behind them. However, I haven't figured out everything yet. It may take some time for me to think it through.

@psteitz
Copy link
Contributor

psteitz commented Jan 18, 2024

Returning back to this, I think we should eliminate the use of intrinsic locks here. The example is a little strange, but I can imagine other scenarios where pinning happens due to the intrinsic locks on PooledObjects. The code referenced in the PR acquires a lock on a PooledObject'ss monitor. DefaultPooledObject uses syncrhonized methods to guard state changes. We obviously can't add an explicit lock to the PooledObject interface. The simplest way that I can see to solve this is to add lock() and unlock() methods to the PooledObject interface, defaulted to no-ops (with warning labels) and implemented in DefaultPooledObject with a ReentrantLock. Any better ideas?

@zenghu1chen
Copy link
Author

@psteitz Thank you for your suggestion. I have already resubmitted the code.

@wyhasany
Copy link

@psteitz / @zenghu1chen please take a look on my code review

@Fenrur
Copy link

Fenrur commented May 23, 2024

Hello no news for this PR ? I also need this feature, I use apache polls with virtual threads

Copy link
Member

@linghengqian linghengqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants