fix: Improve cluster connection pool logic when disconnecting #1864
Open
martinslota wants to merge 21 commits into redis:main from martinslota:martinslota/clean-up-node-listeners-upon-disconnect
… to the connection pool instance
I have now created a separate repository that (hopefully) makes it easy to reproduce the bug. We have been using the fix in this branch in production for roughly the last 3 months, and it has considerably reduced the error rates we are seeing when shutting down Bull queue clients.
This reverts commit 2979176.
…e to connect using the Cluster client
I just pushed the fixes identified in valkey-io/iovalkey#5.
Motivation and Background

This is an attempt to fix errors occurring when a `connect()` call is made shortly after a `disconnect()`, which is something that the Bull library does when pausing a queue.

Here's a relatively minimal way to reproduce an error:

Running that script in a loop using
against the `main` branch of `ioredis` quickly results in this output:

My debugging led me to believe that the existing node cleanup logic in the `ConnectionPool` class leads to race conditions: upon `disconnect()`, the `this.connectionPool.reset()` call will remove nodes from the pool without cleaning up the event listener, which may then subsequently issue more than one `drain` event. Depending on timing, one of the extra `drain` events may fire after `connect()` and change the status to `close`, interfering with the connection attempt and leading to the error above.

Changes

- … `ConnectionPool` class and remove them from the nodes whenever they are removed from the pool.
- … `-node`/`drain` regardless of whether nodes disconnected or were removed through a `reset()` call.
- … `reset()`, add nodes before removing old ones to avoid unwanted `drain` events.
- … `this` point to the connection pool instance.
- … `main` is seemingly different from the error shown above but it still seems related to the disconnection logic and still gets fixed by the changes in this PR.
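
To make the race easier to picture, here is a small self-contained simulation of the listener cleanup and the `reset()` ordering described above. It is only a sketch under stated assumptions: `TinyEmitter`, `SketchPool`, `countDrains` and `drainsOnSwap` are hypothetical names invented for illustration, `TinyEmitter` stands in for Node's `EventEmitter`, and none of this is the actual ioredis `ConnectionPool` code.

```typescript
type Listener = () => void;

// Hypothetical, dependency-free stand-in for Node's EventEmitter.
class TinyEmitter {
  private handlers: Record<string, Listener[]> = {};

  on(event: string, listener: Listener): void {
    (this.handlers[event] = this.handlers[event] || []).push(listener);
  }

  off(event: string, listener: Listener): void {
    this.handlers[event] = (this.handlers[event] || []).filter((l) => l !== listener);
  }

  emit(event: string): void {
    // Iterate over a copy so listeners can be detached while emitting.
    (this.handlers[event] || []).slice().forEach((l) => l());
  }
}

interface Entry {
  key: string;
  node: TinyEmitter;
  onEnd: Listener;
}

// Hypothetical pool; cleanUp=false mimics the racy behaviour described above,
// cleanUp=true mimics the listener tracking proposed in this PR.
class SketchPool extends TinyEmitter {
  private entries: Entry[] = [];

  constructor(private readonly cleanUp: boolean) {
    super();
  }

  sockets(): TinyEmitter[] {
    return this.entries.map((e) => e.node);
  }

  reset(keys: string[]): void {
    // Add new nodes first so the pool is never transiently empty.
    keys.forEach((key) => this.add(key));
    this.entries
      .filter((e) => keys.indexOf(e.key) === -1)
      .forEach((e) => this.remove(e.key, true));
  }

  private add(key: string): void {
    if (this.entries.some((e) => e.key === key)) return;
    const node = new TinyEmitter();
    const onEnd = () => this.remove(key, false);
    node.on("end", onEnd); // remember the listener so it can be detached later
    this.entries.push({ key, node, onEnd });
  }

  private remove(key: string, viaReset: boolean): void {
    const entry: Entry | undefined = this.entries.filter((e) => e.key === key)[0];
    this.entries = this.entries.filter((e) => e.key !== key);
    if (this.cleanUp) {
      if (!entry) return; // already removed: nothing to report
      entry.node.off("end", entry.onEnd); // detach before forgetting the node
      if (this.entries.length === 0) this.emit("drain");
    } else if (!viaReset && this.entries.length === 0) {
      // Stale listeners survive reset(), so every late "end" reports a drain.
      this.emit("drain");
    }
  }
}

// Counts "drain" events for: fill the pool, reset it to empty, then let the
// old sockets emit "end" afterwards.
function countDrains(cleanUp: boolean): number {
  const pool = new SketchPool(cleanUp);
  let drains = 0;
  pool.on("drain", () => { drains += 1; });
  pool.reset(["127.0.0.1:7000", "127.0.0.1:7001"]);
  const sockets = pool.sockets();
  pool.reset([]);
  sockets.forEach((socket) => socket.emit("end"));
  return drains;
}

// Counts "drain" events when one node is swapped for another in a reset.
function drainsOnSwap(): number {
  const pool = new SketchPool(true);
  let drains = 0;
  pool.on("drain", () => { drains += 1; });
  pool.reset(["127.0.0.1:7000"]);
  pool.reset(["127.0.0.1:7001"]);
  return drains;
}
```

In this model, `countDrains(false)` yields two `drain` events because both stale `end` listeners fire after the pool is already empty, while `countDrains(true)` yields exactly one. `drainsOnSwap()` yields none, since the new node is added before the old one is removed.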