[Bug] Hash range collision causes out-of-order cases between existing consumers in Key_Shared #23315
Comments
Just guessing, but perhaps the intention of this code is to select a different client for each partition when there are multiple consumers with the same name (Line 127 in 4f96146). |
Slightly related issue about the incorrect results of getConsumerKeyHashRanges: #23321 |
I don't think it will cause an out-of-order issue.
Getting a different consumer for the same hash is expected, since the set of consumers has changed. |
The result of this bug is that, in certain cases, the target consumer will also switch for existing consumers. @codelipenghui Did you consider that case? |
I'm pretty sure that |
@lhotari Oh, I get your point now. One consumer joining will change the key assignment for many other consumers. Thanks for the explanation. |
It looks like #8396 wasn't a correct solution at the time it was made. @codelipenghui I think that we need to address this for all maintenance branches. |
I have created #23327 to fix the issue. Please review |
#23327 is now ready for review after multiple iterations. |
The list is sorted by consumerName. Lines 67 to 73 in 4f96146
The behavior when the keys to be compared are the same seems to be undefined. So, as you say, each partition's selector could be different. |
(Thank you @lhotari !) |
Possibly the same issue in this Slack message: https://www.linen.dev/s/apache-pulsar/t/23079402/hi-we-re-using-key-shared-mode-the-key-is-a-code-between-100 |
Version
Tested with https://github.com/apache/pulsar/tree/4f96146f13b136644a4eb0cf4ec36699e0431929.
Minimal reproduce step
Apply the following patches and run the test.
What did you expect to see?
In auto-split hash mode, we expect a new consumer to take over part of the hash range from existing consumers.
(The dispatcher handles this case with recentlyJoinedConsumers.)
So, the range should not move between existing consumers.
What did you see instead?
When the hash range collides, the selector adds the consumer to the list of colliding consumers:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ConsistentHashingStickyKeyConsumerSelector.java
Lines 67 to 73 in 4f96146
It then picks the consumer from that list with the following calculation:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ConsistentHashingStickyKeyConsumerSelector.java
Line 127 in 4f96146
4 % 3 = 1, so the selector returns the consumer with consumerId 1.
(Add a new consumer with consumerId 3.)
4 % 4 = 0, so the selector now returns the consumer with consumerId 0.
Consumers with consumerId 0 and 1 are both existing consumers. So, the range moves between existing consumers.
The above case leads to out-of-order redelivery.
Shouldn't we care about this?
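The arithmetic above can be reproduced with a minimal, self-contained sketch. This is not the broker code: `CollisionSelectDemo` and `select` are hypothetical names, consumers are represented only by their consumerId, and the sketch assumes the selection step is "hash modulo size of the collision list", as the calculation above suggests.

```java
import java.util.ArrayList;
import java.util.List;

class CollisionSelectDemo {
    // Hypothetical stand-in for the selection step: pick one entry from the
    // list of colliding consumers using hash modulo the list size.
    static int select(List<Integer> collidingConsumerIds, int hash) {
        return collidingConsumerIds.get(hash % collidingConsumerIds.size());
    }

    public static void main(String[] args) {
        // Three existing consumers (consumerId 0, 1, 2) collide on one ring point.
        List<Integer> consumers = new ArrayList<>(List.of(0, 1, 2));
        int hash = 4;

        int before = select(consumers, hash); // 4 % 3 = 1 -> consumerId 1

        // A new consumer (consumerId 3) joins and lands in the same collision list.
        consumers.add(3);

        int after = select(consumers, hash);  // 4 % 4 = 0 -> consumerId 0

        // The key moved from one existing consumer to another existing consumer,
        // not to the newly joined consumer.
        System.out.println("before=" + before + ", after=" + after);
    }
}
```

Under these assumptions, the key with hash 4 moves from consumerId 1 to consumerId 0 even though neither of them joined or left.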
Anything else?
For simplicity, the consumers in this example share the same name. However, this issue is caused not only by consumers with the same name but also by coincidental hash collisions.
(This issue was originally reported by @hrsakai .)