Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] remove in address2Region while bookie left to get correct rack info #4504

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ethqunzhong
Copy link
Contributor

Motivation

we use RegionAwareEnsemblePlacementPolicy in our pulsar cluster
We encountered some unexpected issues.
(In some situation, eg, Broker and bookie restart concurrently.)

  1. Bookie X join cluster for the first time, encounters a region exception, and address2Region record X's region as default-region.
  2. Bookie X left cluster and is removed from knownBookies, but address2Region retains the information of bookie X.
  3. update Bookie X's rack info, and calling onBookieRackChange will only update address2Region for addresses present in knownBookies; therefore, bookie X's region info is not updated.
  4. Bookie X join cluster again, since address2Region contains the previous default-region information, getRegion will directly use cached data, resulting of an incorrect region.

which may cause traffic skew in ensemble selection, Causing the bookie disk to be filled up quickly.
image

Changes

We should ensure that when a bookie leaves the cluster, we also clean up the corresponding region information for that bookie in address2Region, so that it can update the correct region for the bookie during onBookieRackChange and
handleBookiesThatJoined.
do leftBookies.forEach(address2Region::remove) in handleBookiesThatLeft

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Great work with the test and the detailed comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants