-
Notifications
You must be signed in to change notification settings - Fork 4k
release-26.1: mmaprototype: align replica transfer filtering with documented strategy #159886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-26.1: mmaprototype: align replica transfer filtering with documented strategy #159886
Conversation
Previously, computeCandidatesForRange (now computeCandidatesForReplicaTransfer) computed the mean over all constraint-satisfying stores, then applied post-means exclusions before returning candidates. These exclusions were: - Stores on nodes already housing replicas of the range - The shedding store itself (handled via a separate check in the loop) - If the shedding store's node was CPU overloaded, all other stores on that node The last bullet was ad-hoc and asymmetric: we wouldn't apply the same logic to other overloaded nodes that happened to have replicas. It also wasn't exercised, and the downstream candidate selection logic wouldn't choose these stores anyway (they'd be rejected for being on an overloaded node). This commit makes two changes: 1. Simplified post-means exclusions: The parameter is renamed from storesToExclude to postMeansExclusions to clarify its role. All exclusions are now handled uniformly in this set: - The shedding store (we're moving away from it) - Stores on nodes with other existing replicas The ad-hoc "exclude other stores on shedding node if node is overloaded" logic is removed. Within-node rebalance (to another store on the shedding node) is now permitted at this stage. 2. Pre-means filtering: This is new. Before computing the mean, we now filter out stores that aren't ready to receive replicas based on disposition. As a legacy case, we manually filter on high disk utilization, but there is a TODO to fold this into the disposition as well - this is tracked and will be done separately. This is done by retainReadyReplicaTargetStoresOnly. The shedding store bypasses the disposition check since it already has the replica - its load should be in the mean regardless of its disposition for NEW replicas. When filtering occurs, we recompute the mean over the filtered set. The common case (nothing filtered) still uses the cached mean for efficiency. Other cleanup: - Fixed logging (VEventf instead of mixing V() and Infof) - Added scratch fields to clusterState for the new filtering but also left a TODO to get out of scratch fields entirely. Part of cockroachdb#156776. Epic: CRDB-55052
- Improve documentation and TODOs - Skip leaseholder for disposition checks
|
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Backport 3/3 commits from #158473 on behalf of @tbg.
Bring replica rebalancing in line with the candidate filtering strategy
documented in rebalanceStore. Both lease transfers and replica transfers
now map relatively cleanly to the pre-means/post-means comment
on
rebalanceStore. I also made both lease and replica transfersprincipled about which of the existing replicas they should ignore
the disposition for (existing leaseholder for lease transfers, all
replicas for replica transfers).
Other changes:
Part of #156776.
^-- need to plumb the
mmaprototype.Statusinputs tommaAllocatorstill, both in asim and production code.Epic: CRDB-55052
Release justification: low risk mmaprototype changes only