Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation KEP 257: LeaderExcluded SubGroup #428

Merged
merged 14 commits into from
Mar 19, 2025

Conversation

Edwinhr716
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it

Adds a new type of subgroup to exclude the leader. Can provide exclusive placement guarantees in the workers even if leader has different hardware requirements

Which issue(s) this PR fixes

Fixes #257

Special notes for your reviewer

Does this PR introduce a user-facing change?


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 13, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Edwinhr716

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and kerthcet March 13, 2025 20:50
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 13, 2025
@Edwinhr716 Edwinhr716 changed the title Implementation KEP 257: LeaderExcluded SubGroup [WIP} Implementation KEP 257: LeaderExcluded SubGroup Mar 13, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
@Edwinhr716 Edwinhr716 changed the title [WIP} Implementation KEP 257: LeaderExcluded SubGroup [WIP] Implementation KEP 257: LeaderExcluded SubGroup Mar 13, 2025
@Edwinhr716 Edwinhr716 changed the title [WIP] Implementation KEP 257: LeaderExcluded SubGroup Implementation KEP 257: LeaderExcluded SubGroup Mar 13, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
Copy link
Contributor

@kerthcet kerthcet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the TPU part good to you @ahg-g otherwise, LGTM.

@@ -126,7 +126,8 @@ func (p *PodWebhook) Default(ctx context.Context, obj runtime.Object) error {
SetExclusiveAffinities(pod, groupUniqueKey, epKey, leaderworkerset.GroupUniqueHashLabelKey)
}
_, foundSubGroupSize := pod.Annotations[leaderworkerset.SubGroupSizeAnnotationKey]
if foundSubGroupSize && pod.Labels[leaderworkerset.SubGroupIndexLabelKey] == "" {
_, leaderExcludedSubGroup := pod.Annotations[leaderworkerset.SubGroupPolicyTypeAnnotationKey]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check the value == LeaderWorker additionally, which is benefit once we introduced more policies.

Suggested change
_, leaderExcludedSubGroup := pod.Annotations[leaderworkerset.SubGroupPolicyTypeAnnotationKey]
value, SubGroupTypeExists := pod.Annotations[leaderworkerset.SubGroupPolicyTypeAnnotationKey]

Copy link
Contributor Author

@Edwinhr716 Edwinhr716 Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have an annotation for subGroupType if it is LeaderWorker. Although if we are breaking backwards compatibility with #434 anyway, I think we can reconsider adding a label even if it is LeaderWorker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kerthcet thoughts on adding the label for LeaderWorker?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's all about subgroup.

Copy link
Contributor

@kerthcet kerthcet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nits.

Copy link

netlify bot commented Mar 17, 2025

Deploy Preview for kubernetes-sigs-lws failed. Why did it fail? →

Name Link
🔨 Latest commit 620de0c
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-lws/deploys/67d84d2373b74400081c583d

Copy link

netlify bot commented Mar 17, 2025

Deploy Preview for kubernetes-sigs-lws canceled.

Name Link
🔨 Latest commit f80362b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-lws/deploys/67d9abfb341dbc000803c9ab

@kerthcet
Copy link
Contributor

Seems the test is crashing ...

@Edwinhr716 Edwinhr716 force-pushed the subgroup-leader-only branch from 23eb37d to b491d91 Compare March 18, 2025 17:04
@Edwinhr716
Copy link
Contributor Author

/retest
known flaky test

@kerthcet
Copy link
Contributor

/lgtm
/label tide/merge-method-squash

I think we need to update the https://lws.sigs.k8s.io/docs/reference/labels-annotations-and-environment-variables/ as well, but could be a follow-up.

Thanks for this.

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 19, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit 471be46 into kubernetes-sigs:main Mar 19, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support leader being in its own subgroup
3 participants