-
Notifications
You must be signed in to change notification settings - Fork 763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2170: Add the manifests overlay for Kubeflow Training V2 #2382
base: master
Are you sure you want to change the base?
Conversation
Will review it later. /cc @kubeflow/wg-training-leads @kubeflow/release-team @saileshd1402 |
@Electronic-Waste: GitHub didn't allow me to request PR reviews from the following users: kubeflow/release-team, saileshd1402. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this @Doris-xm! I left some initial comments for you.
Btw, I recommend that you could learn more about the concept of Training V2. This will help you update the manifests overlay in training-operator and kubeflow/manifests:)
FYI: KubeCon 2024 NA Talk by Andrey and Yuki
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM. Thank you for your great contributions!
/lgtm
/assign @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this great contribution @willb!
/assign @kubeflow/wg-manifests-leads @kubeflow/release-team
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your great contribution and your patience!
I think, we should also add namespace configuration here:
to install these components in kubeflow-system
namespace.
Signed-off-by: Xinmin Du <[email protected]> Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: Xinmin Du <[email protected]> Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: Xinmin Du <[email protected]> Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: Xinmin Du <[email protected]> Signed-off-by: Xinmin Du <[email protected]>
…ace: kubeflow-system Signed-off-by: Xinmin Du <[email protected]> Signed-off-by: Xinmin Du <[email protected]>
9518f7b
to
1f1b0c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for updating this:)
/lgtm
/assign @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads @saileshd1402
Pull Request Test Coverage Report for Build 12742381626Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
# Conflicts: # manifests/v2/base/manager/kustomization.yaml # manifests/v2/base/rbac/kustomization.yaml # manifests/v2/base/webhook/kustomization.yaml # manifests/v2/overlays/only-manager/kustomization.yaml # manifests/v2/overlays/standalone/kustomization.yaml
Signed-off-by: Xinmin Du <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Electronic-Waste The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM. PTAL if you have time @kubeflow/wg-training-leads @kubeflow/wg-manifests-leads @astefanutti
Btw, can you mark some addressed comments as resolved
@Doris-xm ? Some of them are outdated.
/lgtm
I think when you create the PR against kubeflow/manifests including the integrations tests to see how it behaves with the platform components and authorization, security etc. we can provide more input. |
manifests/overlays/kubeflow-platform/kubeflow-trainer-roles.yaml
Outdated
Show resolved
Hide resolved
Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: Xinmin Du <[email protected]>
Signed-off-by: Xinmin Du <[email protected]>
Thanks for your detailed reviews. Appreciate for your patience and guidance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review @Doris-xm. Basically LGTM. Thanks!
I think when you create the PR against kubeflow/manifests including the integrations tests to see how it behaves with the platform components and authorization, security etc. we can provide more input.
It will be better if you could adopt the suggestions from @juliusvonkohout and create another PR againset kubeflow/manifests
: kubeflow/manifests#2948
/cc @kubeflow/wg-training-leads @astefanutti @kubeflow/wg-manifests-leads
apiVersion: kustomize.config.k8s.io/v1beta1 | ||
kind: Kustomization | ||
namespace: kubeflow | ||
sortOptions: | ||
order: fifo | ||
resources: | ||
- ../../base/crds | ||
- ../../base/manager | ||
- ../../base/rbac | ||
- ../../base/runtimes/pretraining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does FIFO work fine to install all these components? Could you please examine it and provide the screenshot here? @Doris-xm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion. I will create a PR in kubeflow/manifests
to test it under CI/CD soon.
In my experiment, put runtimes
installation before the webhooks
works well:

The installation is shown above.
The trainer-controller-manager pod started successfully soon. I took this screenshot before it was ready.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! That sounds good to me. I have no other comment. Do you have any other ideas? @kubeflow/wg-training-leads @astefanutti
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing is that, as a required dependency, logically JobSet should be deployed first, so maybe move it in first position of the list, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing is that, as a required dependency, logically JobSet should be deployed first, so maybe move it in first position of the list, WDYT?
SGTM
@Electronic-Waste: GitHub didn't allow me to request PR reviews from the following users: kubeflow/wg-manifests-leads. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Xinmin Du <[email protected]>
@@ -4,12 +4,12 @@ namespace: kubeflow | |||
sortOptions: | |||
order: fifo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do the same for the standalone installation ?
So we don't need to install manager + runtimes in the separate overlays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Could you please create a separate issue to address it? @Doris-xm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion. I start to work on it.
Thanks for this @Doris-xm! I believe, we can move forward with this PR. In the future, we can raise another PR to integrate trainer into the kubeflow platform: kubeflow/manifests#2948 /lgtm |
What this PR does / why we need it:
This PR adds the manifests overlay for Kubeflow Training V2, allowing to install it within Kubeflow Platform.
Which issue(s) this PR fixes :
Fixes #2381
Checklist: