Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: pkg/upgrade: abstract resource deletion with Action #957

Draft
wants to merge 9 commits into
base: incubation
Choose a base branch
from

Conversation

ykaliuta
Copy link
Contributor

@ykaliuta ykaliuta commented Apr 8, 2024

Add abstraction to perfrom action on a set of resources defined as

type ResourceSpec struct {
        Gvk       schema.GroupVersionKind
        Namespace string
        // path to the field, like "metadata", "name"
        Path []string
        // set of values for the field to match object, any one matches
        Values []string
}

Provide high level wrappers and lowlevel builder interface.
It uses Action directly since there is no any validation or
generation logic at the moment.

Generated Action does not contain resource list and can be applied
to a set of resources with Exec() or ExecWithRetry() methods.

Provide also predicates to cover matching for current usecases.

User can define its own actions, matchers and retry checkers with
signatures:

type MatcherFunc func(r ResourceSpec, obj *unstructured.Unstructured) (bool, error)
type ActionFunc func(ctx context.Context, c client.Client, r ResourceSpec, obj *unstructured.Unstructured) error
type RetryCheckFunc func(ctx context.Context, c client.Client, resources ...ResourceSpec) (bool, error)

The latter one is taken by ExecWithRetry to check is the job is done.

Examples:

err = action.NewDeleteMatched(c,
                action.Not(
                        action.MatchMapKeyContains(ODHAppPrefix, "spec", "selector", "matchLabels"))).
                Exec(ctx, d...)

err = action.NewDelete(c).Exec(ctx, d...)

With the builder:

err = action.New(c).
        Do(action.Delete).
        ExecWithRetry(ctx,
                action.IfAnyLeft(action.DefaultMatcher), d...)

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Apr 8, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Apr 8, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ykaliuta. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ykaliuta ykaliuta requested a review from bartoszmajsak April 8, 2024 12:07
@ykaliuta ykaliuta force-pushed the refactor-delete branch 3 times, most recently from d0bf97e to 0f73430 Compare April 8, 2024 17:36
@ykaliuta ykaliuta force-pushed the refactor-delete branch 2 times, most recently from c4557c1 to 4cf9c43 Compare April 8, 2024 19:49
@ykaliuta ykaliuta requested a review from zdtsw April 8, 2024 19:54
// path to the field, like "metadata", "name"
Path []string
// set of values for the field to match object, any one matches
Values []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder why this is set to array but not just string.
same as below range job.Values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current code uses it that way:

deprecatedClusterroles := []string{"rhods-namespace-read", "rhods-prometheus-operator"}

But it's possible to have two entries with just a string. Whatever is better.

Gvk schema.GroupVersionKind
Namespace string
// path to the field, like "metadata", "name"
Path []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand in unstructured.NestedString, we can pass down different field not just metadata.name , is the plan that JobSpec will be extend for other puropose more than only delete resource by name.

Copy link
Contributor Author

@ykaliuta ykaliuta Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually the main point of the proposal. https://issues.redhat.com/browse/RHOAIENG-4764 wants removing Anything with appName: watson-studio ( I used it in another draft #959 on top of old version of this, patch 5bb54af)

@ykaliuta
Copy link
Contributor Author

ykaliuta commented Apr 9, 2024

I was thinking a bit more about #959. If we agree about general idea, it probably makes sense to change it such a way that it just adds this new functionality for dashboard resources without touching any existing code. And then we can continue discussion (if it's not just rejected straight away) here. Just before the release it sounds less risky.

@zdtsw
Copy link
Member

zdtsw commented Apr 9, 2024

I was thinking a bit more about #959. If we agree about general idea, it probably makes sense to change it such a way that it just adds this new functionality for dashboard resources without touching any existing code. And then we can continue discussion (if it's not just rejected straight away) here. Just before the release it sounds less risky.

i agree.
migrate the old one can be wait after code freeze.
only need dashboard part to be in

Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see that we are unifying upgrades. I shared some suggestions for the solution adjusting the current approach.

Generally speaking, I think the "upgrade/migration" approach should be discussed in a bit more detail. There are a few things which come to mind (though some might be speculative):

  • should we introduce some convention on how/where to define these steps? so each component owner can follow it easily?
  • should we add some metadata to it? e.g. version from which the migration should be applied.
  • should the list of migration steps be calculated automatically or we implement each step in a fail-safe manner?
  • how long should we keep the code around?

Comment on lines 582 to 588
didWork, err := deleteResourcesWithDone(ctx, cli, &del)
if err != nil || !didWork {
return !didWork, err
}

return true, multiErr.ErrorOrNil()
didWork, err = deleteResourcesWithDone(ctx, cli, &del)
return !didWork, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why doing it twice instead of delegating the decision of retry to the caller?

// deleteResourcesWithDone takes array of Jobs and deletes all matching resources. Returns true if any was deleted or error.
func deleteResourcesWithDone(ctx context.Context, c client.Client, jobs *[]JobSpec) (bool, error) {
var errors *multierror.Error
didWork := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a native english speaker, so I find didWork a bit hard to understand. Is !didWork a singal that it in fact did not work or it was not done? If the latter, which I guess is the case, how about renaming didWork to done instead? But then it contradicts the statement Returns true if any was deleted or error.

I wonder - what do we want to signal with this flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just explain my logic behind that, it's definitely place to improve and I agree with your concerns.

So, the original delete*AndCheck checked existence of the resource right after deletion without starting next waiting cycle in ExponentialBackoffWithContext and at the point I did not want to introduce extra delay. And I did not want to add extra check for all the cases to the generic function. So generic function does not know if the deletion actually done or not, but it can signal that it performed some actions (vs there are no objects found for action). That is why the second call if the first one really finished all the work will signal that it did nothing, means for the ExponentialBackoffWithContext that the job is done.

I think in this place it is not really critical but it changes in theory original logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what this extra check gives us. We can essentially say we are done when there's nothing to delete, if there's something we perform delete and retry, as some resources can have finalizers. Not really convinced if that extra immediate .Get gives us any benefit.

Maybe an opportunity to simplify here?

w, err := deleteOneJob(ctx, c, job)
errors = multierror.Append(errors, err)
if w {
didWork = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the significance of the information that at least one JobSpec was finished successfully? Kind of repeating my previous comment :)

Gvk: gvkOdhApplication,
Namespace: dscApplicationsNamespace,
Path: pathName,
Values: []string{"jupiterhub"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we have any tests for those... so just in case...

Suggested change
Values: []string{"jupiterhub"},
Values: []string{"jupyterhub"},

@@ -257,175 +283,213 @@ func UpdateFromLegacyVersion(cli client.Client, platform deploy.Platform, appNS
// TODO: remove function once we have a generic solution across all components.
func CleanupExistingResource(ctx context.Context, cli client.Client, platform deploy.Platform, dscApplicationsNamespace, dscMonitoringNamespace string) error {
var multiErr *multierror.Error
var toDelete []JobSpec
pathName := []string{"metadata", "name"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps better to name it as metadataNamePath?

// set of values for the field to match object, any one matches
Values []string
// hook to call before action (delete)
Fixup func(ctx context.Context, c client.Client, obj *unstructured.Unstructured) error
Copy link
Contributor

@bartoszmajsak bartoszmajsak Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could name it PreHook, Before or something similar hinting the invocation phase. Fixup reminds me of rebase and git (and maybe linking, but that was long time ago). Not sure if just by reading Fixup everyone will understand the purpose.

@@ -43,6 +40,35 @@ import (
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/metadata/labels"
)

type JobSpec struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding on the comment from related PR (#959 (comment)), there are a few things I would consider:

JobSpec to handle multiple resources

JobSpec could have a slice of ResourceSpec and Action func to perform on them. Such a function could be a Delete, DeleteWithRetry (wrapping the original Delete for example). The signature could be as simple as

func Action(ctx context.Context, c client.Client, resourceSpecs ...ResourceSpec) error

Then we can have a receiver method which will be invoking it.

Or maybe if we ponder a bit more about naming... Job implies things like schedule, retry etc, this is just Action we want to perform on a given (set) of resources. So maybe Action as a struct name and ActionFunc for the behaviour we want to attach to it?

Simplified creation

We can think of creating constructor functions to wrap this new struct, e.g.

  • NewDeleteAction(ctx context.Context, c client.Client, resourceSpecs ...ResourceSpec)
  • NewDeleteWithRetryAction(ctx context.Context, c client.Client, resourceSpecs ...ResourceSpec)
  • NewRemoveOwnerReferenceAction(ctx context.Context, c client.Client, resourceSpecs ...ResourceSpec)

I am on the verge if we should keep the client as part of the struct or just expect it to be passed when invoking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds nice! I'll cook something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszmajsak I cooked a draft, will the approach work? I did not make the wrappers yet.

Copy link
Contributor Author

@ykaliuta ykaliuta Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the wrappers :)
PS: totally untested, just RFC yet.

Copy link
Contributor Author

@ykaliuta ykaliuta May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the wrappers :) PS: totally untested, just RFC yet.

And I have a feeling, that it's better to have something like "ResourceList" which takes care of the filtering and produce the actual list of resources to apply the action to based on the resource specs.

@ykaliuta
Copy link
Contributor Author

ykaliuta commented Apr 9, 2024

Totally agree, I had similar thoughts but they have not yet crystallised in my mind. But looks like we are on the way.

Add abstraction to perfrom action on a set of resources defined as

type ResourceSpec struct {
	Gvk       schema.GroupVersionKind
	Namespace string
	// path to the field, like "metadata", "name"
	Path []string
	// set of values for the field to match object, any one matches
	Values []string
}

Provide high level wrappers and lowlevel builder interface.
It uses Action directly since there is no any validation or
generation logic at the moment.

Generated Action does not contain resource list and can be applied
to a set of resources with Exec() or ExecWithRetry() methods.

Provide also predicates to cover matching for current usecases.

User can define its own actions, matchers and retry checkers with
signatures:

type MatcherFunc func(r ResourceSpec, obj *unstructured.Unstructured) (bool, error)
type ActionFunc func(ctx context.Context, c client.Client, r ResourceSpec, obj *unstructured.Unstructured) error
type RetryCheckFunc func(ctx context.Context, c client.Client, resources ...ResourceSpec) (bool, error)

The latter one is taken by ExecWithRetry to check is the job is done.

Examples:

err = action.NewDeleteMatched(c,
		action.Not(
			action.MatchMapKeyContains(ODHAppPrefix, "spec", "selector", "matchLabels"))).
		Exec(ctx, d...)

err = action.NewDelete(c).Exec(ctx, d...)

With the builder:

err = action.New(c).
	Do(action.Delete).
        ExecWithRetry(ctx,
                action.IfAnyLeft(action.DefaultMatcher), d...)

Signed-off-by: Yauheni Kaliuta <[email protected]>
@ykaliuta ykaliuta changed the title WIP: pkg/upgrade: abstract resource deletion with Unstructured WIP: pkg/upgrade: abstract resource deletion with Action Apr 26, 2024
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@biswassri
Copy link
Contributor

@ykaliuta I think this is something that is covered by the recent refactor?

@zdtsw
Copy link
Member

zdtsw commented Jan 2, 2025

@ykaliuta I think this is something that is covered by the recent refactor?

probably can have it closed, as target is for the "incubation" and still in Draft

@ykaliuta
Copy link
Contributor Author

ykaliuta commented Jan 8, 2025

@ykaliuta I think this is something that is covered by the recent refactor?

The intention is to provide declarative framework to perform operations on groups of k8s objects. It was triggered by upgrade case with massive deletion with massive code duplication. The original usecases probably covered by garbage collection. Some other applications removed from the code already. I'll talk to @lburgazzoli if something like that still can be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

5 participants