Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "kserve: add grpconly envoy filter (#888)" #931

Merged
merged 1 commit into from
Jul 12, 2024

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented Mar 21, 2024

This reverts commit 48e0b8b (#888).

This was a temporary fix for https://issues.redhat.com/browse/RHOAIENG-165 which does not affect caikit+tgis, tgis or ovms anymore.

testing instructions:

  1. check out manifests: add grpconly label to apply global EnvoyFilter #888 and spin it up
  2. check that the kserve-temporary-fixes feature has been added and that grpc-only EnvoyFilter exists.
  3. Spin this PR up
  4. check that the kserve-temporary-fixes feature has been deleted and that the grpc-only EnvoyFilter has been deleted

@openshift-ci openshift-ci bot requested review from etirelli and ykaliuta March 21, 2024 13:50
@zdtsw zdtsw requested review from bartoszmajsak and israel-hdez and removed request for ykaliuta March 21, 2024 15:30
@israel-hdez
Copy link
Contributor

How / where is the formal fix done?

@bartoszmajsak
Copy link
Contributor

I would also like to address another thing as part of this "revert".

How do we make sure this filter is removed from the already running clusters?

We should have a logic to handle such cases. Before we come up with something more advanced I think we can simply add some logic to the bootstrap process (main.go) and remove FeatureTracker by name appNamespace + "-" + name ("opendatahub-kserve-temporary-fixes"). Though for that we have to expose FT funcs to be usable from main.go.

WDYT @aslakknutsen @zdtsw ?

@zdtsw
Copy link
Member

zdtsw commented Mar 22, 2024

How / where is the formal fix done?

same question: is a fix already merged somewhere in upstream? or how/why need a revert here

@zdtsw
Copy link
Member

zdtsw commented Mar 22, 2024

I would also like to address another thing as part of this "revert".

How do we make sure this filter is removed from the already running clusters?

We should have a logic to handle such cases. Before we come up with something more advanced I think we can simply add some logic to the bootstrap process (main.go) and remove FeatureTracker by name appNamespace + "-" + name ("opendatahub-kserve-temporary-fixes"). Though for that we have to expose FT funcs to be usable from main.go.

WDYT @aslakknutsen @zdtsw ?

will that be suitable to catch all the cases (e.g remove unuse resource, old Featuretracker, temp. workaround) into upgrade.go than in main.go (or have a new file only handle these "cleanup" only)

@dtrifiro
Copy link
Contributor Author

dtrifiro commented Mar 22, 2024

@israel-hdez it was an upstream fix in the python grpcio module. Updating this dependency in text-generation-inference and caikit fixed the issue.

@bartoszmajsak
Copy link
Contributor

I would also like to address another thing as part of this "revert".
How do we make sure this filter is removed from the already running clusters?
We should have a logic to handle such cases. Before we come up with something more advanced I think we can simply add some logic to the bootstrap process (main.go) and remove FeatureTracker by name appNamespace + "-" + name ("opendatahub-kserve-temporary-fixes"). Though for that we have to expose FT funcs to be usable from main.go.
WDYT @aslakknutsen @zdtsw ?

will that be suitable to catch all the cases (e.g remove unuse resource, old Featuretracker, temp. workaround) into upgrade.go than in main.go (or have a new file only handle these "cleanup" only)

I wouldn't say it captures all cases, as some Features can have custom cleanup logic, but for this given one it would suffice. I need to think it through.

@israel-hdez
Copy link
Contributor

israel-hdez commented Mar 22, 2024

@israel-hdez it was an upstream fix in the python grpcio module. Updating this dependency in text-generation-inference and caikit fixed the issue.

@dtrifiro You should, then, get in touch with dashboard team and coordinate this PR to go together with an update to the images, for ODH 2.10 release. I still see older images here: https://github.com/opendatahub-io/odh-dashboard/blob/4b8b9b80b725914635e1d593ccf3f22969a799d3/manifests/modelserving/kustomization.yaml.

Also, please follow-up Bartosz and Wen feedback:

How do we make sure this filter is removed from the already running clusters? We should have a logic to handle such cases. Before we come up with something more advanced I think we can simply [...] remove FeatureTracker.

@dtrifiro
Copy link
Contributor Author

Regarding the manifests: the manifests you linked include grpcio 1.62, which is not affected by RHOAIENG-165, meaning this is out already.

Regarding the resources cleanup: working on a solution, starting from upgrade.go

@israel-hdez
Copy link
Contributor

Regarding the manifests: the manifests you linked include grpcio 1.62, which is not affected by RHOAIENG-165, meaning this is out already.

It calls my attention that the manifests are from March 1st, while the workaround PR #888 was opened March 4th.

Regarding the resources cleanup: working on a solution, starting from upgrade.go

Thanks for following-up.

@dtrifiro dtrifiro marked this pull request as draft March 26, 2024 17:29
@dtrifiro
Copy link
Contributor Author

One corner I have: odh 2.9 has been released, but RHOAI 2.9 has been dropped. Do we really need to do a proper cleanup of the envoyfilter? The temporary fix has never been deployed on RHOAI for sure.

Regarding the cleanup: I have a naive implementation in upgrade.go that deletes the enovyfilter by selecting he kind, instance name and namespace, but I haven't been so far able to do something similar using the FeatureTracker API as discussed.

@zdtsw @bartoszmajsak

@zdtsw
Copy link
Member

zdtsw commented Mar 27, 2024

One corner I have: odh 2.9 has been released, but RHOAI 2.9 has been dropped. Do we really need to do a proper cleanup of the envoyfilter? The temporary fix has never been deployed on RHOAI for sure.

Regarding the cleanup: I have a naive implementation in upgrade.go that deletes the enovyfilter by selecting he kind, instance name and namespace, but I haven't been so far able to do something similar using the FeatureTracker API as discussed.

@zdtsw @bartoszmajsak

we no need to get the first "workaround" out for downstream, aka i think RHOAIENG-165 should be closed instead.

as for the "cleanup" we will still need it for ODH.

@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from 3c5b7ff to b4925f8 Compare March 27, 2024 15:57
@dtrifiro dtrifiro marked this pull request as ready for review March 27, 2024 15:57
@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from b4925f8 to 1eada22 Compare March 27, 2024 16:14
Copy link
Contributor

@israel-hdez israel-hdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outcome seems to be the needed one (notice that I didn't try the code).

I'll let the operator owners to decide if this is good, since I see there is also the CleanupExistingResource which seems to have a similar goal.

var multiErr *multierror.Error
for _, name := range toDelete {
obj := &unstructured.Unstructured{}
obj.SetGroupVersionKind(schema.GroupVersionKind{Kind: "FeatureTracker", Group: "features.opendatahub.io", Version: "v1"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer if you use our own types: https://github.com/opendatahub-io/opendatahub-operator/blob/incubation/apis/features/v1/groupversion_info.go#L30

Something like v1.GroupVersion.WithKind("FeatureTracker") (assuming the right v1 is imported).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated instantiating a FeatureTracker using NewFeatureTracker

}

crd := &apiextv1.CustomResourceDefinition{}
if err := cli.Get(ctx, client.ObjectKey{Name: "featuretrackers.features.opendatahub.io"}, crd); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for sanity check? I see no references...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a sanity check, following the scheme of other similar functions. I'm assuming it could be removed, since we try to get the FeatureTracker by name and ignore NotFound errors (assuming trying to fetch a non-existing crd returns a NotFound too)

}

toDelete := []string{
"opendatahub-kserve-temporary-fixes", // Workaround for RHOAIENG-165
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszmajsak would the opendatahub- prefix change? is there a const for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed so that it uses dscApplicationsNamespace

@dtrifiro dtrifiro marked this pull request as draft March 28, 2024 17:07
@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from 1eada22 to ae5f0e3 Compare March 29, 2024 12:24
@dtrifiro dtrifiro marked this pull request as ready for review March 29, 2024 12:30
@openshift-ci openshift-ci bot requested a review from cam-garrison March 29, 2024 12:30
@zdtsw
Copy link
Member

zdtsw commented May 7, 2024

should we finalize this PR before ODH 2.12.0 code freeze? (10th May)

@dtrifiro
Copy link
Contributor Author

@zdtsw #957 hasn't been merged yet and I haven't had the bandwidth to work on this, I'm afraid it will have to wait some more time

@zdtsw
Copy link
Member

zdtsw commented May 10, 2024

@zdtsw #957 hasn't been merged yet and I haven't had the bandwidth to work on this, I'm afraid it will have to wait some more time

no worries, lets leave it for odh 2.13.0

@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch 2 times, most recently from 7db8f30 to 8d72f58 Compare June 24, 2024 11:36
@dtrifiro dtrifiro marked this pull request as ready for review June 24, 2024 11:36
@openshift-ci openshift-ci bot requested review from Sara4994 and zdtsw June 24, 2024 11:39
@zdtsw zdtsw requested review from israel-hdez and removed request for Sara4994 June 24, 2024 11:43
@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from 8d72f58 to e27524e Compare June 25, 2024 08:38
@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jul 3, 2024

Is there anything holding this back?

@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from e27524e to 4f4d497 Compare July 9, 2024 12:59
Copy link

openshift-ci bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lburgazzoli for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zdtsw
Copy link
Member

zdtsw commented Jul 9, 2024

@israel-hdez can you do a final review and we can have it finalized.

@dtrifiro dtrifiro force-pushed the remove-grpc-temp-fix branch from 4f4d497 to 96783ad Compare July 9, 2024 15:36
@bartoszmajsak
Copy link
Contributor

bartoszmajsak commented Jul 9, 2024

@dtrifiro could you add steps on how to test it in the PR description? I would like to give it a spin.

@dtrifiro
Copy link
Contributor Author

dtrifiro commented Jul 9, 2024

@bartoszmajsak

  1. check out manifests: add grpconly label to apply global EnvoyFilter #888 and spin it up
  2. check that the kserve-temporary-fixes feature has been added and that grpc-only EnvoyFilter exists.
  3. Spin this PR up
  4. check that the kserve-temporary-fixes feature has been deleted and that the grpc-only EnvoyFilter has been deleted

Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change the cleanup actually does not work

This partially reverts commit 48e0b8b and removes the corresponding FeatureTracker.

Related: RHOAIENG-165, opendatahub-io#888
@zdtsw
Copy link
Member

zdtsw commented Jul 12, 2024

LGTM

Copy link
Contributor

@cam-garrison cam-garrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with app namespace change now added to FT name

@zdtsw zdtsw merged commit 562d09c into opendatahub-io:incubation Jul 12, 2024
7 of 8 checks passed
@dtrifiro dtrifiro deleted the remove-grpc-temp-fix branch July 15, 2024 09:09
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Jul 24, 2024
This partially reverts commit 48e0b8b and removes the corresponding FeatureTracker.

Related: RHOAIENG-165, opendatahub-io#888
(cherry picked from commit 562d09c)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Jul 24, 2024
This partially reverts commit 48e0b8b and removes the corresponding FeatureTracker.

Related: RHOAIENG-165, opendatahub-io#888
(cherry picked from commit 562d09c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants