-
Notifications
You must be signed in to change notification settings - Fork 41
feat(controller): ConfigMap image mapping override #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
o Add operator config detection in configMapUpdatePredicate o Update feature flags when operator config changes o Trigger reconciliation of all LlamaStackDistributions on operator config change Signed-off-by: Derek Higgins <[email protected]>
69e2550
to
b644848
Compare
Replace the deprecated corev1.Endpoints API with discoveryv1.EndpointSlice in test utility functions to fix linting warnings. Changes: - Add discoveryv1 import to tests/e2e/test_utils.go - Replace corev1.Endpoints with discoveryv1.EndpointSlice in logServiceEndpoints() - Refactor logServiceEndpoints() into smaller helper functions to reduce complexity: - getEndpointSlices() - retrieves EndpointSlices for a service - logEndpointSliceDetails() - logs details of a single EndpointSlice - logEndpoints() - logs endpoint addresses and returns ready/not-ready counts - logPorts() - logs ports from an EndpointSlice This resolves the staticcheck warning: SA1019: corev1.Endpoints is deprecated: This API is deprecated in v1.33+ Signed-off-by: Derek Higgins <[email protected]>
Implements a mechanism for the Llama Stack Operator to read and apply LLS Distribution image updates from a ConfigMap, enabling independent patching for security fixes or bug fixes without requiring a new LLS Operator or RHOAI version. - Add RHODS version detection from ClusterServiceVersion - Add ImageMappingOverrides field to LlamaStackDistributionReconciler - Implement parseImageMappingOverrides() to read image-overrides from ConfigMap - Add RBAC permissions for operators.coreos.com/clusterserviceversions - Support symbolic name mapping (e.g., rhdev-2.25) to specific images - Included unit tests The operator now reads image overrides from the 'image-overrides' key in the operator ConfigMap, supporting YAML format with version-to-image mappings. Overrides take precedence over default distribution images and are refreshed on each reconciler initialization. Closes: RHAIENG-1079 Signed-off-by: Derek Higgins <[email protected]>
b644848
to
677ecac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, great stuff :). Here's my first round of review.
My main points are:
- We must not reference RHODS in this upstream operator. Why and where do you need the RHODS version ? (i can't really see why this is needed). This is really important.
- Probably not related to this PR, nevertheless: The enabling of network policies should not be a 'feature flag' (which is about enabling/disabling feature that has its own configuration), but a 'configuration option' (you can combine both, feature flag and configuration option but you should not 'misuse' a feature flag as configuration of a permanent feature)
### Symbolic Name Format | ||
Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override. | |
Use the format `rh-dev-<major>.<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative could be to allow a nested approach:
rh-dev:
- version: 2.25
image: quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25@sha256:3bc98555
- version: 2.26
....
quickstart:
- version: 1.0
image: ....
This makes the version available for the code without the need of parsing the key. I'm always a fan of having stable keys without additional semantic information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really an overrides
or is a configmap always used to define the version ? Where does the original version come from ?
When it always picked up from the ConfigMap, then it's not really an override. Ideally I would keep that information in a single place, also for the default case.
3. When deploying a LlamaStackDistribution, the operator checks for overrides matching the current RHOAI version | ||
4. If an override exists, it uses the specified image instead of the default distribution image | ||
5. Changes to the ConfigMap automatically trigger reconciliation of all LlamaStackDistribution resources | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention here also, from where the vesion is picked up when the ConfigMap is not present.
kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}' | ||
``` | ||
|
||
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version. | |
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another newer minor version (e.g. 2.26) then the distribution will revert back to the default for that version as there is no entry yet in the ConfigMap. |
|
||
```bash | ||
kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}' | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is confessedly harder, when the version is stored in a map's value, so maybe my former comment on putting the version into a value, not the key, might make these kind of updates harder.
Name of the referent. | ||
This field is effectively required, but due to backwards compatibility is | ||
allowed to be empty. Instances of this type with an empty value here are | ||
almost certainly wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean, the name of the referenced ConfigMap ? Maybe we can add this here instead of just Name of referent.
(and if so, shouldn't it be reference
as the referent is the operator itself ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that the descriptions here are picked up from somewhere else, also not necessarily coming from the operator code. How have those been added ?
default: "" | ||
description: |- | ||
Name of the referent. | ||
This field is effectively required, but due to backwards compatibility is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. Just out of curiosity, why was this name ever allowed to be empty ?
Request is the name chosen for a request in the referenced claim. | ||
If empty, everything from the claim is made available, otherwise | ||
only the result of this request. | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is not clear to me. What is "the result of this request" ?
- apiGroups: | ||
- operators.coreos.com | ||
resources: | ||
- clusterserviceversions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a tricky permission as this is a cluster-wide permissions and a platform engineer or "LLS admin" might not have this right and so this might be an unwanted elevation of permissions.
From the description of the PR I understand that this is needed to get the version of RHODS. I'm not sure, why we do need this and it's not really appropriate for an upstream operator to have any OpenShift references (including references to RHODS).
The operator should be also independent of OpenDataHub, and be able to be installed standalone on Kubernetes.
I wouldn't add this dependency here to this OpenShift specific resource (well, its operator specific but let's face it, this is more or less only used by OpenShift in the wild)
// Feature flags | ||
EnableNetworkPolicy bool | ||
// RHODS version | ||
RHODSVersion string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said, we can't used a reference to RHODS in the upstream operator. We could patch it in midstream, but maybe we can abstract this away to make it more general ? I'm not yet clear why we need this version.
} | ||
|
||
// Update feature flags | ||
EnableNetworkPolicy, err := parseFeatureFlags(configMap.Data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is network policy enabled/disable via a feature flag (which is used for enabling/disabling experimental features which become eventually permanent features (or dropped)).
For me it looks like the usage of NetworkPolicies is an admin decision so that it shoudl be part of the regular configuration. A feature flag could handle if this configuration has any effect, but the overall switch network policy yes/no should be part of LLSD.
Lets deal with this first as the rest of the PR might change based on this, So the objective of this PR is to allow the operator to set a new image for a specific major/minor version. The code needs to know the RHODS version in order to know if a config map for If ^ is correct then perhaps at least part of this PR should be mid/downstream ? |
I wonder why we need to add the version number at all to the key. Couldn't the operator just unconditionally pick up the image under the key |
Because we've asserted "The major and minor versions of the LLS distribution and the LLS operator must match, but the z-stream version can differ.", if the minor version is upgrade (either manually or automatically) then the operator needs to know to stop using the image and instead switch over to the default for the new version. If what we want is to unconditionally use a specific image, we already have |
Also
Fix for deprecated Endpoints Generated by cursor but output tested here here , search for the string "Found 1 pods in namespace"