feat(controller): ConfigMap image mapping override #169

derekhiggins · 2025-10-01T10:36:32Z

feat: ConfigMap image mapping overrides for LLS Distro

Implements a mechanism for the Llama Stack Operator to read and apply
LLS Distribution image updates from a ConfigMap, enabling independent patching
for security fixes or bug fixes without requiring a new LLS Operator or RHOAI version.

- Add RHODS version detection from ClusterServiceVersion
- Add ImageMappingOverrides field to LlamaStackDistributionReconciler
- Implement parseImageMappingOverrides() to read image-overrides from ConfigMap
- Add RBAC permissions for operators.coreos.com/clusterserviceversions
- Support symbolic name mapping (e.g., rhdev-2.25) to specific images
- Included unit tests

The operator now reads image overrides from the 'image-overrides' key in the
operator ConfigMap, supporting YAML format with version-to-image mappings.
Overrides take precedence over default distribution images and are refreshed
on each reconciler initialization.

Closes: RHAIENG-1079
Signed-off-by: Derek Higgins <[email protected]>

Fix operator config ConfigMap watching

o Add operator config detection in configMapUpdatePredicate
o Update feature flags when operator config changes
o Trigger reconciliation of all LlamaStackDistributions on operator config change

Also

 fix: Replace deprecated corev1.Endpoints with discoveryv1.EndpointSlice

Fix for deprecated Endpoints Generated by cursor but output tested here here , search for the string "Found 1 pods in namespace"

o Add operator config detection in configMapUpdatePredicate o Update feature flags when operator config changes o Trigger reconciliation of all LlamaStackDistributions on operator config change Signed-off-by: Derek Higgins <[email protected]>

Replace the deprecated corev1.Endpoints API with discoveryv1.EndpointSlice in test utility functions to fix linting warnings. Changes: - Add discoveryv1 import to tests/e2e/test_utils.go - Replace corev1.Endpoints with discoveryv1.EndpointSlice in logServiceEndpoints() - Refactor logServiceEndpoints() into smaller helper functions to reduce complexity: - getEndpointSlices() - retrieves EndpointSlices for a service - logEndpointSliceDetails() - logs details of a single EndpointSlice - logEndpoints() - logs endpoint addresses and returns ready/not-ready counts - logPorts() - logs ports from an EndpointSlice This resolves the staticcheck warning: SA1019: corev1.Endpoints is deprecated: This API is deprecated in v1.33+ Signed-off-by: Derek Higgins <[email protected]>

Implements a mechanism for the Llama Stack Operator to read and apply LLS Distribution image updates from a ConfigMap, enabling independent patching for security fixes or bug fixes without requiring a new LLS Operator or RHOAI version. - Add RHODS version detection from ClusterServiceVersion - Add ImageMappingOverrides field to LlamaStackDistributionReconciler - Implement parseImageMappingOverrides() to read image-overrides from ConfigMap - Add RBAC permissions for operators.coreos.com/clusterserviceversions - Support symbolic name mapping (e.g., rhdev-2.25) to specific images - Included unit tests The operator now reads image overrides from the 'image-overrides' key in the operator ConfigMap, supporting YAML format with version-to-image mappings. Overrides take precedence over default distribution images and are refreshed on each reconciler initialization. Closes: RHAIENG-1079 Signed-off-by: Derek Higgins <[email protected]>

rhuss

Thanks for the PR, great stuff :). Here's my first round of review.

My main points are:

We must not reference RHODS in this upstream operator. Why and where do you need the RHODS version ? (i can't really see why this is needed). This is really important.
Probably not related to this PR, nevertheless: The enabling of network policies should not be a 'feature flag' (which is about enabling/disabling feature that has its own configuration), but a 'configuration option' (you can combine both, feature flag and configuration option but you should not 'misuse' a feature flag as configuration of a permanent feature)

rhuss · 2025-10-08T08:48:51Z

README.md

+
+### Symbolic Name Format
+
+Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.


Suggested change

Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.

Use the format `rh-dev-<major>.<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.

An alternative could be to allow a nested approach:

rh-dev: - version: 2.25 image: quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25@sha256:3bc98555 - version: 2.26 .... quickstart: - version: 1.0 image: ....

This makes the version available for the code without the need of parsing the key. I'm always a fan of having stable keys without additional semantic information.

Is it really an overrides or is a configmap always used to define the version ? Where does the original version come from ?

When it always picked up from the ConfigMap, then it's not really an override. Ideally I would keep that information in a single place, also for the default case.

rhuss · 2025-10-08T08:54:22Z

README.md

+3. When deploying a LlamaStackDistribution, the operator checks for overrides matching the current RHOAI version
+4. If an override exists, it uses the specified image instead of the default distribution image
+5. Changes to the ConfigMap automatically trigger reconciliation of all LlamaStackDistribution resources
+


Maybe mention here also, from where the vesion is picked up when the ConfigMap is not present.

rhuss · 2025-10-08T08:56:42Z

README.md

+kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}'
+```
+
+This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version.


Suggested change

This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version.

This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another newer minor version (e.g. 2.26) then the distribution will revert back to the default for that version as there is no entry yet in the ConfigMap.

rhuss · 2025-10-08T08:57:47Z

README.md

+
+```bash
+kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}'
+```


this is confessedly harder, when the version is stored in a map's value, so maybe my former comment on putting the version into a value, not the key, might make these kind of updates harder.

rhuss · 2025-10-08T09:00:46Z

config/crd/bases/llamastack.io_llamastackdistributions.yaml

                                        Name of the referent.
+                                        This field is effectively required, but due to backwards compatibility is
+                                        allowed to be empty. Instances of this type with an empty value here are
+                                        almost certainly wrong.


You mean, the name of the referenced ConfigMap ? Maybe we can add this here instead of just Name of referent. (and if so, shouldn't it be reference as the referent is the operator itself ?)

It looks like that the descriptions here are picked up from somewhere else, also not necessarily coming from the operator code. How have those been added ?

rhuss · 2025-10-08T09:02:07Z

config/crd/bases/llamastack.io_llamastackdistributions.yaml

+                                      default: ""
                                      description: |-
                                        Name of the referent.
+                                        This field is effectively required, but due to backwards compatibility is


Same as above. Just out of curiosity, why was this name ever allowed to be empty ?

rhuss · 2025-10-08T09:03:47Z

config/crd/bases/llamastack.io_llamastackdistributions.yaml

+                                    Request is the name chosen for a request in the referenced claim.
+                                    If empty, everything from the claim is made available, otherwise
+                                    only the result of this request.
+                                  type: string


This description is not clear to me. What is "the result of this request" ?

rhuss · 2025-10-08T09:10:36Z

config/rbac/role.yaml

+- apiGroups:
+  - operators.coreos.com
+  resources:
+  - clusterserviceversions


this is a tricky permission as this is a cluster-wide permissions and a platform engineer or "LLS admin" might not have this right and so this might be an unwanted elevation of permissions.

From the description of the PR I understand that this is needed to get the version of RHODS. I'm not sure, why we do need this and it's not really appropriate for an upstream operator to have any OpenShift references (including references to RHODS).

The operator should be also independent of OpenDataHub, and be able to be installed standalone on Kubernetes.

I wouldn't add this dependency here to this OpenShift specific resource (well, its operator specific but let's face it, this is more or less only used by OpenShift in the wild)

rhuss · 2025-10-08T09:11:37Z

controllers/llamastackdistribution_controller.go

 	// Feature flags
 	EnableNetworkPolicy bool
+	// RHODS version
+	RHODSVersion string


As said, we can't used a reference to RHODS in the upstream operator. We could patch it in midstream, but maybe we can abstract this away to make it more general ? I'm not yet clear why we need this version.

rhuss · 2025-10-08T09:15:36Z

controllers/llamastackdistribution_controller.go

+	}
+
+	// Update feature flags
+	EnableNetworkPolicy, err := parseFeatureFlags(configMap.Data)


Why is network policy enabled/disable via a feature flag (which is used for enabling/disabling experimental features which become eventually permanent features (or dropped)).

For me it looks like the usage of NetworkPolicies is an admin decision so that it shoudl be part of the regular configuration. A feature flag could handle if this configuration has any effect, but the overall switch network policy yes/no should be part of LLSD.

derekhiggins · 2025-10-09T11:31:41Z

We must not reference RHODS in this upstream operator. Why and where do you need the RHODS version ? (i can't really see why this is needed). This is really important.

Lets deal with this first as the rest of the PR might change based on this, So the objective of this PR is to allow the operator to set a new image for a specific major/minor version. The code needs to know the RHODS version in order to know if a config map for rh-dev-2.25 is relevant i.e. to stop using it when we move onto rh-dev-2.26.

If ^ is correct then perhaps at least part of this PR should be mid/downstream ?

rhuss · 2025-10-09T11:40:06Z

I wonder why we need to add the version number at all to the key. Couldn't the operator just unconditionally pick up the image under the key rh-dev for the default image and the image just is either a 2.25 or 2.26 version ? Does the operator really need to know about this version or just deploy the configured image ?

derekhiggins · 2025-10-09T11:59:20Z

I wonder why we need to add the version number at all to the key. Couldn't the operator just unconditionally pick up the image under the key rh-dev for the default image and the image just is either a 2.25 or 2.26 version ? Does the operator really need to know about this version or just deploy the configured image ?

Because we've asserted "The major and minor versions of the LLS distribution and the LLS operator must match, but the z-stream version can differ.", if the minor version is upgrade (either manually or automatically) then the operator needs to know to stop using the image and instead switch over to the default for the new version.

If what we want is to unconditionally use a specific image, we already have distribution.image for that in the llsd resource

derekhiggins requested review from VaishnaviHire, cdoern, leseb, mfleader, nathan-weinberg, rhdedgar and rhuss as code owners October 1, 2025 10:36

Fix operator config ConfigMap watching

c1621ec

o Add operator config detection in configMapUpdatePredicate o Update feature flags when operator config changes o Trigger reconciliation of all LlamaStackDistributions on operator config change Signed-off-by: Derek Higgins <[email protected]>

derekhiggins force-pushed the image-mapping branch 2 times, most recently from 69e2550 to b644848 Compare October 1, 2025 11:40

derekhiggins added 2 commits October 1, 2025 12:59

derekhiggins force-pushed the image-mapping branch from b644848 to 677ecac Compare October 1, 2025 12:00

rhuss requested changes Oct 8, 2025

View reviewed changes


		### Symbolic Name Format

		Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.

	Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.
	Use the format `rh-dev-<major>.<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.

	This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version.
	This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another newer minor version (e.g. 2.26) then the distribution will revert back to the default for that version as there is no entry yet in the ConfigMap.

feat(controller): ConfigMap image mapping override #169

Are you sure you want to change the base?

feat(controller): ConfigMap image mapping override #169

Uh oh!

Conversation

derekhiggins commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhuss left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

derekhiggins commented Oct 9, 2025

Uh oh!

rhuss commented Oct 9, 2025

Uh oh!

derekhiggins commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

derekhiggins commented Oct 1, 2025 •

edited

Loading