-
Notifications
You must be signed in to change notification settings - Fork 41
feat(controller): ConfigMap image mapping override #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -104,6 +104,47 @@ Example to create a run.yaml ConfigMap, and a LlamaStackDistribution that refere | |||||
kubectl apply -f config/samples/example-with-configmap.yaml | ||||||
``` | ||||||
|
||||||
## Image Mapping Overrides | ||||||
|
||||||
The operator supports ConfigMap-driven image updates for LLS Distribution images. This allows independent patching for security fixes or bug fixes without requiring a new operator version. | ||||||
|
||||||
### Configuration | ||||||
|
||||||
Create or update the operator ConfigMap with an `image-overrides` key: | ||||||
|
||||||
```yaml | ||||||
apiVersion: v1 | ||||||
kind: ConfigMap | ||||||
metadata: | ||||||
name: llama-stack-operator-config | ||||||
namespace: llama-stack-k8s-operator-system | ||||||
data: | ||||||
image-overrides: | | ||||||
rh-dev-2.25: quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25@sha256:3bc98555 | ||||||
``` | ||||||
|
||||||
### Symbolic Name Format | ||||||
|
||||||
Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override. | ||||||
|
||||||
### How It Works | ||||||
|
||||||
1. The operator reads image overrides from the `image-overrides` key in the operator ConfigMap | ||||||
2. Overrides are parsed as YAML with version-to-image mappings | ||||||
3. When deploying a LlamaStackDistribution, the operator checks for overrides matching the current RHOAI version | ||||||
4. If an override exists, it uses the specified image instead of the default distribution image | ||||||
5. Changes to the ConfigMap automatically trigger reconciliation of all LlamaStackDistribution resources | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe mention here also, from where the vesion is picked up when the ConfigMap is not present. |
||||||
### Example Usage | ||||||
|
||||||
To update the LLS Distribution image for RHOAI 2.25: | ||||||
|
||||||
```bash | ||||||
kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}' | ||||||
``` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is confessedly harder, when the version is stored in a map's value, so maybe my former comment on putting the version into a value, not the key, might make these kind of updates harder. |
||||||
|
||||||
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
## Developer Guide | ||||||
|
||||||
### Prerequisites | ||||||
|
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,6 +87,14 @@ rules: | |
- patch | ||
- update | ||
- watch | ||
- apiGroups: | ||
- operators.coreos.com | ||
resources: | ||
- clusterserviceversions | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a tricky permission as this is a cluster-wide permissions and a platform engineer or "LLS admin" might not have this right and so this might be an unwanted elevation of permissions. From the description of the PR I understand that this is needed to get the version of RHODS. I'm not sure, why we do need this and it's not really appropriate for an upstream operator to have any OpenShift references (including references to RHODS). The operator should be also independent of OpenDataHub, and be able to be installed standalone on Kubernetes. I wouldn't add this dependency here to this OpenShift specific resource (well, its operator specific but let's face it, this is more or less only used by OpenShift in the wild) |
||
verbs: | ||
- get | ||
- list | ||
- watch | ||
- apiGroups: | ||
- rbac.authorization.k8s.io | ||
resources: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,7 @@ import ( | |
"github.com/llamastack/llama-stack-k8s-operator/pkg/cluster" | ||
"github.com/llamastack/llama-stack-k8s-operator/pkg/deploy" | ||
"github.com/llamastack/llama-stack-k8s-operator/pkg/featureflags" | ||
olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1" | ||
"gopkg.in/yaml.v3" | ||
appsv1 "k8s.io/api/apps/v1" | ||
corev1 "k8s.io/api/core/v1" | ||
|
@@ -80,11 +81,19 @@ const ( | |
// When a ConfigMap's data changes, it automatically triggers reconciliation of the referencing | ||
// LlamaStackDistribution, which recalculates a content-based hash and updates the deployment's | ||
// pod template annotations. This causes Kubernetes to restart the pods with the updated configuration. | ||
// | ||
// Operator ConfigMap Watching Feature: | ||
// This reconciler also watches for changes to the operator configuration ConfigMap. When the operator | ||
// config changes, it triggers reconciliation of all LlamaStackDistribution resources. | ||
type LlamaStackDistributionReconciler struct { | ||
client.Client | ||
Scheme *runtime.Scheme | ||
// Feature flags | ||
EnableNetworkPolicy bool | ||
// RHODS version | ||
RHODSVersion string | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As said, we can't used a reference to RHODS in the upstream operator. We could patch it in midstream, but maybe we can abstract this away to make it more general ? I'm not yet clear why we need this version. |
||
// Image mapping overrides | ||
ImageMappingOverrides map[string]string | ||
// Cluster info | ||
ClusterInfo *cluster.ClusterInfo | ||
httpClient *http.Client | ||
|
@@ -458,6 +467,40 @@ func (r *LlamaStackDistributionReconciler) configMapUpdatePredicate(e event.Upda | |
return false | ||
} | ||
|
||
// Check if this is the operator config ConfigMap | ||
if r.handleOperatorConfigUpdate(newConfigMap) { | ||
return true | ||
} | ||
|
||
// Handle referenced ConfigMap updates | ||
return r.handleReferencedConfigMapUpdate(oldConfigMap, newConfigMap) | ||
} | ||
|
||
// handleOperatorConfigUpdate processes updates to the operator config ConfigMap. | ||
func (r *LlamaStackDistributionReconciler) handleOperatorConfigUpdate(configMap *corev1.ConfigMap) bool { | ||
operatorNamespace, err := deploy.GetOperatorNamespace() | ||
if err != nil { | ||
return false | ||
} | ||
|
||
if configMap.Name != operatorConfigData || configMap.Namespace != operatorNamespace { | ||
return false | ||
} | ||
|
||
// Update feature flags | ||
EnableNetworkPolicy, err := parseFeatureFlags(configMap.Data) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is network policy enabled/disable via a feature flag (which is used for enabling/disabling experimental features which become eventually permanent features (or dropped)). For me it looks like the usage of NetworkPolicies is an admin decision so that it shoudl be part of the regular configuration. A feature flag could handle if this configuration has any effect, but the overall switch network policy yes/no should be part of LLSD. |
||
if err != nil { | ||
log.FromContext(context.Background()).Error(err, "Failed to parse feature flags") | ||
} else { | ||
r.EnableNetworkPolicy = EnableNetworkPolicy | ||
} | ||
|
||
r.ImageMappingOverrides = ParseImageMappingOverrides(configMap.Data) | ||
return true | ||
} | ||
|
||
// handleReferencedConfigMapUpdate processes updates to referenced ConfigMaps. | ||
func (r *LlamaStackDistributionReconciler) handleReferencedConfigMapUpdate(oldConfigMap, newConfigMap *corev1.ConfigMap) bool { | ||
// Only proceed if this ConfigMap is referenced by any LlamaStackDistribution | ||
if !r.isConfigMapReferenced(newConfigMap) { | ||
return false | ||
|
@@ -623,6 +666,27 @@ func (r *LlamaStackDistributionReconciler) manuallyCheckConfigMapReference(confi | |
|
||
// findLlamaStackDistributionsForConfigMap maps ConfigMap changes to LlamaStackDistribution reconcile requests. | ||
func (r *LlamaStackDistributionReconciler) findLlamaStackDistributionsForConfigMap(ctx context.Context, configMap client.Object) []reconcile.Request { | ||
logger := log.FromContext(ctx).WithValues( | ||
"configMapName", configMap.GetName(), | ||
"configMapNamespace", configMap.GetNamespace()) | ||
|
||
operatorNamespace, err := deploy.GetOperatorNamespace() | ||
if err != nil { | ||
logger.Error(err, "Failed to get operator namespace for config map event processing") | ||
return nil | ||
} | ||
// If the operator config was changed, we reconcile all LlamaStackDistributions | ||
if configMap.GetName() == operatorConfigData && configMap.GetNamespace() == operatorNamespace { | ||
// List all LlamaStackDistribution resources across all namespaces | ||
allLlamaStacks := llamav1alpha1.LlamaStackDistributionList{} | ||
err = r.List(ctx, &allLlamaStacks) | ||
if err != nil { | ||
logger.Error(err, "Failed to list all LlamaStackDistributions for operator config change") | ||
return nil | ||
} | ||
return r.convertToReconcileRequests(allLlamaStacks) | ||
} | ||
|
||
// Try field indexer lookup first | ||
attachedLlamaStacks, found := r.tryFieldIndexerLookup(ctx, configMap) | ||
if !found { | ||
|
@@ -1409,53 +1473,119 @@ func NewLlamaStackDistributionReconciler(ctx context.Context, client client.Clie | |
return nil, fmt.Errorf("failed to get operator namespace: %w", err) | ||
} | ||
|
||
// Get the ConfigMap | ||
// If the ConfigMap doesn't exist, create it with default feature flags | ||
// If the ConfigMap exists, parse the feature flags from the Configmap | ||
// Initialize operator config ConfigMap | ||
configMap, err := initializeOperatorConfigMap(ctx, client, operatorNamespace) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
// Detect RHODS version | ||
rhodsVersion := detectRHODSVersion(ctx, client, operatorNamespace) | ||
|
||
// Parse feature flags from ConfigMap | ||
enableNetworkPolicy, err := parseFeatureFlags(configMap.Data) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to parse feature flags: %w", err) | ||
} | ||
|
||
// Parse image mapping overrides from ConfigMap | ||
imageMappingOverrides := ParseImageMappingOverrides(configMap.Data) | ||
|
||
return &LlamaStackDistributionReconciler{ | ||
Client: client, | ||
Scheme: scheme, | ||
EnableNetworkPolicy: enableNetworkPolicy, | ||
RHODSVersion: rhodsVersion, | ||
ImageMappingOverrides: imageMappingOverrides, | ||
ClusterInfo: clusterInfo, | ||
httpClient: &http.Client{Timeout: 5 * time.Second}, | ||
}, nil | ||
} | ||
|
||
// initializeOperatorConfigMap gets or creates the operator config ConfigMap. | ||
func initializeOperatorConfigMap(ctx context.Context, c client.Client, operatorNamespace string) (*corev1.ConfigMap, error) { | ||
configMap := &corev1.ConfigMap{} | ||
configMapName := types.NamespacedName{ | ||
Name: operatorConfigData, | ||
Namespace: operatorNamespace, | ||
} | ||
|
||
if err = client.Get(ctx, configMapName, configMap); err != nil { | ||
if !k8serrors.IsNotFound(err) { | ||
return nil, fmt.Errorf("failed to get ConfigMap: %w", err) | ||
err := c.Get(ctx, configMapName, configMap) | ||
if err == nil { | ||
return configMap, nil | ||
} | ||
|
||
if !k8serrors.IsNotFound(err) { | ||
return nil, fmt.Errorf("failed to get ConfigMap: %w", err) | ||
} | ||
|
||
// ConfigMap doesn't exist, create it with defaults | ||
configMap, err = createDefaultConfigMap(configMapName) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to generate default configMap: %w", err) | ||
} | ||
|
||
if err = c.Create(ctx, configMap); err != nil { | ||
return nil, fmt.Errorf("failed to create ConfigMap: %w", err) | ||
} | ||
|
||
return configMap, nil | ||
} | ||
|
||
// detectRHODSVersion detects the RHODS/RHOAI version from the ClusterServiceVersion. | ||
func detectRHODSVersion(ctx context.Context, c client.Client, operatorNamespace string) string { | ||
csvlist := &olmv1alpha1.ClusterServiceVersionList{} | ||
listOpts := []client.ListOption{ | ||
client.InNamespace(operatorNamespace), | ||
} | ||
err := c.List(ctx, csvlist, listOpts...) | ||
if err != nil { | ||
fmt.Printf("failed to list ClusterServiceVersions: %v\n", err) | ||
return "" | ||
} | ||
|
||
for _, csv := range csvlist.Items { | ||
if strings.HasPrefix(csv.Name, "rhods-operator") { | ||
return csv.Spec.Version.String() | ||
} | ||
} | ||
|
||
// ConfigMap doesn't exist, create it with defaults | ||
configMap, err = createDefaultConfigMap(configMapName) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to generate default configMap: %w", err) | ||
fmt.Printf("failed to find RHODS version from ClusterServiceVersion\n") | ||
return "" | ||
} | ||
|
||
func ParseImageMappingOverrides(configMapData map[string]string) map[string]string { | ||
imageMappingOverrides := make(map[string]string) | ||
|
||
// Look for the image-overrides key in the ConfigMap data | ||
if overridesYAML, exists := configMapData["image-overrides"]; exists { | ||
// Parse the YAML content | ||
var overrides map[string]string | ||
if err := yaml.Unmarshal([]byte(overridesYAML), &overrides); err != nil { | ||
// Log error but continue with empty overrides | ||
fmt.Printf("failed to parse image-overrides YAML: %v\n", err) | ||
return imageMappingOverrides | ||
} | ||
|
||
if err = client.Create(ctx, configMap); err != nil { | ||
return nil, fmt.Errorf("failed to create ConfigMap: %w", err) | ||
// Copy the parsed overrides to our result map | ||
for version, image := range overrides { | ||
imageMappingOverrides[version] = image | ||
} | ||
} | ||
|
||
// Parse feature flags from ConfigMap | ||
enableNetworkPolicy, err := parseFeatureFlags(configMap.Data) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to parse feature flags: %w", err) | ||
} | ||
return &LlamaStackDistributionReconciler{ | ||
Client: client, | ||
Scheme: scheme, | ||
EnableNetworkPolicy: enableNetworkPolicy, | ||
ClusterInfo: clusterInfo, | ||
httpClient: &http.Client{Timeout: 5 * time.Second}, | ||
}, nil | ||
return imageMappingOverrides | ||
} | ||
|
||
// NewTestReconciler creates a reconciler for testing, allowing injection of a custom http client and feature flags. | ||
func NewTestReconciler(client client.Client, scheme *runtime.Scheme, clusterInfo *cluster.ClusterInfo, | ||
httpClient *http.Client, enableNetworkPolicy bool) *LlamaStackDistributionReconciler { | ||
return &LlamaStackDistributionReconciler{ | ||
Client: client, | ||
Scheme: scheme, | ||
ClusterInfo: clusterInfo, | ||
httpClient: httpClient, | ||
EnableNetworkPolicy: enableNetworkPolicy, | ||
Client: client, | ||
Scheme: scheme, | ||
ClusterInfo: clusterInfo, | ||
httpClient: httpClient, | ||
EnableNetworkPolicy: enableNetworkPolicy, | ||
RHODSVersion: "", | ||
ImageMappingOverrides: make(map[string]string), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative could be to allow a nested approach:
This makes the version available for the code without the need of parsing the key. I'm always a fan of having stable keys without additional semantic information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really an
overrides
or is a configmap always used to define the version ? Where does the original version come from ?When it always picked up from the ConfigMap, then it's not really an override. Ideally I would keep that information in a single place, also for the default case.