Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,47 @@ Example to create a run.yaml ConfigMap, and a LlamaStackDistribution that refere
kubectl apply -f config/samples/example-with-configmap.yaml
```

## Image Mapping Overrides

The operator supports ConfigMap-driven image updates for LLS Distribution images. This allows independent patching for security fixes or bug fixes without requiring a new operator version.

### Configuration

Create or update the operator ConfigMap with an `image-overrides` key:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: llama-stack-operator-config
namespace: llama-stack-k8s-operator-system
data:
image-overrides: |
rh-dev-2.25: quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25@sha256:3bc98555
```

### Symbolic Name Format

Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use the format `rh-dev-<major>-<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.
Use the format `rh-dev-<major>.<minor>` for symbolic names that correspond to RHOAI versions. The operator will automatically detect the current RHOAI version and apply the appropriate override.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative could be to allow a nested approach:

rh-dev:
- version: 2.25
  image: quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25@sha256:3bc98555
- version: 2.26
  ....
quickstart:
- version: 1.0
  image: ....

This makes the version available for the code without the need of parsing the key. I'm always a fan of having stable keys without additional semantic information.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really an overrides or is a configmap always used to define the version ? Where does the original version come from ?

When it always picked up from the ConfigMap, then it's not really an override. Ideally I would keep that information in a single place, also for the default case.


### How It Works

1. The operator reads image overrides from the `image-overrides` key in the operator ConfigMap
2. Overrides are parsed as YAML with version-to-image mappings
3. When deploying a LlamaStackDistribution, the operator checks for overrides matching the current RHOAI version
4. If an override exists, it uses the specified image instead of the default distribution image
5. Changes to the ConfigMap automatically trigger reconciliation of all LlamaStackDistribution resources

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention here also, from where the vesion is picked up when the ConfigMap is not present.

### Example Usage

To update the LLS Distribution image for RHOAI 2.25:

```bash
kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"rh-dev-2.25: quay.io/opendatahub/llama-stack:latest"}}'
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confessedly harder, when the version is stored in a map's value, so maybe my former comment on putting the version into a value, not the key, might make these kind of updates harder.


This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another version then the distribution will revert back to the default for that version.
This will cause all LlamaStackDistribution resources using the `rh-dev` name to restart with the new image while using RHOAI version 2.25.Z, once RHOAI is upgraded to another newer minor version (e.g. 2.26) then the distribution will revert back to the default for that version as there is no entry yet in the ConfigMap.


## Developer Guide

### Prerequisites
Expand Down
230 changes: 197 additions & 33 deletions config/crd/bases/llamastack.io_llamastackdistributions.yaml

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions config/rbac/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,14 @@ rules:
- patch
- update
- watch
- apiGroups:
- operators.coreos.com
resources:
- clusterserviceversions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a tricky permission as this is a cluster-wide permissions and a platform engineer or "LLS admin" might not have this right and so this might be an unwanted elevation of permissions.

From the description of the PR I understand that this is needed to get the version of RHODS. I'm not sure, why we do need this and it's not really appropriate for an upstream operator to have any OpenShift references (including references to RHODS).

The operator should be also independent of OpenDataHub, and be able to be installed standalone on Kubernetes.

I wouldn't add this dependency here to this OpenShift specific resource (well, its operator specific but let's face it, this is more or less only used by OpenShift in the wild)

verbs:
- get
- list
- watch
- apiGroups:
- rbac.authorization.k8s.io
resources:
Expand Down
3 changes: 3 additions & 0 deletions controllers/kubebuilder_rbac.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,6 @@ package controllers

// NetworkPolicy permissions - controller creates and manages network policies
//+kubebuilder:rbac:groups=networking.k8s.io,resources=networkpolicies,verbs=get;list;watch;create;update;patch;delete

// ClusterServiceVersion permissions - controller reads CSV to get RHOAI version
//+kubebuilder:rbac:groups=operators.coreos.com,resources=clusterserviceversions,verbs=get;list;watch
188 changes: 159 additions & 29 deletions controllers/llamastackdistribution_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ import (
"github.com/llamastack/llama-stack-k8s-operator/pkg/cluster"
"github.com/llamastack/llama-stack-k8s-operator/pkg/deploy"
"github.com/llamastack/llama-stack-k8s-operator/pkg/featureflags"
olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
"gopkg.in/yaml.v3"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
Expand Down Expand Up @@ -80,11 +81,19 @@ const (
// When a ConfigMap's data changes, it automatically triggers reconciliation of the referencing
// LlamaStackDistribution, which recalculates a content-based hash and updates the deployment's
// pod template annotations. This causes Kubernetes to restart the pods with the updated configuration.
//
// Operator ConfigMap Watching Feature:
// This reconciler also watches for changes to the operator configuration ConfigMap. When the operator
// config changes, it triggers reconciliation of all LlamaStackDistribution resources.
type LlamaStackDistributionReconciler struct {
client.Client
Scheme *runtime.Scheme
// Feature flags
EnableNetworkPolicy bool
// RHODS version
RHODSVersion string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, we can't used a reference to RHODS in the upstream operator. We could patch it in midstream, but maybe we can abstract this away to make it more general ? I'm not yet clear why we need this version.

// Image mapping overrides
ImageMappingOverrides map[string]string
// Cluster info
ClusterInfo *cluster.ClusterInfo
httpClient *http.Client
Expand Down Expand Up @@ -458,6 +467,40 @@ func (r *LlamaStackDistributionReconciler) configMapUpdatePredicate(e event.Upda
return false
}

// Check if this is the operator config ConfigMap
if r.handleOperatorConfigUpdate(newConfigMap) {
return true
}

// Handle referenced ConfigMap updates
return r.handleReferencedConfigMapUpdate(oldConfigMap, newConfigMap)
}

// handleOperatorConfigUpdate processes updates to the operator config ConfigMap.
func (r *LlamaStackDistributionReconciler) handleOperatorConfigUpdate(configMap *corev1.ConfigMap) bool {
operatorNamespace, err := deploy.GetOperatorNamespace()
if err != nil {
return false
}

if configMap.Name != operatorConfigData || configMap.Namespace != operatorNamespace {
return false
}

// Update feature flags
EnableNetworkPolicy, err := parseFeatureFlags(configMap.Data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is network policy enabled/disable via a feature flag (which is used for enabling/disabling experimental features which become eventually permanent features (or dropped)).

For me it looks like the usage of NetworkPolicies is an admin decision so that it shoudl be part of the regular configuration. A feature flag could handle if this configuration has any effect, but the overall switch network policy yes/no should be part of LLSD.

if err != nil {
log.FromContext(context.Background()).Error(err, "Failed to parse feature flags")
} else {
r.EnableNetworkPolicy = EnableNetworkPolicy
}

r.ImageMappingOverrides = ParseImageMappingOverrides(configMap.Data)
return true
}

// handleReferencedConfigMapUpdate processes updates to referenced ConfigMaps.
func (r *LlamaStackDistributionReconciler) handleReferencedConfigMapUpdate(oldConfigMap, newConfigMap *corev1.ConfigMap) bool {
// Only proceed if this ConfigMap is referenced by any LlamaStackDistribution
if !r.isConfigMapReferenced(newConfigMap) {
return false
Expand Down Expand Up @@ -623,6 +666,27 @@ func (r *LlamaStackDistributionReconciler) manuallyCheckConfigMapReference(confi

// findLlamaStackDistributionsForConfigMap maps ConfigMap changes to LlamaStackDistribution reconcile requests.
func (r *LlamaStackDistributionReconciler) findLlamaStackDistributionsForConfigMap(ctx context.Context, configMap client.Object) []reconcile.Request {
logger := log.FromContext(ctx).WithValues(
"configMapName", configMap.GetName(),
"configMapNamespace", configMap.GetNamespace())

operatorNamespace, err := deploy.GetOperatorNamespace()
if err != nil {
logger.Error(err, "Failed to get operator namespace for config map event processing")
return nil
}
// If the operator config was changed, we reconcile all LlamaStackDistributions
if configMap.GetName() == operatorConfigData && configMap.GetNamespace() == operatorNamespace {
// List all LlamaStackDistribution resources across all namespaces
allLlamaStacks := llamav1alpha1.LlamaStackDistributionList{}
err = r.List(ctx, &allLlamaStacks)
if err != nil {
logger.Error(err, "Failed to list all LlamaStackDistributions for operator config change")
return nil
}
return r.convertToReconcileRequests(allLlamaStacks)
}

// Try field indexer lookup first
attachedLlamaStacks, found := r.tryFieldIndexerLookup(ctx, configMap)
if !found {
Expand Down Expand Up @@ -1409,53 +1473,119 @@ func NewLlamaStackDistributionReconciler(ctx context.Context, client client.Clie
return nil, fmt.Errorf("failed to get operator namespace: %w", err)
}

// Get the ConfigMap
// If the ConfigMap doesn't exist, create it with default feature flags
// If the ConfigMap exists, parse the feature flags from the Configmap
// Initialize operator config ConfigMap
configMap, err := initializeOperatorConfigMap(ctx, client, operatorNamespace)
if err != nil {
return nil, err
}

// Detect RHODS version
rhodsVersion := detectRHODSVersion(ctx, client, operatorNamespace)

// Parse feature flags from ConfigMap
enableNetworkPolicy, err := parseFeatureFlags(configMap.Data)
if err != nil {
return nil, fmt.Errorf("failed to parse feature flags: %w", err)
}

// Parse image mapping overrides from ConfigMap
imageMappingOverrides := ParseImageMappingOverrides(configMap.Data)

return &LlamaStackDistributionReconciler{
Client: client,
Scheme: scheme,
EnableNetworkPolicy: enableNetworkPolicy,
RHODSVersion: rhodsVersion,
ImageMappingOverrides: imageMappingOverrides,
ClusterInfo: clusterInfo,
httpClient: &http.Client{Timeout: 5 * time.Second},
}, nil
}

// initializeOperatorConfigMap gets or creates the operator config ConfigMap.
func initializeOperatorConfigMap(ctx context.Context, c client.Client, operatorNamespace string) (*corev1.ConfigMap, error) {
configMap := &corev1.ConfigMap{}
configMapName := types.NamespacedName{
Name: operatorConfigData,
Namespace: operatorNamespace,
}

if err = client.Get(ctx, configMapName, configMap); err != nil {
if !k8serrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get ConfigMap: %w", err)
err := c.Get(ctx, configMapName, configMap)
if err == nil {
return configMap, nil
}

if !k8serrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get ConfigMap: %w", err)
}

// ConfigMap doesn't exist, create it with defaults
configMap, err = createDefaultConfigMap(configMapName)
if err != nil {
return nil, fmt.Errorf("failed to generate default configMap: %w", err)
}

if err = c.Create(ctx, configMap); err != nil {
return nil, fmt.Errorf("failed to create ConfigMap: %w", err)
}

return configMap, nil
}

// detectRHODSVersion detects the RHODS/RHOAI version from the ClusterServiceVersion.
func detectRHODSVersion(ctx context.Context, c client.Client, operatorNamespace string) string {
csvlist := &olmv1alpha1.ClusterServiceVersionList{}
listOpts := []client.ListOption{
client.InNamespace(operatorNamespace),
}
err := c.List(ctx, csvlist, listOpts...)
if err != nil {
fmt.Printf("failed to list ClusterServiceVersions: %v\n", err)
return ""
}

for _, csv := range csvlist.Items {
if strings.HasPrefix(csv.Name, "rhods-operator") {
return csv.Spec.Version.String()
}
}

// ConfigMap doesn't exist, create it with defaults
configMap, err = createDefaultConfigMap(configMapName)
if err != nil {
return nil, fmt.Errorf("failed to generate default configMap: %w", err)
fmt.Printf("failed to find RHODS version from ClusterServiceVersion\n")
return ""
}

func ParseImageMappingOverrides(configMapData map[string]string) map[string]string {
imageMappingOverrides := make(map[string]string)

// Look for the image-overrides key in the ConfigMap data
if overridesYAML, exists := configMapData["image-overrides"]; exists {
// Parse the YAML content
var overrides map[string]string
if err := yaml.Unmarshal([]byte(overridesYAML), &overrides); err != nil {
// Log error but continue with empty overrides
fmt.Printf("failed to parse image-overrides YAML: %v\n", err)
return imageMappingOverrides
}

if err = client.Create(ctx, configMap); err != nil {
return nil, fmt.Errorf("failed to create ConfigMap: %w", err)
// Copy the parsed overrides to our result map
for version, image := range overrides {
imageMappingOverrides[version] = image
}
}

// Parse feature flags from ConfigMap
enableNetworkPolicy, err := parseFeatureFlags(configMap.Data)
if err != nil {
return nil, fmt.Errorf("failed to parse feature flags: %w", err)
}
return &LlamaStackDistributionReconciler{
Client: client,
Scheme: scheme,
EnableNetworkPolicy: enableNetworkPolicy,
ClusterInfo: clusterInfo,
httpClient: &http.Client{Timeout: 5 * time.Second},
}, nil
return imageMappingOverrides
}

// NewTestReconciler creates a reconciler for testing, allowing injection of a custom http client and feature flags.
func NewTestReconciler(client client.Client, scheme *runtime.Scheme, clusterInfo *cluster.ClusterInfo,
httpClient *http.Client, enableNetworkPolicy bool) *LlamaStackDistributionReconciler {
return &LlamaStackDistributionReconciler{
Client: client,
Scheme: scheme,
ClusterInfo: clusterInfo,
httpClient: httpClient,
EnableNetworkPolicy: enableNetworkPolicy,
Client: client,
Scheme: scheme,
ClusterInfo: clusterInfo,
httpClient: httpClient,
EnableNetworkPolicy: enableNetworkPolicy,
RHODSVersion: "",
ImageMappingOverrides: make(map[string]string),
}
}
Loading