Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authorino CR reconcile moved to state of the world reconciler. #865

Merged
merged 10 commits into from
Oct 4, 2024

Conversation

Boomatang
Copy link
Contributor

closes: #812

Move the reconcile of the Authorino CR to the state of the world reconcile. There should be no functional change.

@Boomatang Boomatang force-pushed the state-of-the-world/authorino-cr branch from be1d4cc to 6bfbbf3 Compare September 20, 2024 11:37
},
}
return reconciler.Run
}

type AuthorinoCrReconciler struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should start moving these type to their separate files. Otherwise very soon this one will become too long and hard to navigate. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I was thinking that also. Currently I am unsure of how we want to structure the packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am adding "XX_task.go" files to controllers golang package. Happy to hear from other options.

Comment on lines 103 to 139
kobj := &kuadrantv1beta1.Kuadrant{}
for _, event := range events {
if event.Kind == kuadrantv1beta1.KuadrantKind && event.EventType == controller.CreateEvent {
kobjs := lo.FilterMap(topology.Objects().Roots(), func(item machinery.Object, _ int) (*kuadrantv1beta1.Kuadrant, bool) {
if item.GetName() == event.NewObject.GetName() && item.GetNamespace() == event.NewObject.GetNamespace() && item.GroupVersionKind().Kind == event.NewObject.GetObjectKind().GroupVersionKind().Kind {
return item.(*kuadrantv1beta1.Kuadrant), true
}
return nil, false
})
if len(kobjs) != 1 {
logger.Error(fmt.Errorf("muiltply kuadrant CRs found"), "unexpected behaviour may happen")
}
kobj = kobjs[0]
break
} else if event.Kind == kuadrantv1beta1.AuthorinoKind && event.EventType == controller.DeleteEvent {
kobjs := lo.FilterMap(topology.Objects().Roots(), func(item machinery.Object, _ int) (*kuadrantv1beta1.Kuadrant, bool) {
if item.GetNamespace() == event.NewObject.GetNamespace() && item.GroupVersionKind().Kind == kuadrantv1beta1.KuadrantKind.Kind {
return item.(*kuadrantv1beta1.Kuadrant), true
}
return nil, false
})

if len(kobjs) == 0 {
logger.Info("no possible kuadrant parent, wont create Authorino CR.")
return
}
if len(kobjs) != 1 {
logger.Error(fmt.Errorf("muiltply kuadrant CRs found"), "unexpected behaviour may happen")
}
kobj = kobjs[0]
if kobj.GetDeletionTimestamp() != nil {
logger.Info("kuadrant CR marked for deletion, wont create Authorino CR.")
return
}
break
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had pictured this a bit different in my head:

  • Find a Kuadrant object that is the root of the topology
  • If there is none,
    • delete all Authorino objects that exist in the topology
    • return
  • Otherwise
    • build the desired Authorino object for the Kuadrant root
    • create it if it doesn't exist as a child of the Kuadrant root
    • update it otherwise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think want is being done is mostly what you listed above. The things that are not being done are:

  • deleting the Authorino CRs when the there is not kuadrant root. The cluster is handling this by the owner reference
  • update if otherwise. The current implantation also does not do this. I don't think it would be right to introduction that change now.

Copy link
Contributor

@guicassolato guicassolato Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleting the Authorino CRs when the there is not kuadrant root. The cluster is handling this by the owner reference

The cluster will only delete the Authorino CR if it is indeed owned by the Kuadrant one. So I guess what we are discussing is whether other instances of Authorino can co-exist in the same cluster. If so, then 2 questions open up:

  1. Any instance of Authroino that exists in the same namespace as a Kuadrant CR will be linked to the Kuadrant CR by the linking function. Is that what we want?
  2. What does it even mean having a second instance of Authorino running in the cluster while the one managed by Kuadrant is cluster-wide and without label selectors specified?

In general, I guess my thinking had to do with not seeing a need for checking the event type. The root of the topology must be a Kuadrant object or there should be no Authorino CR.

if errors.IsAlreadyExists(err) {
logger.Info("already created authorino Cr")
} else {
logger.Error(err, "failed to create authorino Cr")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting one. I wonder if the Kuadrant CR should reflect in its status that no corresponding Authorino CR exists and then use this update to the status to retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently the status controller will set the status in the kuadrant CR if the authorino CR is missing or if authorino is not in a ready state. That should be creating the retry events. Not sure if with the status controller outside the state of the world workflow if that can cause a timing issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I missed this reconciliation task that updates the status of the Kuadrant CR represented in the workflow then.

To cover for that, I think Authorino and Limitador reconciliation could be wrapped as two concurrent tasks of a "Data plane components reconciliation" workflow, with a postcondition taking care of the status update on the Kuadrant CR.

The task introduced by this PR then, focused on the Authorino CR only, may need to inject some hint of the status of Authorino (using the sync.Map) for the status updater to pick up – e.g., when it just created the CR or in case of failures like such as in this line.

@Boomatang Boomatang force-pushed the state-of-the-world/authorino-cr branch 2 times, most recently from db03646 to df27d5a Compare September 20, 2024 17:42
controllers/state_of_the_world.go Outdated Show resolved Hide resolved
Comment on lines 154 to 155
Name: "authorino",
Namespace: kobj.Namespace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the selection of the Kuadrant object above correctly, if a second Kuadrant CR is created (in another namespace), another Authorino instance will be created for it too. Is that what we want? I think the only way to support this is by also supporting sharded instances (#274).

If a second Kuadrant CR is created in the same namespace where another exists, then we know we won't create another instance of Authorino in that same namespace (with static name authorino), right? So why even continue if don't reconcile existing Authorino CRs? And if we do decide to reconcile those, as soon as it starts having any field set directly from the Kuadrant CR, then the current code could trigger an infinite loop of reconciliation events with 2+ Kuadrant CRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this matters any more as it is only the first kuadrant resource that we accept as the root.

controllers/state_of_the_world.go Outdated Show resolved Hide resolved
Comment on lines 126 to 128
if item.GetNamespace() == event.OldObject.GetNamespace() && item.GroupVersionKind().Kind == kuadrantv1beta1.KuadrantKind.Kind {
return item.(*kuadrantv1beta1.Kuadrant), true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be obvious why this has to be the Kuadrant CR previously linked to the deleted Authorino object. We know this is true because in line 155, the Authorino object is built to exist in the same namespace as kobj.Namespace. If we continue with this approach for selecting the associated Kuadrant root, maybe add comment here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is still valid.

To: AuthorinoKind,
Func: func(child machinery.Object) []machinery.Object {
return lo.Filter(kuadrants, func(kuadrant machinery.Object, _ int) bool {
return kuadrant.GetNamespace() == child.GetNamespace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The association is made by name, not namespace. If it is by namespace, all authorinos in kuadrant-system will be linked to the root kuadrant CR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. If we name the Authorino CR after the Kuadrant CR, then we can have just that instance linked and any other one not linked and assumed to have been created by the user manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I don't think we can rename the Authorino CR after the Kuadrant CR. I have tried to do this and it fails because of the status reconciler. Once the the status reconciler is move to the state of the world controller we can do a dynamic name based on the Kuadrant CR.

For now I can narrow this down to the hard coded name for the Authorino CR.

},
}
return reconciler.Run
}

type AuthorinoCrReconciler struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am adding "XX_task.go" files to controllers golang package. Happy to hear from other options.

controllers/kuadrant_controller.go Show resolved Hide resolved
ReconcileFunc: r.Reconcile,
Events: []controller.ResourceEventMatcher{
{Kind: ptr.To(kuadrantv1beta1.KuadrantKind), EventType: ptr.To(controller.CreateEvent)},
{Kind: ptr.To(kuadrantv1beta1.AuthorinoKind), EventType: ptr.To(controller.DeleteEvent)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will subscribe to events on any authorino in the cluster. Can we filter it to "owned authorino" instead?

Copy link
Contributor

@guicassolato guicassolato Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing more specific than this the subscriber allows us to go is by adding name of the resource. I'm afraid the only way for us to know that is if it's statically named. If it's dynamically named after the Kuadrant CR (as I suggested), then it won't work.

An alternative is labeling the Authorino resources we create and adding the label selector to the watcher. This would the Kuadrant Operator completely ignore any other instance manually created by users, not even adding them to the topology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to be more filtered on what events we get in general. I like the idea of using the labels to do this filtering and I am going to suggest that we create a common one called kuadrant.io/resource as this is not the only resource we should be filtering on. The same steps should be taken for the Limitador CR, as the step is mostly the very same, #871.

However I have come across a possible issue with the approach and you can create the issue today with the topology configmap. When you filter by labels and a user removes said labels, you do not get that event. This causes a number of issues. Operator restarts will try recreate the resource but fail, which is normally okay as you would also get the create event for that resource if it was labeled.

There is a problem here, and I am not to sure how much of an edge case it is, or even how easy it maybe to solve. I have one idea but it is not compatible with the controller-runtime as far as I am aware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to ask if adding the filtering by label or by field selector (needs research) can be added in a follow up PR. I am conscious of time and don't want to see this getting held up over bike sheding. For now we would get some more events than we really want.

return
}

aobjs := lo.FilterMap(topology.Objects().Objects().Items(), func(item machinery.Object, _ int) (machinery.Object, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use the linking function to say, "I want kuadrant CR childs". One of them should be authorino CR

ObjectMeta: metav1.ObjectMeta{
Name: "authorino",
Namespace: kobj.Namespace,
OwnerReferences: []metav1.OwnerReference{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a method to add ownerrefs. Up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I looked at that and the ownerrefs method I found at least expects a schema to also be passed in, which we do have access to at this stage. So I went with manually adding the reference.

@Boomatang Boomatang self-assigned this Sep 24, 2024
if len(kobjs) > 1 {
logger.Error(fmt.Errorf("multiple Kuadrant resources found"), "cannot select root Kuadrant resource", "status", "error")
}
kobj := kobjs[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may think about the order these objects are fetched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense are you say we should think about the order the resources are fetched. Is this how I get the the kuadrant CR's before the Authorino CR's or related to the kobjs[0] may not be the oldest on cluster and therefore the wrong one? The age of the CR is something I had though about so possible should fix.

@@ -164,3 +180,30 @@ func (e *EventLogger) Log(ctx context.Context, resourceEvents []controller.Resou

return nil
}

// GetOldestKuadrant returns the oldest kuadrant resource from a list of kuadrant resources that is not marked for deletion.
func GetOldestKuadrant(kuadrants []*kuadrantv1beta1.Kuadrant) (*kuadrantv1beta1.Kuadrant, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably pass the topology here. This is a pattern that likely will repeat a lot across tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, but I am not going to do it in this PR. There was a topic I wish to discuss about unit testing where the input is a topology. I have a place holder issue, Kuadrant/policy-machinery#33. But for now I would like to leave this as is with its unit tests until there is a good way to generate the a topology for a unit test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add a TODO to refactor this ASAP. I do not want to risk people passing different lists of Kuadrant objects to this function. I want to make sure it is going to be always the same Kuadrant object that will be returned to all tasks that depend on it to navigate the topology from the common root.

return err
}

aobjs := lo.FilterMap(topology.Objects().Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: redundant consecutive .Objects() call? 🤔

Suggested change
aobjs := lo.FilterMap(topology.Objects().Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) {
aobjs := lo.FilterMap(topology.Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) {

@@ -26,7 +28,7 @@ import (

var (
ConfigMapGroupKind = schema.GroupKind{Group: corev1.GroupName, Kind: "ConfigMap"}
operatorNamespace = env.GetString("OPERATOR_NAMESPACE", "")
operatorNamespace = env.GetString("OPERATOR_NAMESPACE", "kuadrant-system")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make the panic at the following redundant, do we want to remove the panic now?

if namespace == "" {
panic("namespace must be specified and can not be a blank string")
}

Or we can remove setting a default ns env value and set it across all the makefile integration test commands that expect this env value to be set if we want to keep the panic?

Copy link
Contributor

@KevFan KevFan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, but this looks good to me 👍

Signed-off-by: Jim Fitzpatrick <[email protected]>

# Conflicts:
#	api/v1beta1/topology.go
#	controllers/state_of_the_world.go
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
- Change logic on finding initial root kuadrant
- Improve logging statements

Signed-off-by: Jim Fitzpatrick <[email protected]>
Kuadrant only expects one kuadrant CR on the cluster. Recent changes enforces the creation of only one authorino. This enforcement was causing the tests to fail as it was creating a second kuadrant CR that would never go to a ready state.

Signed-off-by: Jim Fitzpatrick <[email protected]>
Get the oldest kuadrant CR and use that as the only CR.

Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
@Boomatang Boomatang force-pushed the state-of-the-world/authorino-cr branch from e8899cc to 89fb936 Compare October 3, 2024 15:57
@Boomatang Boomatang force-pushed the state-of-the-world/authorino-cr branch from 89fb936 to 6aa3c68 Compare October 3, 2024 16:10
@Boomatang Boomatang merged commit b3290b5 into Kuadrant:main Oct 4, 2024
24 checks passed
@Boomatang Boomatang deleted the state-of-the-world/authorino-cr branch October 4, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[state-of-the-world reconciler] Authorino deployment
4 participants