-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Authorino CR reconcile moved to state of the world reconciler. #865
Authorino CR reconcile moved to state of the world reconciler. #865
Conversation
be1d4cc
to
6bfbbf3
Compare
controllers/state_of_the_world.go
Outdated
}, | ||
} | ||
return reconciler.Run | ||
} | ||
|
||
type AuthorinoCrReconciler struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should start moving these type to their separate files. Otherwise very soon this one will become too long and hard to navigate. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I was thinking that also. Currently I am unsure of how we want to structure the packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding "XX_task.go" files to controllers
golang package. Happy to hear from other options.
controllers/state_of_the_world.go
Outdated
kobj := &kuadrantv1beta1.Kuadrant{} | ||
for _, event := range events { | ||
if event.Kind == kuadrantv1beta1.KuadrantKind && event.EventType == controller.CreateEvent { | ||
kobjs := lo.FilterMap(topology.Objects().Roots(), func(item machinery.Object, _ int) (*kuadrantv1beta1.Kuadrant, bool) { | ||
if item.GetName() == event.NewObject.GetName() && item.GetNamespace() == event.NewObject.GetNamespace() && item.GroupVersionKind().Kind == event.NewObject.GetObjectKind().GroupVersionKind().Kind { | ||
return item.(*kuadrantv1beta1.Kuadrant), true | ||
} | ||
return nil, false | ||
}) | ||
if len(kobjs) != 1 { | ||
logger.Error(fmt.Errorf("muiltply kuadrant CRs found"), "unexpected behaviour may happen") | ||
} | ||
kobj = kobjs[0] | ||
break | ||
} else if event.Kind == kuadrantv1beta1.AuthorinoKind && event.EventType == controller.DeleteEvent { | ||
kobjs := lo.FilterMap(topology.Objects().Roots(), func(item machinery.Object, _ int) (*kuadrantv1beta1.Kuadrant, bool) { | ||
if item.GetNamespace() == event.NewObject.GetNamespace() && item.GroupVersionKind().Kind == kuadrantv1beta1.KuadrantKind.Kind { | ||
return item.(*kuadrantv1beta1.Kuadrant), true | ||
} | ||
return nil, false | ||
}) | ||
|
||
if len(kobjs) == 0 { | ||
logger.Info("no possible kuadrant parent, wont create Authorino CR.") | ||
return | ||
} | ||
if len(kobjs) != 1 { | ||
logger.Error(fmt.Errorf("muiltply kuadrant CRs found"), "unexpected behaviour may happen") | ||
} | ||
kobj = kobjs[0] | ||
if kobj.GetDeletionTimestamp() != nil { | ||
logger.Info("kuadrant CR marked for deletion, wont create Authorino CR.") | ||
return | ||
} | ||
break | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had pictured this a bit different in my head:
- Find a
Kuadrant
object that is the root of the topology - If there is none,
- delete all
Authorino
objects that exist in the topology - return
- delete all
- Otherwise
- build the desired
Authorino
object for theKuadrant
root - create it if it doesn't exist as a child of the
Kuadrant
root - update it otherwise
- build the desired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think want is being done is mostly what you listed above. The things that are not being done are:
- deleting the Authorino CRs when the there is not kuadrant root. The cluster is handling this by the owner reference
- update if otherwise. The current implantation also does not do this. I don't think it would be right to introduction that change now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleting the Authorino CRs when the there is not kuadrant root. The cluster is handling this by the owner reference
The cluster will only delete the Authorino CR if it is indeed owned by the Kuadrant one. So I guess what we are discussing is whether other instances of Authorino can co-exist in the same cluster. If so, then 2 questions open up:
- Any instance of Authroino that exists in the same namespace as a Kuadrant CR will be linked to the Kuadrant CR by the linking function. Is that what we want?
- What does it even mean having a second instance of Authorino running in the cluster while the one managed by Kuadrant is cluster-wide and without label selectors specified?
In general, I guess my thinking had to do with not seeing a need for checking the event type. The root of the topology must be a Kuadrant object or there should be no Authorino CR.
controllers/state_of_the_world.go
Outdated
if errors.IsAlreadyExists(err) { | ||
logger.Info("already created authorino Cr") | ||
} else { | ||
logger.Error(err, "failed to create authorino Cr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting one. I wonder if the Kuadrant CR should reflect in its status that no corresponding Authorino CR exists and then use this update to the status to retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently the status controller will set the status in the kuadrant CR if the authorino CR is missing or if authorino is not in a ready state. That should be creating the retry events. Not sure if with the status controller outside the state of the world workflow if that can cause a timing issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I missed this reconciliation task that updates the status of the Kuadrant CR represented in the workflow then.
To cover for that, I think Authorino and Limitador reconciliation could be wrapped as two concurrent tasks of a "Data plane components reconciliation" workflow, with a postcondition taking care of the status update on the Kuadrant CR.
The task introduced by this PR then, focused on the Authorino CR only, may need to inject some hint of the status of Authorino (using the sync.Map
) for the status updater to pick up – e.g., when it just created the CR or in case of failures like such as in this line.
db03646
to
df27d5a
Compare
controllers/state_of_the_world.go
Outdated
Name: "authorino", | ||
Namespace: kobj.Namespace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the selection of the Kuadrant object above correctly, if a second Kuadrant CR is created (in another namespace), another Authorino instance will be created for it too. Is that what we want? I think the only way to support this is by also supporting sharded instances (#274).
If a second Kuadrant CR is created in the same namespace where another exists, then we know we won't create another instance of Authorino in that same namespace (with static name authorino
), right? So why even continue if don't reconcile existing Authorino CRs? And if we do decide to reconcile those, as soon as it starts having any field set directly from the Kuadrant CR, then the current code could trigger an infinite loop of reconciliation events with 2+ Kuadrant CRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this matters any more as it is only the first kuadrant resource that we accept as the root.
controllers/state_of_the_world.go
Outdated
if item.GetNamespace() == event.OldObject.GetNamespace() && item.GroupVersionKind().Kind == kuadrantv1beta1.KuadrantKind.Kind { | ||
return item.(*kuadrantv1beta1.Kuadrant), true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be obvious why this has to be the Kuadrant CR previously linked to the deleted Authorino object. We know this is true because in line 155, the Authorino object is built to exist in the same namespace as kobj.Namespace
. If we continue with this approach for selecting the associated Kuadrant root, maybe add comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is still valid.
api/v1beta1/topology.go
Outdated
To: AuthorinoKind, | ||
Func: func(child machinery.Object) []machinery.Object { | ||
return lo.Filter(kuadrants, func(kuadrant machinery.Object, _ int) bool { | ||
return kuadrant.GetNamespace() == child.GetNamespace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The association is made by name, not namespace. If it is by namespace, all authorinos in kuadrant-system
will be linked to the root kuadrant CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. If we name the Authorino CR after the Kuadrant CR, then we can have just that instance linked and any other one not linked and assumed to have been created by the user manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I don't think we can rename the Authorino CR after the Kuadrant CR. I have tried to do this and it fails because of the status reconciler. Once the the status reconciler is move to the state of the world controller we can do a dynamic name based on the Kuadrant CR.
For now I can narrow this down to the hard coded name for the Authorino CR.
controllers/state_of_the_world.go
Outdated
}, | ||
} | ||
return reconciler.Run | ||
} | ||
|
||
type AuthorinoCrReconciler struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding "XX_task.go" files to controllers
golang package. Happy to hear from other options.
controllers/state_of_the_world.go
Outdated
ReconcileFunc: r.Reconcile, | ||
Events: []controller.ResourceEventMatcher{ | ||
{Kind: ptr.To(kuadrantv1beta1.KuadrantKind), EventType: ptr.To(controller.CreateEvent)}, | ||
{Kind: ptr.To(kuadrantv1beta1.AuthorinoKind), EventType: ptr.To(controller.DeleteEvent)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will subscribe to events on any authorino in the cluster. Can we filter it to "owned authorino" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing more specific than this the subscriber allows us to go is by adding name of the resource. I'm afraid the only way for us to know that is if it's statically named. If it's dynamically named after the Kuadrant CR (as I suggested), then it won't work.
An alternative is labeling the Authorino resources we create and adding the label selector to the watcher. This would the Kuadrant Operator completely ignore any other instance manually created by users, not even adding them to the topology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to be more filtered on what events we get in general. I like the idea of using the labels to do this filtering and I am going to suggest that we create a common one called kuadrant.io/resource
as this is not the only resource we should be filtering on. The same steps should be taken for the Limitador CR, as the step is mostly the very same, #871.
However I have come across a possible issue with the approach and you can create the issue today with the topology configmap. When you filter by labels and a user removes said labels, you do not get that event. This causes a number of issues. Operator restarts will try recreate the resource but fail, which is normally okay as you would also get the create event for that resource if it was labeled.
There is a problem here, and I am not to sure how much of an edge case it is, or even how easy it maybe to solve. I have one idea but it is not compatible with the controller-runtime as far as I am aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to ask if adding the filtering by label or by field selector (needs research) can be added in a follow up PR. I am conscious of time and don't want to see this getting held up over bike sheding. For now we would get some more events than we really want.
controllers/state_of_the_world.go
Outdated
return | ||
} | ||
|
||
aobjs := lo.FilterMap(topology.Objects().Objects().Items(), func(item machinery.Object, _ int) (machinery.Object, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use the linking function to say, "I want kuadrant CR childs". One of them should be authorino CR
controllers/state_of_the_world.go
Outdated
ObjectMeta: metav1.ObjectMeta{ | ||
Name: "authorino", | ||
Namespace: kobj.Namespace, | ||
OwnerReferences: []metav1.OwnerReference{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a method to add ownerrefs. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I looked at that and the ownerrefs method I found at least expects a schema to also be passed in, which we do have access to at this stage. So I went with manually adding the reference.
controllers/state_of_the_world.go
Outdated
if len(kobjs) > 1 { | ||
logger.Error(fmt.Errorf("multiple Kuadrant resources found"), "cannot select root Kuadrant resource", "status", "error") | ||
} | ||
kobj := kobjs[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may think about the order these objects are fetched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense are you say we should think about the order the resources are fetched. Is this how I get the the kuadrant CR's before the Authorino CR's or related to the kobjs[0]
may not be the oldest on cluster and therefore the wrong one? The age of the CR is something I had though about so possible should fix.
a37e514
to
e8899cc
Compare
@@ -164,3 +180,30 @@ func (e *EventLogger) Log(ctx context.Context, resourceEvents []controller.Resou | |||
|
|||
return nil | |||
} | |||
|
|||
// GetOldestKuadrant returns the oldest kuadrant resource from a list of kuadrant resources that is not marked for deletion. | |||
func GetOldestKuadrant(kuadrants []*kuadrantv1beta1.Kuadrant) (*kuadrantv1beta1.Kuadrant, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably pass the topology here. This is a pattern that likely will repeat a lot across tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, but I am not going to do it in this PR. There was a topic I wish to discuss about unit testing where the input is a topology. I have a place holder issue, Kuadrant/policy-machinery#33. But for now I would like to leave this as is with its unit tests until there is a good way to generate the a topology for a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add a TODO to refactor this ASAP. I do not want to risk people passing different lists of Kuadrant objects to this function. I want to make sure it is going to be always the same Kuadrant object that will be returned to all tasks that depend on it to navigate the topology from the common root.
return err | ||
} | ||
|
||
aobjs := lo.FilterMap(topology.Objects().Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: redundant consecutive .Objects()
call? 🤔
aobjs := lo.FilterMap(topology.Objects().Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) { | |
aobjs := lo.FilterMap(topology.Objects().Children(kobj), func(item machinery.Object, _ int) (machinery.Object, bool) { |
@@ -26,7 +28,7 @@ import ( | |||
|
|||
var ( | |||
ConfigMapGroupKind = schema.GroupKind{Group: corev1.GroupName, Kind: "ConfigMap"} | |||
operatorNamespace = env.GetString("OPERATOR_NAMESPACE", "") | |||
operatorNamespace = env.GetString("OPERATOR_NAMESPACE", "kuadrant-system") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make the panic at the following redundant, do we want to remove the panic now?
kuadrant-operator/controllers/state_of_the_world.go
Lines 152 to 154 in abc2e19
if namespace == "" { | |
panic("namespace must be specified and can not be a blank string") | |
} |
Or we can remove setting a default ns env value and set it across all the makefile integration test commands that expect this env value to be set if we want to keep the panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits, but this looks good to me 👍
Signed-off-by: Jim Fitzpatrick <[email protected]> # Conflicts: # api/v1beta1/topology.go # controllers/state_of_the_world.go
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
- Change logic on finding initial root kuadrant - Improve logging statements Signed-off-by: Jim Fitzpatrick <[email protected]>
Kuadrant only expects one kuadrant CR on the cluster. Recent changes enforces the creation of only one authorino. This enforcement was causing the tests to fail as it was creating a second kuadrant CR that would never go to a ready state. Signed-off-by: Jim Fitzpatrick <[email protected]>
Get the oldest kuadrant CR and use that as the only CR. Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
Signed-off-by: Jim Fitzpatrick <[email protected]>
e8899cc
to
89fb936
Compare
Signed-off-by: Jim Fitzpatrick <[email protected]>
89fb936
to
6aa3c68
Compare
closes: #812
Move the reconcile of the Authorino CR to the state of the world reconcile. There should be no functional change.