When starting the agent without the proper permissions, it throws the following error in the logs and hangs:
1 node_event_consumer.go:72] Error obtaining information about the agent pod [openshift-infra/hawkular-openshift-agent-qzg21]. err=User "system:serviceaccount:openshift-infra:hawkular-openshift-agent" cannot get pods in project "openshift-infra"
If the service account (SA) is then given the proper permissions, the pod still hangs. If the pod is restarted, it starts up properly.
By hanging like this, it's left in a position where it's indicating that it's ready and running properly (status 1/1). At the very least, if it cannot properly continue, it should exit so that a new pod can be started in its place.
In this case, I believe the agent should wait and attempt to connect a few more times after some delay. We could even use a 'readiness probe' here to determine when the agent reaches a ready state.
I think a more appropriate action would be to have the pod stay in the 'not ready' phase until the permission is granted. We can use a readiness probe for that. The logs should clearly indicate what the problem is and how to fix it. And once the permission has been granted the pod can continue and enter the ready state.
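A rough sketch of what the agent side of such a readiness probe could look like in Go. The `readiness` type, the `/ready` path and the port are assumptions for illustration only. An `httpGet` readiness probe treats a 2xx/3xx response as ready and anything else as not ready.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness records whether the agent currently has what it needs to run.
type readiness struct {
	ready int32 // 1 when ready, 0 otherwise; accessed atomically
}

// set flips the ready flag; safe to call from any goroutine.
func (r *readiness) set(ok bool) {
	var v int32
	if ok {
		v = 1
	}
	atomic.StoreInt32(&r.ready, v)
}

// ServeHTTP answers 200 when ready and 503 otherwise, so the pod stays
// "not ready" until the flag is set.
func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if atomic.LoadInt32(&r.ready) == 1 {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ready")
		return
	}
	http.Error(w, "not ready: waiting for permission to get pods",
		http.StatusServiceUnavailable)
}

// probe exercises the handler in-process and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	r := &readiness{}
	fmt.Println("before permission is granted:", probe(r))
	r.set(true)
	fmt.Println("after permission is granted:", probe(r))
	// In a real agent this would be served over HTTP instead, e.g.:
	//   http.Handle("/ready", r)
	//   http.ListenAndServe(":8080", nil) // port is an assumption
}
```

The 503 body is also a natural place to say what is missing, in line with having the logs explain the problem and how to fix it.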
I don't think we want to restart the pod in this case. That will cause a CrashLoopBackOff, which makes it look like our pods are really unstable and crashing. It also comes across to the user as more of an error condition.
I think the same thing should be done if the permission is revoked. We shouldn't restart the pod in that case, but rather log the error and continuously check whether the permission has been re-granted. In this case the pod should also leave the ready state.