Health checker doesn't restart kubelet on EKS #861

Open

chotiwat opened this issue Feb 14, 2024 · 8 comments

Labels: lifecycle/stale (Denotes an issue or PR that has remained open with no activity and has become stale.)

@chotiwat (Contributor)

Hi,

The health-checker-kubelet monitor runs systemctl kill against the kubelet service, but kubelet doesn't get restarted because the exit code is 0. EKS containerd nodes only restart kubelet on failure or SIGPIPE (see https://github.com/awslabs/amazon-eks-ami/blob/master/files/kubelet-containerd.service).

Am I misunderstanding the purpose of the repair function? Is it meant to kill the kubelet so that another system can take care of it, e.g. rotating out the node?

Is there a reason systemctl kill is used instead of systemctl restart? I was testing this functionality by stopping the kubelet manually, and the health checker failed to kill the service with the error "Failed to kill unit kubelet.service: No main process to kill" (slightly related: #860).
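To illustrate what I'm seeing, here is a rough sketch, assuming a node whose kubelet.service uses Restart=on-failure like the EKS unit linked above:

```sh
# On a node where kubelet.service has Restart=on-failure (as in the EKS AMI):
systemctl show -p Restart --value kubelet.service   # prints: on-failure

# What the health checker's repair step does today:
systemctl kill kubelet.service   # default signal is SIGTERM; kubelet shuts down cleanly (exit code 0)

# Because the exit was clean, Restart=on-failure does not kick in and the
# service stays down:
systemctl is-active kubelet.service                 # prints: inactive
```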

@wangzhen127 (Member)

Here is how kubelet is configured in k/k: https://github.com/kubernetes/kubernetes/blob/f8930f980d2986f9e486b04c14c3e93e57bdbe12/cluster/gce/gci/configure-helper.sh#L1652

systemctl restart is often used by administrators to reconfigure system components; see, for example, https://github.com/GoogleCloudPlatform/k8s-node-tools/blob/47ae266dcfa0af4edf80b87b4d6e2011bf8318d9/kubelet-log-config/kubelet-log-config.yaml#L89.

Its kernel log pattern is also different from that of systemctl kill, and we recently started depending on this difference for containerd. See #847.
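For reference, the practical difference between the two units (paraphrasing the files linked above and in the issue description, not quoting them verbatim):

```sh
# GCE/GCI kubelet.service:  Restart=always      -> comes back on its own after systemctl kill
# EKS  kubelet.service:     Restart=on-failure  -> stays down after a clean exit
#
# You can check which behavior a given node has with:
systemctl show -p Restart --value kubelet.service
```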

@chotiwat (Contributor, Author)

@wangzhen127 Thanks for the explanation, and I see that you added the comment to the code as well 🫡.

Unfortunately, the kubelet service in the EKS AMI still uses Restart=on-failure. Wouldn't it be more robust to be explicit about the behavior of the repair function rather than depending on the system configuration? In this case, I'm assuming the end goal of the repair function is to let the unhealthy service start up again, which makes systemctl restart seem like the more appropriate command.

Regarding #847, isn't there a better way to differentiate between unhealthy restarts and normal restarts? I also find it weird that systemctl kill doesn't result in the Stopping ... log. Does that mean it's not graceful (although I don't see --signal=SIGTERM or equivalent in the code)?

I think it's best if we don't have to rely on the difference between systemctl restart and systemctl kill for #847. If that's really not possible, should the repair function be generalized further to support EKS and possibly other cloud providers, perhaps by making the repair command configurable (e.g., a custom command or predefined termination modes)?
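Purely as an illustration of the configurable-command idea (the --repair-command flag is hypothetical, not an existing health-checker option, and the other flag names are only meant to suggest the shape of the invocation):

```sh
# Hypothetical example only: --repair-command does not exist today, and the
# surrounding flags are illustrative rather than the checker's real interface.
health-checker \
  --component=kubelet \
  --enable-repair=true \
  --repair-command='systemctl restart kubelet.service'
```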

@wangzhen127 (Member)

I agree that it is desirable to find a better way to differentiate between unhealthy restart and planned restart. Do you have any suggestions?

@chotiwat (Contributor, Author)

I don't have any suggestions at the moment either. I will update this issue if I get time to play around with it and think of something.

@chotiwat (Contributor, Author) commented Jun 9, 2024

I looked into it a bit more. The journald logs from the systemd source are very limited. We could look at the service's own logs, which should differ between a healthy exit and an unhealthy one, but I'm assuming that's undesirable because we would need to study the log patterns of each systemd service we want to monitor.

Relying on the current hack, we could still make the repair function more robust by checking the unit's Restart property with systemctl show and running systemctl start if Restart isn't always or on-success (see the Restart= behavior table in the systemd.service(5) man page).

We could also simply always call systemctl start, since it's a no-op if the service is already running. This might actually be safer, because other systemd service options can alter the restart behavior, e.g. RestartPreventExitStatus. A rough sketch of both options is below.
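Roughly, and assuming the repair step can shell out to systemctl the way the kill step does today, that could look like this (a sketch, not the current implementation):

```sh
SERVICE=kubelet.service

# Option 1: only start the unit ourselves when systemd won't do it for us.
restart_mode=$(systemctl show -p Restart --value "$SERVICE")
case "$restart_mode" in
  always|on-success) : ;;           # systemd will bring the unit back itself
  *) systemctl start "$SERVICE" ;;  # e.g. on-failure, no, on-abnormal, ...
esac

# Option 2 (what I'm leaning toward): unconditionally start it afterwards.
# systemctl start is a no-op when the unit is already active.
systemctl start "$SERVICE"
```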

What do you think?

@wangzhen127 (Member)

I think the reason the systemd services use Restart=always for kubelet and the container runtime is that we expect them to always run on the nodes in a cluster. Is there any reason why EKS uses Restart=on-failure? In which case would a user stop kubelet or the container runtime?

@chotiwat (Contributor, Author)

I cannot answer for the EKS maintainers, but I don't think it was a wrong choice. In their healthy state, these long-running services are expected to run forever, never exiting with code 0 on their own, so Restart=on-failure is perfectly reasonable to me. It is actually the choice recommended by the systemd documentation: https://www.man7.org/linux/man-pages/man5/systemd.service.5.html#OPTIONS:~:text=Setting%20this%20to,an%20alternative%20choice.

In which case would a user stop kubelet or container runtime?

Like you said, I assume the most common case would be admins reconfiguring the system. That wouldn't make always a better or worse choice than on-failure, though, since admins would normally use systemctl stop for that and the Restart= option wouldn't matter. On that note, I'm also curious why always was picked over on-failure for Google Cloud.

I think it's even more unnatural to restart a service by relying on Restart=always behavior, especially when we can be explicit and run systemctl restart. Who knows how many cloud-based or hand-rolled, bare-metal Kubernetes clusters out there don't use Restart=always for the kubelet service. Node problem detector should aim to be as cloud-agnostic as possible.

Since we cannot use systemctl restart due to the dependence on the log pattern difference, the two solutions I suggested earlier should at least work with any Restart= configuration. I'm leaning toward the second approach (always running systemctl start afterwards). If you approve, I'd be happy to open a PR as well.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 19, 2024.