Gauge for FilesystemIsReadOnly not downgraded to 0 after fixing the problem #474

sharonosbourne · 2020-10-08T07:05:19Z

The problem occurred when a filesystem went into read-only mode. That was fixed, but the metrics still showed both the counter and the gauge set to 1.
To test this, I injected the FilesystemIsReadOnly pattern into /dev/kmsg several times (the rule comes from https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json):

1 log_monitor.go:160] New status generated: &{Source:kernel-monitor Events:[{Severity:info Timestamp:2020-10-08 06:44:16.09315274 +0000 UTC m=+1331754.148888064 Reason:FilesystemIsReadOnly Message:Node condition ReadonlyFilesystem is now: True, reason: FilesystemIsReadOnly}] Conditions:[{Type:KernelDeadlock Status:False Transition:2020-09-22 20:48:21.98500453 +0000 UTC m=+0.040739839 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:True Transition:2020-10-08 06:44:16.09315274 +0000 UTC m=+1331754.148888064 Reason:FilesystemIsReadOnly Message:Remounting filesystem read-only}]}

The metrics still showed 1 and did not go back to 0, even though the read-only filesystem issue had been fixed:

problem_counter{reason="FilesystemIsReadOnly"} 1
problem_gauge{reason="FilesystemIsReadOnly",type="ReadonlyFilesystem"} 1

As a workaround, I deleted the pod, after which the metrics were reset to 0.
What is the reason for this behaviour? Is it the "permanent" type? Is deleting the pod the only solution?

kernel-monitor.json

	{
		"type": "permanent",
		"condition": "ReadonlyFilesystem",
		"reason": "FilesystemIsReadOnly",
		"pattern": "Remounting filesystem read-only"
	}
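One possible reading of the behaviour described above, sketched as a simplified model and not as node-problem-detector's actual code: a rule of type "permanent" sets its condition to True when the pattern matches, and because the rule shown here has no counterpart that matches a recovery message, nothing sets it back until the process restarts. The names `ConditionTracker` and `feed` are hypothetical, and the assumption that the counter only increases on a state change is inferred from the value 1 reported after several injections.

```python
import re

# The rule quoted above from kernel-monitor.json.
RULE = {
    "type": "permanent",
    "condition": "ReadonlyFilesystem",
    "reason": "FilesystemIsReadOnly",
    "pattern": "Remounting filesystem read-only",
}


class ConditionTracker:
    """Hypothetical, simplified model of a log monitor holding one rule."""

    def __init__(self, rule):
        self.rule = rule
        self.pattern = re.compile(rule["pattern"])
        # The condition starts out False (gauge 0) on every start,
        # which is why deleting the pod resets the metric.
        self.gauge = 0
        self.counter = 0

    def feed(self, log_line):
        """Process one kernel log line and return the current gauge value."""
        if self.pattern.search(log_line) and self.rule["type"] == "permanent":
            if self.gauge == 0:
                # Assumption: the counter only moves on a state change.
                self.counter += 1
            self.gauge = 1
        # There is deliberately no branch that sets the gauge back to 0:
        # no rule in this model describes "the problem is gone".
        return self.gauge
```

Feeding the same matching line twice, followed by an unrelated line, leaves `gauge` and `counter` both at 1 in this model, which matches the metrics reported above.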


fejta-bot · 2021-01-06T07:09:52Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2021-02-05T07:53:29Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

adesaegher · 2021-02-17T10:25:55Z

We are facing a similar issue, but with many occurrences, as we make extensive use of PVCs backed by GCP disks.
With a lot of mounting/unmounting operations, the kernel reports many read-only disk events (not on the node's root disk), and consequently node-problem-detector marks the node as not ready.
It might also help to find a more precise pattern for kernel-monitor.json that only catches root filesystem events.
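The idea of a narrower pattern can be sketched as follows. This assumes the kernel line carries the device name in the form `EXT4-fs (<device>): Remounting filesystem read-only` and that the root filesystem lives on a device called `sda1`. Both are assumptions for illustration and need to be checked against real kernel logs on the node. The sketch uses Python's `re` module; node-problem-detector is written in Go, so the pattern should be re-tested with Go's regexp syntax before use.

```python
import re

# The broad pattern currently used by the rule: matches any device.
BROAD = re.compile(r"Remounting filesystem read-only")

# Hypothetical narrower pattern. ROOT_DEVICE is an assumed device name.
ROOT_DEVICE = "sda1"
NARROW = re.compile(
    r"EXT4-fs \(" + re.escape(ROOT_DEVICE) + r"\): Remounting filesystem read-only"
)


def matches_root_only(line):
    """Return True only if the line reports the assumed root device."""
    return NARROW.search(line) is not None
```

With this pattern, a line for `sda1` matches while a line for another device such as `sdb` does not, although the broad pattern would match both.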

fejta-bot · 2021-03-19T11:24:06Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

k8s-ci-robot · 2021-03-19T11:24:15Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

TaichiHo · 2024-02-15T23:49:37Z

This looks like a long-standing bug that is still happening. Any suggestions here?

/remove-lifecycle rotten

wangzhen127 · 2024-02-26T18:25:10Z

/reopen

k8s-ci-robot · 2024-02-26T18:25:14Z

@wangzhen127: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bsdnet · 2024-02-28T00:50:26Z

Are we deploying the NPD as a Linux daemon or a privileged container?

wangzhen127 · 2024-03-01T01:06:53Z

On GKE, it is deployed as a Linux daemon.

wangzhen127 · 2024-03-01T01:12:35Z

@sharonosbourne do you remember if your issue was due to a read-only filesystem on a non-boot disk?

k8s-triage-robot · 2024-05-30T01:29:38Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-06-29T01:59:30Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

aslafy-z · 2024-07-02T15:02:45Z

/remove-lifecycle rotten

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2021

k8s-ci-robot closed this as completed Mar 19, 2021

himanshu-kun mentioned this issue Apr 10, 2023

☂️ Improve health checks based on node conditions gardener/machine-controller-manager#604

Open

10 tasks

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 15, 2024

k8s-ci-robot reopened this Feb 26, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2024

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 2, 2024
