Clearing Old Conditions #1011

Open
daveoy opened this issue Jan 15, 2025 · 2 comments · May be fixed by #1022

Comments

daveoy commented Jan 15, 2025

We have observed that when we change our system log monitor configuration to omit a previously watched condition, the condition persists on the node object.

I have added a bool flag --delete-deprecated-conditions and a stringSliceFlag --deprecated-condition-types, plus a handler in the k8sexporter that deletes the deprecated conditions from a node object on exporter initialization.

Would this community be interested in a PR that supplies this feature?
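
For readers who want to picture the two flags, here is a minimal sketch of how they might be parsed. It is not the code from the proposed change: it uses only Go's standard `flag` package, and the `stringSlice` type, the `options` struct, and `parseOptions` are hypothetical names introduced for illustration. Only the two flag names come from the comment above.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringSlice is a hypothetical flag.Value that collects comma-separated
// values. It stands in for whatever string-slice flag type the real change uses.
type stringSlice []string

func (s *stringSlice) String() string {
	if s == nil {
		return ""
	}
	return strings.Join(*s, ",")
}

func (s *stringSlice) Set(v string) error {
	for _, part := range strings.Split(v, ",") {
		if part = strings.TrimSpace(part); part != "" {
			*s = append(*s, part)
		}
	}
	return nil
}

// options holds the two settings described in the issue (hypothetical struct).
type options struct {
	deleteDeprecated bool
	deprecatedTypes  stringSlice
}

// parseOptions parses the two flags from args without touching global state.
func parseOptions(args []string) (*options, error) {
	o := &options{}
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.BoolVar(&o.deleteDeprecated, "delete-deprecated-conditions", false,
		"delete the listed condition types from the node on startup")
	fs.Var(&o.deprecatedTypes, "deprecated-condition-types",
		"comma-separated condition types to delete")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions([]string{
		"--delete-deprecated-conditions",
		"--deprecated-condition-types=JournaldKernelDeadlock,JournaldTaskHung",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(o.deleteDeprecated, []string(o.deprecatedTypes))
}
```

Keeping deletion behind an explicit bool flag means an existing deployment would not start removing conditions unless the operator opts in.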

daveoy commented Feb 4, 2025

Here's example output; I'll add a PR shortly.

I0204 17:20:21.902108       7 problem_client.go:127] Deleting deprecated conditions [GPUMMUErrorXid31 JournaldCGroupOOMKilling JournaldGPUApplicationError JournaldGPUECCUncorrectableError JournaldGPUFallenOffBus JournaldGPUFault JournaldGPUGSPTimeoutXid119 JournaldGPUInvalidPushBuffer JournaldGPURowRemapFailure JournaldGPUWantsReset JournaldHardwareErrorCorrected JournaldHardwareErrorFatal JournaldHardwareErrorInfo JournaldHardwareErrorInterruptCPU JournaldHardwareErrorInterruptMemory JournaldHardwareErrorInterruptPCIe JournaldHardwareErrorInterruptUnknown JournaldHardwareErrorRecoverable JournaldKernelDeadlock JournaldKernelFailedToGetEntry JournaldKernelFailedToGetNextEntry JournaldKernelHardlock JournaldKernelOops JournaldKernelWatchLoopStarted JournaldLocalDiskErrors JournaldNFSStorageFault JournaldNVSwitchFailure JournaldNVSXidNonFatal JournaldPCIAER JournaldPersistentStorageFault JournaldReadonlyFilesystem JournaldSystemOOMKilling JournaldTaskHung JournaldUnregisterNetDevice] (if present)...
I0204 17:20:21.911443       7 problem_client.go:140] Deleting deprecated condition JournaldGPUApplicationError
I0204 17:20:21.911461       7 problem_client.go:140] Deleting deprecated condition JournaldHardwareErrorInterruptUnknown
I0204 17:20:21.911465       7 problem_client.go:140] Deleting deprecated condition JournaldGPUFallenOffBus
I0204 17:20:21.911467       7 problem_client.go:140] Deleting deprecated condition JournaldPersistentStorageFault
I0204 17:20:21.911470       7 problem_client.go:140] Deleting deprecated condition JournaldHardwareErrorInterruptMemory
I0204 17:20:21.911472       7 problem_client.go:140] Deleting deprecated condition JournaldReadonlyFilesystem
I0204 17:20:21.911474       7 problem_client.go:140] Deleting deprecated condition JournaldHardwareErrorFatal
I0204 17:20:21.911477       7 problem_client.go:140] Deleting deprecated condition JournaldKernelDeadlock
I0204 17:20:21.911480       7 problem_client.go:140] Deleting deprecated condition JournaldLocalDiskErrors
I0204 17:20:21.911483       7 problem_client.go:140] Deleting deprecated condition JournaldHardwareErrorInterruptPCIe
I0204 17:20:21.911486       7 problem_client.go:140] Deleting deprecated condition JournaldKernelHardlock
I0204 17:20:21.911490       7 problem_client.go:140] Deleting deprecated condition JournaldHardwareErrorInterruptCPU
I0204 17:20:21.911492       7 problem_client.go:140] Deleting deprecated condition JournaldGPURowRemapFailure
I0204 17:20:21.911494       7 problem_client.go:140] Deleting deprecated condition JournaldGPUECCUncorrectableError
I0204 17:20:21.911496       7 problem_client.go:140] Deleting deprecated condition JournaldGPUWantsReset
I0204 17:20:21.911498       7 problem_client.go:140] Deleting deprecated condition JournaldGPUFault
I0204 17:20:21.911500       7 problem_client.go:140] Deleting deprecated condition JournaldGPUGSPTimeoutXid119
I0204 17:20:21.911502       7 problem_client.go:140] Deleting deprecated condition JournaldGPUInvalidPushBuffer
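
The log above announces the full configured list "(if present)" and then reports only the conditions actually found on the node. A rough sketch of that selection step follows. It assumes a simplified `condition` struct and a hypothetical `splitDeprecated` helper rather than the real Kubernetes types, and it leaves out the API call that would write the remaining conditions back to the node.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for a node condition; only the
// type name matters for this sketch.
type condition struct {
	Type   string
	Status string
}

// splitDeprecated separates the node's current conditions into those to keep
// and the types to remove. Deprecated types that are not present on the node
// are simply ignored.
func splitDeprecated(current []condition, deprecated []string) (kept []condition, removed []string) {
	drop := make(map[string]struct{}, len(deprecated))
	for _, t := range deprecated {
		drop[t] = struct{}{}
	}
	for _, c := range current {
		if _, ok := drop[c.Type]; ok {
			removed = append(removed, c.Type)
			continue
		}
		kept = append(kept, c)
	}
	return kept, removed
}

func main() {
	current := []condition{
		{Type: "Ready", Status: "True"},
		{Type: "JournaldKernelDeadlock", Status: "False"},
		{Type: "JournaldGPUFault", Status: "False"},
	}
	// JournaldTaskHung is configured but absent from the node, so it is skipped.
	deprecated := []string{"JournaldKernelDeadlock", "JournaldGPUFault", "JournaldTaskHung"}

	kept, removed := splitDeprecated(current, deprecated)
	for _, t := range removed {
		fmt.Println("Deleting deprecated condition", t)
	}
	fmt.Println("conditions remaining:", len(kept))
}
```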

daveoy commented Feb 4, 2025

Removed 120k+ conditions across 20k+ nodes using the linked PR's code just this morning. This is in addition to the 4k-6k conditions I removed while testing this three weeks back.
