From a956e966b2cd71b72e013c70f25c9cb7103bc1e7 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Wed, 8 Jan 2025 02:03:21 +0530
Subject: [PATCH 01/13] KEP-4819: Container log rotation on Disk pressure

---
 .../README.md                                 | 451 ++++++++++++++++++
 .../4819-log-rotate-on-disk-pressure/kep.yaml |  15 +
 2 files changed, 466 insertions(+)
 create mode 100644 keps/sig-node/4819-log-rotate-on-disk-pressure/README.md
 create mode 100644 keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml
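
Note for reviewers (this text sits after the `---` separator, so it is not part
of the commit): the check proposed under "Design Details" in the README below
could be sketched roughly as follows. This is only an illustration in Python,
not kubelet code. The function names, the `rotate_all` callback, the choice of a
percentage for `logRotateDiskPressureThreshold`, and the use of
`shutil.disk_usage` as the disk probe are all assumptions made for the sketch.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # avoid division by zero on pseudo-filesystems
    return 100.0 * usage.used / usage.total


def should_rotate(used_percent: float, threshold_percent: float) -> bool:
    """Rotate when usage is equal to or greater than the threshold."""
    return used_percent >= threshold_percent


def check_once(path: str, threshold_percent: float, rotate_all) -> bool:
    """Run one iteration of the periodic check; return True if rotation ran.

    `rotate_all` is a hypothetical callback standing in for whatever the
    ContainerLogManager would do to rotate container logs.
    """
    if should_rotate(disk_usage_percent(path), threshold_percent):
        rotate_all()
        return True
    return False
```

In this sketch a caller would invoke `check_once` once per
`logRotateDiskCheckInterval`; how the interval is scheduled is left out.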

diff --git a/keps/sig-node/4819-log-rotate-on-disk-pressure/README.md b/keps/sig-node/4819-log-rotate-on-disk-pressure/README.md
new file mode 100644
index 00000000000..28f2d8ce079
--- /dev/null
+++ b/keps/sig-node/4819-log-rotate-on-disk-pressure/README.md
@@ -0,0 +1,451 @@
+<!--
+**Note:** When your KEP is complete, all of these comment blocks should be removed.
+
+To get started with this template:
+
+- [ ] **Pick a hosting SIG.**
+  Make sure that the problem space is something the SIG is interested in taking
+  up. KEPs should not be checked in without a sponsoring SIG.
+- [ ] **Create an issue in kubernetes/enhancements**
+  When filing an enhancement tracking issue, please make sure to complete all
+  fields in that template. One of the fields asks for a link to the KEP. You
+  can leave that blank until this KEP is filed, and then go back to the
+  enhancement and add the link.
+- [ ] **Make a copy of this template directory.**
+  Copy this template into the owning SIG's directory and name it
+  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
+  leading-zero padding) assigned to your enhancement above.
+- [ ] **Fill out as much of the kep.yaml file as you can.**
+  At minimum, you should fill in the "Title", "Authors", "Owning-sig",
+  "Status", and date-related fields.
+- [ ] **Fill out this file as best you can.**
+  At minimum, you should fill in the "Summary" and "Motivation" sections.
+  These should be easy if you've preflighted the idea of the KEP with the
+  appropriate SIG(s).
+- [ ] **Create a PR for this KEP.**
+  Assign it to people in the SIG who are sponsoring this process.
+- [ ] **Merge early and iterate.**
+  Avoid getting hung up on specific details and instead aim to get the goals of
+  the KEP clarified and merged quickly. The best way to do this is to just
+  start with the high-level sections and fill out details incrementally in
+  subsequent PRs.
+
+Just because a KEP is merged does not mean it is complete or approved. Any KEP
+marked as `provisional` is a working document and subject to change. You can
+denote sections that are under active debate as follows:
+
+```
+<<[UNRESOLVED optional short context or usernames ]>>
+Stuff that is being argued.
+<<[/UNRESOLVED]>>
+```
+
+When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
+focused. If you disagree with what is already in a document, open a new PR
+with suggested changes.
+
+One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
+You do not need a new KEP to move from beta to GA, for example. If
+new details emerge that belong in the KEP, edit the KEP. Once a feature has become
+"implemented", major changes should get new KEPs.
+
+The canonical place for the latest set of instructions (and the likely source
+of this file) is [here](/keps/NNNN-kep-template/README.md).
+
+**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
+it is marked `implementable`, must be approved by each of the KEP approvers.
+If none of those approvers are still appropriate, then changes to that list
+should be approved by the remaining approvers and/or the owning SIG (or
+SIG Architecture for cross-cutting KEPs).
+-->
+# KEP-4819: Container log rotation on Disk perssure
+
+<!--
+This is the title of your KEP. Keep it short, simple, and descriptive. A good
+title can help communicate what the KEP is and should be considered as part of
+any review.
+-->
+
+<!--
+A table of contents is helpful for quickly jumping to sections of a KEP and for
+highlighting any additional information provided beyond the standard KEP
+template.
+
+Ensure the TOC is wrapped with
+  <code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
+tags, and then generate with `hack/update-toc.sh`.
+-->
+
+<!-- toc -->
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+    - [Goals](#goals)
+    - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+    - [User Stories (Optional)](#user-stories-optional)
+        - [Story 1](#story-1)
+        - [Story 2](#story-2)
+    - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+    - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+    - [Test Plan](#test-plan)
+        - [Prerequisite testing updates](#prerequisite-testing-updates)
+        - [Unit tests](#unit-tests)
+        - [Integration tests](#integration-tests)
+        - [e2e tests](#e2e-tests)
+    - [Graduation Criteria](#graduation-criteria)
+    - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+    - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+    - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+    - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+    - [Monitoring Requirements](#monitoring-requirements)
+    - [Dependencies](#dependencies)
+    - [Scalability](#scalability)
+    - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+<!-- /toc -->
+
+## Release Signoff Checklist
+
+<!--
+**ACTION REQUIRED:** In order to merge code into a release, there must be an
+issue in [kubernetes/enhancements] referencing this KEP and targeting a release
+milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
+of the targeted release**.
+
+For enhancements that make changes to code or processes/procedures in core
+Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
+Signoff checklist to be completed.
+
+Check these off as they are completed for the Release Team to track. These
+checklist items _must_ be updated for the enhancement to be released.
+-->
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+    - [ ] e2e Tests for all Beta API Operations (endpoints)
+    - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+    - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+    - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+<!--
+**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
+-->
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+Rotate container logs when the kubelet host is under disk pressure.
+
+## Motivation
+
+### What would you like to be added?
+
+The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers in case of disk pressure on host.
+
+### Why is this needed?
+
+It often happens that the containers generating heavy log data have compressed log file with size exceeding the containerLogMaxSize limit set in kubelet config.
+
+For example, kubelet has
+```
+containerLogMaxSize = 200M
+containerLogMaxFiles = 6
+```
+
+### Spec 1
+
+Continuously generating 10 MiB of log data with a 0.1 s sleep in between
+```
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: generate-huge-logs
+spec:
+  template:
+    spec:
+      containers:
+      - name: log-generator
+        image: busybox
+        command: ["/bin/sh", "-c"]
+        args:
+          - |
+            # Generate huge log entries to stdout
+            start_time=$(date +%s)
+            log_size=0
+            target_size=$((4 * 1024 * 1024 * 1024))  # 4 GB target size in bytes
+            while [ $log_size -lt $target_size ]; do
+              # Generate 10 MiB of random data and write it to stdout
+              echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
+              log_size=$(($log_size + 10485760))  # Increment size by 10 MiB
+              sleep 0.1  # Sleep to control log generation speed
+            done
+            end_time=$(date +%s)
+            echo "Log generation completed in $((end_time - start_time)) seconds"
+      restartPolicy: Never
+  backoffLimit: 4
+```
+File sizes
+```
+-rw-r----- 1 root root  24142862 Jan  1 11:41 0.log
+-rw-r--r-- 1 root root 183335398 Jan  1 11:40 0.log.20250101-113948.gz
+-rw-r--r-- 1 root root 364144934 Jan  1 11:40 0.log.20250101-114003.gz
+-rw-r--r-- 1 root root 487803789 Jan  1 11:40 0.log.20250101-114023.gz
+-rw-r--r-- 1 root root 577188544 Jan  1 11:41 0.log.20250101-114047.gz
+-rw-r----- 1 root root 730449620 Jan  1 11:41 0.log.20250101-114115
+```
+
+### Spec 2
+
+Continuously generating 10 MiB of log data with a 10 s sleep in between
+```
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: generate-huge-logs
+spec:
+  template:
+    spec:
+      containers:
+      - name: log-generator
+        image: busybox
+        command: ["/bin/sh", "-c"]
+        args:
+          - |
+            # Generate huge log entries to stdout
+            start_time=$(date +%s)
+            log_size=0
+            target_size=$((4 * 1024 * 1024 * 1024))  # 4 GB target size in bytes
+            while [ $log_size -lt $target_size ]; do
+              # Generate 10 MiB of random data and write it to stdout
+              echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
+              log_size=$(($log_size + 10485760))  # Increment size by 10 MiB
+              sleep 10  # Sleep to control log generation speed
+            done
+            end_time=$(date +%s)
+            echo "Log generation completed in $((end_time - start_time)) seconds"
+      restartPolicy: Never
+  backoffLimit: 4
+```
+
+File sizes
+```
+-rw-r----- 1 root root 181176268 Jan  1 11:31 0.log
+-rw-r--r-- 1 root root 183336647 Jan  1 11:20 0.log.20250101-111730.gz
+-rw-r--r-- 1 root root 183323382 Jan  1 11:23 0.log.20250101-112026.gz
+-rw-r--r-- 1 root root 183327676 Jan  1 11:26 0.log.20250101-112321.gz
+-rw-r--r-- 1 root root 183336376 Jan  1 11:29 0.log.20250101-112616.gz
+-rw-r----- 1 root root 205360966 Jan  1 11:29 0.log.20250101-112911
+```
+
+
+If a pod generates gigabytes of logs with minimal delay, it can cause disk pressure on the kubelet host, which in turn can affect other pods running on the same node.
+
+### Goals
+
+- Rotate and Clean all container logs on kubelet Disk pressure
+
+## Proposal
+
+<!--
+This is where we get down to the specifics of what the proposal actually is.
+This should have enough detail that reviewers can understand exactly what
+you're proposing, but should not include things like API designs or
+implementation. What is the desired outcome and how do we measure success?.
+The "Design Details" section below is for the real
+nitty-gritty.
+-->
+
+
+### Risks and Mitigations
+
+No identified risk.
+
+## Design Details
+
+Define two new flags, `logRotateDiskCheckInterval` and `logRotateDiskPressureThreshold`, in the kubelet config.
+
+- `logRotateDiskCheckInterval` is the interval at which the ContainerLogManager checks disk usage on the kubelet host.
+- `logRotateDiskPressureThreshold` is the threshold for overall disk usage on the kubelet host. If actual disk usage is equal to or greater than this threshold, the ContainerLogManager rotates the logs of all containers on that kubelet.
+
+### Test Plan
+
+<!--
+**Note:** *Not required until targeted at a release.*
+The goal is to ensure that we don't accept enhancements with inadequate testing.
+
+All code is expected to have adequate tests (eventually with coverage
+expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
+when drafting this test plan.
+
+[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
+-->
+
+[X] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+<!--
+Based on reviewers feedback describe what additional tests need to be added prior
+implementing this enhancement to ensure the enhancements have also solid foundations.
+-->
+
+##### Unit tests
+- Add detailed unit tests with 100% coverage.
+- `<package>`: `<date>` - `<test coverage>`
+
+##### Integration tests
+- Scenarios will be covered in e2e tests.
+
+##### e2e tests
+- Set very high values for `containerLogMaxSize` and `containerLogMaxFiles` to disable periodic log rotation.
+- Add a test under `kubernetes/test/e2e_node/container_log_rotation_test.go`.
+- Set very low values for `logRotateDiskCheckInterval` and `logRotateDiskPressureThreshold`. Create a pod that generates heavy logs, and expect the container logs to be rotated after `logRotateDiskCheckInterval` and disk usage not to exceed `logRotateDiskPressureThreshold`.
+
+### Graduation Criteria
+
+**Note:** *Not required until targeted at a release.*
+
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+<!--
+Pick one of these and delete the rest.
+
+Documentation is available on [feature gate lifecycle] and expectations, as
+well as the [existing list] of feature gates.
+
+[feature gate lifecycle]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
+[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
+-->
+
+- [X] Other
+    - Describe the mechanism:
+    - Will enabling / disabling the feature require downtime of the control
+      plane? Yes (kubelet restart)
+    - Will enabling / disabling the feature require downtime or reprovisioning
+      of a node? Yes
+
+###### Does enabling the feature change any default behavior?
+No
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+Yes
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+
+###### Are there any tests for feature enablement/disablement?
+Unit tests will be added.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+No identified risk.
+
+###### What specific metrics should inform a rollback?
+No identified risk.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+e2e tests covered.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+No
+
+### Monitoring Requirements
+
+<!--
+This section must be completed when targeting beta to a release.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+-->
+
+###### How can an operator determine if the feature is in use by workloads?
+Emit cleanup logs.
+
+###### How can someone using this feature know that it is working for their instance?
+Yes, from logs.
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+NA
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+NA
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+NA
+
+### Dependencies
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### Does this feature depend on any specific services running in the cluster?
+No
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+No
+
+###### Will enabling / using this feature result in introducing new API types?
+No
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+<!--
+Describe them, providing:
+  - Which API(s):
+  - Estimated increase:
+-->
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+No
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+No
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+ContainerLogManager of kubelet will use more CPU cycle then now.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+No
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+NA
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+NA
+
+## Implementation History
+NA
+
+## Drawbacks
+No identified drawbacks.
\ No newline at end of file
diff --git a/keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml
new file mode 100644
index 00000000000..65dd74b43a0
--- /dev/null
+++ b/keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml
@@ -0,0 +1,15 @@
+title: Log rotate on Disk pressure
+kep-number: TBD
+authors:
+  - "@Zeel-Patel"
+  - "@rishabh325"
+owning-sig: sig-node
+status: provisional
+editor: "@Zeel-Patel"
+creation-date: 2025-01-08
+last-updated: 2025-01-08
+reviewers:
+  - TBD
+approvers:
+  - TBD
+latest-milestone: TBD

From a4a1305d88eddbe05d2190fc1f618054d04a1fff Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Wed, 8 Jan 2025 15:20:55 +0530
Subject: [PATCH 02/13] Change KEP number to k8s issue number

---
 .../README.md                                                   | 2 +-
 .../kep.yaml                                                    | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename keps/sig-node/{4819-log-rotate-on-disk-pressure => 129447-log-rotate-on-disk-pressure}/README.md (99%)
 rename keps/sig-node/{4819-log-rotate-on-disk-pressure => 129447-log-rotate-on-disk-pressure}/kep.yaml (100%)

diff --git a/keps/sig-node/4819-log-rotate-on-disk-pressure/README.md b/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
similarity index 99%
rename from keps/sig-node/4819-log-rotate-on-disk-pressure/README.md
rename to keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
index 28f2d8ce079..bcfbf6b4137 100644
--- a/keps/sig-node/4819-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
@@ -58,7 +58,7 @@ If none of those approvers are still appropriate, then changes to that list
 should be approved by the remaining approvers and/or the owning SIG (or
 SIG Architecture for cross-cutting KEPs).
 -->
-# KEP-4819: Container log rotation on Disk perssure
+# KEP-129447: Container log rotation on Disk perssure
 
 <!--
 This is the title of your KEP. Keep it short, simple, and descriptive. A good
diff --git a/keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml
similarity index 100%
rename from keps/sig-node/4819-log-rotate-on-disk-pressure/kep.yaml
rename to keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml

From a0171522d438f1af83089c5d199e42d9754e0765 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Thu, 9 Jan 2025 22:41:47 +0530
Subject: [PATCH 03/13] address comments

---
 .../README.md                                 | 154 +-----------------
 .../kep.yaml                                  |   4 +-
 2 files changed, 6 insertions(+), 152 deletions(-)

diff --git a/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md b/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
index bcfbf6b4137..3e2a8837813 100644
--- a/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
@@ -1,82 +1,5 @@
-<!--
-**Note:** When your KEP is complete, all of these comment blocks should be removed.
-
-To get started with this template:
-
-- [ ] **Pick a hosting SIG.**
-  Make sure that the problem space is something the SIG is interested in taking
-  up. KEPs should not be checked in without a sponsoring SIG.
-- [ ] **Create an issue in kubernetes/enhancements**
-  When filing an enhancement tracking issue, please make sure to complete all
-  fields in that template. One of the fields asks for a link to the KEP. You
-  can leave that blank until this KEP is filed, and then go back to the
-  enhancement and add the link.
-- [ ] **Make a copy of this template directory.**
-  Copy this template into the owning SIG's directory and name it
-  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
-  leading-zero padding) assigned to your enhancement above.
-- [ ] **Fill out as much of the kep.yaml file as you can.**
-  At minimum, you should fill in the "Title", "Authors", "Owning-sig",
-  "Status", and date-related fields.
-- [ ] **Fill out this file as best you can.**
-  At minimum, you should fill in the "Summary" and "Motivation" sections.
-  These should be easy if you've preflighted the idea of the KEP with the
-  appropriate SIG(s).
-- [ ] **Create a PR for this KEP.**
-  Assign it to people in the SIG who are sponsoring this process.
-- [ ] **Merge early and iterate.**
-  Avoid getting hung up on specific details and instead aim to get the goals of
-  the KEP clarified and merged quickly. The best way to do this is to just
-  start with the high-level sections and fill out details incrementally in
-  subsequent PRs.
-
-Just because a KEP is merged does not mean it is complete or approved. Any KEP
-marked as `provisional` is a working document and subject to change. You can
-denote sections that are under active debate as follows:
-
-```
-<<[UNRESOLVED optional short context or usernames ]>>
-Stuff that is being argued.
-<<[/UNRESOLVED]>>
-```
-
-When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
-focused. If you disagree with what is already in a document, open a new PR
-with suggested changes.
-
-One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
-You do not need a new KEP to move from beta to GA, for example. If
-new details emerge that belong in the KEP, edit the KEP. Once a feature has become
-"implemented", major changes should get new KEPs.
-
-The canonical place for the latest set of instructions (and the likely source
-of this file) is [here](/keps/NNNN-kep-template/README.md).
-
-**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
-it is marked `implementable`, must be approved by each of the KEP approvers.
-If none of those approvers are still appropriate, then changes to that list
-should be approved by the remaining approvers and/or the owning SIG (or
-SIG Architecture for cross-cutting KEPs).
--->
 # KEP-129447: Container log rotation on Disk perssure
 
-<!--
-This is the title of your KEP. Keep it short, simple, and descriptive. A good
-title can help communicate what the KEP is and should be considered as part of
-any review.
--->
-
-<!--
-A table of contents is helpful for quickly jumping to sections of a KEP and for
-highlighting any additional information provided beyond the standard KEP
-template.
-
-Ensure the TOC is wrapped with
-  <code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
-tags, and then generate with `hack/update-toc.sh`.
--->
-
-<!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
 - [Motivation](#motivation)
@@ -108,24 +31,9 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
 - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
-<!-- /toc -->
 
 ## Release Signoff Checklist
 
-<!--
-**ACTION REQUIRED:** In order to merge code into a release, there must be an
-issue in [kubernetes/enhancements] referencing this KEP and targeting a release
-milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
-of the targeted release**.
-
-For enhancements that make changes to code or processes/procedures in core
-Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
-Signoff checklist to be completed.
-
-Check these off as they are completed for the Release Team to track. These
-checklist items _must_ be updated for the enhancement to be released.
--->
-
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
 - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
@@ -143,9 +51,6 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
-<!--
-**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
--->
 
 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -160,8 +65,7 @@ Rotate containers logs when there is disk pressure on kubelet host.
 
 ### What would you like to be added?
 
-The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers in case of disk pressure on host.
-
+The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), which periodically rotates and cleans up container log files, should also rotate the logs of all containers that have exceeded the configured log retention quota when the host is under disk pressure.
 ### Why is this needed?
 
 It often happens that the containers generating heavy log data have compressed log file with size exceeding the containerLogMaxSize limit set in kubelet config.
@@ -262,19 +166,10 @@ If the pod had been generating logs in Gigabytes with minimal delay, it can caus
 
 ### Goals
 
-- Rotate and Clean all container logs on kubelet Disk pressure
+- On kubelet disk pressure, rotate and clean up the logs of all containers that have exceeded the configured log retention quota
 
 ## Proposal
 
-<!--
-This is where we get down to the specifics of what the proposal actually is.
-This should have enough detail that reviewers can understand exactly what
-you're proposing, but should not include things like API designs or
-implementation. What is the desired outcome and how do we measure success?.
-The "Design Details" section below is for the real
-nitty-gritty.
--->
-
 
 ### Risks and Mitigations
 
@@ -289,28 +184,12 @@ Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold
 
 ### Test Plan
 
-<!--
-**Note:** *Not required until targeted at a release.*
-The goal is to ensure that we don't accept enhancements with inadequate testing.
-
-All code is expected to have adequate tests (eventually with coverage
-expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
-when drafting this test plan.
-
-[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
--->
-
 [X] I/we understand the owners of the involved components may require updates to
 existing tests to make this code solid enough prior to committing the changes necessary
 to implement this enhancement.
 
 ##### Prerequisite testing updates
 
-<!--
-Based on reviewers feedback describe what additional tests need to be added prior
-implementing this enhancement to ensure the enhancements have also solid foundations.
--->
-
 ##### Unit tests
 - Add detailed unit tests with 100% coverage.
 - `<package>`: `<date>` - `<test coverage>`
@@ -330,16 +209,6 @@ implementing this enhancement to ensure the enhancements have also solid foundat
 
 ###### How can this feature be enabled / disabled in a live cluster?
 
-<!--
-Pick one of these and delete the rest.
-
-Documentation is available on [feature gate lifecycle] and expectations, as
-well as the [existing list] of feature gates.
-
-[feature gate lifecycle]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
-[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
--->
-
 - [X] Other
     - Describe the mechanism:
     - Will enabling / disabling the feature require downtime of the control
@@ -375,13 +244,6 @@ No
 
 ### Monitoring Requirements
 
-<!--
-This section must be completed when targeting beta to a release.
-
-For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field.
--->
-
 ###### How can an operator determine if the feature is in use by workloads?
 Emit cleanup logs.
 
@@ -399,10 +261,6 @@ NA
 
 ### Dependencies
 
-<!--
-This section must be completed when targeting beta to a release.
--->
-
 ###### Does this feature depend on any specific services running in the cluster?
 No
 
@@ -416,12 +274,6 @@ No
 
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
-<!--
-Describe them, providing:
-  - Which API(s):
-  - Estimated increase:
--->
-
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 No
 
@@ -429,7 +281,7 @@ No
 No
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
-ContainerLogManager of kubelet will use more CPU cycle then now.
+The CPU usage of the kubelet's ContainerLogManager will increase.
 
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
 No
diff --git a/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml
index 65dd74b43a0..005811b3fe0 100644
--- a/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml
+++ b/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml
@@ -9,7 +9,9 @@ editor: "@Zeel-Patel"
 creation-date: 2025-01-08
 last-updated: 2025-01-08
 reviewers:
-  - TBD
+  - "@kannon92"
+  - "@ffromani"
+  - "@harshanarayana"
 approvers:
   - TBD
 latest-milestone: TBD

From f390c3eb2372ba4e1fb5929e32a40b1c905127df Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Thu, 9 Jan 2025 23:00:21 +0530
Subject: [PATCH 04/13] Link k/e issue in KEP

---
 .../README.md                                                   | 2 +-
 .../kep.yaml                                                    | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename keps/sig-node/{129447-log-rotate-on-disk-pressure => 5032-log-rotate-on-disk-pressure}/README.md (99%)
 rename keps/sig-node/{129447-log-rotate-on-disk-pressure => 5032-log-rotate-on-disk-pressure}/kep.yaml (100%)

diff --git a/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
similarity index 99%
rename from keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
rename to keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
index 3e2a8837813..2f378d6a0f8 100644
--- a/keps/sig-node/129447-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
@@ -1,4 +1,4 @@
-# KEP-129447: Container log rotation on Disk perssure
+# KEP-5032: Container log rotation on Disk perssure
 
 - [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
diff --git a/keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml
similarity index 100%
rename from keps/sig-node/129447-log-rotate-on-disk-pressure/kep.yaml
rename to keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml

From aeee67c7b7b85e97202bd85d63072bfa62517b31 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Fri, 10 Jan 2025 01:20:39 +0530
Subject: [PATCH 05/13] add motivation

---
 keps/sig-node/5032-log-rotate-on-disk-pressure/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
index 2f378d6a0f8..e40cf0814ca 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
@@ -63,6 +63,8 @@ Rotate containers logs when there is disk pressure on kubelet host.
 
 ## Motivation
 
+- A lot of out kubelet hosts experienced Disk pressure as a certain set of pods was generating very high logs. The rate was around 3-4Gib in 15 minutes. We had containerLogMaxSize set to 200Mib and containerLogMaxFiles set to 6. But the .gz files were of size around 500-600Gib. We observed that container log rotation was slow for us.
+
 ### What would you like to be added?
 
 (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers that has exceeded the configured log retention quota in case of disk pressure on host.

From a2cc4bdfadf7b2cde07af838fc06e3538cf496bf Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Fri, 10 Jan 2025 22:59:54 +0530
Subject: [PATCH 06/13] address comments

---
 keps/sig-node/5032-log-rotate-on-disk-pressure/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
index e40cf0814ca..a545e855369 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
@@ -175,7 +175,7 @@ If the pod had been generating logs in Gigabytes with minimal delay, it can caus
 
 ### Risks and Mitigations
 
-No identified risk.
+Risk of tmp copy creation of log failing as there is no disk space left.  
 
 ## Design Details
 
@@ -216,7 +216,7 @@ to implement this enhancement.
     - Will enabling / disabling the feature require downtime of the control
       plane? Yes (kubelet restart)
     - Will enabling / disabling the feature require downtime or reprovisioning
-      of a node? Yes
+      of a node? No, restarting the kubelet with the updated configuration and version should work.
 
 ###### Does enabling the feature change any default behavior?
 No

From 85f8a59722c68bd3bee416b57b9de49b519a0768 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel.patel@uber.com>
Date: Mon, 13 Jan 2025 10:53:09 +0530
Subject: [PATCH 07/13] address comments

---
 .../README.md                                 | 37 ++++++++++++-------
 .../5032-log-rotate-on-disk-pressure/kep.yaml |  2 +-
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
index a545e855369..1424d7ac714 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
@@ -59,15 +59,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 ## Summary
 
-Rotate containers logs when there is disk pressure on kubelet host.
+Clean and Rotate containers logs when there is disk pressure on kubelet host.
 
 ## Motivation
 
-- A lot of out kubelet hosts experienced Disk pressure as a certain set of pods was generating very high logs. The rate was around 3-4Gib in 15 minutes. We had containerLogMaxSize set to 200Mib and containerLogMaxFiles set to 6. But the .gz files were of size around 500-600Gib. We observed that container log rotation was slow for us.
+- We manage kubernetes ecosystem at our organization. A lot of our kubelet hosts experienced Disk pressure as a certain set of pods was generating very high logs. The rate was around 3-4Gib in 15 minutes. We had containerLogMaxSize set to 200Mib and containerLogMaxFiles set to 6. But the .gz files were of size around 500-600Gib. We observed that container log rotation was slow for us.
 
 ### What would you like to be added?
 
-(https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers that has exceeded the configured log retention quota in case of disk pressure on host.
+Log cleanup should be another form of eviction to make space like we do with Images and containers.
+
 ### Why is this needed?
 
 It often happens that the containers generating heavy log data have compressed log file with size exceeding the containerLogMaxSize limit set in kubelet config.
@@ -168,10 +169,10 @@ If the pod had been generating logs in Gigabytes with minimal delay, it can caus
 
 ### Goals
 
-- Rotate and Clean all container logs on kubelet Disk pressure that has exceeded the configured log retention quota
+- Rotate and Clean all container logs on kubelet Disk pressure that has exceeded the configured log retention quota.
 
 ## Proposal
-
+- On disk pressure, analyse log paths of all containers. Logs of containers with existing combined log size exceeding `containerLogMaxSize`*`containerLogMaxFiles` should be deleted till the combined log size is within `containerLogMaxSize`*`containerLogMaxFiles`.
 
 ### Risks and Mitigations
 
@@ -179,10 +180,9 @@ Risk of tmp copy creation of log failing as there is no disk space left.
 
 ## Design Details
 
-Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.
+- When there is disk pressure in nodefs.available/imagefs.available, ContainerLogManager's rotateLogs (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60) should be added to list of functions to be run on disk pressure. (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/eviction/helpers.go#L1195-#L1230)
+- And container logs exceeding the kubelet config set for log limit should be deleted. On disk pressure, analyse log paths of all containers. Logs of containers with existing combined log size exceeding `containerLogMaxSize`*`containerLogMaxFiles` should be deleted till the combined log size is within `containerLogMaxSize`*`containerLogMaxFiles`.
 
-- `logRotateDiskCheckInterval` is the time interval within which the ContainerLogManager will check Disk usage on the kubelet host.
-- `logRotateDiskPressureThreshold` is the threshold of overall Disk usage on the kubelet. If actual Disk usage is equal or more than this threshold, it will rotate logs of all the containers of the kubelet.
 
 ### Test Plan
 
@@ -200,10 +200,10 @@ to implement this enhancement.
 - Scenarios will be covered in e2e tests. 
 
 ##### e2e tests
-- Set very high value for `containerLogMaxSize` and `containerLogMaxFiles` to disable periodic log rotation.
-- Add test under `kubernetes/test/e2e_node/container_log_rotation_test.go`.
-- Set very low values for  `logRotateDiskCheckInterval` and `logRotateDiskPressureThreshold`. Create a pod with generating heavy logs and expect the container logs to be rotated after `logRotateDiskCheckInterval` and Disk usage not going more than `logRotateDiskPressureThreshold`.
-
+- Add test under `kubernetes/test/e2e_node`.
+- Set low value for `containerLogMaxSize` and `containerLogMaxFiles`.
+- Create a pod with generating heavy logs and expect the container's combined log size to bw within `containerLogMaxSize`*`containerLogMaxFiles`.
+- 
 ### Graduation Criteria
 
 **Note:** *Not required until targeted at a release.*
@@ -302,4 +302,15 @@ NA
 NA
 
 ## Drawbacks
-No identified drawbacks.
\ No newline at end of file
+No identified drawbacks.
+
+## Alternatives
+
+### Only Rotation on disk pressure
+Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.
+
+- `logRotateDiskCheckInterval` is the time interval within which the ContainerLogManager will check Disk usage on the kubelet host.
+- `logRotateDiskPressureThreshold` is the threshold of overall Disk usage on the kubelet. If actual Disk usage is equal to or greater than this threshold, it will rotate logs of all the containers of the kubelet.
+
+### DaemonSet to cleanup logs
+Provide a means for an external tool to trigger the kubelet to rotate its logs. That would move the policy decisions outside of the kubelet, for example, into a DaemonSet.
\ No newline at end of file
diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml
index 005811b3fe0..db887f0e1f6 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml
@@ -1,5 +1,5 @@
 title: Log rotate on Disk pressure
-kep-number: TBD
+kep-number: 5032
 authors:
   - "@Zeel-Patel"
   - "@rishabh325"

From 0031c389a408e4cdc9b4561269d4d09cff1979fc Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Mon, 13 Jan 2025 11:39:39 +0530
Subject: [PATCH 08/13] update toc

---
 .../README.md                                 | 44 +++++++++----------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
index 1424d7ac714..568db3657f4 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
@@ -1,36 +1,34 @@
 # KEP-5032: Container log rotation on Disk perssure
 
+<!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
 - [Motivation](#motivation)
-    - [Goals](#goals)
-    - [Non-Goals](#non-goals)
+  - [What would you like to be added?](#what-would-you-like-to-be-added)
+  - [Why is this needed?](#why-is-this-needed)
+  - [Spec 1](#spec-1)
+  - [Spec 2](#spec-2)
+  - [Goals](#goals)
 - [Proposal](#proposal)
-    - [User Stories (Optional)](#user-stories-optional)
-        - [Story 1](#story-1)
-        - [Story 2](#story-2)
-    - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
-    - [Risks and Mitigations](#risks-and-mitigations)
+  - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
-    - [Test Plan](#test-plan)
-        - [Prerequisite testing updates](#prerequisite-testing-updates)
-        - [Unit tests](#unit-tests)
-        - [Integration tests](#integration-tests)
-        - [e2e tests](#e2e-tests)
-    - [Graduation Criteria](#graduation-criteria)
-    - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
-    - [Version Skew Strategy](#version-skew-strategy)
-- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
-    - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
-    - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
-    - [Monitoring Requirements](#monitoring-requirements)
-    - [Dependencies](#dependencies)
-    - [Scalability](#scalability)
-    - [Troubleshooting](#troubleshooting)
+  - [Test Plan](#test-plan)
+      - [Prerequisite testing updates](#prerequisite-testing-updates)
+      - [Unit tests](#unit-tests)
+      - [Integration tests](#integration-tests)
+      - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+  - [Only Rotation on disk pressure](#only-rotation-on-disk-pressure)
+  - [DaemonSet to cleanup logs](#daemonset-to-cleanup-logs)
+<!-- /toc -->
 
 ## Release Signoff Checklist
 

From 46dfaafda768228565627fd4c4f784b00fe62f39 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Thu, 23 Jan 2025 12:35:45 +0530
Subject: [PATCH 09/13] rename KEP

---
 .../README.md                                                   | 2 +-
 .../kep.yaml                                                    | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename keps/sig-node/{5032-log-rotate-on-disk-pressure => 5032-log-eviction-on-disk-pressure}/README.md (99%)
 rename keps/sig-node/{5032-log-rotate-on-disk-pressure => 5032-log-eviction-on-disk-pressure}/kep.yaml (100%)

diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
similarity index 99%
rename from keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
rename to keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
index 568db3657f4..0ca9ad94faf 100644
--- a/keps/sig-node/5032-log-rotate-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
@@ -1,4 +1,4 @@
-# KEP-5032: Container log rotation on Disk perssure
+# KEP-5032: Container log eviction on Disk perssure
 
 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
diff --git a/keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml b/keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml
similarity index 100%
rename from keps/sig-node/5032-log-rotate-on-disk-pressure/kep.yaml
rename to keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml

From 33b53ffa2bef608e0417c9705684cf1a604aec19 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Fri, 31 Jan 2025 11:50:32 +0530
Subject: [PATCH 10/13] update as per commit

---
 .../5032-log-eviction-on-disk-pressure/README.md       | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
index 0ca9ad94faf..caf65e8f60f 100644
--- a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
@@ -1,4 +1,4 @@
-# KEP-5032: Container log eviction on Disk perssure
+# KEP-5032: Container log Split and Rotate to avoid Disk pressure
 
 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
@@ -57,7 +57,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 ## Summary
 
-Clean and Rotate containers logs when there is disk pressure on kubelet host.
+Split, Clean and Rotate containers logs tp avoid disk pressure on kubelet host.
 
 ## Motivation
 
@@ -65,7 +65,7 @@ Clean and Rotate containers logs when there is disk pressure on kubelet host.
 
 ### What would you like to be added?
 
-Log cleanup should be another form of eviction to make space like we do with Images and containers.
+We expect that the log file size is always under limit which can help such disk pressure issues in the future.
 
 ### Why is this needed?
 
@@ -167,10 +167,10 @@ If the pod had been generating logs in Gigabytes with minimal delay, it can caus
 
 ### Goals
 
-- Rotate and Clean all container logs on kubelet Disk pressure that has exceeded the configured log retention quota.
+- Split large log files in size containerLogMaxSize, Rotate and Clean them.
 
 ## Proposal
-- On disk pressure, analyse log paths of all containers. Logs of containers with existing combined log size exceeding `containerLogMaxSize`*`containerLogMaxFiles` should be deleted till the combined log size is within `containerLogMaxSize`*`containerLogMaxFiles`.
+- The container log rotation working now shpuld work as is, but it will ensure that before rotating file, it is under the size limit set..
 
 ### Risks and Mitigations
 

From 6b5e77cd606ffbf6bafa8e14eccdf04f8c236569 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Thu, 6 Feb 2025 23:34:36 +0530
Subject: [PATCH 11/13] address comments

---
 .../README.md                                 | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
index caf65e8f60f..98299562155 100644
--- a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
@@ -57,15 +57,15 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 ## Summary
 
-Split, Clean and Rotate containers logs tp avoid disk pressure on kubelet host.
+Split, Clean and Rotate container logs to avoid disk pressure on kubelet host.
 
 ## Motivation
 
-- We manage kubernetes ecosystem at our organization. A lot of our kubelet hosts experienced Disk pressure as a certain set of pods was generating very high logs. The rate was around 3-4Gib in 15 minutes. We had containerLogMaxSize set to 200Mib and containerLogMaxFiles set to 6. But the .gz files were of size around 500-600Gib. We observed that container log rotation was slow for us.
+- We manage the Kubernetes ecosystem at our organization. Many of our kubelet hosts experienced disk pressure because a certain set of pods was generating logs at a very high rate, around 3-4GiB in 15 minutes. We had the kubelet configs containerLogMaxSize set to 200MiB and containerLogMaxFiles set to 6, but the .gz files (compressed log files of pods) were around 500-600GiB in size. We observed that container log rotation was slow for us.
 
 ### What would you like to be added?
 
-We expect that the log file size is always under limit which can help such disk pressure issues in the future.
+We expect the log file size to always stay under the limit set in the kubelet config (i.e., containerLogMaxSize), which can help avoid such disk pressure issues in the future.
 
 ### Why is this needed?
 
@@ -163,24 +163,28 @@ File sizes
 ```
 
 
-If the pod had been generating logs in Gigabytes with minimal delay, it can cause disk pressure on kubelet host and that can affect other pods running in the same kubelets.
+If a pod generates gigabytes of logs in a short time, it can cause disk pressure on the kubelet host, which can affect other pods running on the same kubelet.
 
 ### Goals
 
-- Split large log files in size containerLogMaxSize, Rotate and Clean them.
+- There is a ContainerLogManager in every kubelet. It runs a long-lived goroutine that checks the size of the active log file (the file on which all container log reads and writes happen). If the size exceeds the limit mentioned above (containerLogMaxSize), it starts parallel workers. Each worker compresses the active file, deletes compressed files until there are containerLogMaxFiles files in total, and creates a new active file for the container.
+- The goal is to split a large active log file into parts of size containerLogMaxSize, and then perform the rest of the operations done by ContainerLogManager.
 
 ## Proposal
-- The container log rotation working now shpuld work as is, but it will ensure that before rotating file, it is under the size limit set..
+- The existing container log rotation should keep working as is, but it will ensure that a file is under the configured size limit before rotating it. This way, every compressed log file present for a container on the host will be under containerLogMaxSize. This can avoid disk pressure on the host.
 
 ### Risks and Mitigations
 
-Risk of tmp copy creation of log failing as there is no disk space left.  
+Do not see any risk as of now.
 
 ## Design Details
 
-- When there is disk pressure in nodefs.available/imagefs.available, ContainerLogManager's rotateLogs (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60) should be added to list of functions to be run on disk pressure. (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/eviction/helpers.go#L1195-#L1230)
-- And container logs exceeding the kubelet config set for log limit should be deleted. On disk pressure, analyse log paths of all containers. Logs of containers with existing combined log size exceeding `containerLogMaxSize`*`containerLogMaxFiles` should be deleted till the combined log size is within `containerLogMaxSize`*`containerLogMaxFiles`.
-
+1. Implement a new function (splitAndRotateLatestLog) to be called by rotateLog function (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L235-L257)
+2. The rotateLog is being called by each worker for the container assigned to it.
+3. It does cleanup to delete all non-rotated (rotated logs have time suffix in name) and .tmp files generated (and if not deleted) in last log rotation.
+4. Then it rotates un rotated files (it does not rotate the active file) and deletes oldest rotated files till containerLogMaxFiles-2 files are left. This is because the non rotated active file be rotated and new active file will be created. Which will add upto containerLogMaxFiles.
+5. Before doing this exact above step, in the new design, it will split the large active log file in size of containerLogMaxSize and name them <active-log-file-name>.part<i>.
+6. Let's say it created n parts, after this, it can do rotate of n-1 parts, keeping last nth part active and do delete. (Basically step 4)
 
 ### Test Plan
 
@@ -200,8 +204,8 @@ to implement this enhancement.
 ##### e2e tests
 - Add test under `kubernetes/test/e2e_node`.
 - Set low value for `containerLogMaxSize` and `containerLogMaxFiles`.
-- Create a pod with generating heavy logs and expect the container's combined log size to bw within `containerLogMaxSize`*`containerLogMaxFiles`.
-- 
+- Create a pod that generates heavy logs and expect the container's combined log size to be within `containerLogMaxSize`*`containerLogMaxFiles`.
+
 ### Graduation Criteria
 
 **Note:** *Not required until targeted at a release.*

From c8787c336a8a5782f85e8d8c4bd13cf2b26e3aaf Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Sat, 8 Feb 2025 02:38:38 +0530
Subject: [PATCH 12/13] fix formattings

---
 .../5032-log-eviction-on-disk-pressure/README.md      | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
index 98299562155..6c3edaf76b2 100644
--- a/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
+++ b/keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
@@ -179,12 +179,13 @@ Do not see any risk as of now.
 
 ## Design Details
 
-1. Implement a new function (splitAndRotateLatestLog) to be called by rotateLog function (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L235-L257)
+1. Implement a new function (splitAndRotateLatestLog) to be called by rotateLog function (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L313-L346)
 2. The rotateLog is being called by each worker for the container assigned to it.
-3. It does cleanup to delete all non-rotated (rotated logs have time suffix in name) and .tmp files generated (and if not deleted) in last log rotation.
-4. Then it rotates un rotated files (it does not rotate the active file) and deletes oldest rotated files till containerLogMaxFiles-2 files are left. This is because the non rotated active file be rotated and new active file will be created. Which will add upto containerLogMaxFiles.
-5. Before doing this exact above step, in the new design, it will split the large active log file in size of containerLogMaxSize and name them <active-log-file-name>.part<i>.
-6. Let's say it created n parts, after this, it can do rotate of n-1 parts, keeping last nth part active and do delete. (Basically step 4)
+3. It first cleans up by deleting every original file that already has a compressed copy, along with any .tmp files left over from the previous log rotation.
+4. It then deletes the oldest rotated files until containerLogMaxFiles-2 files are left. This leaves room for the active file, which is about to be rotated, and the new active file that replaces it, for a total of containerLogMaxFiles.
+5. It then compresses the uncompressed files (it does not compress the active file) and rotates the active file.
+6. In the new design, before step 4, it will split a large active log file into parts of size containerLogMaxSize, named \<active-log-file-name\>.part\<i\>, and rotate the parts.
+7. If the split creates n parts, it rotates the first n-1 parts, keeps the nth part as the active file, and then deletes and compresses as in steps 4 and 5.
 
 ### Test Plan
 

From ff34ee633ea8b449c5c207152a201c2bad9d0b31 Mon Sep 17 00:00:00 2001
From: Zeel-Patel <zeel1401patel@gmail.com>
Date: Mon, 10 Feb 2025 19:11:49 +0530
Subject: [PATCH 13/13] add reviewer

---
 keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml b/keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml
index db887f0e1f6..4360d53f965 100644
--- a/keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml
+++ b/keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml
@@ -12,6 +12,7 @@ reviewers:
   - "@kannon92"
   - "@ffromani"
   - "@harshanarayana"
+  - "@leonzz"
 approvers:
   - TBD
 latest-milestone: TBD