Kubernetes CPU Throttling and Resource Management Demonstration

This project is designed to empirically demonstrate the adverse effects of CPU limits in Kubernetes by analyzing cgroup throttling statistics when Pods are subjected to heavy, sustained load.

We deploy an identical, CPU-intensive server application into a Kind cluster across four distinct Pod configurations and use a dedicated client to flood them with requests, observing the resulting throttling.


🧭 Table of Contents

  • 🚀 Project Goal
  • 📊 Key Server Metrics
  • 📦 Directory Structure
  • 🧪 The Four Server Deployment Cases
  • 🛠️ Setup and Execution
  • 👀 Analysis and Observation
  • 🗑️ Cleanup
  • 💻 Linux CPU Scheduler and Time Slices
  • ⚖️ How CPU Limits Cause Throttling: The CFS Bandwidth Control
  • ❓ When to use CPU limits?

🚀 Project Goal

This project uses specific cgroup metrics to show why setting CPU limits is detrimental to application performance and why they should generally be avoided, aligning with best practices that favor letting Pods burst when extra node capacity is available.

We will visualize and quantify the performance characteristics and resource consumption using highly specific resource utilization data and cgroup throttling statistics collected directly from the server Pods.

Here's a visual representation of the problem CPU limits can cause: [image: CPU Limits Throttling]

The four key resource configurations being tested are:

  1. No Limits or Requests (the worst case for stability)
  2. Request Only (the recommended configuration for maximum burst performance)
  3. Limit Only (hard-capped performance)
  4. Request equals Limit (hard-capped, but with a resource guarantee)

📊 Key Server Metrics

To precisely demonstrate CPU throttling, the Go server application exposes specific resource utilization and cgroup metrics in its response logs. These metrics are crucial for quantifying the difference between available CPU time and actual CPU time consumed (due to throttling).

| Metric Key | Description | Relevance to Project Goal |
| --- | --- | --- |
| rusage_user_seconds | Time spent by the process in user mode (CPU time). | Measures actual work done by the application logic. |
| rusage_system_seconds | Time spent by the process in kernel mode. | Measures time spent by the OS on behalf of the application (e.g., I/O). |
| num_of_cores | Total CPU consumption in cores, calculated as (user + system) / elapsed_seconds. | Shows the average number of CPU cores consumed by the process, directly revealing throttling effects. |
| cgroup.nr_periods_delta | Total number of scheduling periods elapsed. | Baseline for cgroup activity. |
| cgroup.nr_throttled_delta | Number of times the process was throttled (paused for hitting the CPU limit). | CRITICAL: direct count of throttling incidents. Expected to be high for Pods with CPU limits. |
| cgroup.throttled_time_ns_delta | Cumulative time (in nanoseconds) the process has been throttled. | CRITICAL: the total time the application was suspended due to the CPU limit. |
| cgroup.cpu_limit | The detected CPU limit, expressed in cores (if quota/period are set). | Confirms the limit applied by Kubernetes/cgroups. |
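
The snippet below is a minimal sketch of how these numbers can be collected from inside a Pod. It is not a copy of server/main.go: it assumes a cgroup v2 node, where the throttling counters live in /sys/fs/cgroup/cpu.stat (on cgroup v1 the equivalent file is /sys/fs/cgroup/cpu/cpu.stat and reports throttled_time in nanoseconds, which matches the metric name above).

```go
// Minimal sketch (not the actual server code) of collecting the metrics above.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// readCPUStat parses the container's cpu.stat file. On cgroup v2 it contains
// nr_periods, nr_throttled and throttled_usec.
func readCPUStat() (map[string]int64, error) {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.stat") // cgroup v1: /sys/fs/cgroup/cpu/cpu.stat
	if err != nil {
		return nil, err
	}
	stats := map[string]int64{}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseInt(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats, nil
}

func main() {
	// getrusage reports the user/system CPU time the process actually consumed.
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		panic(err)
	}
	user := float64(ru.Utime.Sec) + float64(ru.Utime.Usec)/1e6
	system := float64(ru.Stime.Sec) + float64(ru.Stime.Usec)/1e6
	fmt.Printf("rusage_user_seconds=%.3f rusage_system_seconds=%.3f\n", user, system)

	if stats, err := readCPUStat(); err == nil {
		fmt.Printf("nr_periods=%d nr_throttled=%d throttled_usec=%d\n",
			stats["nr_periods"], stats["nr_throttled"], stats["throttled_usec"])
	}
}
```

Taking the difference between two samples of these counters yields the *_delta values reported per request.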

📦 Directory Structure

| Directory | Purpose | Key Files |
| --- | --- | --- |
| server/ | The Go application that simulates a CPU-intensive workload. | Dockerfile, main.go |
| client/ | The Go application used as a load generator to bombard the server Pods with requests. | Dockerfile, main.go |
| deploy/ | The Kubernetes manifests, configured via Kustomize, that deploy the four distinct server Pods and the client load generator. | kustomization.yaml, deployments.yaml |

🧪 The Four Server Deployment Cases

The deploy/ manifests create four distinct server Deployments, each exposing a separate Service, to test the different resource configurations:

| Case | Configuration (resources.limits / resources.requests) | CPU Limit Status | Expected Behavior Under Stress |
| --- | --- | --- | --- |
| 1. No Requests/Limits | No requests, no limits. | No limit | Unpredictable performance; first to be evicted under resource pressure. Can consume all available CPU. |
| 2. Request Only | requests.cpu set, limits.cpu unset. | No limit | Receives a guaranteed baseline CPU share and can burst beyond the request to use all available CPU capacity on the node. |
| 3. Limit Only | requests.cpu unset, limits.cpu set. | Hard-capped | No guaranteed CPU share, and hard-capped at the limit value. Throttles as soon as it hits the cap. |
| 4. Request equals Limit | requests.cpu = limits.cpu (e.g., 500m). | Hard-capped | Receives a fixed CPU allocation, but is hard-capped and throttled whenever it tries to use more than the limit. |

🛠️ Setup and Execution

Prerequisites

You must have the following tools installed and available in your PATH:

  • Docker (or compatible container runtime)
  • kind (Kubernetes IN Docker)
  • kubectl (Kubernetes command-line tool)
  • make (or run the commands manually)

Installation Steps

  1. Clone the Repository:

    git clone [Your Repository URL]
    cd kubernetes-resources-cpu
  2. Build, Deploy, and Run: Use the provided Makefile to handle all steps, including tool checks, image builds, Kind cluster creation, metrics server deployment, and application deployment.

    make deploy

    Note: The make deploy target includes a check to ensure kind, kubectl, and docker are installed.

  3. Verify Cluster Status: Ensure all four server deployments and the client deployment are ready.

    kubectl get pods -n stresstest
  4. Run the Load Test: The client service will automatically begin sending traffic to the four server services.


👀 Analysis and Observation

Once the load test is running, use the following commands and focus on the cgroup metrics to observe the resource utilization and throttling:

  1. Monitor Pod CPU Utilization: Use kubectl top to observe the CPU usage reported by the Kubelet. Note that this value often reflects the throttled consumption, not the demand.

    kubectl top pods -n stresstest
  2. Monitor Node CPU Utilization: Check the total resource pressure on the node itself. When the node is fully saturated, throttling will become most pronounced on the limited Pods.

    kubectl top node
  3. Check Client Logs for Throttling Metrics: The client pod collects and displays the detailed server-side performance data. The logs are the primary source for the throttling data. Compare the cgroup.nr_throttled_delta and cgroup.throttled_time_ns_delta between the Pods with limits (Case 3 and 4) and the Pods without limits (Case 1 and 2).

    kubectl logs -f -n stresstest -l app=client
  4. Grafana: This step focuses on using Grafana to visualize how the Linux scheduler distributes available CPU capacity when Pods are terminated. First, expose the Grafana service port to access the dashboards from your local machine.

    kubectl port-forward -n stresstest svc/prometheus-grafana 8080:80
    • URL: http://localhost:8080
    • Credentials:
      • username: admin
      • password: prom-operator

    Once logged in, open the Pod and Node CPU usage dashboards.

    Observe the usage lines: Pods without limits (server-1 and server-2) should be collectively consuming the vast majority of the idle CPU on the node.

    Remove the Request Only Pod (server-2). Since this Pod had no limit, it was already bursting to consume free CPU.

    kubectl delete pod -n stresstest server-2

    Observe the freed CPU capacity on the Node dashboard. The other Pods (especially server-1) should immediately absorb the released capacity and increase their CPU usage, demonstrating CFS proportional sharing and the ability of unlimited Pods to burst.

    Remove the No Requests/Limits Pod (server-1). This is the last remaining Pod without limits that was consuming free capacity.

    kubectl delete pod -n stresstest server-1

    Observe the remaining free CPU on the Node dashboard. The majority of this capacity will now be unclaimed and unused, as the remaining limited Pods (server-3 and server-4) cannot use it due to their hard quotas.

    💡 Conclusion: The CPU capacity released by the bursting Pods (server-1 and server-2) is redistributed among other bursting Pods. The capacity released by the last bursting Pod (server-1) remains unused because the remaining limited Pods (server-3 and server-4) are forbidden from touching it, resulting in wasted, paid-for compute resources.


🗑️ Cleanup

To remove the Kubernetes cluster and clean up all deployed resources:

make clean

💻 Linux CPU Scheduler and Time Slices

The core mechanism governing how container workloads receive CPU time is the Linux kernel's Completely Fair Scheduler (CFS). The CFS is an implementation of a proportional share scheduler, meaning it aims to give every runnable task (thread) a "fair", equal share of CPU time unless weights are applied. This fairness is achieved by allocating CPU time in short intervals called time slices (quanta).

When a task is ready to run, the CFS places it into a red-black tree (a data structure used for efficient scheduling) and tracks its accumulated runtime, or vruntime (virtual runtime). The scheduler constantly picks the task with the lowest vruntime to run next. When a task runs for its allocated time slice, its vruntime increases, pushing it further down the priority list and allowing another task to run. This continuous, rapid switching gives the appearance of simultaneous execution.
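
As a rough mental model of that rule (not the real kernel code), the toy sketch below always runs the task with the lowest vruntime and advances vruntime more slowly for heavier-weighted tasks, so they end up with a proportionally larger share of CPU time:

```go
// Toy illustration of proportional-share scheduling by lowest vruntime,
// using a simple slice instead of the kernel's red-black tree.
package main

import (
	"fmt"
	"sort"
)

type task struct {
	name     string
	weight   float64 // higher weight == larger CPU share (like cpu.shares)
	vruntime float64 // virtual runtime accumulated so far
}

func main() {
	tasks := []*task{
		{name: "server-1", weight: 1.0},
		{name: "server-2", weight: 2.0}, // twice the weight -> roughly twice the CPU
	}
	const slice = 10.0 // milliseconds of real CPU time per scheduling decision

	ran := map[string]float64{}
	for i := 0; i < 100; i++ {
		// Always run the runnable task with the lowest vruntime.
		sort.Slice(tasks, func(a, b int) bool { return tasks[a].vruntime < tasks[b].vruntime })
		t := tasks[0]
		ran[t.name] += slice
		// vruntime advances more slowly for heavier tasks, so they get picked more often.
		t.vruntime += slice / t.weight
	}
	fmt.Println(ran) // expect server-2 to accumulate about twice the CPU time of server-1
}
```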

In the context of Kubernetes CPU Requests, the proportional share is modified: a higher CPU request translates into a greater scheduling weight (cpu.shares or cpu.weight), which effectively grants the container more frequent or longer time slices, ensuring it receives its guaranteed share of CPU time, especially under contention.
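
To make the weighting concrete, the sketch below approximates the conversion applied when a CPU request is written into the cgroup: millicores become cgroup v1 cpu.shares, which are then mapped onto cgroup v2 cpu.weight. Treat the exact constants as an approximation of the kubelet's behaviour, not a specification:

```go
// Rough sketch of how a CPU request is turned into a scheduling weight.
package main

import "fmt"

// milliCPUToShares converts a CPU request in millicores to cgroup v1 cpu.shares.
func milliCPUToShares(milliCPU int64) int64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		shares = 2 // minimum enforced by the kernel
	}
	return shares
}

// sharesToWeight maps cpu.shares [2..262144] onto cgroup v2 cpu.weight [1..10000].
func sharesToWeight(shares int64) int64 {
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, req := range []int64{100, 500, 1000, 2000} { // millicores
		s := milliCPUToShares(req)
		fmt.Printf("request=%dm -> cpu.shares=%d -> cpu.weight=%d\n", req, s, sharesToWeight(s))
	}
}
```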

On top of this time-slice mechanism, a CPU Limit caps the container's total CPU consumption (the sum of its time slices) within a given period (e.g., 100ms) at a hard quota. Once the quota is used up, the container is forcibly suspended until the next period begins, which is precisely what the cgroup.throttled_time_ns_delta metric measures.


⚖️ How CPU Limits Cause Throttling: The CFS Bandwidth Control

To explain how CPU Limits cause throttling, we need to look at the Linux kernel mechanism called CFS Bandwidth Control. This is a cgroup feature separate from the proportional sharing used for CPU Requests.

The CFS Bandwidth Parameters

A Kubernetes CPU Limit is enforced by the kernel using two specific parameters within the container's control group:

  • cpu.cfs_period_us (CFS Period): This is a fixed time window, usually set to 100,000 microseconds (100ms). This is the accounting window for CPU usage.
  • cpu.cfs_quota_us (CFS Quota): This is the total amount of CPU time (in microseconds) the container's processes are allowed to consume within the defined period.

The Kubelet (via the container runtime) translates your Kubernetes CPU Limit into this quota using a simple formula:

Quota = CPU Limit (in cores) × Period

Example: If you set a CPU Limit of 500m (0.5 CPU), the quota is calculated as:

Quota = 0.5 cores × 100,000µs (100ms) = 50,000µs (50ms)
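
The same arithmetic expressed as a small sketch (it mirrors the formula above, not the kubelet's actual code):

```go
// Minimal sketch of the limit-to-quota arithmetic described above.
package main

import "fmt"

const defaultPeriodUs = 100_000 // cpu.cfs_period_us: 100ms accounting window

// milliCPUToQuota converts a CPU limit in millicores into cpu.cfs_quota_us.
func milliCPUToQuota(milliCPU, periodUs int64) int64 {
	return milliCPU * periodUs / 1000
}

func main() {
	// A 500m limit gets 50,000µs (50ms) of CPU time per 100ms period.
	fmt.Println(milliCPUToQuota(500, defaultPeriodUs)) // 50000
	// A 2000m (2 CPU) limit gets 200,000µs per 100ms period, spread across cores.
	fmt.Println(milliCPUToQuota(2000, defaultPeriodUs)) // 200000
}
```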

The Throttling Mechanism

This quota creates a hard cap on the container's CPU usage, regardless of available capacity on the Node:

  • As soon as the container's processes run, they consume the 50ms quota for the current 100ms period.
  • If the container attempts to use more than 50ms of CPU time before the 100ms period is over, the kernel's scheduler will immediately suspend (throttle) all of its threads.
  • The threads are prevented from running, even if the Node has many idle CPU cores. They remain suspended until the start of the next 100ms period, at which point the quota is refilled, and the processes are allowed to run again.
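
This quota/period pair is also what a process can read back to report the cgroup.cpu_limit metric. A minimal sketch, assuming cgroup v2, where both values are exposed together in cpu.max (cgroup v1 exposes them separately as cpu.cfs_quota_us and cpu.cfs_period_us):

```go
// Sketch of deriving the CPU limit (in cores) from inside a cgroup v2 container.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cpuLimitCores reads cpu.max, which holds "<quota> <period>" or "max" when unlimited.
func cpuLimitCores() (float64, bool, error) {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		return 0, false, err
	}
	fields := strings.Fields(string(data))
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false, nil // no CPU limit configured
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	return quota / period, true, nil // e.g. 50000 / 100000 = 0.5 cores
}

func main() {
	cores, limited, err := cpuLimitCores()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if !limited {
		fmt.Println("no CPU limit set; the Pod may burst freely")
		return
	}
	fmt.Printf("cgroup.cpu_limit=%.2f cores\n", cores)
}
```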

This mechanism explains why the Request Only Pod (server-2) can burst and never throttles (since no quota is set), while the Limit Only (server-3) and Request equals Limit (server-4) Pods throttle the instant their quota is exhausted under heavy load, severely degrading their performance.

❓ When to use CPU limits?

❌ Setting Kubernetes CPU limits is generally not considered a best practice because they are enforced using the Linux kernel's CFS Bandwidth Control, which can lead to severe performance degradation through CPU throttling.

When a container hits its hard limit, it is forcibly suspended (throttled) until the next time period, even if the node has abundant idle CPU capacity. This artificial suspension significantly increases latency and reduces application throughput, which is measured by metrics like cgroup.throttled_time_ns_delta.

The recommended approach is to set CPU requests only, which guarantees a proportional share of CPU time during contention while allowing the workload to burst and utilize any available spare capacity on the node, maximizing resource efficiency.

CPU limits should only be set in specific, highly controlled scenarios:

  • Hard Multi-Tenancy/Isolation: In environments where strict fairness and hard isolation between tenants are paramount, a limit acts as a crucial guardrail to prevent any single workload from monopolizing resources and impacting others.
  • Cost Control/Optimization: When managing costs on expensive cloud instances, limits can ensure a group of applications collectively does not exceed a certain total usage threshold, making budgeting predictable.
  • Debugging Runaway Processes: A temporary limit can be used during debugging to contain a misbehaving or greedy application, preventing it from destabilizing the entire node by consuming all available CPU.
