KeepGPU keeps shared GPUs from being reclaimed while you prep data, debug, or coordinate multi-stage pipelines. It allocates just enough VRAM and issues lightweight CUDA work so schedulers observe an “active” device, without running a full training job.
- 🧾 License: MIT
- 📚 Docs: https://keepgpu.readthedocs.io
On many clusters, idle GPUs are reaped or silently shared after a short grace period. The cost of losing your reservation (or discovering another job has taken your card) can dwarf the cost of a tiny keep-alive loop. KeepGPU is a minimal, auditable guardrail:
- Predictable – Single-purpose controller with explicit resource knobs (VRAM size, interval, utilization backoff).
- Polite – Uses NVML to read utilization and backs off when the GPU is busy.
- Portable – Typer/Rich CLI for humans; Python API for orchestrators and notebooks.
- Observable – Structured logging and optional file logs for auditing what kept the GPU alive.
- Power-aware – Uses interval-paced elementwise ops instead of heavy matmul floods to present “busy” utilization while keeping power and thermals lower (see `CudaGPUController._run_mat_batch` for the loop; a simplified sketch of the pattern follows this list).
- NVML-backed – GPU telemetry comes from `nvidia-ml-py` (the `pynvml` module), with optional `rocm-smi` support when you install the `rocm` extra.
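For intuition, here is a minimal sketch of that pattern written directly against `pynvml` and `torch`. It is an illustration of the idea, not KeepGPU's internal `_run_mat_batch` loop, and the buffer size, threshold, and interval are placeholder values:

```python
import time

import pynvml
import torch

GPU_ID = 0
BUSY_THRESHOLD = 25   # percent; skip the burst above this utilization
INTERVAL = 60         # seconds between keep-alive bursts

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(GPU_ID)

# Pin a modest buffer so the allocation is visible to the scheduler / nvidia-smi.
buf = torch.empty(256 * 1024 * 1024 // 4, dtype=torch.float32, device=f"cuda:{GPU_ID}")

try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        if util < BUSY_THRESHOLD:
            buf.mul_(1.0)                   # cheap elementwise kernel, not a matmul flood
            torch.cuda.synchronize(GPU_ID)  # make sure the kernel actually executed
        time.sleep(INTERVAL)
finally:
    pynvml.nvmlShutdown()
```

The real controllers layer the lifecycle, structured logging, and backoff behaviour described above on top of this kind of loop.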
```bash
pip install keep-gpu
```
```bash
# Hold GPU 0 with 1 GiB VRAM and throttle if utilization exceeds 25%
keep-gpu --gpu-ids 0 --vram 1GiB --busy-threshold 25 --interval 60
```

- CUDA (example: cu121)
```bash
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu
```
- ROCm (example: rocm6.1)
```bash
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install keep-gpu[rocm]
```
- CPU-only
```bash
pip install torch
pip install keep-gpu
```
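For the CUDA or ROCm variants, a quick sanity check with plain `torch` (nothing KeepGPU-specific) confirms the device is visible before you rely on the keep-alive; ROCm builds of PyTorch report through the same `torch.cuda` namespace:

```python
import torch

# The matching torch build should see at least one accelerator.
print("Accelerator available:", torch.cuda.is_available())
print("Visible devices:", torch.cuda.device_count())
```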
Flags that matter:
- `--vram` (`1GiB`, `750MB`, or bytes): how much memory to pin.
- `--interval` (seconds): sleep between keep-alive bursts.
- `--busy-threshold`: skip work when NVML reports higher utilization.
- `--gpu-ids`: target a subset; otherwise all visible GPUs are guarded.
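If you launch the CLI from an orchestration script rather than an interactive shell, a thin `subprocess` wrapper over the same flags is enough. A sketch, assuming the CLI keeps running (and the GPUs guarded) until it is stopped; `prepare_inputs` is a placeholder for your own code:

```python
import subprocess

# Guard GPU 0 while CPU-side preparation runs, then release it.
proc = subprocess.Popen([
    "keep-gpu",
    "--gpu-ids", "0",
    "--vram", "1GiB",
    "--interval", "60",
    "--busy-threshold", "25",
])

try:
    prepare_inputs()   # placeholder: your CPU-heavy preparation step
finally:
    proc.terminate()   # stop guarding before the real GPU job starts
    proc.wait()
```

For in-process use, the Python API below is the more direct route.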
```python
from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController

with CudaGPUController(rank=0, interval=0.5, vram_to_keep="1GiB", busy_threshold=20):
    preprocess_dataset()  # GPU is marked busy while you run CPU-heavy code

train_model()  # GPU freed after exiting the context
```

Need multiple devices?
```python
from keep_gpu.global_gpu_controller.global_gpu_controller import GlobalGPUController

with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy_threshold=30):
    run_pipeline_stage()
```

- Battle-tested keep-alive loop built on PyTorch.
- NVML-based utilization monitoring (by way of `nvidia-ml-py`) to avoid hogging busy GPUs; optional ROCm SMI support by way of `pip install keep-gpu[rocm]`.
- CLI + API parity: same controllers power both code paths.
- Continuous docs + CI: mkdocs + mkdocstrings build in CI to keep guidance up to date.
- Install dev extras: `pip install -e ".[dev]"` (add `.[rocm]` if you need ROCm SMI).
- Fast CUDA checks: `pytest tests/cuda_controller tests/global_controller tests/utilities/test_platform_manager.py tests/test_cli_thresholds.py`
- ROCm-only tests carry `@pytest.mark.rocm`; run them with `pytest --run-rocm tests/rocm_controller`.
- Markers: `rocm` (needs ROCm stack) and `large_memory` (opt-in locally).
- Start a simple JSON-RPC server on stdin/stdout (default):

  ```bash
  keep-gpu-mcp-server
  ```

- Or expose it over HTTP (JSON-RPC 2.0 by way of POST):

  ```bash
  keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
  ```

- Example request (one per line):

  ```json
  {"id": 1, "method": "start_keep", "params": {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20}}
  ```

- Methods: `start_keep`, `stop_keep` (optional `job_id`, default stops all), `status` (optional `job_id`), `list_gpus` (basic info). A Python client sketch follows this list.
- Minimal client config (stdio MCP):
  ```yaml
  servers:
    keepgpu:
      command: ["keep-gpu-mcp-server"]
      adapter: stdio
  ```
- Minimal client config (HTTP MCP):
  ```yaml
  servers:
    keepgpu:
      url: http://127.0.0.1:8765/
      adapter: http
  ```
- Remote/SSH tunnel example (HTTP):

  Start the server on the GPU host:

  ```bash
  keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
  ```

  Client config (replace hostname/tunnel as needed):

  ```yaml
  servers:
    keepgpu:
      url: http://gpu-box.example.com:8765/
      adapter: http
  ```

  For untrusted networks, put the server behind your own auth/reverse-proxy or tunnel by way of SSH (for example, `ssh -L 8765:localhost:8765 gpu-box`).
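To script the stdio server from Python, a small newline-delimited JSON-RPC client is enough. A sketch under assumptions: requests and responses are one JSON object per line (as in the example request above), and the response shape is whatever the server returns:

```python
import itertools
import json
import subprocess

# Drive the stdio JSON-RPC server: one JSON object per line in, one per line out (assumed).
proc = subprocess.Popen(
    ["keep-gpu-mcp-server"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
_ids = itertools.count(1)

def call(method, params=None):
    request = {"id": next(_ids), "method": method, "params": params or {}}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

print(call("start_keep", {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20}))
print(call("status"))     # optionally pass {"job_id": ...}, per the method list above
print(call("stop_keep"))  # no job_id: stops all jobs

proc.stdin.close()
proc.wait()
```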
Contributions are welcome—especially around ROCm support, platform fallbacks, and scheduler-specific recipes. Open an issue or PR if you hit edge cases on your cluster. See docs/contributing.md for dev setup, test commands, and PR tips.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
If you find KeepGPU useful in your research or work, please cite it as:
```bibtex
@software{Wangmerlyn_KeepGPU_2025,
  author    = {Wang, Siyuan and Shi, Yaorui and Liu, Yida and Yin, Yuqi},
  title     = {KeepGPU: a simple CLI app that keeps your GPUs running},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17129114},
  url       = {https://github.com/Wangmerlyn/KeepGPU},
  note      = {GitHub repository},
  keywords  = {ai, hpc, gpu, cluster, cuda, torch, debug}
}
```