26 changes: 25 additions & 1 deletion README.md
@@ -92,10 +92,14 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy

### MCP endpoint (experimental)

- Start a simple JSON-RPC server on stdin/stdout:
- Start a simple JSON-RPC server on stdin/stdout (default):
```bash
keep-gpu-mcp-server
```
- Or expose it over HTTP (JSON-RPC 2.0 by way of POST):
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
- Example request (one per line):
```json
{"id": 1, "method": "start_keep", "params": {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20}}
@@ -108,6 +112,26 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy
command: ["keep-gpu-mcp-server"]
adapter: stdio
```
- Minimal client config (HTTP MCP):
```yaml
servers:
keepgpu:
url: http://127.0.0.1:8765/
adapter: http
```
- Remote/SSH tunnel example (HTTP):
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
Client config (replace hostname/tunnel as needed):
```yaml
servers:
keepgpu:
url: http://gpu-box.example.com:8765/
adapter: http
```
For untrusted networks, put the server behind your own auth/reverse-proxy or
tunnel by way of SSH (for example, `ssh -L 8765:localhost:8765 gpu-box`).
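  A minimal Python sketch of driving the tunnelled endpoint with the standard library (the `rpc` helper below is illustrative, not part of KeepGPU; it assumes the server answers each POST with a JSON-RPC reply as in the examples above):
  ```python
  import json
  from urllib import request

  def rpc(method, params=None, url="http://127.0.0.1:8765/", req_id=1):
      """Illustrative helper: POST one JSON-RPC request and return the parsed reply."""
      payload = json.dumps({"id": req_id, "method": method, "params": params or {}}).encode()
      req = request.Request(url, data=payload, headers={"content-type": "application/json"})
      with request.urlopen(req, timeout=10) as resp:
          return json.load(resp)

  print(rpc("status"))  # with no job_id, lists active keep-alive jobs
  ```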

## Contributing

4 changes: 4 additions & 0 deletions docs/contributing.md
@@ -42,11 +42,15 @@ expectations so you can get productive quickly and avoid surprises in CI.
## MCP server (experimental)

- Start: `keep-gpu-mcp-server` (stdin/stdout JSON-RPC)
- HTTP option: `keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765`
- Methods: `start_keep`, `stop_keep`, `status`, `list_gpus`
- Example request (a one-shot Python smoke check is sketched after this list):
```json
{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}
```
- Remote tip: for shared clusters, prefer HTTP behind your own auth/reverse-proxy
or tunnel with SSH (`ssh -L 8765:localhost:8765 gpu-box`), then point your MCP
client at `http://127.0.0.1:8765/`.
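For a quick local check while developing, you can drive the stdio server from Python (a sketch, stdlib only; it assumes the server answers the line and exits once stdin closes, as the echo example suggests):

```python
import json
import subprocess

# One-shot smoke check against the stdio server.
request_line = json.dumps({"id": 1, "method": "list_gpus"}) + "\n"
result = subprocess.run(
    ["keep-gpu-mcp-server"],
    input=request_line,
    capture_output=True,
    text=True,
    timeout=30,
)
print(json.loads(result.stdout.splitlines()[0]))  # expect {"id": 1, "result": {...}}
```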

## Pull requests

59 changes: 17 additions & 42 deletions docs/getting-started.md
@@ -9,9 +9,10 @@ understand the minimum knobs you need to keep a GPU occupied.
- Python 3.9+ (matching the version in your environment/cluster image).
- Optional but recommended: `nvidia-smi` in `PATH` for utilization monitoring (CUDA) or `rocm-smi` if you install the `rocm` extra.

!!! warning "ROCm & multi-tenant clusters"
The current release focuses on CUDA devices. ROCm/AMD support is experimental;
controllers will raise `NotImplementedError` if CUDA is unavailable.
!!! info "Platforms"
CUDA is the primary path; ROCm is supported by way of the `rocm` extra
(requires a ROCm-enabled PyTorch build). CPU-only environments can import
the package but controllers will not start.

## Install

@@ -39,51 +40,19 @@ understand the minimum knobs you need to keep a GPU occupied.
pip install keep-gpu
```

## For contributors

- Install dev extras: `pip install -e ".[dev]"` (append `.[rocm]` if you need ROCm SMI).
- Fast CUDA checks: `pytest tests/cuda_controller tests/global_controller tests/utilities/test_platform_manager.py tests/test_cli_thresholds.py`
- ROCm-only tests are marked `rocm`; run with `pytest --run-rocm tests/rocm_controller`.

## MCP endpoint (experimental)

For automation clients that speak JSON-RPC (MCP-style), KeepGPU ships a tiny
stdin/stdout server:

```bash
keep-gpu-mcp-server
# each request is a single JSON line; example:
echo '{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}' | keep-gpu-mcp-server
```

Supported methods:
- `start_keep(gpu_ids?, vram?, interval?, busy_threshold?, job_id?)`
- `status(job_id?)`
- `stop_keep(job_id?)` (no job_id stops all)
- `list_gpus()` (basic info)

### Example MCP client config (stdio)

If your agent expects an MCP server definition, a minimal stdio config looks like:

```yaml
servers:
keepgpu:
description: "KeepGPU MCP server"
command: ["keep-gpu-mcp-server"]
adapter: stdio
```

Tools exposed: `start_keep`, `stop_keep`, `status`, `list_gpus`. Each request is
a single JSON line; see above for an example payload.

=== "Editable dev install"
```bash
git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .[dev]
```

## Pick your interface

- **CLI** – fastest way to reserve GPUs from a shell; see [CLI Playbook](guides/cli.md).
- **Python module** – embed keep-alive loops inside orchestration code; see [Python API Recipes](guides/python.md).
- **MCP server** – expose KeepGPU over JSON-RPC (stdio or HTTP) for agents; see [MCP Server](guides/mcp.md).

## Sanity check

1. Make sure PyTorch can see at least one device:
@@ -119,7 +88,8 @@ ready to hand the GPU back, hit `Ctrl+C`—controllers will release VRAM and exit

## KeepGPU inside Python

The CLI wraps the same controllers you can import directly:
Prefer code-level control? Import the controllers directly (full recipes in
[Python API Recipes](guides/python.md)):

```python
from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController
@@ -141,3 +111,8 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=60):

From here, jump to the CLI Playbook for scenario-driven guidance or the API
recipes if you need to embed KeepGPU in orchestration scripts.

## For contributors

Developing locally? See [Contributing](contributing.md) for dev install, test
commands (including CUDA/ROCm markers), and PR tips.
85 changes: 85 additions & 0 deletions docs/guides/mcp.md
@@ -0,0 +1,85 @@
# MCP Server

Expose KeepGPU as a minimal JSON-RPC server (MCP-style) so agents or remote
orchestrators can start/stop keep-alive jobs and inspect GPU state.

## When to use this

- You run KeepGPU from an agent (LangChain, custom orchestrator, etc.) instead of a shell.
- You want to keep GPUs alive on a remote box over TCP rather than stdio.
- You need a quick way to list GPU utilization/memory by way of the same interface.

## Quick start

=== "stdio (default)"
```bash
keep-gpu-mcp-server
```
Send one JSON request per line:
```bash
echo '{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}' | keep-gpu-mcp-server
```

=== "HTTP"
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
curl -X POST http://127.0.0.1:8765/ \
-H "content-type: application/json" \
-d '{"id":1,"method":"status"}'
```

Supported methods (a minimal Python round trip is sketched after the list):

- `start_keep(gpu_ids?, vram?, interval?, busy_threshold?, job_id?)`
- `stop_keep(job_id?)` (omit `job_id` to stop all)
- `status(job_id?)` (omit `job_id` to list active jobs)
- `list_gpus()` (detailed info by way of NVML/ROCm SMI/torch)
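
A stdio round trip sketched in Python (the `call` helper below is illustrative; it assumes one JSON request and one JSON response per line, matching the quick start above):

```python
import json
import subprocess

# Launch the stdio server and speak the one-JSON-line-per-message protocol.
proc = subprocess.Popen(
    ["keep-gpu-mcp-server"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def call(req_id, method, params=None):
    """Illustrative helper: write one request line, read one response line."""
    proc.stdin.write(json.dumps({"id": req_id, "method": method, "params": params or {}}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

started = call(1, "start_keep", {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20})
job_id = started["result"]["job_id"]
print(call(2, "status", {"job_id": job_id}))
print(call(3, "stop_keep", {"job_id": job_id}))

proc.stdin.close()
proc.wait()
```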

## Client configs (MCP-style)

=== "stdio adapter"
```yaml
servers:
keepgpu:
description: "KeepGPU MCP server"
command: ["keep-gpu-mcp-server"]
adapter: stdio
```

=== "HTTP adapter"
```yaml
servers:
keepgpu:
url: http://127.0.0.1:8765/
adapter: http
```

## Remote/cluster usage

- Run on the GPU host:
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
- Point your client at the host:
```yaml
servers:
keepgpu:
url: http://gpu-box.example.com:8765/
adapter: http
```
- If the network is untrusted, tunnel instead of exposing the port:
```bash
ssh -L 8765:localhost:8765 gpu-box.example.com
```
Then use `http://127.0.0.1:8765/` in your MCP config. For multi-user clusters,
consider fronting the service with your own auth/reverse-proxy.

## Responses you can expect

```json
{"id":1,"result":{"job_id":"<uuid>"}} # start_keep
{"id":2,"result":{"stopped":["<uuid>"]}} # stop_keep
{"id":3,"result":{"active":true,"job_id":"<uuid>","params":{"gpu_ids":[0]}}}
{"id":4,"result":{"active_jobs":[{"job_id":"<uuid>","params":{"gpu_ids":[0]}}]}}
{"id":5,"result":{"gpus":[{"id":0,"platform":"cuda","name":"A100","memory_total":...,"memory_used":...,"utilization":12}]}}
```
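
As an example of consuming these responses, a client might pick idle devices from `list_gpus` before calling `start_keep` (a sketch; field names follow the shapes above, and the threshold is only an illustration):

```python
def idle_gpu_ids(list_gpus_response, busy_threshold=20):
    """Return ids of GPUs whose reported utilization is at or below the threshold."""
    return [
        gpu["id"]
        for gpu in list_gpus_response["result"]["gpus"]
        if gpu.get("utilization", 0) <= busy_threshold
    ]

# e.g. idle_gpu_ids({"id": 5, "result": {"gpus": [{"id": 0, "utilization": 12}]}}) == [0]
```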
4 changes: 3 additions & 1 deletion docs/index.md
@@ -32,11 +32,13 @@ during longer CPU-bound sections of your workflow.
for pinning cards on clusters, workstations, or Jupyter.
- :material-code-tags: **[Python API Recipes](guides/python.md)** – Drop-in snippets
for wrapping preprocessing stages or orchestration scripts.
- :material-lan: **[MCP Server](guides/mcp.md)** – Expose KeepGPU by way of JSON-RPC
(stdio/HTTP) for agents and remote orchestration.
- :material-diagram-project: **[How KeepGPU Works](concepts/architecture.md)** –
Learn how controllers allocate VRAM and throttle themselves.
- :material-book-open-outline: **[Reference](reference/cli.md)** – Full option list
plus mkdocstrings API reference.

!!! tip "Prefer a fast skim?"
The left sidebar mirrors the lifecycle: overview → guides → concepts →
The left sidebar mirrors the lifecycle: overview → usage → concepts →
references. Jump straight to what you need; sections stand on their own.
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -15,9 +15,10 @@ nav:
- Overview:
- Welcome: index.md
- Getting Started: getting-started.md
- Guides:
- Usage:
- CLI Playbook: guides/cli.md
- Python API Recipes: guides/python.md
- MCP Server: guides/mcp.md
- Concepts:
- How KeepGPU Works: concepts/architecture.md
- Reference: