26 changes: 25 additions & 1 deletion README.md
@@ -92,10 +92,14 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy

### MCP endpoint (experimental)

- Start a simple JSON-RPC server on stdin/stdout:
- Start a simple JSON-RPC server on stdin/stdout (default):
```bash
keep-gpu-mcp-server
```
- Or expose it over HTTP (JSON-RPC 2.0 by way of POST):
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
- Example request (one per line):
```json
{"id": 1, "method": "start_keep", "params": {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20}}
@@ -108,6 +112,26 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=90, busy
command: ["keep-gpu-mcp-server"]
adapter: stdio
```
- Minimal client config (HTTP MCP):
```yaml
servers:
keepgpu:
url: http://127.0.0.1:8765/
adapter: http
```
- Remote/SSH tunnel example (HTTP):
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
Client config (replace hostname/tunnel as needed):
```yaml
servers:
keepgpu:
url: http://gpu-box.example.com:8765/
adapter: http
```
For untrusted networks, put the server behind your own auth/reverse-proxy or
tunnel by way of SSH (for example, `ssh -L 8765:localhost:8765 gpu-box`).
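  A minimal Python sketch of driving the tunnelled endpoint with the standard library (the `rpc` helper below is illustrative, not part of KeepGPU; it assumes the server answers each POST with a JSON-RPC reply as in the examples above):
  ```python
  import json
  from urllib import request

  def rpc(method, params=None, url="http://127.0.0.1:8765/", req_id=1):
      """Illustrative helper: POST one JSON-RPC request and return the parsed reply."""
      payload = json.dumps({"id": req_id, "method": method, "params": params or {}}).encode()
      req = request.Request(url, data=payload, headers={"content-type": "application/json"})
      with request.urlopen(req, timeout=10) as resp:
          return json.load(resp)

  print(rpc("status"))  # with no job_id, lists active keep-alive jobs
  ```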

## Contributing

4 changes: 4 additions & 0 deletions docs/contributing.md
@@ -42,11 +42,15 @@ expectations so you can get productive quickly and avoid surprises in CI.
## MCP server (experimental)

- Start: `keep-gpu-mcp-server` (stdin/stdout JSON-RPC)
- HTTP option: `keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765`
- Methods: `start_keep`, `stop_keep`, `status`, `list_gpus`
- Example request (a one-shot Python smoke check is sketched after this list):
```json
{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}
```
- Remote tip: for shared clusters, prefer HTTP behind your own auth/reverse-proxy
or tunnel with SSH (`ssh -L 8765:localhost:8765 gpu-box`), then point your MCP
client at `http://127.0.0.1:8765/`.
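For a quick local check while developing, you can drive the stdio server from Python (a sketch, stdlib only; it assumes the server answers the line and exits once stdin closes, as the echo example suggests):

```python
import json
import subprocess

# One-shot smoke check against the stdio server.
request_line = json.dumps({"id": 1, "method": "list_gpus"}) + "\n"
result = subprocess.run(
    ["keep-gpu-mcp-server"],
    input=request_line,
    capture_output=True,
    text=True,
    timeout=30,
)
print(json.loads(result.stdout.splitlines()[0]))  # expect {"id": 1, "result": {...}}
```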

## Pull requests

59 changes: 17 additions & 42 deletions docs/getting-started.md
@@ -9,9 +9,10 @@ understand the minimum knobs you need to keep a GPU occupied.
- Python 3.9+ (matching the version in your environment/cluster image).
- Optional but recommended: `nvidia-smi` in `PATH` for utilization monitoring (CUDA) or `rocm-smi` if you install the `rocm` extra.

!!! warning "ROCm & multi-tenant clusters"
The current release focuses on CUDA devices. ROCm/AMD support is experimental;
controllers will raise `NotImplementedError` if CUDA is unavailable.
!!! info "Platforms"
CUDA is the primary path; ROCm is supported by way of the `rocm` extra
(requires a ROCm-enabled PyTorch build). CPU-only environments can import
the package but controllers will not start.

## Install

@@ -39,51 +40,19 @@ understand the minimum knobs you need to keep a GPU occupied.
pip install keep-gpu
```

## For contributors

- Install dev extras: `pip install -e ".[dev]"` (append `.[rocm]` if you need ROCm SMI).
- Fast CUDA checks: `pytest tests/cuda_controller tests/global_controller tests/utilities/test_platform_manager.py tests/test_cli_thresholds.py`
- ROCm-only tests are marked `rocm`; run with `pytest --run-rocm tests/rocm_controller`.

## MCP endpoint (experimental)

For automation clients that speak JSON-RPC (MCP-style), KeepGPU ships a tiny
stdin/stdout server:

```bash
keep-gpu-mcp-server
# each request is a single JSON line; example:
echo '{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}' | keep-gpu-mcp-server
```

Supported methods:
- `start_keep(gpu_ids?, vram?, interval?, busy_threshold?, job_id?)`
- `status(job_id?)`
- `stop_keep(job_id?)` (no job_id stops all)
- `list_gpus()` (basic info)

### Example MCP client config (stdio)

If your agent expects an MCP server definition, a minimal stdio config looks like:

```yaml
servers:
keepgpu:
description: "KeepGPU MCP server"
command: ["keep-gpu-mcp-server"]
adapter: stdio
```

Tools exposed: `start_keep`, `stop_keep`, `status`, `list_gpus`. Each request is
a single JSON line; see above for an example payload.

=== "Editable dev install"
```bash
git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .[dev]
```

## Pick your interface

- **CLI** – fastest way to reserve GPUs from a shell; see [CLI Playbook](guides/cli.md).
- **Python module** – embed keep-alive loops inside orchestration code; see [Python API Recipes](guides/python.md).
- **MCP server** – expose KeepGPU over JSON-RPC (stdio or HTTP) for agents; see [MCP Server](guides/mcp.md).

## Sanity check

1. Make sure PyTorch can see at least one device:
@@ -119,7 +88,8 @@ ready to hand the GPU back, hit `Ctrl+C`—controllers will release VRAM and exit

## KeepGPU inside Python

The CLI wraps the same controllers you can import directly:
Prefer code-level control? Import the controllers directly (full recipes in
[Python API Recipes](guides/python.md)):

```python
from keep_gpu.single_gpu_controller.cuda_gpu_controller import CudaGPUController
@@ -141,3 +111,8 @@ with GlobalGPUController(gpu_ids=[0, 1], vram_to_keep="750MB", interval=60):

From here, jump to the CLI Playbook for scenario-driven guidance or the API
recipes if you need to embed KeepGPU in orchestration scripts.

## For contributors

Developing locally? See [Contributing](contributing.md) for dev install, test
commands (including CUDA/ROCm markers), and PR tips.
85 changes: 85 additions & 0 deletions docs/guides/mcp.md
@@ -0,0 +1,85 @@
# MCP Server

Expose KeepGPU as a minimal JSON-RPC server (MCP-style) so agents or remote
orchestrators can start/stop keep-alive jobs and inspect GPU state.

## When to use this

- You run KeepGPU from an agent (LangChain, custom orchestrator, etc.) instead of a shell.
- You want to keep GPUs alive on a remote box over TCP rather than stdio.
- You need a quick way to list GPU utilization/memory by way of the same interface.

## Quick start

=== "stdio (default)"
```bash
keep-gpu-mcp-server
```
Send one JSON request per line:
```bash
echo '{"id":1,"method":"start_keep","params":{"gpu_ids":[0],"vram":"512MB","interval":60,"busy_threshold":20}}' | keep-gpu-mcp-server
```

=== "HTTP"
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
curl -X POST http://127.0.0.1:8765/ \
-H "content-type: application/json" \
-d '{"id":1,"method":"status"}'
```

Supported methods (a minimal Python round trip is sketched after the list):

- `start_keep(gpu_ids?, vram?, interval?, busy_threshold?, job_id?)`
- `stop_keep(job_id?)` (omit `job_id` to stop all)
- `status(job_id?)` (omit `job_id` to list active jobs)
- `list_gpus()` (detailed info by way of NVML/ROCm SMI/torch)
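
A stdio round trip sketched in Python (the `call` helper below is illustrative; it assumes one JSON request and one JSON response per line, matching the quick start above):

```python
import json
import subprocess

# Launch the stdio server and speak the one-JSON-line-per-message protocol.
proc = subprocess.Popen(
    ["keep-gpu-mcp-server"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def call(req_id, method, params=None):
    """Illustrative helper: write one request line, read one response line."""
    proc.stdin.write(json.dumps({"id": req_id, "method": method, "params": params or {}}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

started = call(1, "start_keep", {"gpu_ids": [0], "vram": "512MB", "interval": 60, "busy_threshold": 20})
job_id = started["result"]["job_id"]
print(call(2, "status", {"job_id": job_id}))
print(call(3, "stop_keep", {"job_id": job_id}))

proc.stdin.close()
proc.wait()
```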

## Client configs (MCP-style)

=== "stdio adapter"
```yaml
servers:
keepgpu:
description: "KeepGPU MCP server"
command: ["keep-gpu-mcp-server"]
adapter: stdio
```

=== "HTTP adapter"
```yaml
servers:
keepgpu:
url: http://127.0.0.1:8765/
adapter: http
```

## Remote/cluster usage

- Run on the GPU host:
```bash
keep-gpu-mcp-server --mode http --host 0.0.0.0 --port 8765
```
- Point your client at the host:
```yaml
servers:
keepgpu:
url: http://gpu-box.example.com:8765/
adapter: http
```
- If the network is untrusted, tunnel instead of exposing the port:
```bash
ssh -L 8765:localhost:8765 gpu-box.example.com
```
Then use `http://127.0.0.1:8765/` in your MCP config. For multi-user clusters,
consider fronting the service with your own auth/reverse-proxy.

## Responses you can expect

```json
{"id":1,"result":{"job_id":"<uuid>"}} # start_keep
{"id":2,"result":{"stopped":["<uuid>"]}} # stop_keep
{"id":3,"result":{"active":true,"job_id":"<uuid>","params":{"gpu_ids":[0]}}}
{"id":4,"result":{"active_jobs":[{"job_id":"<uuid>","params":{"gpu_ids":[0]}}]}}
{"id":5,"result":{"gpus":[{"id":0,"platform":"cuda","name":"A100","memory_total":...,"memory_used":...,"utilization":12}]}}
```
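
As an example of consuming these responses, a client might pick idle devices from `list_gpus` before calling `start_keep` (a sketch; field names follow the shapes above, and the threshold is only an illustration):

```python
def idle_gpu_ids(list_gpus_response, busy_threshold=20):
    """Return ids of GPUs whose reported utilization is at or below the threshold."""
    return [
        gpu["id"]
        for gpu in list_gpus_response["result"]["gpus"]
        if gpu.get("utilization", 0) <= busy_threshold
    ]

# e.g. idle_gpu_ids({"id": 5, "result": {"gpus": [{"id": 0, "utilization": 12}]}}) == [0]
```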
4 changes: 3 additions & 1 deletion docs/index.md
@@ -32,11 +32,13 @@ during longer CPU-bound sections of your workflow.
for pinning cards on clusters, workstations, or Jupyter.
- :material-code-tags: **[Python API Recipes](guides/python.md)** – Drop-in snippets
for wrapping preprocessing stages or orchestration scripts.
- :material-lan: **[MCP Server](guides/mcp.md)** – Expose KeepGPU by way of JSON-RPC
(stdio/HTTP) for agents and remote orchestration.
- :material-diagram-project: **[How KeepGPU Works](concepts/architecture.md)** –
Learn how controllers allocate VRAM and throttle themselves.
- :material-book-open-outline: **[Reference](reference/cli.md)** – Full option list
plus mkdocstrings API reference.

!!! tip "Prefer a fast skim?"
The left sidebar mirrors the lifecycle: overview → guides → concepts →
The left sidebar mirrors the lifecycle: overview → usage → concepts →
references. Jump straight to what you need; sections stand on their own.
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -15,9 +15,10 @@ nav:
- Overview:
- Welcome: index.md
- Getting Started: getting-started.md
- Guides:
- Usage:
- CLI Playbook: guides/cli.md
- Python API Recipes: guides/python.md
- MCP Server: guides/mcp.md
- Concepts:
- How KeepGPU Works: concepts/architecture.md
- Reference: