---
title: "Testing Environments"
description: "Test scenarios, tools, and environment logic locally"
icon: "flask-vial"
---

Before deploying, test locally. See [Sandboxing](/guides/sandboxing) for Docker vs no-Docker patterns.

## Local Testing

| Environment | `local_test.py` |
|-------------|-----------------|
| No Docker | `from env import env` |
| Docker | `env.connect_url("http://localhost:8765/mcp")` |

Both use the same API after setup:

```python
async with env:
    tools = env.as_tools()  # List available tools
    result = await env.call_tool("my_tool", arg="val")  # Call a tool
```
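
Putting it together, a minimal `local_test.py` for the no-Docker row above might look like the sketch below; the tool name and argument are placeholders, and the Docker variant swaps the import for the `connect_url` call shown in the table:

```python
# local_test.py: minimal sketch (no-Docker case)
import asyncio

from env import env  # Docker setups connect via env.connect_url("http://localhost:8765/mcp") instead

async def main():
    async with env:
        print(env.as_tools())                               # list available tools
        result = await env.call_tool("my_tool", arg="val")  # placeholder tool + argument
        print(result)

asyncio.run(main())
```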

## Testing Scenarios Directly

Scenarios are async generators. `hud.eval()` drives them automatically, but you can test the logic directly—this is exactly what runs at the start and end of `hud.eval()`:

```python
async def checkout(user_id: str, amount: int = 100):
    # Setup + prompt (first yield) — runs at hud.eval() start
    answer = yield f"Complete checkout for {user_id}, ${amount}"

    # Evaluation (second yield) — runs after agent submits
    yield 1.0 if "success" in answer.lower() else 0.0

async def test():
    gen = checkout("alice", 50)
    prompt = await anext(gen)             # What hud.eval() does at start
    reward = await gen.asend("Success!")  # What hud.eval() does after submit
    assert reward == 1.0
```
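
It's worth exercising the failure path the same way. A sketch that reuses the `checkout` scenario above with an answer that should not earn the reward:

```python
import asyncio

async def test_failure():
    gen = checkout("alice", 50)
    await anext(gen)                            # consume the prompt
    reward = await gen.asend("Cart abandoned")  # no "success" in the answer
    assert reward == 0.0

asyncio.run(test_failure())
```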

If your scenario tests pass, `hud.eval()` will behave identically.

## Mocking

`env.mock()` intercepts at the tool layer—agents only see tools:

```python
env.mock()  # All tools return fake responses
env.mock_tool("send_email", {"status": "sent"})

# Check mock state
assert env.is_mock
```
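
Mocked tools let you exercise agent-facing flows without side effects. A short sketch, assuming the mocked tool returns its configured payload from `call_tool` (the recipient argument is a placeholder):

```python
async def test_email_flow():
    env.mock_tool("send_email", {"status": "sent"})
    async with env:
        result = await env.call_tool("send_email", to="alice@example.com")
        assert result == {"status": "sent"}  # assumption: the mock echoes the configured response
```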

## Hot-Reload

For Docker environments, `hud dev -w path` reloads Python on save:

```bash
hud dev -w scenarios -w tools --port 8765
```

System services (postgres, VNC, browsers) persist across reloads.
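
With the dev server running, the Docker row of the Local Testing table points `local_test.py` at the same port:

```python
env.connect_url("http://localhost:8765/mcp")  # matches --port 8765 above
```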

## Debugging Build Failures

`hud build` runs the exact same pipeline as **New → Environment** on [hud.ai](https://hud.ai)—so if it passes locally, it'll work in production. If the build fails or the container crashes on startup, use `hud debug` to run a 5-phase compliance test:

```bash
hud debug my-env:latest
```

Output shows exactly which phase failed:
```
✓ Phase 1: Docker image exists
✓ Phase 2: MCP server responds to initialize
✗ Phase 3: Tool discovery failed
  → Error: Connection refused on port 8005
  → Hint: Backend service may not be starting
```

You can also debug a directory (builds first) or stop at a specific phase:

```bash
hud debug .                  # Build and debug current directory
hud debug . --max-phase 3    # Stop after phase 3
hud debug --config mcp.json  # Debug from config file
```

## Useful Environment Properties

```python
# Check parallelization (for running multiple evals)
env.is_parallelizable  # True if all connections are remote

# List what's connected
env.connections   # Dict of connection names → connectors
env.is_connected  # True if in async context

# Resources and prompts (beyond tools)
await env.list_resources()  # MCP resources
await env.list_prompts()    # MCP prompts
```
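
For example, `is_parallelizable` can decide whether evals fan out concurrently. A sketch, where `run_one` stands in for whatever launches a single eval in your setup:

```python
import asyncio

async def run_all(evals):
    if env.is_parallelizable:      # all connections are remote: safe to fan out
        await asyncio.gather(*(run_one(e) for e in evals))
    else:                          # a local connection is in the mix: run sequentially
        for e in evals:
            await run_one(e)
```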