| title | Agents |
|---|---|
| description | Using popular agents and integrating your own |
This page explains how to evaluate existing agents and how to integrate your own. Integrating an agent is particularly useful for benchmarking it, optimizing its prompts, using it as a scaffold for RL, or using it to generate SFT datasets.
Harbor comes with most popular agents pre-integrated. You can run the following command and reference the --agent flag to see a list of all available agents:
harbor run --help
Right now, Harbor includes Terminus-2, Claude Code, Codex CLI, Gemini CLI, OpenHands, Mini-SWE-Agent, and more.
Most agents need API credentials to connect to a model provider. Harbor supports several ways to pass credentials, depending on the agent and provider.
Set the relevant API key in your shell before running harbor run. The agent picks it up automatically.
# Anthropic API (Claude Code)
export ANTHROPIC_API_KEY=sk-ant-...
harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6
# OpenAI API (Codex CLI)
export OPENAI_API_KEY=sk-...
harbor run -p ./task -a codex -m openai/o3
Use --ae to pass environment variables directly to the agent container without exporting them in your shell:
harbor run -p ./task -a claude-code \
--ae ANTHROPIC_API_KEY=sk-ant-...
This is useful for one-off runs or when you need different credentials per run. Variables passed via --ae are merged into the agent's environment and take effect inside the container.
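The merge described above can be pictured with a small sketch. This is not Harbor's implementation: `parse_env_flags` and `merge_agent_env` are hypothetical helpers, and the rule that per-run values win over values already present is an assumption made for illustration.

```python
def parse_env_flags(pairs: list[str]) -> dict[str, str]:
    """Turn ["KEY=VALUE", ...] into a dict, splitting on the first "=" only."""
    parsed: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        parsed[key] = value
    return parsed


def merge_agent_env(base: dict[str, str], pairs: list[str]) -> dict[str, str]:
    """Return a new dict; per-run values override the base (assumed precedence)."""
    return {**base, **parse_env_flags(pairs)}


if __name__ == "__main__":
    base_env = {"ANTHROPIC_API_KEY": "from-shell", "AWS_REGION": "us-east-1"}
    print(merge_agent_env(base_env, ["ANTHROPIC_API_KEY=per-run-key"]))
```

Splitting on the first "=" only keeps values that themselves contain "=" intact, which matters for some tokens and base64-encoded secrets.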
Some agents have built-in support for cloud providers that use credential files or token-based auth rather than simple API keys.
Claude Code detects Bedrock mode via the CLAUDE_CODE_USE_BEDROCK environment variable. Set your AWS credentials and region, then run:
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
# Option A: Standard AWS credential chain
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
# Option B: Bedrock API key auth
export AWS_BEARER_TOKEN_BEDROCK=...
harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6
The agent passes through AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_PROFILE, and AWS_REGION into the container automatically.
Optional variables:
- ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION: separate region for the small/fast model (Haiku)
- DISABLE_PROMPT_CACHING=1: disable prompt caching (not available in all Bedrock regions)
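As a rough illustration of the AWS passthrough described above, the sketch below copies only the listed variables from a host environment into a container environment. The function name `bedrock_container_env` is hypothetical, and treating the value "1" as the Bedrock switch is an assumption; the agent's real logic may differ.

```python
# Variable names taken from the passthrough list in the text above.
PASSTHROUGH_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_REGION",
)


def bedrock_container_env(host_env: dict[str, str]) -> dict[str, str]:
    """Build the container environment for Bedrock mode (illustrative only)."""
    # Assumption: Bedrock mode is on only when the flag is exactly "1".
    if host_env.get("CLAUDE_CODE_USE_BEDROCK") != "1":
        return {}
    env = {"CLAUDE_CODE_USE_BEDROCK": "1"}
    for name in PASSTHROUGH_VARS:
        value = host_env.get(name)
        if value:  # skip unset or empty variables
            env[name] = value
    return env


if __name__ == "__main__":
    host = {
        "CLAUDE_CODE_USE_BEDROCK": "1",
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "dev",
        "HOME": "/home/me",  # unrelated variables are not forwarded
    }
    print(bedrock_container_env(host))
```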
Claude Code detects Vertex AI mode via the CLAUDE_CODE_USE_VERTEX environment variable. Authentication uses Application Default Credentials (ADC).
First, ensure you have ADC configured on your host:
gcloud auth application-default login
Then run:
export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project
export CLOUD_ML_REGION=us-east5 # or "global" for automatic routing
harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6
The agent automatically locates your ADC credentials file (checking GOOGLE_APPLICATION_CREDENTIALS first, then the default ~/.config/gcloud/application_default_credentials.json), uploads it into the container, and sets GOOGLE_APPLICATION_CREDENTIALS to point to it. No manual volume mounting is required.
To use Claude Code with a custom endpoint (OpenRouter, self-hosted proxy, etc.):
export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=sk-or-...
harbor run -p ./task -a claude-code -m openrouter/anthropic/claude-sonnet-4-6
When ANTHROPIC_BASE_URL is set, all model aliases (Sonnet, Opus, Haiku, subagent) are pointed to the same model to avoid routing issues.
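One way to picture the alias pinning described above is the sketch below. It is an assumption-heavy illustration: `pin_model_aliases` is a hypothetical helper, and the choice of environment variables (the Claude Code model-alias variables) is a guess at how such pinning could be expressed, not a statement of what Harbor sets.

```python
# Assumed alias variables; the text only says the Sonnet, Opus, Haiku, and
# subagent aliases all end up pointing at the same model.
ALIAS_VARS = (
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "CLAUDE_CODE_SUBAGENT_MODEL",
)


def pin_model_aliases(env: dict[str, str], model: str) -> dict[str, str]:
    """Point every alias at `model` when a custom base URL is configured."""
    pinned = dict(env)  # never mutate the caller's mapping
    if not env.get("ANTHROPIC_BASE_URL"):
        return pinned  # default endpoint: leave aliases untouched
    for name in ALIAS_VARS:
        pinned[name] = model
    return pinned


if __name__ == "__main__":
    env = {"ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1"}
    for key, value in pin_model_aliases(env, "anthropic/claude-sonnet-4-6").items():
        print(f"{key}={value}")
```

Pinning all aliases matters because a custom endpoint may not recognize the default model names that would otherwise be requested for background or subagent calls.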
Harbor supports integrating your own agent without having to modify the Harbor source code.
There are two types of agents:
- External agents, which interface with the environment through the BaseEnvironment interface, typically by executing bash commands via the exec method.
- Installed agents, which are installed directly into the container environment and executed in headless mode. This is how most agents are integrated, and it has the advantage of letting the agent bring its own custom tools.
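To make the external-agent idea concrete, here is a toy sketch in which the agent lives outside the container and every action goes through an `exec` call on the environment. `FakeEnvironment`, `ExecResult`, and `ToyExternalAgent` are hypothetical stand-ins that do not use Harbor's classes, and the real `exec` signature and return type may differ.

```python
import asyncio
import shlex
from dataclasses import dataclass


@dataclass
class ExecResult:
    """Hypothetical result type; not Harbor's."""
    stdout: str
    return_code: int


class FakeEnvironment:
    """Stand-in for an environment exposing an async exec method."""

    def __init__(self) -> None:
        self.history: list[str] = []

    async def exec(self, command: str) -> ExecResult:
        # A real environment would run the command inside the container.
        self.history.append(command)
        return ExecResult(stdout=f"ran: {command}", return_code=0)


class ToyExternalAgent:
    """Drives the environment from outside by issuing shell commands."""

    async def run(self, instruction: str, environment: FakeEnvironment) -> list[str]:
        outputs: list[str] = []
        # A real agent would ask a model which command to run next.
        for command in ("ls", f"echo {shlex.quote(instruction)}"):
            result = await environment.exec(command)
            outputs.append(result.stdout)
        return outputs


if __name__ == "__main__":
    print(asyncio.run(ToyExternalAgent().run("fix the bug", FakeEnvironment())))
```

An installed agent inverts this arrangement: the agent binary runs inside the container, and the harness only launches it and collects its output afterwards.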
To build an external agent, you need to implement the BaseAgent interface, which involves defining the following methods:
from harbor.agents.base import BaseAgent

class MyExternalAgent(BaseAgent):
    @staticmethod
    def name() -> str:
        """The name of the agent."""
        pass

    def version(self) -> str | None:
        """The version of the agent."""
        pass

    async def setup(self, environment: BaseEnvironment) -> None:
        """
        Run commands to set up the agent and its tools.
        """
        pass

    async def run(
        self,
        instruction: str,
        environment: BaseEnvironment,
        context: AgentContext,
    ) -> None:
        """
        Runs the agent in the environment. Be sure to populate the context with the
        results of the agent execution. Ideally, populate the context as the agent
        executes in case of a timeout or other error.

        Args:
            instruction: The task instruction.
            environment: The environment in which to complete the task.
            context: The context to populate with the results of the agent execution.
        """
        pass
To build an installed agent, you need to implement the BaseInstalledAgent interface, which involves defining the following methods:
from pathlib import Path

from pydantic import BaseModel

from harbor.agents.installed.base import BaseInstalledAgent

class ExecInput(BaseModel):
    command: str
    cwd: str | None = None
    env: dict[str, str] | None = None
    timeout_sec: int | None = None

class MyInstalledAgent(BaseInstalledAgent):
    @property
    def _install_agent_template_path(self) -> Path:
        """
        Path to the jinja template script for installing the agent in the container.
        """
        pass

    def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
        """
        Create the commands to run the agent in the container. Usually this is a single
        command that passes the instruction to the agent and executes it in headless
        mode.
        """
        pass

    def populate_context_post_run(self, context: AgentContext) -> None:
        """
        Populate the context with the results of the agent execution. Assumes the run()
        method has already been called. Typically involves parsing a trajectory file.
        """
        pass
To run a custom agent, you can use the following command:
harbor run -d "<dataset@version>" --agent-import-path path.to.agent:SomeAgent
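The value given to --agent-import-path has the form module.path:ClassName. The sketch below shows one common way such a string can be resolved with the standard library. `load_agent_class` is a hypothetical helper, and Harbor's own resolution logic may differ.

```python
import importlib


def load_agent_class(import_path: str) -> type:
    """Resolve "package.module:ClassName" to the class object it names."""
    module_name, sep, class_name = import_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {import_path!r}")
    module = importlib.import_module(module_name)  # raises ImportError if missing
    obj = getattr(module, class_name)  # raises AttributeError if missing
    if not isinstance(obj, type):
        raise TypeError(f"{import_path!r} does not name a class")
    return obj


if __name__ == "__main__":
    # A standard-library class stands in for a custom agent class here.
    print(load_agent_class("collections:OrderedDict"))
```

Because the module is imported by name, it has to be importable from where the command runs, for example by installing your package or running from a directory on the Python path.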