title Agents
description Using popular agents and integrating your own

This page covers how to run evaluations with existing agents and how to integrate your own. Both are useful for benchmarking your agent, optimizing its prompts, using it as a scaffold for RL, or using it to generate SFT datasets.

Existing agents

Harbor comes with most popular agents pre-integrated. To see the full list of available agents, run the following command and look at the values accepted by the --agent flag:

harbor run --help

Right now, Harbor includes Terminus-2, Claude Code, Codex CLI, Gemini CLI, OpenHands, Mini-SWE-Agent, and more.

Authentication

Most agents need API credentials to connect to a model provider. Harbor supports several ways to pass credentials, depending on the agent and provider.

Environment variables

Set the relevant API key in your shell before running harbor run. The agent picks it up automatically.

# Anthropic API (Claude Code)
export ANTHROPIC_API_KEY=sk-ant-...
harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6

# OpenAI API (Codex CLI)
export OPENAI_API_KEY=sk-...
harbor run -p ./task -a codex -m openai/o3

Agent environment overrides (--ae)

Use --ae to pass environment variables directly to the agent container without exporting them in your shell:

harbor run -p ./task -a claude-code \
  --ae ANTHROPIC_API_KEY=sk-ant-...

This is useful for one-off runs or when you need different credentials per run. Variables passed via --ae are merged into the agent's environment and take effect inside the container.
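When runs are launched from a script, the per-run credentials described above can be assembled programmatically. The following is a minimal sketch, not part of Harbor: `build_harbor_command` is a hypothetical helper, it only uses the flags shown on this page (`-p`, `-a`, `-m`, `--ae`), and it assumes that `--ae` can be repeated once per variable.

```python
from __future__ import annotations

import shlex


def build_harbor_command(
    task_path: str,
    agent: str,
    model: str | None = None,
    agent_env: dict[str, str] | None = None,
) -> list[str]:
    """Assemble an argv list for `harbor run` from the flags shown on this page."""
    argv = ["harbor", "run", "-p", task_path, "-a", agent]
    if model is not None:
        argv += ["-m", model]
    # Assumption: one --ae flag per KEY=VALUE pair.
    for key, value in (agent_env or {}).items():
        argv += ["--ae", f"{key}={value}"]
    return argv


cmd = build_harbor_command(
    "./task",
    "claude-code",
    model="anthropic/claude-sonnet-4-6",
    agent_env={"ANTHROPIC_API_KEY": "sk-ant-..."},
)
# shlex.join renders the argv list as a copy-pasteable shell command.
print(shlex.join(cmd))
```

Building an argv list rather than a single string lets you hand it to `subprocess.run(cmd)` without a shell, so a key containing special characters is passed through unchanged.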

Provider-specific configuration

Some agents have built-in support for cloud providers that use credential files or token-based auth rather than simple API keys.

Claude Code with AWS Bedrock

Claude Code detects Bedrock mode via the CLAUDE_CODE_USE_BEDROCK environment variable. Set your AWS credentials and region, then run:

export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1

# Option A: Standard AWS credential chain
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...

# Option B: Bedrock API key auth
export AWS_BEARER_TOKEN_BEDROCK=...

harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6

The agent passes through AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_PROFILE, and AWS_REGION into the container automatically.
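A missing variable often only shows up as an authentication error inside the container, so it can help to check the host environment before launching. The sketch below is a hypothetical pre-flight check, not something Harbor provides. It covers only Option A and Option B from the example above and does not inspect other credential sources such as AWS_PROFILE.

```python
from __future__ import annotations


def missing_bedrock_settings(env: dict[str, str]) -> list[str]:
    """Return a list of problems with a Bedrock environment, or [] if it looks complete."""
    problems: list[str] = []
    if env.get("CLAUDE_CODE_USE_BEDROCK") != "1":
        problems.append("CLAUDE_CODE_USE_BEDROCK must be set to 1")
    if not env.get("AWS_REGION"):
        problems.append("AWS_REGION is not set")
    # Option A: access key pair. Option B: Bedrock bearer token.
    has_key_pair = bool(env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"))
    has_bearer_token = bool(env.get("AWS_BEARER_TOKEN_BEDROCK"))
    if not (has_key_pair or has_bearer_token):
        problems.append(
            "set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_BEARER_TOKEN_BEDROCK"
        )
    return problems


example_env = {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "us-east-1",
    "AWS_BEARER_TOKEN_BEDROCK": "example-token",
}
print(missing_bedrock_settings(example_env))
```

In a real script you would pass `dict(os.environ)` instead of `example_env` and abort the run if the returned list is non-empty.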

Optional variables:

  • ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION: separate region for the small/fast model (Haiku)
  • DISABLE_PROMPT_CACHING=1: disable prompt caching, which is not available in all Bedrock regions

Claude Code with Google Cloud Vertex AI

Claude Code detects Vertex AI mode via the CLAUDE_CODE_USE_VERTEX environment variable. Authentication uses Application Default Credentials (ADC).

First, ensure you have ADC configured on your host:

gcloud auth application-default login

Then run:

export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project
export CLOUD_ML_REGION=us-east5          # or "global" for automatic routing

harbor run -p ./task -a claude-code -m anthropic/claude-sonnet-4-6

The agent automatically locates your ADC credentials file (checking GOOGLE_APPLICATION_CREDENTIALS first, then the default ~/.config/gcloud/application_default_credentials.json), uploads it into the container, and sets GOOGLE_APPLICATION_CREDENTIALS to point to it. No manual volume mounting is required.
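The lookup order described above can be illustrated as follows. This is a sketch of the documented order only, not Harbor's actual implementation. `find_adc_file` is a hypothetical name, and falling back to the default path when the explicit path does not exist is an assumption.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_adc_file(env: dict[str, str], home: Path) -> Path | None:
    """Locate an ADC credentials file: explicit variable first, then the gcloud default."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)
    # Assumption: fall back to the default location if the explicit path is unset or missing.
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    if default.is_file():
        return default
    return None


# Prints the resolved path on this machine, or None if no ADC file is found.
print(find_adc_file(dict(os.environ), Path.home()))
```

If this returns `None` on your host, running `gcloud auth application-default login` creates the file at the default location.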

Claude Code with custom API base URL

To use Claude Code with a custom endpoint (OpenRouter, self-hosted proxy, etc.):

export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=sk-or-...

harbor run -p ./task -a claude-code -m openrouter/anthropic/claude-sonnet-4-6

When ANTHROPIC_BASE_URL is set, all model aliases (Sonnet, Opus, Haiku, subagent) are pointed to the same model to avoid routing issues.

Integrating your own agent

Harbor supports integrating your own agent without having to modify the Harbor source code.

There are two types of agents:

  1. External agents, which interface with the environment through the BaseEnvironment interface, typically by executing bash commands via the exec method.
  2. Installed agents, which are installed directly into the container environment and executed in headless mode. This is how most agents are integrated, and it has the advantage that the agent can bring its own custom tools.

External agents

To build an external agent, you need to implement the BaseAgent interface, which involves defining the following methods:

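from harbor.agents.base import BaseAgent
# The two import paths below are assumed; adjust them to your Harbor version.
from harbor.environments.base import BaseEnvironment
from harbor.models.agent.context import AgentContext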

class MyExternalAgent(BaseAgent):
    @staticmethod
    def name() -> str:
        """The name of the agent."""
        pass

    def version(self) -> str | None:
        """The version of the agent."""
        pass

    async def setup(self, environment: BaseEnvironment) -> None:
        """
        Run commands to setup the agent & its tools.
        """
        pass

    async def run(
        self,
        instruction: str,
        environment: BaseEnvironment,
        context: AgentContext,
    ) -> None:
        """
        Runs the agent in the environment. Be sure to populate the context with the
        results of the agent execution. Ideally, populate the context as the agent
        executes in case of a timeout or other error.

        Args:
            instruction: The task instruction.
            environment: The environment in which to complete the task.
            context: The context to populate with the results of the agent execution.
        """
        pass
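
To make the control flow concrete, here is a minimal sketch of an external agent that runs two fixed commands and records each step as it goes. It is written to run without Harbor installed, so `FakeEnvironment`, `FakeExecResult`, and `FakeContext` are hypothetical stand-ins for the real classes. In particular, the single-argument `exec` signature and the `metadata` field are assumptions, and a real agent would subclass BaseAgent.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeExecResult:
    """Hypothetical stand-in for whatever the real exec method returns."""
    stdout: str
    return_code: int


class FakeEnvironment:
    """Hypothetical stand-in for BaseEnvironment: records commands instead of running them."""

    def __init__(self) -> None:
        self.commands: list[str] = []

    async def exec(self, command: str) -> FakeExecResult:
        self.commands.append(command)
        return FakeExecResult(stdout=f"ran: {command}", return_code=0)


@dataclass
class FakeContext:
    """Hypothetical stand-in for AgentContext; the real fields may differ."""
    metadata: dict = field(default_factory=dict)


class EchoAgent:
    """Mirrors the BaseAgent method names shown above without subclassing it."""

    @staticmethod
    def name() -> str:
        return "echo-agent"

    def version(self) -> str | None:
        return "0.1.0"

    async def setup(self, environment: FakeEnvironment) -> None:
        await environment.exec("mkdir -p /tmp/echo-agent")

    async def run(
        self, instruction: str, environment: FakeEnvironment, context: FakeContext
    ) -> None:
        context.metadata["instruction"] = instruction
        steps = context.metadata.setdefault("steps", [])
        for command in ("pwd", "ls"):
            result = await environment.exec(command)
            # Record each step immediately so partial results survive a timeout.
            steps.append({"command": command, "return_code": result.return_code})


async def main() -> tuple[FakeEnvironment, FakeContext]:
    env, ctx = FakeEnvironment(), FakeContext()
    agent = EchoAgent()
    await agent.setup(env)
    await agent.run("List the working directory.", env, ctx)
    return env, ctx


env, ctx = asyncio.run(main())
print(ctx.metadata["steps"])
```

Appending to the context inside the loop, rather than once at the end, follows the advice in the run docstring to populate the context as the agent executes.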

Installed agents

To build an installed agent, you need to implement the BaseInstalledAgent interface, which involves defining the following methods:

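from pathlib import Path

from pydantic import BaseModel

from harbor.agents.installed.base import BaseInstalledAgent
# The import path below is assumed; adjust it to your Harbor version.
from harbor.models.agent.context import AgentContext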

class ExecInput(BaseModel):
    command: str
    cwd: str | None = None
    env: dict[str, str] | None = None
    timeout_sec: int | None = None

class MyInstalledAgent(BaseInstalledAgent):
    @property
    def _install_agent_template_path(self) -> Path:
        """
        Path to the jinja template script for installing the agent in the container.
        """
        pass

    def create_run_agent_commands(self, instruction: str) -> list[ExecInput]:
        """
        Create the commands to run the agent in the container. Usually this is a single
        command that passes the instruction to the agent and executes it in headless
        mode.
        """
        pass

    def populate_context_post_run(self, context: AgentContext) -> None:
        """
        Populate the context with the results of the agent execution. Assumes the run()
        method has already been called. Typically involves parsing a trajectory file.
        """
        pass
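
As an illustration of create_run_agent_commands, the sketch below builds a single headless command and shell-quotes the instruction. It runs without Harbor: `ExecInputSketch` is a plain dataclass that mirrors the ExecInput fields listed above, and the `my-agent` CLI with its `--headless` and `--prompt` flags is hypothetical.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass
class ExecInputSketch:
    """Plain-dataclass mirror of the ExecInput fields listed above."""
    command: str
    cwd: str | None = None
    env: dict[str, str] | None = None
    timeout_sec: int | None = None


def create_run_agent_commands(instruction: str) -> list[ExecInputSketch]:
    """Build one headless invocation of a hypothetical `my-agent` CLI."""
    # shlex.quote stops quotes, semicolons, and $ in the instruction
    # from being interpreted by the shell.
    quoted = shlex.quote(instruction)
    return [
        ExecInputSketch(
            command=f"my-agent --headless --prompt {quoted}",
            timeout_sec=600,
        )
    ]


commands = create_run_agent_commands("Fix the bug in 'app.py'; echo $HOME")
print(commands[0].command)
```

Quoting matters here because task instructions are free-form text. Without it, an instruction containing a semicolon or a quote would change the command that runs in the container.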

Running a custom agent

To run a custom agent, pass its import path to --agent-import-path in the form module.path:ClassName:

harbor run -d "<dataset@version>" --agent-import-path path.to.agent:SomeAgent
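
Import paths of the form `module.path:ClassName` are commonly resolved with `importlib`. The sketch below shows that general technique under the assumption that Harbor does something similar; it is not Harbor's actual loader, and `load_class` is a hypothetical name. It uses a standard-library class so that it runs anywhere.

```python
import importlib


def load_class(import_path: str) -> type:
    """Resolve a 'module.path:ClassName' string to the class it names."""
    module_name, separator, class_name = import_path.partition(":")
    if not separator or not module_name or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {import_path!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{import_path!r} does not refer to a class")
    return obj


# Standard-library example; with Harbor you would pass your own agent's path instead.
print(load_class("collections:OrderedDict"))
```

With this kind of loader, the module must be importable from the directory where the command runs, for example by installing your package or running from its project root.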