Pier is an interactive workspace manager for coding agents (Claude Code, Codex, Gemini CLI, OpenHands, Aider, and more). Run agents in managed workspaces (containerized or host), capture and share traces, and interactively solve and develop Harbor benchmark tasks.
uv tool install git+https://github.com/allenai/pierpier start creates a workspace directory and, in container mode, builds and starts a container with your agent installed. All subsequent commands run from inside the workspace.
# From a base image:
pier start -d ./workspace --image ubuntu:24.04 --agent claude-code \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
cd ./workspace
# From a Harbor task:
pier start ./tasks/my-task -d ./workspace --agent claude-code \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
cd ./workspace
# Host mode (no container):
pier start ./tasks/my-task --host -d ./workspace
cd ./workspace
pier skills # install task skills for your agent--agentinstalls a supported coding agent and registers skills fromskills_dirin task.toml. Optional — without it you get a plain workspace. To install multiple agents, runpier start --agent <name>again from the workspace.--imageaccepts any Docker image — a stock OS, a project image with tools pre-installed, etc.--no-mountkeeps workspace files inside the container only (no bind-mount to host).pier stopcopies files back. Note: Harbor's internal mounts (agent logs, verifier output) still write to the host under.pier/.-f/--forceallows starting in a non-empty directory.
cat .task/instruction.md # read the task instruction (Harbor tasks)
# Container mode:
pier exec claude # run the agent interactively
pier exec bash # drop into the container shell
# Non-interactive (scripted or automated use):
pier exec -- claude -p "Read .task/instruction.md and do the task" --dangerously-skip-permissions
# TODO: pier run — run the agent non-interactively and verify, via harbor run
# Run a background process (e.g., live document preview):
pier exec -d -- quarto preview --port 8888 --host 0.0.0.0 --no-browse
# Host mode:
claude # run the agent directlyFor Harbor tasks, score your work with the verifier:
pier verify # run tests/test.sh, print reward, save results and trajectory to .pier/trials/Without a Harbor task (e.g. --image mode), save the trajectory with pier capture:
pier capture # save trajectory to .pier/trials/
pier capture --session-dir <path> -a claude-code # from any agent session outside pierBrowse, export, and review:
pier traces # list traces in .pier/trials/
pier traces -o trace.tar.gz # export latest trace as archive
pier view # web dashboard (via Harbor)
pier summarize # AI summary (via Harbor)Works with all supported agents.
vim ../tasks/my-task/tests/test.sh # edit the verifier
pier verify # re-run — changes picked up immediately
vim ../tasks/my-task/instruction.md # edit the instruction or Dockerfile
pier stop
pier start # rebuild and restartTo start a new task, clone an existing Harbor task as a template (see Harbor's docs for the task format):
git clone https://github.com/laude-institute/harbor
cp -r harbor/examples/tasks/hello-world ./tasks/my-taskExtract the agent's trajectory (conversation, tool use, and cost data) for the current workspace. No Harbor task required.
pier capture # container mode with registered agent
pier capture --session-dir <path> -a claude-code # host mode or external sessionIn container mode with a registered agent (pier start --agent), the trajectory is extracted automatically. Otherwise, pass --session-dir and -a to specify the session location and agent. In container mode, --session-dir refers to a path inside the container.
List or export captured trials. Without -o, lists available trials. With -o, packages them for sharing.
pier traces # list trials
pier traces -o trace.tar.gz # export latest trial
pier traces 2026-04-02_15-30-00 -o t.gz # export specific trial
pier traces --all -o traces.tar.gz # export all trialsLaunch a workspace. Also restarts stopped containers and installs agents into existing workspaces.
# Task-free (any Docker image, no task definition)
pier start -d . --image ubuntu:24.04
pier start -d . --image my-project-image --agent claude-code --ports 8888
# With a Harbor task
pier start ./tasks/my-task -d ./my-workspace
pier start https://github.com/org/repo#tasks/my-task
# Manage existing workspace
pier start --agent claude-code # install agent in current workspace
pier start # restart a stopped containertask_pathcan be a local directory or a remote git reference (URL#path). Optional — omit for task-free mode.-dspecifies the workspace directory. Required for local tasks and task-free mode; defaults to./<task-name>for remote tasks.--imagespecifies the base Docker image (task-free mode). Ignored when a task is provided (the task's Dockerfile is used).--portsexposes container ports to the host (e.g.,--ports 8888).--mounts-jsonadds volume mounts as a JSON array (e.g.,--mounts-json '["./skills:/opt/skills:ro"]').-epasses container-mode environment variables inKEY=VALUEformat (repeatable). Stored in the session and forwarded on everypier exec.--env-fileloads container-mode environment variables from a.envfile. Same behavior as-efor each line.--no-mountkeeps workspace files inside the container only (no bind-mount to host).pier stopcopies files back. Note: Harbor's internal mounts (agent logs, verifier output) still write to the host under.pier/.-f/--forceallows starting in a non-empty directory.--hostskips the container (workspace only).--agentinstalls a coding agent. To install additional agents, runpier start --agent <name>again from the workspace. Whentask_pathis omitted, it operates on the current workspace.
Run a command in the workspace context. Sets workspace env vars so task CLIs find the right workspace.
pier exec bash
pier exec claude # agent auto-detected → full output captured
pier exec -d -- quarto preview --port 8888 --host 0.0.0.0 --no-browse- Container mode: delegates to
docker execin the container's working directory. Agent commands are auto-detected and their output is recorded viascript. Each exec gets its own timestamped directory so sessions don't overwrite each other.CLAUDE_CONFIG_DIRandCODEX_HOMEare always set so structured session logs land in the mounted volume. - Host mode: runs the command directly with
TASK_WORKSPACEset -d/--detach: runs in the background (useful for servers like quarto preview)
Run the verifier and report the reward (requires a Harbor task). Each run creates a timestamped directory under <workspace>/.pier/trials/ in Harbor's trial format.
pier verify # uses agent from pier start --agentThe agent is inferred from the session (set by pier start --agent). If no agent is registered, the verifier still runs and saves the reward but the trajectory (conversation, tool use, and cost data) is skipped.
- Container mode: uses Harbor's
Verifier(same verifier asharbor run). The trajectory is captured automatically when an agent is registered. For unregistered agents (e.g. baked into the image), pass--session-dir(container path) and-a. - Host mode: spins up a temporary container to run the verifier, then tears it down. Pass
--session-dirand-ato capture the trajectory.
Stop the Docker container for the current workspace (container mode only). The workspace directory is preserved. Restart later with pier start (no arguments, from inside the workspace).
Show active workspaces.
Workspace Container Status
──────────────────────────────────────── ────────────────────────────── ────────
/home/user/my-workspace pier-my-workspace-a1b2-main-1 running
/home/user/hello-world — —
Open a web dashboard to browse trial trajectories. Defaults to the current workspace's .pier/ directory.
pier view # current workspace
pier view /path/to/.pier # explicit path
pier view --port 9000 # custom portSummarize trial results using AI (requires an Anthropic API key). Defaults to the current workspace.
pier summarize # summarize failures in current workspace
pier summarize --all # include successful trials too
pier summarize -m sonnet # use a different model (default: haiku)Install task skills for your coding agent (host mode only). Reads skills_dir from task.toml, extracts skills from the task's container image, and registers them via npx skills add. In container mode with --agent, skills are installed automatically.
See DEVELOPER.md for setup, testing, and architecture.