Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
235 changes: 235 additions & 0 deletions python/agents/tau2-benchmark-agent/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,235 @@
# Using ADK Agent with τ-bench

This guide provides instructions on how to set up and use a custom agent that integrates with the Google ADK (Agent Development Kit) within the τ-bench framework.

## 1. Clone the Repository

First, clone the specific version of the τ-bench repository that is compatible with these modifications.

```bash
git clone https://github.com/sierra-research/tau2-bench.git
cd tau2-bench
git checkout cc97b34
```

## 2. Setup and Installation

Follow the standard installation instructions to set up the environment for the project.


Ensure that you have venv:

```bash
sudo apt install python3.13-venv
```

```bash
python3 -m venv .venv
source .venv/bin/activate

pip install -e .
```

You can check tau data:

```bash
tau2 check-data
```

After the standard installation, you need to install the Google ADK package.

```bash
pip install google-adk
```

tenacity dependecy version may conflict with that of tau2 repo. Upgrading it back in favor of tau2.

```bash
pip install --upgrade tenacity
```

**IMPORTANT:** Gemini 3 Pro model makes sending thought signatures mandatory. Tau2 bench relies on litellm for user simulation and non-adk agent simulation. Until https://github.com/BerriAI/litellm/pull/16812 is merged to litellm repository, the PR needs to be applied as shown below:

```bash
git clone --filter=blob:none --quiet https://github.com/BerriAI/litellm.git /tmp/litellm-pr-16812
cd /tmp/litellm-pr-16812
git checkout -q pull/16812/head
git fetch origin pull/16812/head:pr-16812
git checkout pr-16812
pip install .
cd -
```

## 3. Add env params

Create `.env` file at root with the following content.

```bash
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your_project_id
GOOGLE_CLOUD_LOCATION=global
VERTEXAI_LOCATION=global
```

## 4. Copy Modified Files

You will need to copy two files into your local `tau2-bench` directory. These files contain the implementation of the ADK agent and its unit tests.

Assuming you have the following files available:

* `tau2_agent/adk_agent.py`
* `tests/test_adk_agent.py`

Copy them to the correct locations within the project:

```bash
cp ../tau2_agent/adk_agent.py src/tau2/agent/adk_agent.py
cp ../tests/test_adk_agent.py tests/test_adk_agent.py
```

## 4.1. Registering the ADK Agent

To enable the `adk_agent` within the `tau2-bench` framework, you need to manually modify the `src/tau2/registry.py` file in your `tau2-bench` repository.

1. **Add the import statement** for `AdkAgent` at the top of `src/tau2/registry.py`:
```python
from tau2.agent.adk_agent import AdkAgent
```
2. **Register the agent** within the `try` block where other default components are registered (look for `registry = Registry()`):
```python
try:
registry = Registry()
logger.debug("Registering default components...")
# ... existing registrations ...
registry.register_agent(AdkAgent, "adk_agent") # Add this line
# ... more existing registrations ...
```
This allows the `adk_agent` to be selected via command-line arguments (e.g., `--agent adk_agent`).

## 5. Running the ADK Agent

Once the files are in place and dependencies are installed, you can run a simulation using the `adk_agent`.

### Changing baseline ADK agent

You can implement an improved agent and replace `AdkLlmAgent` implementation `_create_agent` returns as shown below. Return type is BaseAgent which accommodates workflow agents as well.

```py
def _create_agent(name: str, model: Union[str, BaseLlm], instruction: str, tools: List[Tool]) -> BaseAgent:
adk_tools = [
AdkTool(
types.FunctionDeclaration(
name=tool.openai_schema['function']['name'],
description=tool.openai_schema['function'].get('description', ''),
parameters_json_schema=tool.openai_schema['function']['parameters'],
)
)
for tool in tools
]
return AdkLlmAgent(
model=model,
name=name,
instruction=instruction,
tools=adk_tools,
planner=built_in_planner.BuiltInPlanner(
thinking_config=types.ThinkingConfig(include_thoughts=True),
),
)
```

### Limited run

Here is an example command to run the agent on an airline domain task:

```bash
tau2 run --domain airline --agent adk_agent --agent-llm vertex_ai/gemini-3-pro-preview --user-llm vertex_ai/gemini-3-pro-preview --num-trials 1 --num-tasks 1 --user-llm-args '{"temperature": 1, "reasoning_effort": "high"}' --agent-llm-args '{"temperature": 1, "reasoning_effort": "high"}'
```

Optionally, you can run specific example by using `--task-ids` instead of `--num-tasks`.

**temperature:** When adk_agent is used defaults to 1. The commands in this document sets them explicitly using llm_args for both user and agent models.

**reasoning_level** Only applies to Gemini 3 Pro model. It defaults to high for adk_agent while using this model. Otherwise, it will default to dynamic thinking. Again this document demonsrates setting it explicitly using llm_args.

**NOTE**: It is normal that you will be getting `This model isn't mapped yet` error logs. This is coming from litellm cost calculation workflow used by `--user-llm`. You can suppress is temporarily by swapping `--user-llm vertex_ai/gemini-3-pro-preview` with `--user-llm vertex_ai/gemini-2.5-pro`.

### Viewing trajectories

You can use the following command to view trajectories after following the default options:

```bash
tau2 view
```

### Full run

Full run requires dropping the arg `--task-ids`.

```bash
# Example: Run complete evaluation for all domains
tau2 run \
--domain retail \
--agent adk_agent \
--agent-llm vertex_ai/gemini-3-pro-preview \
--user-llm vertex_ai/gemini-3-pro-preview \
--num-trials 4 \
--save-to gemini_3_pro_retail \
--user-llm-args '{"temperature": 1, "reasoning_effort": "high"}' \
--agent-llm-args '{"temperature": 1, "reasoning_effort": "high"}'


tau2 run \
--domain airline \
--agent adk_agent \
--agent-llm vertex_ai/gemini-3-pro-preview \
--user-llm vertex_ai/gemini-3-pro-preview \
--num-trials 4 \
--save-to gemini_3_pro_airline \
--user-llm-args '{"temperature": 1, "reasoning_effort": "high"}' \
--agent-llm-args '{"temperature": 1, "reasoning_effort": "high"}'


tau2 run \
--domain telecom \
--agent adk_agent \
--agent-llm vertex_ai/gemini-3-pro-preview \
--user-llm vertex_ai/gemini-3-pro-preview \
--num-trials 4 \
--save-to gemini_3_pro_telecom \
--user-llm-args '{"temperature": 1, "reasoning_effort": "high"}' \
--agent-llm-args '{"temperature": 1, "reasoning_effort": "high"}'
```

### Prepare Submission Package

```bash
tau2 submit prepare data/tau2/simulations/gemini_3_pro_*.json --output ./gemini_3_pro_submission
```

This command will:

- Verify all trajectory files are valid
- Check that submission requirements are met
- Compute performance metrics (Pass^k rates)
- Prompt for required metadata (model name, organization, contact email) -> you can pass dummy values here as we are not submitting yet.
- Create a structured submission directory with:
- `submission.json`: Metadata and metrics
- `trajectories/`: Your trajectory files

## 6. Testing the Agent

To verify that the agent is set up correctly, you can run its unit tests using `pytest`.

```bash
pytest tests/test_adk_agent.py
```

To see coverage (optional):

```bash
pip install pytest-cov
```

```bash
pytest --cov=tau2.agent.adk_agent --cov-report=html tests/test_adk_agent.py
````
29 changes: 29 additions & 0 deletions python/agents/tau2-benchmark-agent/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
[project]
name = "tau2-benchmark-agent"
version = "0.1.0"
description = "ADK Agent for Tau-Bench"
authors = [{ name = "Google ADK Team", email = "[email protected]" }]
license = "Apache-2.0"
readme = "README.md"
dependencies = [
"google-adk==1.19.0",
"google-genai==1.51.0",
"loguru==0.7.3",
"tenacity==9.1.2",
"tau2 @ git+https://github.com/sierra-research/tau2-bench.git@cc97b34",
]
requires-python = ">=3.10"

[dependency-groups]
dev = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
]

[tool.setuptools.packages.find]
where = ["."]
include = ["tau2_agent*"]

[tool.pytest.ini_options]
pythonpath = "."
asyncio_mode = "auto"
Empty file.
Loading