Fix verify method to include builtin tools in event check #1710

malhotra5 · 2026-01-12T22:09:01Z

Summary

Fix a bug in AgentBase.verify() where builtin tools (like finish and think) were not included when checking the event history against the available runtime tools. The bug was introduced in #1542.

This caused errors when resuming conversations that used builtin tools:

Cannot resume conversation: tools that were used in history are missing from runtime: ['finish']. Available tools: ['delegate', 'file_editor', 'task_tracker', 'terminal']

The Problem

When verify() checks events to see if used tools exist in the runtime agent:

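runtime_names was computed only from self.tools (Tool specs like 'TerminalTool')
Builtin tools configured via include_default_tools were never added to it
Events record builtin tool calls under their runtime names, like 'finish' and 'think'
As a result, any builtin tool used in the history was reported as missing from the runtime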
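A minimal sketch of the failing check, assuming it reduces to a set difference between the tool names recorded in events and the names available at runtime. `find_missing_tools` is a hypothetical helper, not the SDK's code; the tool names are taken from the error message quoted in the summary.

```python
# Hypothetical sketch: find_missing_tools is illustrative, not the SDK's API.
def find_missing_tools(runtime_names: set[str], used_names: set[str]) -> list[str]:
    """Return tool names that appear in event history but not at runtime."""
    return sorted(used_names - runtime_names)


# Pre-fix behaviour: runtime names come only from the configured tools,
# so builtins such as 'finish' are absent from the set.
runtime_names = {"delegate", "file_editor", "task_tracker", "terminal"}
used_in_history = {"terminal", "finish"}

missing = find_missing_tools(runtime_names, used_in_history)
if missing:
    print(
        "Cannot resume conversation: tools that were used in history are "
        f"missing from runtime: {missing}. Available tools: {sorted(runtime_names)}"
    )
```

Running it prints the same message as the error above, because 'finish' is used in the history but never added to runtime_names.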

The Fix

Added builtin tool runtime names from include_default_tools to runtime_names when checking against event history:

# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
    tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
    if tool_class is not None:
        runtime_names.add(tool_class.name)
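To exercise the loop outside the SDK, the sketch below substitutes stub classes for the real builtins. `FinishTool`, `ThinkTool`, the contents of this `BUILT_IN_TOOL_CLASSES`, and the `add_builtin_runtime_names` wrapper are assumptions for illustration; only the loop body mirrors the PR.

```python
# Hypothetical stand-ins for the SDK's builtin tool classes; only the
# class-level `name` (the runtime name recorded in events) matters here.
class FinishTool:
    name = "finish"


class ThinkTool:
    name = "think"


BUILT_IN_TOOL_CLASSES = {"FinishTool": FinishTool, "ThinkTool": ThinkTool}


def add_builtin_runtime_names(
    runtime_names: set[str], include_default_tools: list[str]
) -> set[str]:
    """Return a copy of runtime_names extended with builtin runtime names."""
    names = set(runtime_names)
    for tool_class_name in include_default_tools:
        tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
        if tool_class is not None:  # unknown class names are skipped
            names.add(tool_class.name)
    return names


runtime_names = add_builtin_runtime_names(
    {"terminal", "file_editor"}, ["FinishTool", "ThinkTool", "UnknownTool"]
)
print(sorted(runtime_names))  # ['file_editor', 'finish', 'terminal', 'think']
```

The `is not None` guard means an unrecognized entry in include_default_tools is silently ignored rather than raising.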

Tests Added

test_agent_verify_builtin_tools_included_in_check - Verifies 'finish' builtin is correctly recognized
test_agent_verify_think_builtin_tool_included - Verifies 'think' builtin is correctly recognized
test_agent_verify_missing_builtin_tool_fails - Verifies failure when a used builtin is not configured
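The three scenarios can be seen side by side in a self-contained sketch. `check_events` and `BUILTIN_RUNTIME_NAMES` are hypothetical stand-ins for the event check inside AgentBase.verify(), not the SDK's API; the real tests in tests/sdk/conversation/local/test_state_serialization.py construct Agent objects and ActionEvents instead of plain lists.

```python
# Hypothetical stand-in for the event check in AgentBase.verify(); the real
# check operates on Agent objects and ActionEvents, not plain lists.
BUILTIN_RUNTIME_NAMES = {"FinishTool": "finish", "ThinkTool": "think"}


def check_events(tool_names, include_default_tools, used_in_events):
    """Raise ValueError if a tool used in events is unavailable at runtime."""
    available = set(tool_names)
    # Mirror the fix: add the runtime names of configured builtins.
    available.update(
        BUILTIN_RUNTIME_NAMES[name]
        for name in include_default_tools
        if name in BUILTIN_RUNTIME_NAMES
    )
    missing = sorted(set(used_in_events) - available)
    if missing:
        raise ValueError(
            f"tools that were used in history are missing from runtime: {missing}"
        )
    return True


def test_finish_builtin_included():
    assert check_events(["terminal"], ["FinishTool"], ["finish"])


def test_think_builtin_included():
    assert check_events(["terminal"], ["ThinkTool"], ["think"])


def test_missing_builtin_fails():
    try:
        check_events(["terminal"], [], ["finish"])
    except ValueError as exc:
        assert "finish" in str(exc)
    else:
        raise AssertionError("expected ValueError for unconfigured 'finish'")


for test in (
    test_finish_builtin_included,
    test_think_builtin_included,
    test_missing_builtin_fails,
):
    test()
print("all three scenarios behave as described")
```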

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:dc2be73-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-dc2be73-python \
  ghcr.io/openhands/agent-server:dc2be73-python

All tags pushed for this build

ghcr.io/openhands/agent-server:dc2be73-golang-amd64
ghcr.io/openhands/agent-server:dc2be73-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:dc2be73-golang-arm64
ghcr.io/openhands/agent-server:dc2be73-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:dc2be73-java-amd64
ghcr.io/openhands/agent-server:dc2be73-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:dc2be73-java-arm64
ghcr.io/openhands/agent-server:dc2be73-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:dc2be73-python-amd64
ghcr.io/openhands/agent-server:dc2be73-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:dc2be73-python-arm64
ghcr.io/openhands/agent-server:dc2be73-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:dc2be73-golang
ghcr.io/openhands/agent-server:dc2be73-java
ghcr.io/openhands/agent-server:dc2be73-python

About Multi-Architecture Support

Each variant tag (e.g., dc2be73-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., dc2be73-python-amd64) are also available if needed

The verify method was failing to recognize builtin tools (finish, think) when checking events against available runtime tools. This caused errors like: 'Cannot resume conversation: tools that were used in history are missing from runtime: ["finish"]. Available tools: [...]'

The fix adds builtin tool runtime names from include_default_tools to the runtime_names set when checking against event history.

Co-authored-by: openhands <[email protected]>

malhotra5 · 2026-01-12T22:10:50Z

I've confirmed that this fix works in the CLI

Co-authored-by: openhands <[email protected]>

github-actions · 2026-01-12T22:12:35Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/agent
base.py	180	16	91%	187, 261–262, 273–275, 285, 295, 303–304, 413, 427, 464–465, 475–476
TOTAL	15120	4434	70%

enyst

LGTM, thank you!

I just saw an issue with these not being recognized too, probably because they live somewhere other than the usual tools. Maybe at some point we can reconsider whether they should be in self.tools 🤔

all-hands-bot

The fix correctly addresses the immediate issue where builtin tools were not recognized when checking events. However, I found a critical bug in the existing code that this PR doesn't address:

🔴 Critical Issue: Early Return Bypasses Builtin Tool Check

The Bug: Lines 375-376 in base.py return early when runtime_names == persisted_names (tool specs match), WITHOUT checking if builtin tools from include_default_tools match. This allows verification to incorrectly pass when:

Persisted agent has include_default_tools=["FinishTool"]
Runtime agent has include_default_tools=[] (missing FinishTool!)
Events show 'finish' was used
But verification passes because tool specs match (early return taken before checking events)

Proof: I created a test case that demonstrates this bug - verification incorrectly passes when it should fail.

Recommended Fix: The early return at lines 375-376 should either check whether include_default_tools matches, or not return early when events are provided:

if runtime_names == persisted_names and (
    events is None or self.include_default_tools == persisted.include_default_tools
):
    return self

See inline comments for additional feedback.
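The suggested guard can be isolated as a pure predicate to see exactly when it short-circuits. This is a sketch: `can_skip_event_check` and its parameter names are hypothetical, not part of base.py, and the PR author disputed whether the early return is actually a problem.

```python
from typing import Optional, Sequence


def can_skip_event_check(
    runtime_names: set[str],
    persisted_names: set[str],
    events: Optional[Sequence[object]],
    runtime_defaults: list[str],
    persisted_defaults: list[str],
) -> bool:
    """Return True when the suggested early return would be taken."""
    return runtime_names == persisted_names and (
        events is None or runtime_defaults == persisted_defaults
    )


specs = {"TerminalTool"}
events = ["<ActionEvent for 'finish'>"]  # placeholder; only None-ness matters

# Same tool specs, but the runtime dropped FinishTool and events exist:
# no early return, so the event check still runs.
print(can_skip_event_check(specs, specs, events, [], ["FinishTool"]))  # False

# Same specs and no events to check: returning early is safe.
print(can_skip_event_check(specs, specs, None, [], ["FinishTool"]))  # True
```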

all-hands-bot · 2026-01-12T22:19:38Z

openhands-sdk/openhands/sdk/agent/base.py

+            # Add builtin tool names from include_default_tools
+            # These are runtime names like 'finish', 'think'
+            for tool_class_name in self.include_default_tools:
+                tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
+                if tool_class is not None:
+                    runtime_names.add(tool_class.name)


🟡 Suggestion: The fix is correct, but consider adding a comment about naming consistency.

The runtime_names set now contains a mix of:

Tool spec class names like "TerminalTool" (from line 372)

Builtin runtime names like "finish", "think" (added here)

This works because events only contain runtime names, but it's conceptually inconsistent. A clarifying comment would help future maintainers:

Suggested change

-            # Add builtin tool names from include_default_tools
-            # These are runtime names like 'finish', 'think'
-            for tool_class_name in self.include_default_tools:
-                tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
-                if tool_class is not None:
-                    runtime_names.add(tool_class.name)
+            # Add builtin tool names from include_default_tools
+            # These are runtime names like 'finish', 'think'
+            # Note: runtime_names now contains a mix of Tool spec class names
+            # (e.g. "TerminalTool") and builtin runtime names (e.g. "finish").
+            # This works because events only reference runtime names.
+            for tool_class_name in self.include_default_tools:
+                tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
+                if tool_class is not None:
+                    runtime_names.add(tool_class.name)

all-hands-bot · 2026-01-12T22:19:39Z

tests/sdk/conversation/local/test_state_serialization.py

+    # to the check.
+    result = runtime_agent.verify(persisted_agent, events=events_with_finish)
+    assert result is runtime_agent
+


🟠 Important: Missing test coverage for the early return bug.

All three tests have different tool specs between persisted and runtime agents (TerminalTool vs FileEditorTool), so they never trigger the early return at line 375-376 of base.py.

Missing scenario that exposes the bug:

Both agents have SAME tools (e.g., both have TerminalTool)

But different include_default_tools (one has FinishTool, other doesn't)

Events show a builtin tool was used

This case currently has a bug where verification incorrectly passes (see review body).

Suggested additional test:

def test_agent_verify_same_tools_different_builtins_fails():
    """Test that verify fails when tools match but required builtin is missing."""
    llm = LLM(model="gpt-4o-mini", api_key=SecretStr("test-key"), usage_id="test-llm")

    # Persisted agent has FinishTool
    persisted_agent_obj = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],
        include_default_tools=["FinishTool"],
    )

    # Events show 'finish' was used
    events_with_finish = [
        ActionEvent(
            source="agent",
            thought=[],
            tool_name="finish",
            tool_call_id="call_123",
            tool_call=MessageToolCall(
                id="call_123",
                name="finish",
                arguments='{"message": "Done!"}',
                origin="completion",
            ),
            llm_response_id="resp_123",
        ),
    ]

    serialized = persisted_agent_obj.model_dump_json()
    persisted_agent = AgentBase.model_validate_json(serialized)

    # Runtime agent MISSING FinishTool
    runtime_agent = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],  # Same tool!
        include_default_tools=[],  # Missing FinishTool!
    )

    # Should fail but currently passes due to early return bug
    with pytest.raises(ValueError, match="missing from runtime.*finish"):
        runtime_agent.verify(persisted_agent, events=events_with_finish)

malhotra5 · 2026-01-12T22:27:12Z

The critical issue outlined above is not a problem, because builtin tools are not serialized as part of agent.tools.

So we only need to include builtin tools when comparing directly against the event history.

malhotra5 marked this pull request as ready for review January 12, 2026 22:11

Fix line length issues for ruff linter

67d253e

Co-authored-by: openhands <[email protected]>

enyst approved these changes Jan 12, 2026

malhotra5 merged commit a076309 into main Jan 12, 2026
21 checks passed

malhotra5 deleted the fix-verify-builtin-tools branch January 12, 2026 22:19
