-
Notifications
You must be signed in to change notification settings - Fork 109
Fix verify method to include builtin tools in event check #1710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The verify method was failing to recognize builtin tools (finish, think) when checking events against available runtime tools. This caused errors like: 'Cannot resume conversation: tools that were used in history are missing from runtime: ["finish"]. Available tools: [...]' The fix adds builtin tool runtime names from include_default_tools to the runtime_names set when checking against event history. Co-authored-by: openhands <[email protected]>
|
I've confirmed that this fix works in the CLI |
Co-authored-by: openhands <[email protected]>
enyst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
I just saw an issue with these not being recognized too, probably because they're, idk, elsewhere than the usual tools. I think maybe we can sometime rethink if they could be in self.tools 🤔
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix correctly addresses the immediate issue where builtin tools were not recognized when checking events. However, I found a critical bug in the existing code that this PR doesn't address:
🔴 Critical Issue: Early Return Bypasses Builtin Tool Check
The Bug: Lines 375-376 in base.py return early when runtime_names == persisted_names (tool specs match), WITHOUT checking if builtin tools from include_default_tools match. This allows verification to incorrectly pass when:
- Persisted agent has
include_default_tools=["FinishTool"] - Runtime agent has
include_default_tools=[](missing FinishTool!) - Events show 'finish' was used
- But verification passes because tool specs match (early return taken before checking events)
Proof: I created a test case that demonstrates this bug - verification incorrectly passes when it should fail.
Recommended Fix: The early return at line 375-376 should check if include_default_tools matches, or not return early when events are provided:
if runtime_names == persisted_names and (
events is None or self.include_default_tools == persisted.include_default_tools
):
return selfSee inline comments for additional feedback.
| # Add builtin tool names from include_default_tools | ||
| # These are runtime names like 'finish', 'think' | ||
| for tool_class_name in self.include_default_tools: | ||
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | ||
| if tool_class is not None: | ||
| runtime_names.add(tool_class.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Suggestion: The fix is correct, but consider adding a comment about naming consistency.
The runtime_names set now contains a mix of:
- Tool spec class names like
"TerminalTool"(from line 372) - Builtin runtime names like
"finish","think"(added here)
This works because events only contain runtime names, but it's conceptually inconsistent. A clarifying comment would help future maintainers:
| # Add builtin tool names from include_default_tools | |
| # These are runtime names like 'finish', 'think' | |
| for tool_class_name in self.include_default_tools: | |
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | |
| if tool_class is not None: | |
| runtime_names.add(tool_class.name) | |
| # Add builtin tool names from include_default_tools | |
| # These are runtime names like 'finish', 'think' | |
| # Note: runtime_names now contains a mix of Tool spec class names | |
| # (e.g. "TerminalTool") and builtin runtime names (e.g. "finish"). | |
| # This works because events only reference runtime names. | |
| for tool_class_name in self.include_default_tools: | |
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | |
| if tool_class is not None: | |
| runtime_names.add(tool_class.name) |
| # to the check. | ||
| result = runtime_agent.verify(persisted_agent, events=events_with_finish) | ||
| assert result is runtime_agent | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟠 Important: Missing test coverage for the early return bug.
All three tests have different tool specs between persisted and runtime agents (TerminalTool vs FileEditorTool), so they never trigger the early return at line 375-376 of base.py.
Missing scenario that exposes the bug:
- Both agents have SAME tools (e.g., both have TerminalTool)
- But different
include_default_tools(one has FinishTool, other doesn't) - Events show a builtin tool was used
This case currently has a bug where verification incorrectly passes (see review body).
Suggested additional test:
def test_agent_verify_same_tools_different_builtins_fails():
"""Test that verify fails when tools match but required builtin is missing."""
llm = LLM(model="gpt-4o-mini", api_key=SecretStr("test-key"), usage_id="test-llm")
# Persisted agent has FinishTool
persisted_agent_obj = Agent(
llm=llm,
tools=[Tool(name="TerminalTool")],
include_default_tools=["FinishTool"],
)
# Events show 'finish' was used
events_with_finish = [
ActionEvent(
source="agent",
thought=[],
tool_name="finish",
tool_call_id="call_123",
tool_call=MessageToolCall(
id="call_123",
name="finish",
arguments='{"message": "Done!"}',
origin="completion",
),
llm_response_id="resp_123",
),
]
serialized = persisted_agent_obj.model_dump_json()
persisted_agent = AgentBase.model_validate_json(serialized)
# Runtime agent MISSING FinishTool
runtime_agent = Agent(
llm=llm,
tools=[Tool(name="TerminalTool")], # Same tool!
include_default_tools=[], # Missing FinishTool!
)
# Should fail but currently passes due to early return bug
with pytest.raises(ValueError, match="missing from runtime.*finish"):
runtime_agent.verify(persisted_agent, events=events_with_finish)
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix correctly addresses the immediate issue where builtin tools were not recognized when checking events. However, I found a critical bug in the existing code that this PR doesn't address:
🔴 Critical Issue: Early Return Bypasses Builtin Tool Check
The Bug: Lines 375-376 in base.py return early when runtime_names == persisted_names (tool specs match), WITHOUT checking if builtin tools from include_default_tools match. This allows verification to incorrectly pass when:
- Persisted agent has
include_default_tools=["FinishTool"] - Runtime agent has
include_default_tools=[](missing FinishTool!) - Events show 'finish' was used
- But verification passes because tool specs match (early return taken before checking events)
Proof: I created a test case that demonstrates this bug - verification incorrectly passes when it should fail.
Recommended Fix: The early return at line 375-376 should check if include_default_tools matches, or not return early when events are provided:
if runtime_names == persisted_names and (
events is None or self.include_default_tools == persisted.include_default_tools
):
return selfSee inline comments for additional feedback.
| # Add builtin tool names from include_default_tools | ||
| # These are runtime names like 'finish', 'think' | ||
| for tool_class_name in self.include_default_tools: | ||
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | ||
| if tool_class is not None: | ||
| runtime_names.add(tool_class.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Suggestion: The fix is correct, but consider adding a comment about naming consistency.
The runtime_names set now contains a mix of:
- Tool spec class names like
"TerminalTool"(from line 372) - Builtin runtime names like
"finish","think"(added here)
This works because events only contain runtime names, but it's conceptually inconsistent. A clarifying comment would help future maintainers:
| # Add builtin tool names from include_default_tools | |
| # These are runtime names like 'finish', 'think' | |
| for tool_class_name in self.include_default_tools: | |
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | |
| if tool_class is not None: | |
| runtime_names.add(tool_class.name) | |
| # Add builtin tool names from include_default_tools | |
| # These are runtime names like 'finish', 'think' | |
| # Note: runtime_names now contains a mix of Tool spec class names | |
| # (e.g. "TerminalTool") and builtin runtime names (e.g. "finish"). | |
| # This works because events only reference runtime names. | |
| for tool_class_name in self.include_default_tools: | |
| tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name) | |
| if tool_class is not None: | |
| runtime_names.add(tool_class.name) |
| # to the check. | ||
| result = runtime_agent.verify(persisted_agent, events=events_with_finish) | ||
| assert result is runtime_agent | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟠 Important: Missing test coverage for the early return bug.
All three tests have different tool specs between persisted and runtime agents (TerminalTool vs FileEditorTool), so they never trigger the early return at line 375-376 of base.py.
Missing scenario that exposes the bug:
- Both agents have SAME tools (e.g., both have TerminalTool)
- But different
include_default_tools(one has FinishTool, other doesn't) - Events show a builtin tool was used
This case currently has a bug where verification incorrectly passes (see review body).
Suggested additional test:
def test_agent_verify_same_tools_different_builtins_fails():
"""Test that verify fails when tools match but required builtin is missing."""
llm = LLM(model="gpt-4o-mini", api_key=SecretStr("test-key"), usage_id="test-llm")
# Persisted agent has FinishTool
persisted_agent_obj = Agent(
llm=llm,
tools=[Tool(name="TerminalTool")],
include_default_tools=["FinishTool"],
)
# Events show 'finish' was used
events_with_finish = [
ActionEvent(
source="agent",
thought=[],
tool_name="finish",
tool_call_id="call_123",
tool_call=MessageToolCall(
id="call_123",
name="finish",
arguments='{"message": "Done!"}',
origin="completion",
),
llm_response_id="resp_123",
),
]
serialized = persisted_agent_obj.model_dump_json()
persisted_agent = AgentBase.model_validate_json(serialized)
# Runtime agent MISSING FinishTool
runtime_agent = Agent(
llm=llm,
tools=[Tool(name="TerminalTool")], # Same tool!
include_default_tools=[], # Missing FinishTool!
)
# Should fail but currently passes due to early return bug
with pytest.raises(ValueError, match="missing from runtime.*finish"):
runtime_agent.verify(persisted_agent, events=events_with_finish)|
The critical issue outlined above is not a problem because builtin tools are not serialized as part of agent.tools So we only need to include built-in tools when comparing directly against the event history |
Summary
Fix a bug in the
AgentBase.verify()method where builtin tools (likefinishandthink) were not being included when checking event history against available runtime tools. (introduced in #1542)This caused errors when resuming conversations that used builtin tools:
The Problem
When
verify()checks events to see if used tools exist in the runtime agent:runtime_nameswas computed fromself.tools(Tool specs like 'TerminalTool')include_default_toolswere not included'finish','think'for builtin toolsThe Fix
Added builtin tool runtime names from
include_default_toolstoruntime_nameswhen checking against event history:Tests Added
test_agent_verify_builtin_tools_included_in_check- Verifies 'finish' builtin is correctly recognizedtest_agent_verify_think_builtin_tool_included- Verifies 'think' builtin is correctly recognizedtest_agent_verify_missing_builtin_tool_fails- Verifies failure when a used builtin is not configuredChecklist
@malhotra5 can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:dc2be73-pythonRun
All tags pushed for this build
About Multi-Architecture Support
dc2be73-python) is a multi-arch manifest supporting both amd64 and arm64dc2be73-python-amd64) are also available if needed