Skip to content

Conversation

@malhotra5
Copy link
Collaborator

@malhotra5 malhotra5 commented Jan 12, 2026

Summary

Fix a bug in the AgentBase.verify() method where builtin tools (like finish and think) were not being included when checking event history against available runtime tools. (introduced in #1542)

This caused errors when resuming conversations that used builtin tools:

Cannot resume conversation: tools that were used in history are missing from runtime: ['finish']. Available tools: ['delegate', 'file_editor', 'task_tracker', 'terminal']

The Problem

When verify() checks events to see if used tools exist in the runtime agent:

  • runtime_names was computed from self.tools (Tool specs like 'TerminalTool')
  • But builtin tools configured via include_default_tools were not included
  • Events contain runtime names like 'finish', 'think' for builtin tools

The Fix

Added builtin tool runtime names from include_default_tools to runtime_names when checking against event history:

# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
    tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
    if tool_class is not None:
        runtime_names.add(tool_class.name)

Tests Added

  • test_agent_verify_builtin_tools_included_in_check - Verifies 'finish' builtin is correctly recognized
  • test_agent_verify_think_builtin_tool_included - Verifies 'think' builtin is correctly recognized
  • test_agent_verify_missing_builtin_tool_fails - Verifies failure when a used builtin is not configured

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

@malhotra5 can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:dc2be73-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-dc2be73-python \
  ghcr.io/openhands/agent-server:dc2be73-python

All tags pushed for this build

ghcr.io/openhands/agent-server:dc2be73-golang-amd64
ghcr.io/openhands/agent-server:dc2be73-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:dc2be73-golang-arm64
ghcr.io/openhands/agent-server:dc2be73-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:dc2be73-java-amd64
ghcr.io/openhands/agent-server:dc2be73-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:dc2be73-java-arm64
ghcr.io/openhands/agent-server:dc2be73-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:dc2be73-python-amd64
ghcr.io/openhands/agent-server:dc2be73-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:dc2be73-python-arm64
ghcr.io/openhands/agent-server:dc2be73-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:dc2be73-golang
ghcr.io/openhands/agent-server:dc2be73-java
ghcr.io/openhands/agent-server:dc2be73-python

About Multi-Architecture Support

  • Each variant tag (e.g., dc2be73-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., dc2be73-python-amd64) are also available if needed

The verify method was failing to recognize builtin tools (finish, think)
when checking events against available runtime tools. This caused errors
like:
  'Cannot resume conversation: tools that were used in history are
   missing from runtime: ["finish"]. Available tools: [...]'

The fix adds builtin tool runtime names from include_default_tools to the
runtime_names set when checking against event history.

Co-authored-by: openhands <[email protected]>
@malhotra5
Copy link
Collaborator Author

I've confirmed that this fix works in the CLI

@malhotra5 malhotra5 marked this pull request as ready for review January 12, 2026 22:11
@github-actions
Copy link
Contributor

github-actions bot commented Jan 12, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   base.py1801691%187, 261–262, 273–275, 285, 295, 303–304, 413, 427, 464–465, 475–476
TOTAL15120443470% 

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

I just saw an issue with these not being recognized too, probably because they're, idk, elsewhere than the usual tools. I think maybe we can sometime rethink if they could be in self.tools 🤔

@malhotra5 malhotra5 merged commit a076309 into main Jan 12, 2026
21 checks passed
@malhotra5 malhotra5 deleted the fix-verify-builtin-tools branch January 12, 2026 22:19
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix correctly addresses the immediate issue where builtin tools were not recognized when checking events. However, I found a critical bug in the existing code that this PR doesn't address:

🔴 Critical Issue: Early Return Bypasses Builtin Tool Check

The Bug: Lines 375-376 in base.py return early when runtime_names == persisted_names (tool specs match), WITHOUT checking if builtin tools from include_default_tools match. This allows verification to incorrectly pass when:

  • Persisted agent has include_default_tools=["FinishTool"]
  • Runtime agent has include_default_tools=[] (missing FinishTool!)
  • Events show 'finish' was used
  • But verification passes because tool specs match (early return taken before checking events)

Proof: I created a test case that demonstrates this bug - verification incorrectly passes when it should fail.

Recommended Fix: The early return at line 375-376 should check if include_default_tools matches, or not return early when events are provided:

if runtime_names == persisted_names and (
    events is None or self.include_default_tools == persisted.include_default_tools
):
    return self

See inline comments for additional feedback.

Comment on lines +387 to +392
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The fix is correct, but consider adding a comment about naming consistency.

The runtime_names set now contains a mix of:

  • Tool spec class names like "TerminalTool" (from line 372)
  • Builtin runtime names like "finish", "think" (added here)

This works because events only contain runtime names, but it's conceptually inconsistent. A clarifying comment would help future maintainers:

Suggested change
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
# Note: runtime_names now contains a mix of Tool spec class names
# (e.g. "TerminalTool") and builtin runtime names (e.g. "finish").
# This works because events only reference runtime names.
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)

# to the check.
result = runtime_agent.verify(persisted_agent, events=events_with_finish)
assert result is runtime_agent

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Missing test coverage for the early return bug.

All three tests have different tool specs between persisted and runtime agents (TerminalTool vs FileEditorTool), so they never trigger the early return at line 375-376 of base.py.

Missing scenario that exposes the bug:

  • Both agents have SAME tools (e.g., both have TerminalTool)
  • But different include_default_tools (one has FinishTool, other doesn't)
  • Events show a builtin tool was used

This case currently has a bug where verification incorrectly passes (see review body).

Suggested additional test:

def test_agent_verify_same_tools_different_builtins_fails():
    """Test that verify fails when tools match but required builtin is missing."""
    llm = LLM(model="gpt-4o-mini", api_key=SecretStr("test-key"), usage_id="test-llm")
    
    # Persisted agent has FinishTool
    persisted_agent_obj = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],
        include_default_tools=["FinishTool"],
    )
    
    # Events show 'finish' was used
    events_with_finish = [
        ActionEvent(
            source="agent",
            thought=[],
            tool_name="finish",
            tool_call_id="call_123",
            tool_call=MessageToolCall(
                id="call_123",
                name="finish",
                arguments='{"message": "Done!"}',
                origin="completion",
            ),
            llm_response_id="resp_123",
        ),
    ]
    
    serialized = persisted_agent_obj.model_dump_json()
    persisted_agent = AgentBase.model_validate_json(serialized)
    
    # Runtime agent MISSING FinishTool
    runtime_agent = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],  # Same tool!
        include_default_tools=[],  # Missing FinishTool!
    )
    
    # Should fail but currently passes due to early return bug
    with pytest.raises(ValueError, match="missing from runtime.*finish"):
        runtime_agent.verify(persisted_agent, events=events_with_finish)

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix correctly addresses the immediate issue where builtin tools were not recognized when checking events. However, I found a critical bug in the existing code that this PR doesn't address:

🔴 Critical Issue: Early Return Bypasses Builtin Tool Check

The Bug: Lines 375-376 in base.py return early when runtime_names == persisted_names (tool specs match), WITHOUT checking if builtin tools from include_default_tools match. This allows verification to incorrectly pass when:

  • Persisted agent has include_default_tools=["FinishTool"]
  • Runtime agent has include_default_tools=[] (missing FinishTool!)
  • Events show 'finish' was used
  • But verification passes because tool specs match (early return taken before checking events)

Proof: I created a test case that demonstrates this bug - verification incorrectly passes when it should fail.

Recommended Fix: The early return at line 375-376 should check if include_default_tools matches, or not return early when events are provided:

if runtime_names == persisted_names and (
    events is None or self.include_default_tools == persisted.include_default_tools
):
    return self

See inline comments for additional feedback.

Comment on lines +387 to +392
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The fix is correct, but consider adding a comment about naming consistency.

The runtime_names set now contains a mix of:

  • Tool spec class names like "TerminalTool" (from line 372)
  • Builtin runtime names like "finish", "think" (added here)

This works because events only contain runtime names, but it's conceptually inconsistent. A clarifying comment would help future maintainers:

Suggested change
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)
# Add builtin tool names from include_default_tools
# These are runtime names like 'finish', 'think'
# Note: runtime_names now contains a mix of Tool spec class names
# (e.g. "TerminalTool") and builtin runtime names (e.g. "finish").
# This works because events only reference runtime names.
for tool_class_name in self.include_default_tools:
tool_class = BUILT_IN_TOOL_CLASSES.get(tool_class_name)
if tool_class is not None:
runtime_names.add(tool_class.name)

# to the check.
result = runtime_agent.verify(persisted_agent, events=events_with_finish)
assert result is runtime_agent

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Missing test coverage for the early return bug.

All three tests have different tool specs between persisted and runtime agents (TerminalTool vs FileEditorTool), so they never trigger the early return at line 375-376 of base.py.

Missing scenario that exposes the bug:

  • Both agents have SAME tools (e.g., both have TerminalTool)
  • But different include_default_tools (one has FinishTool, other doesn't)
  • Events show a builtin tool was used

This case currently has a bug where verification incorrectly passes (see review body).

Suggested additional test:

def test_agent_verify_same_tools_different_builtins_fails():
    """Test that verify fails when tools match but required builtin is missing."""
    llm = LLM(model="gpt-4o-mini", api_key=SecretStr("test-key"), usage_id="test-llm")
    
    # Persisted agent has FinishTool
    persisted_agent_obj = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],
        include_default_tools=["FinishTool"],
    )
    
    # Events show 'finish' was used
    events_with_finish = [
        ActionEvent(
            source="agent",
            thought=[],
            tool_name="finish",
            tool_call_id="call_123",
            tool_call=MessageToolCall(
                id="call_123",
                name="finish",
                arguments='{"message": "Done!"}',
                origin="completion",
            ),
            llm_response_id="resp_123",
        ),
    ]
    
    serialized = persisted_agent_obj.model_dump_json()
    persisted_agent = AgentBase.model_validate_json(serialized)
    
    # Runtime agent MISSING FinishTool
    runtime_agent = Agent(
        llm=llm,
        tools=[Tool(name="TerminalTool")],  # Same tool!
        include_default_tools=[],  # Missing FinishTool!
    )
    
    # Should fail but currently passes due to early return bug
    with pytest.raises(ValueError, match="missing from runtime.*finish"):
        runtime_agent.verify(persisted_agent, events=events_with_finish)

@malhotra5
Copy link
Collaborator Author

The critical issue outlined above is not a problem because builtin tools are not serialized as part of agent.tools

So we only need to include built-in tools when comparing directly against the event history

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants