@5i2urom (Contributor) commented Jan 4, 2026

Proposed change

Changes the tool output format from JSON-stringified payloads to plain multi-line text for OpenAI models. Tool responses are now formatted as structured text blocks:

status:
<success|error>

toolName:
<tool-name>

error:
<error text, if any>

output:
<raw tool output text>

This eliminates double/triple JSON-encoding that produces heavily escaped strings, making tool outputs (especially code and logs) more readable for the model.
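For illustration (tool name and values hypothetical), a snippet that previously reached the model as an over-escaped string such as "{\"status\":\"success\",\"output\":\"const x = 1;\\nconsole.log(x);\"}" now arrives as:

status:
success

toolName:
run_shell_command

error:


output:
const x = 1;
console.log(x);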

Implementation details

  • Adds a formatToolResponseText() utility that constructs the plain-text format (a minimal sketch follows this list)
  • Updates OpenAIProvider.formatToolMessage() to use the new format instead of JSON.stringify()
  • Updates Responses API (buildResponsesRequest.ts) to use the same format for function_call_output
  • Adjusts formatToolResult() to handle object results with output property
  • Updates tests to verify plain-text formatting and truncation behavior
  • Adds documentation in docs/tool-output-format.md explaining the format
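
A minimal sketch of what such a formatter can look like (the actual utility lives in packages/core/src/providers/utils/toolResponsePayload.ts; its exact signature and edge-case handling may differ):

```ts
interface ToolResponseTextParts {
  status: 'success' | 'error';
  toolName?: string;
  error?: string;
  output?: string;
}

// Builds the four-section plain-text block; `?? ''` keeps the structure
// stable even when optional fields are absent.
function formatToolResponseText(parts: ToolResponseTextParts): string {
  return [
    'status:',
    parts.status,
    '',
    'toolName:',
    parts.toolName ?? '',
    '',
    'error:',
    parts.error ?? '',
    '',
    'output:',
    parts.output ?? '',
  ].join('\n');
}
```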

Notes

  • All tool outputs now use this format consistently across Chat Completions and Responses APIs
  • Unicode sanitization still applies to the final formatted text
  • Truncation indicators now appear inline (e.g., [truncated]) rather than as JSON metadata (illustrated just after this list)
  • Preserves raw multi-line structure of tool outputs (code, logs, etc.)
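
For illustration (content hypothetical), a truncated shell result would end its output section like this:

output:
$ npm test
...
[truncated]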

Related issue

Fixes #1002 — this PR addresses OpenAI tool outputs being double/triple JSON-encoded (over-escaped), which significantly reduces model effectiveness when working with code and complex tool results.

Summary by CodeRabbit

  • Documentation

    • Clarified provider docs to explain tool output format and linked a new guide describing the structured plain multi-line tool response format.
  • Refactor

    • Tool responses now use a human-readable plain multi-line text format (status, tool name, error, output) instead of JSON-encoded payloads.
  • Tests

    • Updated tests and expectations to reflect the new multi-line tool response serialization.


coderabbitai bot commented Jan 4, 2026

Walkthrough

Refactors OpenAI provider tool-response handling to deliver tool outputs as plain multi-line text (status, toolName, error, output) instead of JSON-stringified payloads; adds docs for the format, updates utilities and provider serialization, and adjusts tests to expect the new textual format.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/cli/providers.md, docs/cli/providers-openai-responses.md, docs/tool-output-format.md | Add docs/tool-output-format.md and update provider docs to reference the new plain multi-line tool-output format and link to the format doc. |
| OpenAI Provider: packages/core/src/providers/openai/OpenAIProvider.ts, packages/core/src/providers/openai/buildResponsesRequest.ts | Replace JSON.stringify-based serialization with a payload builder plus formatToolResponseText(); preserve string function-call args, JSON.stringify non-strings; reuse ensureJsonSafe only when required. |
| Utilities: packages/core/src/providers/utils/toolResponsePayload.ts | Add and export formatToolResponseText(); update formatToolResult() to return object.output when present and a string. |
| Tests: packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts, packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts | Adjust fixtures and assertions to expect multi-line formatted tool output strings (status/toolName/error/output) instead of JSON-encoded payloads or plain strings. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAIProvider
    participant ToolResponseUtil
    participant Model as OpenAI Model

    rect rgb(245, 245, 255)
    Note over Client,Model: Before — JSON-stringified payloads (over-escaped)
    Client->>OpenAIProvider: returns tool result (string/object)
    OpenAIProvider->>ToolResponseUtil: build payload + JSON.stringify(payload)
    ToolResponseUtil-->>OpenAIProvider: "{\"status\":\"success\",\"output\":\"...\"}"
    OpenAIProvider->>OpenAIProvider: ensureJsonSafe() (wrap/escape)
    OpenAIProvider->>Model: escaped JSON string (double-encoded)
    end

    rect rgb(245, 255, 245)
    Note over Client,Model: After — Plain multi-line text format
    Client->>OpenAIProvider: returns tool result (string/object)
    OpenAIProvider->>ToolResponseUtil: build payload
    ToolResponseUtil-->>OpenAIProvider: "status: success\ntoolName: foo\noutput: ..."
    OpenAIProvider->>OpenAIProvider: ensureJsonSafe() only if needed
    OpenAIProvider->>Model: plain multi-line text (readable)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

maintainer:e2e:ok

Suggested reviewers

  • acoliver

Poem

🐰 Hopping through code where escapes once curled,

Plain lines now sing, no JSON world,
Status and name in tidy sight,
Tools speak clearly, day and night,
I nibble bugs and bound with delight.

Pre-merge checks

✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: converting tool responses from JSON-stringified format to raw text format. |
| Description check | ✅ Passed | The description provides comprehensive coverage of the change with TLDR, implementation details, and notes, but lacks the required Reviewer Test Plan, Testing Matrix, and an explicit linked issue reference. |
| Linked Issues check | ✅ Passed | All coding objectives from issue #1002 are met: tool outputs are now formatted as plain multi-line text instead of JSON-stringified payloads, applied consistently across APIs, with Unicode sanitization preserved. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing plain-text tool response formatting per issue #1002; no unrelated modifications detected. |


github-actions bot commented Jan 4, 2026

Looking at PR #1003, I need to review the implementation of switching the tool response format from JSON to plain text to address the over-escaping issue.

LLxprt PR Review – PR #1003

Issue Alignment

Side Effects

  • Shared module toolResponsePayload.ts modified with new formatToolResponseText() function that affects all providers
  • Backwards compatibility maintained as existing tests pass after minimal updates
  • Unicode sanitization preserves behavior while no longer JSON-wrapping payloads

Code Quality

  • Clean separation between structured text construction and JSON serialization
  • Proper handling of edge cases (missing toolName, empty results)
  • formatToolResult() now extracts the .output property from object results, improving tool response handling

Tests & Coverage

  • Coverage impact: unchanged – existing tests run successfully but show minimal new coverage for the formatting function
  • Test gaps: No dedicated unit tests for formatToolResponseText() function itself
  • "Mock theater" risk minimal as existing compressToolMessages test verifies actual formatting behavior

Verdict

5i2urom marked this pull request as ready for review January 4, 2026 08:56
5i2urom requested a review from acoliver as a code owner January 4, 2026 08:56
@coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c260738 and 4d46920.

📒 Files selected for processing (8)
  • docs/cli/providers-openai-responses.md
  • docs/cli/providers.md
  • docs/tool-output-format.md
  • packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts
  • packages/core/src/providers/openai/OpenAIProvider.ts
  • packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts
  • packages/core/src/providers/openai/buildResponsesRequest.ts
  • packages/core/src/providers/utils/toolResponsePayload.ts
🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation-only changes (*.md files, docs/) do NOT require build/test/lint cycle

Files:

  • docs/cli/providers.md
  • docs/tool-output-format.md
  • docs/cli/providers-openai-responses.md
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Don't use any - Always specify proper types. Use unknown if the type is truly unknown and add proper type guards.
Do not use console.log or console.debug - Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings about any types

Files:

  • packages/core/src/providers/openai/OpenAIProvider.ts
  • packages/core/src/providers/openai/buildResponsesRequest.ts
  • packages/core/src/providers/utils/toolResponsePayload.ts
  • packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts
  • packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.
📚 Learning: 2025-12-18T14:06:22.557Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct `JSON.stringify(toolResponseBlock.result)` and needs to be updated to support ephemeral settings like the other providers.

Applied to files:

  • docs/cli/providers.md
  • packages/core/src/providers/openai/OpenAIProvider.ts
  • packages/core/src/providers/openai/buildResponsesRequest.ts
  • packages/core/src/providers/utils/toolResponsePayload.ts
  • packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts
  • docs/cli/providers-openai-responses.md
  • packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like `run_shell_command` use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final `ToolResponseBlock` containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.

Applied to files:

  • docs/cli/providers.md
  • docs/tool-output-format.md
  • packages/core/src/providers/openai/OpenAIProvider.ts
  • packages/core/src/providers/openai/buildResponsesRequest.ts
  • packages/core/src/providers/utils/toolResponsePayload.ts
  • packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts
  • docs/cli/providers-openai-responses.md
  • packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts
📚 Learning: 2025-12-14T16:16:39.939Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-14T16:16:39.939Z
Learning: OpenAI's current model family (as of December 2025) is GPT-5, including gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-thinking, gpt-5-pro, and GPT-5.2 variants (Instant/Thinking/Pro). GPT-4.1 and GPT-4o are outdated.

Applied to files:

  • docs/cli/providers.md
📚 Learning: 2025-12-14T03:59:56.073Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-14T03:59:56.073Z
Learning: GLM-4.6 can be accessed via Z.ai using Anthropic provider with base-url https://api.z.ai/v1 and model zai-glm-4.6, in addition to Synthetic (hf:zai-org/GLM-4.6) and Cerebras (zai-glm-4.6).

Applied to files:

  • docs/cli/providers.md
📚 Learning: 2025-12-18T14:06:22.557Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: In the llxprt-code codebase, the `tool-output-max-items` ephemeral setting only applies to tools that return lists/collections (glob, grep, read-many-files), not to single-output tools like shell. Not all ephemerals are relevant to every tool.

Applied to files:

  • docs/tool-output-format.md
🧬 Code graph analysis (2)
packages/core/src/providers/openai/OpenAIProvider.ts (1)
packages/core/src/providers/utils/toolResponsePayload.ts (1)
  • formatToolResponseText (15-39)
packages/core/src/providers/openai/buildResponsesRequest.ts (1)
packages/core/src/providers/utils/toolResponsePayload.ts (2)
  • buildToolResponsePayload (148-178)
  • formatToolResponseText (15-39)
🪛 markdownlint-cli2 (0.18.1)
docs/tool-output-format.md

11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (ubuntu-latest, 24.x)
  • GitHub Check: Test (windows-latest, 24.x)
  • GitHub Check: Test (macos-latest, 24.x)
🔇 Additional comments (12)
docs/cli/providers-openai-responses.md (1)

324-324: LGTM!

The addition effectively guides users to the tool output format documentation when troubleshooting tool call issues.

docs/cli/providers.md (1)

95-95: LGTM!

The update accurately reflects the new tool output format and provides helpful context with a direct link to the detailed documentation.

packages/core/src/providers/openai/buildResponsesRequest.stripToolCalls.test.ts (1)

71-72: Test expectation correctly updated for new plain-text format.

The expected output structure matches the formatToolResponseText implementation with fields separated by newlines: status, toolName, error (empty in success case), and output containing the raw tool result.

packages/core/src/providers/openai/OpenAIProvider.compressToolMessages.test.ts (2)

25-27: Test content updated to match new plain-text format.

The test now uses the structured multi-line format produced by formatToolResponseText, which correctly reflects the production behavior.


56-58: Truncation assertion appropriately simplified.

Since the content is now plain text rather than JSON, checking for the [truncated] marker as a substring is the correct approach to verify truncation behavior.

packages/core/src/providers/openai/OpenAIProvider.ts (2)

65-68: Import changes look good.

The new imports correctly bring in both buildToolResponsePayload for constructing the payload and formatToolResponseText for serializing to plain text format.


1128-1141: Tool response content formatting correctly updated.

The implementation properly:

  1. Builds the payload via buildToolResponsePayload
  2. Uses formatToolResponseText with appropriate fallback for toolName (payload.toolName ?? block.toolName)
  3. Applies ensureJsonSafe for Unicode sanitization

This aligns with the PR objective of eliminating JSON encoding while preserving raw text structure.
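
A rough sketch of that flow, with helper signatures assumed rather than copied from the codebase:

```ts
interface ToolResponseBlock {
  toolName: string;
  result: unknown;
}

// Signatures below are assumptions for illustration only.
declare function buildToolResponsePayload(block: ToolResponseBlock): {
  status: 'success' | 'error';
  toolName?: string;
  error?: string;
  output?: string;
};
declare function formatToolResponseText(parts: {
  status: string;
  toolName?: string;
  error?: string;
  output?: string;
}): string;
declare function ensureJsonSafe(text: string): string; // existing Unicode sanitizer

function buildToolResponseContent(block: ToolResponseBlock): string {
  const payload = buildToolResponsePayload(block);
  const text = formatToolResponseText({
    status: payload.status,
    toolName: payload.toolName ?? block.toolName, // fallback noted above
    error: payload.error,
    output: payload.output,
  });
  return ensureJsonSafe(text); // sanitization still applies to the final text
}
```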

packages/core/src/providers/openai/buildResponsesRequest.ts (3)

5-9: Imports correctly added for new payload utilities.

The import statement properly brings in both functions needed for the new tool response formatting approach.


231-234: Good fix to preserve string parameters.

Preserving string parameters as-is and only JSON-stringifying non-string parameters prevents double-encoding issues when tool call arguments are already serialized strings.
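
In sketch form (names hypothetical), the guard reduces to:

```ts
// Pass already-serialized arguments through untouched; encode everything else once.
function serializeFunctionCallArgs(parameters: unknown): string {
  return typeof parameters === 'string'
    ? parameters // already a JSON string; re-stringifying would double-escape it
    : JSON.stringify(parameters);
}
```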


245-259: Tool response formatting correctly unified with OpenAIProvider.

The implementation follows the same pattern as OpenAIProvider.buildToolResponseContent:

  1. Build payload via buildToolResponsePayload
  2. Format with formatToolResponseText using consistent field mapping
  3. Apply Unicode sanitization conditionally

This ensures consistent tool output format across both Chat Completions and Responses APIs as stated in the PR objectives.

packages/core/src/providers/utils/toolResponsePayload.ts (2)

15-39: New formatToolResponseText utility is well-structured.

The function creates a consistent multi-line format with clear section headers. Each field is separated by blank lines, making the output readable for models. The use of ?? operators with empty strings ensures consistent output structure even when optional fields are absent.


91-101: Good enhancement to handle wrapped output objects.

This change correctly handles tool results that arrive wrapped in an object with an output property (e.g., { output: "actual content" }). Extracting the string directly prevents unnecessary JSON serialization of the wrapper object, which aligns with the PR goal of preserving raw text structure.

The fallback to coerceToString for other object types ensures backward compatibility.
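
A hypothetical reconstruction of that logic (the real helper reportedly falls back to coerceToString, for which JSON.stringify stands in here):

```ts
function formatToolResult(result: unknown): string {
  if (typeof result === 'string') return result;
  // Unwrap { output: "..." } so the wrapper object is not JSON-serialized.
  if (
    result !== null &&
    typeof result === 'object' &&
    'output' in result &&
    typeof (result as { output: unknown }).output === 'string'
  ) {
    return (result as { output: string }).output;
  }
  return JSON.stringify(result); // stand-in for coerceToString
}
```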

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/tool-output-format.md (1)

11-23: Consider adding example output and documenting truncation behavior.

The format specification is clear, but adding a concrete example with actual values would help readers visualize the output. Additionally, the PR objectives mention that "truncation indicators now appear inline (e.g., [truncated])" — documenting this behavior and when it occurs would make the format more complete for users who encounter truncated outputs.

🔎 Suggested addition: example and truncation note

You could add something like:

### Example

```text
status:
success

toolName:
run_shell_command

error:


output:
file1.txt
file2.txt
```

### Truncation

When tool output exceeds internal size limits, it is truncated with [truncated] appended to indicate where the output was cut off.



📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d4692018e063cf7fe77272d6438c3823d7b456a and fdbb2b443e69f74ac87deebe42853a78dcb0479e.

📒 Files selected for processing (1)
  • docs/tool-output-format.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation-only changes (*.md files, docs/) do NOT require build/test/lint cycle

Files:

  • docs/tool-output-format.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like run_shell_command use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final ToolResponseBlock containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-18T14:06:22.557Z
Learning: OpenAIResponsesProvider (packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts) currently bypasses the ephemeral truncation system by using direct JSON.stringify(toolResponseBlock.result) and needs to be updated to support ephemeral settings like the other providers.
📚 Learning: 2025-11-16T22:51:26.374Z
Learnt from: e2720pjk
Repo: vybestack/llxprt-code PR: 583
File: packages/core/src/providers/openai/OpenAIProvider.ts:935-959
Timestamp: 2025-11-16T22:51:26.374Z
Learning: In the llxprt-code codebase (packages/core/src/providers/openai/OpenAIProvider.ts), tools like run_shell_command use internal streaming only for real-time UI updates during execution, but each tool execution produces exactly ONE final ToolResponseBlock containing the full result. The streaming chunks are never sent to the LLM and are not converted into multiple tool messages. The OpenAI Chat Completions API requires that each tool call (tool_call_id) corresponds to exactly one message with role 'tool', so duplicate tool response detection that removes subsequent tool messages with the same tool_call_id is correct and necessary for API compliance.

Applied to files:

  • docs/tool-output-format.md

⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (windows-latest, 24.x)
  • GitHub Check: Test (macos-latest, 24.x)
  • GitHub Check: Test (ubuntu-latest, 24.x)
🔇 Additional comments (1)
docs/tool-output-format.md (1)

1-29: Clear documentation of the new plain-text format.

The documentation effectively explains the rationale (avoiding double/triple JSON encoding) and clearly specifies the four-field format structure. The code block now properly includes the text language identifier, resolving the prior linter concern.

acoliver merged commit 85aea50 into vybestack:main Jan 4, 2026
19 checks passed
5i2urom deleted the fix/toolcall-raw-text-format branch January 5, 2026 05:33