Skip to content

feat(providers): strip chain-of-thought markers from custom provider output#8634

Closed
mvanhorn wants to merge 1 commit intoaaif-goose:mainfrom
mvanhorn:feat/8590-cot-filter
Closed

feat(providers): strip chain-of-thought markers from custom provider output#8634
mvanhorn wants to merge 1 commit intoaaif-goose:mainfrom
mvanhorn:feat/8590-cot-filter

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

Summary

Custom OpenAI-compatible providers (DeepSeek R1, Qwen-style proxies, LLMProxy, etc.) emit reasoning inline in the content field as <think>...</think> or <thinking>...</thinking> blocks rather than using the structured reasoning_content JSON field. Goose surfaced those blocks directly in chat, cluttering the UI and sometimes triggering repetition.

This change adds a streaming-safe ThinkFilter / split_think_blocks helper in crates/goose/src/providers/base.rs and wires it into both the non-streaming (response_to_message) and streaming (response_to_streaming_message) paths in crates/goose/src/providers/formats/openai.rs. Inline thinking is routed into MessageContent::thinking; cleaned content flows through MessageContent::text. Structured reasoning_content / reasoning fields still take precedence, so providers that already send reasoning as a first-class field do not get duplicated output.

Streaming semantics: when a chunk ends on a partial tag prefix (<thi, <think, </thinkin, etc.), the filter holds that suffix until it can decide whether it belongs to a think block or is regular content. A chunk ending mid-tag will never leak characters to the UI that later turn out to be inside a think block.

Testing

  • cargo fmt --check: clean
  • cargo clippy -p goose --all-targets -- -D warnings: clean
  • cargo test -p goose --lib providers::base::tests: 23 passed (includes new think filter tests)
  • cargo test -p goose --lib providers::formats::openai::tests: all passed (includes new regression tests)

New unit tests cover:

  • Full-content splitter for <think>, <thinking>, attributes (<think class="x">), case-insensitive tags, multiple blocks, plain content, unterminated opens
  • Streaming across partial tag boundaries (chunks like ["<thi", "nk>x</thi", "nk>y"])
  • Non-think tags (<table>) are treated as content
  • Structured reasoning_content path still wins when both are present

Related Issues

Closes #8590

Prior art: zed's similar fix landed in zed-industries/zed#53488 ("Strip think tags from text stream") for an analogous UI-leakage issue.

This contribution was developed with AI assistance (Codex).

…output

Custom OpenAI-compatible providers (DeepSeek R1, Qwen, etc.) embed reasoning
in `<think>...</think>` / `<thinking>...</thinking>` blocks inline in the
content field rather than using the structured `reasoning_content` field.
Goose surfaced these blocks directly in chat, cluttering the UI.

Adds a streaming-safe `ThinkFilter` / `split_think_blocks` helper that
separates inline think blocks from content. Wires it into the OpenAI
response parser (both non-streaming and streaming paths), routing thinking
into `MessageContent::thinking` and emitting cleaned text through
`MessageContent::text`. Structured `reasoning_content` / `reasoning` fields
still take precedence so we do not duplicate reasoning for providers that
already send it as a first-class field.

Streaming buffers partial tag prefixes (`<thi`, `<think`, `</thin`, etc.)
until the decision can be made, so a chunk ending mid-tag never leaks
characters that later turn out to be inside a think block. Unit tests cover
the full-content splitter, streaming boundaries, case-insensitive tags,
attributes, multiple blocks, non-think tags, and unterminated opens.

Closes aaif-goose#8590
@mvanhorn
Copy link
Copy Markdown
Contributor Author

Misroute continues, fixing remote config now.

@mvanhorn mvanhorn closed this Apr 17, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8af5d16ed8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +198 to +200
let name = buffer.get(name_start..idx).unwrap_or_default();
let is_think = name.eq_ignore_ascii_case("think") || name.eq_ignore_ascii_case("thinking");
if !is_think {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce tag boundary when matching think/thinking tags

The parser treats any tag name that starts with think/thinking as a think block because it only consumes alphabetic characters and then compares the prefix, so inputs like <think-time> or <think123> are misclassified as think opens. Their matching closes are not recognized, which can leave ThinkFilter in inside_think mode and route the remainder of the assistant output into hidden thinking content instead of visible text. Require a valid tag-name boundary after think/thinking (for example >, whitespace, or /) before returning ThinkTag.

Useful? React with 👍 / 👎.

Comment on lines +983 to +985
if !saw_structured_reasoning && !filtered.thinking.is_empty() {
content.push(MessageContent::thinking(filtered.thinking, ""));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Defer inline-think emission until reasoning_content is resolved

In streaming responses, inline <think> content is emitted as soon as it is parsed whenever saw_structured_reasoning is still false, but providers can send delta.reasoning_content in later chunks. In that ordering, users receive inline thinking first and then structured reasoning later, which duplicates reasoning and breaks the intended “structured reasoning wins” behavior when both are present. Buffer inline-think output until end-of-stream (or another point where structured reasoning absence is certain) before yielding it.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Chain-of-Thought Leakage Breaking Chat Readability & Poor User Experience

1 participant