feat(providers): strip chain-of-thought markers from custom provider output#8633
feat(providers): strip chain-of-thought markers from custom provider output#8633mvanhorn wants to merge 1 commit intoaaif-goose:mainfrom
Conversation
…output Custom OpenAI-compatible providers (DeepSeek R1, Qwen, etc.) embed reasoning in `<think>...</think>` / `<thinking>...</thinking>` blocks inline in the content field rather than using the structured `reasoning_content` field. Goose surfaced these blocks directly in chat, cluttering the UI. Adds a streaming-safe `ThinkFilter` / `split_think_blocks` helper that separates inline think blocks from content. Wires it into the OpenAI response parser (both non-streaming and streaming paths), routing thinking into `MessageContent::thinking` and emitting cleaned text through `MessageContent::text`. Structured `reasoning_content` / `reasoning` fields still take precedence so we do not duplicate reasoning for providers that already send it as a first-class field. Streaming buffers partial tag prefixes (`<thi`, `<think`, `</thin`, etc.) until the decision can be made, so a chunk ending mid-tag never leaks characters that later turn out to be inside a think block. Unit tests cover the full-content splitter, streaming boundaries, case-insensitive tags, attributes, multiple blocks, non-think tags, and unterminated opens. Closes aaif-goose#8590
|
Apologies, this was created on the wrong fork. Re-submitting on block/goose. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8af5d16ed8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let name = buffer.get(name_start..idx).unwrap_or_default(); | ||
| let is_think = name.eq_ignore_ascii_case("think") || name.eq_ignore_ascii_case("thinking"); | ||
| if !is_think { |
There was a problem hiding this comment.
Validate think-tag boundaries before treating as reasoning
parse_think_tag treats any tag whose alphabetic prefix is think/thinking as a reasoning tag, even when the next character is part of a different tag name (for example <thinking-mode> or <think1>). Those opens are accepted, but matching closes like </thinking-mode> are not recognized by the close-tag parser, so inside_think can stay true and subsequent assistant text gets diverted into thinking (effectively hidden from normal output). Require a valid delimiter (>, whitespace, or /) immediately after the tag name before classifying it as a think tag.
Useful? React with 👍 / 👎.
| if !saw_structured_reasoning && !filtered.thinking.is_empty() { | ||
| content.push(MessageContent::thinking(filtered.thinking, "")); | ||
| } |
There was a problem hiding this comment.
Buffer inline think chunks until structured reasoning is known
Inline <think>...</think> content is emitted whenever saw_structured_reasoning is still false, but that flag only flips after a non-empty reasoning_content chunk has already been seen. In streams where inline-think text arrives first and structured reasoning arrives later, users will receive both (inline thinking first, structured thinking afterward), which violates the intended precedence behavior and reintroduces duplicate reasoning in the UI. Defer emitting filtered thinking until end-of-stream (or another point where structured reasoning absence is certain).
Useful? React with 👍 / 👎.
Summary
Custom OpenAI-compatible providers (DeepSeek R1, Qwen-style proxies, LLMProxy, etc.) emit reasoning inline in the content field as
<think>...</think>or<thinking>...</thinking>blocks rather than using the structuredreasoning_contentJSON field. Goose surfaced those blocks directly in chat, cluttering the UI and sometimes triggering repetition.This change adds a streaming-safe
ThinkFilter/split_think_blockshelper incrates/goose/src/providers/base.rsand wires it into both the non-streaming (response_to_message) and streaming (response_to_streaming_message) paths incrates/goose/src/providers/formats/openai.rs. Inline thinking is routed intoMessageContent::thinking; cleaned content flows throughMessageContent::text. Structuredreasoning_content/reasoningfields still take precedence, so providers that already send reasoning as a first-class field do not get duplicated output.Streaming semantics: when a chunk ends on a partial tag prefix (
<thi,<think,</thinkin, etc.), the filter holds that suffix until it can decide whether it belongs to a think block or is regular content. A chunk ending mid-tag will never leak characters to the UI that later turn out to be inside a think block.Testing
cargo fmt --check: cleancargo clippy -p goose --all-targets -- -D warnings: cleancargo test -p goose --lib providers::base::tests: 23 passed (includes new think filter tests)cargo test -p goose --lib providers::formats::openai::tests: all passed (includes new regression tests)New unit tests cover:
<think>,<thinking>, attributes (<think class="x">), case-insensitive tags, multiple blocks, plain content, unterminated opens["<thi", "nk>x</thi", "nk>y"])<table>) are treated as contentreasoning_contentpath still wins when both are presentRelated Issues
Closes #8590
Prior art: zed's similar fix landed in zed-industries/zed#53488 ("Strip think tags from text stream") for an analogous UI-leakage issue.
This contribution was developed with AI assistance (Codex).