feat(providers): strip chain-of-thought markers from custom provider output#8635
feat(providers): strip chain-of-thought markers from custom provider output#8635mvanhorn wants to merge 1 commit intoaaif-goose:mainfrom
Conversation
…output Custom OpenAI-compatible providers (DeepSeek R1, Qwen, etc.) embed reasoning in `<think>...</think>` / `<thinking>...</thinking>` blocks inline in the content field rather than using the structured `reasoning_content` field. Goose surfaced these blocks directly in chat, cluttering the UI. Adds a streaming-safe `ThinkFilter` / `split_think_blocks` helper that separates inline think blocks from content. Wires it into the OpenAI response parser (both non-streaming and streaming paths), routing thinking into `MessageContent::thinking` and emitting cleaned text through `MessageContent::text`. Structured `reasoning_content` / `reasoning` fields still take precedence so we do not duplicate reasoning for providers that already send it as a first-class field. Streaming buffers partial tag prefixes (`<thi`, `<think`, `</thin`, etc.) until the decision can be made, so a chunk ending mid-tag never leaks characters that later turn out to be inside a think block. Unit tests cover the full-content splitter, streaming boundaries, case-insensitive tags, attributes, multiple blocks, non-think tags, and unterminated opens. Closes aaif-goose#8590
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8af5d16ed8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if !saw_structured_reasoning && !filtered.thinking.is_empty() { | ||
| content.push(MessageContent::thinking(filtered.thinking, "")); | ||
| } |
There was a problem hiding this comment.
Defer inline-think emission until structured reasoning decision
This branch emits filtered.thinking immediately when saw_structured_reasoning is false, but that flag only reflects chunks already seen. If a provider streams <think>...</think> in early content chunks and sends reasoning_content in a later chunk, the inline reasoning has already been yielded and cannot be retracted, so users get duplicated reasoning even though structured reasoning is intended to take precedence. Buffering inline-think output until end-of-stream (or another definitive point) would avoid this regression.
Useful? React with 👍 / 👎.
| while let Some(byte) = bytes.get(idx) { | ||
| if *byte == b'>' { | ||
| return Some((ThinkTag::Open, idx + 1)); | ||
| } |
There was a problem hiding this comment.
Validate think-tag boundary before treating tag as
<think>
After matching the tag name, this open-tag parser accepts any following characters until >, so tags like <thinking-mode> or <thinking123> are incorrectly classified as think blocks and stripped from normal content. That causes data loss whenever model output includes HTML/custom tags whose names start with think/thinking but are not actual reasoning markers. Require a real tag boundary (>, whitespace, or /) immediately after the name before returning ThinkTag::Open.
Useful? React with 👍 / 👎.
|
Still resolving routing. |
|
Apologies for the noise - I misread my tooling and closed this thinking it was misrouted. block/goose and aaif-goose/goose resolve to the same repo (transfer). Reopening. |
Summary
Custom OpenAI-compatible providers (DeepSeek R1, Qwen-style proxies, LLMProxy, etc.) emit reasoning inline in the content field as
<think>...</think>or<thinking>...</thinking>blocks rather than using the structuredreasoning_contentJSON field. Goose surfaced those blocks directly in chat, cluttering the UI and sometimes triggering repetition.This change adds a streaming-safe
ThinkFilter/split_think_blockshelper incrates/goose/src/providers/base.rsand wires it into both the non-streaming (response_to_message) and streaming (response_to_streaming_message) paths incrates/goose/src/providers/formats/openai.rs. Inline thinking is routed intoMessageContent::thinking; cleaned content flows throughMessageContent::text. Structuredreasoning_content/reasoningfields still take precedence, so providers that already send reasoning as a first-class field do not get duplicated output.Streaming semantics: when a chunk ends on a partial tag prefix (
<thi,<think,</thinkin, etc.), the filter holds that suffix until it can decide whether it belongs to a think block or is regular content. A chunk ending mid-tag will never leak characters to the UI that later turn out to be inside a think block.Testing
cargo fmt --check: cleancargo clippy -p goose --all-targets -- -D warnings: cleancargo test -p goose --lib providers::base::tests: 23 passed (includes new think filter tests)cargo test -p goose --lib providers::formats::openai::tests: all passed (includes new regression tests)New unit tests cover:
<think>,<thinking>, attributes (<think class="x">), case-insensitive tags, multiple blocks, plain content, unterminated opens["<thi", "nk>x</thi", "nk>y"])<table>) are treated as contentreasoning_contentpath still wins when both are presentRelated Issues
Closes #8590
Prior art: zed's similar fix landed in zed-industries/zed#53488 ("Strip think tags from text stream") for an analogous UI-leakage issue.
This contribution was developed with AI assistance (Codex).