feat(providers): strip chain-of-thought markers from custom provider output#8635
feat(providers): strip chain-of-thought markers from custom provider output#8635mvanhorn wants to merge 4 commits intoaaif-goose:mainfrom
Conversation
…output Custom OpenAI-compatible providers (DeepSeek R1, Qwen, etc.) embed reasoning in `<think>...</think>` / `<thinking>...</thinking>` blocks inline in the content field rather than using the structured `reasoning_content` field. Goose surfaced these blocks directly in chat, cluttering the UI. Adds a streaming-safe `ThinkFilter` / `split_think_blocks` helper that separates inline think blocks from content. Wires it into the OpenAI response parser (both non-streaming and streaming paths), routing thinking into `MessageContent::thinking` and emitting cleaned text through `MessageContent::text`. Structured `reasoning_content` / `reasoning` fields still take precedence so we do not duplicate reasoning for providers that already send it as a first-class field. Streaming buffers partial tag prefixes (`<thi`, `<think`, `</thin`, etc.) until the decision can be made, so a chunk ending mid-tag never leaks characters that later turn out to be inside a think block. Unit tests cover the full-content splitter, streaming boundaries, case-insensitive tags, attributes, multiple blocks, non-think tags, and unterminated opens. Closes aaif-goose#8590
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8af5d16ed8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Still resolving routing. |
|
Apologies for the noise - I misread my tooling and closed this thinking it was misrouted. block/goose and aaif-goose/goose resolve to the same repo (transfer). Reopening. |
|
Thanks for this @mvanhorn -- can you address the Codex code review feedback (either implement the requested changes or reply to the comments briefly explaining why it's invalid/unnecessary). |
Two fixes to the OpenAI-format provider path, per Codex P2 review on aaif-goose#8635: - base.rs parse_think_tag: require a real tag boundary (>, /, or whitespace) after the name before classifying as ThinkTag::Open. Tags like <thinking-mode> or <thinking123> were being matched as think tags because the parser scanned blindly for > after matching 'think'/'thinking'. Adds regression tests for <thinking-mode>, <thinking123>, <thinker>, and <think data-source="x">. - openai.rs response_to_streaming_message: buffer inline <think>...</think> output in pending_inline_thinking until end-of-stream or the tool-call flush. If reasoning_content arrives in any later chunk, saw_structured_reasoning flips and the buffered inline thinking is discarded so structured reasoning wins cleanly. Previously, inline think chunks were yielded immediately, so a later reasoning_content chunk produced duplicated reasoning. New streaming test covers the race.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06d806af73
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Without this, `<thinking-mode>` or `<thinking123>` would be classified as a | ||
| // think tag and stripped from normal content. | ||
| let valid_open_boundary = match bytes.get(idx) { | ||
| Some(&b) => b == b'>' || b == b'/' || b.is_ascii_whitespace(), |
There was a problem hiding this comment.
Treat self-closing tags as closed
The parser accepts '/' as a valid boundary after think/thinking, but open tags are always emitted as ThinkTag::Open, so <think/> increments think_depth and never closes it. In that case, all following text is routed to thinking and removed from visible assistant text (e.g., <think/>Visible answer becomes hidden reasoning). This is a data-loss regression for providers that emit self-closing tags, so the parser should either reject /> as an open-think marker or treat it as an immediate open+close.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I've not seen an empty think tag rendered as a self-closing tag (<think/>) before, but here codex is quite confidently saying "for providers that emit self-closing tags" implying that there are some providers that do this. It might be worth being defensive here @mvanhorn?
|
Thanks @jh-block. Both Codex P2 items addressed in 06d806a:
Verified: 5 think-filter tests + 58 openai format tests pass, cargo fmt + cargo clippy -D warnings clean. |
A self-closing `<think/>` used to be classified as `ThinkTag::Open` because parse_think_tag looped until `>` without distinguishing the trailing `/`. That incremented think_depth and routed every subsequent chunk into the thinking bucket until the end of the stream, silently dropping the provider's visible output. Add a `ThinkTag::SelfClosing` variant, detect the trailing `/` before `>`, and treat it as no-op (consume the tag text, leave think_depth and inside_think unchanged). Works for `<think/>`, `<think />`, `<thinking/>`, and `<think data-source="x"/>`. New regression tests in base.rs cover the no-op cases and verify a self-closing tag inside an open block does not pre-close it.
|
@jh-block fair, defensive handling added in 7940020. parse_think_tag now distinguishes self-closing ( Regression tests in base.rs cover:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7940020b67
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| static OPEN_RE: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r"(?is)^<(?:t(?:h(?:i(?:n(?:k(?:i(?:n(?:g)?)?)?)?)?)?)?)(?:\s[^>]*)?$").unwrap() | ||
| }); |
There was a problem hiding this comment.
Recognize
<think/ as an incomplete self-closing think tag
is_possible_partial_think_tag does not match the valid in-progress suffix <think/ (or <thinking/), so when a stream chunk splits a self-closing tag at that boundary (e.g. "...<think/" then ">..."), the first chunk is emitted as visible text and the marker is never stripped. That reintroduces leakage for one common chunk split pattern and violates the streaming guarantee that mid-tag boundaries are buffered until disambiguated.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@mvanhorn looks like there's one last issue to resolve here (the streaming parser can emit <think/ or <thinking/ if that appears at the end of a chunk). If you can tweak the regex to handle that I think then we should be able to merge this
The streaming partial-tag regex allowed `<think<whitespace>...` as a partial open tag but rejected a bare trailing `/`. A chunk that ended mid self-closing tag (e.g. `<think/` before `>` arrived in the next chunk) was not buffered, so the `<think/` prefix leaked into visible content. Extend OPEN_RE with the `/` alternative and add a regression test that exercises three partial-boundary cases: `<think/` + `>`, `<thinking/` + `>`, and `<think ` + `/>`. All three now buffer correctly and never leak `<` into visible output.
|
@jh-block addressed in bb9f824. Extended OPEN_RE with a trailing Root cause: the old regex accepted Added |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bb9f82416f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| while let Some(&byte) = bytes.get(idx) { | ||
| if byte == b'>' { | ||
| let kind = if last_non_ws == Some(b'/') { |
There was a problem hiding this comment.
Skip quoted
> when scanning think-tag end
The open-tag parser treats the first > byte as the end of a <think...> tag without tracking quoted attribute values, so inputs like <think data="a>b"/>Visible are misparsed: the tag closes early, then trailing bytes and even visible answer text can be routed into thinking or dropped. This is a data-corruption path for providers/models that emit > inside attribute values (or tool-generated XML-like tags), so the scanner should ignore > while inside quoted strings.
Useful? React with 👍 / 👎.
Summary
Custom OpenAI-compatible providers (DeepSeek R1, Qwen-style proxies, LLMProxy, etc.) emit reasoning inline in the content field as
<think>...</think>or<thinking>...</thinking>blocks rather than using the structuredreasoning_contentJSON field. Goose surfaced those blocks directly in chat, cluttering the UI and sometimes triggering repetition.This change adds a streaming-safe
ThinkFilter/split_think_blockshelper incrates/goose/src/providers/base.rsand wires it into both the non-streaming (response_to_message) and streaming (response_to_streaming_message) paths incrates/goose/src/providers/formats/openai.rs. Inline thinking is routed intoMessageContent::thinking; cleaned content flows throughMessageContent::text. Structuredreasoning_content/reasoningfields still take precedence, so providers that already send reasoning as a first-class field do not get duplicated output.Streaming semantics: when a chunk ends on a partial tag prefix (
<thi,<think,</thinkin, etc.), the filter holds that suffix until it can decide whether it belongs to a think block or is regular content. A chunk ending mid-tag will never leak characters to the UI that later turn out to be inside a think block.Testing
cargo fmt --check: cleancargo clippy -p goose --all-targets -- -D warnings: cleancargo test -p goose --lib providers::base::tests: 23 passed (includes new think filter tests)cargo test -p goose --lib providers::formats::openai::tests: all passed (includes new regression tests)New unit tests cover:
<think>,<thinking>, attributes (<think class="x">), case-insensitive tags, multiple blocks, plain content, unterminated opens["<thi", "nk>x</thi", "nk>y"])<table>) are treated as contentreasoning_contentpath still wins when both are presentRelated Issues
Closes #8590
Prior art: zed's similar fix landed in zed-industries/zed#53488 ("Strip think tags from text stream") for an analogous UI-leakage issue.
This contribution was developed with AI assistance (Codex).