
feat(providers): strip chain-of-thought markers from custom provider output#8635

Open
mvanhorn wants to merge 4 commits into aaif-goose:main from mvanhorn:feat/8590-cot-filter

Conversation

@mvanhorn
Contributor

Summary

Custom OpenAI-compatible providers (DeepSeek R1, Qwen-style proxies, LLMProxy, etc.) emit reasoning inline in the content field as <think>...</think> or <thinking>...</thinking> blocks rather than using the structured reasoning_content JSON field. Goose surfaced those blocks directly in chat, cluttering the UI and sometimes triggering repetition.

This change adds a streaming-safe ThinkFilter / split_think_blocks helper in crates/goose/src/providers/base.rs and wires it into both the non-streaming (response_to_message) and streaming (response_to_streaming_message) paths in crates/goose/src/providers/formats/openai.rs. Inline thinking is routed into MessageContent::thinking; cleaned content flows through MessageContent::text. Structured reasoning_content / reasoning fields still take precedence, so providers that already send reasoning as a first-class field do not get duplicated output.

Streaming semantics: when a chunk ends on a partial tag prefix (<thi, <think, </thinkin, etc.), the filter holds that suffix until it can decide whether it belongs to a think block or is regular content. A chunk ending mid-tag will never leak characters to the UI that later turn out to be inside a think block.
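The hold-back rule can be sketched roughly as follows. This is a minimal illustration with assumed helper names, not the actual `ThinkFilter` code in `base.rs`:

```rust
// Hypothetical sketch of the hold-back idea only; names and exact rules
// are assumptions, not the PR's implementation.

/// Could this chunk suffix still grow into a think tag?
fn is_possible_partial_tag(s: &str) -> bool {
    let l = s.to_ascii_lowercase();
    // A prefix of "<thinking" or "</thinking" covers "<thi", "<think",
    // "</thinkin", and so on ("<think" is itself a prefix of "<thinking").
    if "<thinking".starts_with(l.as_str()) || "</thinking".starts_with(l.as_str()) {
        return true;
    }
    // An open tag with attributes that has not reached '>' yet.
    (l.starts_with("<think ") || l.starts_with("<thinking ")) && !l.contains('>')
}

/// Split a chunk into (text safe to emit now, suffix to hold for the next chunk).
fn hold_partial_suffix(chunk: &str) -> (&str, &str) {
    for (i, _) in chunk.char_indices().filter(|&(_, c)| c == '<') {
        if is_possible_partial_tag(&chunk[i..]) {
            return (&chunk[..i], &chunk[i..]);
        }
    }
    (chunk, "")
}

fn main() {
    assert_eq!(hold_partial_suffix("hello <thi"), ("hello ", "<thi"));
    assert_eq!(hold_partial_suffix("x</thinkin"), ("x", "</thinkin"));
    assert_eq!(hold_partial_suffix("a <table"), ("a <table", ""));
    println!("ok");
}
```

The held suffix is prepended to the next chunk, so an ambiguous `<thi` is only classified once enough bytes have arrived.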

Testing

  • cargo fmt --check: clean
  • cargo clippy -p goose --all-targets -- -D warnings: clean
  • cargo test -p goose --lib providers::base::tests: 23 passed (includes new think filter tests)
  • cargo test -p goose --lib providers::formats::openai::tests: all passed (includes new regression tests)

New unit tests cover:

  • Full-content splitter for <think>, <thinking>, attributes (<think class="x">), case-insensitive tags, multiple blocks, plain content, unterminated opens
  • Streaming across partial tag boundaries (chunks like ["<thi", "nk>x</thi", "nk>y"])
  • Non-think tags (<table>) are treated as content
  • Structured reasoning_content path still wins when both are present
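For intuition, the full-content splitting those tests exercise looks roughly like this. This is an assumed simplification, not the PR's `split_think_blocks` signature, and it deliberately skips attribute and tag-boundary edge cases:

```rust
// Simplified full-content splitter sketch (assumed behavior):
// returns (visible content, concatenated thinking).
fn split_think(input: &str) -> (String, String) {
    let mut content = String::new();
    let mut thinking = String::new();
    // Lowercased copy for case-insensitive matching; byte offsets stay
    // aligned with `input` because ASCII lowercasing preserves length.
    let lower = input.to_ascii_lowercase();
    let mut pos = 0;
    while let Some(rel) = lower[pos..].find("<think") {
        let open = pos + rel;
        // End of the opening tag; if '>' never arrives, the rest is content.
        let Some(gt) = lower[open..].find('>') else { break };
        content.push_str(&input[pos..open]);
        let body = open + gt + 1;
        match lower[body..].find("</think") {
            Some(c) => {
                let close = body + c;
                thinking.push_str(&input[body..close]);
                pos = match lower[close..].find('>') {
                    Some(g) => close + g + 1,
                    None => input.len(),
                };
            }
            None => {
                // Unterminated open: everything after it is thinking.
                thinking.push_str(&input[body..]);
                return (content, thinking);
            }
        }
    }
    content.push_str(&input[pos..]);
    (content, thinking)
}

fn main() {
    assert_eq!(
        split_think("a<think>x</think>b"),
        ("ab".to_string(), "x".to_string())
    );
    assert_eq!(
        split_think("<think>open only"),
        ("".to_string(), "open only".to_string())
    );
    println!("ok");
}
```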

Related Issues

Closes #8590

Prior art: Zed's similar fix landed in zed-industries/zed#53488 ("Strip think tags from text stream") for an analogous UI-leakage issue.

This contribution was developed with AI assistance (Codex).

…output

Custom OpenAI-compatible providers (DeepSeek R1, Qwen, etc.) embed reasoning
in `<think>...</think>` / `<thinking>...</thinking>` blocks inline in the
content field rather than using the structured `reasoning_content` field.
Goose surfaced these blocks directly in chat, cluttering the UI.

Adds a streaming-safe `ThinkFilter` / `split_think_blocks` helper that
separates inline think blocks from content. Wires it into the OpenAI
response parser (both non-streaming and streaming paths), routing thinking
into `MessageContent::thinking` and emitting cleaned text through
`MessageContent::text`. Structured `reasoning_content` / `reasoning` fields
still take precedence so we do not duplicate reasoning for providers that
already send it as a first-class field.

Streaming buffers partial tag prefixes (`<thi`, `<think`, `</thin`, etc.)
until the decision can be made, so a chunk ending mid-tag never leaks
characters that later turn out to be inside a think block. Unit tests cover
the full-content splitter, streaming boundaries, case-insensitive tags,
attributes, multiple blocks, non-think tags, and unterminated opens.

Closes aaif-goose#8590

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8af5d16ed8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/goose/src/providers/formats/openai.rs
Comment thread crates/goose/src/providers/base.rs Outdated
@mvanhorn
Contributor Author

Still resolving routing.

@mvanhorn mvanhorn closed this Apr 17, 2026
@mvanhorn
Contributor Author

Apologies for the noise - I misread my tooling and closed this thinking it was misrouted. block/goose and aaif-goose/goose resolve to the same repo (transfer). Reopening.

@mvanhorn mvanhorn reopened this Apr 18, 2026
@jh-block
Collaborator

Thanks for this @mvanhorn -- can you address the Codex code review feedback? Either implement the requested changes or reply to the comments briefly explaining why they're invalid or unnecessary.

@jh-block jh-block self-assigned this Apr 20, 2026
Two fixes to the OpenAI-format provider path, per Codex P2 review on aaif-goose#8635:

- base.rs parse_think_tag: require a real tag boundary (>, /, or whitespace)
  after the name before classifying as ThinkTag::Open. Tags like <thinking-mode>
  or <thinking123> were being matched as think tags because the parser scanned
  blindly for > after matching 'think'/'thinking'. Adds regression tests for
  <thinking-mode>, <thinking123>, <thinker>, and <think data-source="x">.

- openai.rs response_to_streaming_message: buffer inline <think>...</think>
  output in pending_inline_thinking until end-of-stream or the tool-call flush.
  If reasoning_content arrives in any later chunk, saw_structured_reasoning flips
  and the buffered inline thinking is discarded so structured reasoning wins
  cleanly. Previously, inline think chunks were yielded immediately, so a later
  reasoning_content chunk produced duplicated reasoning. New streaming test
  covers the race.
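The boundary check described in the first fix can be sketched like this (assumed names, not the actual `parse_think_tag` implementation):

```rust
// Illustrative sketch of the tag-boundary rule from the commit above;
// names are assumptions, not the PR's code.

#[derive(Debug, PartialEq)]
enum Classified {
    ThinkOpen,
    NotThink,
}

/// After matching the name "think"/"thinking", the next byte must be
/// '>', '/', or ASCII whitespace for the tag to count as a think tag.
fn classify(tag: &str) -> Classified {
    let l = tag.to_ascii_lowercase();
    let rest = l
        .strip_prefix("<thinking")
        .or_else(|| l.strip_prefix("<think"));
    match rest.and_then(|r| r.bytes().next()) {
        Some(b) if b == b'>' || b == b'/' || b.is_ascii_whitespace() => Classified::ThinkOpen,
        _ => Classified::NotThink,
    }
}

fn main() {
    assert_eq!(classify("<think>"), Classified::ThinkOpen);
    assert_eq!(classify("<think data-source=\"x\">"), Classified::ThinkOpen);
    assert_eq!(classify("<thinking-mode>"), Classified::NotThink);
    assert_eq!(classify("<thinking123>"), Classified::NotThink);
    assert_eq!(classify("<thinker>"), Classified::NotThink);
    println!("ok");
}
```

Trying the longer name `<thinking` before `<think` matters: otherwise `<thinking>` would strip only `<think` and see `i` as the next byte.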

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06d806af73


// Without this, `<thinking-mode>` or `<thinking123>` would be classified as a
// think tag and stripped from normal content.
let valid_open_boundary = match bytes.get(idx) {
Some(&b) => b == b'>' || b == b'/' || b.is_ascii_whitespace(),

P2: Treat self-closing tags as closed

The parser accepts '/' as a valid boundary after think/thinking, but open tags are always emitted as ThinkTag::Open, so <think/> increments think_depth and never closes it. In that case, all following text is routed to thinking and removed from visible assistant text (e.g., <think/>Visible answer becomes hidden reasoning). This is a data-loss regression for providers that emit self-closing tags, so the parser should either reject /> as an open-think marker or treat it as an immediate open+close.


Collaborator


I've not seen an empty think tag rendered as a self-closing tag (<think/>) before, but here Codex quite confidently says "for providers that emit self-closing tags", implying that some providers do this. It might be worth being defensive here, @mvanhorn?

@mvanhorn
Contributor Author

Thanks @jh-block. Both Codex P2 items addressed in 06d806a:

  • parse_think_tag in crates/goose/src/providers/base.rs: added a real-boundary check (>, /, whitespace) after the name match, so <thinking-mode>, <thinking123>, <thinker> no longer get classified as think tags and stripped. New regression tests in base.rs cover that plus <think data-source="x"> so tags with attributes still work.
  • response_to_streaming_message in crates/goose/src/providers/formats/openai.rs: inline <think>...</think> chunks now accumulate into pending_inline_thinking and are only flushed at end-of-stream (or at the tool-call flush block). When reasoning_content arrives in any later chunk, saw_structured_reasoning flips and the buffer is cleared, so structured reasoning wins cleanly without duplicated output. New streaming test test_streaming_suppresses_inline_think_when_structured_reasoning_follows covers the race.

Verified: 5 think-filter tests + 58 openai format tests pass, cargo fmt + cargo clippy -D warnings clean.

A self-closing `<think/>` used to be classified as `ThinkTag::Open`
because parse_think_tag looped until `>` without distinguishing the
trailing `/`. That incremented think_depth and routed every subsequent
chunk into the thinking bucket until the end of the stream, silently
dropping the provider's visible output.

Add a `ThinkTag::SelfClosing` variant, detect the trailing `/` before
`>`, and treat it as no-op (consume the tag text, leave think_depth
and inside_think unchanged). Works for `<think/>`, `<think />`,
`<thinking/>`, and `<think data-source="x"/>`.

New regression tests in base.rs cover the no-op cases and verify a
self-closing tag inside an open block does not pre-close it.
@mvanhorn
Contributor Author

@jh-block fair, defensive handling added in 7940020.

parse_think_tag now distinguishes self-closing (<think/>, <think/> with attrs, <thinking/>, <think />) from plain open. I added a ThinkTag::SelfClosing variant that is treated as a no-op - the tag is consumed from visible output (so <think/> doesn't leak to the user), but think_depth and inside_think are left unchanged, so the rest of the stream isn't silently swallowed into the thinking bucket.

Regression tests in base.rs cover:

  • <think/>, <think />, <thinking/>, <think data-source="x"/> -> content preserved, thinking empty
  • self-closing inside an open <think> block doesn't pre-close it (depth stays at 1 until the real </think>)
  • streaming case: content after <think/> across multiple chunks stays visible
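The self-closing handling can be sketched as follows, under assumed names (the real code detects the trailing `/` during its byte scan rather than on a complete tag string):

```rust
// Sketch of self-closing detection and its no-op depth handling;
// names and shape are assumptions, not the PR's parse_think_tag.

#[derive(Debug, PartialEq)]
enum ThinkTag {
    Open,
    SelfClosing,
}

/// The last non-whitespace byte before '>' decides self-closing vs open.
fn open_kind(tag: &str) -> ThinkTag {
    let inner = tag.trim_end_matches('>');
    if inner.trim_end().ends_with('/') {
        ThinkTag::SelfClosing
    } else {
        ThinkTag::Open
    }
}

/// Self-closing tags are consumed from visible output but leave the
/// nesting depth alone, so the rest of the stream is not swallowed
/// into the thinking bucket.
fn apply(tag: ThinkTag, depth: &mut usize) {
    match tag {
        ThinkTag::Open => *depth += 1,
        ThinkTag::SelfClosing => {}
    }
}

fn main() {
    assert_eq!(open_kind("<think/>"), ThinkTag::SelfClosing);
    assert_eq!(open_kind("<think />"), ThinkTag::SelfClosing);
    assert_eq!(open_kind("<think data-source=\"x\"/>"), ThinkTag::SelfClosing);
    assert_eq!(open_kind("<think>"), ThinkTag::Open);
    let mut depth = 0usize;
    apply(open_kind("<think/>"), &mut depth);
    assert_eq!(depth, 0);
    apply(open_kind("<think>"), &mut depth);
    assert_eq!(depth, 1);
    println!("ok");
}
```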


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7940020b67


Comment on lines +250 to +252
static OPEN_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"(?is)^<(?:t(?:h(?:i(?:n(?:k(?:i(?:n(?:g)?)?)?)?)?)?)?)(?:\s[^>]*)?$").unwrap()
});

P2: Recognize <think/ as an incomplete self-closing think tag

is_possible_partial_think_tag does not match the valid in-progress suffix <think/ (or <thinking/), so when a stream chunk splits a self-closing tag at that boundary (e.g. "...<think/" then ">..."), the first chunk is emitted as visible text and the marker is never stripped. That reintroduces leakage for one common chunk split pattern and violates the streaming guarantee that mid-tag boundaries are buffered until disambiguated.


Collaborator


@mvanhorn looks like there's one last issue to resolve here (the streaming parser can emit <think/ or <thinking/ if that appears at the end of a chunk). If you can tweak the regex to handle that I think then we should be able to merge this

The streaming partial-tag regex allowed `<think<whitespace>...` as a
partial open tag but rejected a bare trailing `/`. A chunk that ended
mid self-closing tag (e.g. `<think/` before `>` arrived in the next
chunk) was not buffered, so the `<think/` prefix leaked into visible
content.

Extend OPEN_RE with the `/` alternative and add a regression test that
exercises three partial-boundary cases: `<think/` + `>`, `<thinking/` +
`>`, and `<think ` + `/>`. All three now buffer correctly and never
leak `<` into visible output.
@mvanhorn
Contributor Author

@jh-block addressed in bb9f824. Extended OPEN_RE with a trailing / alternative so <think/ and <thinking/ at a chunk boundary buffer until the > arrives in the next chunk.

Root cause: the old regex accepted <think<whitespace>... but not a bare trailing /, so a chunk ending mid self-closing tag fell out of is_possible_partial_think_tag and leaked the <think/ prefix into visible content.

Added test_think_filter_streaming_across_self_closing_boundary covering three partial-boundary cases: <think/ + >, <thinking/ + >, and <think + />. Asserts no < leaks into content and thinking stays empty (self-closing = no payload). Verified with cargo test -p goose --lib providers::base::tests - 29 tests pass, clippy/fmt clean.
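The extended partial-open check is equivalent to something like the following regex-free sketch (the PR itself extends `OPEN_RE`; names and exact shape here are assumptions):

```rust
// Regex-free sketch of the extended partial-open-tag check; the PR's
// actual implementation extends the OPEN_RE regex instead.

/// A chunk suffix is still an ambiguous open tag if it is a prefix of
/// "<thinking", or the full name followed by attributes and/or a trailing
/// '/' with no '>' yet -- so "<think/" and "<thinking/" are buffered.
fn is_possible_partial_open(s: &str) -> bool {
    let l = s.to_ascii_lowercase();
    // Any prefix of the bare name: "<", "<t", ..., "<thinking".
    if "<thinking".starts_with(l.as_str()) {
        return true;
    }
    // Longer name first so "<thinking..." is not split as "<think" + "ing...".
    for name in ["<thinking", "<think"] {
        if let Some(rest) = l.strip_prefix(name) {
            let boundary = rest.starts_with(|c: char| c.is_ascii_whitespace())
                || rest.starts_with('/');
            return boundary && !rest.contains('>');
        }
    }
    false
}

fn main() {
    assert!(is_possible_partial_open("<think/"));
    assert!(is_possible_partial_open("<thinking/"));
    assert!(is_possible_partial_open("<think "));
    assert!(!is_possible_partial_open("<think/>")); // complete tag, not partial
    assert!(!is_possible_partial_open("<thinker"));
    println!("ok");
}
```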


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb9f82416f


Comment on lines +231 to +233
while let Some(&byte) = bytes.get(idx) {
if byte == b'>' {
let kind = if last_non_ws == Some(b'/') {

P2: Skip quoted > when scanning think-tag end

The open-tag parser treats the first > byte as the end of a <think...> tag without tracking quoted attribute values, so inputs like <think data="a>b"/>Visible are misparsed: the tag closes early, then trailing bytes and even visible answer text can be routed into thinking or dropped. This is a data-corruption path for providers/models that emit > inside attribute values (or tool-generated XML-like tags), so the scanner should ignore > while inside quoted strings.
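One possible defensive fix -- not part of this PR -- is a quote-aware scan for the end of the open tag, sketched here with assumed names:

```rust
// Possible defensive fix (not in this PR): ignore '>' while inside a
// quoted attribute value when scanning for the end of an open tag.

/// Returns the byte index of the '>' that actually ends the tag,
/// skipping any '>' bytes inside single- or double-quoted values.
fn find_tag_end(tag: &str) -> Option<usize> {
    let mut quote: Option<u8> = None;
    for (i, b) in tag.bytes().enumerate() {
        match quote {
            Some(q) => {
                if b == q {
                    quote = None; // closing quote
                }
            }
            None => match b {
                b'"' | b'\'' => quote = Some(b),
                b'>' => return Some(i),
                _ => {}
            },
        }
    }
    None // tag not terminated in this input
}

fn main() {
    // The '>' inside the quoted value must not end the tag.
    assert_eq!(find_tag_end("<think data=\"a>b\"/>"), Some(18));
    assert_eq!(find_tag_end("<think>"), Some(6));
    assert_eq!(find_tag_end("<think data=\"a"), None);
    println!("ok");
}
```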




Development

Successfully merging this pull request may close these issues.

Chain-of-Thought Leakage Breaking Chat Readability & Poor User Experience
