Fix: Trim AI "thinking" tags from Deepseek output #59

fyzanshaik · 2025-06-21T00:17:33Z

When using Deepseek models with `lumen draft`, the AI's internal "thinking" process (enclosed in `<think>...</think>` tags) was included in the final output, cluttering the commit message.

To address this, I've implemented the following:

- Added `trim_thinking_tags` option: a new boolean field, `trim_thinking_tags`, has been added to the `draft` section of `lumen.config.json`. Setting it to `true` enables the trimming functionality.
- Robust `commit_types` deserialization: the `commit_types` field in `DraftConfig` was changed from `String` to `HashMap<String, String>`. This corrects a type mismatch between the JSON config and the Rust struct, making config loading more reliable. `serde_json::to_string` is now used to convert the `HashMap` back to JSON when building the AI prompt.
- Output post-processing: if `trim_thinking_tags` is `true`, a regular expression identifies and removes the `<think>...</think>` blocks from the AI's raw response before it is displayed to the user.
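As a rough illustration of the post-processing step, here is a dependency-free sketch that mimics the `(?s)<think>.*?</think>\s*` regex using `str::find` instead of the `regex` crate. The function name `strip_think_blocks` is hypothetical. Like the regex, it ends each block at the first closing tag, drops whitespace after each closing tag, and leaves an unclosed `<think>` untouched; the final `trim()` matches the trimming done when the flag is enabled.

```rust
/// Hypothetical helper: removes every `<think>...</think>` block (non-greedy,
/// like the regex `(?s)<think>.*?</think>\s*`), drops whitespace that follows
/// each closing tag, then trims the whole result.
fn strip_think_blocks(input: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        // Look for the first closing tag after this opening tag (non-greedy).
        match rest[start + OPEN.len()..].find(CLOSE) {
            Some(rel_end) => {
                out.push_str(&rest[..start]);
                let after = start + OPEN.len() + rel_end + CLOSE.len();
                // Equivalent of the trailing `\s*` in the regex.
                rest = rest[after..].trim_start();
            }
            // Unclosed tag: the regex would not match either, so keep the rest.
            None => break,
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}

fn main() {
    let raw = "<think>\nweighing feat vs fix...\n</think>\n\nfeat(config): add trim_thinking_tags option";
    println!("{}", strip_think_blocks(raw));
}
```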

These changes were specifically aimed at cleaning the AI's output format. No existing functionality of lumen is broken by these modifications; in fact, config loading should be more robust.

Summary by CodeRabbit

New Features
- Added a configuration option to automatically remove <think>...</think> tags from AI output.
- Improved configuration for commit types, now supporting a structured format for easier customization.
Bug Fixes
- Enhanced output formatting by trimming unwanted whitespace and tags when enabled.
Documentation
- Improved README formatting and clarity.
- Added documentation for the new configuration option and updated examples.
Chores
- Updated dependencies and .gitignore entries for better project management.

Signed-off-by: fyzanshaik <[email protected]>

coderabbitai · 2025-06-21T00:17:39Z

Walkthrough

The changes introduce a new configuration option, trim_thinking_tags, to control the removal of <think>...</think> blocks from AI-generated output. The commit_types field in the configuration is refactored from a JSON string to a native map. Supporting dependencies are added, relevant code is updated, and documentation and formatting improvements are made throughout the project.

Changes

| File(s) | Change Summary |
|---|---|
| .gitignore | Added `lumen.config.json` to ignored files. |
| Cargo.toml | Added `regex` and `lazy_static` as new dependencies. |
| README.md | Improved formatting, added `"trim_thinking_tags"` config option, clarified YAML/JSON examples. |
| src/ai_prompt.rs | Serializes `commit_types` as JSON string for prompts; minor whitespace cleanup. |
| src/command/draft.rs | Implements trimming of `<think>...</think>` tags using a static regex, controlled by config flag. |
| src/config/configuration.rs | Changes `commit_types` from `String` to `HashMap`, adds `trim_thinking_tags` boolean, updates defaults. |
| src/main.rs | Removes an unnecessary blank line; no logic changes. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DraftCommand
    participant Config
    participant AI

    User->>DraftCommand: execute()
    DraftCommand->>Config: Load draft_config
    DraftCommand->>AI: Generate draft (with commit_types as JSON)
    AI-->>DraftCommand: Returns draft output (may include <think> tags)
    alt trim_thinking_tags enabled
        DraftCommand->>DraftCommand: Remove <think>...</think> blocks
        DraftCommand->>DraftCommand: Trim whitespace
    end
    DraftCommand->>User: Print final draft output
```

Poem

In the warren of code, a new path appears,
With tags that are trimmed and configs more clear.
Commit types now mapped, not tangled in string,
And prompts that are cleaner—what joy this will bring!
With regex and static, our drafts hop ahead,
A rabbit’s delight in the changes we’ve shed!
🐇✨



coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

.gitignore (1)

2-2: Confirm ignoring the primary config file
Ignoring lumen.config.json will prevent committing the actual configuration that your code depends on. If the intent is to let users maintain their own local settings, consider instead:

- Committing a template (e.g., `lumen.config.example.json`) with default values.
- Ignoring only the real file (e.g., using `/lumen.config.json`).

This ensures new contributors have a starting point while keeping personal overrides out of VCS.

Cargo.toml (1)
22-23: Consider using std::sync::OnceLock instead of lazy_static to drop a dependency.

The dependencies are appropriate for the new functionality. However, consider replacing lazy_static with std::sync::OnceLock (stable since Rust 1.70) or std::sync::LazyLock (stable since Rust 1.80). Both are part of the standard library, so the extra crate can be removed.

```diff
-lazy_static = "1.5.0"
```

Then in src/command/draft.rs, replace the lazy_static usage with:

```rust
use regex::Regex;
use std::sync::OnceLock;

static THINK_TAG_REGEX: OnceLock<Regex> = OnceLock::new();

// Compiles the pattern on first use and returns the cached Regex afterwards.
fn get_think_tag_regex() -> &'static Regex {
    THINK_TAG_REGEX.get_or_init(|| Regex::new(r"(?s)<think>.*?</think>\s*").unwrap())
}
```
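The `LazyLock` alternative can be sketched without external crates, assuming Rust 1.80 or later. In this hypothetical example a `String` built at runtime stands in for the compiled `Regex` (with the `regex` crate the static would be `LazyLock<Regex>` initialized via `LazyLock::new(|| Regex::new(...).unwrap())`), and an atomic counter, added only for demonstration, confirms the initializer runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the initializer has run (demonstration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `THINK_TAG_REGEX`: the closure runs on first access.
static THINK_OPEN_TAG: LazyLock<String> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    format!("<{}>", "think")
});

fn main() {
    // Both accesses deref to the same cached value; only the first runs the closure.
    let first: &str = &THINK_OPEN_TAG;
    let second: &str = &THINK_OPEN_TAG;
    assert!(std::ptr::eq(first, second));
    println!("initializer ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
}
```

Unlike `OnceLock`, the initializer lives in the static's declaration, so call sites can keep using `THINK_TAG_REGEX.replace_all(...)` directly instead of going through a getter function.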
src/config/configuration.rs (1)
80-86: Remove commented-out code.

The commented-out deserialize_commit_types function should be removed rather than left as comments, as it's no longer needed after the type change.
```diff
-// fn deserialize_commit_types<'de, D>(deserializer: D) -> Result<String, D::Error>
-// where
-//     D: Deserializer<'de>,
-// {
-//     let commit_types_map: HashMap<String, String> = HashMap::deserialize(deserializer)?;
-//     serde_json::to_string(&commit_types_map).map_err(serde::de::Error::custom)
-// }
```
src/command/draft.rs (1)
27-31: Optimize string operations for better performance.

The thinking tag removal logic is correct, but can be optimized to reduce unnecessary string allocations.
```diff
-        let mut result = provider.draft(self).await?;
-        if self.draft_config.trim_thinking_tags {
-            result = THINK_TAG_REGEX.replace_all(&result, "").to_string();
-            result = result.trim().to_string();
-        }
+        let mut result = provider.draft(self).await?;
+        if self.draft_config.trim_thinking_tags {
+            result = THINK_TAG_REGEX.replace_all(&result, "").trim().to_string();
+        }
```
This combines the operations and eliminates an unnecessary intermediate string allocation.
README.md (1)
55-55: Fix macOS capitalization.

The operating system should be written as "macOS" not "MacOS" according to Apple's official branding.
```diff
-#### Using Homebrew (MacOS and Linux)
+#### Using Homebrew (macOS and Linux)
```

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3d067c and fc4dbbd.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

.gitignore (1 hunks)
Cargo.toml (1 hunks)
README.md (7 hunks)
src/ai_prompt.rs (5 hunks)
src/command/draft.rs (3 hunks)
src/config/configuration.rs (3 hunks)
src/main.rs (0 hunks)

💤 Files with no reviewable changes (1)

src/main.rs

🧰 Additional context used

🪛 LanguageTool

README.md

[grammar] ~55-~55: The operating system from Apple is written “macOS”.
Context: ... ### Installation #### Using Homebrew (MacOS and Linux) ```bash brew install jnsaha...

(MAC_OS)

🔇 Additional comments (7)

src/config/configuration.rs (2)

36-38: Excellent type safety improvement.

Changing commit_types from String to HashMap<String, String> improves type safety and eliminates the need for custom deserialization. The new trim_thinking_tags field is properly configured with a sensible default.

56-70: Well-structured default commit types.

The implementation properly initializes the HashMap with all standard conventional commit types. The structure is clear and maintainable.
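For illustration, a defaults function of that shape might look like the sketch below. The function name and the description strings are hypothetical (the PR's actual wording is not shown here); the keys are the types commonly used with Conventional Commits.

```rust
use std::collections::HashMap;

/// Hypothetical default for `DraftConfig::commit_types`: a native map from
/// commit type to description, instead of a JSON string.
fn default_commit_types() -> HashMap<String, String> {
    [
        ("feat", "A new feature"),
        ("fix", "A bug fix"),
        ("docs", "Documentation only changes"),
        ("style", "Changes that do not affect the meaning of the code"),
        ("refactor", "A code change that neither fixes a bug nor adds a feature"),
        ("perf", "A code change that improves performance"),
        ("test", "Adding missing tests or correcting existing tests"),
        ("build", "Changes that affect the build system or external dependencies"),
        ("ci", "Changes to CI configuration files and scripts"),
        ("chore", "Other changes that don't modify src or test files"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect()
}

fn main() {
    let types = default_commit_types();
    // Lookups are typed and need no JSON parsing at runtime.
    println!("feat => {}", types["feat"]);
}
```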

src/ai_prompt.rs (2)

7-7: Good addition of required import.

The serde_json import is appropriately added to support the new serialization functionality.

126-127: Proper error handling for serialization.

The serialization of the HashMap back to JSON is correctly implemented with appropriate error handling. This maintains the existing prompt format while benefiting from the improved type safety in the configuration.

src/command/draft.rs (2)

4-5: Appropriate imports for the new functionality.

The regex and lazy_static imports are correctly added to support the thinking tag removal feature.

14-16: Well-crafted regex pattern.

The regex pattern `(?s)<think>.*?</think>\s*` is well-designed:

- `(?s)` enables dotall mode so `.` also matches newlines within thinking blocks
- `.*?` uses non-greedy matching to avoid over-matching
- `\s*` removes the whitespace following each closing tag
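To see why the non-greedy `.*?` matters, the stdlib-only sketch below contrasts a lazy match (each block ends at the first `</think>` after its `<think>`) with a greedy one (a single match running to the last `</think>` in the text). Both function names are hypothetical approximations of the regex behavior; the greedy variant exists only to show the over-matching.

```rust
const OPEN: &str = "<think>";
const CLOSE: &str = "</think>";

/// Lazy, like `.*?`: each block ends at the first closing tag after its opening tag.
fn strip_lazy(input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        let Some(rel) = rest[start..].find(CLOSE) else { break };
        out.push_str(&rest[..start]);
        rest = rest[start + rel + CLOSE.len()..].trim_start();
    }
    out.push_str(rest);
    out
}

/// Greedy, like `.*`: one match spans from the first opening tag to the last closing tag.
fn strip_greedy(input: &str) -> String {
    match (input.find(OPEN), input.rfind(CLOSE)) {
        (Some(start), Some(end)) if end > start => {
            format!("{}{}", &input[..start], input[end + CLOSE.len()..].trim_start())
        }
        _ => input.to_string(),
    }
}

fn main() {
    // The commit message sits between two thinking blocks.
    let raw = "<think>a</think>feat: keep me<think>b</think>";
    println!("lazy:   {:?}", strip_lazy(raw));   // keeps the message
    println!("greedy: {:?}", strip_greedy(raw)); // swallows the message
}
```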

README.md (1)

224-226: Excellent documentation of new feature.

The documentation clearly explains the new trim_thinking_tags configuration option with appropriate context and default value. The comment explaining its purpose is helpful for users.

fyzanshaik · 2025-06-21T01:06:17Z

Fixed: #60

fyzanshaik · 2025-06-21T01:11:40Z

I just checked, and the thinking-tag trimming needs to apply to all commands, not only `draft` :(. If this PR is good to merge, I can make the trimming global in a follow-up PR; otherwise I'm fine continuing in this one.
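One possible shape for that follow-up is a single post-processing helper that every command's AI output passes through, so the flag is honored globally rather than only in `draft`. This is a hypothetical sketch: `OutputConfig`, `postprocess`, and the sample outputs are illustrative, not lumen's actual types, and the stripping uses plain `str::find` in place of the PR's regex.

```rust
/// Hypothetical config subset that would apply to all commands, not just `draft`.
struct OutputConfig {
    trim_thinking_tags: bool,
}

/// Shared post-processing step: every command would route its AI output here.
fn postprocess(raw: String, cfg: &OutputConfig) -> String {
    if !cfg.trim_thinking_tags {
        return raw;
    }
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw.as_str();
    // Remove each `<think>...</think>` block plus the whitespace after it.
    while let Some(start) = rest.find("<think>") {
        let Some(rel) = rest[start..].find("</think>") else { break };
        out.push_str(&rest[..start]);
        rest = rest[start + rel + "</think>".len()..].trim_start();
    }
    out.push_str(rest);
    out.trim().to_string()
}

fn main() {
    let cfg = OutputConfig { trim_thinking_tags: true };
    // Illustrative outputs from different (hypothetical) commands.
    for (command, raw) in [
        ("draft", "<think>hmm</think>\nfix: handle empty diff"),
        ("explain", "<think>...</think> This commit refactors config loading."),
    ] {
        println!("{command}: {}", postprocess(raw.to_string(), &cfg));
    }
}
```

Centralizing the step means new commands get the behavior automatically instead of each one re-implementing the check.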

fyzanshaik added 3 commits June 21, 2025 05:25

feat(config): add trim thinking tags configuration

b4b3e9a

Signed-off-by: fyzanshaik <[email protected]>

refactor: migrate commit types to HashMap

b12dffe

Signed-off-by: fyzanshaik <[email protected]>

docs: Reformat and organize README for better readability

fc4dbbd

Signed-off-by: fyzanshaik <[email protected]>

coderabbitai bot reviewed Jun 21, 2025
