
### Using LLMs with Reasoning Traces

```{warning}
**Breaking Change in v0.18.0**: The `reasoning_config` field and its options (`remove_reasoning_traces`, `start_token`, `end_token`) have been removed. The `rails.output.apply_to_reasoning_traces` field has also been removed. Use output rails to guardrail reasoning traces instead.
```

Reasoning-capable LLMs such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) and [NVIDIA Llama 3.1 Nemotron Ultra 253B V1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1) include reasoning traces in their responses, typically wrapped in tokens such as `<think>` and `</think>`. NeMo Guardrails automatically extracts these traces and makes them available throughout your guardrails configuration via the `$bot_thinking` variable in Colang flows and `bot_thinking` in Python contexts.
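
Because the extracted trace is exposed as a regular context variable, you can also read it from a custom Python action. The following is a minimal sketch, not part of the shipped library: the action name `check_reasoning` and its keyword-based policy are illustrative assumptions, and you would typically place the action in your config's `actions.py` and call it from a custom output flow.

```python
# Minimal sketch of a custom action that reads the extracted reasoning trace.
# The action name and the keyword policy below are illustrative assumptions.
from typing import Optional

from nemoguardrails.actions import action


@action()
async def check_reasoning(context: Optional[dict] = None) -> bool:
    """Return True when the bot's reasoning trace looks acceptable."""
    bot_thinking = (context or {}).get("bot_thinking") or ""
    # Example policy: flag traces that appear to leak the system prompt.
    return "system prompt" not in bot_thinking.lower()
```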

#### Guardrailing Reasoning Traces with Output Rails

The primary approach is to use output rails to inspect and control reasoning traces. This allows you to:

- Block responses based on problematic reasoning patterns
- Enhance moderation decisions with reasoning context
- Monitor and filter sensitive information in reasoning

Here's a minimal example:

```yaml
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
  - type: self_check_output
    model: <your_moderation_model>
    engine: <your_engine>

rails:
  output:
    flows:
      - self check output
```

**prompts.yml**:

```yaml
prompts:
  - task: self_check_output
    content: |
      Your task is to check if the bot message complies with company policy.

      Bot message: "{{ bot_response }}"

      {% if bot_thinking %}
      Bot reasoning: "{{ bot_thinking }}"
      {% endif %}

      Should this be blocked (Yes or No)?
      Answer:
```

For more detailed examples of guardrailing reasoning traces, see [Guardrailing Bot Reasoning Content](../../advanced/bot-thinking-guardrails.md).

#### Accessing Reasoning Traces in API Responses

##### With GenerationOptions (Structured Access)

When you pass `GenerationOptions` to the API, the function returns a `GenerationResponse` object with structured fields, including `reasoning_content` for accessing reasoning traces separately from the main response:

```python
from nemoguardrails import RailsConfig, LLMRails
from nemoguardrails.rails.llm.options import GenerationOptions

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

options = GenerationOptions()
result = await rails.generate_async(
    messages=[{"role": "user", "content": "What is 2+2?"}],
    options=options
)

if result.reasoning_content:
    print("Reasoning:", result.reasoning_content)

print("Response:", result.response[0]["content"])
```

##### Without GenerationOptions (Tagged String)

When calling without `GenerationOptions` (for example, when you receive a plain dict or string response), the reasoning is wrapped in `<think>` tags:

```python
response = rails.generate(
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

print(response["content"])
```

Output:

```
<think>Let me calculate: 2 plus 2 equals 4.</think>
The answer is 4.
```

**Which pattern should you use?**

Use **Pattern 1 (With GenerationOptions)** when:
- You need structured access to reasoning and response separately
- You're building a new application
- You need access to other structured fields (state, output_data, llm_metadata, etc.)

Use **Pattern 2 (Without GenerationOptions)** when:
- You need backward compatibility with existing code
- You want the raw response with inline reasoning tags
- You're integrating with systems that expect tagged strings
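
If you choose Pattern 2 but still need the reasoning and the visible answer separately, you can split the tagged string yourself. The following sketch is illustrative and assumes the default `<think>`/`</think>` tags shown in the example above:

```python
import re


def split_reasoning(tagged: str) -> tuple[str, str]:
    """Split a tagged response into (reasoning, visible answer)."""
    # Assumes the default <think>...</think> tags shown above.
    match = re.search(r"<think>(.*?)</think>", tagged, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", tagged, flags=re.DOTALL).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(response["content"])
```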

### NIM for LLMs
